cameleer-server/docs/superpowers/specs/2026-04-22-app-deployment-page-design.md

# Unified App Deployment Page — Design

**Status:** Design approved, awaiting implementation plan
**Date:** 2026-04-22
**Related issue:** [cameleer-server#147](https://gitea.siegeln.net/cameleer/cameleer-server/issues/147) (concurrent-edit protection — deferred)

## Problem

Today, managing an application is split across two pages:

- `/apps/new` (`CreateAppView`) — form to create + initially deploy an app. Requires manually entering name and slug, picking an environment from a dropdown, selecting a JAR, and a "deploy immediately" toggle.
- `/apps/:slug` (`AppDetailView`) — manages an existing app. Has an `Upload JAR` button in the header that uploads immediately, and an `Overview` / `Configuration` sub-tab split. Config saves are pushed live to agents via SSE the moment Save is clicked.

Pain points:

1. Users can't stage a configuration change without immediately applying it (agent config tab is live-push; container config requires a full redeploy). There's no "draft next deploy" concept.
2. The primary action doesn't reflect deploy state — `Upload JAR` remains the label even when a new JAR has been uploaded and is waiting to be deployed.
3. App name must be typed manually. The JAR filename is the obvious source and isn't used.
4. The environment picker on the create page duplicates the environment already chosen in the top-nav switcher, inviting mistakes (create app in wrong env).
5. After a deploy, the progress bar and startup log vanish from the page once the user navigates away or the deploy completes, so users can't revisit "what happened during the last deploy?" without round-tripping through ClickHouse logs.
6. The full config of an app is split across two sub-tabs (`Configuration` for monitoring/resources/variables/traces/recording, `Overview` for versions/deployments), which forces context switches for routine checks.

## Goal

One unified deployment page that handles the full lifecycle of an app — from initial creation through every subsequent redeploy — with a clear Save-then-Deploy two-step workflow, a dirty-state model that makes "what will change on redeploy" explicit, and persistent access to the last deployment's progress + log.

## Non-goals

- Real-time collaborative editing, presence awareness, or optimistic-locking protection against concurrent edits (tracked in issue #147).
- Restructuring the environment model, slug rules, or any backend orchestration mechanics beyond what's required for staged-vs-live config writes and deployment snapshotting.
- Changing the agent SSE protocol.
- Pruning or archiving JAR versions (retention is an existing environment-level setting).

## Design

### Page structure

Routes:

- `/apps/new` — unified page in **net-new mode** (no app record exists yet).
- `/apps/:slug` — unified page in **existing-app mode**.

The `CreateAppView` / `AppDetailView` split goes away. A single component (`AppDeploymentPage`) renders both modes; the only differences are which fields are editable and which buttons are enabled.

Layout top-to-bottom:

1. **Page header** — title (app display name or "Create Application"), env badge, status badge, Delete App action (existing apps only), and the **primary action button** (Save / Redeploy / Deploying…).
2. **Identity & Artifact section** — always visible.
3. **Config tabs row** — `Monitoring | Resources | Variables | Sensitive Keys | Deployment | ● Traces & Taps | ● Route Recording`.
4. **Active tab content.**

The old `Overview` sub-tab is removed. Its deployments table becomes the Deployment tab's history disclosure; its version list is rolled into the Identity & Artifact section as a Checkpoints disclosure.

### Identity & Artifact section

| Field | Net-new mode | Existing / deployed mode |
|---|---|---|
| Application Name | `Input`, editable | read-only display text |
| Slug | auto-derived from name, displayed for preview only; never directly editable | read-only display (slug is immutable post-create per project conventions) |
| Environment | read-only chip showing currently-selected env | read-only chip |
| External URL | computed preview (existing formula — `routingMode === 'subdomain'` vs path-style) | same |
| Current Version | — | `v5 · payment-gateway-1.2.3.jar · 42 MB · 3 days ago` |
| Application JAR | `Select JAR` button; shows filename + size once staged client-side | `Change JAR` button; shows "staged: `<filename>`" badge when a new JAR is pending |
| Checkpoints | disclosure; empty when no prior successful deploys | disclosure; lists past successful deployments |

**Auto-derive rule** — triggered when the user selects a JAR file and the name field is empty OR still matches the previously auto-derived value (never overwrite manual edits):

1. Take filename, strip `.jar`.
2. Truncate at the first character that is a digit (`0-9`) or a `.`.
3. Replace `-` and `_` with spaces.
4. Strip any resulting 1-char orphan tokens (e.g. trailing `v` from `my-app-v2`).
5. Title-case remaining words.

The derived name is a suggestion — the user can override by typing.

Examples:

- `payment-gateway-1.2.0.jar` → `Payment Gateway`
- `order-service.jar` → `Order Service`
- `my-app-v2.jar` → `My App`
- `acme_billing-3.jar` → `Acme Billing`
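
The five steps above can be sketched as a pure function, which is also the shape the Vitest derivation test would exercise. This is a minimal sketch, not the final implementation: the names `deriveAppName` and `nextAppName` are hypothetical, and leaving the rest of each word untouched when title-casing is an assumption the rule does not spell out.

```typescript
// Hypothetical helper: derives a suggested display name from a JAR filename.
function deriveAppName(filename: string): string {
  // 1. Strip the .jar extension.
  const base = filename.replace(/\.jar$/i, "");
  // 2. Truncate at the first digit or dot.
  const cut = base.search(/[0-9.]/);
  const stem = cut === -1 ? base : base.slice(0, cut);
  return stem
    // 3. Replace '-' and '_' with spaces.
    .replace(/[-_]/g, " ")
    .split(/\s+/)
    // 4. Drop empty and 1-char orphan tokens (e.g. the trailing "v" of "my-app-v2").
    .filter((token) => token.length > 1)
    // 5. Title-case: uppercase the first letter, keep the rest as typed (assumption).
    .map((token) => token.charAt(0).toUpperCase() + token.slice(1))
    .join(" ");
}

// Hypothetical helper: applies the "never overwrite manual edits" guard.
// The suggestion replaces the name only when the field is empty or still
// holds the previously auto-derived value.
function nextAppName(current: string, lastDerived: string | null, filename: string): string {
  const untouched = current === "" || current === lastDerived;
  return untouched ? deriveAppName(filename) : current;
}
```

For instance, `nextAppName("Billing (EU)", "Acme Billing", "payment-gateway-1.2.0.jar")` keeps `Billing (EU)` because the user has typed over the earlier suggestion.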

**Slug derivation** remains the existing `slugify(name)` logic. The user cannot edit slug directly in net-new mode (auto-tracks name) and cannot edit at all post-create (immutable per existing project conventions).

### Checkpoints (past deployments as restore points)

A checkpoint = one past **successful** deployment, carrying the full snapshot `{jarVersionId, agentConfig, containerConfig, sensitiveKeys}` frozen at deploy time. JARs that were uploaded but never successfully deployed do not appear, because they never ran and are not meaningful restore points.

**Restore flow:**

1. User expands Checkpoints, picks a row.
2. Form fields across all four staged tabs reset to that snapshot's values; JAR slot points to the snapshot's JAR version (by checksum reference — no re-upload).
3. Dirty evaluation re-runs against the **latest successful deploy snapshot**, as always. Because the restored values are unsaved local edits, the primary button shows `Save`; once saved, it becomes `Redeploy`.
4. The user may tweak further before deploying or deploy as-is.

Restore is pure client-state hydration — it doesn't write to DB until the user clicks Save.

**Edge cases:**

- The currently-running deployment is **hidden** from the Checkpoints list (restoring to it is equivalent to Discard).
- A checkpoint whose JAR version has been pruned (per the env-level retention policy) shows as "archived, JAR unavailable" with the Restore action disabled and a tooltip explaining why.

Collapsed by default.
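
The list rules above (successful deploys only, current deployment hidden, pruned JARs disabled) could be expressed as a small pure function over the deployment list. This is a sketch under assumptions: the `Deployment` and `CheckpointRow` shapes and the function name are hypothetical, and "successful" is approximated as "has a non-null snapshot", which follows from the snapshot-on-success-only rule described later in this document.

```typescript
interface Snapshot {
  jarVersionId: string;
  agentConfig: object;
  containerConfig: object;
  sensitiveKeys: string[];
}

// Hypothetical client-side view of a deployment row.
interface Deployment {
  id: string;
  snapshot: Snapshot | null; // null for failed or pre-migration deployments
}

interface CheckpointRow {
  deploymentId: string;
  snapshot: Snapshot;
  restorable: boolean;
  disabledReason: string | null;
}

function listCheckpoints(
  deployments: Deployment[],
  currentDeploymentId: string | null,
  availableJarVersionIds: string[],
): CheckpointRow[] {
  const rows: CheckpointRow[] = [];
  for (const d of deployments) {
    // Skip deployments without a snapshot and the currently running one.
    if (d.snapshot === null || d.id === currentDeploymentId) continue;
    const jarAvailable = availableJarVersionIds.indexOf(d.snapshot.jarVersionId) !== -1;
    rows.push({
      deploymentId: d.id,
      snapshot: d.snapshot,
      restorable: jarAvailable,
      disabledReason: jarAvailable ? null : "archived, JAR unavailable",
    });
  }
  return rows;
}
```

Restoring a row would then copy `row.snapshot` into form state only; nothing is written until Save.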

### Dirty state + primary button

**What counts as dirty** (any one is sufficient):

- A new JAR file is staged in client state (not yet uploaded).
- A selected past version (via Restore) differs from the currently-deployed version.
- Form values on any of the four **staged** tabs (Monitoring, Resources, Variables, Sensitive Keys) differ from the last-saved DB values.
- DB-saved config differs from the snapshot captured at the last successful deploy.

**What does not count:**

- Changes on Traces & Taps or Route Recording tabs (live-apply — see below).
- Changes made via Dashboard / Runtime pages.

**State machine:**

| App state | Form has unsaved local edits? | DB matches last deploy? | Button label | Action |
|---|---|---|---|---|
| Net-new, nothing entered | — | — | `Save` | disabled |
| Net-new, form has content | yes | n/a | `Save` | create app + upload JAR + write config; transitions to "exists, no deploy yet" |
| Exists, no deploy yet | no | no (never deployed) | `Redeploy` | deploy current DB state |
| Exists, form edits pending | yes | either | `Save` | persist local edits; after save, re-evaluates to `Save` (disabled) or `Redeploy` |
| Exists, nothing local, DB = deploy | no | yes | `Save` | disabled |
| Exists, nothing local, DB ≠ deploy | no | no | `Redeploy` | deploy DB state |
| Deploy in progress | — | — | `Deploying…` | disabled, spinner |

A secondary `Discard` ghost button appears adjacent to the primary button whenever the form has unsaved local edits. It resets form fields to DB-saved values.
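
The state machine table reduces to a short precedence chain, which is convenient for the Vitest matrix test. The following is a sketch only: `PageState`, `PrimaryButton`, and `primaryButton` are hypothetical names, and it assumes that unsaved local edits take precedence over a pending redeploy (so an app with both shows `Save` first).

```typescript
interface PageState {
  appExists: boolean;
  formHasContent: boolean;      // only meaningful in net-new mode
  hasLocalEdits: boolean;       // form differs from DB-saved values
  dbMatchesLastDeploy: boolean; // false when the app was never deployed
  deployInProgress: boolean;
}

interface PrimaryButton {
  label: "Save" | "Redeploy" | "Deploying…";
  enabled: boolean;
}

function primaryButton(s: PageState): PrimaryButton {
  // An active deploy overrides everything else.
  if (s.deployInProgress) return { label: "Deploying…", enabled: false };
  // Net-new mode: Save is enabled once the form has content.
  if (!s.appExists) return { label: "Save", enabled: s.formHasContent };
  // Existing app: local edits must be saved before anything can be deployed.
  if (s.hasLocalEdits) return { label: "Save", enabled: true };
  // Nothing local, but the DB has drifted from the last deployed snapshot.
  if (!s.dbMatchesLastDeploy) return { label: "Redeploy", enabled: true };
  // Clean: nothing to save and nothing to deploy.
  return { label: "Save", enabled: false };
}
```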

**Net-new first-deploy flow** — clicking Save on a net-new form creates the app record, uploads the JAR as version 1, persists container + agent config, and routes to `/apps/:slug`. It does **not** deploy. The transition lands the user on the same page in existing-app mode with the button showing `Redeploy`. This is the deliberate trade-off for unifying the button label across modes.

### Traces & Taps + Route Recording — live-apply tabs

These tabs remain on the Deployment page (single-source-of-truth for the full config) but are visually distinguished:

- A persistent info banner at the top of each: *"Live controls — changes apply immediately to running agents and do not participate in the Save/Redeploy cycle."*
- Tab labels carry a `●` live indicator.
- Editors remain fully interactive: the user still manages processors and route recording from this page.
- These tabs' writes do **not** flip the dirty indicator; the primary button is unaffected.

### Deployment tab

Auto-activates when the user clicks Redeploy (and when landing on a page whose app currently has a STARTING deployment).

Contents top-to-bottom:

1. **Current deployment card** — status badge + `StatusDot`, version, JAR filename, JAR checksum (short), replica count, external URL (linkified when RUNNING), deployed-at timestamp. Action buttons: `Stop` (RUNNING/STARTING/DEGRADED), `Start` (STOPPED).
2. **Progress bar** — only rendered when `status === STARTING`. Existing `DeploymentProgress` 7-stage step indicator, unchanged.
3. **Startup log panel** — existing `StartupLogPanel`, uses `useStartupLogs` (3s polling while STARTING).
   - Flex-grow inside the tab: fills whatever vertical space is left after the status card, progress bar, and history disclosure.
   - Minimum height ~200px. Internal scroll on overflow.
   - Does **not** auto-close on success or failure. Remains mounted until the user navigates away or a newer deploy replaces its content.
4. **History disclosure** (collapsed by default) — compact table of past deployments: timestamp, version, status, duration, started by. Row click expands its startup log inline (lazy-loaded). This is also the raw JAR-version-history affordance.

**Empty state** (net-new, no deploys ever): `No deployments yet. Save your configuration and click Redeploy to launch.`

**Behavior during an active deploy:**

- Primary button: `Deploying…` (disabled).
- Config tabs remain editable — the user can stage the next iteration while the current one runs.
- Local edits during deploy cannot be saved until the current deploy completes. Once it does, the button re-evaluates normally.

### Backend changes

#### 1. Agent config write path gains a staged/live flag

The existing `ApplicationConfigController` endpoint persists config to DB **and** pushes an SSE `config-update` to live agents in one atomic call.

**Change:** add a query parameter `?apply=staged|live` (default `live`, preserving existing non-UI callers).

- `apply=staged` — write to DB only, no SSE push. Used by the deployment page.
- `apply=live` — write to DB and push SSE. Used by the existing real-time UI on Dashboard / Runtime pages, and any non-UI caller that relies on current behavior.

This keeps one endpoint and one DTO. The gating happens in the service layer.
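
To illustrate the gating, here is a language-neutral sketch written in TypeScript (the real service layer is on the backend, so this is not the actual implementation). `parseApplyMode`, `writeAgentConfig`, and the `ConfigSinks` interface are hypothetical names, and rejecting unknown `apply` values is an assumption; the design only fixes the default of `live`.

```typescript
type ApplyMode = "staged" | "live";

// Missing parameter falls back to "live" so existing callers keep their behavior.
function parseApplyMode(raw: string | undefined): ApplyMode {
  if (raw === undefined) return "live";
  if (raw === "staged" || raw === "live") return raw;
  // Assumption: unknown values are rejected rather than silently treated as live.
  throw new Error("unsupported apply mode: " + raw);
}

// Hypothetical seams for the two side effects of a config write.
interface ConfigSinks<T> {
  persist(config: T): void; // write to DB
  pushSse(config: T): void; // emit config-update to connected agents
}

function writeAgentConfig<T>(config: T, mode: ApplyMode, sinks: ConfigSinks<T>): void {
  // Both modes persist; only "live" notifies running agents.
  sinks.persist(config);
  if (mode === "live") sinks.pushSse(config);
}
```

With this shape, the integration test for `?apply=staged` amounts to asserting that the SSE sink is never invoked.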

#### 2. Deployment snapshot column

Flyway V2 adds `deployed_config_snapshot JSONB` to the `deployments` table:

```sql
ALTER TABLE deployments
  ADD COLUMN deployed_config_snapshot JSONB;
```

The snapshot contains `{jarVersionId, agentConfig, containerConfig, sensitiveKeys}` captured at the moment a deployment transitions to a successful `RUNNING` state (not at deploy start — see failure semantics below).

**No backfill for existing deployments.** The column is `NULL` for historical rows. Dirty detection treats "no snapshot on last successful deployment" the same as "no successful deployment": everything is dirty, and the first Redeploy after migration will populate the first snapshot. This is acceptable for dirty detection. Checkpoint restore also reads this column, so pre-migration deployments carry no snapshot to restore from.

Dirty check reads the last successful deployment's snapshot for the `(app, environment)` pair and compares against the current DB state. If no successful deploy exists yet (or the snapshot is NULL), everything is dirty by definition.

#### 3. Dirty-state endpoint

```
GET /api/v1/environments/{env}/apps/{slug}/dirty-state
```

Returns:

```json
{
  "dirty": true,
  "lastSuccessfulDeploymentId": "…",
  "differences": [
    { "field": "agentConfig.samplingRate", "staged": "1.0", "deployed": "0.5" },
    { "field": "containerConfig.memoryLimitMb", "staged": "1024", "deployed": "512" },
    { "field": "jarVersion", "staged": "v6", "deployed": "v5" }
  ]
}
```

The UI uses this to drive the button label and per-tab dirty markers (asterisks on tab labels). Keeping the comparison server-side means the source of truth for "what will change on redeploy" is one service rather than two implementations at risk of drift.
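
The comparison behind this response can be sketched as a flatten-and-diff over the desired state and the last snapshot. The server-side implementation will live in the backend; this TypeScript version only illustrates the logic, and the names `flatten` and `computeDirtyState` are hypothetical. Comparing arrays by their JSON text (order-sensitive) and reporting a missing side as `null` are assumptions.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface Difference {
  field: string;
  staged: string | null;
  deployed: string | null;
}

// Flattens nested objects into dotted paths, e.g. "agentConfig.samplingRate".
function flatten(value: Json, prefix: string, out: Record<string, string>): Record<string, string> {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const key of Object.keys(value)) {
      flatten(value[key], prefix ? prefix + "." + key : key, out);
    }
  } else {
    // Leaves (including arrays) are compared by their string form.
    out[prefix] = typeof value === "string" ? value : JSON.stringify(value);
  }
  return out;
}

function computeDirtyState(desired: Json, snapshot: Json | null): { dirty: boolean; differences: Difference[] } {
  const staged = flatten(desired, "", {});
  const deployed = snapshot === null ? {} : flatten(snapshot, "", {});
  const seen: Record<string, true> = {};
  for (const key of Object.keys(staged).concat(Object.keys(deployed))) seen[key] = true;
  const differences: Difference[] = [];
  for (const field of Object.keys(seen).sort()) {
    const s = field in staged ? staged[field] : null;
    const d = field in deployed ? deployed[field] : null;
    if (s !== d) differences.push({ field, staged: s, deployed: d });
  }
  // A missing snapshot means everything is dirty by definition.
  return { dirty: snapshot === null || differences.length > 0, differences };
}
```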

#### 4. Checkpoint restore — no new endpoint

Past deployments are already queryable via `GET /deployments`. The restore action is pure client-side: pick a deployment, read its `deployed_config_snapshot`, hydrate form fields. The server sees only the eventual Save + Redeploy calls.

#### 5. JAR upload staging — no API change

Client-state only until Save. The existing `POST /apps/{slug}/versions` multipart endpoint is unchanged; it's invoked during the Save handler as part of a sequence (create app if needed → upload JAR → write config with `?apply=staged`).
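
The Save sequence can be sketched as an ordered chain of calls that stops at the first failure. Everything here is illustrative: `SaveApi`, `StagedJar`, `SaveInput`, and `saveStaged` are hypothetical names standing in for the real API client, and the final container-config write is taken from the net-new flow listed under Testing.

```typescript
// Hypothetical description of a JAR held in client state.
interface StagedJar {
  filename: string;
  sizeBytes: number;
}

// Hypothetical API client; each method maps to one REST call.
interface SaveApi {
  createApp(name: string): Promise<string>; // resolves to the new slug
  uploadVersion(slug: string, jar: StagedJar): Promise<void>;
  putAgentConfig(slug: string, config: object, apply: "staged" | "live"): Promise<void>;
  putContainerConfig(slug: string, config: object): Promise<void>;
}

interface SaveInput {
  slug: string | null; // null in net-new mode
  name: string;
  jar: StagedJar | null; // null when no new JAR is staged
  agentConfig: object;
  containerConfig: object;
}

// Runs the steps in order; a rejected step aborts the rest and leaves
// earlier writes (e.g. an uploaded JAR version) in place.
async function saveStaged(api: SaveApi, input: SaveInput): Promise<string> {
  const slug = input.slug !== null ? input.slug : await api.createApp(input.name);
  if (input.jar !== null) await api.uploadVersion(slug, input.jar);
  // The deployment page always writes staged, so no SSE push happens here.
  await api.putAgentConfig(slug, input.agentConfig, "staged");
  await api.putContainerConfig(slug, input.containerConfig);
  return slug;
}
```

Note that no deployment call appears in the chain: deploying stays a separate, explicit Redeploy action.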

### Migration & clean-break

- `ui/src/pages/AppsTab/AppsTab.tsx` (1387 lines) is split. `AppListView` stays. New directory `ui/src/pages/AppsTab/AppDeploymentPage/` contains the unified page, split into child files for the Identity section, each config tab, the Deployment tab, Checkpoints, and shared hooks (dirty detection, config sync, filename → name derivation).
- `CreateAppView`, `AppDetailView`, `OverviewSubTab`, `ConfigSubTab`, `VersionRow` are deleted.
- No backwards-compat shims, no legacy flags, no query-string redirects. Removed sub-routes (`/apps/:slug?tab=overview`) simply land on the default tab.
- `.claude/rules/ui.md` Deployments bullet is rewritten in the same commit.
- `.claude/rules/app-classes.md` (if it documents controllers) notes the new `?apply=staged|live` parameter.
- OpenAPI schema is regenerated per the CLAUDE.md procedure. `ui/src/api/openapi.json` and `ui/src/api/schema.d.ts` are regenerated and committed alongside the backend change.

### Failure modes & edge cases

- **Save failure (JAR upload timeout, DB error):** button returns to `Save`. Form keeps local edits. Toast with the error (24h duration, matching the existing AppsTab pattern). No rollback is attempted: if the JAR upload succeeds but the config write fails, the orphaned JAR version remains and is harmless.
- **Deploy failure:** `Deploying…` → `Redeploy` (still dirty, snapshot not written). Progress bar sticks on the failed stage (red). Log stays mounted. User can fix config or upload a different JAR, re-Save, and click Redeploy again.
- **Snapshot-on-success-only:** `deployed_config_snapshot` is populated only when a deployment reaches a successful `RUNNING` state. Failed deployments exist in history but do not participate in "last known good".
- **User edits form during active deploy:** config tabs editable, primary button stays `Deploying…`. On completion, button re-evaluates against the new snapshot.
- **Concurrent edit (two users, same app):** out of scope for v1 — tracked in [#147](https://gitea.siegeln.net/cameleer/cameleer-server/issues/147). Current behavior: last-write-wins.
- **Browser refresh during active deploy:** state is server-side. Progress re-renders from `deployment.deployStage`, log re-fetches from startup logs endpoint. Deployment tab auto-activates on load if any `STARTING` deployment exists; otherwise default is Monitoring.
- **Unsaved-change warning on navigation:** router-level blocker using the DS `ConfirmDialog` (same pattern as existing delete-app confirmation). Triggered when form has staged edits and the user navigates away via sidebar, back button, or any in-app route change. No `beforeunload` handler on `window`; DS-themed dialog only.
- **Environment switch:** intentionally discards unsaved work. No warning. Page remounts per existing behavior.
- **App doesn't exist in selected env:** 404 via `@EnvPath`. Preserve the existing "Unmanaged Application" empty state when the app exists in catalog (discovered via agent) but has no managed record in this env, with the "Create Managed App" CTA.

### Testing

**Backend (integration, REST-API-driven per project preference):**

- Net-new save flow: `POST apps → POST versions → PUT config?apply=staged → PUT container-config` completes without creating any deployment row.
- `?apply=staged` write does not emit SSE `config-update` to a connected agent; `?apply=live` write does.
- `deployed_config_snapshot` is populated on a deployment that reaches RUNNING; not populated on a deployment that reaches FAILED.
- `GET /dirty-state` returns `dirty=true` when desired state differs from the last-successful-deployment snapshot; `dirty=false` when they match.
- Checkpoint restore: hydrating form from a past deployment's snapshot and saving produces a new desired state identical to the snapshot.

**UI (Vitest):**

- Dirty-detection pure function against a matrix of input combinations.
- Filename → name derivation against the examples table above (including orphan stripping and `_` handling).
- Router blocker dialog opens on nav-away with dirty form; does not open on clean form.

**Manual browser verification (per CLAUDE.md):** walk through the 4 visual states (net-new, clean, dirty, deploying) including an end-to-end Save → Redeploy cycle, a checkpoint restore, and a deploy failure path before claiming done.

## Open questions carried forward

- Issue [#147](https://gitea.siegeln.net/cameleer/cameleer-server/issues/147) — optimistic locking / concurrent-edit protection. Deferred.

## Visual reference

ASCII mockups (State A: net-new, State B: deployed clean, State C: dirty with staged JAR, State D: active deploy on Deployment tab) are preserved in the brainstorming transcript. When implementing, these are the target screens.