hsiegeln 58ec67aef9 spec(deploy): unified app deployment page design

Single page at /apps/:slug (+ /apps/new in net-new mode) replacing the
CreateAppView/AppDetailView split. Save ↔ Redeploy state machine driven
by a deployment snapshot on the deployments table, agent-config writes
gain ?apply=staged|live, Identity & Artifact always visible, new
Deployment tab carries progress + startup log, and checkpoints restore
full prior state (JAR + config) from past successful deploys.

Concurrent-edit protection deferred to #147.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-22 21:02:50 +02:00


Unified App Deployment Page — Design

Status: Design approved, awaiting implementation plan
Date: 2026-04-22
Related issue: cameleer-server#147 (concurrent-edit protection, deferred)

Problem

Today, managing an application is split across two pages:

/apps/new (CreateAppView) — form to create + initially deploy an app. Requires manually entering name and slug, picking an environment from a dropdown, selecting a JAR, and a "deploy immediately" toggle.
/apps/:slug (AppDetailView) — manages an existing app. Has an Upload JAR button in the header that uploads immediately, and an Overview / Configuration sub-tab split. Config saves are pushed live to agents via SSE the moment Save is clicked.

Pain points:

Users can't stage a configuration change without immediately applying it (agent config tab is live-push; container config requires a full redeploy). There's no "draft next deploy" concept.
The primary action doesn't reflect deploy state — Upload JAR remains the label even when a new JAR has been uploaded and is waiting to be deployed.
App name must be typed manually. The JAR filename is the obvious source and isn't used.
The environment picker on the create page duplicates the environment already chosen in the top-nav switcher, inviting mistakes (create app in wrong env).
The deployment progress bar and startup log disappear as soon as the deploy completes or the user navigates away, so users can't revisit "what happened during the last deploy?" without round-tripping through ClickHouse logs.
The full config of an app is split across two sub-tabs (Configuration for monitoring/resources/variables/traces/recording, Overview for versions/deployments), which forces context switches for routine checks.

Goal

One unified deployment page that handles the full lifecycle of an app — from initial creation through every subsequent redeploy — with a clear Save-then-Deploy two-step workflow, a dirty-state model that makes "what will change on redeploy" explicit, and persistent access to the last deployment's progress + log.

Non-goals

Real-time collaborative editing, presence awareness, or optimistic-locking protection against concurrent edits (tracked in issue #147).
Restructuring the environment model, slug rules, or any backend orchestration mechanics beyond what's required for staged-vs-live config writes and deployment snapshotting.
Changing the agent SSE protocol.
Pruning or archiving JAR versions (retention is an environment-level setting, already exists).

Design

Page structure

Routes:

/apps/new — unified page in net-new mode (no app record exists yet).
/apps/:slug — unified page in existing-app mode.

The CreateAppView / AppDetailView split goes away. A single component (AppDeploymentPage) renders both modes; the only differences are which fields are editable and which buttons are enabled.

Layout top-to-bottom:

Page header — title (app display name or "Create Application"), env badge, status badge, Delete App action (existing apps only), and the primary action button (Save / Redeploy / Deploying…).
Identity & Artifact section — always visible.
Config tabs row — Monitoring | Resources | Variables | Sensitive Keys | Deployment | ● Traces & Taps | ● Route Recording.
Active tab content.

The old Overview sub-tab is removed. Its deployments table becomes the Deployment tab's history disclosure; its version list is rolled into the Identity & Artifact section as a Checkpoints disclosure.

Identity & Artifact section

Field	Net-new mode	Existing / deployed mode
Application Name	`Input`, editable	read-only display text
Slug	auto-derived from name, displayed for preview only; never directly editable	read-only display (slug is immutable post-create per project conventions)
Environment	read-only chip showing currently-selected env	read-only chip
External URL	computed preview (existing formula — `routingMode === 'subdomain'` vs path-style)	same
Current Version	—	`v5 · payment-gateway-1.2.3.jar · 42 MB · 3 days ago`
Application JAR	`Select JAR` button; shows filename + size once staged client-side	`Change JAR` button; shows "staged: `<filename>`" badge when a new JAR is pending
Checkpoints	disclosure; empty when no prior successful deploys	disclosure; lists past successful deployments

Auto-derive rule — triggered when the user selects a JAR file and the name field is empty OR still matches the previously auto-derived value (never overwrite manual edits):

Take filename, strip .jar.
Truncate at the first character that is a digit (0-9) or a period (.).
Replace - and _ with spaces.
Strip any resulting 1-char orphan tokens (e.g. trailing v from my-app-v2).
Title-case remaining words.

The derived name is a suggestion — the user can override by typing.

Examples:

payment-gateway-1.2.0.jar → Payment Gateway
order-service.jar → Order Service
my-app-v2.jar → My App
acme_billing-3.jar → Acme Billing
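
The five steps above can be sketched as a small pure function. This is a minimal sketch, not the project's implementation: `deriveAppName` and `shouldAutoDerive` are hypothetical names, and two details are assumptions, namely that the `.jar` suffix is matched case-insensitively and that title-casing only upper-cases each word's first letter.

```typescript
// Hypothetical helper: derive a suggested display name from a JAR filename.
function deriveAppName(filename: string): string {
  // Step 1: strip the .jar extension (case-insensitive match is an assumption).
  const base = filename.replace(/\.jar$/i, "");
  // Step 2: truncate at the first digit or period.
  const cut = base.search(/[0-9.]/);
  const stem = cut === -1 ? base : base.slice(0, cut);
  return stem
    .replace(/[-_]/g, " ") // Step 3: separators become spaces.
    .split(" ")
    .filter((word) => word.length > 1) // Step 4: drop empty and 1-char orphan tokens.
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1)) // Step 5: title-case.
    .join(" ");
}

// Hypothetical guard for the "never overwrite manual edits" rule: auto-fill
// only when the field is empty or still holds the previously derived value.
function shouldAutoDerive(currentName: string, lastDerived: string | null): boolean {
  return currentName === "" || currentName === lastDerived;
}
```

Keeping the derivation free of UI state makes it straightforward to unit-test against the examples listed above.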

Slug derivation remains the existing slugify(name) logic. The user cannot edit slug directly in net-new mode (auto-tracks name) and cannot edit at all post-create (immutable per existing project conventions).

Checkpoints (past deployments as restore points)

A checkpoint = one past successful deployment, carrying the full snapshot {jarVersionId, agentConfig, containerConfig, sensitiveKeys} frozen at deploy time. JARs that were uploaded but never successfully deployed do not appear — they are obsolete freight.

Restore flow:

User expands Checkpoints, picks a row.
Form fields across all four staged tabs reset to that snapshot's values; JAR slot points to the snapshot's JAR version (by checksum reference — no re-upload).
Dirty evaluation re-runs against the latest successful deploy snapshot, as always. The restored values count as unsaved local edits, so the primary button shows Save, then Redeploy once the restored state is saved.
The user may tweak further before deploying or deploy as-is.

Restore is pure client-state hydration — it doesn't write to DB until the user clicks Save.

Edge cases:

The currently-running deployment is hidden from the Checkpoints list (restoring to it is equivalent to Discard).
A checkpoint whose JAR version has been pruned (per the env-level retention policy) shows as "archived, JAR unavailable" with the Restore action disabled and a tooltip explaining why.
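
As an illustration of the list rules and the client-side hydration described above, here is a minimal sketch. The types and function names (`PastDeployment`, `buildCheckpointRows`, `hydrateFormFromCheckpoint`) are hypothetical, `sensitiveKeys` is assumed to be a list of strings, and a non-null snapshot is used as the marker of a successful deploy.

```typescript
interface DeploymentSnapshot {
  jarVersionId: string;
  agentConfig: Record<string, unknown>;
  containerConfig: Record<string, unknown>;
  sensitiveKeys: string[]; // assumed shape
}

// Hypothetical view of a past deployment as the UI might receive it.
interface PastDeployment {
  id: string;
  snapshot: DeploymentSnapshot | null; // null when the deploy never succeeded
}

interface CheckpointRow {
  deploymentId: string;
  snapshot: DeploymentSnapshot;
  restorable: boolean;
  disabledReason: string | null;
}

function buildCheckpointRows(
  deployments: PastDeployment[],
  currentDeploymentId: string | null,
  availableJarVersionIds: Set<string>,
): CheckpointRow[] {
  const rows: CheckpointRow[] = [];
  for (const d of deployments) {
    if (d.snapshot === null) continue; // never successfully deployed
    if (d.id === currentDeploymentId) continue; // restoring it would equal Discard
    const restorable = availableJarVersionIds.has(d.snapshot.jarVersionId);
    rows.push({
      deploymentId: d.id,
      snapshot: d.snapshot,
      restorable,
      disabledReason: restorable ? null : "archived, JAR unavailable",
    });
  }
  return rows;
}

// Restore is client-state only: return a deep copy for the form to edit.
function hydrateFormFromCheckpoint(row: CheckpointRow): DeploymentSnapshot {
  if (!row.restorable) throw new Error("Checkpoint JAR has been pruned");
  return JSON.parse(JSON.stringify(row.snapshot)) as DeploymentSnapshot;
}
```

Returning a copy rather than the snapshot object itself keeps later form edits from mutating the cached deployment list.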

Collapsed by default.

Dirty state + primary button

What counts as dirty (any one is sufficient):

A new JAR file is staged in client state (not yet uploaded).
A selected past version (via Restore) differs from the currently-deployed version.
Form values on any of the four staged tabs (Monitoring, Resources, Variables, Sensitive Keys) differ from the last-saved DB values.
DB-saved config differs from the snapshot captured at the last successful deploy.

What does not count:

Changes on Traces & Taps or Route Recording tabs (live-apply — see below).
Changes made via Dashboard / Runtime pages.

State machine:

App state	Form has unsaved local edits?	DB matches last deploy?	Button label	Action
Net-new, nothing entered	—	—	`Save`	disabled
Net-new, form has content	yes	n/a	`Save`	create app + upload JAR + write config; transitions to "exists, no deploy yet"
Exists, no deploy yet	either	no (never deployed)	`Redeploy`	deploy current DB state
Exists, form edits pending	yes	either	`Save`	persist local edits; after save, re-evaluates to `Save` (disabled) or `Redeploy`
Exists, nothing local, DB = deploy	no	yes	`Save`	disabled
Exists, nothing local, DB ≠ deploy	no	no	`Redeploy`	deploy DB state
Deploy in progress	—	—	`Deploying…`	disabled, spinner

A secondary Discard ghost button appears adjacent to the primary button whenever the form has unsaved local edits. It resets form fields to DB-saved values.
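
The state machine and the Discard rule can be expressed as one pure function. This is a sketch under assumptions: `PageState` and `resolvePrimaryButton` are hypothetical names, "form has content" in net-new mode is folded into `hasLocalEdits`, and when an app has both unsaved edits and an undeployed DB state, the Save row is assumed to take precedence.

```typescript
interface PageState {
  appExists: boolean;
  hasLocalEdits: boolean; // for net-new, true once the form has content
  dbMatchesLastDeploy: boolean; // false when the app was never deployed
  deployInProgress: boolean;
}

interface PrimaryButton {
  label: "Save" | "Redeploy" | "Deploying…";
  enabled: boolean;
  showDiscard: boolean;
}

function resolvePrimaryButton(s: PageState): PrimaryButton {
  // Discard is offered whenever the form has unsaved local edits.
  const showDiscard = s.hasLocalEdits;
  if (s.deployInProgress) return { label: "Deploying…", enabled: false, showDiscard };
  // Net-new: Save stays disabled until something has been entered.
  if (!s.appExists) return { label: "Save", enabled: s.hasLocalEdits, showDiscard };
  // Unsaved edits must be persisted before anything can be deployed.
  if (s.hasLocalEdits) return { label: "Save", enabled: true, showDiscard };
  // Nothing local: offer Redeploy only when the DB differs from the last deploy.
  if (!s.dbMatchesLastDeploy) return { label: "Redeploy", enabled: true, showDiscard };
  return { label: "Save", enabled: false, showDiscard };
}
```

Evaluating the rows in a fixed order makes the precedence between overlapping rows explicit and gives the Vitest matrix a single entry point.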

Net-new first-deploy flow — clicking Save on a net-new form creates the app record, uploads the JAR as version 1, persists container + agent config, and routes to /apps/:slug. It does not deploy. The transition lands the user on the same page in existing-app mode with the button showing Redeploy. This is the deliberate trade-off for unifying the button label across modes.

Traces & Taps + Route Recording — live-apply tabs

These tabs remain on the Deployment page (single-source-of-truth for the full config) but are visually distinguished:

A persistent info banner at the top of each: "Live controls — changes apply immediately to running agents and do not participate in the Save/Redeploy cycle."
Tab labels carry a ● live indicator.
Editors remain fully interactive — user still manages processors and route recording from this page.
These tabs' writes do not flip the dirty indicator; the primary button is unaffected.

Deployment tab

Auto-activates when the user clicks Redeploy (and when landing on a page whose app currently has a STARTING deployment).

Contents top-to-bottom:

Current deployment card — status badge + StatusDot, version, JAR filename, JAR checksum (short), replica count, external URL (linkified when RUNNING), deployed-at timestamp. Action buttons: Stop (RUNNING/STARTING/DEGRADED), Start (STOPPED).
Progress bar — only rendered when status === STARTING. Existing DeploymentProgress 7-stage step indicator, unchanged.
Startup log panel — existing StartupLogPanel, uses useStartupLogs (3s polling while STARTING).
- Flex-grow inside the tab: fills whatever vertical space is left after the status card, progress bar, and history disclosure.
- Minimum height ~200px. Internal scroll on overflow.
- Does not auto-close on success or failure. Remains mounted until the user navigates away or a newer deploy replaces its content.
History disclosure (collapsed by default) — compact table of past deployments: timestamp, version, status, duration, started by. Row click expands its startup log inline (lazy-loaded). This is also the raw JAR-version-history affordance.
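
The status-dependent parts of the card and log panel can be summarised in a small lookup. This is only a sketch: `deploymentTabView` and its return shape are hypothetical, the status names are the ones used in this document, and FAILED is assumed to offer no action button because none is listed for it.

```typescript
type DeploymentStatus = "STARTING" | "RUNNING" | "DEGRADED" | "STOPPED" | "FAILED";

interface DeploymentTabView {
  actions: Array<"Stop" | "Start">;
  linkifyExternalUrl: boolean;
  logPollIntervalMs: number | null; // null means no polling
}

function deploymentTabView(status: DeploymentStatus): DeploymentTabView {
  const actions: Array<"Stop" | "Start"> = [];
  // Stop is offered for RUNNING, STARTING and DEGRADED; Start only for STOPPED.
  if (status === "RUNNING" || status === "STARTING" || status === "DEGRADED") {
    actions.push("Stop");
  } else if (status === "STOPPED") {
    actions.push("Start");
  }
  return {
    actions,
    linkifyExternalUrl: status === "RUNNING",
    // Startup logs are polled every 3 s while the deployment is STARTING.
    logPollIntervalMs: status === "STARTING" ? 3000 : null,
  };
}
```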

Empty state (net-new, no deploys ever): No deployments yet. Save your configuration and click Redeploy to launch.

Behavior during an active deploy:

Primary button: Deploying… (disabled).
Config tabs remain editable — the user can stage the next iteration while the current one runs.
Local edits during deploy cannot be saved until the current deploy completes. Once it does, button re-evaluates normally.

Backend changes

1. Agent config write path gains a staged/live flag

The existing ApplicationConfigController endpoint persists config to DB and pushes an SSE config-update to live agents in one atomic call.

Change: add a query parameter ?apply=staged|live (default live, preserving existing non-UI callers).

apply=staged — write to DB only, no SSE push. Used by the deployment page.
apply=live — write to DB and push SSE. Used by the existing real-time UI on Dashboard / Runtime pages, and any non-UI caller that relies on current behavior.

This keeps one endpoint and one DTO. The gating happens in the service layer.
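
To make the gating concrete, here is a language-neutral sketch written in TypeScript; it is not the server code. `parseApplyMode`, `writeAgentConfig`, and the `persist` / `pushSse` callbacks are hypothetical names, and rejecting unknown values of the parameter is an assumption.

```typescript
type ApplyMode = "staged" | "live";

// A missing parameter defaults to "live", preserving existing callers.
function parseApplyMode(raw: string | null | undefined): ApplyMode {
  if (raw === null || raw === undefined || raw === "") return "live";
  if (raw === "staged" || raw === "live") return raw;
  // Assumption: anything else is treated as a client error.
  throw new Error("Unsupported apply mode: " + raw);
}

interface ConfigWriteDeps<C> {
  persist(config: C): void; // write to DB
  pushSse(config: C): void; // push config-update to live agents
}

// Both modes persist; only "live" also pushes to connected agents.
function writeAgentConfig<C>(config: C, mode: ApplyMode, deps: ConfigWriteDeps<C>): void {
  deps.persist(config);
  if (mode === "live") deps.pushSse(config);
}
```

Because the branch sits behind a single write function, the controller and DTO stay identical for both modes.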

2. Deployment snapshot column

Flyway V2 adds deployed_config_snapshot JSONB to the deployments table:

ALTER TABLE deployments
  ADD COLUMN deployed_config_snapshot JSONB;

The snapshot contains {jarVersionId, agentConfig, containerConfig, sensitiveKeys} captured at the moment a deployment transitions to a successful RUNNING state (not at deploy start — see failure semantics below).

No backfill for existing deployments. The column is NULL for historical rows. Dirty detection treats "no snapshot on last successful deployment" the same as "no successful deployment" — everything is dirty, and the first Redeploy after migration will populate the first snapshot. This is acceptable because dirty-state is the only reader of the column.

Dirty check reads the last successful deployment's snapshot for the (app, environment) pair and compares against the current DB state. If no successful deploy exists yet (or the snapshot is NULL), everything is dirty by definition.
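
One way to picture the comparison is a flatten-and-compare over the two states. The sketch below is illustrative only: `diffStates` and `flatten` are hypothetical names, values are compared in stringified form, arrays are compared as a whole, and a missing snapshot is reported as dirty without an itemised list.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface Difference {
  field: string;
  staged: string;
  deployed: string;
}

// Flatten nested objects into dotted paths, e.g. "agentConfig.samplingRate".
function flatten(value: Json, prefix: string, out: Map<string, string>): void {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const key of Object.keys(value)) {
      flatten(value[key], prefix === "" ? key : prefix + "." + key, out);
    }
  } else {
    out.set(prefix, typeof value === "string" ? value : JSON.stringify(value));
  }
}

function diffStates(
  staged: Json,
  deployed: Json | null,
): { dirty: boolean; differences: Difference[] } {
  // No snapshot (never deployed, or a pre-migration row): dirty by definition.
  if (deployed === null) return { dirty: true, differences: [] };
  const a = new Map<string, string>();
  const b = new Map<string, string>();
  flatten(staged, "", a);
  flatten(deployed, "", b);
  const fields = Array.from(new Set(Array.from(a.keys()).concat(Array.from(b.keys())))).sort();
  const differences: Difference[] = [];
  for (const field of fields) {
    const s = a.get(field);
    const d = b.get(field);
    // Fields present on only one side are reported with an empty string.
    if (s !== d) differences.push({ field, staged: s ?? "", deployed: d ?? "" });
  }
  return { dirty: differences.length > 0, differences };
}
```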

3. Dirty-state endpoint

GET /api/v1/environments/{env}/apps/{slug}/dirty-state

Returns:

{
  "dirty": true,
  "lastSuccessfulDeploymentId": "…",
  "differences": [
    { "field": "agentConfig.samplingRate", "staged": "1.0", "deployed": "0.5" },
    { "field": "containerConfig.memoryLimitMb", "staged": "1024", "deployed": "512" },
    { "field": "jarVersion", "staged": "v6", "deployed": "v5" }
  ]
}

The UI uses this to drive the button label and per-tab dirty markers (asterisks on tab labels). Keeping the comparison server-side means the source of truth for "what will change on redeploy" is one service rather than two implementations at risk of drift.
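
On the client, the response could be typed and reduced to the set of changed top-level sections, which the page can then map to tab markers. The names below (`DirtyStateResponse`, `dirtySections`) are hypothetical, the nullable `lastSuccessfulDeploymentId` is an assumption for apps that were never deployed, and the mapping from section to tab is deliberately left out because this document does not define it.

```typescript
interface DirtyStateDifference {
  field: string; // dotted path, e.g. "agentConfig.samplingRate"
  staged: string;
  deployed: string;
}

interface DirtyStateResponse {
  dirty: boolean;
  lastSuccessfulDeploymentId: string | null; // nullability is an assumption
  differences: DirtyStateDifference[];
}

// Collect the distinct top-level sections that have at least one difference.
function dirtySections(res: DirtyStateResponse): string[] {
  const sections = new Set<string>();
  for (const diff of res.differences) {
    const dot = diff.field.indexOf(".");
    sections.add(dot === -1 ? diff.field : diff.field.slice(0, dot));
  }
  return Array.from(sections).sort();
}
```

For the sample response above, this yields `agentConfig`, `containerConfig`, and `jarVersion`.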

4. Checkpoint restore — no new endpoint

Past deployments are already queryable via GET /deployments. The restore action is pure client-side: pick a deployment, read its deployed_config_snapshot, hydrate form fields. The server sees only the eventual Save + Redeploy calls.

5. JAR upload staging — no API change

Client-state only until Save. The existing POST /apps/{slug}/versions multipart endpoint is unchanged; it's invoked during the Save handler as part of a sequence (create app if needed → upload JAR → write config with ?apply=staged).
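
The Save sequence can be sketched as a plan of ordered steps. `planSaveSteps` and the step shapes are hypothetical; a real handler would await each step in turn and stop at the first failure.

```typescript
type SaveStep =
  | { kind: "createApp" }
  | { kind: "uploadJar"; filename: string }
  | { kind: "writeConfig"; apply: "staged" };

interface SaveInput {
  appExists: boolean;
  stagedJarFilename: string | null; // null when no new JAR is staged
}

// Order matters: the app must exist before a version can be uploaded to it,
// and config is always written with apply=staged from this page.
function planSaveSteps(input: SaveInput): SaveStep[] {
  const steps: SaveStep[] = [];
  if (!input.appExists) steps.push({ kind: "createApp" });
  if (input.stagedJarFilename !== null) {
    steps.push({ kind: "uploadJar", filename: input.stagedJarFilename });
  }
  steps.push({ kind: "writeConfig", apply: "staged" });
  return steps;
}
```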

Migration & clean-break

ui/src/pages/AppsTab/AppsTab.tsx (1387 lines) is split. AppListView stays. New directory ui/src/pages/AppsTab/AppDeploymentPage/ contains the unified page, split into child files for the Identity section, each config tab, the Deployment tab, Checkpoints, and shared hooks (dirty detection, config sync, filename → name derivation).
CreateAppView, AppDetailView, OverviewSubTab, ConfigSubTab, VersionRow are deleted.
No backwards-compat shims, no legacy flags, no query-string redirects. Removed sub-routes (/apps/:slug?tab=overview) simply land on the default tab.
.claude/rules/ui.md Deployments bullet is rewritten in the same commit.
.claude/rules/app-classes.md (if it documents controllers) notes the new ?apply=staged|live parameter.
OpenAPI schema is regenerated per the CLAUDE.md procedure. ui/src/api/openapi.json and ui/src/api/schema.d.ts are regenerated and committed alongside the backend change.

Failure modes & edge cases

Save failure (JAR upload timeout, DB error): button returns to Save. Form keeps local edits. Toast with the error (24h duration, matching the existing AppsTab pattern). No cleanup of partial progress is needed: if the JAR upload succeeds but the config write fails, the orphan JAR version is harmless.
Deploy failure: Deploying… → Redeploy (still dirty, snapshot not written). Progress bar sticks on the failed stage (red). Log stays mounted. User can fix config or upload different JAR, re-Save, click Redeploy again.
Snapshot-on-success-only: deployed_config_snapshot is populated only when a deployment reaches a successful RUNNING state. Failed deployments exist in history but do not participate in "last known good".
User edits form during active deploy: config tabs editable, primary button stays Deploying…. On completion, button re-evaluates against the new snapshot.
Concurrent edit (two users, same app): out of scope for v1 — tracked in #147. Current behavior: last-write-wins.
Browser refresh during active deploy: state is server-side. Progress re-renders from deployment.deployStage, log re-fetches from startup logs endpoint. Deployment tab auto-activates on load if any STARTING deployment exists; otherwise default is Monitoring.
Unsaved-change warning on navigation: router-level blocker using the DS ConfirmDialog (same pattern as existing delete-app confirmation). Triggered when form has staged edits and the user navigates away via sidebar, back button, or any in-app route change. Not window.beforeunload — DS-themed dialog only.
Environment switch: intentionally discards unsaved work. No warning. Page remounts per existing behavior.
App doesn't exist in selected env: 404 via @EnvPath. Preserve the existing "Unmanaged Application" empty state when the app exists in catalog (discovered via agent) but has no managed record in this env, with the "Create Managed App" CTA.

Testing

Backend (integration, REST-API-driven per project preference):

Net-new save flow: POST apps → POST versions → PUT config?apply=staged → PUT container-config completes without creating any deployment row.
?apply=staged write does not emit SSE config-update to a connected agent; ?apply=live write does.
deployed_config_snapshot is populated on a deployment that reaches RUNNING; not populated on a deployment that reaches FAILED.
GET /dirty-state returns dirty=true when desired state differs from the last-successful-deployment snapshot; dirty=false when they match.
Checkpoint restore: hydrating form from a past deployment's snapshot and saving produces a new desired state identical to the snapshot.

UI (Vitest):

Dirty-detection pure function against a matrix of input combinations.
Filename → name derivation against the examples table above (including orphan stripping and _ handling).
Router blocker dialog opens on nav-away with dirty form; does not open on clean form.

Manual browser verification (per CLAUDE.md): walk through the 4 visual states (net-new, clean, dirty, deploying) including an end-to-end Save → Redeploy cycle, a checkpoint restore, and a deploy failure path before claiming done.

Open questions carried forward

Issue #147 — optimistic locking / concurrent-edit protection. Deferred.

Visual reference

ASCII mockups (State A: net-new, State B: deployed clean, State C: dirty with staged JAR, State D: active deploy on Deployment tab) are preserved in the brainstorming transcript. When implementing, these are the target screens.
