cameleer-server/docs/superpowers/specs/2026-04-23-checkpoints-table-redesign-design.md
hsiegeln 1f0ab002d6 spec(deploy): checkpoints table redesign + deployment audit gap
Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).

Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:31:50 +02:00


Checkpoints table redesign + deployment audit gap closure

Date: 2026-04-23
Status: Spec (pending implementation)
Affects: App deployment page, deployments backend, audit log

Context

The Checkpoints disclosure on the unified app deployment page (ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx) currently renders past deployments as a cramped row list — a Badge, a "12m ago" label, and a Restore button. It hides the operator information that matters most when reasoning about a checkpoint: who deployed it, the JAR filename (not just the version number), the deployment outcome, and access to the logs and config snapshot the deployment ran with.

Investigating this also surfaced a gap in the audit log: DeploymentController.deploy / stop / promote make zero auditService.log(...) calls. Container deployments, the most consequential operations the server performs, leave no audit trail today. Closing this gap is in scope because it is a prerequisite for the "Deployed by" column.

Goals

  1. Replace the cramped checkpoints list with a real table (DS DataTable) showing version, JAR filename, deployer, time, strategy, and outcome.
  2. Capture and display "who deployed" — backend gains a created_by column on deployments, populated from SecurityContextHolder.
  3. Audit deploy / stop / promote operations under a new AuditCategory.DEPLOYMENT value.
  4. Provide an in-page detail view (side drawer) where the operator can review the deployment's logs and config snapshot before deciding to restore, with an optional diff against the current live config.
  5. Cap the visible checkpoint list at the environment's JAR retention count, since older entries cannot be restored.

Out of scope

  • Sortable column headers (default newest-first is enough)
  • Deep-linking via ?checkpoint=<id> query param
  • "Remember last drawer tab" preference
  • Bulk actions on checkpoints
  • Promoting SideDrawer into @cameleer/design-system (wait for a second consumer)

Backend changes

Audit category

Add DEPLOYMENT to cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java:

public enum AuditCategory {
    INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
    OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
    ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
    DEPLOYMENT
}

The AuditCategory.valueOf(...) lookup in AuditLogController picks this up automatically. The Admin → Audit page filter dropdown gets one new option in ui/src/pages/Admin/AuditLogPage.tsx.

Audit calls in DeploymentController

Add AuditService injection and write audit rows on every successful and failed lifecycle operation. Action codes:

| Method | Action | Target | Details |
| --- | --- | --- | --- |
| deploy | deploy_app | deployment.id().toString() | { appSlug, envSlug, appVersionId, jarFilename, version } |
| stop | stop_deployment | deploymentId.toString() | { appSlug, envSlug } |
| promote | promote_deployment | deploymentId.toString() | { sourceEnv, targetEnv, appSlug, appVersionId } |

Each try branch writes AuditResult.SUCCESS; catch (IllegalArgumentException) writes AuditResult.FAILURE with the exception message in details before returning the existing 404. Pattern matches OutboundConnectionAdminController.

Flyway migration V2__add_deployment_created_by.sql

ALTER TABLE deployments ADD COLUMN created_by TEXT REFERENCES users(user_id);
CREATE INDEX idx_deployments_created_by ON deployments (created_by);

Nullable: existing rows stay NULL (rendered as a muted placeholder in the UI). New rows are always populated. No backfill: pre-V2 history is unrecoverable, and the column starts paying off from the next deploy onward.

Service signature change

DeploymentService.createDeployment(appId, appVersionId, envId, createdBy) and promote(targetAppId, sourceVersionId, targetEnvId, createdBy) both gain a trailing String createdBy parameter. PostgresDeploymentRepository writes it to the new column.

DeploymentController resolves createdBy via the existing user-id convention: strip "user:" prefix from SecurityContextHolder.getContext().getAuthentication().getName(). Same helper pattern as AlertRuleController / OutboundConnectionAdminController.

DTO change

com.cameleer.server.core.runtime.Deployment record gains createdBy: String. UI Deployment interface in ui/src/api/queries/admin/apps.ts gains createdBy: string | null.

Log filter for the drawer

LogQueryController.GET /api/v1/environments/{envSlug}/logs accepts a new multi-value query param instanceIds (comma-split, OR-joined). It translates to WHERE instance_id IN (...) against logs.instance_id, an existing LowCardinality(String) column that is already part of the table's ORDER BY key.

LogSearchRequest gains instanceIds: List<String> (null-normalized). Service layer adds the IN (...) clause when non-null and non-empty.

The drawer client computes the instance_id list from Deployment.replicaStates: for each replica, instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}" where generation is the first 8 chars of deployment.id. This is the documented format from .claude/rules/docker-orchestration.md — pure client-side derivation, no extra server endpoint.
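
The derivation described above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the shipped code: the spec only names replicaStates and deployment.id, so the ReplicaState.index field, the DeploymentLike shape, and the function name are hypothetical.

```typescript
// Hypothetical shapes: only the fields needed for the derivation are modeled.
interface ReplicaState {
  index: number; // assumed name for the replica's ordinal (replicaIndex)
}

interface DeploymentLike {
  id: string;
  replicaStates: ReplicaState[];
}

// Builds one instance_id per replica using the
// "{envSlug}-{appSlug}-{replicaIndex}-{generation}" format,
// where generation is the first 8 characters of the deployment id.
function deriveInstanceIds(
  envSlug: string,
  appSlug: string,
  deployment: DeploymentLike,
): string[] {
  const generation = deployment.id.slice(0, 8);
  return deployment.replicaStates.map(
    (replica) => `${envSlug}-${appSlug}-${replica.index}-${generation}`,
  );
}
```

Keeping this a pure function of the deployment record means it can be unit-tested without a running backend, which fits the "no extra server endpoint" intent.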

Drawer infrastructure

The design system provides Modal but no drawer. Building a project-local component is preferred over submitting to DS first (single consumer; easier to iterate locally).

File: ui/src/components/SideDrawer.tsx + SideDrawer.module.css (~120 LOC total).

API:

<SideDrawer
  open={!!selectedCheckpoint}
  onClose={() => setSelectedCheckpoint(null)}
  title={`Deployment v${version} · ${jarFilename}`}
  size="lg"   // 'md'=560px, 'lg'=720px, 'xl'=900px
  footer={<Button onClick={handleRestore}>Restore this checkpoint</Button>}
>
  {/* scrollable body */}
</SideDrawer>

Behavior:

  • React portal to document.body (mirrors DS Modal).
  • Slides in from right via transform: translateX(100% → 0) over 240ms ease-out.
  • Click-blocking transparent backdrop (no dim — the parent table stays readable). Clicking outside closes.
  • ESC closes.
  • Focus trap on open; focus restored to trigger on close.
  • Sticky header (title + close ×) and optional sticky footer.
  • Body uses overflow-y: auto.
  • All colors via DS CSS variables (--bg, --border, --shadow-lg).

Unsaved-changes interaction: Opening the drawer is unrestricted. The drawer is read-only — only Restore mutates form state, and Restore already triggers the existing unsaved-changes guard via useUnsavedChangesBlocker.

Checkpoints table

File: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx — replaces Checkpoints.tsx.

Columns (left to right):

| Column | Source | Notes |
| --- | --- | --- |
| Version | versionMap.get(d.appVersionId).version | Badge "v6" with auto-color (matches existing pattern) |
| JAR | versionMap.get(d.appVersionId).jarFilename | Monospace; truncate with tooltip on overflow |
| Deployed by | d.createdBy | Bare username; OIDC users show oidc:<sub> truncated with tooltip; null shows a muted placeholder |
| Deployed | d.deployedAt | Relative ("12m ago") + ISO subline |
| Strategy | d.deploymentStrategy | Small pill: "blue/green" or "rolling" |
| Outcome | d.status | Tinted pill: STOPPED (slate), DEGRADED (amber) |
| (chevron) | | Visual affordance for "row click opens drawer" |

Interaction:

  • Row click opens CheckpointDetailDrawer (no separate "View" button).
  • No per-row Restore button — Restore lives inside the drawer to force review before action.
  • Pruned-JAR rows (!versionMap.has(d.appVersionId)) render at 55% opacity with a strikethrough on the filename and an amber "archived — JAR pruned" hint. Row stays clickable; Restore inside the drawer is disabled with tooltip.
  • Currently-running deployment is excluded (already represented by StatusCard above).
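
The row rules above (exclude the running deployment, flag pruned-JAR rows as archived) could be expressed as a small pure function. This is a sketch only: DeploymentRow, CheckpointRow, toCheckpointRows, and the currentDeploymentId parameter are illustrative names that do not appear in the spec, and how the running deployment is identified is an assumption.

```typescript
// Hypothetical minimal row shape; the real Deployment type has more fields.
interface DeploymentRow {
  id: string;
  appVersionId: string;
}

interface CheckpointRow {
  deployment: DeploymentRow;
  archived: boolean; // true when the JAR was pruned, so Restore must be disabled
}

// Drops the currently running deployment (assumed to be known by id) and
// marks rows whose app version is missing from versionMap as archived.
function toCheckpointRows(
  deployments: DeploymentRow[],
  versionMap: Map<string, unknown>,
  currentDeploymentId: string | null,
): CheckpointRow[] {
  return deployments
    .filter((d) => d.id !== currentDeploymentId)
    .map((d) => ({
      deployment: d,
      archived: !versionMap.has(d.appVersionId),
    }));
}
```

The archived flag would then drive the 55% opacity, the strikethrough, and the disabled Restore button from one place.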

Empty state: When zero checkpoints, render a single full-width muted row: "No past deployments yet."

Pagination

Visible cap = Environment.jarRetentionCount rows (newest first). Anything older has likely been pruned and is not restorable, so it's hidden by default.

  • total ≤ jarRetentionCount → render all, no expander.
  • total > jarRetentionCount → render newest jarRetentionCount rows + an expander row: "Show older (N) — archived, postmortem only". Expanding renders the full list (older rows already styled as archived).
  • jarRetentionCount === 0 (unlimited or unconfigured) → fall back to a default cap of 10.

jarRetentionCount comes from useEnvironments() (already in the env-store).
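
The three pagination cases can be captured in one helper. The following is a sketch under assumptions: capCheckpoints, VisibleCheckpoints, and DEFAULT_CAP are hypothetical names, and the input list is assumed to be sorted newest first already.

```typescript
// Fallback cap used when jarRetentionCount is 0 (unlimited or unconfigured).
const DEFAULT_CAP = 10;

interface VisibleCheckpoints<T> {
  visible: T[];
  hiddenCount: number; // the N in "Show older (N)"; 0 means no expander row
}

// Returns the rows to render and how many older rows sit behind the expander.
function capCheckpoints<T>(
  newestFirst: T[],
  jarRetentionCount: number,
  expanded: boolean,
): VisibleCheckpoints<T> {
  const cap = jarRetentionCount === 0 ? DEFAULT_CAP : jarRetentionCount;
  if (expanded || newestFirst.length <= cap) {
    return { visible: newestFirst, hiddenCount: 0 };
  }
  return {
    visible: newestFirst.slice(0, cap),
    hiddenCount: newestFirst.length - cap,
  };
}
```

For example, five checkpoints with a retention count of three yield three visible rows and hiddenCount 2, which would render as "Show older (2)".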

Drawer detail view

File: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx plus two panel files: LogsPanel.tsx and ConfigPanel.tsx. Compare is a view mode inside ConfigPanel, not a separate panel file.

Header:

  • Version badge + JAR filename + outcome pill.
  • Meta line: "Deployed by {createdBy} · {relative} ({ISO}) · Strategy: {strategy} · {N} replicas · ran for {duration}".
  • Close × top-right.

Tabs (DS Tabs):

  • Logs — default on open
  • Config — read-only render of the live config sub-tabs, with a view-mode toggle for "Snapshot" vs "Diff vs current"

Logs panel

Reuses useInfiniteApplicationLogs with the new instanceIds filter. The hook signature gets an optional instanceIds: string[] parameter that flows through to the LogQueryController query string.

Filters (in addition to instanceIds):

  • Existing source/level multi-select pills
  • New replica filter dropdown: "all (N)" / "0" / "1" / ... / "N-1" — narrows to a single replica when troubleshooting blue-green or rolling deploys.
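
One way the replica dropdown and the instanceIds parameter might combine on the client is sketched below. The function name is hypothetical, and the sketch assumes the instanceIds array is ordered by replica index so that position i corresponds to replica i. Returning null for an empty selection is a design choice of this sketch: since an empty or missing param preserves the prior unfiltered behavior, the caller should skip the request rather than silently drop the filter.

```typescript
// Builds the instanceIds query fragment for the logs endpoint.
// replica is "all" or a zero-based replica index from the dropdown.
function instanceIdsQuery(
  instanceIds: string[],
  replica: number | "all",
): string | null {
  // Assumption: instanceIds[i] belongs to replica i.
  const selected =
    replica === "all" ? instanceIds : instanceIds.slice(replica, replica + 1);
  if (selected.length === 0) {
    // Nothing to filter on; the caller decides what to render instead.
    return null;
  }
  // Values are comma-joined because the server splits the param on commas.
  return `instanceIds=${selected.map(encodeURIComponent).join(",")}`;
}
```

With two replicas, selecting "all" sends both ids and selecting "1" sends only the second one.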

Default sort: newest first (matches operator mental model when investigating a stopped deployment).

Total line count displayed in the filter bar.

Config panel

Renders the five existing live config sub-tabs (Monitoring, Resources, Variables, SensitiveKeys, Deployment) read-only, hydrated from deployedConfigSnapshot.

Each sub-tab component (ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/*) gains an optional readOnly?: boolean prop. When readOnly is set:

  • All inputs disabled (disabled attribute + visual styling)
  • Save / edit buttons hidden
  • Live banners (LiveBanner) hidden — these are not applicable to a frozen snapshot

If a sub-tab currently mixes derived state with form state in a way that makes a clean readOnly toggle awkward, refactor that sub-tab as part of this work. Don't proceed with leaky read-only behavior.

View-mode toggle: "Snapshot" / "Diff vs current". Default = Snapshot (full read-only render). Diff mode shows differences only — both old and new values per changed field, with red/green left borders, grouped by sub-tab. Each sub-tab pill shows a change-count badge (e.g. "Resources (2)"); sub-tabs with zero differences are dimmed and render a muted "No differences in this section" message when clicked.

Diff base = current live config, pulled via the existing useApplicationConfig hook the live form already uses. Algorithm: deep-equal field-level walk between snapshot and current.
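
A deep-equal field-level walk of this kind might look like the sketch below. It is illustrative only: ConfigValue, FieldChange, diffConfig, and countBySection are hypothetical names, arrays are compared as whole values rather than element by element, and grouping by the first path segment assumes the snapshot's top-level keys correspond to sub-tabs, which the spec does not state.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };

interface FieldChange {
  path: string; // dotted path, e.g. "resources.cpu"
  before: ConfigValue | undefined; // value in the snapshot
  after: ConfigValue | undefined; // value in the current live config
}

function isPlainObject(
  v: ConfigValue | undefined,
): v is { [key: string]: ConfigValue } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function deepEqual(a: ConfigValue | undefined, b: ConfigValue | undefined): boolean {
  if (a === b) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!deepEqual(a[i], b[i])) return false;
    }
    return true;
  }
  if (isPlainObject(a) && isPlainObject(b)) {
    const keys = Object.keys(a).concat(Object.keys(b));
    for (const key of keys) {
      if (!deepEqual(a[key], b[key])) return false;
    }
    return true;
  }
  return false;
}

// Walks both trees; recurses into objects and reports leaves that differ.
function diffConfig(
  snapshot: ConfigValue | undefined,
  current: ConfigValue | undefined,
  path: string = "",
): FieldChange[] {
  if (isPlainObject(snapshot) && isPlainObject(current)) {
    const keys = Object.keys(snapshot);
    for (const key of Object.keys(current)) {
      if (!(key in snapshot)) keys.push(key);
    }
    const changes: FieldChange[] = [];
    for (const key of keys) {
      const childPath = path ? `${path}.${key}` : key;
      changes.push(...diffConfig(snapshot[key], current[key], childPath));
    }
    return changes;
  }
  return deepEqual(snapshot, current)
    ? []
    : [{ path, before: snapshot, after: current }];
}

// Change count per top-level section, for badges like "Resources (2)".
function countBySection(changes: FieldChange[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const change of changes) {
    const section = change.path.split(".")[0];
    counts[section] = (counts[section] || 0) + 1;
  }
  return counts;
}
```

A field present on only one side appears with undefined on the other, so the panel can distinguish added and removed fields from changed ones.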

The toggle is hidden entirely when the JAR is pruned (the missing JAR makes a "current vs snapshot" comparison incomplete and misleading).

Footer: Sticky. Single primary button "Restore this checkpoint" + helper text "Restoring hydrates the form — you'll still need to Redeploy."

When JAR is pruned: button disabled with tooltip "JAR was pruned by the environment retention policy".

Restore behavior is unchanged from today: closes the drawer + hydrates the form via the existing onRestore(deploymentId) callback. No backend call; the eventual Redeploy generates the next deploy_app audit row.

Authorization

DeploymentController and AppController already carry a class-level @PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')"), so the deployment page is operator-gated. The new instanceIds filter on LogQueryController (which is VIEWER+) does not widen access: viewers can already query the same logs by application + environment, and the filter only narrows the result.

Real-time updates

When a new deployment lands, the previous "current" becomes a checkpoint. TanStack Query already polls deployments via the existing useDeployments(appSlug, envSlug) hook; the new table consumes the same data — auto-refresh comes for free.

Tests

Backend integration tests:

| Test | What it asserts |
| --- | --- |
| V2MigrationIT | created_by column exists, FK valid, index exists |
| DeploymentServiceCreatedByIT | createDeployment(...createdBy) persists the value |
| DeploymentControllerAuditIT | All three lifecycle actions write the expected audit row (action, category, target, details, actor, result) including FAILURE branches |
| LogQueryControllerInstanceIdsFilterIT | ?instanceIds=a,b,c returns only matching rows; empty/missing param preserves prior behavior |

UI component tests:

| Test | What it asserts |
| --- | --- |
| SideDrawer.test.tsx | open/close, ESC closes, backdrop click closes, focus trap |
| CheckpointsTable.test.tsx | row click opens drawer; pruned-JAR row dimmed + clickable; empty state |
| CheckpointDetailDrawer.test.tsx | renders correct logs (mocked instance_id list); Restore disabled when JAR pruned |
| ConfigPanel.test.tsx | snapshot mode renders all fields read-only; diff mode counts differences correctly per sub-tab; "no differences" message when section unchanged; toggle hidden when JAR pruned |

Files touched

Backend:

  • New: cameleer-server-app/src/main/resources/db/migration/V2__add_deployment_created_by.sql
  • Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java (add DEPLOYMENT)
  • Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java (record field)
  • Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java (signature + impl)
  • Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java (insert + map)
  • Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java (audit calls + createdBy resolution)
  • Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java (instanceIds param)
  • Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java (instanceIds field)
  • Regenerate: cameleer-server-app/src/main/resources/openapi.json (controller change → SPA types)

UI:

  • New: ui/src/components/SideDrawer.tsx + SideDrawer.module.css
  • New: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx
  • New: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/{index,LogsPanel,ConfigPanel}.tsx (Compare is a view-mode inside ConfigPanel, not a separate file)
  • Modified: ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx (swap Checkpoints → CheckpointsTable)
  • Deleted: ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx
  • Modified: ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/{Monitoring,Resources,Variables,SensitiveKeys,Deployment}Tab.tsx (add readOnly? prop)
  • Modified: ui/src/api/queries/logs.ts (useInfiniteApplicationLogs accepts instanceIds)
  • Modified: ui/src/api/queries/admin/apps.ts (Deployment.createdBy field)
  • Modified: ui/src/api/schema.d.ts + ui/src/api/openapi.json (regenerated)
  • Modified: ui/src/pages/Admin/AuditLogPage.tsx (one new category in filter dropdown)

Docs / rules:

  • Modified: .claude/rules/app-classes.md (DeploymentController audit calls + LogQueryController instanceIds param)
  • Modified: .claude/rules/ui.md (CheckpointsTable + SideDrawer pattern)
  • Modified: .claude/rules/core-classes.md (AuditCategory.DEPLOYMENT, Deployment.createdBy)

Rollout

Two phases, ideally two PRs:

  1. Backend phase: V2 migration, AuditCategory.DEPLOYMENT, audit calls in DeploymentController, created_by plumbing through DeploymentService / record / repository, LogQueryController instanceIds param. Ships independently because the column is nullable, the audit category is picked up automatically, and the new log filter is opt-in.
  2. UI phase: SideDrawer, CheckpointsTable, CheckpointDetailDrawer, readOnly? props on the five config sub-tabs, audit-page dropdown entry. Depends on the backend PR being merged + the OpenAPI schema regenerated.

Splitting in this order means production gets the audit trail and created_by capture immediately, even before the new UI lands, so the audit gap is closed as quickly as possible.