Replaces the cramped Checkpoints disclosure with a real DataTable + a side drawer (Logs / Config with snapshot/diff modes) and closes the audit-log gap discovered in DeploymentController (deploy/stop/promote currently make zero auditService.log calls). Cap visible checkpoints at Environment.jarRetentionCount — beyond that, JARs are pruned and rows aren't restorable. Logs scoped per-deployment via instance_id IN (...) computed from replicaStates (no time window needed). Compare folded into Config as a view-mode toggle. Two-phase rollout (backend ships first to close the audit gap immediately). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Checkpoints table redesign + deployment audit gap closure
Date: 2026-04-23
Status: Spec, pending implementation
Affects: App deployment page, deployments backend, audit log
Context
The Checkpoints disclosure on the unified app deployment page (ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx) currently renders past deployments as a cramped row list — a Badge, a "12m ago" label, and a Restore button. It hides the operator information that matters most when reasoning about a checkpoint: who deployed it, the JAR filename (not just the version number), the deployment outcome, and access to the logs and config snapshot the deployment ran with.
Investigating this also surfaced a gap in the audit log: DeploymentController.deploy / stop / promote make zero auditService.log(...) calls. Container deployments, the most consequential operations the server performs, leave no audit trail today. Closing this gap is in scope because it is a prerequisite for the "Deployed by" column.
Goals
- Replace the cramped checkpoints list with a real table (DS DataTable) showing version, JAR filename, deployer, time, strategy, and outcome.
- Capture and display "who deployed": the backend gains a created_by column on deployments, populated from SecurityContextHolder.
- Audit deploy / stop / promote operations under a new AuditCategory.DEPLOYMENT value.
- Provide an in-page detail view (side drawer) where the operator can review the deployment's logs and config snapshot before deciding to restore, with an optional diff against the current live config.
- Cap the visible checkpoint list at the environment's JAR retention count, since older entries cannot be restored.
Out of scope
- Sortable column headers (default newest-first is enough)
- Deep-linking via ?checkpoint=<id> query param
- "Remember last drawer tab" preference
- Bulk actions on checkpoints
- Promoting SideDrawer into @cameleer/design-system (wait for a second consumer)
Backend changes
Audit category
Add DEPLOYMENT to cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java:
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
DEPLOYMENT
}
The AuditCategory.valueOf(...) lookup in AuditLogController picks this up automatically. The Admin → Audit page filter dropdown gets one new option in ui/src/pages/Admin/AuditLogPage.tsx.
Audit calls in DeploymentController
Add AuditService injection and write audit rows on every successful and failed lifecycle operation. Action codes:
| Method | Action | Target | Details |
|---|---|---|---|
| deploy | deploy_app | deployment.id().toString() | { appSlug, envSlug, appVersionId, jarFilename, version } |
| stop | stop_deployment | deploymentId.toString() | { appSlug, envSlug } |
| promote | promote_deployment | deploymentId.toString() | { sourceEnv, targetEnv, appSlug, appVersionId } |
Each try branch writes AuditResult.SUCCESS; catch (IllegalArgumentException) writes AuditResult.FAILURE with the exception message in details before returning the existing 404. Pattern matches OutboundConnectionAdminController.
Flyway migration V2__add_deployment_created_by.sql
ALTER TABLE deployments ADD COLUMN created_by TEXT REFERENCES users(user_id);
CREATE INDEX idx_deployments_created_by ON deployments (created_by);
Nullable — existing rows stay NULL (rendered as — in UI). New rows always populated. No backfill: pre-V2 history is unrecoverable, and the column starts paying off from the next deploy onward.
Service signature change
DeploymentService.createDeployment(appId, appVersionId, envId, createdBy) and promote(targetAppId, sourceVersionId, targetEnvId, createdBy) both gain a trailing String createdBy parameter. PostgresDeploymentRepository writes it to the new column.
DeploymentController resolves createdBy via the existing user-id convention: strip "user:" prefix from SecurityContextHolder.getContext().getAuthentication().getName(). Same helper pattern as AlertRuleController / OutboundConnectionAdminController.
DTO change
com.cameleer.server.core.runtime.Deployment record gains createdBy: String. UI Deployment interface in ui/src/api/queries/admin/apps.ts gains createdBy: string | null.
Log filter for the drawer
LogQueryController.GET /api/v1/environments/{envSlug}/logs accepts a new multi-value query param instanceIds (comma-split, OR-joined). It translates to WHERE instance_id IN (...) on logs.instance_id, an existing LowCardinality(String) column that is already part of the ORDER BY key.
LogSearchRequest gains instanceIds: List<String> (null-normalized). Service layer adds the IN (...) clause when non-null and non-empty.
The drawer client computes the instance_id list from Deployment.replicaStates: for each replica, instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}" where generation is the first 8 chars of deployment.id. This is the documented format from .claude/rules/docker-orchestration.md — pure client-side derivation, no extra server endpoint.
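The derivation described above can be sketched as a small pure function. This is a minimal illustration, not the final client code: the ReplicaState shape and its index field are assumptions, since the spec only states that the IDs come from Deployment.replicaStates.

```typescript
// Hypothetical shapes: only the fields this sketch needs.
interface ReplicaState {
  index: number; // assumed field name for the replica index
}

interface DeploymentLike {
  id: string; // deployment id; its first 8 chars act as the generation
  replicaStates: ReplicaState[];
}

/** Builds "{envSlug}-{appSlug}-{replicaIndex}-{generation}" for each replica. */
function deriveInstanceIds(
  envSlug: string,
  appSlug: string,
  deployment: DeploymentLike,
): string[] {
  const generation = deployment.id.slice(0, 8);
  return deployment.replicaStates.map(
    (replica) => `${envSlug}-${appSlug}-${replica.index}-${generation}`,
  );
}
```

For a deployment whose id starts with 3f9c2a7e and which has two replicas, this yields prod-orders-0-3f9c2a7e and prod-orders-1-3f9c2a7e for envSlug "prod" and appSlug "orders".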
Drawer infrastructure
The design system provides Modal but no drawer. Building a project-local component is preferred over submitting to DS first (single consumer; easier to iterate locally).
File: ui/src/components/SideDrawer.tsx + SideDrawer.module.css (~120 LOC total).
API:
<SideDrawer
open={!!selectedCheckpoint}
onClose={() => setSelectedCheckpoint(null)}
title={`Deployment v${version} · ${jarFilename}`}
size="lg" // 'md'=560px, 'lg'=720px, 'xl'=900px
footer={<Button onClick={handleRestore}>Restore this checkpoint</Button>}
>
{/* scrollable body */}
</SideDrawer>
Behavior:
- React portal to document.body (mirrors DS Modal).
- Slides in from right via transform: translateX(100% → 0) over 240ms ease-out.
- Click-blocking transparent backdrop (no dim, so the parent table stays readable). Clicking outside closes.
- ESC closes.
- Focus trap on open; focus restored to trigger on close.
- Sticky header (title + close ×) and optional sticky footer.
- Body uses overflow-y: auto.
- All colors via DS CSS variables (--bg, --border, --shadow-lg).
Unsaved-changes interaction: Opening the drawer is unrestricted. The drawer is read-only — only Restore mutates form state, and Restore already triggers the existing unsaved-changes guard via useUnsavedChangesBlocker.
Checkpoints table
File: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx — replaces Checkpoints.tsx.
Columns (left to right):
| Column | Source | Notes |
|---|---|---|
| Version | versionMap.get(d.appVersionId).version | Badge "v6" with auto-color (matches existing pattern) |
| JAR | versionMap.get(d.appVersionId).jarFilename | Monospace; truncate with tooltip on overflow |
| Deployed by | d.createdBy | Bare username; OIDC users show oidc:<sub> truncated with tooltip; null shows a muted — |
| Deployed | d.deployedAt | Relative ("12m ago") + ISO subline |
| Strategy | d.deploymentStrategy | Small pill: "blue/green" or "rolling" |
| Outcome | d.status | Tinted pill: STOPPED (slate), DEGRADED (amber) |
| (chevron) | — | Visual affordance for "row click opens drawer" |
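The "Deployed by" rules in the table can be sketched as a pure formatter that the cell renderer would call. The DeployerCell shape and the 20-character cutoff are assumptions for illustration; the spec does not fix a truncation length.

```typescript
interface DeployerCell {
  label: string; // text shown in the cell
  tooltip: string | null; // full value, set only when the label was shortened
  muted: boolean; // true for the null placeholder
}

const MAX_DEPLOYER_LABEL = 20; // assumed cutoff, not taken from the spec

function formatDeployer(createdBy: string | null): DeployerCell {
  if (createdBy === null) {
    // Pre-migration rows have no deployer; show the dash placeholder (U+2014).
    return { label: "\u2014", tooltip: null, muted: true };
  }
  if (createdBy.startsWith("oidc:") && createdBy.length > MAX_DEPLOYER_LABEL) {
    // Long OIDC subjects are shortened and the full value moves to the tooltip.
    return {
      label: createdBy.slice(0, MAX_DEPLOYER_LABEL) + "\u2026",
      tooltip: createdBy,
      muted: false,
    };
  }
  return { label: createdBy, tooltip: null, muted: false };
}
```

Keeping this logic outside the component makes the null and OIDC cases easy to cover in CheckpointsTable.test.tsx without rendering.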
Interaction:
- Row click opens CheckpointDetailDrawer (no separate "View" button).
- No per-row Restore button: Restore lives inside the drawer to force review before action.
- Pruned-JAR rows (!versionMap.has(d.appVersionId)) render at 55% opacity with a strikethrough on the filename and an amber "archived — JAR pruned" hint. Row stays clickable; Restore inside the drawer is disabled with tooltip.
- Currently-running deployment is excluded (already represented by StatusCard above).
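A possible way to derive the row model from these rules is sketched here. The interface names, the runningDeploymentId parameter and the row fields are hypothetical; only the !versionMap.has(d.appVersionId) check and the exclusion of the running deployment come from the spec.

```typescript
interface AppVersionLike {
  version: number;
  jarFilename: string;
}

interface DeploymentRowSource {
  id: string;
  appVersionId: string;
}

interface CheckpointRow {
  deploymentId: string;
  version: AppVersionLike | null; // null once the JAR has been pruned
  archived: boolean; // drives the dimmed, struck-through styling
  restorable: boolean; // drives the Restore button inside the drawer
}

function toCheckpointRows(
  deployments: DeploymentRowSource[],
  versionMap: Map<string, AppVersionLike>,
  runningDeploymentId: string | null,
): CheckpointRow[] {
  return deployments
    // The running deployment is already shown by StatusCard.
    .filter((d) => d.id !== runningDeploymentId)
    .map((d) => {
      const archived = !versionMap.has(d.appVersionId);
      return {
        deploymentId: d.id,
        version: versionMap.get(d.appVersionId) ?? null,
        archived,
        restorable: !archived,
      };
    });
}
```

Archived rows stay in the result so they remain clickable; only the restorable flag changes.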
Empty state: When zero checkpoints, render a single full-width muted row: "No past deployments yet."
Pagination
Visible cap = Environment.jarRetentionCount rows (newest first). Anything older has likely been pruned and is not restorable, so it's hidden by default.
- total ≤ jarRetentionCount → render all, no expander.
- total > jarRetentionCount → render newest jarRetentionCount rows + an expander row: "Show older (N) — archived, postmortem only". Expanding renders the full list (older rows already styled as archived).
- jarRetentionCount === 0 (unlimited or unconfigured) → fall back to a default cap of 10.
jarRetentionCount comes from useEnvironments() (already in the env-store).
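The cap rules can be expressed as one helper that splits a newest-first list into visible and older rows. This is a sketch under the assumption that the list is already sorted newest first; the function and field names are illustrative only.

```typescript
const DEFAULT_VISIBLE_CAP = 10; // fallback when jarRetentionCount is 0

interface CheckpointPage<T> {
  visible: T[]; // rendered by default
  older: T[]; // rendered only after the expander is opened
  showExpander: boolean;
}

function paginateCheckpoints<T>(
  newestFirst: T[],
  jarRetentionCount: number,
): CheckpointPage<T> {
  // 0 means unlimited or unconfigured, so use the default cap instead.
  const cap = jarRetentionCount === 0 ? DEFAULT_VISIBLE_CAP : jarRetentionCount;
  const visible = newestFirst.slice(0, cap);
  const older = newestFirst.slice(cap);
  return { visible, older, showExpander: older.length > 0 };
}
```

With 12 checkpoints and a retention count of 5, the table would show 5 rows and an expander labelled with N = 7.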
Drawer detail view
File: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx plus two panel files: LogsPanel.tsx and ConfigPanel.tsx. Compare is a view mode inside ConfigPanel, not a separate panel file.
Header:
- Version badge + JAR filename + outcome pill.
- Meta line: "Deployed by {createdBy} · {relative} ({ISO}) · Strategy: {strategy} · {N} replicas · ran for {duration}".
- Close × top-right.
Tabs (DS Tabs):
- Logs — default on open
- Config — read-only render of the live config sub-tabs, with a view-mode toggle for "Snapshot" vs "Diff vs current"
Logs panel
Reuses useInfiniteApplicationLogs with the new instanceIds filter. The hook signature gets an optional instanceIds: string[] parameter that flows through to the LogQueryController query string.
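How the optional parameter might reach the query string is sketched here. The helper name is hypothetical and the real hook may build its URL differently; the sketch only shows the comma-joined instanceIds value and that an empty or missing list leaves the request unchanged.

```typescript
/**
 * Appends instanceIds=a,b,c to a log query URL.
 * Returns the URL untouched when no ids are given, so existing callers
 * keep their current behavior.
 */
function withInstanceIds(url: string, instanceIds?: string[]): string {
  if (!instanceIds || instanceIds.length === 0) {
    return url;
  }
  // Encode each id separately so the comma stays a literal separator.
  const value = instanceIds.map((id) => encodeURIComponent(id)).join(",");
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}instanceIds=${value}`;
}
```

Encoding each id on its own keeps the separator intact for the server's comma split.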
Filters (in addition to instanceIds):
- Existing source/level multi-select pills
- New replica filter dropdown: "all (N)" / "0" / "1" / ... / "N-1" — narrows to a single replica when troubleshooting blue-green or rolling deploys.
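The replica dropdown could be backed by two small helpers, sketched here. Both names are hypothetical, and the sketch assumes the instance ID list is ordered by replica index so that position i belongs to replica i.

```typescript
type ReplicaSelection = "all" | number;

interface ReplicaOption {
  value: ReplicaSelection;
  label: string;
}

/** Builds the options "all (N)", "0", "1", ..., "N-1". */
function replicaOptions(replicaCount: number): ReplicaOption[] {
  const options: ReplicaOption[] = [
    { value: "all", label: `all (${replicaCount})` },
  ];
  for (let i = 0; i < replicaCount; i++) {
    options.push({ value: i, label: String(i) });
  }
  return options;
}

/** Narrows the instanceIds filter to one replica, or keeps all of them. */
function narrowInstanceIds(
  instanceIds: string[],
  selection: ReplicaSelection,
): string[] {
  if (selection === "all") {
    return instanceIds;
  }
  const id = instanceIds[selection];
  // An out-of-range selection yields an empty filter rather than throwing.
  return id === undefined ? [] : [id];
}
```

The narrowed list is what would be passed as instanceIds to the logs hook.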
Default sort: newest first (matches operator mental model when investigating a stopped deployment).
Total line count displayed in the filter bar.
Config panel
Renders the five existing live config sub-tabs (Monitoring, Resources, Variables, SensitiveKeys, Deployment) read-only, hydrated from deployedConfigSnapshot.
Each sub-tab component (ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/*) gains an optional readOnly?: boolean prop. When readOnly is set:
- All inputs disabled (disabled attribute + visual styling)
- Save / edit buttons hidden
- Live banners (LiveBanner) hidden, since these do not apply to a frozen snapshot
If a sub-tab currently mixes derived state with form state in a way that makes a clean readOnly toggle awkward, refactor that sub-tab as part of this work. Don't proceed with leaky read-only behavior.
View-mode toggle: "Snapshot" / "Diff vs current". Default = Snapshot (full read-only render). Diff mode shows differences only — both old and new values per changed field, with red/green left borders, grouped by sub-tab. Each sub-tab pill shows a change-count badge (e.g. "Resources (2)"); sub-tabs with zero differences are dimmed and render a muted "No differences in this section" message when clicked.
Diff base = current live config, pulled via the existing useApplicationConfig hook the live form already uses. Algorithm: deep-equal field-level walk between snapshot and current.
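One possible shape for the field-level walk is sketched here. It assumes both configs are plain JSON-like objects keyed by sub-tab name. All type and function names are illustrative, and arrays are compared as whole values via JSON.stringify, which is a simplification that is sensitive to key order inside array elements.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };

interface FieldChange {
  path: string; // dotted path inside the sub-tab, e.g. "limits.cpu"
  snapshot: ConfigValue | undefined; // old value (left, red)
  current: ConfigValue | undefined; // new value (right, green)
}

function isPlainObject(v: unknown): v is { [key: string]: ConfigValue } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

/** Recursively collects changed leaf fields between two config values. */
function diffFields(
  snapshot: ConfigValue | undefined,
  current: ConfigValue | undefined,
  path = "",
): FieldChange[] {
  if (isPlainObject(snapshot) && isPlainObject(current)) {
    const keys = Object.keys({ ...snapshot, ...current }).sort();
    const changes: FieldChange[] = [];
    for (const key of keys) {
      const nextPath = path ? `${path}.${key}` : key;
      changes.push(...diffFields(snapshot[key], current[key], nextPath));
    }
    return changes;
  }
  // Leaves, arrays, and one-sided values are compared as whole values.
  return JSON.stringify(snapshot) === JSON.stringify(current)
    ? []
    : [{ path, snapshot, current }];
}

/** Groups changes by sub-tab; the array length is the change-count badge. */
function diffBySubTab(
  snapshot: Record<string, ConfigValue>,
  current: Record<string, ConfigValue>,
): Record<string, FieldChange[]> {
  const result: Record<string, FieldChange[]> = {};
  for (const tab of Object.keys({ ...snapshot, ...current })) {
    result[tab] = diffFields(snapshot[tab], current[tab]);
  }
  return result;
}
```

A sub-tab whose array is empty would be dimmed and show the "No differences in this section" message.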
The toggle is hidden entirely when JAR is pruned (the missing JAR makes "current vs snapshot" comparison incomplete and misleading).
Footer: Sticky. Single primary button "Restore this checkpoint" + helper text "Restoring hydrates the form — you'll still need to Redeploy."
When JAR is pruned: button disabled with tooltip "JAR was pruned by the environment retention policy".
Restore behavior is unchanged from today: closes the drawer + hydrates the form via the existing onRestore(deploymentId) callback. No backend call; the eventual Redeploy generates the next deploy_app audit row.
Authorization
DeploymentController and AppController already carry a class-level @PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')"), so the deployment page is operator-gated. The new instanceIds filter on LogQueryController (which is VIEWER+) widens nothing: viewers can already query the same logs by application + environment, and the filter only narrows the result.
Real-time updates
When a new deployment lands, the previous "current" becomes a checkpoint. TanStack Query already polls deployments via the existing useDeployments(appSlug, envSlug) hook; the new table consumes the same data — auto-refresh comes for free.
Tests
Backend integration tests:
| Test | What it asserts |
|---|---|
| V2MigrationIT | created_by column exists, FK valid, index exists |
| DeploymentServiceCreatedByIT | createDeployment(...createdBy) persists the value |
| DeploymentControllerAuditIT | All three lifecycle actions write the expected audit row (action, category, target, details, actor, result) including FAILURE branches |
| LogQueryControllerInstanceIdsFilterIT | ?instanceIds=a,b,c returns only matching rows; empty/missing param preserves prior behavior |
UI component tests:
| Test | What it asserts |
|---|---|
| SideDrawer.test.tsx | open/close, ESC closes, backdrop click closes, focus trap |
| CheckpointsTable.test.tsx | row click opens drawer; pruned-JAR row dimmed + clickable; empty state |
| CheckpointDetailDrawer.test.tsx | renders correct logs (mocked instance_id list); Restore disabled when JAR pruned |
| ConfigPanel.test.tsx | snapshot mode renders all fields read-only; diff mode counts differences correctly per sub-tab; "no differences" message when section unchanged; toggle hidden when JAR pruned |
Files touched
Backend:
- New: cameleer-server-app/src/main/resources/db/migration/V2__add_deployment_created_by.sql
- Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java (add DEPLOYMENT)
- Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java (record field)
- Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java (signature + impl)
- Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java (insert + map)
- Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java (audit calls + createdBy resolution)
- Modified: cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java (instanceIds param)
- Modified: cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java (instanceIds field)
- Regenerate: cameleer-server-app/src/main/resources/openapi.json (controller change → SPA types)
UI:
- New: ui/src/components/SideDrawer.tsx + SideDrawer.module.css
- New: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx
- New: ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/{index,LogsPanel,ConfigPanel}.tsx (Compare is a view-mode inside ConfigPanel, not a separate file)
- Modified: ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx (swap Checkpoints → CheckpointsTable)
- Deleted: ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx
- Modified: ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/{Monitoring,Resources,Variables,SensitiveKeys,Deployment}Tab.tsx (add readOnly? prop)
- Modified: ui/src/api/queries/logs.ts (useInfiniteApplicationLogs accepts instanceIds)
- Modified: ui/src/api/queries/admin/apps.ts (Deployment.createdBy field)
- Modified: ui/src/api/schema.d.ts + ui/src/api/openapi.json (regenerated)
- Modified: ui/src/pages/Admin/AuditLogPage.tsx (one new category in filter dropdown)
Docs / rules:
- Modified: .claude/rules/app-classes.md (DeploymentController audit calls + LogQueryController instanceIds param)
- Modified: .claude/rules/ui.md (CheckpointsTable + SideDrawer pattern)
- Modified: .claude/rules/core-classes.md (AuditCategory.DEPLOYMENT, Deployment.createdBy)
Rollout
Two phases, ideally two PRs:
- Backend phase: V2 migration, AuditCategory.DEPLOYMENT, audit calls in DeploymentController, created_by plumbing through DeploymentService / record / repository, LogQueryController instanceIds param. Ships independently because the column is nullable, the audit category is picked up automatically, and the new log filter is opt-in.
- UI phase: SideDrawer, CheckpointsTable, CheckpointDetailDrawer, readOnly? props on the five config sub-tabs, audit-page dropdown entry. Depends on the backend PR being merged + the OpenAPI schema regenerated.
Splitting in this order means production gets the audit trail and created_by capture immediately, even before the new UI lands, so the audit gap is closed as quickly as possible.