From 1f0ab002d681e5ed21bff32254582a15d8ade772 Mon Sep 17 00:00:00 2001
From: hsiegeln <37154749+hsiegeln@users.noreply.github.com>
Date: Thu, 23 Apr 2026 11:31:50 +0200
Subject: [PATCH] spec(deploy): checkpoints table redesign + deployment audit gap
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replaces the cramped Checkpoints disclosure with a real DataTable + a side drawer (Logs / Config with snapshot/diff modes) and closes the audit-log gap discovered in DeploymentController (deploy/stop/promote currently make zero auditService.log calls). Cap visible checkpoints at Environment.jarRetentionCount — beyond that, JARs are pruned and rows aren't restorable. Logs scoped per-deployment via instance_id IN (...) computed from replicaStates (no time window needed). Compare folded into Config as a view-mode toggle. Two-phase rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ...04-23-checkpoints-table-redesign-design.md | 264 ++++++++++++++++++
 1 file changed, 264 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-04-23-checkpoints-table-redesign-design.md

diff --git a/docs/superpowers/specs/2026-04-23-checkpoints-table-redesign-design.md b/docs/superpowers/specs/2026-04-23-checkpoints-table-redesign-design.md
new file mode 100644
index 00000000..d7d842ca
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-23-checkpoints-table-redesign-design.md
@@ -0,0 +1,264 @@
+# Checkpoints table redesign + deployment audit gap closure
+
+**Date:** 2026-04-23
+**Status:** Spec — pending implementation
+**Affects:** App deployment page, deployments backend, audit log
+
+## Context
+
+The Checkpoints disclosure on the unified app deployment page (`ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`) currently renders past deployments as a cramped row list — a Badge, a "12m ago" label, and a Restore button.
+It hides the operator information that matters most when reasoning about a checkpoint: who deployed it, the JAR filename (not just the version number), the deployment outcome, and access to the logs and config snapshot the deployment ran with.
+
+Investigating this also surfaced a **gap in the audit log**: `DeploymentController.deploy / stop / promote` make zero `auditService.log(...)` calls. Container deployments — the most consequential operations the server performs — leave no audit trail today. Closing this gap is in scope because it's prerequisite to the "Deployed by" column.
+
+## Goals
+
+1. Replace the cramped checkpoints list with a real table (DS `DataTable`) showing version, JAR filename, deployer, time, strategy, and outcome.
+2. Capture and display "who deployed" — backend gains a `created_by` column on `deployments`, populated from `SecurityContextHolder`.
+3. Audit deploy / stop / promote operations under a new `AuditCategory.DEPLOYMENT` value.
+4. Provide an in-page detail view (side drawer) where the operator can review the deployment's logs and config snapshot before deciding to restore, with an optional diff against the current live config.
+5. Cap the visible checkpoint list at the environment's JAR retention count, since older entries cannot be restored.
+
+## Out of scope
+
+- Sortable column headers (default newest-first is enough)
+- Deep-linking via `?checkpoint=` query param
+- "Remember last drawer tab" preference
+- Bulk actions on checkpoints
+- Promoting `SideDrawer` into `@cameleer/design-system` (wait for a second consumer)
+
+## Backend changes
+
+### Audit category
+
+Add `DEPLOYMENT` to `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`:
+
+```java
+public enum AuditCategory {
+    INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
+    OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
+    ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
+    DEPLOYMENT
+}
+```
+
+The `AuditCategory.valueOf(...)` lookup in `AuditLogController` picks this up automatically. The Admin → Audit page filter dropdown gets one new option in `ui/src/pages/Admin/AuditLogPage.tsx`.
+
+### Audit calls in `DeploymentController`
+
+Add `AuditService` injection and write audit rows on every successful and failed lifecycle operation. Action codes:
+
+| Method | Action | Target | Details |
+|---|---|---|---|
+| `deploy` | `deploy_app` | `deployment.id().toString()` | `{ appSlug, envSlug, appVersionId, jarFilename, version }` |
+| `stop` | `stop_deployment` | `deploymentId.toString()` | `{ appSlug, envSlug }` |
+| `promote` | `promote_deployment` | `deploymentId.toString()` | `{ sourceEnv, targetEnv, appSlug, appVersionId }` |
+
+Each `try` branch writes `AuditResult.SUCCESS`; `catch (IllegalArgumentException)` writes `AuditResult.FAILURE` with the exception message in details before returning the existing 404. Pattern matches `OutboundConnectionAdminController`.
+
+### Flyway migration `V2__add_deployment_created_by.sql`
+
+```sql
+ALTER TABLE deployments ADD COLUMN created_by TEXT REFERENCES users(user_id);
+CREATE INDEX idx_deployments_created_by ON deployments (created_by);
+```
+
+Nullable — existing rows stay `NULL` (rendered as `—` in UI). New rows always populated.
+No backfill: pre-V2 history is unrecoverable, and the column starts paying off from the next deploy onward.
+
+### Service signature change
+
+`DeploymentService.createDeployment(appId, appVersionId, envId, createdBy)` and `promote(targetAppId, sourceVersionId, targetEnvId, createdBy)` both gain a trailing `String createdBy` parameter. `PostgresDeploymentRepository` writes it to the new column.
+
+`DeploymentController` resolves `createdBy` via the existing user-id convention: strip the `"user:"` prefix from `SecurityContextHolder.getContext().getAuthentication().getName()`. Same helper pattern as `AlertRuleController` / `OutboundConnectionAdminController`.
+
+### DTO change
+
+`com.cameleer.server.core.runtime.Deployment` record gains `createdBy: String`. UI `Deployment` interface in `ui/src/api/queries/admin/apps.ts` gains `createdBy: string | null`.
+
+### Log filter for the drawer
+
+`LogQueryController.GET /api/v1/environments/{envSlug}/logs` accepts a new multi-value query param `instanceIds` (comma-split, OR-joined). Translates to `WHERE instance_id IN (...)` against the existing `LowCardinality(String)` index on `logs.instance_id` (already part of the `ORDER BY` key).
+
+`LogSearchRequest` gains `instanceIds: List<String>` (null-normalized). Service layer adds the `IN (...)` clause when non-null and non-empty.
+
+The drawer client computes the instance_id list from `Deployment.replicaStates`: for each replica, `instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}"` where generation is the first 8 chars of `deployment.id`. This is the documented format from `.claude/rules/docker-orchestration.md` — pure client-side derivation, no extra server endpoint.
+
+## Drawer infrastructure
+
+The design system provides `Modal` but no drawer. Building a project-local component is preferred over submitting to DS first (single consumer; easier to iterate locally).
+
+**File:** `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css` (~120 LOC total).
+
+**API:**
+
+```tsx
+<SideDrawer
+  open={selectedCheckpoint !== null}  // assumed prop: open while a checkpoint is selected
+  onClose={() => setSelectedCheckpoint(null)}
+  title={`Deployment v${version} · ${jarFilename}`}
+  size="lg"      // 'md'=560px, 'lg'=720px, 'xl'=900px
+  footer={restoreFooter}  // assumed: a ReactNode holding the Restore button
+>
+  {/* scrollable body */}
+</SideDrawer>
+```
+
+**Behavior:**
+- React portal to `document.body` (mirrors DS `Modal`).
+- Slides in from right via `transform: translateX(100% → 0)` over 240ms ease-out.
+- Click-blocking transparent backdrop (no dim — the parent table stays readable). Clicking outside closes.
+- ESC closes.
+- Focus trap on open; focus restored to trigger on close.
+- Sticky header (title + close ×) and optional sticky footer.
+- Body uses `overflow-y: auto`.
+- All colors via DS CSS variables (`--bg`, `--border`, `--shadow-lg`).
+
+**Unsaved-changes interaction:** Opening the drawer is unrestricted. The drawer is read-only — only Restore mutates form state, and Restore already triggers the existing unsaved-changes guard via `useUnsavedChangesBlocker`.
+
+## Checkpoints table
+
+**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx` — replaces `Checkpoints.tsx`.
+
+**Columns** (left to right):
+
+| Column | Source | Notes |
+|---|---|---|
+| Version | `versionMap.get(d.appVersionId).version` | Badge "v6" with auto-color (matches existing pattern) |
+| JAR | `versionMap.get(d.appVersionId).jarFilename` | Monospace; truncate with tooltip on overflow |
+| Deployed by | `d.createdBy` | Bare username; OIDC users show their `oidc:`-prefixed ID, truncated with tooltip; null shows `—` muted |
+| Deployed | `d.deployedAt` | Relative ("12m ago") + ISO subline |
+| Strategy | `d.deploymentStrategy` | Small pill: "blue/green" or "rolling" |
+| Outcome | `d.status` | Tinted pill: STOPPED (slate), DEGRADED (amber) |
+| (chevron) | — | Visual affordance for "row click opens drawer" |
+
+**Interaction:**
+- Row click opens `CheckpointDetailDrawer` (no separate "View" button).
+- No per-row Restore button — Restore lives inside the drawer to force review before action.
+- Pruned-JAR rows (`!versionMap.has(d.appVersionId)`) render at 55% opacity with a strikethrough on the filename and an amber "archived — JAR pruned" hint. Row stays clickable; Restore inside the drawer is disabled with tooltip.
+- Currently-running deployment is excluded (already represented by `StatusCard` above).
+
+**Empty state:** When zero checkpoints, render a single full-width muted row: "No past deployments yet."
+
+## Pagination
+
+Visible cap = `Environment.jarRetentionCount` rows (newest first). Anything older has likely been pruned and is not restorable, so it's hidden by default.
+
+- `total ≤ jarRetentionCount` → render all, no expander.
+- `total > jarRetentionCount` → render newest `jarRetentionCount` rows + an expander row: **"Show older (N) — archived, postmortem only"**. Expanding renders the full list (older rows already styled as archived).
+- `jarRetentionCount === 0` (unlimited or unconfigured) → fall back to a default cap of 10.
+
+`jarRetentionCount` comes from `useEnvironments()` (already in the env-store).
+
+## Drawer detail view
+
+**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx` plus two panel files: `LogsPanel.tsx` and `ConfigPanel.tsx` (Compare is a view mode inside `ConfigPanel`, not a separate file).
+
+**Header:**
+- Version badge + JAR filename + outcome pill.
+- Meta line: "Deployed by **{createdBy}** · {relative} ({ISO}) · Strategy: {strategy} · {N} replicas · ran for {duration}".
+- Close × top-right.
+
+**Tabs** (DS `Tabs`):
+- **Logs** — default on open
+- **Config** — read-only render of the live config sub-tabs, with a view-mode toggle for "Snapshot" vs "Diff vs current"
+
+### Logs panel
+
+Reuses `useInfiniteApplicationLogs` with the new `instanceIds` filter. The hook signature gets an optional `instanceIds: string[]` parameter that flows through to the `LogQueryController` query string.
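
The client-side derivation from "Log filter for the drawer" and the resulting query parameter could look roughly like the sketch below. Only the `instance_id` format and the comma-separated `instanceIds` parameter come from the spec. The `ReplicaState.index` field name and the helper names `deriveInstanceIds` and `buildInstanceIdsQuery` are hypothetical.

```typescript
// Hypothetical shapes: only the instance_id format is taken from the spec.
interface ReplicaState {
  index: number; // assumed field name for the replica index
}

interface DeploymentLike {
  id: string; // deployment id; its first 8 chars form the generation
  replicaStates: ReplicaState[];
}

// instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}"
function deriveInstanceIds(
  envSlug: string,
  appSlug: string,
  deployment: DeploymentLike,
): string[] {
  const generation = deployment.id.slice(0, 8);
  return deployment.replicaStates.map(
    (replica) => `${envSlug}-${appSlug}-${replica.index}-${generation}`,
  );
}

// Serializes the ids as a single comma-separated `instanceIds` query param.
// Returns an empty string when there is nothing to filter on, so a request
// without ids keeps its prior behavior.
function buildInstanceIdsQuery(instanceIds: string[]): string {
  if (instanceIds.length === 0) return "";
  return `instanceIds=${encodeURIComponent(instanceIds.join(","))}`;
}
```

For `envSlug = "prod"`, `appSlug = "orders"` and a deployment id starting with `1f0ab002`, replica 0 maps to `prod-orders-0-1f0ab002`. The comma is percent-encoded as `%2C` on the wire, which the server decodes before splitting.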
+
+**Filters** (in addition to `instanceIds`):
+- Existing source/level multi-select pills
+- New replica filter dropdown: "all (N)" / "0" / "1" / ... / "N-1" — narrows to a single replica when troubleshooting blue-green or rolling deploys.
+
+**Default sort:** newest first (matches operator mental model when investigating a stopped deployment).
+
+**Total line count** displayed in the filter bar.
+
+### Config panel
+
+Renders the five existing live config sub-tabs (`Monitoring`, `Resources`, `Variables`, `SensitiveKeys`, `Deployment`) **read-only**, hydrated from `deployedConfigSnapshot`.
+
+Each sub-tab component (`ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/*`) gains an optional `readOnly?: boolean` prop. When `readOnly` is set:
+- All inputs disabled (`disabled` attribute + visual styling)
+- Save / edit buttons hidden
+- Live banners (`LiveBanner`) hidden — these are not applicable to a frozen snapshot
+
+If a sub-tab currently mixes derived state with form state in a way that makes a clean `readOnly` toggle awkward, refactor that sub-tab as part of this work. Don't proceed with leaky read-only behavior.
+
+**View-mode toggle:** "Snapshot" / "Diff vs current". Default = Snapshot (full read-only render). Diff mode shows differences only — both old and new values per changed field, with red/green left borders, grouped by sub-tab. Each sub-tab pill shows a change-count badge (e.g. "Resources (2)"); sub-tabs with zero differences are dimmed and render a muted "No differences in this section" message when clicked.
+
+Diff base = current live config, pulled via the existing `useApplicationConfig` hook the live form already uses. Algorithm: deep-equal field-level walk between snapshot and current.
+
+The toggle is hidden entirely when JAR is pruned (the missing JAR makes "current vs snapshot" comparison incomplete and misleading).
+
+**Footer:** Sticky.
+Single primary button "Restore this checkpoint" + helper text "Restoring hydrates the form — you'll still need to Redeploy."
+
+When JAR is pruned: button disabled with tooltip "JAR was pruned by the environment retention policy".
+
+Restore behavior is unchanged from today: closes the drawer + hydrates the form via the existing `onRestore(deploymentId)` callback. No backend call; the eventual Redeploy generates the next `deploy_app` audit row.
+
+## Authorization
+
+`DeploymentController` and `AppController` are already class-level `@PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")`, so the deployment page is operator-gated. The new `instanceIds` filter on `LogQueryController` (which is VIEWER+) widens nothing — viewers can already query the same logs by `application + environment`; the filter just narrows.
+
+## Real-time updates
+
+When a new deployment lands, the previous "current" becomes a checkpoint. TanStack Query already polls deployments via the existing `useDeployments(appSlug, envSlug)` hook; the new table consumes the same data — auto-refresh comes for free.
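
How the table might turn the polled deployments list into rows can be sketched as follows. The sketch combines the exclusion of the currently running deployment with the cap from the Pagination section. The `DeploymentRow` shape, the `currentDeploymentId` argument, and the helper name `splitCheckpoints` are assumptions for illustration. The fallback cap of 10 comes from the spec.

```typescript
// Hypothetical row shape: only the fields this sketch needs.
interface DeploymentRow {
  id: string;
  deployedAt: string; // ISO-8601 timestamp
}

const DEFAULT_VISIBLE_CAP = 10; // fallback when jarRetentionCount is 0

interface CheckpointSplit {
  visible: DeploymentRow[]; // newest rows, rendered by default
  older: DeploymentRow[]; // rows behind the "Show older (N)" expander
}

function splitCheckpoints(
  deployments: DeploymentRow[],
  currentDeploymentId: string | null,
  jarRetentionCount: number,
): CheckpointSplit {
  const cap = jarRetentionCount > 0 ? jarRetentionCount : DEFAULT_VISIBLE_CAP;
  const checkpoints = deployments
    // The running deployment is already represented by StatusCard.
    .filter((d) => d.id !== currentDeploymentId)
    // Newest first; filter() returned a copy, so the input is not mutated.
    .sort((a, b) => Date.parse(b.deployedAt) - Date.parse(a.deployedAt));
  return {
    visible: checkpoints.slice(0, cap),
    older: checkpoints.slice(cap),
  };
}
```

Because the split is recomputed from whatever `useDeployments` returns, a freshly stopped deployment moves into `visible` on the next poll without extra wiring. An empty `older` array means no expander row is needed.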
+
+## Tests
+
+**Backend integration tests:**
+
+| Test | What it asserts |
+|---|---|
+| `V2MigrationIT` | `created_by` column exists, FK valid, index exists |
+| `DeploymentServiceCreatedByIT` | `createDeployment(...createdBy)` persists the value |
+| `DeploymentControllerAuditIT` | All three lifecycle actions write the expected audit row (action, category, target, details, actor, result) including FAILURE branches |
+| `LogQueryControllerInstanceIdsFilterIT` | `?instanceIds=a,b,c` returns only matching rows; empty/missing param preserves prior behavior |
+
+**UI component tests:**
+
+| Test | What it asserts |
+|---|---|
+| `SideDrawer.test.tsx` | open/close, ESC closes, backdrop click closes, focus trap |
+| `CheckpointsTable.test.tsx` | row click opens drawer; pruned-JAR row dimmed + clickable; empty state |
+| `CheckpointDetailDrawer.test.tsx` | renders correct logs (mocked instance_id list); Restore disabled when JAR pruned |
+| `ConfigPanel.test.tsx` | snapshot mode renders all fields read-only; diff mode counts differences correctly per sub-tab; "no differences" message when section unchanged; toggle hidden when JAR pruned |
+
+## Files touched
+
+**Backend:**
+- New: `cameleer-server-app/src/main/resources/db/migration/V2__add_deployment_created_by.sql`
+- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` (add `DEPLOYMENT`)
+- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java` (record field)
+- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` (signature + impl)
+- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java` (insert + map)
+- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java` (audit calls + createdBy resolution)
+- Modified:
+  `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java` (instanceIds param)
+- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` (instanceIds field)
+- Regenerate: `cameleer-server-app/src/main/resources/openapi.json` (controller change → SPA types)
+
+**UI:**
+- New: `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css`
+- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx`
+- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/{index,LogsPanel,ConfigPanel}.tsx` (Compare is a view-mode inside ConfigPanel, not a separate file)
+- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx` (swap Checkpoints → CheckpointsTable)
+- Deleted: `ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`
+- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/{Monitoring,Resources,Variables,SensitiveKeys,Deployment}Tab.tsx` (add `readOnly?` prop)
+- Modified: `ui/src/api/queries/logs.ts` (`useInfiniteApplicationLogs` accepts `instanceIds`)
+- Modified: `ui/src/api/queries/admin/apps.ts` (`Deployment.createdBy` field)
+- Modified: `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` (regenerated)
+- Modified: `ui/src/pages/Admin/AuditLogPage.tsx` (one new category in filter dropdown)
+
+**Docs / rules:**
+- Modified: `.claude/rules/app-classes.md` (DeploymentController audit calls + LogQueryController instanceIds param)
+- Modified: `.claude/rules/ui.md` (CheckpointsTable + SideDrawer pattern)
+- Modified: `.claude/rules/core-classes.md` (`AuditCategory.DEPLOYMENT`, `Deployment.createdBy`)
+
+## Rollout
+
+Two phases, ideally two PRs:
+
+1. **Backend phase** — V2 migration, `AuditCategory.DEPLOYMENT`, audit calls in `DeploymentController`, `created_by` plumbing through `DeploymentService` / record / repository, `LogQueryController` `instanceIds` param.
+   Ships independently because the column is nullable, the audit category is picked up automatically, and the new log filter is opt-in.
+2. **UI phase** — `SideDrawer`, `CheckpointsTable`, `CheckpointDetailDrawer`, `readOnly?` props on the five config sub-tabs, audit-page dropdown entry. Depends on the backend PR being merged + the OpenAPI schema regenerated.
+
+Splitting in this order means production gets the audit trail and `created_by` capture immediately, even before the new UI lands, so the audit gap is closed as quickly as possible.