spec(deploy): checkpoints table redesign + deployment audit gap

Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).

Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hsiegeln
2026-04-23 11:31:50 +02:00
parent 242ef1f0af
commit 1f0ab002d6


@@ -0,0 +1,264 @@
# Checkpoints table redesign + deployment audit gap closure
**Date:** 2026-04-23
**Status:** Spec — pending implementation
**Affects:** App deployment page, deployments backend, audit log
## Context
The Checkpoints disclosure on the unified app deployment page (`ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`) currently renders past deployments as a cramped row list — a Badge, a "12m ago" label, and a Restore button. It hides the operator information that matters most when reasoning about a checkpoint: who deployed it, the JAR filename (not just the version number), the deployment outcome, and access to the logs and config snapshot the deployment ran with.
Investigating this also surfaced a **gap in the audit log**: `DeploymentController.deploy / stop / promote` make zero `auditService.log(...)` calls. Container deployments — the most consequential operations the server performs — leave no audit trail today. Closing this gap is in scope because it's prerequisite to the "Deployed by" column.
## Goals
1. Replace the cramped checkpoints list with a real table (DS `DataTable`) showing version, JAR filename, deployer, time, strategy, and outcome.
2. Capture and display "who deployed" — backend gains a `created_by` column on `deployments`, populated from `SecurityContextHolder`.
3. Audit deploy / stop / promote operations under a new `AuditCategory.DEPLOYMENT` value.
4. Provide an in-page detail view (side drawer) where the operator can review the deployment's logs and config snapshot before deciding to restore, with an optional diff against the current live config.
5. Cap the visible checkpoint list at the environment's JAR retention count, since older entries cannot be restored.
## Out of scope
- Sortable column headers (default newest-first is enough)
- Deep-linking via `?checkpoint=<id>` query param
- "Remember last drawer tab" preference
- Bulk actions on checkpoints
- Promoting `SideDrawer` into `@cameleer/design-system` (wait for a second consumer)
## Backend changes
### Audit category
Add `DEPLOYMENT` to `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`:
```java
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
DEPLOYMENT
}
```
The `AuditCategory.valueOf(...)` lookup in `AuditLogController` picks this up automatically. The Admin → Audit page filter dropdown gets one new option in `ui/src/pages/Admin/AuditLogPage.tsx`.
### Audit calls in `DeploymentController`
Add `AuditService` injection and write audit rows on every successful and failed lifecycle operation. Action codes:
| Method | Action | Target | Details |
|---|---|---|---|
| `deploy` | `deploy_app` | `deployment.id().toString()` | `{ appSlug, envSlug, appVersionId, jarFilename, version }` |
| `stop` | `stop_deployment` | `deploymentId.toString()` | `{ appSlug, envSlug }` |
| `promote` | `promote_deployment` | `deploymentId.toString()` | `{ sourceEnv, targetEnv, appSlug, appVersionId }` |
Each `try` branch writes `AuditResult.SUCCESS`; `catch (IllegalArgumentException)` writes `AuditResult.FAILURE` with the exception message in details before returning the existing 404. Pattern matches `OutboundConnectionAdminController`.
### Flyway migration `V2__add_deployment_created_by.sql`
```sql
ALTER TABLE deployments ADD COLUMN created_by TEXT REFERENCES users(user_id);
CREATE INDEX idx_deployments_created_by ON deployments (created_by);
```
Nullable — existing rows stay `NULL` (rendered as `—` in UI). New rows always populated. No backfill: pre-V2 history is unrecoverable, and the column starts paying off from the next deploy onward.
### Service signature change
`DeploymentService.createDeployment(appId, appVersionId, envId, createdBy)` and `promote(targetAppId, sourceVersionId, targetEnvId, createdBy)` both gain a trailing `String createdBy` parameter. `PostgresDeploymentRepository` writes it to the new column.
`DeploymentController` resolves `createdBy` via the existing user-id convention: strip `"user:"` prefix from `SecurityContextHolder.getContext().getAuthentication().getName()`. Same helper pattern as `AlertRuleController` / `OutboundConnectionAdminController`.
### DTO change
`com.cameleer.server.core.runtime.Deployment` record gains `createdBy: String`. UI `Deployment` interface in `ui/src/api/queries/admin/apps.ts` gains `createdBy: string | null`.
### Log filter for the drawer
`LogQueryController.GET /api/v1/environments/{envSlug}/logs` accepts a new multi-value query param `instanceIds` (comma-split, OR-joined). It translates to `WHERE instance_id IN (...)` against the existing `LowCardinality(String)` column `logs.instance_id`, which is already part of the table's `ORDER BY` key.
`LogSearchRequest` gains `instanceIds: List<String>` (null-normalized). Service layer adds the `IN (...)` clause when non-null and non-empty.
The drawer client computes the instance_id list from `Deployment.replicaStates`: for each replica, `instance_id = "{envSlug}-{appSlug}-{replicaIndex}-{generation}"` where generation is the first 8 chars of `deployment.id`. This is the documented format from `.claude/rules/docker-orchestration.md` — pure client-side derivation, no extra server endpoint.
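A minimal sketch of that client-side derivation, assuming each entry in `replicaStates` exposes its replica index as `index`. That field name and the `DeploymentLike` type are hypothetical; only the instance_id format comes from this spec.

```typescript
// Hypothetical shapes: the spec only states that Deployment.replicaStates
// exists; the `index` field name is an assumption made for this sketch.
interface ReplicaState {
  index: number;
}

interface DeploymentLike {
  id: string;
  replicaStates: ReplicaState[];
}

// Builds "{envSlug}-{appSlug}-{replicaIndex}-{generation}" for every replica,
// where generation is the first 8 characters of the deployment id.
function deriveInstanceIds(
  envSlug: string,
  appSlug: string,
  deployment: DeploymentLike,
): string[] {
  const generation = deployment.id.slice(0, 8);
  return deployment.replicaStates.map(
    (replica) => `${envSlug}-${appSlug}-${replica.index}-${generation}`,
  );
}
```

Because the result depends only on fields already present on the deployment, the drawer can compute it during render without an extra request.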
## Drawer infrastructure
The design system provides `Modal` but no drawer. Building a project-local component is preferred over submitting to DS first (single consumer; easier to iterate locally).
**File:** `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css` (~120 LOC total).
**API:**
```tsx
<SideDrawer
open={!!selectedCheckpoint}
onClose={() => setSelectedCheckpoint(null)}
title={`Deployment v${version} · ${jarFilename}`}
size="lg" // 'md'=560px, 'lg'=720px, 'xl'=900px
footer={<Button onClick={handleRestore}>Restore this checkpoint</Button>}
>
{/* scrollable body */}
</SideDrawer>
```
**Behavior:**
- React portal to `document.body` (mirrors DS `Modal`).
- Slides in from right via `transform: translateX(100% → 0)` over 240ms ease-out.
- Click-blocking transparent backdrop (no dim — the parent table stays readable). Clicking outside closes.
- ESC closes.
- Focus trap on open; focus restored to trigger on close.
- Sticky header (title + close ×) and optional sticky footer.
- Body uses `overflow-y: auto`.
- All colors via DS CSS variables (`--bg`, `--border`, `--shadow-lg`).
**Unsaved-changes interaction:** Opening the drawer is unrestricted. The drawer is read-only — only Restore mutates form state, and Restore already triggers the existing unsaved-changes guard via `useUnsavedChangesBlocker`.
## Checkpoints table
**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx` — replaces `Checkpoints.tsx`.
**Columns** (left to right):
| Column | Source | Notes |
|---|---|---|
| Version | `versionMap.get(d.appVersionId).version` | Badge "v6" with auto-color (matches existing pattern) |
| JAR | `versionMap.get(d.appVersionId).jarFilename` | Monospace; truncate with tooltip on overflow |
| Deployed by | `d.createdBy` | Bare username; OIDC users show `oidc:<sub>` truncated with tooltip; null shows `—` muted |
| Deployed | `d.deployedAt` | Relative ("12m ago") + ISO subline |
| Strategy | `d.deploymentStrategy` | Small pill: "blue/green" or "rolling" |
| Outcome | `d.status` | Tinted pill: STOPPED (slate), DEGRADED (amber) |
| (chevron) | — | Visual affordance for "row click opens drawer" |
**Interaction:**
- Row click opens `CheckpointDetailDrawer` (no separate "View" button).
- No per-row Restore button — Restore lives inside the drawer to force review before action.
- Pruned-JAR rows (`!versionMap.has(d.appVersionId)`) render at 55% opacity with a strikethrough on the filename and an amber "archived — JAR pruned" hint. Row stays clickable; Restore inside the drawer is disabled with tooltip.
- Currently-running deployment is excluded (already represented by `StatusCard` above).
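The row rules above could be derived along these lines. This is a sketch under assumptions: the type names are illustrative, and identifying the running deployment by a `currentDeploymentId` argument is hypothetical, since the spec does not say how `CheckpointsTable` learns which deployment is current.

```typescript
// Illustrative types; only appVersionId and the versionMap lookup come from the spec.
interface AppVersion {
  version: number;
  jarFilename: string;
}

interface DeploymentSummary {
  id: string;
  appVersionId: string;
}

interface CheckpointRow {
  deploymentId: string;
  pruned: boolean; // render dimmed with the "archived" hint
  restoreEnabled: boolean; // Restore inside the drawer is disabled when pruned
}

function toCheckpointRows(
  deployments: DeploymentSummary[],
  versionMap: Map<string, AppVersion>,
  currentDeploymentId: string | null, // assumption: caller knows the running deployment
): CheckpointRow[] {
  return deployments
    // The running deployment is already shown by StatusCard, so skip it.
    .filter((d) => d.id !== currentDeploymentId)
    .map((d) => {
      // A missing versionMap entry means the JAR was pruned.
      const pruned = !versionMap.has(d.appVersionId);
      return { deploymentId: d.id, pruned, restoreEnabled: !pruned };
    });
}
```

Every row stays in the output regardless of `pruned`, which keeps pruned rows clickable as described above.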
**Empty state:** When zero checkpoints, render a single full-width muted row: "No past deployments yet."
## Pagination
Visible cap = `Environment.jarRetentionCount` rows (newest first). Anything older has likely been pruned and is not restorable, so it's hidden by default.
- `total ≤ jarRetentionCount` → render all, no expander.
- `total > jarRetentionCount` → render newest `jarRetentionCount` rows + an expander row: **"Show older (N) — archived, postmortem only"**. Expanding renders the full list (older rows already styled as archived).
- `jarRetentionCount === 0` (unlimited or unconfigured) → fall back to a default cap of 10.
`jarRetentionCount` comes from `useEnvironments()` (already in the env-store).
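The three cases can be folded into one helper. The sketch below assumes the rows arrive sorted newest first; the function and type names are illustrative, not existing code.

```typescript
// Fallback cap used when jarRetentionCount is 0 (unlimited or unconfigured).
const DEFAULT_CHECKPOINT_CAP = 10;

interface CheckpointPage<T> {
  visible: T[];
  olderCount: number; // the N in "Show older (N)"; 0 means no expander row
}

function paginateCheckpoints<T>(
  newestFirst: T[],
  jarRetentionCount: number,
  expanded: boolean,
): CheckpointPage<T> {
  const cap = jarRetentionCount > 0 ? jarRetentionCount : DEFAULT_CHECKPOINT_CAP;
  const olderCount = Math.max(0, newestFirst.length - cap);
  // Expanding renders the full list; otherwise only the newest `cap` rows.
  const visible = expanded ? newestFirst.slice() : newestFirst.slice(0, cap);
  return { visible, olderCount };
}
```

For example, five rows with `jarRetentionCount = 3` yield three visible rows and `olderCount = 2` until the expander is opened.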
## Drawer detail view
**File:** `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/index.tsx` plus two panel files: `LogsPanel.tsx` and `ConfigPanel.tsx`. Compare is a view-mode inside `ConfigPanel`, not a separate file.
**Header:**
- Version badge + JAR filename + outcome pill.
- Meta line: "Deployed by **{createdBy}** · {relative} ({ISO}) · Strategy: {strategy} · {N} replicas · ran for {duration}".
- Close × top-right.
**Tabs** (DS `Tabs`):
- **Logs** — default on open
- **Config** — read-only render of the live config sub-tabs, with a view-mode toggle for "Snapshot" vs "Diff vs current"
### Logs panel
Reuses `useInfiniteApplicationLogs` with the new `instanceIds` filter. The hook signature gets an optional `instanceIds: string[]` parameter that flows through to the `LogQueryController` query string.
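One way the hook might serialize the new parameter is sketched below. The `application` parameter name and the `LogQueryParams` type are assumptions; only the comma-joined `instanceIds` format and the "empty or missing preserves prior behavior" rule come from this spec.

```typescript
// Hypothetical parameter bag; the real hook may carry more filters.
interface LogQueryParams {
  application: string;
  instanceIds?: string[];
}

// Serializes the params into a query string. instanceIds is comma-joined to
// match the comma-split parsing described for LogQueryController, and is
// omitted when absent or empty so existing callers are unaffected.
function buildLogQueryString(params: LogQueryParams): string {
  const parts = [`application=${encodeURIComponent(params.application)}`];
  if (params.instanceIds && params.instanceIds.length > 0) {
    const joined = params.instanceIds.map((id) => encodeURIComponent(id)).join(",");
    parts.push(`instanceIds=${joined}`);
  }
  return parts.join("&");
}
```

Each id is encoded individually so the separating commas stay literal in the query string.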
**Filters** (in addition to `instanceIds`):
- Existing source/level multi-select pills
- New replica filter dropdown: "all (N)" / "0" / "1" / ... / "N-1" — narrows to a single replica when troubleshooting blue-green or rolling deploys.
**Default sort:** newest first (matches operator mental model when investigating a stopped deployment).
**Total line count** displayed in the filter bar.
### Config panel
Renders the five existing live config sub-tabs (`Monitoring`, `Resources`, `Variables`, `SensitiveKeys`, `Deployment`) **read-only**, hydrated from `deployedConfigSnapshot`.
Each sub-tab component (`ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/*`) gains an optional `readOnly?: boolean` prop. When `readOnly` is set:
- All inputs disabled (`disabled` attribute + visual styling)
- Save / edit buttons hidden
- Live banners (`LiveBanner`) hidden — these are not applicable to a frozen snapshot
If a sub-tab currently mixes derived state with form state in a way that makes a clean `readOnly` toggle awkward, refactor that sub-tab as part of this work. Don't proceed with leaky read-only behavior.
**View-mode toggle:** "Snapshot" / "Diff vs current". Default = Snapshot (full read-only render). Diff mode shows differences only — both old and new values per changed field, with red/green left borders, grouped by sub-tab. Each sub-tab pill shows a change-count badge (e.g. "Resources (2)"); sub-tabs with zero differences are dimmed and render a muted "No differences in this section" message when clicked.
Diff base = current live config, pulled via the existing `useApplicationConfig` hook the live form already uses. Algorithm: deep-equal field-level walk between snapshot and current.
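A minimal sketch of such a field-level walk follows. It assumes both configs are plain JSON-like objects and that each top-level key corresponds to one sub-tab; that mapping, and all names below, are assumptions made for illustration.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };

interface FieldDiff {
  path: string; // dotted path, e.g. "resources.cpu"
  snapshot: ConfigValue | undefined;
  current: ConfigValue | undefined;
}

function isPlainObject(v: ConfigValue | undefined): v is { [key: string]: ConfigValue } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function deepEqual(a: ConfigValue | undefined, b: ConfigValue | undefined): boolean {
  if (a === b) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]));
  }
  if (isPlainObject(a) && isPlainObject(b)) {
    const keysA = Object.keys(a);
    return keysA.length === Object.keys(b).length && keysA.every((k) => k in b && deepEqual(a[k], b[k]));
  }
  return false;
}

// Recurses into objects and reports every leaf (or array) that differs.
function diffFields(
  snapshot: ConfigValue | undefined,
  current: ConfigValue | undefined,
  prefix: string = "",
): FieldDiff[] {
  if (isPlainObject(snapshot) && isPlainObject(current)) {
    const keys = Object.keys(snapshot).concat(Object.keys(current).filter((k) => !(k in snapshot)));
    let out: FieldDiff[] = [];
    for (const key of keys) {
      out = out.concat(diffFields(snapshot[key], current[key], prefix ? `${prefix}.${key}` : key));
    }
    return out;
  }
  return deepEqual(snapshot, current) ? [] : [{ path: prefix, snapshot, current }];
}

// Change count per sub-tab, keyed by the first path segment (assumed mapping).
function countBySection(diffs: FieldDiff[]): { [section: string]: number } {
  const counts: { [section: string]: number } = {};
  for (const d of diffs) {
    const section = d.path.split(".")[0];
    counts[section] = (counts[section] || 0) + 1;
  }
  return counts;
}
```

Sections missing from the `countBySection` result have zero differences and would render the dimmed "No differences in this section" state.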
The toggle is hidden entirely when JAR is pruned (the missing JAR makes "current vs snapshot" comparison incomplete and misleading).
**Footer:** Sticky. Single primary button "Restore this checkpoint" + helper text "Restoring hydrates the form — you'll still need to Redeploy."
When JAR is pruned: button disabled with tooltip "JAR was pruned by the environment retention policy".
Restore behavior is unchanged from today: closes the drawer + hydrates the form via the existing `onRestore(deploymentId)` callback. No backend call; the eventual Redeploy generates the next `deploy_app` audit row.
## Authorization
`DeploymentController` and `AppController` already carry a class-level `@PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")`, so the deployment page is operator-gated. The new `instanceIds` filter on `LogQueryController` (which is VIEWER+) does not widen access: viewers can already query the same logs by `application + environment`, and the filter only narrows the result set.
## Real-time updates
When a new deployment lands, the previous "current" becomes a checkpoint. TanStack Query already polls deployments via the existing `useDeployments(appSlug, envSlug)` hook; the new table consumes the same data — auto-refresh comes for free.
## Tests
**Backend integration tests:**
| Test | What it asserts |
|---|---|
| `V2MigrationIT` | `created_by` column exists, FK valid, index exists |
| `DeploymentServiceCreatedByIT` | `createDeployment(...createdBy)` persists the value |
| `DeploymentControllerAuditIT` | All three lifecycle actions write the expected audit row (action, category, target, details, actor, result) including FAILURE branches |
| `LogQueryControllerInstanceIdsFilterIT` | `?instanceIds=a,b,c` returns only matching rows; empty/missing param preserves prior behavior |
**UI component tests:**
| Test | What it asserts |
|---|---|
| `SideDrawer.test.tsx` | open/close, ESC closes, backdrop click closes, focus trap |
| `CheckpointsTable.test.tsx` | row click opens drawer; pruned-JAR row dimmed + clickable; empty state |
| `CheckpointDetailDrawer.test.tsx` | renders correct logs (mocked instance_id list); Restore disabled when JAR pruned |
| `ConfigPanel.test.tsx` | snapshot mode renders all fields read-only; diff mode counts differences correctly per sub-tab; "no differences" message when section unchanged; toggle hidden when JAR pruned |
## Files touched
**Backend:**
- New: `cameleer-server-app/src/main/resources/db/migration/V2__add_deployment_created_by.sql`
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` (add `DEPLOYMENT`)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java` (record field)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` (signature + impl)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java` (insert + map)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java` (audit calls + createdBy resolution)
- Modified: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java` (instanceIds param)
- Modified: `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` (instanceIds field)
- Regenerate: `cameleer-server-app/src/main/resources/openapi.json` (controller change → SPA types)
**UI:**
- New: `ui/src/components/SideDrawer.tsx` + `SideDrawer.module.css`
- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointsTable.tsx`
- New: `ui/src/pages/AppsTab/AppDeploymentPage/CheckpointDetailDrawer/{index,LogsPanel,ConfigPanel}.tsx` (Compare is a view-mode inside ConfigPanel, not a separate file)
- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/IdentitySection.tsx` (swap Checkpoints → CheckpointsTable)
- Deleted: `ui/src/pages/AppsTab/AppDeploymentPage/Checkpoints.tsx`
- Modified: `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/{Monitoring,Resources,Variables,SensitiveKeys,Deployment}Tab.tsx` (add `readOnly?` prop)
- Modified: `ui/src/api/queries/logs.ts` (`useInfiniteApplicationLogs` accepts `instanceIds`)
- Modified: `ui/src/api/queries/admin/apps.ts` (`Deployment.createdBy` field)
- Modified: `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` (regenerated)
- Modified: `ui/src/pages/Admin/AuditLogPage.tsx` (one new category in filter dropdown)
**Docs / rules:**
- Modified: `.claude/rules/app-classes.md` (DeploymentController audit calls + LogQueryController instanceIds param)
- Modified: `.claude/rules/ui.md` (CheckpointsTable + SideDrawer pattern)
- Modified: `.claude/rules/core-classes.md` (`AuditCategory.DEPLOYMENT`, `Deployment.createdBy`)
## Rollout
Two phases, ideally two PRs:
1. **Backend phase** — V2 migration, `AuditCategory.DEPLOYMENT`, audit calls in `DeploymentController`, `created_by` plumbing through `DeploymentService` / record / repository, `LogQueryController` `instanceIds` param. Ships independently because the column is nullable, the audit category is picked up automatically, and the new log filter is opt-in.
2. **UI phase**: `SideDrawer`, `CheckpointsTable`, `CheckpointDetailDrawer`, `readOnly?` props on the five config sub-tabs, audit-page dropdown entry. Depends on the backend PR being merged + the OpenAPI schema regenerated.
Splitting in this order means production gets the audit trail and `created_by` capture immediately, even before the new UI lands, so the audit gap is closed as quickly as possible.