Commit Graph

1581 Commits

Author SHA1 Message Date
hsiegeln
e00848dc65 refactor(ui): drawer replica filter uses DS Select 2026-04-23 16:13:54 +02:00
hsiegeln
f31975e0ef feat(ui): checkpoints table collapsible, default collapsed 2026-04-23 16:09:28 +02:00
hsiegeln
2c0cf7dc9c fix(ui): StartupLogPanel — defensive scrollTo + disable buttons while fetching 2026-04-23 16:05:35 +02:00
hsiegeln
fb7b15f539 feat(ui): startup logs — sort toggle + refresh button + desc default 2026-04-23 16:00:44 +02:00
hsiegeln
1d7009d69c feat(ui): useStartupLogs accepts sort parameter (default desc) 2026-04-23 15:58:02 +02:00
hsiegeln
99a91a57be feat(ui): wire JAR upload progress into the primary action button 2026-04-23 15:54:23 +02:00
hsiegeln
427988bcc8 feat(ui): PrimaryActionButton gains uploading mode + progress overlay 2026-04-23 15:49:27 +02:00
hsiegeln
a208f2eec7 feat(ui): useUploadJar uses XHR and exposes onProgress 2026-04-23 15:44:50 +02:00
hsiegeln
13f218d522 docs(plan): deployment page polish (9 TDD tasks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:42:06 +02:00
hsiegeln
900fba5af6 docs(spec): deployment page polish (upload-in-button, sort/refresh, collapsible checkpoints, DS Select, tab reorder)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:57 +02:00
hsiegeln
b3d1dd377d ui(deploy): hide CheckpointsTable when no past deployments exist
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:34:09 +02:00
hsiegeln
e36c82c4db test(deploy): scope schema ITs to current_schema + clear deployments FK in teardown
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Surface from the Task 0 testcontainers.reuse enable: when the same Postgres
container is reused across `mvn verify` runs, Flyway migrates both `public`
and `tenant_default` schemas (the app.yml default URL uses
?currentSchema=tenant_default; AbstractPostgresIT overrides to public).
Schema-introspection assertions saw duplicate rows/indexes/enums.

Plus: OutboundConnectionAdminControllerIT's AfterEach couldn't delete its
test users because sibling deployment ITs (Task 4) left deployments.created_by
references — FK blocks the DELETE. Clear referencing deployments first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:06:56 +02:00
hsiegeln
d192f6b57c docs(rules): deployment audit + checkpoints table + SideDrawer + log instanceIds
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:51:22 +02:00
hsiegeln
fe1681e6e8 ui(audit): surface DEPLOYMENT category in admin filter dropdown 2026-04-23 13:49:31 +02:00
hsiegeln
571f85cd0f feat(ui): wire CheckpointsTable + Drawer into IdentitySection (delete old Checkpoints)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:46:31 +02:00
hsiegeln
25d2a3014a refactor(ui): DiffView CSS module + drop duplicate snapshot type 2026-04-23 13:43:15 +02:00
hsiegeln
1a97e2146e feat(ui): ConfigPanel snapshot+diff modes; extract snapshotToForm helper
- Extract inline handleRestore mapping into snapshotToForm(snapshot, defaults) helper
- Export defaultForm from useDeploymentPageState for use in ConfigPanel
- Replace ConfigPanel stub with real read-only snapshot renderer + Snapshot/Diff toggle
- Add fieldDiff deep-equal field-walk helper with nested object + array support
- Forward optional currentForm prop through CheckpointDetailDrawer to ConfigPanel
- 13 new tests across diff.test.ts, snapshotToForm.test.ts, ConfigPanel.test.tsx (all pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:38:22 +02:00
hsiegeln
d1150e5dd8 refactor(ui): drawer CSS module + narrow LogsPanel memo deps
Extract 14 inline style blocks from CheckpointDetailDrawer index.tsx and
LogsPanel.tsx into a shared CSS module using DS CSS variables throughout.
Narrow the LogsPanel useMemo dep array from the full deployment object to
deployment.id + deployment.replicaStates to prevent spurious query
invalidation on every TanStack Query poll.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:30:48 +02:00
hsiegeln
b0995d84bc feat(ui): CheckpointDetailDrawer container + LogsPanel
Adds the CheckpointDetailDrawer with Logs/Config tabs. LogsPanel scopes
logs to a deployment's replicas via instanceIds derived from replicaStates
+ generation suffix. Stub ConfigPanel placeholder for Task 11.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:25:55 +02:00
hsiegeln
9756a20223 fix(ui): dim archived checkpoint rows + safer outcome class lookup + cleaner cap 2026-04-23 13:19:06 +02:00
hsiegeln
1b4b522233 feat(ui): CheckpointsTable component (replaces row list)
Full-width table with Version / JAR / Deployed-by / Deployed / Strategy /
Outcome columns, pagination cap (jarRetentionCount, default 10), pruned-JAR
archived state, empty state, and row-click onSelect handler. 8/8 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:15:30 +02:00
hsiegeln
48217e0034 test(deploy): contract test — ConfigTabs disabled gates all inputs 2026-04-23 13:10:17 +02:00
hsiegeln
c3ecff9d45 feat(ui): add SideDrawer component (project-local)
Right-sliding panel with portal, ESC + backdrop close, sticky header/footer,
three width sizes (md/lg/xl), transparent click-blocking backdrop, and DS token colors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:05:36 +02:00
hsiegeln
07099357af chore(api): regenerate UI types — Deployment.createdBy + logs instanceIds
- Fetched fresh openapi.json from local backend (Tasks 3-5 changes)
- Regenerated schema.d.ts via openapi-typescript
- Added createdBy: string | null to Deployment interface in apps.ts
- Added instanceIds?: string[] to UseInfiniteApplicationLogsArgs with sort/serialize/queryKey/URLSearchParams wiring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 13:00:16 +02:00
hsiegeln
ed0e616109 refactor(logs): drop dead null guards on instanceIds filter (record normalizes) 2026-04-23 12:52:18 +02:00
hsiegeln
382e1801a7 feat(logs): add instanceIds multi-value filter to /logs endpoint
Adds List<String> instanceIds to LogSearchRequest (null-normalized to
List.of() in compact ctor) and generates an IN clause in both
ClickHouseLogStore.search() and countLogs(), mirroring the existing
sources pattern. LogQueryController parses ?instanceIds= as a
comma-split list. All existing LogSearchRequest call sites updated.
New ClickHouseLogStoreInstanceIdsIT covers: multi-value filter, empty
filter (all rows), null filter (all rows), single-value filter, and
coexistence with the singular instanceId field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:41:09 +02:00
hsiegeln
2312a7304d fix(deploy): widen promote FAILURE audit detail + clean up test envs 2026-04-23 12:29:46 +02:00
hsiegeln
47d5611462 feat(audit): audit deploy/stop/promote with DEPLOYMENT category
Wires AuditService and AppVersionRepository into DeploymentController.
Replaces null createdBy placeholder with currentUserId() on createDeployment/promote.
Adds audit log entries (SUCCESS + FAILURE) for deploy_app, stop_deployment,
and promote_deployment actions. Fixes FK violations in affected ITs by
seeding the test-operator and alice users into the users table before deploy calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:24:27 +02:00
hsiegeln
9043dc00b0 test(deploy): clean up seeded users + document null createdBy placeholder
Fix Issue 1: Add @AfterEach cleanup for alice/bob users in PostgresDeploymentRepositoryCreatedByIT to prevent test leakage (FK order: deployments -> app_versions -> apps, then users).

Fix Issue 2: Add comment at first create(..., null) call site in PostgresDeploymentRepositoryIT documenting the null placeholder for pre-V4 rows where createdBy is nullable.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-23 12:10:21 +02:00
hsiegeln
a141e99a07 feat(deploy): cascade createdBy through Deployment record + service + repo
Appends String createdBy to the Deployment record (after createdAt), updates
both with-er methods to pass it through, threads the parameter through
DeploymentRepository.create, DeploymentService.createDeployment/promote, and
PostgresDeploymentRepository (INSERT + SELECT_COLS + mapRow). DeploymentController
passes null as placeholder (Task 4 will resolve from SecurityContextHolder).
Covers with PostgresDeploymentRepositoryCreatedByIT verifying round-trip via
both createDeployment and promote.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 12:04:15 +02:00
hsiegeln
15d00f039c feat(audit): add DEPLOYMENT audit category 2026-04-23 11:51:28 +02:00
hsiegeln
064c302073 docs(plan): V2 → V4 migration filename (V2/V3 already taken) 2026-04-23 11:49:12 +02:00
hsiegeln
35748ea7a1 feat(deploy): V4 migration — add created_by to deployments 2026-04-23 11:44:05 +02:00
hsiegeln
e558494f8d plan(deploy): checkpoints table redesign + audit gap
15 tasks across 5 phases (backend foundation → SideDrawer →
ConfigTabs readOnly → CheckpointsTable + DetailDrawer → polish).
TDD throughout with per-task commits. Backend phase ships
independently to close the audit gap as quickly as possible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:39:11 +02:00
hsiegeln
1f0ab002d6 spec(deploy): checkpoints table redesign + deployment audit gap
Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).

Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:31:50 +02:00
hsiegeln
242ef1f0af perf(build): faster Maven + UI + CI pipelines
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m43s
CI / docker (push) Successful in 4m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Maven: enable useIncrementalCompilation; Surefire forkCount=1C +
  reuseForks=true so unit-test JVMs are reused per CPU core instead of
  spawning per class (205 tests pass under the new strategy).
- Testcontainers: opt-in reuse via .withReuse(true) on Postgres +
  ClickHouse base; per-developer enable via ~/.testcontainers.properties.
- UI: drop redundant `tsc --noEmit` from `npm run build` (Vite already
  type-checks); split into a dedicated `npm run typecheck` script.
- CI: cache ~/.npm and ui/node_modules/.vite alongside Maven; npm ci with
  --prefer-offline --no-audit --fund=false; paths-ignore for docs-only,
  .planning/ and .claude/ changes so doc-only pushes skip the pipeline.
- Docs: CLAUDE.md + .claude/rules/cicd.md updated with the new build
  knobs and the Testcontainers reuse opt-in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:48:34 +02:00
hsiegeln
c6aef5ab35 fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m4s
CI / docker (push) Successful in 1m15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment.
  STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty.
  Now only FAILED rows are pruned; STOPPED deployments are retained as restorable
  checkpoints (they still carry deployed_config_snapshot from their RUNNING window).
- UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only,
  which excluded the main case — the previous blue/green deployment now in STOPPED).
- UI placement: Checkpoints disclosure now renders inside IdentitySection, matching
  the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:26:46 +02:00
hsiegeln
007597715a docs(rules): deployment strategies + generation suffix
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m8s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 46s
Refresh the three rules files to match the new executor behavior:

- docker-orchestration.md: rewrite DeploymentExecutor Details with
  container naming scheme ({...}-{replica}-{generation}), strategy
  dispatch (blue-green vs rolling), and the new DEGRADED semantics
  (post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder
  bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets
  mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now
  post-deploy-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:02:51 +02:00
hsiegeln
b6e54db6ec ui(deploy): strategy hint on Resources tab + indicator on StatusCard
Resources tab: add a hint under the Deploy Strategy dropdown that
explains the blue-green vs rolling trade-off (resource peak, failure
semantics), switching text based on the current selection.

StatusCard: show the active deployment's strategy inline in the info
grid so users can tell at a glance which path was taken for a given
deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:00:44 +02:00
hsiegeln
e9f523f2b8 test(deploy): blue-green + rolling strategy ITs
Four ITs covering strategy behavior:
- BlueGreenStrategyIT#blueGreen_allHealthy_stopsOldAfterNew:
  old is stopped only after all new replicas are healthy.
- BlueGreenStrategyIT#blueGreen_partialHealthy_preservesOldAndMarksFailed:
  strict all-healthy — one starting replica aborts the deploy and
  leaves the previous deployment RUNNING untouched.
- RollingStrategyIT#rolling_allHealthy_replacesOneByOne:
  InOrder on stopContainer confirms old-0 stops before old-1 (the
  interleaving that distinguishes rolling from blue-green).
- RollingStrategyIT#rolling_failsMidRollout_preservesRemainingOld:
  mid-rollout health failure stops only the in-flight new containers
  and the already-replaced old-0; old-1 stays untouched.

Shortens healthchecktimeout to 2s via @TestPropertySource so failure
paths complete in ~25s instead of ~60s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:00:00 +02:00
hsiegeln
653f983a08 deploy: rolling strategy (per-replica replacement)
Replace the Phase 3 stub with a working rolling implementation.

Flow:
- Capture previous deployment's per-index container ids up front.
- For i = 0..replicas-1:
  - Start new[i] (gen-suffixed name, coexists with old[i]).
  - Wait for new[i] healthy (new waitForOneHealthy helper).
  - On success: stop old[i] if present, continue.
  - On failure: stop in-flight new[0..i], leave un-replaced old[i+1..N]
    running, mark FAILED. Already-replaced old replicas are not
    restored — rolling is not reversible; user redeploys to recover.
- After the loop: sweep any leftover old replicas (when replica count
  shrank) and mark the old deployment STOPPED.

Resource peak: replicas + 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:53:52 +02:00
hsiegeln
459cdfe427 deploy: blue-green strategy (start → health-all → stop old)
Phase 3 of deployment-strategies plan. Refactor executeAsync to
dispatch on DeploymentStrategy.fromWire(config.deploymentStrategy()).

Blue-green (default):
- Start all N new replicas (gen-suffixed names coexist with old).
- Wait for ALL healthy (strict — partial-healthy = FAILED, preserves
  previous deployment untouched).
- Only then find + stop the previous deployment.
- Final status is always RUNNING; DEGRADED is now reserved for
  post-deploy replica crashes (set by DockerEventMonitor).

Rolling: stub — throws UnsupportedOperationException for now, gets
its real implementation in Phase 4.

Refactor details:
- Extract DeployCtx record to carry 13 per-deploy values around.
- Extract startReplica(ctx, i, stateOut) — shared by both strategy paths.
- Extract persistSnapshotAndMarkRunning(ctx, primaryCid) — shared finalizer.
- Rename waitForAnyHealthy → waitForAllHealthy (the name was misleading;
  the method already waited for all, just returned partial on timeout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:51:24 +02:00
hsiegeln
652346dcd4 deploy: gen-suffixed container names + cameleer.generation label
Append an 8-char generation id (first 8 chars of deployment UUID) to:
- container name: {tenant}-{env}-{app}-{replica}-{gen}
- CAMELEER_AGENT_INSTANCEID (so old+new agents are distinct in the registry)
- Traefik cameleer.instance-id label

And emit a new standalone cameleer.generation label so dashboards
(Prometheus/Grafana) can pin deploy boundaries without regex on
instance-id.

Strategy branching comes next — this commit is foundation only; the
interim destroy-then-start flow still runs regardless of strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:45:44 +02:00
hsiegeln
5304c8ee01 core(deploy): DeploymentStrategy enum with safe wire conversion
Typed enum (BLUE_GREEN, ROLLING) with fromWire/toWire kebab-case
translation. fromWire falls back to BLUE_GREEN for unknown or null
input so the executor dispatch site never null-checks and no
misconfigured container-config can throw at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:42:35 +02:00
hsiegeln
2c82f29aef docs(plans): deployment strategies (blue-green + rolling) plan
7-phase plan to replace the interim destroy-then-start flow (f8dccaae)
with a strategy-aware executor. Adds gen-suffixed container names so
old + new replicas can coexist, plus a cameleer.generation label for
Prometheus/Grafana deploy-boundary annotations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:41:43 +02:00
hsiegeln
4371372a26 ui(admin): solid env-colored circle in place of name-hash Avatar
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m7s
CI / docker (push) Successful in 1m21s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SonarQube / sonarqube (push) Successful in 6m8s
Previous ring approach was too subtle against most env colors. Replace
the DS Avatar with a purpose-built circle rendered in the environment's
chosen color, showing 1–2 letter initials in white. Fills the full
circle so the color reads at a glance from across the list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 01:02:10 +02:00
hsiegeln
f8dccaae2b fix(deploy): stop previous active deployment before START_REPLICAS (fixes 409)
Container names are deterministic: {tenant}-{envSlug}-{appSlug}-{replica}.
The prior code did the stop-existing step at SWAP_TRAFFIC, *after*
START_REPLICAS had already tried to create containers with the same
names — so a redeploy against a RUNNING app consistently failed with
Docker 409 "container name already in use".

Move the stop-existing block to run right after CREATE_NETWORK and
before START_REPLICAS. SWAP_TRAFFIC becomes a label-only marker (traffic
is swapped implicitly by Traefik labels once new replicas are healthy).

Also: add `findActiveByAppIdAndEnvironmentIdExcluding` so the SQL
excludes the current deployment by id — previously the Java-side
`!id.equals(me)` guard failed because the newly-inserted row has
status=STARTING (DB default) and ORDER BY created_at DESC LIMIT 1
picked the new row, hiding the actual previous deployment.

Trade-off: this is destroy-then-start rather than true blue/green —
brief downtime during the swap. Matches the pre-unified-page behavior
and is what users reasonably expect. True blue/green would require
per-deployment container names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 01:01:00 +02:00
hsiegeln
9ecc9ee72a ui(deploy): pending-deploy badge + Start/Stop in page header
1. Add a 'Pending deploy' Badge next to the app name when there are
   local edits or the saved state differs from the last deploy. Makes
   the undeployed-changes state visible even when the user isn't looking
   at the tab asterisks.

2. Move Start/Stop buttons from StatusCard into the page header, next
   to Delete. Runs off the latest deployment's status — Stop when
   RUNNING/STARTING/DEGRADED, Start (triggers a redeploy of the last
   version) when STOPPED. DeploymentTab and StatusCard shed their
   onStop/onStart props.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:51:11 +02:00
hsiegeln
9c54313ff1 ui(deploy): surface deployment failure reason in StatusCard
DeploymentExecutor already persists errorMessage on FAILED transitions
but the UI never rendered it — users saw "FAILED" with no explanation.
Add a bordered error block above the action row when a deployment is
FAILED, preserving whitespace and wrapping long Docker error bodies
(e.g. 409 conflict JSON).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:49:29 +02:00
hsiegeln
e5eb48b0fa ui(admin): env-colored ring on environment avatars
Wrap Avatar in a span with box-shadow outline in the environment's
chosen color (slate/red/amber/green/teal/blue/purple/pink). Applied to
both the list row and the detail header. Keeps the Avatar's name-hash
interior so initials remain distinguishable; the ring just signals
which env you're looking at at a glance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:48:51 +02:00