feat(runtime): capture loader logs in failure exceptions; add LoaderHardeningIT regression guard

Two diagnostics-and-confidence follow-ups to the loader-init-container pattern.

1) DockerRuntimeOrchestrator now captures the loader's last 50 lines of
   stdout/stderr (capped at 4096 chars, 5s timeout) before the finally-remove
   and appends them to the thrown RuntimeException as
   `. loader output: <text>`. Best-effort: log-capture failures are swallowed
   and never mask the original exit. Closes the visibility gap that turned a
   simple "wget: Permission denied" into the opaque "Loader exited 1".
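The capture-and-append behaviour in (1) can be sketched in isolation (class and method names here are hypothetical, not the actual orchestrator code; only the 50-line tail, 4096-char cap, message suffix, and best-effort swallowing come from this commit):

```java
import java.util.List;

// Hypothetical sketch of the log-capture behaviour described above.
public class LoaderLogCapSketch {
    static final int MAX_CHARS = 4096;

    // 'tailLines' stands in for the last 50 lines that docker-java's
    // logContainerCmd(...).withTail(50) would stream back.
    static String capped(List<String> tailLines) {
        String joined = String.join("\n", tailLines);
        // Keep the most recent output when over the cap.
        return joined.length() <= MAX_CHARS
                ? joined
                : joined.substring(joined.length() - MAX_CHARS);
    }

    static String failureMessage(String original, List<String> tailLines) {
        try {
            return original + ". loader output: " + capped(tailLines);
        } catch (RuntimeException e) {
            // Best-effort: a failure while capturing logs must never
            // mask the original loader failure.
            return original;
        }
    }
}
```

With this shape, the opaque "Loader exited 1" becomes "Loader exited 1. loader output: wget: Permission denied" while a log-capture error degrades back to the original message.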

2) New LoaderHardeningIT spins up a Testcontainers nginx serving a 1KB
   fixture, builds the loader image fresh from cameleer-runtime-loader/,
   and runs it under the exact baseHardenedHostConfig() shape (cap_drop ALL,
   readonly rootfs, /tmp tmpfs, no-new-privileges, apparmor=docker-default,
   pids=512) bound to a fresh named volume RW at /app/jars. Asserts exit 0.
   This would have caught the volume-permission regression in CI.

GenericContainer + OneShotStartupCheckStrategy is used instead of raw
docker-java waitContainerCmd because the unshaded docker-java API version
in this project's pom and testcontainers' shaded copy disagree on
WaitContainerCmd.getCondition() — going through GenericContainer keeps
the call inside testcontainers' shaded executor.
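A minimal sketch of that IT shape (image tag, fixture URL, and volume name are placeholders, not the real test's values; it needs a Docker daemon, so this is illustration only):

```java
import com.github.dockerjava.api.model.Bind;
import com.github.dockerjava.api.model.Capability;
import com.github.dockerjava.api.model.Volume;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.startupcheck.OneShotStartupCheckStrategy;
import org.testcontainers.utility.DockerImageName;

import java.time.Duration;
import java.util.List;
import java.util.Map;

public class LoaderHardeningSketch {
    public static void main(String[] args) {
        try (GenericContainer<?> loader =
                     new GenericContainer<>(DockerImageName.parse("cameleer-runtime-loader:it"))
                // Placeholder fixture URL and size for the 1KB artifact.
                .withEnv("ARTIFACT_URL", "http://fixture-nginx/app.jar")
                .withEnv("ARTIFACT_EXPECTED_SIZE", "1024")
                // One-shot: "started" means the container ran to completion
                // with exit code 0, matching the loader's run-and-exit model.
                .withStartupCheckStrategy(
                        new OneShotStartupCheckStrategy().withTimeout(Duration.ofSeconds(120)))
                // Reproduce the hardened HostConfig shape from the doc.
                .withCreateContainerCmdModifier(cmd -> cmd.getHostConfig()
                        .withCapDrop(Capability.ALL)
                        .withReadonlyRootfs(true)
                        .withTmpFs(Map.of("/tmp", "rw"))
                        .withSecurityOpts(List.of("no-new-privileges", "apparmor=docker-default"))
                        .withPidsLimit(512L)
                        .withBinds(new Bind("cameleer-jars-it", new Volume("/app/jars"))))) {
            loader.start(); // throws on non-zero exit, failing the test
        }
    }
}
```

`start()` only returns once the one-shot check has seen a clean exit, so an "exit 0" assertion falls out of the container starting at all.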

Rules doc updated to point at the captured-output behaviour and the IT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hsiegeln
2026-04-27 23:51:25 +02:00
parent c2efb7fbf7
commit 2e2d069530
3 changed files with 189 additions and 3 deletions


@@ -41,7 +41,7 @@ When deployed via the cameleer-saas platform, this server orchestrates customer
`startContainer` is now a two-phase op per replica:
1. **Volume create** → `cameleer-jars-{containerName}` named volume (per-replica, deterministic so cleanup in `removeContainer` can derive it).
2. **Loader container** → `loaderImage` (default `gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest`), name `{containerName}-loader`, mount the volume **RW at `/app/jars`**, env vars `ARTIFACT_URL` + `ARTIFACT_EXPECTED_SIZE`. Loader downloads the JAR from the signed URL into the volume and exits 0. Orchestrator blocks on `waitContainerCmd().exec(WaitContainerResultCallback).awaitStatusCode(120, SECONDS)`. Loader container is removed in a `finally` block; on non-zero exit the volume is also removed and `RuntimeException` propagates so `DeploymentExecutor` marks the deployment FAILED.
2. **Loader container** → `loaderImage` (default `gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest`), name `{containerName}-loader`, mount the volume **RW at `/app/jars`**, env vars `ARTIFACT_URL` + `ARTIFACT_EXPECTED_SIZE`. Loader downloads the JAR from the signed URL into the volume and exits 0. Orchestrator blocks on `waitContainerCmd().exec(WaitContainerResultCallback).awaitStatusCode(120, SECONDS)`. Loader container is removed in a `finally` block; on non-zero exit the volume is also removed and `RuntimeException` propagates so `DeploymentExecutor` marks the deployment FAILED. **Loader logs are captured before removal** (`captureLoaderLogs` → `logContainerCmd` with `withTail(50)`, capped at 4096 chars, 5s timeout) and appended to the thrown `RuntimeException` message as `". loader output: <text>"`. Best-effort: log-capture failures are swallowed and don't mask the original exit. The loader image's Dockerfile pre-creates `/app/jars` owned by `loader:loader` (UID 1000) so the orchestrator's fresh named volume initialises with that ownership — without it the empty volume comes up as `root:root 0755` and wget exits 1 with "Permission denied". `LoaderHardeningIT` is the regression guard.
3. **Main container** — same hardening contract, mount the same volume **RO at `/app/jars`**, entrypoint reads `/app/jars/app.jar` (Spring Boot/Quarkus: `-jar /app/jars/app.jar`; plain Java: `-cp /app/jars/app.jar <MainClass>`; native: `exec /app/jars/app.jar`).
`removeContainer(id)` derives the volume name from the inspected container name (Docker prefixes it with `/`) and removes the volume after the container is removed — blue/green deployments don't leak volumes.
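That derivation can be sketched as a one-liner (helper class hypothetical; the leading-`/` quirk and the `cameleer-jars-` prefix are from the doc above):

```java
// Hypothetical helper mirroring removeContainer's volume-name derivation:
// Docker's inspect API reports container names with a leading "/", and the
// per-replica volume name is deterministic, so it can be re-derived here.
public class VolumeNameSketch {
    static String volumeFor(String inspectedContainerName) {
        String name = inspectedContainerName.startsWith("/")
                ? inspectedContainerName.substring(1)
                : inspectedContainerName;
        return "cameleer-jars-" + name;
    }
}
```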