Files
cameleer-server/docs/handoff/2026-04-27-init-container-jar-fetch.md

122 lines
7.4 KiB
Markdown
Raw Normal View History

# Handoff — Init-Container JAR Fetch + ArtifactStore
Branch: `feature/init-container-jar-fetch`
Plan: `docs/superpowers/plans/2026-04-27-init-container-jar-fetch.md`
Worktree: `.worktrees/init-container-jar-fetch`
## What landed
19 commits replacing host-bind-mount JAR delivery with an init-container HTTP download pattern, behind a new `ArtifactStore` abstraction.
**Closed gap from issue #152:** `withUsernsMode("host:1000:65536")` is now applied to every tenant container — last open hardening item from the multi-tenant runtime issue.
**Storage migration insurance for issue #158 (Zot):** `ArtifactStore` interface in `cameleer-server-core` with a single `FilesystemArtifactStore` implementation today. Adding the OCI/Zot backend later is a single new class — no caller changes.
### Commit topology (oldest → newest)
```
cc17cdd0 feat(storage): add ArtifactCoordinates value type
435153da docs(storage): add issue #158 ref
cddf0569 feat(storage): add ArtifactStore interface
9c115f89 docs(storage): add Javadoc to ArtifactStore.exists
bc8bd590 feat(storage): add FilesystemArtifactStore
5eb07f50 fix(storage): atomic put + tolerate DirectoryNotEmptyException in delete
5238c58d refactor(storage): clean up tmp on put failure; promote import
07a2fd60 refactor(core): AppService writes via ArtifactStore; remove resolveJarPath
6b7b5ae1 docs(runtime): mark DeploymentExecutor jarPath as Task-11 bridge
4abcc610 refactor(retention): JarRetentionJob deletes via ArtifactStore
d90cd5ef test(retention): cover deployed-version-skip; preserve stack on delete failure
25bbd759 feat(web): add HMAC token signer for artifact downloads
73e06d81 test(web): cover constant-time compare path in HMAC verify
433155ae feat(web): add ArtifactDownloadController with HMAC URL auth
940bf18a refactor(web): authoritative Content-Length, typed Optional<AppVersion>
5043e1d4 feat(loader): add cameleer-runtime-loader image (busybox + entrypoint)
1ddae949 feat(runtime): init-container loader pattern + withUsernsMode (#152 close)
cc076b19 fix(runtime): pre-pull loader image, plug volume-leak windows, document network dep
0ee763ba docs(rules): document ArtifactDownloadController + storage abstraction
```
### Verification
- `cameleer-server-core` unit tests: **116/116 pass**
- `cameleer-server-app` unit tests (`-DskipITs`): **273/273 pass**
- `DockerRuntimeOrchestratorHardeningTest`: 8 cases asserting cap_drop ALL + no-new-privileges + apparmor + readonly rootfs + pids_limit + tmpfs `/tmp` (rw,nosuid,size=256m, no noexec) + **userns_mode=host:1000:65536**
- `DockerRuntimeOrchestratorLoaderTest`: 3 cases asserting volume→loader→main ordering (InOrder), abort+cleanup on loader failure, userns on both containers
- `FilesystemArtifactStoreTest`: 7 cases incl. atomic put, parent-dir sweep race tolerance, authoritative `size`
- `ArtifactDownloadTokenSigner`: 6 cases incl. constant-time same-length tamper, null/blank secret guard
- `ArtifactDownloadControllerTest`: 3 cases (200 OK with size from store, 401 with `verify(appService, never()).getVersion(any())` defence in depth, 404 via Optional.empty)
## Required before merge to main
### 1. Push the loader image to the gitea registry
```bash
cd cameleer-runtime-loader
docker build -t gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest .
docker push gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest
```
The `DeploymentExecutor` will pull this image during the new `PULL_IMAGE` stage. Without it, every deploy will fail at loader-create with an image-not-found error.
### 2. Regenerate OpenAPI schema (Task 14, deferred)
`/api/v1/artifacts/{appVersionId}` is a new public endpoint. The SPA does not call it directly (the loader container is the only consumer), so SPA compile passes without regenerating. But per `CLAUDE.md` policy, regenerate at PR time:
```bash
# Backend running on dev/staging server reachable at http://192.168.50.86:30090
cd ui && npm run generate-api:live
```
Commit the resulting `ui/src/api/openapi.json` and `ui/src/api/schema.d.ts` updates.
### 3. Optional: end-to-end integration test (Task 12 IT, deferred)
The plan called for a Testcontainers-backed end-to-end deploy test (`InitContainerDeployIT`) that drives a real Docker daemon to verify the full pipeline. Mock-based unit coverage is comprehensive (273 tests), so this was deferred to keep the autonomous run focused. If desired, add a follow-up commit that:
- Models after `cameleer-server-app/src/test/java/.../AbstractPostgresIT`
- Uploads a tiny JAR via `AppService.uploadJar`
- Triggers a deployment
- Asserts: per-replica volume created, loader exited 0, main container running, `/app/jars/app.jar` contents match input
## Configuration
New env vars (`application.yml` defaults shown):
| Env | Default | Purpose |
|---|---|---|
| `CAMELEER_SERVER_RUNTIME_LOADERIMAGE` | `gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest` | Init-container image |
| `CAMELEER_SERVER_RUNTIME_ARTIFACTTOKENTTLSECONDS` | `600` | Signed-URL TTL (10 min) |
| `CAMELEER_SERVER_RUNTIME_ARTIFACTBASEURL` | `` (falls back to `serverurl`, then `http://cameleer-server:8081`) | URL the loader uses to reach the server |
Removed: `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` — no longer needed (loader downloads via HTTP, not bind-mount).
`@PostConstruct` WARN logs at server startup if neither `artifactbaseurl` nor `serverurl` is set, pointing at the implicit `cameleer-server` Docker DNS dependency that only works on `cameleer-traefik`.
## Network reachability requirement
The loader container must be able to reach the Cameleer server over HTTP. In SaaS mode this works because `DockerNetworkManager` adds `cameleer-traefik` as an additional network for tenant containers, and the server is reachable on that network via the `cameleer-server` DNS alias. For non-SaaS topologies, set `CAMELEER_SERVER_RUNTIME_ARTIFACTBASEURL` to a URL the loader can reach.
## Documented but skipped
The code review of Tasks 9-11 surfaced backlog items intentionally not addressed in this branch:
- **Cache-Control on artifact responses** — content-addressed, so `immutable` would be correct. Cheap to add when a CDN is on the path.
- **`WWW-Authenticate` header on 401** — diagnostic improvement.
- **Distinguish 410 (expired) from 401 (tampered)** — diagnostic only; no security cost either way.
- **Audit interceptor coverage** — standalone-MockMvc tests skip the audit/usage interceptors. If "show me who pulled artifact X" becomes a security-review requirement, audit needs to land on `/api/v1/artifacts/**` and the test setup needs to switch to a fuller Spring slice.
- **Move `ArtifactDownloadTokenSigner` from `app/web/` to `app/security/`** — cosmetic; fits the security-primitive category.
- **`Clock` bean** — for deterministic test clocks across the codebase.
## Migration to OCI/Zot (issue #158)
The `ArtifactStore` interface + `ArtifactCoordinates.ociRef()` method are the migration insurance. When P1 security work begins (Trivy scanning + Cosign signing per issue #152), the path is:
1. Stand up Zot in the stack (single Go binary, `StatefulSet` with PVC).
2. Implement `OciArtifactStore implements ArtifactStore` in `cameleer-server-app/storage/`.
3. Dual-write for one release (write to both stores; reads still come from filesystem).
4. Cut over reads via bean swap.
5. Backfill historical `app_versions` rows.
6. Stop dual-write, decommission filesystem path.
The loader/controller/SecurityConfig don't change — they speak HTTP against whatever URL the new store advertises.