feat(runtime): init-container loader pattern + withUsernsMode (#152 hardening close)
Tasks 9+10+11 of the init-container-jar-fetch plan, landed atomically because
Task 9 alone would leave the orchestrator and executor referencing removed
ContainerRequest fields.
ContainerRequest (core) drops jarPath/jarVolumeName/jarVolumeMountPath; adds
appVersionId, artifactDownloadUrl, artifactExpectedSize, loaderImage.
DockerRuntimeOrchestrator (app):
- per-replica named volume "cameleer-jars-{containerName}"
- phase 1: loader container with the volume mounted RW at /app/jars,
ARTIFACT_URL + ARTIFACT_EXPECTED_SIZE env, full hardening contract
- block on waitContainerCmd().awaitStatusCode(120s); on non-zero exit
remove the loader, remove the volume, and propagate a RuntimeException so
DeploymentExecutor marks the deployment FAILED; the main container is never created.
- phase 2: main container with the same volume mounted RO at /app/jars
- withUsernsMode("host:1000:65536") on BOTH containers — closes the last
open hardening gap from issue #152
- main entrypoint paths point at /app/jars/app.jar
- extracted baseHardenedHostConfig() so loader and main share the
cap_drop / security_opt / readonly / pids / tmpfs contract
- removeContainer() also removes the per-replica volume so blue/green
doesn't leak volumes
DeploymentExecutor (app):
- injects ArtifactDownloadTokenSigner; new @Value props loaderimage,
artifacttokenttlseconds, artifactbaseurl
- replaces the temporary getVersion(...).jarPath() bridge with a signed
URL ${artifactBaseUrl}/api/v1/artifacts/{id}?exp&sig
- drops the Files.exists pre-flight check; AppVersion.jarSizeBytes is
the size-of-record check now
- drops jarDockerVolume / jarStoragePath @Value fields and the volume
plumbing in startReplica
- DeployCtx carries appVersionId / artifactUrl / artifactExpectedSize
in place of jarPath
Tests:
- DockerRuntimeOrchestratorHardeningTest updated for the new shape;
captures HostConfig on the MAIN container and asserts cap_drop ALL
+ no-new-privileges + apparmor + readonly + pids + tmpfs + the new
withUsernsMode("host:1000:65536")
- DockerRuntimeOrchestratorLoaderTest (new): verifies volume create →
loader create with RW bind → loader started → awaited → loader
removed → main create with RO bind → main started; verifies abort
+ cleanup on loader exit != 0 (loader removed, volume removed, main
NEVER created); verifies userns_mode applied to both containers.
Config:
- application.yml replaces jardockervolume with loaderimage,
artifacttokenttlseconds, artifactbaseurl
Rules updated: .claude/rules/docker-orchestration.md (loader pattern,
userns, no more bind-mount); .claude/rules/core-classes.md
(ContainerRequest field map).
Test counts after change:
- cameleer-server-core: 116/116 unit tests pass
- cameleer-server-app: 273/273 unit tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -25,16 +25,29 @@ When deployed via the cameleer-saas platform, this server orchestrates customer
## Container Hardening (issue #152)
`DockerRuntimeOrchestrator.startContainer` applies an unconditional hardening contract to BOTH the loader init-container AND the main tenant container (`baseHardenedHostConfig()` is the shared helper, sketched after the list below). Java 17 has no SecurityManager, so the JVM is not a security boundary; isolation must live below it. Defaults are fail-closed and have no opt-out:
- `cap_drop` = every `Capability.values()` (effectively ALL — docker-java's enum has no `ALL` constant). Outbound TCP still works (no caps needed); raw sockets, ptrace, mounts, and bind <1024 are denied.
- `security_opt`: `no-new-privileges:true`, `apparmor=docker-default`. Default seccomp profile is applied implicitly when `seccomp=` is absent.
- `read_only` rootfs = true.
- `pids_limit` = 512 (`PIDS_LIMIT` constant).
- `tmpfs` mount: `/tmp` with `rw,nosuid,size=256m`. **No `noexec`** — Netty/tcnative, Snappy, LZ4, Zstd dlopen native libs from `/tmp` via `mmap(PROT_EXEC)` which `noexec` blocks. Issue #153 will add per-app `writeableVolumes` for stateful tenants (Kafka Streams etc.).
- `userns_mode` = `host:1000:65536` on both loader and main. Container root is never UID 0 on the host — closes the last open hardening item from issue #152.
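
A minimal sketch of what the shared `baseHardenedHostConfig()` helper amounts to in docker-java terms; the values mirror the contract above, but the exact method body in the orchestrator may differ:

```java
import com.github.dockerjava.api.model.Capability;
import com.github.dockerjava.api.model.HostConfig;

import java.util.List;
import java.util.Map;

// Sketch only: mirrors the documented contract, not the exact source.
static HostConfig baseHardenedHostConfig() {
    return HostConfig.newHostConfig()
            .withCapDrop(Capability.values())                  // no ALL constant in docker-java
            .withSecurityOpts(List.of(
                    "no-new-privileges:true",
                    "apparmor=docker-default"))                // default seccomp applies implicitly
            .withReadonlyRootfs(true)
            .withPidsLimit(512L)                               // PIDS_LIMIT
            .withTmpFs(Map.of("/tmp", "rw,nosuid,size=256m"))  // no noexec: native libs dlopen from /tmp
            .withUsernsMode("host:1000:65536");                // container root is never host UID 0
}
```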
**Sandboxed runtime auto-detect**: at construction the orchestrator calls `dockerClient.infoCmd().exec().getRuntimes()` and uses `runsc` (gVisor) when present. Override with `cameleer.server.runtime.dockerruntime` (e.g. `kata` to force Kata Containers, or any other registered runtime). Empty/blank = auto. The override always wins over auto-detect. The `DockerRuntimeOrchestrator(DockerClient, String)` constructor is the canonical entry point; the single-arg constructor exists only as a convenience for tests that don't need an override.
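
The auto-detect reduces to something like the following sketch; the field wiring and fallback value are assumptions, and only the `infoCmd().exec().getRuntimes()` call and the override-wins rule come from the behavior described above:

```java
import com.github.dockerjava.api.DockerClient;

// Sketch of the runtime auto-detect; field names are illustrative.
public DockerRuntimeOrchestrator(DockerClient dockerClient, String runtimeOverride) {
    this.dockerClient = dockerClient;
    if (runtimeOverride != null && !runtimeOverride.isBlank()) {
        this.runtime = runtimeOverride;   // explicit override always wins
        return;
    }
    var runtimes = dockerClient.infoCmd().exec().getRuntimes();
    this.runtime = (runtimes != null && runtimes.containsKey("runsc"))
            ? "runsc"                     // gVisor registered: use it
            : null;                       // fall back to Docker's default runtime
}
```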
## Init-Container Loader Pattern (JAR fetch)
`startContainer` is now a two-phase operation per replica, with a volume-create step up front:
1. **Volume create** — `cameleer-jars-{containerName}` named volume (per-replica, deterministic so cleanup in `removeContainer` can derive it).
2. **Loader container** — `loaderImage` (default `gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest`), named `{containerName}-loader`, with the volume mounted **RW at `/app/jars`** and env vars `ARTIFACT_URL` + `ARTIFACT_EXPECTED_SIZE`. The loader downloads the JAR from the signed URL into the volume and exits 0 on success. The orchestrator blocks on `waitContainerCmd().exec(WaitContainerResultCallback).awaitStatusCode(120, SECONDS)`. The loader container is removed in a `finally` block; on non-zero exit the volume is also removed and a `RuntimeException` propagates so `DeploymentExecutor` marks the deployment FAILED (see the sketch after this list).
3. **Main container** — same hardening contract, with the same volume mounted **RO at `/app/jars`**; the entrypoint reads `/app/jars/app.jar` (Spring Boot/Quarkus: `-jar /app/jars/app.jar`; plain Java: `-cp /app/jars/app.jar <MainClass>`; native: `exec /app/jars/app.jar`).
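
Concretely, the two phases in docker-java calls; a sketch under the names above (the helper's parameters and error messages are illustrative, not the real method signature):

```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.command.WaitContainerResultCallback;
import com.github.dockerjava.api.model.AccessMode;
import com.github.dockerjava.api.model.Bind;
import com.github.dockerjava.api.model.Volume;

import java.util.concurrent.TimeUnit;

// Sketch of startContainer's two phases; parameter names stand in for the
// real ContainerRequest fields.
static void startReplica(DockerClient docker, String containerName, String loaderImage,
                         String mainImage, String artifactUrl, long expectedSize) {
    String volumeName = "cameleer-jars-" + containerName;
    docker.createVolumeCmd().withName(volumeName).exec();

    // Phase 1: loader, volume mounted RW, same hardening contract as main.
    String loaderId = docker.createContainerCmd(loaderImage)
            .withName(containerName + "-loader")
            .withEnv("ARTIFACT_URL=" + artifactUrl,
                     "ARTIFACT_EXPECTED_SIZE=" + expectedSize)
            .withHostConfig(baseHardenedHostConfig()
                    .withBinds(new Bind(volumeName, new Volume("/app/jars")))) // RW
            .exec().getId();
    try {
        docker.startContainerCmd(loaderId).exec();
        int exit = docker.waitContainerCmd(loaderId)
                .exec(new WaitContainerResultCallback())
                .awaitStatusCode(120, TimeUnit.SECONDS);
        if (exit != 0) {
            docker.removeVolumeCmd(volumeName).exec();      // abort: main is never created
            throw new RuntimeException("loader exited with " + exit);
        }
    } finally {
        docker.removeContainerCmd(loaderId).withForce(true).exec(); // loader always removed
    }

    // Phase 2: main container, same volume mounted RO.
    String mainId = docker.createContainerCmd(mainImage)
            .withName(containerName)
            .withHostConfig(baseHardenedHostConfig()
                    .withBinds(new Bind(volumeName, new Volume("/app/jars"), AccessMode.ro)))
            .exec().getId();
    docker.startContainerCmd(mainId).exec();
}
```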
`removeContainer(id)` derives the volume name from the inspected container name (Docker prefixes it with `/`) and removes the volume after the container is removed, so blue/green deploys don't leak volumes.
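
The cleanup side is small but has one subtlety worth showing; a sketch under the naming above (the `withForce` flag is an assumption):

```java
// Docker reports container names with a leading "/" (e.g. "/env-app-0"),
// so strip it before deriving the per-replica volume name.
String name = docker.inspectContainerCmd(containerId).exec().getName();
String volumeName = "cameleer-jars-" + name.substring(1);
docker.removeContainerCmd(containerId).withForce(true).exec();
docker.removeVolumeCmd(volumeName).exec();
```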
`DeploymentExecutor` generates the signed URL via `ArtifactDownloadTokenSigner.sign(appVersion.id(), Duration.ofSeconds(artifactTokenTtlSeconds))` and passes `appVersion.id()`, the URL, `appVersion.jarSizeBytes()`, and the loader image into `ContainerRequest`. The host filesystem is no longer involved at deploy time.
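
A hedged sketch of what `ArtifactDownloadTokenSigner.sign` plausibly looks like, assembled from the documented pieces (HMAC-SHA256, TTL, the `?exp&sig` URL shape); the `secret` and `artifactBaseUrl` fields, the `long` id type, and the exact payload encoding are assumptions:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.Duration;
import java.time.Instant;
import java.util.HexFormat;

// Hypothetical shape of the signer; the real class may differ in detail.
public String sign(long appVersionId, Duration ttl) {
    long exp = Instant.now().plus(ttl).getEpochSecond();
    String payload = appVersionId + ":" + exp;
    try {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        String sig = HexFormat.of().formatHex(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        // Documented URL shape: {artifactBaseUrl}/api/v1/artifacts/{id}?exp&sig
        return artifactBaseUrl + "/api/v1/artifacts/" + appVersionId + "?exp=" + exp + "&sig=" + sig;
    } catch (GeneralSecurityException e) {
        throw new IllegalStateException("HMAC-SHA256 unavailable", e);
    }
}
```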
## DeploymentExecutor Details
Primary network for app containers is set via the `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Beyond networking, `DeploymentExecutor`:

- Resolves `runtimeType: auto` to a concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails the deployment if unresolvable).
- Builds the Docker entrypoint per runtime type: all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with a main class, native runs the binary directly (see the sketch below).
- Sets the per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity.
- Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
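
The entrypoint rules above condense to a switch like this; the enum constants and the `mainClass` sourcing are assumptions for illustration:

```java
// Illustrative runtime-type dispatch; constants are assumed, not the real enum.
List<String> entrypoint = switch (runtimeType) {
    case SPRING_BOOT, QUARKUS ->   // JVM types: agent + -jar
            List.of("java", "-javaagent:/app/agent.jar", "-jar", "/app/jars/app.jar");
    case PLAIN_JAVA ->             // classpath + detected main class
            List.of("java", "-javaagent:/app/agent.jar", "-cp", "/app/jars/app.jar", mainClass);
    case NATIVE ->                 // run the binary directly (no JVM, no agent)
            List.of("/app/jars/app.jar");
    default -> throw new IllegalStateException("unresolved runtime type: " + runtimeType);
};
```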
@@ -67,7 +80,8 @@ Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environme
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed. A skeleton is sketched after this list.
- **Storage abstraction**: `ArtifactStore` (in `cameleer-server-core/storage`) is the only path that touches JAR bytes. `FilesystemArtifactStore` writes under `cameleer.server.runtime.jarstoragepath` (default `/data/jars`); the orchestrator never reads the host filesystem at deploy time.
- **Loader-fetch at deploy time**: tenant containers no longer bind-mount JARs from the host. The loader init-container streams the JAR via a signed URL (HMAC-SHA256, TTL `cameleer.server.runtime.artifacttokenttlseconds`, default 600s) into a per-replica named volume; main mounts that volume RO. This works without host-path access and is the single path supported in Docker-in-Docker SaaS deployments.
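
A skeleton of the nightly job as described in the bullets above; the collaborators and method bodies are assumptions, and only the schedule and responsibilities come from this document:

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Skeleton only: purge logic elided; collaborator names are assumptions.
@Component
public class JarRetentionJob {

    @Scheduled(cron = "0 0 3 * * *") // nightly at 03:00
    public void purge() {
        // 1. Per environment, find versions beyond the retention limit,
        //    skipping versions that are currently deployed.
        // 2. Delete their JARs through ArtifactStore (the only code path
        //    that touches JAR bytes).
        // 3. Remove orphaned files not referenced by any app version.
    }
}
```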
## Runtime Type Detection