cameleer-server/.claude/rules/docker-orchestration.md
hsiegeln cc076b1923 fix(runtime): pre-pull loader image, plug volume-leak windows, document network dep
Pre-pull the loader image at PULL_IMAGE so the implicit pull on first
createContainerCmd doesn't bypass the 120s loader-wait timeout.

Wrap createAndStartLoader in try/catch so a create/start failure cleans
up the just-created volume; same guard around createAndStartMain on
phase-2 failures. Folds the wait-error message into the rethrown
RuntimeException so the cause chain is visible.

Add a @PostConstruct WARN when neither artifactbaseurl nor serverurl is
set so the implicit cameleer-server DNS dependency is loud at boot, and
document the loader-to-server reachability contract in
.claude/rules/docker-orchestration.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:26:35 +02:00


paths
cameleer-server-app/**/runtime/**
cameleer-server-core/**/runtime/**
deploy/**
docker-compose*.yml
Dockerfile
docker-entrypoint.sh

Docker Orchestration

When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

  • ConfigMerger (core/runtime/ConfigMerger.java) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes runtimeType (default "auto") and customArgs (default "").
  • TraefikLabelBuilder (app/runtime/TraefikLabelBuilder.java) — generates Traefik Docker labels for path-based (/{envSlug}/{appSlug}/) or subdomain-based ({appSlug}-{envSlug}.{domain}) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: cameleer.replica (index), cameleer.generation (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), cameleer.instance-id ({envSlug}-{appSlug}-{replicaIndex}-{generation}). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap. When ResolvedContainerConfig.externalRouting() is false (UI: Resources → External Routing, default true), the builder emits ONLY the identity labels (managed-by, cameleer.*) and skips every traefik.* label — the container stays on cameleer-traefik and the per-env network (so sibling containers can still reach it via Docker DNS) but is invisible to Traefik. The tls.certresolver label is emitted only when CAMELEER_SERVER_RUNTIME_CERTRESOLVER is set to a non-blank resolver name (matching a resolver configured in the Traefik static config). When unset (dev installs backed by a static TLS store) only tls=true is emitted and Traefik serves the default cert from the TLS store.
  • PrometheusLabelBuilder (app/runtime/PrometheusLabelBuilder.java) — generates Prometheus docker_sd_configs labels per resolved runtime type: Spring Boot /actuator/prometheus:8081, Quarkus/native /q/metrics:9000, plain Java /metrics:9464. Labels merged into container metadata alongside Traefik labels at deploy time.
  • DockerNetworkManager (app/runtime/DockerNetworkManager.java) — manages two Docker network tiers:
    • cameleer-traefik — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with cameleer-server DNS alias.
    • cameleer-env-{slug} — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: cameleer-env-{tenantId}-{envSlug} (overloaded envNetworkName(tenantId, envSlug) method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
  • DockerEventMonitor (app/runtime/DockerEventMonitor.java) — persistent Docker event stream listener for containers with managed-by=cameleer-server label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
  • DeploymentProgress (ui/src/components/DeploymentProgress.tsx) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
  • ContainerLogForwarder (app/runtime/ContainerLogForwarder.java) — streams Docker container stdout/stderr to ClickHouse logs table with source='container'. Uses docker logs --follow per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. DeploymentExecutor starts capture after each replica launches with the replica's instanceId ({envSlug}-{appSlug}-{replicaIndex}-{generation}); DockerEventMonitor stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same instanceId as the agent (set via CAMELEER_AGENT_INSTANCEID env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on application + environment (and optionally replica_index).
  • StartupLogPanel (ui/src/components/StartupLogPanel.tsx) — collapsible log panel rendered below DeploymentProgress. Queries /api/v1/logs?source=container&application={appSlug}&environment={envSlug}. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses useStartupLogs hook and LogViewer (design system).
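The three-layer merge performed by ConfigMerger can be pictured as successive overrides. The following is a minimal sketch, not the actual implementation: it assumes each layer is a flat string map in which an absent key means "inherit from the layer below", and `ConfigMergeSketch` and its reduced `Resolved` record are hypothetical stand-ins for ConfigMerger and ResolvedContainerConfig.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a three-layer merge: later layers override earlier ones key by key. */
final class ConfigMergeSketch {
    private ConfigMergeSketch() {}

    /** Hypothetical stand-in for ResolvedContainerConfig, reduced to two fields. */
    record Resolved(String runtimeType, String customArgs) {}

    static Resolved resolve(Map<String, String> globalDefaults,
                            Map<String, String> envConfig,
                            Map<String, String> appConfig) {
        Map<String, String> merged = new HashMap<>();
        merged.putAll(globalDefaults); // lowest precedence: application.yml
        merged.putAll(envConfig);      // environment-level defaultContainerConfig
        merged.putAll(appConfig);      // highest precedence: app-level containerConfig
        // Documented defaults: runtimeType "auto", customArgs "".
        return new Resolved(
                merged.getOrDefault("runtimeType", "auto"),
                merged.getOrDefault("customArgs", ""));
    }
}
```

Because the function is pure, the same inputs always resolve to the same config, which keeps the merge easy to unit-test in isolation.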

Container Hardening (issue #152)

DockerRuntimeOrchestrator.startContainer applies an unconditional hardening contract to both the loader init-container and the main tenant container (baseHardenedHostConfig() is the shared helper). The SecurityManager is deprecated for removal as of Java 17 (JEP 411), so the JVM is not a security boundary and isolation must live below it. Defaults are fail-closed and have no opt-out:

  • cap_drop = every Capability.values() (effectively ALL — docker-java's enum has no ALL constant). Outbound TCP still works (no caps needed); raw sockets, ptrace, mounts, and bind <1024 are denied.
  • security_opt: no-new-privileges:true, apparmor=docker-default. Default seccomp profile is applied implicitly when seccomp= is absent.
  • read_only rootfs = true.
  • pids_limit = 512 (PIDS_LIMIT constant).
  • tmpfs mount: /tmp with rw,nosuid,size=256m. No noexec — Netty/tcnative, Snappy, LZ4, Zstd dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks. Issue #153 will add per-app writeableVolumes for stateful tenants (Kafka Streams etc.).
  • userns_mode = host:1000:65536 on both loader and main. Container root is never UID 0 on the host — closes the last open hardening item from issue #152.

Sandboxed runtime auto-detect: at construction the orchestrator calls dockerClient.infoCmd().exec().getRuntimes() and uses runsc (gVisor) when present. Override with cameleer.server.runtime.dockerruntime (e.g. kata to force Kata Containers, or any other registered runtime). Empty/blank = auto. The override always wins over auto-detect. The DockerRuntimeOrchestrator(DockerClient, String) constructor is the canonical entry point; the single-arg constructor exists only as a convenience for tests that don't need an override.
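The precedence described above (explicit override first, then gVisor when registered) can be sketched as a small pure function. This is an illustration under assumptions: `RuntimeSelectionSketch` is a hypothetical name, the set of registered runtimes stands in for the result of the daemon info call, and returning an empty Optional to mean "use the daemon default" is an assumed behaviour for the case where `runsc` is absent.

```java
import java.util.Optional;
import java.util.Set;

final class RuntimeSelectionSketch {
    private RuntimeSelectionSketch() {}

    /**
     * @param override   value of cameleer.server.runtime.dockerruntime (may be null or blank)
     * @param registered runtime names reported by the Docker daemon
     * @return the runtime to request, or empty to fall back to the daemon default (assumption)
     */
    static Optional<String> select(String override, Set<String> registered) {
        if (override != null && !override.isBlank()) {
            return Optional.of(override.trim()); // the override always wins
        }
        if (registered.contains("runsc")) {
            return Optional.of("runsc");         // auto-detect gVisor
        }
        return Optional.empty();
    }
}
```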

Init-Container Loader Pattern (JAR fetch)

startContainer is now a two-phase operation per replica (loader, then main), preceded by creating the volume both phases share:

  1. Volume create: cameleer-jars-{containerName} named volume (per-replica, deterministic so cleanup in removeContainer can derive it).
  2. Loader container: loaderImage (default gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest), name {containerName}-loader, mount the volume RW at /app/jars, env vars ARTIFACT_URL + ARTIFACT_EXPECTED_SIZE. Loader downloads the JAR from the signed URL into the volume and exits 0. Orchestrator blocks on waitContainerCmd().exec(WaitContainerResultCallback).awaitStatusCode(120, SECONDS). Loader container is removed in a finally block; on non-zero exit the volume is also removed and RuntimeException propagates so DeploymentExecutor marks the deployment FAILED.
  3. Main container: same hardening contract, mount the same volume RO at /app/jars, entrypoint reads /app/jars/app.jar (Spring Boot/Quarkus: -jar /app/jars/app.jar; plain Java: -cp /app/jars/app.jar <MainClass>; native: exec /app/jars/app.jar).
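The control flow of the steps above, including the cleanup on failure, can be sketched without the Docker client. `ContainerBackend` below is a hypothetical abstraction introduced only for illustration (it is not the docker-java API), and the real orchestrator may structure its try/catch blocks differently.

```java
final class LoaderFlowSketch {
    private LoaderFlowSketch() {}

    /** Hypothetical abstraction over the Docker calls; not the docker-java API. */
    interface ContainerBackend {
        void createVolume(String name);
        void removeVolume(String name);
        /** Runs the loader to completion and returns its exit code. */
        int runLoader(String containerName, String volume);
        void removeContainer(String containerName);
        /** Starts the main container and returns its id. */
        String startMain(String containerName, String volume);
    }

    static String startContainer(ContainerBackend backend, String containerName) {
        String volume = "cameleer-jars-" + containerName;
        String loader = containerName + "-loader";
        backend.createVolume(volume);
        try {
            int exit;
            try {
                exit = backend.runLoader(loader, volume);
            } finally {
                backend.removeContainer(loader); // the loader is removed on every path
            }
            if (exit != 0) {
                throw new RuntimeException("loader exited with status " + exit);
            }
            return backend.startMain(containerName, volume);
        } catch (RuntimeException e) {
            backend.removeVolume(volume);        // no volume leak if either phase fails
            throw e;
        }
    }
}
```

Keeping the volume removal in a single outer catch means a failure in the loader phase and a failure in the main phase take the same cleanup path.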

removeContainer(id) derives the volume name from the inspected container name (Docker prefixes it with /) and removes the volume after the container is removed, so blue/green swaps don't leak volumes.
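Deriving the volume name from the inspected name is a one-line transformation. A minimal sketch, assuming the only normalisation needed is dropping the leading slash (`VolumeNameSketch` is a hypothetical helper):

```java
final class VolumeNameSketch {
    private VolumeNameSketch() {}

    /** Docker reports inspected container names with a leading "/"; strip it before deriving. */
    static String volumeNameFor(String inspectedName) {
        String name = inspectedName.startsWith("/") ? inspectedName.substring(1) : inspectedName;
        return "cameleer-jars-" + name;
    }
}
```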

DeploymentExecutor generates the signed URL via ArtifactDownloadTokenSigner.sign(appVersion.id(), Duration.ofSeconds(artifactTokenTtlSeconds)) and passes appVersion.id(), the URL, appVersion.jarSizeBytes(), and the loader image into ContainerRequest. The host filesystem is no longer involved at deploy time.
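For readers unfamiliar with signed download URLs, here is one way an expiring HMAC-SHA256 token could be built and checked. It is a sketch under assumptions, not the ArtifactDownloadTokenSigner implementation: the token layout (`<expiry>.<signature>`), the signed payload, the String id type, and the extra `now` parameter (added so the example is deterministic) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class ArtifactTokenSketch {
    private final byte[] secret;

    ArtifactTokenSketch(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Returns "<expiryEpochSeconds>.<base64url signature>"; this layout is an assumption. */
    String sign(String appVersionId, Duration ttl, Instant now) {
        long expires = now.plus(ttl).getEpochSecond();
        return expires + "." + mac(appVersionId + ":" + expires);
    }

    boolean verify(String appVersionId, String token, Instant now) {
        int dot = token.indexOf('.');
        if (dot <= 0) {
            return false;
        }
        long expires;
        try {
            expires = Long.parseLong(token.substring(0, dot));
        } catch (NumberFormatException e) {
            return false;
        }
        if (now.getEpochSecond() > expires) {
            return false; // token has expired
        }
        byte[] expected = mac(appVersionId + ":" + expires).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = token.substring(dot + 1).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    private String mac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Binding the expiry into the signed payload means the loader cannot extend a token's lifetime by editing the expiry field.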

Loader → server reachability: the loader container hits the Cameleer server over HTTP from inside its own Docker network. The signed URL is built from cameleer.server.runtime.artifactbaseurl (preferred), falling back to cameleer.server.runtime.serverurl, falling back to http://cameleer-server:8081. The default works in SaaS mode because DockerNetworkManager adds cameleer-traefik as an additional network for tenant containers, and the server is reachable on that network via the cameleer-server DNS alias. For non-SaaS topologies (server on a different network than tenants), set CAMELEER_SERVER_RUNTIME_ARTIFACTBASEURL explicitly to a URL the loader can reach.
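The fallback chain for the base URL reads naturally as a first-non-blank lookup. A minimal sketch, assuming blank values are treated the same as unset ones (`ArtifactBaseUrlSketch` is a hypothetical helper, not a class from the codebase):

```java
final class ArtifactBaseUrlSketch {
    /** Documented last-resort default; relies on the cameleer-server DNS alias. */
    static final String DEFAULT_BASE_URL = "http://cameleer-server:8081";

    private ArtifactBaseUrlSketch() {}

    /** artifactbaseurl (preferred), then serverurl, then the built-in default. */
    static String resolve(String artifactBaseUrl, String serverUrl) {
        if (artifactBaseUrl != null && !artifactBaseUrl.isBlank()) {
            return artifactBaseUrl.trim();
        }
        if (serverUrl != null && !serverUrl.isBlank()) {
            return serverUrl.trim();
        }
        return DEFAULT_BASE_URL;
    }
}
```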

DeploymentExecutor Details

The primary network for app containers is set via the CAMELEER_SERVER_RUNTIME_DOCKERNETWORK env var (in SaaS mode: cameleer-tenant-{slug}); apps also connect to cameleer-traefik (routing) and cameleer-env-{tenantId}-{envSlug} (per-environment discovery) as additional networks. The executor resolves runtimeType: auto to a concrete type from AppVersion.detectedRuntimeType at PRE_FLIGHT (the deployment fails if it is unresolvable). It builds the Docker entrypoint per runtime type: all JVM types load -javaagent:/app/agent.jar; Spring Boot and Quarkus launch with -jar, plain Java uses -cp with the main class, and native runs the binary directly. It sets the per-replica CAMELEER_AGENT_INSTANCEID env var to {envSlug}-{appSlug}-{replicaIndex}-{generation} so container logs and agent logs share the same instance identity. It also sets CAMELEER_AGENT_* env vars from ResolvedContainerConfig (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties, so changing them requires redeployment.

Container naming: {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}, where generation is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to 409). All lookups across the executor, DockerEventMonitor, and ContainerLogForwarder key on container id, not name; the name exists only for operator visibility.
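The naming scheme and its relationship to the agent instance id can be made concrete with a small helper. This is a sketch with hypothetical method names; the formats themselves are the ones stated in this document.

```java
import java.util.UUID;

final class ContainerNamingSketch {
    private ContainerNamingSketch() {}

    /** Generation = first 8 characters of the deployment UUID. */
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** {envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String instanceId(String envSlug, String appSlug, int replicaIndex, UUID deploymentId) {
        return String.join("-", envSlug, appSlug,
                Integer.toString(replicaIndex), generation(deploymentId));
    }

    /** {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String containerName(String tenantId, String envSlug, String appSlug,
                                int replicaIndex, UUID deploymentId) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex, deploymentId);
    }
}
```

Two deployments of the same app produce different generations, so replica 0 of the old and new deployment get distinct container names and can run side by side.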

Strategy dispatch: DeploymentStrategy.fromWire(config.deploymentStrategy()) branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.

  • Blue/green (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
  • Rolling: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
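The lenient parsing and the resource peaks of the two strategies can be sketched as an enum. The wire spelling `"rolling"` and the treatment of null are assumptions, and `peakContainers` is a hypothetical helper that only restates the peaks given above (blue/green is approximate in practice).

```java
import java.util.Locale;

enum DeploymentStrategySketch {
    BLUE_GREEN, ROLLING;

    /** Wire spellings here are assumptions; unknown or null values fall back to BLUE_GREEN. */
    static DeploymentStrategySketch fromWire(String wire) {
        if (wire == null) {
            return BLUE_GREEN;
        }
        String normalized = wire.trim().toLowerCase(Locale.ROOT);
        return normalized.equals("rolling") ? ROLLING : BLUE_GREEN;
    }

    /** Peak number of containers alive at once while deploying the given replica count. */
    int peakContainers(int replicas) {
        return this == ROLLING ? replicas + 1 : 2 * replicas;
    }
}
```

Falling back rather than throwing trades strictness for availability: a typo in the config silently selects the safer, more resource-hungry strategy.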

Traffic routing is implicit: Traefik labels (cameleer.app, cameleer.environment) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.

Deployment Status Model

| Status | Meaning |
| --- | --- |
| STOPPED | Intentionally stopped or initial state |
| STARTING | Deploy in progress |
| RUNNING | All replicas healthy and serving |
| DEGRADED | Post-deploy: a replica died after the deploy was marked RUNNING. Set by DockerEventMonitor reconciliation, never by DeploymentExecutor directly. |
| STOPPING | Graceful shutdown in progress |
| FAILED | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED; DEGRADED is reserved for post-deploy drift. |

Deploy stages (DeployStage): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.

Deployment retention: DeploymentService.createDeployment() deletes FAILED deployments for the same app+environment before creating a new one, preventing failed-attempt buildup. STOPPED deployments are preserved as restorable checkpoints — the UI Checkpoints disclosure lists every deployment with a non-null deployed_config_snapshot (RUNNING, DEGRADED, STOPPED) minus the current one.

JAR Management

  • Retention policy per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
  • Nightly cleanup job (JarRetentionJob, Spring @Scheduled 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
  • Storage abstraction: ArtifactStore (in cameleer-server-core/storage) is the only path that touches JAR bytes. FilesystemArtifactStore writes under cameleer.server.runtime.jarstoragepath (default /data/jars); the orchestrator never reads the host filesystem at deploy time.
  • Loader-fetch at deploy time: tenant containers no longer bind-mount JARs from the host. The loader init-container streams the JAR via a signed URL (HMAC-SHA256, TTL cameleer.server.runtime.artifacttokenttlseconds, default 600s) into a per-replica named volume; main mounts that volume RO. This works without host-path access and is the single path supported in Docker-in-Docker SaaS deployments.
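The retention rule (keep the newest N versions, never purge a version that is currently deployed) can be sketched as a pure selection function. This is an illustration only: `JarVersion` is a hypothetical record, ordering by an integer version number is an assumption, and so is the choice that deployed versions still count toward the limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class JarRetentionSketch {
    private JarRetentionSketch() {}

    /** Hypothetical, reduced view of an app version. */
    record JarVersion(String id, int version) {}

    /** Versions to purge: everything beyond the newest {@code keep}, except deployed ones. */
    static List<JarVersion> toPurge(List<JarVersion> versions, int keep, Set<String> deployedIds) {
        List<JarVersion> newestFirst = new ArrayList<>(versions);
        newestFirst.sort(Comparator.comparingInt(JarVersion::version).reversed());
        List<JarVersion> purge = new ArrayList<>();
        for (int i = Math.max(0, keep); i < newestFirst.size(); i++) {
            JarVersion candidate = newestFirst.get(i);
            if (!deployedIds.contains(candidate.id())) {
                purge.add(candidate); // older than the retention window and not in use
            }
        }
        return purge;
    }
}
```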

Runtime Type Detection

The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate cameleer-log-appender.jar or PropertiesLauncher is needed:

  • Detection (RuntimeDetector): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes META-INF/MANIFEST.MF Main-Class: Spring Boot loader prefix -> spring-boot, Quarkus entry point -> quarkus, other Main-Class -> plain-java (extracts class name). Results stored on AppVersion (detected_runtime_type, detected_main_class).
  • Runtime types (RuntimeType enum): AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE. Configurable per app/environment via containerConfig.runtimeType (default "auto").
  • Entrypoint per type: all JVM types run java with -javaagent:/app/agent.jar. Spring Boot and Quarkus launch with -jar /app/jars/app.jar; plain Java uses -cp /app/jars/app.jar with the explicit main class instead of -jar. Native runs the binary directly.
  • Custom arguments (containerConfig.customArgs): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses sh -c).
  • AUTO resolution: at deploy time (PRE_FLIGHT), "auto" resolves to the detected type from AppVersion. Fails deployment if detection was unsuccessful — user must set type explicitly.
  • UI: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
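The detection order described above (magic bytes first, then Main-Class) can be sketched as follows. The concrete checks are assumptions rather than the RuntimeDetector source: the Spring Boot loader package prefix and the Quarkus entry-point class name are the values those frameworks commonly write into the manifest, and the `UNKNOWN` result is a hypothetical placeholder for "detection unsuccessful".

```java
final class RuntimeDetectorSketch {
    private RuntimeDetectorSketch() {}

    /** UNKNOWN is hypothetical; it models a JAR whose manifest has no usable Main-Class. */
    enum Detected { SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE, UNKNOWN }

    /** ZIP local-file-header magic: 'P' 'K' 0x03 0x04. */
    static boolean isZip(byte[] head) {
        return head.length >= 4
                && head[0] == 0x50 && head[1] == 0x4B
                && head[2] == 0x03 && head[3] == 0x04;
    }

    /**
     * @param head      first bytes of the uploaded file
     * @param mainClass Main-Class from META-INF/MANIFEST.MF, or null if absent
     */
    static Detected detect(byte[] head, String mainClass) {
        if (!isZip(head)) {
            return Detected.NATIVE; // non-ZIP uploads are treated as native binaries
        }
        if (mainClass == null || mainClass.isBlank()) {
            return Detected.UNKNOWN;
        }
        if (mainClass.startsWith("org.springframework.boot.loader.")) {
            return Detected.SPRING_BOOT;
        }
        if (mainClass.equals("io.quarkus.bootstrap.runner.QuarkusEntryPoint")) {
            return Detected.QUARKUS;
        }
        return Detected.PLAIN_JAVA; // any other Main-Class; the class name is kept for -cp
    }
}
```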

SaaS Multi-Tenant Network Isolation

In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:

  • Tenant network (cameleer-tenant-{slug}) — primary internal bridge for all of a tenant's containers. Set as CAMELEER_SERVER_RUNTIME_DOCKERNETWORK for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
  • Shared services network — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and cameleer-traefik for HTTP routing.
  • Tenant-scoped environment networks (cameleer-env-{tenantId}-{envSlug}) — per-environment discovery is scoped per tenant, so alpha-corp's "dev" environment network is separate from beta-corp's "dev" environment network.
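The network names above follow fixed patterns, which the sketch below spells out. The overloaded `envNetworkName` mirrors the method mentioned for DockerNetworkManager; the class name and `tenantNetworkName` are hypothetical.

```java
final class NetworkNamingSketch {
    static final String TRAEFIK_NETWORK = "cameleer-traefik";

    private NetworkNamingSketch() {}

    /** Primary per-tenant bridge: cameleer-tenant-{slug}. */
    static String tenantNetworkName(String tenantSlug) {
        return "cameleer-tenant-" + tenantSlug;
    }

    /** Single-tenant form: cameleer-env-{slug}. */
    static String envNetworkName(String envSlug) {
        return "cameleer-env-" + envSlug;
    }

    /** SaaS form: cameleer-env-{tenantId}-{envSlug}, so identical env slugs never collide. */
    static String envNetworkName(String tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }
}
```

With the tenant id in the name, `envNetworkName("alpha-corp", "dev")` and `envNetworkName("beta-corp", "dev")` resolve to different networks.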

nginx / Reverse Proxy

  • client_max_body_size 200m is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.