cameleer-server/.claude/rules/docker-orchestration.md
hsiegeln 3334f0a1d2
chore: hand cameleer-runtime-loader image build to cameleer-saas
The loader is infra glue (per-replica init container that fetches the
tenant JAR from a signed URL) — same shape as runtime-base, postgres,
clickhouse, traefik, logto images already living in cameleer-saas. Move
the source + CI build there so all sidecar/infra image builds are in
one place; cameleer-server's CI is back to building only what it owns
(server, server-ui).

Coordination: cameleer-saas@ac8d628 added the build step and copied the
source verbatim. Published tag path is unchanged
(gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest), so running
tenant servers continue pulling the same image without disruption.

This commit:
- Deletes cameleer-runtime-loader/ (Dockerfile, entrypoint.sh, README).
- Removes the conditional "Build and push runtime-loader" step and its
  upstream "Detect runtime-loader changes" detection from .gitea/workflows/ci.yml.
  Drops the fetch-depth: 0 + outputs.loader_changed plumbing that only
  existed for the change-detection path.
- Drops cameleer-runtime-loader from the in-job and cleanup-branch image
  cleanup loops — saas owns the registry lifecycle now.
- Rewrites LoaderHardeningIT to pull the published :latest from the
  registry (via Testcontainers GenericContainer) instead of building
  from a local Dockerfile. The IT now functions as a cross-repo contract
  test: cameleer-server's hardening expectations vs. the saas-published
  artifact. Local devs need `docker login gitea.siegeln.net`; CI runners
  are pre-authenticated.
- Updates .claude/rules/docker-orchestration.md to point at the new
  source-of-truth location and reframe LoaderHardeningIT as the
  cross-repo contract test.

The image's runtime contract (ARTIFACT_URL, ARTIFACT_EXPECTED_SIZE,
/app/jars/app.jar mount, exit code semantics) is unchanged. Future
contract changes need coordinated commits across both repos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:02:54 +02:00

---
paths:
- "cameleer-server-app/**/runtime/**"
- "cameleer-server-core/**/runtime/**"
- "deploy/**"
- "docker-compose*.yml"
- "Dockerfile"
- "docker-entrypoint.sh"
---
# Docker Orchestration
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`). See the merge sketch after this list.
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles.
  - Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap.
  - When `ResolvedContainerConfig.externalRouting()` is `false` (UI: Resources → External Routing, default `true`), the builder emits ONLY the identity labels (`managed-by`, `cameleer.*`) and skips every `traefik.*` label — the container stays on `cameleer-traefik` and the per-env network (so sibling containers can still reach it via Docker DNS) but is invisible to Traefik.
  - The `tls.certresolver` label is emitted only when `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` is set to a non-blank resolver name (matching a resolver configured in the Traefik static config). When unset (dev installs backed by a static TLS store), only `tls=true` is emitted and Traefik serves the default cert from the TLS store.
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
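The `ConfigMerger` merge from the first bullet above, sketched as a shallow map merge — the real `resolve` returns a typed `ResolvedContainerConfig`, so the map shape and field handling here are assumptions:
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the JSONB layers are assumed to arrive as parsed maps.
static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                   Map<String, Object> envConfig,
                                   Map<String, Object> appConfig) {
    Map<String, Object> merged = new LinkedHashMap<>(globalDefaults); // layer 1: application.yml
    merged.putAll(envConfig);                                         // layer 2: environment JSONB
    merged.putAll(appConfig);                                         // layer 3: app JSONB
    merged.putIfAbsent("runtimeType", "auto");                        // documented defaults
    merged.putIfAbsent("customArgs", "");
    return merged;
}
```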
## Container Hardening (issue #152)
`DockerRuntimeOrchestrator.startContainer` applies an unconditional hardening contract to BOTH the loader init-container AND the main tenant container (`baseHardenedHostConfig()` is the shared helper). The SecurityManager is deprecated for removal as of Java 17, so the JVM is not a security boundary and isolation must live below it. Defaults are fail-closed and have no opt-out:
- `cap_drop` = every `Capability.values()` (effectively ALL — docker-java's enum has no `ALL` constant). Outbound TCP still works (no caps needed); raw sockets, ptrace, mounts, and bind <1024 are denied.
- `security_opt`: `no-new-privileges:true`, `apparmor=docker-default`. Default seccomp profile is applied implicitly when `seccomp=` is absent.
- `read_only` rootfs = true.
- `pids_limit` = 512 (`PIDS_LIMIT` constant).
- `tmpfs` mount: `/tmp` with `rw,nosuid,size=256m`. **No `noexec`** — Netty/tcnative, Snappy, LZ4, Zstd dlopen native libs from `/tmp` via `mmap(PROT_EXEC)` which `noexec` blocks. Issue #153 will add per-app `writeableVolumes` for stateful tenants (Kafka Streams etc.).
- `userns_mode` = `host:1000:65536` on both loader and main. Container root is never UID 0 on the host — closes the last open hardening item from issue #152.
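The shared helper's shape, as a docker-java sketch — builder names come from docker-java's `HostConfig`; the userns remapping from the last bullet is applied elsewhere and elided here:
```java
import com.github.dockerjava.api.model.Capability;
import com.github.dockerjava.api.model.HostConfig;
import java.util.List;
import java.util.Map;

static HostConfig baseHardenedHostConfig() {
    return HostConfig.newHostConfig()
            // no ALL constant in the Capability enum — drop every value instead
            .withCapDrop(Capability.values())
            .withSecurityOpts(List.of("no-new-privileges:true", "apparmor=docker-default"))
            .withReadonlyRootfs(true)
            .withPidsLimit(512L)
            // deliberately no noexec — native libs are dlopen'ed from /tmp
            .withTmpFs(Map.of("/tmp", "rw,nosuid,size=256m"));
}
```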
**Sandboxed runtime auto-detect**: at construction the orchestrator calls `dockerClient.infoCmd().exec().getRuntimes()` and uses `runsc` (gVisor) when present. Override with `cameleer.server.runtime.dockerruntime` (e.g. `kata` to force Kata Containers, or any other registered runtime). Empty/blank = auto. The override always wins over auto-detect. The `DockerRuntimeOrchestrator(DockerClient, String)` constructor is the canonical entry point; the single-arg constructor exists only as a convenience for tests that don't need an override.
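The selection rule, condensed — only the `infoCmd().exec().getRuntimes()` probe and the override-wins rule come from the paragraph above; the property plumbing is schematic:
```java
import com.github.dockerjava.api.DockerClient;

static String resolveDockerRuntime(DockerClient docker, String configuredRuntime) {
    if (configuredRuntime != null && !configuredRuntime.isBlank()) {
        return configuredRuntime;          // explicit override always wins
    }
    var runtimes = docker.infoCmd().exec().getRuntimes();
    return (runtimes != null && runtimes.containsKey("runsc"))
            ? "runsc"                      // gVisor registered with the daemon
            : null;                        // null = Docker's default runtime
}
```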
## Init-Container Loader Pattern (JAR fetch)
`startContainer` is now a three-step op per replica:
1. **Volume create** — `cameleer-jars-{containerName}` named volume (per-replica, deterministic so cleanup in `removeContainer` can derive it).
2. **Loader container** — `loaderImage` (default `gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest`, **built and published by the cameleer-saas repo** at `docker/runtime-loader/`), name `{containerName}-loader`, mount the volume **RW at `/app/jars`**, env vars `ARTIFACT_URL` + `ARTIFACT_EXPECTED_SIZE`. Loader downloads the JAR from the signed URL into the volume and exits 0. Orchestrator blocks on `waitContainerCmd().exec(WaitContainerResultCallback).awaitStatusCode(120, SECONDS)`. Loader container is removed in a `finally` block; on non-zero exit the volume is also removed and `RuntimeException` propagates so `DeploymentExecutor` marks the deployment FAILED. **Loader logs are captured before removal** (`captureLoaderLogs` uses `logContainerCmd` with `withTail(50)`, capped at 4096 chars, 5s timeout) and appended to the thrown `RuntimeException` message as `". loader output: <text>"`. Best-effort: log-capture failures are swallowed and don't mask the original exit. The loader image's Dockerfile pre-creates `/app/jars` owned by `loader:loader` (UID 1000) so the orchestrator's fresh named volume initialises with that ownership — without it the empty volume comes up as `root:root 0755` and wget exits 1 with "Permission denied". `LoaderHardeningIT` is the cross-repo contract test (pulls the published `:latest` and asserts exit 0 under the orchestrator's hardening shape).
3. **Main container** — same hardening contract, mount the same volume **RO at `/app/jars`**, entrypoint reads `/app/jars/app.jar` (Spring Boot/Quarkus: `-jar /app/jars/app.jar`; plain Java: `-cp /app/jars/app.jar <MainClass>`; native: `exec /app/jars/app.jar`).
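Steps 1–2 condensed into a sketch — `baseHardenedHostConfig` and `captureLoaderLogs` are the helpers named above; the parameter names and error text are schematic:
```java
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Bind;
import com.github.dockerjava.api.model.Volume;
import com.github.dockerjava.core.command.WaitContainerResultCallback;
import java.util.concurrent.TimeUnit;

void runLoaderPhase(DockerClient docker, String loaderImage, String containerName,
                    String volumeName, String signedUrl, long expectedSize) {
    String loaderId = docker.createContainerCmd(loaderImage)
            .withName(containerName + "-loader")
            .withEnv("ARTIFACT_URL=" + signedUrl,
                     "ARTIFACT_EXPECTED_SIZE=" + expectedSize)
            .withHostConfig(baseHardenedHostConfig()
                    .withBinds(new Bind(volumeName, new Volume("/app/jars")))) // RW for the loader
            .exec().getId();
    try {
        docker.startContainerCmd(loaderId).exec();
        Integer exit = docker.waitContainerCmd(loaderId)
                .exec(new WaitContainerResultCallback())
                .awaitStatusCode(120, TimeUnit.SECONDS);
        if (exit == null || exit != 0) {
            docker.removeVolumeCmd(volumeName).exec();  // failed fetch — remove the volume too
            throw new RuntimeException("loader exited " + exit
                    + ". loader output: " + captureLoaderLogs(loaderId));
        }
    } finally {
        docker.removeContainerCmd(loaderId).withForce(true).exec();  // loader always removed
    }
}
```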
`removeContainer(id)` derives the volume name from the inspected container name (Docker prefixes it with `/`) and removes the volume after the container is removed — blue/green doesn't leak volumes.
`DeploymentExecutor` generates the signed URL via `ArtifactDownloadTokenSigner.sign(appVersion.id(), Duration.ofSeconds(artifactTokenTtlSeconds))` and passes `appVersion.id()`, the URL, `appVersion.jarSizeBytes()`, and the loader image into `ContainerRequest`. The host filesystem is no longer involved at deploy time.
**Loader → server reachability**: the loader hits the Cameleer server from its **primary** Docker
network only (`request.network()`, set from `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK`). Additional networks
(`cameleer-traefik`, per-env) are attached by `DockerNetworkManager.connectContainer` AFTER `startContainer`
returns — by which time the loader has already exited. The loader cannot use them. The signed URL is built
from `cameleer.server.runtime.artifactbaseurl` (preferred), falling back to `cameleer.server.runtime.serverurl`,
falling back to `http://cameleer-server:8081`. The default works in SaaS mode because the tenant's primary
network (`cameleer-tenant-{slug}`) hosts the tenant's own server — same `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK`
on both. For non-SaaS topologies, set `CAMELEER_SERVER_RUNTIME_ARTIFACTBASEURL` to a URL the loader can reach
on its primary network.
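The fallback chain, assuming Spring's `Environment` for property resolution (only the property names and the default URL come from the paragraph above):
```java
import org.springframework.core.env.Environment;

static String artifactBaseUrl(Environment env) {
    String base = env.getProperty("cameleer.server.runtime.artifactbaseurl");
    if (base != null && !base.isBlank()) return base;        // preferred
    String server = env.getProperty("cameleer.server.runtime.serverurl");
    if (server != null && !server.isBlank()) return server;  // fallback
    return "http://cameleer-server:8081";  // reachable when loader and server share the primary network
}
```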
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
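Entrypoint assembly per resolved type, schematically — the method and parameter names are illustrative; the commands mirror the paragraph above and the Runtime Type Detection section:
```java
enum RuntimeType { AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE }  // mirrors the server's enum

static String startCommand(RuntimeType type, String mainClass, String customArgs) {
    String cmd = switch (type) {
        case SPRING_BOOT, QUARKUS -> "java -javaagent:/app/agent.jar -jar /app/jars/app.jar";
        case PLAIN_JAVA -> "java -javaagent:/app/agent.jar -cp /app/jars/app.jar " + mainClass;
        case NATIVE -> "exec /app/jars/app.jar";             // run the binary directly
        case AUTO -> throw new IllegalStateException("auto must resolve at PRE_FLIGHT");
    };
    return customArgs.isBlank() ? cmd : cmd + " " + customArgs;  // customArgs validated upstream
}
```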
**Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to 409). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name — the name is operator-visibility only.
**Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
- **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
- **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
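The rolling loop, condensed — the functional parameters stand in for the executor's real replica plumbing and are assumptions:
```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import java.util.function.Predicate;

static boolean rollingDeploy(List<String> oldIds,
                             IntFunction<String> startReplica,  // start new[i], return container id
                             Predicate<String> waitHealthy,
                             Consumer<String> stopContainer) {
    for (int i = 0; i < oldIds.size(); i++) {
        String newId = startReplica.apply(i);
        if (!waitHealthy.test(newId)) {
            stopContainer.accept(newId);      // stop only the in-flight new replica
            return false;                     // FAILED — old[i+1..N] keep serving
        }
        stopContainer.accept(oldIds.get(i));  // replace old[i] only after health passes
    }
    return true;
}
```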
Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.
## Deployment Status Model
| Status | Meaning |
|--------|---------|
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.
**Deployment retention**: `DeploymentService.createDeployment()` deletes FAILED deployments for the same app+environment before creating a new one, preventing failed-attempt buildup. STOPPED deployments are preserved as restorable checkpoints — the UI Checkpoints disclosure lists every deployment with a non-null `deployed_config_snapshot` (RUNNING, DEGRADED, STOPPED) minus the current one.
## JAR Management
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- **Storage abstraction**: `ArtifactStore` (in `cameleer-server-core/storage`) is the only path that touches JAR bytes. `FilesystemArtifactStore` writes under `cameleer.server.runtime.jarstoragepath` (default `/data/jars`); the orchestrator never reads the host filesystem at deploy time.
- **Loader-fetch at deploy time**: tenant containers no longer bind-mount JARs from the host. The loader init-container streams the JAR via a signed URL (HMAC-SHA256, TTL `cameleer.server.runtime.artifacttokenttlseconds`, default 600s) into a per-replica named volume; main mounts that volume RO. This works without host-path access and is the single path supported in Docker-in-Docker SaaS deployments.
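An illustrative HMAC-SHA256 token shape — the real `ArtifactDownloadTokenSigner` payload layout and URL format are not specified here, so this shows only the general technique:
```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.UUID;

static String sign(UUID appVersionId, Duration ttl, byte[] secret) throws Exception {
    long expiresAt = Instant.now().plus(ttl).getEpochSecond();
    String payload = appVersionId + ":" + expiresAt;         // hypothetical payload layout
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(secret, "HmacSHA256"));
    String sig = Base64.getUrlEncoder().withoutPadding()
            .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    return payload + ":" + sig;  // verifier recomputes the MAC and checks expiry
}
```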
## Runtime Type Detection
The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate `cameleer-log-appender.jar` or `PropertiesLauncher` is needed:
- **Detection** (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix -> `spring-boot`, Quarkus entry point -> `quarkus`, other Main-Class -> `plain-java` (extracts class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
- **Runtime types** (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
- **Entrypoint per type**: All JVM types use `java -javaagent:/app/agent.jar -jar app.jar`. Plain Java uses `-cp` with explicit main class instead of `-jar`. Native runs the binary directly.
- **Custom arguments** (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses `sh -c`).
- **AUTO resolution**: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails deployment if detection was unsuccessful — user must set type explicitly.
- **UI**: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
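A schematic detector following these rules — the Spring Boot loader prefix and the Quarkus entry-point marker are well-known values but assumptions relative to this document, and error handling is simplified:
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.jar.JarInputStream;

static String detect(InputStream jar) throws IOException {
    PushbackInputStream in = new PushbackInputStream(jar, 4);
    byte[] magic = in.readNBytes(4);
    in.unread(magic);
    if (magic.length < 4 || magic[0] != 'P' || magic[1] != 'K') {
        return "native";                                   // not a ZIP — native binary
    }
    try (JarInputStream zip = new JarInputStream(in)) {
        var manifest = zip.getManifest();
        String main = manifest == null ? null
                : manifest.getMainAttributes().getValue("Main-Class");
        if (main == null) return null;                     // undetectable — user must set type
        if (main.startsWith("org.springframework.boot.loader.")) return "spring-boot";
        if (main.contains("QuarkusEntryPoint")) return "quarkus";
        return "plain-java";                               // main is stored as detected_main_class
    }
}
```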
## SaaS Multi-Tenant Network Isolation
In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
- **Tenant network** (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
- **Shared services network** — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
- **Tenant-scoped environment networks** (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
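The naming pair, for reference — string assembly only, matching the `envNetworkName` overloads described above:
```java
static String envNetworkName(String envSlug) {
    return "cameleer-env-" + envSlug;                    // single-tenant installs
}

static String envNetworkName(String tenantId, String envSlug) {
    return "cameleer-env-" + tenantId + "-" + envSlug;   // SaaS: tenant-scoped, avoids collisions
}
```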
## nginx / Reverse Proxy
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.