cameleer-server/.claude/rules/docker-orchestration.md
hsiegeln 810f493639
chore: track .claude/rules/ and add self-maintenance instruction
Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:26:53 +02:00


paths
cameleer-server-app/**/runtime/**
cameleer-server-core/**/runtime/**
deploy/**
docker-compose*.yml
Dockerfile
docker-entrypoint.sh

Docker Orchestration

When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

  • ConfigMerger (core/runtime/ConfigMerger.java) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes runtimeType (default "auto") and customArgs (default "").
  • TraefikLabelBuilder (app/runtime/TraefikLabelBuilder.java) — generates Traefik Docker labels for path-based (/{envSlug}/{appSlug}/) or subdomain-based ({appSlug}-{envSlug}.{domain}) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: cameleer.replica (index) and cameleer.instance-id ({envSlug}-{appSlug}-{replicaIndex}). Internal processing uses labels (not container name parsing) for extensibility.
  • PrometheusLabelBuilder (app/runtime/PrometheusLabelBuilder.java) — generates Prometheus docker_sd_configs labels per resolved runtime type: Spring Boot exposes /actuator/prometheus on port 8081, Quarkus/native exposes /q/metrics on port 9000, and plain Java exposes /metrics on port 9464. The labels are merged into container metadata alongside the Traefik labels at deploy time.
  • DockerNetworkManager (app/runtime/DockerNetworkManager.java) — manages two Docker network tiers:
    • cameleer-traefik — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with cameleer-server DNS alias.
    • cameleer-env-{slug} — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: cameleer-env-{tenantId}-{envSlug} (overloaded envNetworkName(tenantId, envSlug) method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
  • DockerEventMonitor (app/runtime/DockerEventMonitor.java) — persistent Docker event stream listener for containers carrying the managed-by=cameleer-server label. Detects die/oom/start/stop events and updates deployment replica states. A periodic reconciliation (@Scheduled every 30s) inspects the actual container state and corrects deployment status mismatches, for example a deployment left in DEGRADED although all replicas are healthy.
  • DeploymentProgress (ui/src/components/DeploymentProgress.tsx) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
  • ContainerLogForwarder (app/runtime/ContainerLogForwarder.java) — streams Docker container stdout/stderr to ClickHouse logs table with source='container'. Uses docker logs --follow per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. DeploymentExecutor starts capture after each replica launches with the replica's instanceId ({envSlug}-{appSlug}-{replicaIndex}); DockerEventMonitor stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same instanceId as the agent (set via CAMELEER_AGENT_INSTANCEID env var) for unified log correlation at the instance level.
  • StartupLogPanel (ui/src/components/StartupLogPanel.tsx) — collapsible log panel rendered below DeploymentProgress. Queries /api/v1/logs?source=container&application={appSlug}&environment={envSlug}. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses useStartupLogs hook and LogViewer (design system).
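
The three-layer merge described for ConfigMerger can be sketched as follows. This is a minimal illustration, not the actual ConfigMerger code: it uses plain maps instead of ResolvedContainerConfig, the class name ConfigMergeSketch is hypothetical, and treating a null value as "not set" is an assumption. Only the two defaults (runtimeType "auto", customArgs "") come from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a three-layer merge: global -> environment -> app.
// Later layers override earlier ones key by key.
final class ConfigMergeSketch {

    static Map<String, Object> resolve(Map<String, ?> globalDefaults,
                                       Map<String, ?> envConfig,
                                       Map<String, ?> appConfig) {
        Map<String, Object> merged = new LinkedHashMap<>();
        // Built-in defaults named in the document.
        merged.put("runtimeType", "auto");
        merged.put("customArgs", "");
        overlay(merged, globalDefaults);
        overlay(merged, envConfig);
        overlay(merged, appConfig);
        return merged;
    }

    // Assumption: a null value means "not set", so the lower layer wins.
    private static void overlay(Map<String, Object> target, Map<String, ?> layer) {
        if (layer == null) {
            return;
        }
        layer.forEach((key, value) -> {
            if (value != null) {
                target.put(key, value);
            }
        });
    }
}
```

Because the function has no side effects, each layer can be tested in isolation: a key set only in the global layer survives, a key set in the environment layer overrides the global value, and a key set on the app overrides both.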

DeploymentExecutor Details

  • Networks: the primary network for app containers is set via the CAMELEER_SERVER_RUNTIME_DOCKERNETWORK env var (in SaaS mode: cameleer-tenant-{slug}); apps also connect to cameleer-traefik (routing) and cameleer-env-{tenantId}-{envSlug} (per-environment discovery) as additional networks.
  • Runtime type: resolves runtimeType "auto" to a concrete type from AppVersion.detectedRuntimeType at PRE_FLIGHT; the deployment fails if the type cannot be resolved.
  • Entrypoint: built per runtime type. All JVM types run with -javaagent:/app/agent.jar; Spring Boot and Quarkus launch with -jar, plain Java uses -cp with the main class, and native runs the binary directly.
  • Instance identity: sets the per-replica CAMELEER_AGENT_INSTANCEID env var to {envSlug}-{appSlug}-{replicaIndex} so container logs and agent logs share the same instance identity.
  • Agent settings: sets CAMELEER_AGENT_* env vars from ResolvedContainerConfig (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties, so changing them requires redeployment.
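
The per-replica identity can be illustrated with a small sketch. The class and method names (ReplicaIdentitySketch, instanceId, replicaEnv, replicaLabels) are hypothetical; the env var name, the label keys, and the {envSlug}-{appSlug}-{replicaIndex} format are taken from this document. How the real DeploymentExecutor assembles these values is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one identity string shared by env var and labels.
final class ReplicaIdentitySketch {

    // Format from the document: {envSlug}-{appSlug}-{replicaIndex}
    static String instanceId(String envSlug, String appSlug, int replicaIndex) {
        return envSlug + "-" + appSlug + "-" + replicaIndex;
    }

    // Env var read by the agent so its logs carry the same instance id.
    static Map<String, String> replicaEnv(String envSlug, String appSlug, int replicaIndex) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("CAMELEER_AGENT_INSTANCEID", instanceId(envSlug, appSlug, replicaIndex));
        return env;
    }

    // Container labels used for internal processing instead of name parsing.
    static Map<String, String> replicaLabels(String envSlug, String appSlug, int replicaIndex) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("managed-by", "cameleer-server");
        labels.put("cameleer.replica", Integer.toString(replicaIndex));
        labels.put("cameleer.instance-id", instanceId(envSlug, appSlug, replicaIndex));
        return labels;
    }
}
```

Deriving both the env var and the label from one helper keeps the container-log identity and the agent-log identity from drifting apart.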

Deployment Status Model

| Status | Meaning |
|---|---|
| STOPPED | Intentionally stopped or initial state |
| STARTING | Deploy in progress |
| RUNNING | All replicas healthy and serving |
| DEGRADED | Some replicas healthy, some dead |
| STOPPING | Graceful shutdown in progress |
| FAILED | Terminal failure (pre-flight, health check, or crash) |

Replica support: deployments can specify a replica count. DEGRADED is used when at least one but not all replicas are healthy.
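
How replica health could map onto these statuses is sketched below. The RUNNING and DEGRADED rules follow the table and the replica note; mapping "no healthy replica" to FAILED is an assumption, and the class name ReplicaStatusSketch is hypothetical. The actual reconciliation logic in DockerEventMonitor may differ.

```java
// Hypothetical sketch: derive a deployment status from replica health counts.
final class ReplicaStatusSketch {

    enum Status { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    static Status fromReplicaHealth(int healthyReplicas, int totalReplicas) {
        if (totalReplicas <= 0 || healthyReplicas < 0 || healthyReplicas > totalReplicas) {
            throw new IllegalArgumentException("invalid replica counts");
        }
        if (healthyReplicas == totalReplicas) {
            return Status.RUNNING;   // all replicas healthy and serving
        }
        if (healthyReplicas > 0) {
            return Status.DEGRADED;  // at least one, but not all, healthy
        }
        return Status.FAILED;        // assumption: zero healthy replicas is terminal
    }
}
```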

Deploy stages (DeployStage): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
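
The stage sequence can be read as a linear state machine in which any step may divert to FAILED. The sketch below assumes the enum constants are declared in pipeline order; the wrapper class DeployStageSketch and the next(boolean) helper are hypothetical and only illustrate the ordering, not the real DeployStage type.

```java
// Hypothetical sketch of the deploy pipeline ordering.
final class DeployStageSketch {

    enum Stage {
        PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS,
        HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE, FAILED;

        boolean isTerminal() {
            return this == COMPLETE || this == FAILED;
        }

        // Advance to the following stage on success, or to FAILED otherwise.
        Stage next(boolean stepSucceeded) {
            if (isTerminal()) {
                throw new IllegalStateException(this + " is a terminal stage");
            }
            return stepSucceeded ? values()[ordinal() + 1] : FAILED;
        }
    }
}
```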

Blue/green strategy: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.

Deployment uniqueness: DeploymentService.createDeployment() deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.

JAR Management

  • Retention policy per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
  • Nightly cleanup job (JarRetentionJob, Spring @Scheduled 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
  • Volume-based JAR mounting for Docker-in-Docker setups: set CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
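
The retention rule can be sketched as a pure selection function. This is not the JarRetentionJob implementation: the class and method names are hypothetical, versions are represented as plain string ids ordered newest first, and it is an assumption that a deployed version beyond the limit is kept but still counts toward the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: choose which JAR versions a retention run would purge.
final class JarRetentionSketch {

    // versionsNewestFirst: version ids sorted newest to oldest.
    // maxVersions: configured retention limit for the environment.
    // deployed: ids of versions that are currently deployed and must be skipped.
    static List<String> selectForPurge(List<String> versionsNewestFirst,
                                       int maxVersions,
                                       Set<String> deployed) {
        List<String> purge = new ArrayList<>();
        for (int i = 0; i < versionsNewestFirst.size(); i++) {
            String version = versionsNewestFirst.get(i);
            boolean beyondLimit = i >= maxVersions;
            if (beyondLimit && !deployed.contains(version)) {
                purge.add(version);
            }
        }
        return purge;
    }
}
```

With versions v5 to v1, a limit of 2, and v2 currently deployed, this selection returns v3 and v1 and leaves v2 in place.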

Runtime Type Detection

The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate cameleer-log-appender.jar or PropertiesLauncher is needed:

  • Detection (RuntimeDetector): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes META-INF/MANIFEST.MF Main-Class: Spring Boot loader prefix -> spring-boot, Quarkus entry point -> quarkus, other Main-Class -> plain-java (extracts class name). Results stored on AppVersion (detected_runtime_type, detected_main_class).
  • Runtime types (RuntimeType enum): AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE. Configurable per app/environment via containerConfig.runtimeType (default "auto").
  • Entrypoint per type: all JVM types run java with -javaagent:/app/agent.jar. Spring Boot and Quarkus launch with -jar app.jar; plain Java uses -cp with an explicit main class instead of -jar. Native runs the binary directly.
  • Custom arguments (containerConfig.customArgs): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses sh -c).
  • AUTO resolution: at deploy time (PRE_FLIGHT), "auto" resolves to the detected type from AppVersion. Fails deployment if detection was unsuccessful — user must set type explicitly.
  • UI: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
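
The detection and entrypoint rules above can be sketched together. The two marker constants are assumptions about what "Spring Boot loader prefix" and "Quarkus entry point" mean, and the native binary path /app/app is hypothetical; the real RuntimeDetector may match on different values. The sketch also returns an argument list, whereas the document states that the actual entrypoint goes through sh -c.

```java
import java.util.List;

// Hypothetical sketch of runtime detection and start-command selection.
final class RuntimeTypeSketch {

    enum RuntimeType { SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE }

    // Assumed marker values, not confirmed by the document.
    static final String SPRING_BOOT_LOADER_PREFIX = "org.springframework.boot.loader.";
    static final String QUARKUS_ENTRY_POINT = "io.quarkus.bootstrap.runner.QuarkusEntryPoint";

    // A ZIP local file header starts with the bytes 'P' 'K' 0x03 0x04.
    static boolean isZip(byte[] header) {
        return header != null && header.length >= 4
                && header[0] == 'P' && header[1] == 'K'
                && header[2] == 3 && header[3] == 4;
    }

    // Returns null when the file is a ZIP but carries no usable Main-Class.
    static RuntimeType detect(byte[] header, String mainClass) {
        if (!isZip(header)) {
            return RuntimeType.NATIVE;
        }
        if (mainClass == null || mainClass.trim().isEmpty()) {
            return null;
        }
        if (mainClass.startsWith(SPRING_BOOT_LOADER_PREFIX)) {
            return RuntimeType.SPRING_BOOT;
        }
        if (mainClass.equals(QUARKUS_ENTRY_POINT)) {
            return RuntimeType.QUARKUS;
        }
        return RuntimeType.PLAIN_JAVA;
    }

    static List<String> startCommand(RuntimeType type, String mainClass) {
        switch (type) {
            case NATIVE:
                return List.of("/app/app"); // hypothetical binary path
            case PLAIN_JAVA:
                return List.of("java", "-javaagent:/app/agent.jar", "-cp", "app.jar", mainClass);
            default:
                return List.of("java", "-javaagent:/app/agent.jar", "-jar", "app.jar");
        }
    }
}
```

A null result from detect corresponds to the case where "auto" cannot be resolved at PRE_FLIGHT and the user has to set the runtime type explicitly.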

SaaS Multi-Tenant Network Isolation

In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:

  • Tenant network (cameleer-tenant-{slug}) — primary internal bridge for all of a tenant's containers. Set as CAMELEER_SERVER_RUNTIME_DOCKERNETWORK for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
  • Shared services network — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and cameleer-traefik for HTTP routing.
  • Tenant-scoped environment networks (cameleer-env-{tenantId}-{envSlug}) — per-environment discovery is scoped per tenant, so alpha-corp's "dev" environment network is separate from beta-corp's "dev" environment network.
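
The naming scheme behind this isolation is simple enough to sketch. The name patterns and the overloaded envNetworkName(tenantId, envSlug) signature come from this document; the wrapper class NetworkNameSketch and the tenantNetworkName helper are hypothetical, and the example assumes tenant ids look like the tenant names used above.

```java
// Hypothetical sketch of the Docker network naming scheme.
final class NetworkNameSketch {

    // Shared routing network that Traefik, the server, and all apps attach to.
    static final String TRAEFIK_NETWORK = "cameleer-traefik";

    // Standalone mode: one network per environment.
    static String envNetworkName(String envSlug) {
        return "cameleer-env-" + envSlug;
    }

    // SaaS mode: the tenant id is part of the name to avoid collisions.
    static String envNetworkName(String tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }

    // Primary tenant bridge, used as CAMELEER_SERVER_RUNTIME_DOCKERNETWORK.
    static String tenantNetworkName(String tenantSlug) {
        return "cameleer-tenant-" + tenantSlug;
    }
}
```

With these helpers, envNetworkName("alpha-corp", "dev") and envNetworkName("beta-corp", "dev") produce distinct network names, which is what keeps the two "dev" environments apart.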

nginx / Reverse Proxy

  • client_max_body_size 200m is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.