Docker Container Orchestration Design
Goal
Make the DockerRuntimeOrchestrator fully functional: apply container configs (memory, CPU, ports, env vars) when starting containers, generate correct Traefik routing labels, support replicas, implement blue/green and rolling deployment strategies, and monitor container health via Docker event stream.
Scope
- Docker single-host only (Swarm and K8s are future `RuntimeOrchestrator` implementations)
- Replicas managed by the orchestrator as independent containers
- Traefik integration for path-based and subdomain-based routing
- Docker event stream for infrastructure-level health monitoring
- UI changes for new config fields, replica management, and deployment progress
Network Topology
Three network tiers with lazy creation:
- `cameleer-infra` — server, postgres, clickhouse (databases isolated)
- `cameleer-traefik` — server, traefik, all app containers (ingress + agent SSE)
- `cameleer-env-{slug}` — app containers within one environment (inter-app only)

Membership:
- Server joins `cameleer-infra` + `cameleer-traefik`
- App containers join `cameleer-traefik` + `cameleer-env-{envSlug}`
- Traefik joins `cameleer-traefik` only
- Databases join `cameleer-infra` only
App containers reach the server for SSE/heartbeats via the cameleer-traefik network. They never touch databases directly.
Network isolation
The cameleer-traefik network is created with inter-container communication (ICC) disabled (--opt com.docker.network.bridge.enable_icc=false). This means containers on the traefik network cannot communicate directly with each other — they can only be reached through Traefik's published ports. This prevents a compromised app in one environment from reaching apps in other environments via the shared routing network.
The cameleer-env-{slug} networks keep ICC enabled so apps within the same environment can discover and communicate with each other freely.
Network Manager
Wraps Docker network operations. ensureNetwork(name, iccEnabled) creates a bridge network if it doesn't exist (idempotent). The traefik network is created with iccEnabled=false, environment networks with iccEnabled=true. connectContainer(containerId, networkName) attaches a container to a second network. Called by DeploymentExecutor before container creation.
Configuration Model
Three-layer merge
Global defaults (application.yml)
→ Environment defaults (environments.default_container_config)
→ App overrides (apps.container_config)
App overrides environment, environment overrides global. Missing keys fall through.
Environment-level settings (defaultContainerConfig)
| Key | Type | Default | Description |
|---|---|---|---|
| memoryLimitMb | int | 512 | Default memory limit |
| memoryReserveMb | int | null | Memory reservation |
| cpuShares | int | 512 | CPU shares |
| cpuLimit | float | null | CPU core limit |
| routingMode | string | "path" | `path` or `subdomain` |
| routingDomain | string | from global | Domain for URL generation |
| serverUrl | string | from global | Server URL for agent callbacks |
| sslOffloading | boolean | true | Traefik terminates TLS |
App-level settings (containerConfig)
| Key | Type | Default | Description |
|---|---|---|---|
| memoryLimitMb | int | from env | Override memory limit |
| memoryReserveMb | int | from env | Override memory reservation |
| cpuShares | int | from env | Override CPU shares |
| cpuLimit | float | from env | Override CPU core limit |
| appPort | int | 8080 | Main HTTP port for Traefik |
| exposedPorts | int[] | [] | Additional ports (debug, JMX) |
| customEnvVars | map | {} | App-specific environment variables |
| stripPathPrefix | boolean | true | Traefik strips /{env}/{app} prefix |
| sslOffloading | boolean | from env | Override SSL offloading |
| replicas | int | 1 | Number of container replicas |
| deploymentStrategy | string | "blue-green" | `blue-green` or `rolling` |
ConfigMerger
Pure function: resolve(globalDefaults, envConfig, appConfig) → ResolvedContainerConfig
ResolvedContainerConfig is a typed Java record with all fields resolved to concrete values. No more scattered @Value fields in DeploymentExecutor for container-level settings.
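The three-layer fall-through can be sketched as a map overlay. This is a minimal illustration only: the `Map`-based signature is an assumption, and the real ConfigMerger resolves into the typed `ResolvedContainerConfig` record rather than a map.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the three-layer merge: app overrides environment, environment
// overrides global, missing (null) keys fall through to the layer below.
public class ConfigMerger {
    public static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                              Map<String, Object> envConfig,
                                              Map<String, Object> appConfig) {
        Map<String, Object> resolved = new HashMap<>(globalDefaults);
        envConfig.forEach((k, v) -> { if (v != null) resolved.put(k, v); });
        appConfig.forEach((k, v) -> { if (v != null) resolved.put(k, v); });
        return resolved;
    }
}
```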
Traefik Label Generation
TraefikLabelBuilder
Pure function: takes app slug, env slug, resolved config. Returns Map<String, String>.
Path-based routing (routingMode: "path")
Service name derived as {envSlug}-{appSlug}.
traefik.enable=true
traefik.http.routers.{svc}.rule=PathPrefix(`/{envSlug}/{appSlug}/`)
traefik.http.routers.{svc}.entrypoints=websecure
traefik.http.services.{svc}.loadbalancer.server.port={appPort}
managed-by=cameleer3-server
cameleer.app={appSlug}
cameleer.environment={envSlug}
If stripPathPrefix is true:
traefik.http.middlewares.{svc}-strip.stripprefix.prefixes=/{envSlug}/{appSlug}
traefik.http.routers.{svc}.middlewares={svc}-strip
Subdomain-based routing (routingMode: "subdomain")
traefik.http.routers.{svc}.rule=Host(`{appSlug}-{envSlug}.{routingDomain}`)
No strip-prefix needed for subdomain routing.
SSL offloading
If sslOffloading is true:
traefik.http.routers.{svc}.tls=true
traefik.http.routers.{svc}.tls.certresolver=default
If false, Traefik passes through TLS to the container (requires the app to terminate TLS itself).
Replicas
All replicas of the same app get identical Traefik labels. Traefik automatically load-balances across containers with the same service name on the same network.
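Putting the path-based rules above together, the builder could look like the sketch below. The label keys are the ones listed in this section; the method shape and parameter list are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of path-based label generation for one app container.
public class TraefikLabelBuilder {
    public static Map<String, String> pathLabels(String envSlug, String appSlug,
                                                 int appPort, boolean stripPathPrefix) {
        String svc = envSlug + "-" + appSlug;  // service name: {envSlug}-{appSlug}
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.routers." + svc + ".rule",
                   "PathPrefix(`/" + envSlug + "/" + appSlug + "/`)");
        labels.put("traefik.http.routers." + svc + ".entrypoints", "websecure");
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
                   String.valueOf(appPort));
        labels.put("managed-by", "cameleer3-server");
        labels.put("cameleer.app", appSlug);
        labels.put("cameleer.environment", envSlug);
        if (stripPathPrefix) {
            labels.put("traefik.http.middlewares." + svc + "-strip.stripprefix.prefixes",
                       "/" + envSlug + "/" + appSlug);
            labels.put("traefik.http.routers." + svc + ".middlewares", svc + "-strip");
        }
        return labels;
    }
}
```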
Deployment Status Model
New fields on deployments table
| Column | Type | Description |
|---|---|---|
| target_state | varchar | RUNNING or STOPPED |
| deployment_strategy | varchar | BLUE_GREEN or ROLLING |
| replica_states | jsonb | Array of {index, containerId, status} |
| deploy_stage | varchar | Current stage for progress tracking (null when stable) |
Status values
STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
State transitions
Target: RUNNING
STOPPED → STARTING → RUNNING (all replicas healthy)
→ DEGRADED (some replicas healthy, some dead)
→ FAILED (zero healthy / pre-flight failed)
RUNNING → DEGRADED (replica dies)
DEGRADED → RUNNING (replica recovers via restart policy)
DEGRADED → FAILED (all dead, retries exhausted)
Target: STOPPED
RUNNING/DEGRADED → STOPPING → STOPPED (all replicas stopped and removed)
→ FAILED (couldn't stop some replicas)
Aggregate derivation
Deployment status is derived from replica states:
- All replicas `RUNNING` → deployment `RUNNING`
- At least one `RUNNING`, some `DEAD` → deployment `DEGRADED`
- Zero `RUNNING` after retries → deployment `FAILED`
- All stopped → deployment `STOPPED`
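These derivation rules can be sketched as a pure function over the replica states. `ReplicaStatus` and the `retriesExhausted` flag are illustrative assumptions; the real implementation reads the `replica_states` JSON.

```java
import java.util.List;

// Sketch of the aggregate status derivation described above.
public class DeploymentStatusAggregator {
    public enum ReplicaStatus { RUNNING, DEAD, STOPPED }

    public static String derive(List<ReplicaStatus> replicas, boolean retriesExhausted) {
        if (replicas.stream().allMatch(r -> r == ReplicaStatus.RUNNING)) return "RUNNING";
        if (replicas.stream().allMatch(r -> r == ReplicaStatus.STOPPED)) return "STOPPED";
        if (replicas.stream().noneMatch(r -> r == ReplicaStatus.RUNNING))
            return retriesExhausted ? "FAILED" : "DEGRADED";
        return "DEGRADED"; // mixed: at least one RUNNING, some DEAD/STOPPED
    }
}
```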
Deployment Flow
Deploy stages (tracked in deploy_stage for progress UI)
- `PRE_FLIGHT` — Validate config, check JAR exists, verify base image
- `PULL_IMAGE` — Pull base image if not present locally
- `CREATE_NETWORK` — Ensure traefik and environment networks exist
- `START_REPLICAS` — Create and start N containers
- `HEALTH_CHECK` — Wait for at least one replica to pass health check
- `SWAP_TRAFFIC` — (blue/green) Stop old deployment replicas
- `COMPLETE` — Mark deployment RUNNING/DEGRADED
Pre-flight checks
Before touching any running containers:
- JAR file exists on disk at the path stored in `app_versions`
- Base image available (pull if missing)
- Resolved config is valid (memory > 0, appPort > 0, replicas >= 1)
- No naming conflict with containers from other apps
If any check fails → deployment marked FAILED immediately, existing deployment untouched.
Blue/green strategy
- Start all new replicas alongside the old deployment
- Wait for health checks on new replicas
- If healthy: stop and remove all old replicas, mark new deployment RUNNING
- If unhealthy: remove new replicas, mark new deployment FAILED, old deployment stays
Temporarily uses 2x resources during the swap window.
Rolling strategy
- For each replica index i (0..N-1):
  a. Stop old replica at index i (if it exists)
  b. Start new replica at index i
  c. Wait for health check
  d. If unhealthy: stop the new replica, mark deployment FAILED, leave remaining old replicas
- After all replicas replaced, mark deployment RUNNING
Lower peak resources but slower and more complex.
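The rolling loop above can be sketched against a small seam over the container operations. `ReplicaOps` is a hypothetical interface introduced here for illustration; the real DeploymentExecutor calls the orchestrator directly.

```java
// Sketch of the rolling strategy: replace replicas one at a time,
// aborting on the first failed health check.
public class RollingDeployer {
    public interface ReplicaOps {
        void stopOld(int index);          // no-op if no old replica at this index
        void startNew(int index);
        boolean waitHealthy(int index);   // health check on the new replica
        void stopNew(int index);
    }

    /** Returns false on first unhealthy replica, leaving the remaining
     *  old replicas untouched; the caller marks the deployment FAILED. */
    public static boolean deploy(int replicas, ReplicaOps ops) {
        for (int i = 0; i < replicas; i++) {
            ops.stopOld(i);
            ops.startNew(i);
            if (!ops.waitHealthy(i)) {
                ops.stopNew(i);
                return false;
            }
        }
        return true;
    }
}
```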
Container naming
{envSlug}-{appSlug}-{replicaIndex} (e.g., staging-payment-gateway-0)
Container restart policy
on-failure with max 3 retries. Docker handles transient failures. After the 3 retries are exhausted, the container stays dead and DockerEventMonitor detects the permanent failure.
Environment variables injected
Base env vars (always set):
CAMELEER_EXPORT_TYPE=HTTP
CAMELEER_APPLICATION_ID={appSlug}
CAMELEER_ENVIRONMENT_ID={envSlug}
CAMELEER_DISPLAY_NAME={containerName}
CAMELEER_SERVER_URL={resolvedServerUrl}
CAMELEER_AUTH_TOKEN={bootstrapToken}
Plus all entries from customEnvVars in the resolved config.
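Assembling the injected environment can be sketched as below. The precedence (base vars win over `customEnvVars` on collision) is an assumption; the document does not specify it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: merge customEnvVars with the always-set base vars listed above.
// Assumption: base vars overwrite custom entries on key collision.
public class EnvVarBuilder {
    public static Map<String, String> build(String appSlug, String envSlug,
                                            String containerName, String serverUrl,
                                            String bootstrapToken,
                                            Map<String, String> customEnvVars) {
        Map<String, String> env = new LinkedHashMap<>(customEnvVars);
        env.put("CAMELEER_EXPORT_TYPE", "HTTP");
        env.put("CAMELEER_APPLICATION_ID", appSlug);
        env.put("CAMELEER_ENVIRONMENT_ID", envSlug);
        env.put("CAMELEER_DISPLAY_NAME", containerName);
        env.put("CAMELEER_SERVER_URL", serverUrl);
        env.put("CAMELEER_AUTH_TOKEN", bootstrapToken);
        return env;
    }
}
```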
Docker Event Monitor
DockerEventMonitor
@Component that starts a persistent Docker event stream on @PostConstruct.
- Filters for containers with label `managed-by=cameleer3-server`
- Listens for events: `die`, `oom`, `stop`, `start`
- On `die`/`oom`: looks up the deployment by container ID, updates replica status to `DEAD`, recomputes deployment status (RUNNING → DEGRADED → FAILED)
- On `start`: updates replica status to `RUNNING` (handles Docker restart policy recoveries)
- Reconnects automatically if the stream drops
Interaction with agent heartbeats
- Agent heartbeats: app-level health (is the Camel context running, are routes active)
- Docker events: infrastructure-level health (is the container alive, OOM, crash)
- Both feed into the same deployment status. Docker events are faster for container crashes. Agent heartbeats catch app-level hangs where the container is alive but the app is stuck.
Database Migration
V7__deployment_orchestration.sql:
-- New status values and fields for deployments
ALTER TABLE deployments ADD COLUMN target_state VARCHAR(20) NOT NULL DEFAULT 'RUNNING';
ALTER TABLE deployments ADD COLUMN deployment_strategy VARCHAR(20) NOT NULL DEFAULT 'BLUE_GREEN';
ALTER TABLE deployments ADD COLUMN replica_states JSONB NOT NULL DEFAULT '[]';
ALTER TABLE deployments ADD COLUMN deploy_stage VARCHAR(30);
-- Backfill existing deployments
UPDATE deployments SET target_state = CASE
WHEN status = 'STOPPED' THEN 'STOPPED'
ELSE 'RUNNING'
END;
The status column remains but gains two new values: DEGRADED and STOPPING. The DeploymentStatus enum is updated to match.
UI Changes
Deployments tab — Overview
- Replicas column in deployments table: shows `{healthy}/{total}` (e.g., `2/3`)
- Status badge updated for new states: `DEGRADED` (warning color), `STOPPING` (auto color)
- Deployment progress shown when `deploy_stage` is not null — horizontal step indicator. Completed steps filled, current step highlighted, failed step red:

  ●━━━━●━━━━●━━━━○━━━━○━━━━○
  Pre-flight · Pull · Start reps · Health check · Swap traffic
Create App page — Resources tab
- `appPort` — number input (default 8080)
- `replicas` — number input (default 1)
- `deploymentStrategy` — select: Blue/Green, Rolling (default Blue/Green)
- `stripPathPrefix` — toggle (default true)
- `sslOffloading` — toggle (default true)
Config tab — Resources sub-tab (app detail)
Same fields as the create page, also shown read-only when not editing.
Environment admin page
- `routingMode` — select: Path-based, Subdomain (default Path-based)
- `routingDomain` — text input
- `serverUrl` — text input with placeholder showing global default
- `sslOffloading` — toggle (default true)
New/Modified Components Summary
Core module (cameleer3-server-core)
- `ResolvedContainerConfig` — new record with all typed fields
- `ConfigMerger` — pure function, three-layer merge
- `ContainerRequest` — add `cpuLimit`, `exposedPorts`, `restartPolicy`, `additionalNetworks`
- `DeploymentStatus` — add `DEGRADED`, `STOPPING`
- `Deployment` — add `targetState`, `deploymentStrategy`, `replicaStates`, `deployStage`
App module (cameleer3-server-app)
- `DockerRuntimeOrchestrator` — apply full config (memory reserve, CPU limit, exposed ports, restart policy)
- `DockerNetworkManager` — new component, lazy network creation + container attachment
- `DockerEventMonitor` — new component, persistent event stream listener
- `TraefikLabelBuilder` — new utility, generates full Traefik label set
- `DeploymentExecutor` — rewrite deploy flow with stages, pre-flight, strategy dispatch
- `V7__deployment_orchestration.sql` — migration for new columns
UI
- `AppsTab.tsx` — new fields in create page and config tabs
- `EnvironmentsPage.tsx` — routing and SSL fields
- `DeploymentProgress` component — step indicator for deploy stages
- Status badges updated for DEGRADED/STOPPING