Un-ignore .claude/rules/ so path-scoped rule files are shared via git. Add instruction in CLAUDE.md to update rule files when modifying classes, controllers, endpoints, or metrics — keeps rules current as part of normal workflow rather than requiring separate maintenance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
77 lines
8.8 KiB
Markdown
77 lines
8.8 KiB
Markdown
---
|
|
paths:
|
|
- "cameleer-server-app/**/runtime/**"
|
|
- "cameleer-server-core/**/runtime/**"
|
|
- "deploy/**"
|
|
- "docker-compose*.yml"
|
|
- "Dockerfile"
|
|
- "docker-entrypoint.sh"
|
|
---
|
|
|
|
# Docker Orchestration
|
|
|
|
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
|
|
|
|
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
|
|
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
|
|
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
|
|
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
|
|
- `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
|
|
- `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
|
|
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
|
|
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
|
|
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
|
|
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
|
|
|
|
## DeploymentExecutor Details
|
|
|
|
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
|
|
|
|
## Deployment Status Model
|
|
|
|
| Status | Meaning |
|
|
|--------|---------|
|
|
| `STOPPED` | Intentionally stopped or initial state |
|
|
| `STARTING` | Deploy in progress |
|
|
| `RUNNING` | All replicas healthy and serving |
|
|
| `DEGRADED` | Some replicas healthy, some dead |
|
|
| `STOPPING` | Graceful shutdown in progress |
|
|
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
|
|
|
|
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
|
|
|
|
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
|
|
|
|
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
|
|
|
|
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
|
|
|
|
## JAR Management
|
|
|
|
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
|
|
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
|
|
- **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
|
|
|
|
## Runtime Type Detection
|
|
|
|
The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate `cameleer-log-appender.jar` or `PropertiesLauncher` is needed:
|
|
|
|
- **Detection** (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix -> `spring-boot`, Quarkus entry point -> `quarkus`, other Main-Class -> `plain-java` (extracts class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
|
|
- **Runtime types** (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
|
|
- **Entrypoint per type**: All JVM types use `java -javaagent:/app/agent.jar -jar app.jar`. Plain Java uses `-cp` with explicit main class instead of `-jar`. Native runs the binary directly.
|
|
- **Custom arguments** (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses `sh -c`).
|
|
- **AUTO resolution**: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails deployment if detection was unsuccessful — user must set type explicitly.
|
|
- **UI**: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
|
|
|
|
## SaaS Multi-Tenant Network Isolation
|
|
|
|
In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
|
|
|
|
- **Tenant network** (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
|
|
- **Shared services network** — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
|
|
- **Tenant-scoped environment networks** (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
|
|
|
|
## nginx / Reverse Proxy
|
|
|
|
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
|