fix(runtime): pre-pull loader image, plug volume-leak windows, document network dep

Pre-pull the loader image at PULL_IMAGE so the implicit pull on first
createContainerCmd doesn't bypass the 120s loader-wait timeout.

Wrap createAndStartLoader in try/catch so a create/start failure cleans
up the just-created volume; same guard around createAndStartMain on
phase-2 failures. Folds the wait-error message into the rethrown
RuntimeException so the cause chain is visible.

Add a @PostConstruct WARN when neither artifactbaseurl nor serverurl is
set so the implicit cameleer-server DNS dependency is loud at boot, and
document the loader-to-server reachability contract in
.claude/rules/docker-orchestration.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
hsiegeln
2026-04-27 16:26:35 +02:00
parent 1ddae94930
commit cc076b1923
3 changed files with 38 additions and 4 deletions

View File

@@ -48,6 +48,14 @@ When deployed via the cameleer-saas platform, this server orchestrates customer
`DeploymentExecutor` generates the signed URL via `ArtifactDownloadTokenSigner.sign(appVersion.id(), Duration.ofSeconds(artifactTokenTtlSeconds))` and passes `appVersion.id()`, the URL, `appVersion.jarSizeBytes()`, and the loader image into `ContainerRequest`. The host filesystem is no longer involved at deploy time.
**Loader → server reachability**: the loader container hits the Cameleer server over HTTP from inside its
own Docker network. The signed URL is built from `cameleer.server.runtime.artifactbaseurl` (preferred), falling
back to `cameleer.server.runtime.serverurl`, falling back to `http://cameleer-server:8081`. The default works
in SaaS mode because `DockerNetworkManager` adds `cameleer-traefik` as an additional network for tenant
containers, and the server is reachable on that network via the `cameleer-server` DNS alias. For non-SaaS
topologies (server on a different network than tenants), set `CAMELEER_SERVER_RUNTIME_ARTIFACTBASEURL`
explicitly to a URL the loader can reach.
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.