Compare commits
8 commits: 4371372a26...007597715a

| SHA1 |
|---|
| 007597715a |
| b6e54db6ec |
| e9f523f2b8 |
| 653f983a08 |
| 459cdfe427 |
| 652346dcd4 |
| 5304c8ee01 |
| 2c82f29aef |
@@ -118,10 +118,10 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale

## runtime/ — Docker orchestration

- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
- `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
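The naming and identity scheme above can be sketched as a tiny helper. This is an illustrative sketch only: the class and method names (`Naming`, `generation`, `instanceId`, `containerName`) are hypothetical, not the actual executor API; the formats follow the bullet list.

```java
import java.util.UUID;

// Hypothetical sketch of the documented naming scheme, not the real code.
final class Naming {
    /** "Generation": first 8 chars of the deployment UUID. */
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** {envSlug}-{appSlug}-{replicaIndex}-{generation}, shared by agent and log identity. */
    static String instanceId(String env, String app, int replica, String gen) {
        return env + "-" + app + "-" + replica + "-" + gen;
    }

    /** {tenantId}-{instanceId}: globally unique on the Docker daemon. */
    static String containerName(String tenant, String instanceId) {
        return tenant + "-" + instanceId;
    }
}
```

Because the generation changes per deployment, two consecutive deploys of the same replica index never collide on a container name.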
@@ -29,8 +29,9 @@ paths:
- `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
- `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED. `DEGRADED` is reserved for post-deploy drift (a replica died after RUNNING); `DeploymentExecutor` now marks partial-healthy deploys FAILED, not DEGRADED.
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentStrategy` — enum: BLUE_GREEN, ROLLING. Stored on `ResolvedContainerConfig.deploymentStrategy` as a kebab-case string (`"blue-green"` / `"rolling"`). `fromWire(String)` is the only conversion entry point; unknown/null inputs fall back to BLUE_GREEN so the executor dispatch site never null-checks or throws.
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
- `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
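The `fromWire` fallback contract described above can be sketched as follows. The enum body here is an assumed shape (the wire-string storage and null/unknown fallback come from the doc; the exact field and loop are illustrative):

```java
// Sketch of DeploymentStrategy.fromWire as described, not the actual source.
enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;
    DeploymentStrategy(String wire) { this.wire = wire; }

    /** Sole conversion entry point: null or unknown input falls back to BLUE_GREEN. */
    static DeploymentStrategy fromWire(String value) {
        if (value == null) return BLUE_GREEN;
        for (DeploymentStrategy s : values()) {
            if (s.wire.equalsIgnoreCase(value.trim())) return s;
        }
        return BLUE_GREEN;
    }
}
```

The total fallback is what lets the executor dispatch site switch on the result without a null check or a catch.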
@@ -13,19 +13,28 @@ paths:
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap.
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically named environments.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
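The ConfigMerger's three-layer merge can be sketched with plain maps. This is a minimal sketch under assumptions: the real `resolve` returns a typed `ResolvedContainerConfig`, and the keys used here are illustrative; only the precedence order (global, then environment, then app, later layers win) comes from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative precedence-only model of the three-layer merge.
final class MergeSketch {
    /** Later layers win; keys absent in a later layer fall through to earlier ones. */
    static Map<String, String> resolve(Map<String, String> global,
                                       Map<String, String> env,
                                       Map<String, String> app) {
        Map<String, String> out = new LinkedHashMap<>(global); // application.yml defaults
        out.putAll(env);                                        // defaultContainerConfig JSONB
        out.putAll(app);                                        // containerConfig JSONB
        return out;
    }
}
```

Keeping the merge a pure function of its three inputs is what makes the result snapshot-able at deploy time.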
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
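The per-runtime entrypoint rule can be sketched as a switch. Assumptions are flagged in comments: the `/app/app.jar` path and the runtime-type strings are illustrative (only `-javaagent:/app/agent.jar`, the `-jar` vs `-cp` split, and direct native execution come from the paragraph above).

```java
import java.util.List;

// Hypothetical helper mirroring the entrypoint rules; not the actual builder.
final class EntrypointSketch {
    static List<String> command(String runtimeType, String mainClass) {
        return switch (runtimeType) {
            // JVM fat-jar runtimes: agent attached, launched via -jar.
            case "SPRING_BOOT", "QUARKUS" ->
                List.of("java", "-javaagent:/app/agent.jar", "-jar", "/app/app.jar");
            // Plain Java: agent attached, classpath launch with explicit main class.
            case "PLAIN_JAVA" ->
                List.of("java", "-javaagent:/app/agent.jar", "-cp", "/app/app.jar", mainClass);
            // Native binary: runs directly, no JVM and no -javaagent.
            case "NATIVE" ->
                List.of("/app/app");
            default -> throw new IllegalArgumentException("unresolved runtime: " + runtimeType);
        };
    }
}
```

The `default` branch matches the documented PRE_FLIGHT behavior: an unresolvable runtime fails the deployment rather than guessing.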
**Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to 409). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name — the name exists only for operator visibility.
**Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
- **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
- **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
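The rolling strategy's order of operations and failure semantics can be simulated with an event list. This is a behavioral sketch, not the executor: `roll` and the event strings are hypothetical, and per-replica health is passed in as a plain boolean array.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation of the rolling replace loop described above.
final class RollingSketch {
    /** Event sequence for N replicas given each new replica's health result. */
    static List<String> roll(boolean[] healthy) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < healthy.length; i++) {
            events.add("start new[" + i + "]");
            if (!healthy[i]) {
                events.add("rollback new[" + i + "]"); // only the in-flight container
                events.add("FAILED");                  // old[i..N) keep serving; old[0..i) stay stopped
                return events;
            }
            events.add("stop old[" + i + "]");         // slot confirmed healthy, retire old replica
        }
        events.add("COMPLETE");
        return events;
    }
}
```

Note the asymmetry the doc calls out: a failure at slot i rolls back only new[i]; the already-replaced slots are not restored.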
Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.
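The split between generation-agnostic routing keys and generation-carrying identity labels can be sketched as a label map. The `cameleer.*` identity keys follow the doc; the Traefik router rule syntax and router naming here are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: routing keys are stable across generations, identity labels are not.
final class LabelSketch {
    static Map<String, String> labels(String env, String app, int replica, String gen) {
        Map<String, String> l = new LinkedHashMap<>();
        String router = "cameleer-" + env + "-" + app;        // generation-agnostic key
        l.put("traefik.http.routers." + router + ".rule",
              "PathPrefix(`/" + env + "/" + app + "/`)");     // assumed rule syntax
        l.put("cameleer.replica", String.valueOf(replica));   // identity: per replica
        l.put("cameleer.generation", gen);                    // identity: per deployment
        l.put("cameleer.instance-id", env + "-" + app + "-" + replica + "-" + gen);
        return l;
    }
}
```

Because the key set is identical across generations, Traefik's Docker provider sees old and new replicas as members of the same service during the overlap.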
## Deployment Status Model
@@ -34,15 +43,11 @@ Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNET
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
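The delete-then-create rule can be sketched against an in-memory table. This is purely illustrative: the real `createDeployment` works against PostgreSQL, and `Row`, `appEnv`, and the status strings here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of createDeployment's terminal-row purge.
final class UniquenessSketch {
    record Row(String appEnv, String status) {}

    static void createDeployment(List<Row> table, String appEnv) {
        // Terminal rows (STOPPED/FAILED) for the same app+environment are purged first,
        // so at most one terminal row per app+env can never accumulate.
        table.removeIf(r -> r.appEnv().equals(appEnv)
                && (r.status().equals("STOPPED") || r.status().equals("FAILED")));
        table.add(new Row(appEnv, "STARTING"));
    }
}
```

Non-terminal rows (e.g. RUNNING) survive the purge; the blue/green path stops the previous active deployment separately.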
@@ -89,6 +89,34 @@ public class DeploymentExecutor {

    this.applicationConfigRepository = applicationConfigRepository;
}

/** Deployment-scoped id suffix — distinguishes container names and
 * CAMELEER_AGENT_INSTANCEID across redeploys so old + new replicas can
 * coexist during a blue/green swap. First 8 chars of the deployment UUID. */
static String generationOf(Deployment deployment) {
    return deployment.id().toString().substring(0, 8);
}

/**
 * Per-deployment context assembled once at the top of executeAsync and passed
 * into strategy handlers. Keeps the strategy methods readable instead of
 * threading 12 positional args.
 */
private record DeployCtx(
        Deployment deployment,
        App app,
        Environment env,
        ResolvedContainerConfig config,
        String jarPath,
        String resolvedRuntimeType,
        String mainClass,
        String generation,
        String primaryNetwork,
        List<String> additionalNets,
        Map<String, String> baseEnvVars,
        Map<String, String> prometheusLabels,
        long deployStart
) {}

@Async("deploymentTaskExecutor")
public void executeAsync(Deployment deployment) {
    long deployStart = System.currentTimeMillis();
@@ -96,6 +124,7 @@ public class DeploymentExecutor {

App app = appService.getById(deployment.appId());
Environment env = envService.getById(deployment.environmentId());
String jarPath = appService.resolveJarPath(deployment.appVersionId());
String generation = generationOf(deployment);

var globalDefaults = new ConfigMerger.GlobalRuntimeDefaults(
    parseMemoryLimitMb(globalMemoryLimit),
@@ -144,7 +173,6 @@ public class DeploymentExecutor {

updateStage(deployment.id(), DeployStage.CREATE_NETWORK);
// Primary network: use configured CAMELEER_DOCKER_NETWORK (tenant-isolated in SaaS mode)
String primaryNetwork = dockerNetwork;
List<String> additionalNets = new ArrayList<>();
if (networkManager != null) {
    networkManager.ensureNetwork(primaryNetwork);
@@ -152,7 +180,7 @@ public class DeploymentExecutor {

    networkManager.ensureNetwork(DockerNetworkManager.TRAEFIK_NETWORK);
    additionalNets.add(DockerNetworkManager.TRAEFIK_NETWORK);
    // Per-environment network scoped to tenant to prevent cross-tenant collisions
    String envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
    networkManager.ensureNetwork(envNet);
    additionalNets.add(envNet);
}
@@ -167,135 +195,21 @@ public class DeploymentExecutor {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// === STOP PREVIOUS ACTIVE DEPLOYMENT ===
|
DeployCtx ctx = new DeployCtx(
|
||||||
// Container names are deterministic ({tenant}-{env}-{app}-{replica}), so a
|
deployment, app, env, config, jarPath,
|
||||||
// previous active deployment holds the Docker names we need. Stop + remove
|
resolvedRuntimeType, mainClass, generation,
|
||||||
// it before starting new replicas to avoid a 409 name conflict. Excluding
|
primaryNetwork, additionalNets,
|
||||||
// the current deployment id by SQL (not Java) because the newly created
|
buildEnvVars(app, env, config),
|
||||||
// row already has status=STARTING and would otherwise be picked by
|
PrometheusLabelBuilder.build(resolvedRuntimeType),
|
||||||
// findActiveByAppIdAndEnvironmentId ORDER BY created_at DESC LIMIT 1.
|
deployStart);
|
||||||
Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
|
|
||||||
deployment.appId(), deployment.environmentId(), deployment.id());
|
// Dispatch on strategy. Unknown values fall back to BLUE_GREEN via fromWire.
|
||||||
if (previous.isPresent()) {
|
DeploymentStrategy strategy = DeploymentStrategy.fromWire(config.deploymentStrategy());
|
||||||
log.info("Stopping previous deployment {} before starting new replicas", previous.get().id());
|
switch (strategy) {
|
||||||
stopDeploymentContainers(previous.get());
|
case BLUE_GREEN -> deployBlueGreen(ctx);
|
||||||
deploymentService.markStopped(previous.get().id());
|
case ROLLING -> deployRolling(ctx);
|
||||||
}
|
}
|
||||||
|
|
||||||
// === START REPLICAS ===
|
|
||||||
updateStage(deployment.id(), DeployStage.START_REPLICAS);
|
|
||||||
|
|
||||||
Map<String, String> baseEnvVars = buildEnvVars(app, env, config);
|
|
||||||
Map<String, String> prometheusLabels = PrometheusLabelBuilder.build(resolvedRuntimeType);
|
|
||||||
|
|
||||||
List<Map<String, Object>> replicaStates = new ArrayList<>();
|
|
||||||
List<String> newContainerIds = new ArrayList<>();
|
|
||||||
|
|
||||||
for (int i = 0; i < config.replicas(); i++) {
|
|
||||||
String instanceId = env.slug() + "-" + app.slug() + "-" + i;
|
|
||||||
String containerName = tenantId + "-" + instanceId;
|
|
||||||
|
|
||||||
// Per-replica labels (include replica index and instance-id)
|
|
||||||
Map<String, String> labels = TraefikLabelBuilder.build(app.slug(), env.slug(), tenantId, config, i);
|
|
||||||
labels.putAll(prometheusLabels);
|
|
||||||
|
|
||||||
// Per-replica env vars (set agent instance ID to match container log identity)
|
|
||||||
Map<String, String> replicaEnvVars = new LinkedHashMap<>(baseEnvVars);
|
|
||||||
replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
|
|
||||||
|
|
||||||
String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
|
|
||||||
ContainerRequest request = new ContainerRequest(
|
|
||||||
containerName, baseImage, jarPath,
|
|
||||||
volumeName, jarStoragePath,
|
|
||||||
primaryNetwork,
|
|
||||||
additionalNets,
|
|
||||||
replicaEnvVars, labels,
|
|
||||||
config.memoryLimitBytes(), config.memoryReserveBytes(),
|
|
||||||
config.dockerCpuShares(), config.dockerCpuQuota(),
|
|
||||||
config.exposedPorts(), agentHealthPort,
|
|
||||||
"on-failure", 3,
|
|
||||||
resolvedRuntimeType, config.customArgs(), mainClass
|
|
||||||
);
|
|
||||||
|
|
||||||
String containerId = orchestrator.startContainer(request);
|
|
||||||
newContainerIds.add(containerId);
|
|
||||||
|
|
||||||
// Connect to additional networks after container is started
|
|
||||||
for (String net : additionalNets) {
|
|
||||||
if (networkManager != null) {
|
|
||||||
networkManager.connectContainer(containerId, net);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
|
|
||||||
|
|
||||||
replicaStates.add(Map.of(
|
|
||||||
"index", i,
|
|
||||||
"containerId", containerId,
|
|
||||||
"containerName", containerName,
|
|
||||||
"status", "STARTING"
|
|
||||||
));
|
|
||||||
}
|
|
||||||
|
|
||||||
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
|
|
||||||
|
|
||||||
// === HEALTH CHECK ===
|
|
||||||
updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
|
|
||||||
int healthyCount = waitForAnyHealthy(newContainerIds, healthCheckTimeout);
|
|
||||||
|
|
||||||
if (healthyCount == 0) {
|
|
||||||
for (String cid : newContainerIds) {
|
|
||||||
try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
|
|
||||||
catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
|
|
||||||
}
|
|
||||||
pgDeployRepo.updateDeployStage(deployment.id(), null);
|
|
||||||
deploymentService.markFailed(deployment.id(), "No replicas passed health check within " + healthCheckTimeout + "s");
|
|
||||||
serverMetrics.recordDeploymentOutcome("FAILED");
|
|
||||||
serverMetrics.recordDeploymentDuration(deployStart);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
|
|
||||||
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
|
|
||||||
|
|
||||||
// === SWAP TRAFFIC ===
|
|
||||||
// Traffic is routed via Traefik Docker labels, so the "swap" happens
|
|
||||||
// implicitly once the new replicas are healthy and the old containers
|
|
||||||
-            // are gone. The old deployment was already stopped before START_REPLICAS
-            // to free the deterministic container names.
-            updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
-
-            // === COMPLETE ===
-            updateStage(deployment.id(), DeployStage.COMPLETE);
-
-            // Capture config snapshot before marking RUNNING
-            ApplicationConfig agentConfig = applicationConfigRepository
-                    .findByApplicationAndEnvironment(app.slug(), env.slug())
-                    .orElse(null);
-            List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
-            DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
-                    deployment.appVersionId(),
-                    agentConfig,
-                    app.containerConfig(),
-                    snapshotSensitiveKeys
-            );
-            pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
-
-            String primaryContainerId = newContainerIds.get(0);
-            DeploymentStatus finalStatus = healthyCount == config.replicas()
-                    ? DeploymentStatus.RUNNING : DeploymentStatus.DEGRADED;
-            deploymentService.markRunning(deployment.id(), primaryContainerId);
-            if (finalStatus == DeploymentStatus.DEGRADED) {
-                deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.DEGRADED,
-                        primaryContainerId, null);
-            }
-
-            pgDeployRepo.updateDeployStage(deployment.id(), null);
-            serverMetrics.recordDeploymentOutcome(finalStatus.name());
-            serverMetrics.recordDeploymentDuration(deployStart);
-            log.info("Deployment {} is {} ({}/{} replicas healthy)",
-                    deployment.id(), finalStatus, healthyCount, config.replicas());
-
         } catch (Exception e) {
             log.error("Deployment {} FAILED: {}", deployment.id(), e.getMessage(), e);
             pgDeployRepo.updateDeployStage(deployment.id(), null);
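The comment removed in the hunk above points at the old naming constraint: deterministic container names forced the previous deployment to be stopped before START_REPLICAS, which the generation suffix (first 8 chars of the deployment UUID, per the summary above) removes. A minimal sketch of the two schemes; the helper names and the tenant/env/app values are illustrative, not taken from the codebase:

```java
import java.util.UUID;

public class NamingSketch {
    // Old scheme: {tenantId}-{envSlug}-{appSlug}-{replicaIndex}. Deterministic,
    // so old and new replicas with the same index collide on the Docker daemon.
    static String oldName(String tenant, String env, String app, int i) {
        return tenant + "-" + env + "-" + app + "-" + i;
    }

    // New scheme appends the first 8 chars of the deployment UUID, letting
    // both generations coexist during the swap.
    static String newName(String tenant, String env, String app, int i, UUID deploymentId) {
        return oldName(tenant, env, app, i) + "-" + deploymentId.toString().substring(0, 8);
    }

    public static void main(String[] args) {
        UUID blue = UUID.fromString("aaaaaaaa-0000-0000-0000-000000000000");
        UUID green = UUID.fromString("bbbbbbbb-0000-0000-0000-000000000000");
        // Same replica index, different deployments:
        System.out.println(oldName("t1", "prod", "api", 0));        // t1-prod-api-0 (both generations -> collision)
        System.out.println(newName("t1", "prod", "api", 0, blue));  // t1-prod-api-0-aaaaaaaa
        System.out.println(newName("t1", "prod", "api", 0, green)); // t1-prod-api-0-bbbbbbbb
    }
}
```

This is why the new strategies below can start replicas before touching the old deployment.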
@@ -305,6 +219,262 @@ public class DeploymentExecutor {
         }
     }
+
+    /**
+     * Blue/green strategy: start all N new replicas (coexisting with the old
+     * ones thanks to the gen-suffixed container names), wait for ALL healthy,
+     * then stop the previous deployment. Strict all-healthy — partial failure
+     * preserves the previous deployment untouched.
+     */
+    private void deployBlueGreen(DeployCtx ctx) {
+        ResolvedContainerConfig config = ctx.config();
+        Deployment deployment = ctx.deployment();
+
+        // === START REPLICAS ===
+        updateStage(deployment.id(), DeployStage.START_REPLICAS);
+        List<Map<String, Object>> replicaStates = new ArrayList<>();
+        List<String> newContainerIds = new ArrayList<>();
+        for (int i = 0; i < config.replicas(); i++) {
+            Map<String, Object> state = new LinkedHashMap<>();
+            String containerId = startReplica(ctx, i, state);
+            newContainerIds.add(containerId);
+            replicaStates.add(state);
+        }
+        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+        // === HEALTH CHECK ===
+        updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
+        int healthyCount = waitForAllHealthy(newContainerIds, healthCheckTimeout);
+
+        if (healthyCount < config.replicas()) {
+            // Strict abort: tear down new replicas, leave the previous deployment untouched.
+            for (String cid : newContainerIds) {
+                try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
+                catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
+            }
+            pgDeployRepo.updateDeployStage(deployment.id(), null);
+            String reason = String.format(
+                    "blue-green: %d/%d replicas healthy within %ds; preserving previous deployment",
+                    healthyCount, config.replicas(), healthCheckTimeout);
+            deploymentService.markFailed(deployment.id(), reason);
+            serverMetrics.recordDeploymentOutcome("FAILED");
+            serverMetrics.recordDeploymentDuration(ctx.deployStart());
+            return;
+        }
+
+        replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
+        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+        // === SWAP TRAFFIC ===
+        // All new replicas are healthy; Traefik labels are already attracting
+        // traffic to them. Stop the previous deployment now — the swap is
+        // implicit in the label-driven load balancer.
+        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
+        Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
+                deployment.appId(), deployment.environmentId(), deployment.id());
+        if (previous.isPresent()) {
+            log.info("blue-green: stopping previous deployment {} now that new replicas are healthy",
+                    previous.get().id());
+            stopDeploymentContainers(previous.get());
+            deploymentService.markStopped(previous.get().id());
+        }
+
+        // === COMPLETE ===
+        updateStage(deployment.id(), DeployStage.COMPLETE);
+        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
+        log.info("Deployment {} is RUNNING (blue-green, {}/{} replicas healthy)",
+                deployment.id(), healthyCount, config.replicas());
+    }
+
+    /**
+     * Rolling strategy: replace replicas one at a time — start new[i], wait
+     * healthy, stop old[i]. On any replica's health failure, stop the
+     * in-flight new container, leave remaining old replicas serving, mark
+     * FAILED. Already-replaced old containers are not restored (can't unring
+     * that bell) — user redeploys to recover.
+     *
+     * Resource peak: replicas + 1 (briefly while a new replica warms up
+     * before its counterpart is stopped).
+     */
+    private void deployRolling(DeployCtx ctx) {
+        ResolvedContainerConfig config = ctx.config();
+        Deployment deployment = ctx.deployment();
+
+        // Capture previous deployment's per-index container ids up front.
+        Optional<Deployment> previousOpt = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
+                deployment.appId(), deployment.environmentId(), deployment.id());
+        Map<Integer, String> oldContainerByIndex = new LinkedHashMap<>();
+        if (previousOpt.isPresent() && previousOpt.get().replicaStates() != null) {
+            for (Map<String, Object> r : previousOpt.get().replicaStates()) {
+                Object idx = r.get("index");
+                Object cid = r.get("containerId");
+                if (idx instanceof Number n && cid instanceof String s) {
+                    oldContainerByIndex.put(n.intValue(), s);
+                }
+            }
+        }
+
+        // === START REPLICAS ===
+        updateStage(deployment.id(), DeployStage.START_REPLICAS);
+        List<Map<String, Object>> replicaStates = new ArrayList<>();
+        List<String> newContainerIds = new ArrayList<>();
+
+        for (int i = 0; i < config.replicas(); i++) {
+            // Start new replica i (gen-suffixed name; coexists with old[i]).
+            Map<String, Object> state = new LinkedHashMap<>();
+            String newCid = startReplica(ctx, i, state);
+            newContainerIds.add(newCid);
+            replicaStates.add(state);
+            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+            // === HEALTH CHECK (per-replica) ===
+            updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
+            boolean healthy = waitForOneHealthy(newCid, healthCheckTimeout);
+            if (!healthy) {
+                // Abort: stop this in-flight new replica AND any new replicas
+                // started so far. Already-stopped old replicas stay stopped
+                // (rolling is not reversible). Remaining un-replaced old
+                // replicas keep serving traffic.
+                for (String cid : newContainerIds) {
+                    try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
+                    catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
+                }
+                pgDeployRepo.updateDeployStage(deployment.id(), null);
+                String reason = String.format(
+                        "rolling: replica %d failed to reach healthy within %ds; %d previous replicas still running",
+                        i, healthCheckTimeout, oldContainerByIndex.size());
+                deploymentService.markFailed(deployment.id(), reason);
+                serverMetrics.recordDeploymentOutcome("FAILED");
+                serverMetrics.recordDeploymentDuration(ctx.deployStart());
+                return;
+            }
+
+            // Health check passed: update replica status to RUNNING, stop the
+            // corresponding old[i] if present, and continue with replica i+1.
+            replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
+            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+            String oldCid = oldContainerByIndex.remove(i);
+            if (oldCid != null) {
+                try {
+                    orchestrator.stopContainer(oldCid);
+                    orchestrator.removeContainer(oldCid);
+                    log.info("rolling: replaced replica {} (old={}, new={})", i, oldCid, newCid);
+                } catch (Exception e) {
+                    log.warn("rolling: failed to stop old replica {} ({}): {}", i, oldCid, e.getMessage());
+                }
+            }
+        }
+
+        // === SWAP TRAFFIC ===
+        // Any old replicas with indices >= new.replicas (e.g., when replica
+        // count shrank) are still running; sweep them now so the old
+        // deployment can be marked STOPPED.
+        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
+        for (Map.Entry<Integer, String> e : oldContainerByIndex.entrySet()) {
+            try {
+                orchestrator.stopContainer(e.getValue());
+                orchestrator.removeContainer(e.getValue());
+                log.info("rolling: stopped leftover old replica {} ({})", e.getKey(), e.getValue());
+            } catch (Exception ex) {
+                log.warn("rolling: failed to stop leftover old replica {}: {}", e.getKey(), ex.getMessage());
+            }
+        }
+        if (previousOpt.isPresent()) {
+            deploymentService.markStopped(previousOpt.get().id());
+        }
+
+        // === COMPLETE ===
+        updateStage(deployment.id(), DeployStage.COMPLETE);
+        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
+        log.info("Deployment {} is RUNNING (rolling, {}/{} replicas replaced)",
+                deployment.id(), config.replicas(), config.replicas());
+    }
+
+    /** Poll a single container until healthy or the timeout expires. Returns
+     * true on healthy, false on timeout or thread interrupt. */
+    private boolean waitForOneHealthy(String containerId, int timeoutSeconds) {
+        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
+        while (System.currentTimeMillis() < deadline) {
+            ContainerStatus status = orchestrator.getContainerStatus(containerId);
+            if ("healthy".equals(status.state())) return true;
+            try { Thread.sleep(2000); } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                return false;
+            }
+        }
+        return false;
+    }
+
+    /** Start one replica container with the gen-suffixed name and return its
+     * container id. Fills `stateOut` with the replicaStates JSONB row. */
+    private String startReplica(DeployCtx ctx, int i, Map<String, Object> stateOut) {
+        Environment env = ctx.env();
+        App app = ctx.app();
+        ResolvedContainerConfig config = ctx.config();
+
+        String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + ctx.generation();
+        String containerName = tenantId + "-" + instanceId;
+
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                app.slug(), env.slug(), tenantId, config, i, ctx.generation());
+        labels.putAll(ctx.prometheusLabels());
+
+        Map<String, String> replicaEnvVars = new LinkedHashMap<>(ctx.baseEnvVars());
+        replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
+
+        String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
+        ContainerRequest request = new ContainerRequest(
+                containerName, baseImage, ctx.jarPath(),
+                volumeName, jarStoragePath,
+                ctx.primaryNetwork(),
+                ctx.additionalNets(),
+                replicaEnvVars, labels,
+                config.memoryLimitBytes(), config.memoryReserveBytes(),
+                config.dockerCpuShares(), config.dockerCpuQuota(),
+                config.exposedPorts(), agentHealthPort,
+                "on-failure", 3,
+                ctx.resolvedRuntimeType(), config.customArgs(), ctx.mainClass()
+        );
+
+        String containerId = orchestrator.startContainer(request);
+
+        // Connect to additional networks after container is started
+        for (String net : ctx.additionalNets()) {
+            if (networkManager != null) {
+                networkManager.connectContainer(containerId, net);
+            }
+        }
+
+        orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
+
+        stateOut.put("index", i);
+        stateOut.put("containerId", containerId);
+        stateOut.put("containerName", containerName);
+        stateOut.put("status", "STARTING");
+        return containerId;
+    }
+
+    /** Persist the deployment snapshot and mark the deployment RUNNING.
+     * Finalizes the deploy in a single place shared by all strategy paths. */
+    private void persistSnapshotAndMarkRunning(DeployCtx ctx, String primaryContainerId) {
+        Deployment deployment = ctx.deployment();
+        ApplicationConfig agentConfig = applicationConfigRepository
+                .findByApplicationAndEnvironment(ctx.app().slug(), ctx.env().slug())
+                .orElse(null);
+        List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
+        DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
+                deployment.appVersionId(),
+                agentConfig,
+                ctx.app().containerConfig(),
+                snapshotSensitiveKeys);
+        pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
+
+        deploymentService.markRunning(deployment.id(), primaryContainerId);
+        pgDeployRepo.updateDeployStage(deployment.id(), null);
+        serverMetrics.recordDeploymentOutcome("RUNNING");
+        serverMetrics.recordDeploymentDuration(ctx.deployStart());
+    }
 
     public void stopDeployment(Deployment deployment) {
         pgDeployRepo.updateTargetState(deployment.id(), "STOPPED");
         deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.STOPPING,
@@ -370,7 +540,10 @@ public class DeploymentExecutor {
         return envVars;
     }
 
-    private int waitForAnyHealthy(List<String> containerIds, int timeoutSeconds) {
+    /** Poll until all containers are healthy or the timeout expires. Returns
+     * the healthy count at return time — == ids.size() on full success, less
+     * if the timeout won. */
+    private int waitForAllHealthy(List<String> containerIds, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
        int lastHealthy = 0;
        while (System.currentTimeMillis() < deadline) {
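The "replicas + 1" resource-peak claim in the rolling Javadoc above can be checked with a toy simulation; the container id strings and the tracking list are illustrative only, not the real orchestrator state:

```java
import java.util.ArrayList;
import java.util.List;

public class RollingPeakSketch {
    /** Simulate a rolling replace of n replicas and return the peak number
     * of simultaneously running containers (old + new). */
    static int peakDuringRolling(int n) {
        List<String> running = new ArrayList<>();
        for (int i = 0; i < n; i++) running.add("old-" + i);
        int peak = running.size();
        for (int i = 0; i < n; i++) {
            running.add("new-" + i);        // start new[i]; it coexists with old[i]
            peak = Math.max(peak, running.size());
            running.remove("old-" + i);     // old[i] stopped once new[i] is healthy
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakDuringRolling(2)); // 3, i.e. replicas + 1
    }
}
```

Compare blue/green, which runs 2N containers at its peak: the rolling loop never holds more than one extra replica, which is why it suits memory-constrained hosts.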
@@ -10,9 +10,13 @@ public final class TraefikLabelBuilder {
     private TraefikLabelBuilder() {}
 
     public static Map<String, String> build(String appSlug, String envSlug, String tenantId,
-                                            ResolvedContainerConfig config, int replicaIndex) {
+                                            ResolvedContainerConfig config, int replicaIndex,
+                                            String generation) {
+        // Traefik router/service keys stay generation-agnostic so load balancing
+        // spans old + new replicas during a blue/green overlap. instance-id and
+        // the new generation label carry the per-deploy identity.
         String svc = envSlug + "-" + appSlug;
-        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex;
+        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
         Map<String, String> labels = new LinkedHashMap<>();
 
         labels.put("traefik.enable", "true");
@@ -21,6 +25,7 @@ public final class TraefikLabelBuilder {
         labels.put("cameleer.app", appSlug);
         labels.put("cameleer.environment", envSlug);
         labels.put("cameleer.replica", String.valueOf(replicaIndex));
+        labels.put("cameleer.generation", generation);
         labels.put("cameleer.instance-id", instanceId);
 
         labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
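The generation-agnostic service key can be illustrated with a reduced sketch of the naming shown in this hunk. The helper below is not the real TraefikLabelBuilder; the port value and label subset are assumptions for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelSketch {
    static Map<String, String> labels(String app, String env, int replica, String generation) {
        String svc = env + "-" + app;                                   // generation-agnostic key
        String instanceId = env + "-" + app + "-" + replica + "-" + generation;
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port", "8081");
        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id", instanceId);
        return labels;
    }

    public static void main(String[] args) {
        Map<String, String> oldGen = labels("api", "prod", 0, "aaaaaaaa");
        Map<String, String> newGen = labels("api", "prod", 0, "bbbbbbbb");
        // Both generations emit the same Traefik service key, so the
        // label-driven load balancer spans old and new during the overlap.
        System.out.println(oldGen.keySet().iterator().next()); // traefik.http.services.prod-api.loadbalancer.server.port
        System.out.println(newGen.keySet().iterator().next()); // traefik.http.services.prod-api.loadbalancer.server.port
    }
}
```

The per-deploy identity lives only in `cameleer.generation` and `cameleer.instance-id`, which downstream monitoring can use without affecting routing.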
@@ -0,0 +1,190 @@
+package com.cameleer.server.app.runtime;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.core.runtime.ContainerStatus;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentStatus;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.MediaType;
+import org.springframework.test.context.TestPropertySource;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.awaitility.Awaitility.await;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+/**
+ * Verifies the blue-green deployment strategy: start all new → health-check
+ * all → stop old. Strict all-healthy — partial failure preserves the previous
+ * deployment untouched.
+ */
+@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
+class BlueGreenStrategyIT extends AbstractPostgresIT {
+
+    @MockBean
+    RuntimeOrchestrator runtimeOrchestrator;
+
+    @Autowired private TestRestTemplate restTemplate;
+    @Autowired private ObjectMapper objectMapper;
+    @Autowired private TestSecurityHelper securityHelper;
+    @Autowired private PostgresDeploymentRepository deploymentRepository;
+
+    private String operatorJwt;
+    private String appSlug;
+    private String versionId;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        operatorJwt = securityHelper.operatorToken();
+
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
+
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+
+        appSlug = "bg-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps", String.format("""
+                {"slug": "%s", "displayName": "BG App"}
+                """, appSlug), operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
+                {"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "blue-green"}
+                """, operatorJwt);
+        versionId = uploadJar(appSlug, ("bg-jar-" + appSlug).getBytes());
+    }
+
+    @Test
+    void blueGreen_allHealthy_stopsOldAfterNew() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
+
+        // Previous deployment was stopped once new was healthy
+        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
+        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
+
+        verify(runtimeOrchestrator).stopContainer("old-0");
+        verify(runtimeOrchestrator).stopContainer("old-1");
+        verify(runtimeOrchestrator, never()).stopContainer("new-0");
+        verify(runtimeOrchestrator, never()).stopContainer("new-1");
+
+        // New deployment has both new replicas recorded
+        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
+        assertThat(second.replicaStates()).hasSize(2);
+    }
+
+    @Test
+    void blueGreen_partialHealthy_preservesOldAndMarksFailed() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.FAILED);
+
+        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
+        assertThat(second.errorMessage())
+                .contains("blue-green")
+                .contains("1/2");
+
+        // Previous deployment stays RUNNING — blue-green's safety promise.
+        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
+        assertThat(first.status()).isEqualTo(DeploymentStatus.RUNNING);
+
+        verify(runtimeOrchestrator, never()).stopContainer("old-0");
+        verify(runtimeOrchestrator, never()).stopContainer("old-1");
+        // Cleanup ran on both new replicas.
+        verify(runtimeOrchestrator).stopContainer("new-0");
+        verify(runtimeOrchestrator).stopContainer("new-1");
+    }
+
+    // ---- helpers ----
+
+    private String triggerDeploy() throws Exception {
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
+        return deployResponse.path("id").asText();
+    }
+
+    private void awaitStatus(String deployId, DeploymentStatus expected) {
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
+                    assertThat(d.status()).isEqualTo(expected);
+                });
+    }
+
+    private JsonNode post(String path, String json, String jwt) throws Exception {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        var response = restTemplate.exchange(path, HttpMethod.POST,
+                new HttpEntity<>(json, headers), String.class);
+        return objectMapper.readTree(response.getBody());
+    }
+
+    private void put(String path, String json, String jwt) {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        restTemplate.exchange(path, HttpMethod.PUT,
+                new HttpEntity<>(json, headers), String.class);
+    }
+
+    private String uploadJar(String appSlug, byte[] content) throws Exception {
+        ByteArrayResource resource = new ByteArrayResource(content) {
+            @Override public String getFilename() { return "app.jar"; }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + operatorJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+
+        var response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions",
+                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
+        JsonNode versionNode = objectMapper.readTree(response.getBody());
+        return versionNode.path("id").asText();
+    }
+}
@@ -0,0 +1,194 @@
|
|||||||
|
package com.cameleer.server.app.runtime;
|
||||||
|
|
||||||
|
import com.cameleer.server.app.AbstractPostgresIT;
|
||||||
|
import com.cameleer.server.app.TestSecurityHelper;
|
||||||
|
import com.cameleer.server.app.storage.PostgresDeploymentRepository;
|
||||||
|
import com.cameleer.server.core.runtime.ContainerStatus;
|
||||||
|
import com.cameleer.server.core.runtime.Deployment;
|
||||||
|
import com.cameleer.server.core.runtime.DeploymentStatus;
|
||||||
|
import com.cameleer.server.core.runtime.RuntimeOrchestrator;
|
||||||
|
import com.fasterxml.jackson.databind.JsonNode;
|
||||||
|
import com.fasterxml.jackson.databind.ObjectMapper;
|
||||||
|
import org.junit.jupiter.api.BeforeEach;
|
||||||
|
import org.junit.jupiter.api.Test;
|
||||||
|
import org.mockito.InOrder;
|
||||||
|
import org.springframework.beans.factory.annotation.Autowired;
|
||||||
|
import org.springframework.boot.test.mock.mockito.MockBean;
|
||||||
|
import org.springframework.boot.test.web.client.TestRestTemplate;
|
||||||
|
import org.springframework.core.io.ByteArrayResource;
|
||||||
|
import org.springframework.http.HttpEntity;
|
||||||
|
import org.springframework.http.HttpHeaders;
|
||||||
|
import org.springframework.http.HttpMethod;
|
||||||
|
import org.springframework.http.MediaType;
|
||||||
|
import org.springframework.test.context.TestPropertySource;
|
||||||
|
import org.springframework.util.LinkedMultiValueMap;
|
||||||
|
import org.springframework.util.MultiValueMap;
|
||||||
|
|
||||||
|
import java.util.UUID;
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
|
||||||
|
import static org.assertj.core.api.Assertions.assertThat;
|
||||||
|
import static org.awaitility.Awaitility.await;
|
||||||
|
import static org.mockito.ArgumentMatchers.any;
|
||||||
|
import static org.mockito.Mockito.inOrder;
|
||||||
|
import static org.mockito.Mockito.never;
|
||||||
|
import static org.mockito.Mockito.times;
|
||||||
|
import static org.mockito.Mockito.verify;
|
||||||
|
import static org.mockito.Mockito.when;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Verifies the rolling deployment strategy: per-replica start → health → stop
|
||||||
|
* old. Mid-rollout health failure preserves remaining un-replaced old replicas;
|
||||||
|
* already-stopped old replicas are not restored.
|
||||||
|
*/
|
||||||
|
@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
|
||||||
|
class RollingStrategyIT extends AbstractPostgresIT {
|
||||||
|
|
||||||
|
@MockBean
|
||||||
|
RuntimeOrchestrator runtimeOrchestrator;
|
||||||
|
|
||||||
|
@Autowired private TestRestTemplate restTemplate;
|
||||||
|
@Autowired private ObjectMapper objectMapper;
|
||||||
|
@Autowired private TestSecurityHelper securityHelper;
|
||||||
|
@Autowired private PostgresDeploymentRepository deploymentRepository;
|
||||||
|
|
||||||
|
private String operatorJwt;
|
||||||
|
private String appSlug;
|
||||||
|
private String versionId;
|
||||||
|
|
||||||
|
@BeforeEach
|
||||||
|
void setUp() throws Exception {
|
||||||
|
operatorJwt = securityHelper.operatorToken();
|
||||||
|
|
||||||
|
jdbcTemplate.update("DELETE FROM deployments");
|
||||||
|
jdbcTemplate.update("DELETE FROM app_versions");
|
||||||
|
jdbcTemplate.update("DELETE FROM apps");
|
||||||
|
jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
|
||||||
|
|
||||||
|
when(runtimeOrchestrator.isEnabled()).thenReturn(true);
|
||||||
|
|
||||||
|
appSlug = "roll-" + UUID.randomUUID().toString().substring(0, 8);
|
||||||
|
post("/api/v1/environments/default/apps", String.format("""
|
||||||
|
{"slug": "%s", "displayName": "Rolling App"}
|
||||||
|
""", appSlug), operatorJwt);
|
||||||
|
put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
|
||||||
|
{"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "rolling"}
|
||||||
|
""", operatorJwt);
|
||||||
|
versionId = uploadJar(appSlug, ("roll-jar-" + appSlug).getBytes());
|
||||||
|
}
|
||||||
|
|
||||||
|
    @Test
    void rolling_allHealthy_replacesOneByOne() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);

        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);

        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);

        // Rolling invariant: replicas are replaced one at a time, so a new
        // container start is interleaved BETWEEN the two old-container stops.
        // Stop order alone would not distinguish strategies (blue-green also
        // stops old-0 before old-1, just adjacently at the end with no starts
        // in between), so the interleaved start is verified as well.
        InOrder inOrder = inOrder(runtimeOrchestrator);
        inOrder.verify(runtimeOrchestrator).stopContainer("old-0");
        inOrder.verify(runtimeOrchestrator).startContainer(any()); // new-1 starts only after old-0 stops
        inOrder.verify(runtimeOrchestrator).stopContainer("old-1");

        // Total of 4 startContainer calls: 2 for first deploy, 2 for rolling.
        verify(runtimeOrchestrator, times(4)).startContainer(any());
        // New replicas were not stopped — they're the running ones now.
        verify(runtimeOrchestrator, never()).stopContainer("new-0");
        verify(runtimeOrchestrator, never()).stopContainer("new-1");

        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
    }

    @Test
    void rolling_failsMidRollout_preservesRemainingOld() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);

        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);

        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.FAILED);

        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
        assertThat(second.errorMessage())
                .contains("rolling")
                .contains("replica 1");

        // old-0 was replaced before the failure; old-1 was never touched.
        verify(runtimeOrchestrator).stopContainer("old-0");
        verify(runtimeOrchestrator, never()).stopContainer("old-1");
        // Cleanup stops both new replicas started so far.
        verify(runtimeOrchestrator).stopContainer("new-0");
        verify(runtimeOrchestrator).stopContainer("new-1");
    }

    // ---- helpers (same pattern as BlueGreenStrategyIT) ----

    private String triggerDeploy() throws Exception {
        JsonNode deployResponse = post(
                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
        return deployResponse.path("id").asText();
    }

    private void awaitStatus(String deployId, DeploymentStatus expected) {
        await().atMost(30, TimeUnit.SECONDS)
                .pollInterval(500, TimeUnit.MILLISECONDS)
                .untilAsserted(() -> {
                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
                    assertThat(d.status()).isEqualTo(expected);
                });
    }

    private JsonNode post(String path, String json, String jwt) throws Exception {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        var response = restTemplate.exchange(path, HttpMethod.POST,
                new HttpEntity<>(json, headers), String.class);
        return objectMapper.readTree(response.getBody());
    }

    private void put(String path, String json, String jwt) {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        restTemplate.exchange(path, HttpMethod.PUT,
                new HttpEntity<>(json, headers), String.class);
    }

    private String uploadJar(String appSlug, byte[] content) throws Exception {
        ByteArrayResource resource = new ByteArrayResource(content) {
            @Override public String getFilename() { return "app.jar"; }
        };
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", resource);

        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + operatorJwt);
        headers.set("X-Cameleer-Protocol-Version", "1");
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);

        var response = restTemplate.exchange(
                "/api/v1/environments/default/apps/" + appSlug + "/versions",
                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
        JsonNode versionNode = objectMapper.readTree(response.getBody());
        return versionNode.path("id").asText();
    }
}
@@ -0,0 +1,31 @@
package com.cameleer.server.core.runtime;

import java.util.Locale;

/**
 * Supported deployment strategies. Persisted as a kebab-case string on
 * ApplicationConfig / ResolvedContainerConfig; {@link #fromWire(String)} is
 * the only conversion entry point and falls back to {@link #BLUE_GREEN} for
 * unknown or null input so the executor never has to null-check.
 */
public enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    public String toWire() {
        return wire;
    }

    public static DeploymentStrategy fromWire(String value) {
        if (value == null) return BLUE_GREEN;
        // Locale.ROOT keeps the comparison locale-independent (e.g. Turkish dotless-i).
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (DeploymentStrategy s : values()) {
            if (s.wire.equals(normalized)) return s;
        }
        return BLUE_GREEN;
    }
}
@@ -0,0 +1,34 @@
package com.cameleer.server.core.runtime;

import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

class DeploymentStrategyTest {

    @Test
    void fromWire_knownValues() {
        assertThat(DeploymentStrategy.fromWire("blue-green")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("rolling")).isEqualTo(DeploymentStrategy.ROLLING);
    }

    @Test
    void fromWire_caseInsensitiveAndTrims() {
        assertThat(DeploymentStrategy.fromWire("BLUE-GREEN")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("  Rolling  ")).isEqualTo(DeploymentStrategy.ROLLING);
    }

    @Test
    void fromWire_unknownOrNullFallsBackToBlueGreen() {
        assertThat(DeploymentStrategy.fromWire(null)).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("canary")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
    }

    @Test
    void toWire_roundTrips() {
        for (DeploymentStrategy s : DeploymentStrategy.values()) {
            assertThat(DeploymentStrategy.fromWire(s.toWire())).isEqualTo(s);
        }
    }
}
225 docs/superpowers/plans/2026-04-23-deployment-strategies.md Normal file
@@ -0,0 +1,225 @@
# Deployment Strategies (blue-green + rolling) — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make `deploymentStrategy` actually affect runtime behavior. Support **blue-green** (all-at-once, default) and **rolling** (per-replica) deployments with correct semantics. Unblock real blue/green by giving each deployment a unique container-name generation suffix so old + new replicas can coexist during the swap.

**Current state (interim fix landed in `f8dccaae`):** the strategy field exists, but the executor doesn't branch on it; a destroy-then-start flow runs regardless. This plan replaces that interim behavior.

**Architecture:**
- Append an 8-char **`gen`** suffix (first 8 chars of `deployment.id`) to the container name AND `CAMELEER_AGENT_INSTANCEID`. Unique per deployment; no new DB state.
- Add a `cameleer.generation` Docker label so Grafana/Prometheus can pin deploy boundaries without regexing the instance-id.
- Branch `DeploymentExecutor.executeAsync` on strategy:
  - **blue-green**: start all N new → health-check all → stop all old. Strict all-healthy: partial = FAILED (old stays running).
  - **rolling**: per-replica loop: start new[i] → health-check → stop old[i] → next. Mid-rollout failure → stop the failed new[i], leave the remaining old[i..n] running, mark FAILED.
- Keep destroy-then-start as the fallback for unknown strategy values (safety net).
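
The branching above can be sketched as follows. This is a minimal, self-contained illustration, not the real `DeploymentExecutor` API: the nested `Strategy` enum mirrors the `fromWire` fallback contract of `DeploymentStrategy`, and `plan` is a hypothetical stand-in for the per-strategy `deployBlueGreen` / `deployRolling` flows.

```java
// Minimal sketch of the strategy dispatch; `Strategy` mirrors the
// DeploymentStrategy enum added in Phase 1, and `plan` is a hypothetical
// stand-in for the real deployBlueGreen / deployRolling helpers.
public class DispatchSketch {
    enum Strategy {
        BLUE_GREEN, ROLLING;

        // Same fallback contract as DeploymentStrategy.fromWire: unknown or
        // null input resolves to BLUE_GREEN, never throws.
        static Strategy fromWire(String v) {
            if (v == null) return BLUE_GREEN;
            return "rolling".equals(v.trim().toLowerCase(java.util.Locale.ROOT))
                    ? ROLLING : BLUE_GREEN;
        }
    }

    static String plan(String wireStrategy) {
        return switch (Strategy.fromWire(wireStrategy)) {
            // blue-green: start all N, require all healthy, only then stop old
            case BLUE_GREEN -> "start-all -> wait-all-healthy -> stop-old";
            // rolling: replace a single replica per iteration
            case ROLLING -> "per-replica: start new[i] -> wait-healthy -> stop old[i]";
        };
    }

    public static void main(String[] args) {
        System.out.println(plan("rolling"));
        System.out.println(plan("canary")); // unknown value falls back to blue-green
    }
}
```

Note how the fallback means the executor's dispatch never sees an unknown strategy; the destroy-then-start safety net is effectively unreachable once `fromWire` is the only entry point.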

**Reference:** interim-fix commit `f8dccaae`; investigation summary in the session log.

---

## File Structure

### Backend (new / modified)

- **Create:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java` — enum `BLUE_GREEN, ROLLING`; `fromWire(String)` with blue-green fallback; `toWire()` → "blue-green" / "rolling".
- **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java` — add `gen` computation, strategy branching, per-strategy START_REPLICAS + HEALTH_CHECK + SWAP_TRAFFIC flows. Rewrite the body of `executeAsync` so stages 4–6 dispatch on strategy. Extract helper methods `deployBlueGreen` and `deployRolling` to keep each path readable.
- **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java` — take a `gen` argument; emit the `cameleer.generation` label; `cameleer.instance-id` becomes `{envSlug}-{appSlug}-{replicaIndex}-{gen}`.
- **Modify:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` — `containerName` stored on the row stays `env.slug() + "-" + app.slug()` (unchanged — it is already just the group name for DB/operator visibility; the real Docker name is computed in the executor).
- **Modify:** `cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerIT.java` — update the single assertion that pins the `container_name` format, if any (spotted at line ~112 in the investigation).
- **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java` — two tests: the all-replicas-healthy path stops old after new, and partial-healthy aborts while preserving old.
- **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java` — two tests: happy rolling 3→3 replacement, and fail-on-replica-1 preserves the remaining old replicas.

### UI

- **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/ResourcesTab.tsx` — confirm the strategy dropdown offers "blue-green" and "rolling" with descriptive labels + a hint line.
- **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/StatusCard.tsx` — surface `deployment.deploymentStrategy` as a small text/badge near the version badge (read-only).

### Docs + rules

- **Modify:** `.claude/rules/docker-orchestration.md` — rewrite the "DeploymentExecutor Details" and "Blue/green strategy" sections to describe the new behavior and the `gen` suffix; retire the interim destroy-then-start note.
- **Modify:** `.claude/rules/app-classes.md` — update the `DeploymentExecutor` bullet under `runtime/`.
- **Modify:** `.claude/rules/core-classes.md` — note the new `DeploymentStrategy` enum under `runtime/`.

---

## Phase 1 — Core: DeploymentStrategy enum + gen utility

### Task 1.1: DeploymentStrategy enum

**Files:** Create `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java`.

- [ ] Create the enum with two constants `BLUE_GREEN`, `ROLLING`.
- [ ] Add `toWire()` returning `"blue-green"` / `"rolling"`.
- [ ] Add `fromWire(String)` — case-insensitive match; unknown or null → `BLUE_GREEN` with no throw (safety fallback). Returns an enum, never null.

**Verification:** unit test covering known + unknown + null inputs.

### Task 1.2: Generation suffix helper

- [ ] Decide location — an inline static helper on `DeploymentExecutor` is fine (`private static String gen(UUID id) { return id.toString().substring(0, 8); }`). No new file needed.

---

## Phase 2 — Executor: gen-suffixed naming + `cameleer.generation` label

This phase is purely the naming change; no strategy branching yet. After this phase, redeploy still uses the destroy-then-start interim flow, but containers carry the new names + label.

### Task 2.1: TraefikLabelBuilder — accept `gen`, emit generation label

**Files:** Modify `TraefikLabelBuilder.java`.

- [ ] Add `String gen` as a new arg on `build(...)`.
- [ ] Change the `instanceId` construction: `envSlug + "-" + appSlug + "-" + replicaIndex + "-" + gen`.
- [ ] Add label `cameleer.generation = gen`.
- [ ] Leave the Traefik router/service label keys using `svc = envSlug + "-" + appSlug` (unchanged — routing is generation-agnostic so load balancing across old + new works automatically).

### Task 2.2: DeploymentExecutor — compute gen once, thread through

**Files:** Modify `DeploymentExecutor.executeAsync`.

- [ ] At the top of the try block (after `env`, `app`, `config` resolution), compute `String gen = gen(deployment.id());`.
- [ ] In the replica loop: `String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + gen;` and `String containerName = tenantId + "-" + instanceId;`.
- [ ] Pass `gen` to `TraefikLabelBuilder.build(...)`.
- [ ] Set `CAMELEER_AGENT_INSTANCEID=instanceId` (already done; just verify the new value propagates).
- [ ] Leave `replicaStates[].containerName` stored as the new full name.
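
The naming rules in Tasks 2.1–2.2 reduce to a few string concatenations. A sketch, where the class and method names are illustrative and the slugs ("shop", "acme") are hypothetical; only the string formats match the plan:

```java
import java.util.UUID;

// Sketch of the gen-suffixed naming scheme; class/method names are
// illustrative, only the string formats match the plan.
public class NamingSketch {
    // First 8 chars of the deployment UUID = the generation suffix.
    static String gen(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // CAMELEER_AGENT_INSTANCEID value: {envSlug}-{appSlug}-{replicaIndex}-{gen}
    static String instanceId(String envSlug, String appSlug, int i, String gen) {
        return envSlug + "-" + appSlug + "-" + i + "-" + gen;
    }

    // Docker container name: {tenantId}-{instanceId}
    static String containerName(String tenantId, String instanceId) {
        return tenantId + "-" + instanceId;
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("f8dccaae-1111-2222-3333-444455556666");
        String g = gen(id);
        String iid = instanceId("default", "shop", 0, g);
        System.out.println(containerName("acme", iid)); // acme-default-shop-0-f8dccaae
    }
}
```

Because `gen` changes on every deployment, the old and new replica `i` never collide on the Docker daemon's global name namespace, which is what makes the blue/green coexistence possible at all.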

### Task 2.3: Update the one brittle test

**Files:** Modify `DeploymentControllerIT.java`.

- [ ] Relax the container-name assertion to `startsWith("default-default-deploy-test-")` or similar — verify behavior, not the exact suffix.

**Verification after Phase 2:**
- `mvn -pl cameleer-server-app -am test -Dtest=DeploymentSnapshotIT,DeploymentControllerIT,PostgresDeploymentRepositoryIT`
- All green; container names now include `gen`; redeploy still works via the interim destroy-then-start flow (replaced in Phase 3).

---

## Phase 3 — Blue-green strategy (default)

### Task 3.1: Extract `deployBlueGreen(...)` helper

**Files:** Modify `DeploymentExecutor.java`.

- [ ] Move the current START_REPLICAS → HEALTH_CHECK → SWAP_TRAFFIC body into a new `private void deployBlueGreen(...)` method.
- [ ] Signature: take `deployment`, `app`, `env`, `config`, `resolvedRuntimeType`, `mainClass`, `gen`, `primaryNetwork`, `additionalNets`.

### Task 3.2: Reorder for proper blue-green

- [ ] Remove the pre-flight "stop previous" block added in `f8dccaae` (replaced by the post-health swap).
- [ ] Order: start all new → wait for all healthy → find the previous active deployment (via `findActiveByAppIdAndEnvironmentIdExcluding`) → stop the old containers + mark the old row STOPPED.
- [ ] Strict all-healthy: if `healthyCount < config.replicas()`, stop the new containers we just started and mark the deployment FAILED with `"blue-green: %d/%d replicas healthy; preserving previous deployment"`. Do **not** touch the old deployment.

### Task 3.3: Wire strategy dispatch

- [ ] At the point where `deployBlueGreen` is called, check `DeploymentStrategy.fromWire(config.deploymentStrategy())` and dispatch. For this phase, always call `deployBlueGreen`.
- [ ] `ROLLING` dispatches to `deployRolling(...)`, implemented in Phase 4 (stub it to throw `UnsupportedOperationException` for now — the stub is replaced before this phase lands).

---

## Phase 4 — Rolling strategy

### Task 4.1: `deployRolling(...)` helper

**Files:** Modify `DeploymentExecutor.java`.

- [ ] Same signature as `deployBlueGreen`.
- [ ] Look up the previous deployment once at entry via `findActiveByAppIdAndEnvironmentIdExcluding`. Capture its `replicaStates` into a map keyed by replica index.
- [ ] For `i` from 0 to `config.replicas() - 1`:
  - [ ] Start new replica `i` (with the gen-suffixed name).
  - [ ] Wait for this single container to go healthy (per-replica `waitForOneHealthy(containerId, timeoutSeconds)`; reuse `healthCheckTimeout` per replica or introduce a smaller per-replica budget).
  - [ ] On success: stop the corresponding old replica `i` by `containerId` from the previous deployment's replicaStates (if present); log and continue.
  - [ ] On failure: stop + remove all new replicas started so far, and mark the deployment FAILED with `"rolling: replica %d failed to reach healthy; preserved %d previous replicas"`. Do **not** touch the already-replaced replicas from the previous deployment (they're already stopped) or the not-yet-replaced ones (they keep serving).
- [ ] After the loop succeeds for all replicas, mark the previous deployment row STOPPED (its containers are all stopped).
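
Reduced to its control flow, the loop above looks roughly like this. A sketch against a minimal orchestrator interface: `Orchestrator` and its methods (`start`, `waitHealthy`, `stop`) are hypothetical stand-ins for the real `RuntimeOrchestrator` calls, and persistence side effects are elided.

```java
import java.util.ArrayList;
import java.util.List;

// Control-flow sketch of deployRolling; the Orchestrator interface is a
// hypothetical stand-in for the real RuntimeOrchestrator collaborator.
public class RollingSketch {
    interface Orchestrator {
        String start(int replicaIndex);          // returns the new container id
        boolean waitHealthy(String containerId); // per-replica health budget
        void stop(String containerId);
    }

    /** Returns null on success, or the FAILED error message on mid-rollout failure. */
    static String rollOut(Orchestrator orch, List<String> oldContainers, int replicas) {
        List<String> started = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            String newId = orch.start(i);
            started.add(newId);
            if (!orch.waitHealthy(newId)) {
                // Clean up only the in-flight new containers; already-replaced
                // old replicas stay stopped, un-replaced ones keep serving.
                started.forEach(orch::stop);
                return "rolling: replica " + i + " failed to reach healthy; preserved "
                        + (replicas - i) + " previous replicas";
            }
            if (i < oldContainers.size()) {
                orch.stop(oldContainers.get(i)); // replace old[i] only after new[i] is healthy
            }
        }
        return null; // caller marks the previous deployment row STOPPED
    }
}
```

The key property is the ordering inside the loop: old replica `i` is only stopped after new replica `i` reports healthy, which is exactly the interleaving the integration tests below verify.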

### Task 4.2: Add `waitForOneHealthy`

- [ ] Variant of `waitForAnyHealthy` that polls a single container id. Returns a boolean. Same sleep cadence.
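
A possible shape for that variant, assuming the existing cadence is a fixed-interval sleep loop; the `Supplier` is a stand-in for `runtimeOrchestrator.getContainerStatus(containerId)` and the `"healthy"` string for its health field:

```java
import java.util.function.Supplier;

// Sketch of the single-container health poll; the Supplier stands in for
// runtimeOrchestrator.getContainerStatus(containerId).
public class HealthPollSketch {
    static boolean waitForOneHealthy(Supplier<String> status, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if ("healthy".equals(status.get())) return true;
            Thread.sleep(pollMillis);
        }
        return false; // caller treats this as the rolling FAILED path
    }
}
```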

### Task 4.3: Replace the Phase 3 stub

- [ ] The `ROLLING` dispatch calls `deployRolling` instead of throwing.

---

## Phase 5 — Integration tests

Each IT extends `AbstractPostgresIT`, uses `@MockBean RuntimeOrchestrator`, and overrides `cameleer.server.runtime.healthchecktimeout=2` via `@TestPropertySource`.

### Task 5.1: BlueGreenStrategyIT

**Files:** Create `BlueGreenStrategyIT.java`.

- [ ] **Test 1 `blueGreen_allHealthy_stopsOldAfterNew`:** seed a previous RUNNING deployment (2 replicas). Trigger a redeploy with `containerConfig.deploymentStrategy=blue-green` + replicas=2. Mock orchestrator: new containers return `healthy`. Await the new deployment RUNNING. Assert: the previous deployment has status STOPPED and its container IDs had `stopContainer` + `removeContainer` called; the new deployment's replicaStates contain the two new container IDs; the `cameleer.generation` label is on both new container requests.
- [ ] **Test 2 `blueGreen_partialHealthy_preservesOldAndMarksFailed`:** seed a previous RUNNING deployment (2 replicas). New deploy with replicas=2. Mock: container A healthy, container B starting forever. Await the new deployment FAILED. Assert: the previous deployment is still RUNNING; its container IDs were **not** stopped; the new deployment's errorMessage contains "1/2 replicas healthy".

### Task 5.2: RollingStrategyIT

**Files:** Create `RollingStrategyIT.java`.

- [ ] **Test 1 `rolling_allHealthy_replacesOneByOne`:** seed a previous RUNNING deployment (3 replicas). New deploy with strategy=rolling, replicas=3. Mock: new containers all healthy. Use an `ArgumentCaptor` on `startContainer` to observe start order. Assert: start[0] → stop[old0] → start[1] → stop[old1] → start[2] → stop[old2]; the new deployment is RUNNING with 3 replicaStates; the old deployment is STOPPED.
- [ ] **Test 2 `rolling_failsMidRollout_preservesRemainingOld`:** seed a previous RUNNING deployment (3 replicas). New deploy with strategy=rolling. Mock: new[0] healthy, new[1] never healthy. Await FAILED. Assert: new[0] was stopped during cleanup; old[0] was stopped (replaced before the failure); old[1] + old[2] are still RUNNING; the new deployment's errorMessage contains "replica 1".

---

## Phase 6 — UI strategy indicator

### Task 6.1: Strategy dropdown polish

**Files:** Modify `ResourcesTab.tsx`.

- [ ] Verify the `<select>` has options `blue-green` and `rolling`.
- [ ] Add a one-line description under the dropdown: "Blue-green: start all new, swap when healthy. Rolling: replace one replica at a time."

### Task 6.2: Strategy on StatusCard

**Files:** Modify `DeploymentTab/StatusCard.tsx`.

- [ ] Add a small, subtle text line in the grid: `<span>Strategy</span><span>{deployment.deploymentStrategy}</span>` (read-only; mono text is fine).

---

## Phase 7 — Docs + rules updates

### Task 7.1: Update `.claude/rules/docker-orchestration.md`

- [ ] Replace the "DeploymentExecutor Details" section with the new flow (gen suffix, strategy dispatch, per-strategy ordering).
- [ ] Update the "Deployment Status Model" table — `DEGRADED` now means "post-deploy replica crashed"; failed-during-deploy is always `FAILED`.
- [ ] Add a short "Deployment Strategies" section: behavior of blue-green vs rolling, resource peak, failure semantics.

### Task 7.2: Update `.claude/rules/app-classes.md`

- [ ] Under `runtime/` → the `DeploymentExecutor` bullet: add "branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`. Container name format: `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{gen}` where gen = the 8-char prefix of the deployment UUID."

### Task 7.3: Update `.claude/rules/core-classes.md`

- [ ] Add under `runtime/`: `DeploymentStrategy` — enum BLUE_GREEN, ROLLING; `fromWire` falls back to BLUE_GREEN; note it is stored as a kebab-case string on the config.

---

## Rollout sequence

1. Phase 1 (enum + helper) — trivial; land as one commit.
2. Phase 2 (naming + generation label) — one commit; the interim destroy-then-start flow is still active; regenerates no OpenAPI (no controller change).
3. Phase 3 (blue-green as default) — one commit replacing the interim flow. This is where real behavior changes.
4. Phase 4 (rolling) — one commit.
5. Phase 5 (4 ITs) — one commit; run `mvn test` against the affected modules.
6. Phase 6 (UI) — one commit; `npx tsc` clean.
7. Phase 7 (docs) — one commit.

Total: 7 commits, all atomic.

## Acceptance

- The existing `DeploymentSnapshotIT` still passes.
- The new `BlueGreenStrategyIT` (2 tests) and `RollingStrategyIT` (2 tests) pass.
- Browser QA: redeploying with `deploymentStrategy=blue-green` vs `rolling` produces the expected container timeline (inspect via `docker ps`); Prometheus metrics show continuity across deploys when queried by `{cameleer_app, cameleer_environment}`; the `cameleer_generation` label flips per deploy.
- `.claude/rules/docker-orchestration.md` reflects the new behavior.

## Non-goals

- Automatic rollback on blue-green partial failure (the old deployment is left running; the user redeploys).
- Automatic rollback on rolling mid-failure (the remaining old replicas keep running; the user redeploys).
- A per-replica `HEALTH_CHECK` stage label in the UI progress bar — the 7-stage progress is reused as-is; the strategy dictates internal looping.
- Strategy-field validation at container-config save time (the executor's `fromWire` fallback absorbs unknown values — consider a follow-up for strict validation if it becomes an issue).
@@ -172,15 +172,22 @@ export function ResourcesTab({ value, onChange, disabled, isProd = false }: Props
        />

        <span className={styles.configLabel}>Deploy Strategy</span>
        <div>
          <Select
            disabled={disabled}
            value={value.deployStrategy}
            onChange={(e) => update('deployStrategy', e.target.value)}
            options={[
              { value: 'blue-green', label: 'Blue/Green' },
              { value: 'rolling', label: 'Rolling' },
            ]}
          />
          <span className={styles.configHint}>
            {value.deployStrategy === 'rolling'
              ? 'Replace one replica at a time; peak = replicas + 1. Partial failure leaves remaining old replicas serving.'
              : 'Start all new replicas, swap once all are healthy; peak = 2 × replicas. Partial failure preserves the previous deployment.'}
          </span>
        </div>

        <span className={styles.configLabel}>Strip Path Prefix</span>
        <div className={styles.configInline}>
@@ -35,6 +35,7 @@ export function StatusCard({ deployment, version, externalUrl }: Props) {
        {version && <><span>JAR</span><MonoText size="sm">{version.jarFilename}</MonoText></>}
        {version && <><span>Checksum</span><MonoText size="xs">{version.jarChecksum.substring(0, 12)}</MonoText></>}
        <span>Replicas</span><span>{running}/{total}</span>
        <span>Strategy</span><span>{deployment.deploymentStrategy ?? '—'}</span>
        <span>URL</span>
        {deployment.status === 'RUNNING'
          ? <a href={externalUrl} target="_blank" rel="noreferrer"><MonoText size="sm">{externalUrl}</MonoText></a>