docs(rules): deployment strategies + generation suffix

Refresh the three rules files to match the new executor behavior:

- docker-orchestration.md: rewrite DeploymentExecutor Details with container naming scheme ({...}-{replica}-{generation}), strategy dispatch (blue-green vs rolling), and the new DEGRADED semantics (post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now post-deploy-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ui(deploy): strategy hint on Resources tab + indicator on StatusCard
2026-04-23 10:02:51 +02:00 · 2026-04-23 10:00:44 +02:00 · 2026-04-23 10:00:00 +02:00 · 2026-04-23 09:53:52 +02:00 · 2026-04-23 09:51:24 +02:00 · 2026-04-23 09:45:44 +02:00
12 changed files with 1020 additions and 154 deletions
--- a/.claude/rules/app-classes.md
+++ b/.claude/rules/app-classes.md
@@ -118,10 +118,10 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 ## runtime/ — Docker orchestration
 - `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
+- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
 - `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
 - `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
+- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
 - `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
 - `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
 - `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
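Review note: the naming scheme in the `DeploymentExecutor` and `TraefikLabelBuilder` bullets above can be sketched as below. This is a minimal illustration, not the executor's code: `ReplicaNaming` and its method signatures are hypothetical, and only the string layouts and the "first 8 chars of the deployment UUID" rule come from the rules text.

```java
import java.util.UUID;

/** Hypothetical helper (not in the diff): illustrates the naming scheme only. */
final class ReplicaNaming {
    private ReplicaNaming() {}

    /** Generation = first 8 characters of the deployment UUID. */
    static String generationOf(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** Layout: {envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String instanceId(String envSlug, String appSlug, int replicaIndex, String generation) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
    }

    /** Layout: {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation} */
    static String containerName(String tenantId, String instanceId) {
        return tenantId + "-" + instanceId;
    }
}
```

Two deployments of the same app and replica index differ only in the trailing generation, which is what lets old and new containers hold distinct Docker names at the same time.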
--- a/.claude/rules/core-classes.md
+++ b/.claude/rules/core-classes.md
@@ -29,8 +29,9 @@ paths:
 - `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
 - `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
 - `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
+- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED. `DEGRADED` is reserved for post-deploy drift (a replica died after RUNNING); `DeploymentExecutor` now marks partial-healthy deploys FAILED, not DEGRADED.
 - `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
 - `DeploymentStrategy` — enum: BLUE_GREEN, ROLLING. Stored on `ResolvedContainerConfig.deploymentStrategy` as kebab-case string (`"blue-green"` / `"rolling"`). `fromWire(String)` is the only conversion entry point; unknown/null inputs fall back to BLUE_GREEN so the executor dispatch site never null-checks or throws.
 - `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
 - `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
 - `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
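Review note: the `DeploymentStrategy` bullet above describes a total conversion function, which might look roughly like the sketch below. The enum constants, the kebab-case wire strings, and the BLUE_GREEN fallback come from the rules text; the `wire` field, the `toWire()` accessor, and the trim/lower-case normalization are assumptions made for illustration, and the real enum may differ.

```java
import java.util.Locale;

/** Sketch only: the real enum lives in the core module and may differ. */
enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    /** Hypothetical accessor for the stored kebab-case form. */
    String toWire() {
        return wire;
    }

    /** Never returns null and never throws: unknown or null input maps to BLUE_GREEN. */
    static DeploymentStrategy fromWire(String value) {
        if (value == null) {
            return BLUE_GREEN;
        }
        // Normalization (trim + lower-case) is an assumption, not confirmed by the diff.
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (DeploymentStrategy s : values()) {
            if (s.wire.equals(normalized)) {
                return s;
            }
        }
        return BLUE_GREEN;
    }
}
```

Because the function is total, a `switch` over its result at the dispatch site needs neither a null check nor a default branch.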
--- a/.claude/rules/docker-orchestration.md
+++ b/.claude/rules/docker-orchestration.md
@@ -13,19 +13,28 @@ paths:
 When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
 - **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
+- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap.
 - **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
 - **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
 - **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
 - **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
+- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
 - **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
 ## DeploymentExecutor Details
-Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
+Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
 **Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to 409). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name — the name is operator-visibility only.
 **Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
 - **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
 - **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
 Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.
 ## Deployment Status Model
@@ -34,15 +43,11 @@ Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNET
 | `STOPPED` | Intentionally stopped or initial state |
 | `STARTING` | Deploy in progress |
 | `RUNNING` | All replicas healthy and serving |
-| `DEGRADED` | Some replicas healthy, some dead |
+| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
 | `STOPPING` | Graceful shutdown in progress |
-| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
+| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |
-**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
+**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.
-**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
 **Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
 **Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
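Review note: the rolling semantics documented in this file (resource peak of replicas + 1, and un-replaced old replicas surviving a mid-rollout failure) can be checked with a small in-memory model. This is a sketch under stated assumptions: `RollingSketch`, `Outcome`, and the health predicate are hypothetical, and no Docker calls are made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical in-memory model of the rolling loop; not the executor's code. */
final class RollingSketch {
    private RollingSketch() {}

    /** Result: success flag, old replica indexes still serving, peak container count. */
    record Outcome(boolean succeeded, List<Integer> oldStillServing, int peakContainers) {}

    static Outcome roll(int replicas, IntPredicate newReplicaHealthy) {
        List<Integer> old = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            old.add(i);
        }
        int newRunning = 0;
        int peak = old.size();
        for (int i = 0; i < replicas; i++) {
            // Start new[i]; it coexists with old[i] while it warms up.
            peak = Math.max(peak, old.size() + newRunning + 1);
            if (!newReplicaHealthy.test(i)) {
                // Abort: new replicas are torn down, old replicas not yet stopped keep serving.
                return new Outcome(false, List.copyOf(old), peak);
            }
            newRunning++;
            old.remove(Integer.valueOf(i)); // Stop old[i] only after new[i] is healthy.
        }
        return new Outcome(true, List.copyOf(old), peak);
    }
}
```

For three replicas the modeled peak is four containers, compared with roughly six during the blue/green health-check window. If the second new replica never becomes healthy, old replicas 1 and 2 remain while old replica 0 is already gone, which is why a redeploy is needed to recover.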
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java
@@ -89,6 +89,34 @@ public class DeploymentExecutor {
        this.applicationConfigRepository = applicationConfigRepository;
    }
+    /** Deployment-scoped id suffix — distinguishes container names and
+     * CAMELEER_AGENT_INSTANCEID across redeploys so old + new replicas can
+     * coexist during a blue/green swap. First 8 chars of the deployment UUID. */
+    static String generationOf(Deployment deployment) {
+        return deployment.id().toString().substring(0, 8);
+    }
+    /**
+     * Per-deployment context assembled once at the top of executeAsync and passed
+     * into strategy handlers. Keeps the strategy methods readable instead of
+     * threading 12 positional args.
+     */
+    private record DeployCtx(
+            Deployment deployment,
+            App app,
+            Environment env,
+            ResolvedContainerConfig config,
+            String jarPath,
+            String resolvedRuntimeType,
+            String mainClass,
+            String generation,
+            String primaryNetwork,
+            List<String> additionalNets,
+            Map<String, String> baseEnvVars,
+            Map<String, String> prometheusLabels,
+            long deployStart
+    ) {}
    @Async("deploymentTaskExecutor")
    public void executeAsync(Deployment deployment) {
        long deployStart = System.currentTimeMillis();
@@ -96,6 +124,7 @@ public class DeploymentExecutor {
            App app = appService.getById(deployment.appId());
            Environment env = envService.getById(deployment.environmentId());
            String jarPath = appService.resolveJarPath(deployment.appVersionId());
+            String generation = generationOf(deployment);
            var globalDefaults = new ConfigMerger.GlobalRuntimeDefaults(
                    parseMemoryLimitMb(globalMemoryLimit),
@@ -144,7 +173,6 @@ public class DeploymentExecutor {
            updateStage(deployment.id(), DeployStage.CREATE_NETWORK);
            // Primary network: use configured CAMELEER_DOCKER_NETWORK (tenant-isolated in SaaS mode)
            String primaryNetwork = dockerNetwork;
-            String envNet = null;
            List<String> additionalNets = new ArrayList<>();
            if (networkManager != null) {
                networkManager.ensureNetwork(primaryNetwork);
@@ -152,7 +180,7 @@ public class DeploymentExecutor {
                networkManager.ensureNetwork(DockerNetworkManager.TRAEFIK_NETWORK);
                additionalNets.add(DockerNetworkManager.TRAEFIK_NETWORK);
                // Per-environment network scoped to tenant to prevent cross-tenant collisions
-                envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
+                String envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
                networkManager.ensureNetwork(envNet);
                additionalNets.add(envNet);
            }
@@ -167,135 +195,21 @@ public class DeploymentExecutor {
                }
            }
-            // === STOP PREVIOUS ACTIVE DEPLOYMENT ===
-            // Container names are deterministic ({tenant}-{env}-{app}-{replica}), so a
-            // previous active deployment holds the Docker names we need. Stop + remove
-            // it before starting new replicas to avoid a 409 name conflict. Excluding
-            // the current deployment id by SQL (not Java) because the newly created
-            // row already has status=STARTING and would otherwise be picked by
-            // findActiveByAppIdAndEnvironmentId ORDER BY created_at DESC LIMIT 1.
-            Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
-                    deployment.appId(), deployment.environmentId(), deployment.id());
-            if (previous.isPresent()) {
-                log.info("Stopping previous deployment {} before starting new replicas", previous.get().id());
-                stopDeploymentContainers(previous.get());
-                deploymentService.markStopped(previous.get().id());
+            DeployCtx ctx = new DeployCtx(
+                    deployment, app, env, config, jarPath,
+                    resolvedRuntimeType, mainClass, generation,
+                    primaryNetwork, additionalNets,
+                    buildEnvVars(app, env, config),
+                    PrometheusLabelBuilder.build(resolvedRuntimeType),
+                    deployStart);
+
+            // Dispatch on strategy. Unknown values fall back to BLUE_GREEN via fromWire.
+            DeploymentStrategy strategy = DeploymentStrategy.fromWire(config.deploymentStrategy());
+            switch (strategy) {
+                case BLUE_GREEN -> deployBlueGreen(ctx);
+                case ROLLING -> deployRolling(ctx);
            }
-            // === START REPLICAS ===
-            updateStage(deployment.id(), DeployStage.START_REPLICAS);
-            Map<String, String> baseEnvVars = buildEnvVars(app, env, config);
-            Map<String, String> prometheusLabels = PrometheusLabelBuilder.build(resolvedRuntimeType);
-            List<Map<String, Object>> replicaStates = new ArrayList<>();
-            List<String> newContainerIds = new ArrayList<>();
-            for (int i = 0; i < config.replicas(); i++) {
-                String instanceId = env.slug() + "-" + app.slug() + "-" + i;
-                String containerName = tenantId + "-" + instanceId;
-                // Per-replica labels (include replica index and instance-id)
-                Map<String, String> labels = TraefikLabelBuilder.build(app.slug(), env.slug(), tenantId, config, i);
-                labels.putAll(prometheusLabels);
-                // Per-replica env vars (set agent instance ID to match container log identity)
-                Map<String, String> replicaEnvVars = new LinkedHashMap<>(baseEnvVars);
-                replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
-                String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
-                ContainerRequest request = new ContainerRequest(
-                        containerName, baseImage, jarPath,
-                        volumeName, jarStoragePath,
-                        primaryNetwork,
-                        additionalNets,
-                        replicaEnvVars, labels,
-                        config.memoryLimitBytes(), config.memoryReserveBytes(),
-                        config.dockerCpuShares(), config.dockerCpuQuota(),
-                        config.exposedPorts(), agentHealthPort,
-                        "on-failure", 3,
-                        resolvedRuntimeType, config.customArgs(), mainClass
-                );
-                String containerId = orchestrator.startContainer(request);
-                newContainerIds.add(containerId);
-                // Connect to additional networks after container is started
-                for (String net : additionalNets) {
-                    if (networkManager != null) {
-                        networkManager.connectContainer(containerId, net);
-                    }
-                }
-                orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
-                replicaStates.add(Map.of(
-                        "index", i,
-                        "containerId", containerId,
-                        "containerName", containerName,
-                        "status", "STARTING"
-                ));
-            }
-            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
-            // === HEALTH CHECK ===
-            updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
-            int healthyCount = waitForAnyHealthy(newContainerIds, healthCheckTimeout);
-            if (healthyCount == 0) {
-                for (String cid : newContainerIds) {
-                    try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
-                    catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
-                }
-                pgDeployRepo.updateDeployStage(deployment.id(), null);
-                deploymentService.markFailed(deployment.id(), "No replicas passed health check within " + healthCheckTimeout + "s");
-                serverMetrics.recordDeploymentOutcome("FAILED");
-                serverMetrics.recordDeploymentDuration(deployStart);
-                return;
-            }
-            replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
-            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
-            // === SWAP TRAFFIC ===
-            // Traffic is routed via Traefik Docker labels, so the "swap" happens
-            // implicitly once the new replicas are healthy and the old containers
-            // are gone. The old deployment was already stopped before START_REPLICAS
-            // to free the deterministic container names.
-            updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
-            // === COMPLETE ===
-            updateStage(deployment.id(), DeployStage.COMPLETE);
-            // Capture config snapshot before marking RUNNING
-            ApplicationConfig agentConfig = applicationConfigRepository
-                    .findByApplicationAndEnvironment(app.slug(), env.slug())
-                    .orElse(null);
-            List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
-            DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
-                    deployment.appVersionId(),
-                    agentConfig,
-                    app.containerConfig(),
-                    snapshotSensitiveKeys
-            );
-            pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
-            String primaryContainerId = newContainerIds.get(0);
-            DeploymentStatus finalStatus = healthyCount == config.replicas()
-                    ? DeploymentStatus.RUNNING : DeploymentStatus.DEGRADED;
-            deploymentService.markRunning(deployment.id(), primaryContainerId);
-            if (finalStatus == DeploymentStatus.DEGRADED) {
-                deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.DEGRADED,
-                        primaryContainerId, null);
-            }
-            pgDeployRepo.updateDeployStage(deployment.id(), null);
-            serverMetrics.recordDeploymentOutcome(finalStatus.name());
-            serverMetrics.recordDeploymentDuration(deployStart);
-            log.info("Deployment {} is {} ({}/{} replicas healthy)",
-                    deployment.id(), finalStatus, healthyCount, config.replicas());
        } catch (Exception e) {
            log.error("Deployment {} FAILED: {}", deployment.id(), e.getMessage(), e);
            pgDeployRepo.updateDeployStage(deployment.id(), null);
@@ -305,6 +219,262 @@ public class DeploymentExecutor {
        }
    }
    /**
     * Blue/green strategy: start all N new replicas (coexisting with the old
     * ones thanks to the gen-suffixed container names), wait for ALL healthy,
     * then stop the previous deployment. Strict all-healthy — partial failure
     * preserves the previous deployment untouched.
     */
    private void deployBlueGreen(DeployCtx ctx) {
        ResolvedContainerConfig config = ctx.config();
        Deployment deployment = ctx.deployment();
        // === START REPLICAS ===
        updateStage(deployment.id(), DeployStage.START_REPLICAS);
        List<Map<String, Object>> replicaStates = new ArrayList<>();
        List<String> newContainerIds = new ArrayList<>();
        for (int i = 0; i < config.replicas(); i++) {
            Map<String, Object> state = new LinkedHashMap<>();
            String containerId = startReplica(ctx, i, state);
            newContainerIds.add(containerId);
            replicaStates.add(state);
        }
        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
        // === HEALTH CHECK ===
        updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
        int healthyCount = waitForAllHealthy(newContainerIds, healthCheckTimeout);
        if (healthyCount < config.replicas()) {
            // Strict abort: tear down new replicas, leave the previous deployment untouched.
            for (String cid : newContainerIds) {
                try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
                catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
            }
            pgDeployRepo.updateDeployStage(deployment.id(), null);
            String reason = String.format(
                    "blue-green: %d/%d replicas healthy within %ds; preserving previous deployment",
                    healthyCount, config.replicas(), healthCheckTimeout);
            deploymentService.markFailed(deployment.id(), reason);
            serverMetrics.recordDeploymentOutcome("FAILED");
            serverMetrics.recordDeploymentDuration(ctx.deployStart());
            return;
        }
        replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
        // === SWAP TRAFFIC ===
        // All new replicas are healthy; Traefik labels are already attracting
        // traffic to them. Stop the previous deployment now — the swap is
        // implicit in the label-driven load balancer.
        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
        Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
                deployment.appId(), deployment.environmentId(), deployment.id());
        if (previous.isPresent()) {
            log.info("blue-green: stopping previous deployment {} now that new replicas are healthy",
                    previous.get().id());
            stopDeploymentContainers(previous.get());
            deploymentService.markStopped(previous.get().id());
        }
        // === COMPLETE ===
        updateStage(deployment.id(), DeployStage.COMPLETE);
        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
        log.info("Deployment {} is RUNNING (blue-green, {}/{} replicas healthy)",
                deployment.id(), healthyCount, config.replicas());
    }
    /**
     * Rolling strategy: replace replicas one at a time — start new[i], wait
     * healthy, stop old[i]. On any replica's health failure, stop the
     * in-flight new container, leave remaining old replicas serving, mark
     * FAILED. Already-replaced old containers are not restored (can't unring
     * that bell) — user redeploys to recover.
     *
     * Resource peak: replicas + 1 (briefly while a new replica warms up
     * before its counterpart is stopped).
     */
    private void deployRolling(DeployCtx ctx) {
        ResolvedContainerConfig config = ctx.config();
        Deployment deployment = ctx.deployment();
        // Capture previous deployment's per-index container ids up front.
        Optional<Deployment> previousOpt = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
                deployment.appId(), deployment.environmentId(), deployment.id());
        Map<Integer, String> oldContainerByIndex = new LinkedHashMap<>();
        if (previousOpt.isPresent() && previousOpt.get().replicaStates() != null) {
            for (Map<String, Object> r : previousOpt.get().replicaStates()) {
                Object idx = r.get("index");
                Object cid = r.get("containerId");
                if (idx instanceof Number n && cid instanceof String s) {
                    oldContainerByIndex.put(n.intValue(), s);
                }
            }
        }
        // === START REPLICAS ===
        updateStage(deployment.id(), DeployStage.START_REPLICAS);
        List<Map<String, Object>> replicaStates = new ArrayList<>();
        List<String> newContainerIds = new ArrayList<>();
        for (int i = 0; i < config.replicas(); i++) {
            // Start new replica i (gen-suffixed name; coexists with old[i]).
            Map<String, Object> state = new LinkedHashMap<>();
            String newCid = startReplica(ctx, i, state);
            newContainerIds.add(newCid);
            replicaStates.add(state);
            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
            // === HEALTH CHECK (per-replica) ===
            updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
            boolean healthy = waitForOneHealthy(newCid, healthCheckTimeout);
            if (!healthy) {
                // Abort: stop this in-flight new replica AND any new replicas
                // started so far. Already-stopped old replicas stay stopped
                // (rolling is not reversible). Remaining un-replaced old
                // replicas keep serving traffic.
                for (String cid : newContainerIds) {
                    try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
                    catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
                }
                pgDeployRepo.updateDeployStage(deployment.id(), null);
                String reason = String.format(
                        "rolling: replica %d failed to reach healthy within %ds; %d previous replicas still running",
                        i, healthCheckTimeout, oldContainerByIndex.size());
                deploymentService.markFailed(deployment.id(), reason);
                serverMetrics.recordDeploymentOutcome("FAILED");
                serverMetrics.recordDeploymentDuration(ctx.deployStart());
                return;
            }
            // Health check passed: update replica status to RUNNING, stop the
            // corresponding old[i] if present, and continue with replica i+1.
            replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
            String oldCid = oldContainerByIndex.remove(i);
            if (oldCid != null) {
                try {
                    orchestrator.stopContainer(oldCid);
                    orchestrator.removeContainer(oldCid);
                    log.info("rolling: replaced replica {} (old={}, new={})", i, oldCid, newCid);
                } catch (Exception e) {
                    log.warn("rolling: failed to stop old replica {} ({}): {}", i, oldCid, e.getMessage());
                }
            }
        }
        // === SWAP TRAFFIC ===
        // Any old replicas with indices >= new.replicas (e.g., when replica
        // count shrank) are still running; sweep them now so the old
        // deployment can be marked STOPPED.
        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
        for (Map.Entry<Integer, String> e : oldContainerByIndex.entrySet()) {
            try {
                orchestrator.stopContainer(e.getValue());
                orchestrator.removeContainer(e.getValue());
                log.info("rolling: stopped leftover old replica {} ({})", e.getKey(), e.getValue());
            } catch (Exception ex) {
                log.warn("rolling: failed to stop leftover old replica {}: {}", e.getKey(), ex.getMessage());
            }
        }
        if (previousOpt.isPresent()) {
            deploymentService.markStopped(previousOpt.get().id());
        }
        // === COMPLETE ===
        updateStage(deployment.id(), DeployStage.COMPLETE);
        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
        log.info("Deployment {} is RUNNING (rolling, {}/{} replicas replaced)",
                deployment.id(), config.replicas(), config.replicas());
    }
    /** Poll a single container until healthy or the timeout expires. Returns
     * true on healthy, false on timeout or thread interrupt. */
    private boolean waitForOneHealthy(String containerId, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
        while (System.currentTimeMillis() < deadline) {
            ContainerStatus status = orchestrator.getContainerStatus(containerId);
            if ("healthy".equals(status.state())) return true;
            try { Thread.sleep(2000); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
    /** Start one replica container with the gen-suffixed name and return its
     * container id. Fills `stateOut` with the replicaStates JSONB row. */
    private String startReplica(DeployCtx ctx, int i, Map<String, Object> stateOut) {
        Environment env = ctx.env();
        App app = ctx.app();
        ResolvedContainerConfig config = ctx.config();
        String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + ctx.generation();
        String containerName = tenantId + "-" + instanceId;
        Map<String, String> labels = TraefikLabelBuilder.build(
                app.slug(), env.slug(), tenantId, config, i, ctx.generation());
        labels.putAll(ctx.prometheusLabels());
        Map<String, String> replicaEnvVars = new LinkedHashMap<>(ctx.baseEnvVars());
        replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
        String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
        ContainerRequest request = new ContainerRequest(
                containerName, baseImage, ctx.jarPath(),
                volumeName, jarStoragePath,
                ctx.primaryNetwork(),
                ctx.additionalNets(),
                replicaEnvVars, labels,
                config.memoryLimitBytes(), config.memoryReserveBytes(),
                config.dockerCpuShares(), config.dockerCpuQuota(),
                config.exposedPorts(), agentHealthPort,
                "on-failure", 3,
                ctx.resolvedRuntimeType(), config.customArgs(), ctx.mainClass()
        );
        String containerId = orchestrator.startContainer(request);
        // Connect to additional networks after container is started
        for (String net : ctx.additionalNets()) {
            if (networkManager != null) {
                networkManager.connectContainer(containerId, net);
            }
        }
        orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
        stateOut.put("index", i);
        stateOut.put("containerId", containerId);
        stateOut.put("containerName", containerName);
        stateOut.put("status", "STARTING");
        return containerId;
    }
    /** Persist the deployment snapshot and mark the deployment RUNNING.
     * Finalizes the deploy in a single place shared by all strategy paths. */
    private void persistSnapshotAndMarkRunning(DeployCtx ctx, String primaryContainerId) {
        Deployment deployment = ctx.deployment();
        ApplicationConfig agentConfig = applicationConfigRepository
                .findByApplicationAndEnvironment(ctx.app().slug(), ctx.env().slug())
                .orElse(null);
        List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
        DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
                deployment.appVersionId(),
                agentConfig,
                ctx.app().containerConfig(),
                snapshotSensitiveKeys);
        pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
        deploymentService.markRunning(deployment.id(), primaryContainerId);
        pgDeployRepo.updateDeployStage(deployment.id(), null);
        serverMetrics.recordDeploymentOutcome("RUNNING");
        serverMetrics.recordDeploymentDuration(ctx.deployStart());
    }
    public void stopDeployment(Deployment deployment) {
        pgDeployRepo.updateTargetState(deployment.id(), "STOPPED");
        deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.STOPPING,
@@ -370,7 +540,10 @@ public class DeploymentExecutor {
        return envVars;
    }
-    private int waitForAnyHealthy(List<String> containerIds, int timeoutSeconds) {
+    /** Poll until all containers are healthy or the timeout expires. Returns
     * the healthy count at return time: equal to containerIds.size() on full
     * success, lower if the timeout expired first. */
    private int waitForAllHealthy(List<String> containerIds, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
        int lastHealthy = 0;
        while (System.currentTimeMillis() < deadline) {
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java
@@ -10,9 +10,13 @@ public final class TraefikLabelBuilder {
    private TraefikLabelBuilder() {}
    public static Map<String, String> build(String appSlug, String envSlug, String tenantId,
-                                              ResolvedContainerConfig config, int replicaIndex) {
+                                              ResolvedContainerConfig config, int replicaIndex,
                                              String generation) {
        // Traefik router/service keys stay generation-agnostic so load balancing
        // spans old + new replicas during a blue/green overlap. instance-id and
        // the new generation label carry the per-deploy identity.
        String svc = envSlug + "-" + appSlug;
-        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex;
+        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.enable", "true");
@@ -21,6 +25,7 @@ public final class TraefikLabelBuilder {
        labels.put("cameleer.app", appSlug);
        labels.put("cameleer.environment", envSlug);
        labels.put("cameleer.replica", String.valueOf(replicaIndex));
        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id", instanceId);
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java
@@ -0,0 +1,190 @@
 package com.cameleer.server.app.runtime;
 import com.cameleer.server.app.AbstractPostgresIT;
 import com.cameleer.server.app.TestSecurityHelper;
 import com.cameleer.server.app.storage.PostgresDeploymentRepository;
 import com.cameleer.server.core.runtime.ContainerStatus;
 import com.cameleer.server.core.runtime.Deployment;
 import com.cameleer.server.core.runtime.DeploymentStatus;
 import com.cameleer.server.core.runtime.RuntimeOrchestrator;
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.boot.test.mock.mockito.MockBean;
 import org.springframework.boot.test.web.client.TestRestTemplate;
 import org.springframework.core.io.ByteArrayResource;
 import org.springframework.http.HttpEntity;
 import org.springframework.http.HttpHeaders;
 import org.springframework.http.HttpMethod;
 import org.springframework.http.MediaType;
 import org.springframework.test.context.TestPropertySource;
 import org.springframework.util.LinkedMultiValueMap;
 import org.springframework.util.MultiValueMap;
 import java.util.UUID;
 import java.util.concurrent.TimeUnit;
 import static org.assertj.core.api.Assertions.assertThat;
 import static org.awaitility.Awaitility.await;
 import static org.mockito.ArgumentMatchers.any;
 import static org.mockito.Mockito.never;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 /**
 * Verifies the blue-green deployment strategy: start all new → health-check
 * all → stop old. Strict all-healthy — partial failure preserves the previous
 * deployment untouched.
 */
@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
 class BlueGreenStrategyIT extends AbstractPostgresIT {
    @MockBean
    RuntimeOrchestrator runtimeOrchestrator;
    @Autowired private TestRestTemplate restTemplate;
    @Autowired private ObjectMapper objectMapper;
    @Autowired private TestSecurityHelper securityHelper;
    @Autowired private PostgresDeploymentRepository deploymentRepository;
    private String operatorJwt;
    private String appSlug;
    private String versionId;
    @BeforeEach
    void setUp() throws Exception {
        operatorJwt = securityHelper.operatorToken();
        jdbcTemplate.update("DELETE FROM deployments");
        jdbcTemplate.update("DELETE FROM app_versions");
        jdbcTemplate.update("DELETE FROM apps");
        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
        appSlug = "bg-" + UUID.randomUUID().toString().substring(0, 8);
        post("/api/v1/environments/default/apps", String.format("""
                {"slug": "%s", "displayName": "BG App"}
                """, appSlug), operatorJwt);
        put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
                {"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "blue-green"}
                """, operatorJwt);
        versionId = uploadJar(appSlug, ("bg-jar-" + appSlug).getBytes());
    }
    @Test
    void blueGreen_allHealthy_stopsOldAfterNew() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
        // Previous deployment was stopped once new was healthy
        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
        verify(runtimeOrchestrator).stopContainer("old-0");
        verify(runtimeOrchestrator).stopContainer("old-1");
        verify(runtimeOrchestrator, never()).stopContainer("new-0");
        verify(runtimeOrchestrator, never()).stopContainer("new-1");
        // New deployment has both new replicas recorded
        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
        assertThat(second.replicaStates()).hasSize(2);
    }
    @Test
    void blueGreen_partialHealthy_preservesOldAndMarksFailed() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.FAILED);
        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
        assertThat(second.errorMessage())
                .contains("blue-green")
                .contains("1/2");
        // Previous deployment stays RUNNING — blue-green's safety promise.
        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
        assertThat(first.status()).isEqualTo(DeploymentStatus.RUNNING);
        verify(runtimeOrchestrator, never()).stopContainer("old-0");
        verify(runtimeOrchestrator, never()).stopContainer("old-1");
        // Cleanup ran on both new replicas.
        verify(runtimeOrchestrator).stopContainer("new-0");
        verify(runtimeOrchestrator).stopContainer("new-1");
    }
    // ---- helpers ----
    private String triggerDeploy() throws Exception {
        JsonNode deployResponse = post(
                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
        return deployResponse.path("id").asText();
    }
    private void awaitStatus(String deployId, DeploymentStatus expected) {
        await().atMost(30, TimeUnit.SECONDS)
                .pollInterval(500, TimeUnit.MILLISECONDS)
                .untilAsserted(() -> {
                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
                    assertThat(d.status()).isEqualTo(expected);
                });
    }
    private JsonNode post(String path, String json, String jwt) throws Exception {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        var response = restTemplate.exchange(path, HttpMethod.POST,
                new HttpEntity<>(json, headers), String.class);
        return objectMapper.readTree(response.getBody());
    }
    private void put(String path, String json, String jwt) {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        restTemplate.exchange(path, HttpMethod.PUT,
                new HttpEntity<>(json, headers), String.class);
    }
    private String uploadJar(String appSlug, byte[] content) throws Exception {
        ByteArrayResource resource = new ByteArrayResource(content) {
            @Override public String getFilename() { return "app.jar"; }
        };
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", resource);
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + operatorJwt);
        headers.set("X-Cameleer-Protocol-Version", "1");
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
        var response = restTemplate.exchange(
                "/api/v1/environments/default/apps/" + appSlug + "/versions",
                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
        JsonNode versionNode = objectMapper.readTree(response.getBody());
        return versionNode.path("id").asText();
    }
 }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java
@@ -0,0 +1,194 @@
 package com.cameleer.server.app.runtime;
 import com.cameleer.server.app.AbstractPostgresIT;
 import com.cameleer.server.app.TestSecurityHelper;
 import com.cameleer.server.app.storage.PostgresDeploymentRepository;
 import com.cameleer.server.core.runtime.ContainerStatus;
 import com.cameleer.server.core.runtime.Deployment;
 import com.cameleer.server.core.runtime.DeploymentStatus;
 import com.cameleer.server.core.runtime.RuntimeOrchestrator;
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.mockito.InOrder;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.boot.test.mock.mockito.MockBean;
 import org.springframework.boot.test.web.client.TestRestTemplate;
 import org.springframework.core.io.ByteArrayResource;
 import org.springframework.http.HttpEntity;
 import org.springframework.http.HttpHeaders;
 import org.springframework.http.HttpMethod;
 import org.springframework.http.MediaType;
 import org.springframework.test.context.TestPropertySource;
 import org.springframework.util.LinkedMultiValueMap;
 import org.springframework.util.MultiValueMap;
 import java.util.UUID;
 import java.util.concurrent.TimeUnit;
 import static org.assertj.core.api.Assertions.assertThat;
 import static org.awaitility.Awaitility.await;
 import static org.mockito.ArgumentMatchers.any;
 import static org.mockito.Mockito.inOrder;
 import static org.mockito.Mockito.never;
 import static org.mockito.Mockito.times;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 /**
 * Verifies the rolling deployment strategy: per-replica start → health → stop
 * old. Mid-rollout health failure preserves remaining un-replaced old replicas;
 * already-stopped old replicas are not restored.
 */
@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
 class RollingStrategyIT extends AbstractPostgresIT {
    @MockBean
    RuntimeOrchestrator runtimeOrchestrator;
    @Autowired private TestRestTemplate restTemplate;
    @Autowired private ObjectMapper objectMapper;
    @Autowired private TestSecurityHelper securityHelper;
    @Autowired private PostgresDeploymentRepository deploymentRepository;
    private String operatorJwt;
    private String appSlug;
    private String versionId;
    @BeforeEach
    void setUp() throws Exception {
        operatorJwt = securityHelper.operatorToken();
        jdbcTemplate.update("DELETE FROM deployments");
        jdbcTemplate.update("DELETE FROM app_versions");
        jdbcTemplate.update("DELETE FROM apps");
        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
        appSlug = "roll-" + UUID.randomUUID().toString().substring(0, 8);
        post("/api/v1/environments/default/apps", String.format("""
                {"slug": "%s", "displayName": "Rolling App"}
                """, appSlug), operatorJwt);
        put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
                {"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "rolling"}
                """, operatorJwt);
        versionId = uploadJar(appSlug, ("roll-jar-" + appSlug).getBytes());
    }
    @Test
    void rolling_allHealthy_replacesOneByOne() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
        // Rolling invariant: old-0 is stopped BEFORE old-1 (replicas replaced
        // one at a time, not all at once). This asserts stop order only: a
        // blue-green path would issue both stops back to back after all
        // starts, while rolling starts new-1 between the two stops. That
        // interleaving is not verified here.
        InOrder inOrder = inOrder(runtimeOrchestrator);
        inOrder.verify(runtimeOrchestrator).stopContainer("old-0");
        inOrder.verify(runtimeOrchestrator).stopContainer("old-1");
        // Total of 4 startContainer calls: 2 for first deploy, 2 for rolling.
        verify(runtimeOrchestrator, times(4)).startContainer(any());
        // New replicas were not stopped — they're the running ones now.
        verify(runtimeOrchestrator, never()).stopContainer("new-0");
        verify(runtimeOrchestrator, never()).stopContainer("new-1");
        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
    }
    @Test
    void rolling_failsMidRollout_preservesRemainingOld() throws Exception {
        when(runtimeOrchestrator.startContainer(any()))
                .thenReturn("old-0", "old-1", "new-0", "new-1");
        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
        String firstDeployId = triggerDeploy();
        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
        String secondDeployId = triggerDeploy();
        awaitStatus(secondDeployId, DeploymentStatus.FAILED);
        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
        assertThat(second.errorMessage())
                .contains("rolling")
                .contains("replica 1");
        // old-0 was replaced before the failure; old-1 was never touched.
        verify(runtimeOrchestrator).stopContainer("old-0");
        verify(runtimeOrchestrator, never()).stopContainer("old-1");
        // Cleanup stops both new replicas started so far.
        verify(runtimeOrchestrator).stopContainer("new-0");
        verify(runtimeOrchestrator).stopContainer("new-1");
    }
    // ---- helpers (same pattern as BlueGreenStrategyIT) ----
    private String triggerDeploy() throws Exception {
        JsonNode deployResponse = post(
                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
        return deployResponse.path("id").asText();
    }
    private void awaitStatus(String deployId, DeploymentStatus expected) {
        await().atMost(30, TimeUnit.SECONDS)
                .pollInterval(500, TimeUnit.MILLISECONDS)
                .untilAsserted(() -> {
                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
                    assertThat(d.status()).isEqualTo(expected);
                });
    }
    private JsonNode post(String path, String json, String jwt) throws Exception {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        var response = restTemplate.exchange(path, HttpMethod.POST,
                new HttpEntity<>(json, headers), String.class);
        return objectMapper.readTree(response.getBody());
    }
    private void put(String path, String json, String jwt) {
        HttpHeaders headers = securityHelper.authHeaders(jwt);
        restTemplate.exchange(path, HttpMethod.PUT,
                new HttpEntity<>(json, headers), String.class);
    }
    private String uploadJar(String appSlug, byte[] content) throws Exception {
        ByteArrayResource resource = new ByteArrayResource(content) {
            @Override public String getFilename() { return "app.jar"; }
        };
        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
        body.add("file", resource);
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + operatorJwt);
        headers.set("X-Cameleer-Protocol-Version", "1");
        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
        var response = restTemplate.exchange(
                "/api/v1/environments/default/apps/" + appSlug + "/versions",
                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
        JsonNode versionNode = objectMapper.readTree(response.getBody());
        return versionNode.path("id").asText();
    }
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java
@@ -0,0 +1,31 @@
 package com.cameleer.server.core.runtime;
 /**
 * Supported deployment strategies. Persisted as a kebab-case string on
 * ApplicationConfig / ResolvedContainerConfig; {@link #fromWire(String)} is
 * the only conversion entry point and falls back to {@link #BLUE_GREEN} for
 * unknown or null input so the executor never has to null-check.
 */
 public enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");
    private final String wire;
    DeploymentStrategy(String wire) {
        this.wire = wire;
    }
    public String toWire() {
        return wire;
    }
    public static DeploymentStrategy fromWire(String value) {
        if (value == null) return BLUE_GREEN;
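        // Locale.ROOT keeps lowercasing locale-independent (e.g. "ROLLING" under a Turkish default locale).
        String normalized = value.trim().toLowerCase(java.util.Locale.ROOT);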
        for (DeploymentStrategy s : values()) {
            if (s.wire.equals(normalized)) return s;
        }
        return BLUE_GREEN;
    }
 }
--- a/cameleer-server-core/src/test/java/com/cameleer/server/core/runtime/DeploymentStrategyTest.java
+++ b/cameleer-server-core/src/test/java/com/cameleer/server/core/runtime/DeploymentStrategyTest.java
@@ -0,0 +1,34 @@
 package com.cameleer.server.core.runtime;
 import org.junit.jupiter.api.Test;
 import static org.assertj.core.api.Assertions.assertThat;
 class DeploymentStrategyTest {
    @Test
    void fromWire_knownValues() {
        assertThat(DeploymentStrategy.fromWire("blue-green")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("rolling")).isEqualTo(DeploymentStrategy.ROLLING);
    }
    @Test
    void fromWire_caseInsensitiveAndTrims() {
        assertThat(DeploymentStrategy.fromWire("BLUE-GREEN")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("  Rolling  ")).isEqualTo(DeploymentStrategy.ROLLING);
    }
    @Test
    void fromWire_unknownOrNullFallsBackToBlueGreen() {
        assertThat(DeploymentStrategy.fromWire(null)).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
        assertThat(DeploymentStrategy.fromWire("canary")).isEqualTo(DeploymentStrategy.BLUE_GREEN);
    }
    @Test
    void toWire_roundTrips() {
        for (DeploymentStrategy s : DeploymentStrategy.values()) {
            assertThat(DeploymentStrategy.fromWire(s.toWire())).isEqualTo(s);
        }
    }
 }
--- a/docs/superpowers/plans/2026-04-23-deployment-strategies.md
+++ b/docs/superpowers/plans/2026-04-23-deployment-strategies.md
@@ -0,0 +1,225 @@
 # Deployment Strategies (blue-green + rolling) — Implementation Plan
 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans. Steps use checkbox (`- [ ]`) syntax for tracking.
 **Goal:** Make `deploymentStrategy` actually affect runtime behavior. Support **blue-green** (all-at-once, default) and **rolling** (per-replica) deployments with correct semantics. Unblock real blue/green by giving each deployment a unique container-name generation suffix so old + new replicas can coexist during the swap.
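As a minimal sketch of why a generation suffix lets old and new replicas coexist: assuming the generation is the first 8 characters of the deployment UUID and names follow `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, two deployments of the same app get different container names for the same replica index. The class `GenerationNaming` and its method names are hypothetical illustrations, not part of the codebase.

```java
import java.util.UUID;

/** Hypothetical helper illustrating the generation-suffixed naming scheme. */
final class GenerationNaming {
    private GenerationNaming() {}

    /** First 8 characters of the deployment UUID's string form (assumed definition of "generation"). */
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** Builds {envSlug}-{appSlug}-{replicaIndex}-{generation}, the per-replica instance id. */
    static String instanceId(String envSlug, String appSlug, int replicaIndex, String generation) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
    }

    /** Builds {tenantId}-{instanceId}, the Docker container name. */
    static String containerName(String tenantId, String instanceId) {
        return tenantId + "-" + instanceId;
    }
}
```

With made-up inputs (tenant `acme`, environment `prod`, app `orders`, replica 0, deployment id `123e4567-e89b-12d3-a456-426614174000`), the container name would be `acme-prod-orders-0-123e4567`. A later deployment with a different UUID prefix yields a different name, so both containers can exist on the same Docker daemon during the swap.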
 **Current state (interim fix landed in `f8dccaae`):** strategy field exists but executor doesn't branch on it; a destroy-then-start flow runs regardless. This plan replaces that interim behavior.
 **Architecture:**
 - Append an 8-char **`gen`** suffix (first 8 chars of `deployment.id`) to container name AND `CAMELEER_AGENT_INSTANCEID`. Unique per deployment; no new DB state.
 - Add a `cameleer.generation` Docker label so Grafana/Prometheus can pin deploy boundaries without regex on instance-id.
 - Branch `DeploymentExecutor.executeAsync` on strategy:
  - **blue-green**: start all N new → health-check all → stop all old. Strict all-healthy: partial = FAILED (old stays running).
  - **rolling**: per-replica loop: start new[i] → health-check → stop old[i] → next. Mid-rollout failure → stop failed new[i], leave remaining old[i..n] running, mark FAILED.
 - Unknown strategy values fall back to blue-green through `DeploymentStrategy.fromWire` (safety net); the interim destroy-then-start flow is retired.
 **Reference:** interim-fix commit `f8dccaae`; investigation summary in the session log.
 ---
 ## File Structure
 ### Backend (new / modified)
 - **Create:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java` — enum `BLUE_GREEN, ROLLING`; `fromWire(String)` with blue-green fallback; `toWire()` → "blue-green" / "rolling".
 - **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java` — add `gen` computation, strategy branching, per-strategy START_REPLICAS + HEALTH_CHECK + SWAP_TRAFFIC flows. Rewrite the body of `executeAsync` so stages 4–6 dispatch on strategy. Extract helper methods `deployBlueGreen` and `deployRolling` to keep each path readable.
 - **Modify:** `cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java` — take `gen` argument; emit `cameleer.generation` label; `cameleer.instance-id` becomes `{envSlug}-{appSlug}-{replicaIndex}-{gen}`.
 - **Modify:** `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java` — `containerName` stored on the row becomes `env.slug() + "-" + app.slug()` (unchanged — already just the group-name for DB/operator visibility; real Docker name is computed in the executor).
 - **Modify:** `cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerIT.java` — update the single assertion that pins `container_name` format if any (spotted at line ~112 in the investigation).
 - **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java` — two tests: all-replicas-healthy path stops old after new, and partial-healthy aborts preserving old.
 - **Create:** `cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java` — two tests: happy rolling 3→3 replacement, and fail-on-replica-1 preserves remaining old replicas.
 ### UI
 - **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/ResourcesTab.tsx` — confirm the strategy dropdown offers "blue-green" and "rolling" with descriptive labels + a hint line.
 - **Modify:** `ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/StatusCard.tsx` — surface `deployment.deploymentStrategy` as a small text/badge near the version badge (read-only).
 ### Docs + rules
 - **Modify:** `.claude/rules/docker-orchestration.md` — rewrite the "DeploymentExecutor Details" and "Blue/green strategy" sections to describe the new behavior and the `gen` suffix; retire the interim destroy-then-start note.
 - **Modify:** `.claude/rules/app-classes.md` — update the `DeploymentExecutor` bullet under `runtime/`.
 - **Modify:** `.claude/rules/core-classes.md` — note new `DeploymentStrategy` enum under `runtime/`.
 ---
 ## Phase 1 — Core: DeploymentStrategy enum + gen utility
 ### Task 1.1: DeploymentStrategy enum
 **Files:** Create `cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java`.
 - [ ] Create enum with two constants `BLUE_GREEN`, `ROLLING`.
 - [ ] Add `toWire()` returning `"blue-green"` / `"rolling"`.
 - [ ] Add `fromWire(String)` — case-insensitive match; unknown or null → `BLUE_GREEN` with no throw (safety fallback). Returns enum, never null.
 **Verification:** unit test covering known + unknown + null inputs.
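
A minimal sketch of what Task 1.1 describes, assuming the wire value is held in a private field and matched after trimming and lower-casing. The field name `wire` and the package-private visibility are illustrative choices, not taken from the real `DeploymentStrategy.java`.

```java
import java.util.Locale;

// Sketch only: the real enum lives in cameleer-server-core and may differ.
enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    // Kebab-case form stored on the container config (field name is an assumption).
    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    String toWire() {
        return wire;
    }

    // Case-insensitive and whitespace-tolerant; null or unknown input
    // falls back to BLUE_GREEN instead of throwing.
    static DeploymentStrategy fromWire(String value) {
        if (value == null) {
            return BLUE_GREEN;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (DeploymentStrategy s : values()) {
            if (s.wire.equals(normalized)) {
                return s;
            }
        }
        return BLUE_GREEN;
    }
}
```

Because `fromWire` never returns null and never throws, the executor's dispatch site can switch on the result without a guard.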
 ### Task 1.2: Generation suffix helper
 - [ ] Decide location — inline static helper on `DeploymentExecutor` is fine (`private static String gen(UUID id) { return id.toString().substring(0,8); }`). No new file needed.
 ---
 ## Phase 2 — Executor: gen-suffixed naming + `cameleer.generation` label
 This phase is purely the naming change; no strategy branching yet. After this phase, redeploy still uses the destroy-then-start interim, but containers carry the new names + label.
 ### Task 2.1: TraefikLabelBuilder — accept `gen`, emit generation label
 **Files:** Modify `TraefikLabelBuilder.java`.
 - [ ] Add `String gen` as a new arg on `build(...)`.
 - [ ] Change `instanceId` construction: `envSlug + "-" + appSlug + "-" + replicaIndex + "-" + gen`.
 - [ ] Add label `cameleer.generation = gen`.
 - [ ] Leave the Traefik router/service label keys using `svc = envSlug + "-" + appSlug` (unchanged — routing is generation-agnostic so load balancing across old+new works automatically).
 ### Task 2.2: DeploymentExecutor — compute gen once, thread through
 **Files:** Modify `DeploymentExecutor.executeAsync`.
 - [ ] At the top of the try block (after `env`, `app`, `config` resolution), compute `String gen = gen(deployment.id());`.
 - [ ] In the replica loop: `String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + gen;` and `String containerName = tenantId + "-" + instanceId;`.
 - [ ] Pass `gen` to `TraefikLabelBuilder.build(...)`.
 - [ ] Set `CAMELEER_AGENT_INSTANCEID=instanceId` (already done, just verify the new value propagates).
 - [ ] Leave `replicaStates[].containerName` stored as the new full name.
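
The naming rules from Tasks 1.2, 2.1 and 2.2 can be sketched as plain string composition. The class `GenerationNaming` and its method names are hypothetical; only the formats and the two `cameleer.*` label keys come from the plan, and the Traefik router/service labels are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper class; the plan keeps this logic inline in
// DeploymentExecutor and TraefikLabelBuilder.
final class GenerationNaming {

    // First 8 characters of the deployment UUID.
    static String gen(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // {envSlug}-{appSlug}-{replicaIndex}-{gen}
    static String instanceId(String envSlug, String appSlug, int replicaIndex, String gen) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + gen;
    }

    // {tenantId}-{instanceId}, unique per deployment on the Docker daemon.
    static String containerName(String tenantId, String instanceId) {
        return tenantId + "-" + instanceId;
    }

    // Only the two generation-related labels; routing labels are omitted here.
    static Map<String, String> generationLabels(String instanceId, String gen) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("cameleer.instance-id", instanceId);
        labels.put("cameleer.generation", gen);
        return labels;
    }
}
```

Two deployments of the same app therefore differ only in the trailing `gen` segment, which is what lets old and new replicas coexist during the swap.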
 ### Task 2.3: Update the one brittle test
 **Files:** Modify `DeploymentControllerIT.java`.
 - [ ] Relax the container-name assertion to `startsWith("default-default-deploy-test-")` or similar — verify behavior, not exact suffix.
 **Verification after Phase 2:**
 - `mvn -pl cameleer-server-app -am test -Dtest=DeploymentSnapshotIT,DeploymentControllerIT,PostgresDeploymentRepositoryIT`
 - All green; container names now include gen; redeploy still works via the interim destroy-then-start flow (which will be replaced in Phase 3).
 ---
 ## Phase 3 — Blue-green strategy (default)
 ### Task 3.1: Extract `deployBlueGreen(...)` helper
 **Files:** Modify `DeploymentExecutor.java`.
 - [ ] Move the current START_REPLICAS → HEALTH_CHECK → SWAP_TRAFFIC body into a new `private void deployBlueGreen(...)` method.
 - [ ] Signature: take `deployment`, `app`, `env`, `config`, `resolvedRuntimeType`, `mainClass`, `gen`, `primaryNetwork`, `additionalNets`.
 ### Task 3.2: Reorder for proper blue-green
 - [ ] Remove the pre-flight "stop previous" block added in `f8dccaae` (will be replaced by post-health swap).
 - [ ] Order: start all new → wait all healthy → find previous active (via `findActiveByAppIdAndEnvironmentIdExcluding`) → stop old containers + mark old row STOPPED.
 - [ ] Strict all-healthy: if `healthyCount < config.replicas()`, stop the new containers we just started, mark deployment FAILED with `"blue-green: %d/%d replicas healthy; preserving previous deployment"`. Do **not** touch the old deployment.
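
The ordering in Task 3.2 can be sketched against a small stand-in interface. `Orchestrator` and `BlueGreenSketch` are hypothetical names, and the health wait is reduced to a single check per container; the real executor polls with a timeout and also updates deployment rows.

```java
import java.util.ArrayList;
import java.util.List;

final class BlueGreenSketch {

    // Hypothetical stand-in for the orchestrator calls the executor makes.
    interface Orchestrator {
        String start(int replicaIndex);        // returns the new container id
        boolean isHealthy(String containerId); // simplified: no polling or timeout
        void stop(String containerId);
    }

    // Returns null on success, or the failure message when not all replicas are healthy.
    static String deploy(Orchestrator rt, int replicas, List<String> oldContainerIds) {
        // 1. Start all new replicas; old ones keep serving.
        List<String> started = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            started.add(rt.start(i));
        }
        // 2. Strict all-healthy check.
        int healthy = 0;
        for (String id : started) {
            if (rt.isHealthy(id)) {
                healthy++;
            }
        }
        if (healthy < replicas) {
            // Partial health: remove only what this deploy started.
            for (String id : started) {
                rt.stop(id);
            }
            return String.format(
                "blue-green: %d/%d replicas healthy; preserving previous deployment",
                healthy, replicas);
        }
        // 3. Only now stop the previous deployment's containers.
        for (String id : oldContainerIds) {
            rt.stop(id);
        }
        return null;
    }
}
```

The failure branch never touches `oldContainerIds`, which is the property the partial-healthy integration test asserts.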
 ### Task 3.3: Wire strategy dispatch
 - [ ] At the point where `deployBlueGreen` is called, resolve `DeploymentStrategy.fromWire(config.deploymentStrategy())` and dispatch on the result. In this phase only the `BLUE_GREEN` branch has a real implementation.
 - [ ] `ROLLING` dispatches to `deployRolling(...)`, stubbed to throw `UnsupportedOperationException` for now; Task 4.3 replaces the stub in Phase 4.
 ---
 ## Phase 4 — Rolling strategy
 ### Task 4.1: `deployRolling(...)` helper
 **Files:** Modify `DeploymentExecutor.java`.
 - [ ] Same signature as `deployBlueGreen`.
 - [ ] Look up previous deployment once at entry via `findActiveByAppIdAndEnvironmentIdExcluding`. Capture its `replicaStates` into a map keyed by replica index.
 - [ ] For `i` from 0 to `config.replicas() - 1`:
  - [ ] Start new replica `i` (with gen-suffixed name).
  - [ ] Wait for this single container to go healthy (per-replica `waitForOneHealthy(containerId, timeoutSeconds)`; reuse `healthCheckTimeout` per replica or introduce a smaller per-replica budget).
  - [ ] On success: stop the corresponding old replica `i` by `containerId` from the previous deployment's replicaStates (if present); log continue.
  - [ ] On failure: stop + remove all new replicas started so far, mark deployment FAILED with `"rolling: replica %d failed to reach healthy; preserved %d previous replicas"`. Do **not** touch the already-replaced replicas from previous deployment (they're already stopped) or the not-yet-replaced ones (they keep serving).
 - [ ] After the loop succeeds for all replicas, mark the previous deployment row STOPPED (its containers are all stopped).
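
The per-replica loop above can be sketched the same way. `Orchestrator` and `RollingSketch` are hypothetical, health is a single check rather than a timed wait, and the "preserved" count is computed here as the old replicas at index `i` and above, which is one reading of the plan rather than a confirmed detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RollingSketch {

    // Hypothetical stand-in for the orchestrator calls the executor makes.
    interface Orchestrator {
        String start(int replicaIndex);        // returns the new container id
        boolean isHealthy(String containerId); // simplified: no polling or timeout
        void stop(String containerId);
    }

    // Returns null on success, or the failure message on a mid-rollout failure.
    static String deploy(Orchestrator rt, int replicas, Map<Integer, String> oldByIndex) {
        List<String> startedNew = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            String newId = rt.start(i);
            startedNew.add(newId);
            if (!rt.isHealthy(newId)) {
                // Clean up every new replica started so far; old replicas are not touched.
                for (String id : startedNew) {
                    rt.stop(id);
                }
                // Assumption: old replicas at index >= i were never stopped.
                int preserved = 0;
                for (int idx : oldByIndex.keySet()) {
                    if (idx >= i) {
                        preserved++;
                    }
                }
                return String.format(
                    "rolling: replica %d failed to reach healthy; preserved %d previous replicas",
                    i, preserved);
            }
            // New replica i is healthy: retire old replica i if there was one.
            String oldId = oldByIndex.get(i);
            if (oldId != null) {
                rt.stop(oldId);
            }
        }
        return null;
    }
}
```

With three old replicas and a failure at index 1, the stop order is old[0], then new[0] and new[1]; old[1] and old[2] keep serving.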
 ### Task 4.2: Add `waitForOneHealthy`
 - [ ] Variant of `waitForAnyHealthy` that polls a single container id. Returns boolean. Same sleep cadence.
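
One possible shape for the single-container wait, assuming the health probe is passed in as a `BooleanSupplier` and the timeout and poll interval are in milliseconds. The plan's signature takes a container id and a timeout in seconds, so the class `HealthWait` and its parameters are illustrative only.

```java
import java.util.function.BooleanSupplier;

final class HealthWait {

    // Polls the probe until it reports healthy or the timeout elapses.
    // Returns true if healthy in time, false on timeout or interruption.
    static boolean waitForOneHealthy(BooleanSupplier isHealthy, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isHealthy.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and treat the wait as failed.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The probe is always checked once before the deadline is consulted, so a container that is already healthy returns immediately even with a zero timeout.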
 ### Task 4.3: Replace the Phase 3 stub
 - [ ] `ROLLING` dispatch calls `deployRolling` instead of throwing.
 ---
 ## Phase 5 — Integration tests
 Each IT extends `AbstractPostgresIT`, uses `@MockBean RuntimeOrchestrator`, and overrides `cameleer.server.runtime.healthchecktimeout=2` via `@TestPropertySource`.
 ### Task 5.1: BlueGreenStrategyIT
 **Files:** Create `BlueGreenStrategyIT.java`.
 - [ ] **Test 1 `blueGreen_allHealthy_stopsOldAfterNew`:** seed a previous RUNNING deployment (2 replicas). Trigger redeploy with `containerConfig.deploymentStrategy=blue-green` + replicas=2. Mock orchestrator: new containers return `healthy`. Await new deployment RUNNING. Assert: previous deployment has status STOPPED, its container IDs had `stopContainer`+`removeContainer` called; new deployment replicaStates contain the two new container IDs; `cameleer.generation` label on both new container requests.
 - [ ] **Test 2 `blueGreen_partialHealthy_preservesOldAndMarksFailed`:** seed previous RUNNING (2 replicas). New deploy with replicas=2. Mock: container A healthy, container B starting forever. Await new deployment FAILED. Assert: previous deployment still RUNNING; its container IDs were **not** stopped; new deployment errorMessage contains "1/2 replicas healthy".
 ### Task 5.2: RollingStrategyIT
 **Files:** Create `RollingStrategyIT.java`.
 - [ ] **Test 1 `rolling_allHealthy_replacesOneByOne`:** seed previous RUNNING (3 replicas). New deploy with strategy=rolling, replicas=3. Mock: new containers all healthy. Use `ArgumentCaptor` on `startContainer` to observe start order. Assert: start[0] → stop[old0] → start[1] → stop[old1] → start[2] → stop[old2]; new deployment RUNNING with 3 replicaStates; old deployment STOPPED.
 - [ ] **Test 2 `rolling_failsMidRollout_preservesRemainingOld`:** seed previous RUNNING (3 replicas). New deploy strategy=rolling. Mock: new[0] healthy, new[1] never healthy. Await FAILED. Assert: new[0] was stopped during cleanup; old[0] was stopped (replaced before the failure); old[1] + old[2] still RUNNING; new deployment errorMessage contains "replica 1".
 ---
 ## Phase 6 — UI strategy indicator
 ### Task 6.1: Strategy dropdown polish
 **Files:** Modify `ResourcesTab.tsx`.
 - [ ] Verify the `<select>` has options `blue-green` and `rolling`.
 - [ ] Add a one-line description under the dropdown: "Blue-green: start all new, swap when healthy. Rolling: replace one replica at a time."
 ### Task 6.2: Strategy on StatusCard
 **Files:** Modify `DeploymentTab/StatusCard.tsx`.
 - [ ] Add a small subtle text line in the grid: `<span>Strategy</span><span>{deployment.deploymentStrategy}</span>` (read-only, mono text ok).
 ---
 ## Phase 7 — Docs + rules updates
 ### Task 7.1: Update `.claude/rules/docker-orchestration.md`
 - [ ] Replace the "DeploymentExecutor Details" section with the new flow (gen suffix, strategy dispatch, per-strategy ordering).
 - [ ] Update the "Deployment Status Model" table — `DEGRADED` now means "post-deploy replica crashed"; failed-during-deploy is always `FAILED`.
 - [ ] Add a short "Deployment Strategies" section: behavior of blue-green vs rolling, resource peak, failure semantics.
 ### Task 7.2: Update `.claude/rules/app-classes.md`
 - [ ] Under `runtime/` → `DeploymentExecutor` bullet: add "branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`. Container name format: `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{gen}` where gen = 8-char prefix of deployment UUID."
 ### Task 7.3: Update `.claude/rules/core-classes.md`
 - [ ] Add under `runtime/`: `DeploymentStrategy` — enum BLUE_GREEN, ROLLING; `fromWire` falls back to BLUE_GREEN; note stored as kebab-case string on config.
 ---
 ## Rollout sequence
 1. Phase 1 (enum + helper) — trivial, land as one commit.
 2. Phase 2 (naming + generation label) — one commit; interim destroy-then-start still active; regenerates no OpenAPI (no controller change).
 3. Phase 3 (blue-green as default) — one commit replacing the interim flow. This is where real behavior changes.
 4. Phase 4 (rolling) — one commit.
 5. Phase 5 (4 ITs) — one commit; run `mvn test` against affected modules.
 6. Phase 6 (UI) — one commit; `npx tsc` clean.
 7. Phase 7 (docs) — one commit.
 Total: 7 commits, all atomic.
 ## Acceptance
 - Existing `DeploymentSnapshotIT` still passes.
 - New `BlueGreenStrategyIT` (2 tests) and `RollingStrategyIT` (2 tests) pass.
 - Browser QA: redeploy with `deploymentStrategy=blue-green` vs `rolling` produces the expected container timeline (inspect via `docker ps`); Prometheus metrics show continuity across deploys when queried by `{cameleer_app, cameleer_environment}`; the `cameleer_generation` label flips per deploy.
 - `.claude/rules/docker-orchestration.md` reflects the new behavior.
 ## Non-goals
 - Automatic rollback on blue-green partial failure (old is left running; user redeploys).
 - Automatic rollback on rolling mid-failure (remaining old replicas keep running; user redeploys).
 - Per-replica `HEALTH_CHECK` stage label in the UI progress bar — the 7-stage progress is reused as-is; strategy dictates internal looping.
 - Strategy field validation at container-config save time (executor's `fromWire` fallback absorbs unknown values — consider a follow-up for strict validation if it becomes an issue).
--- a/ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/ResourcesTab.tsx
+++ b/ui/src/pages/AppsTab/AppDeploymentPage/ConfigTabs/ResourcesTab.tsx
@@ -172,15 +172,22 @@ export function ResourcesTab({ value, onChange, disabled, isProd = false }: Prop
      />
      <span className={styles.configLabel}>Deploy Strategy</span>
-      <Select
-        disabled={disabled}
-        value={value.deployStrategy}
-        onChange={(e) => update('deployStrategy', e.target.value)}
-        options={[
-          { value: 'blue-green', label: 'Blue/Green' },
-          { value: 'rolling', label: 'Rolling' },
-        ]}
-      />
+      <div>
+        <Select
+          disabled={disabled}
+          value={value.deployStrategy}
+          onChange={(e) => update('deployStrategy', e.target.value)}
+          options={[
+            { value: 'blue-green', label: 'Blue/Green' },
+            { value: 'rolling', label: 'Rolling' },
+          ]}
+        />
+        <span className={styles.configHint}>
+          {value.deployStrategy === 'rolling'
+            ? 'Replace one replica at a time; peak = replicas + 1. Partial failure leaves remaining old replicas serving.'
+            : 'Start all new replicas, swap once all are healthy; peak = 2 × replicas. Partial failure preserves the previous deployment.'}
+        </span>
+      </div>
      <span className={styles.configLabel}>Strip Path Prefix</span>
      <div className={styles.configInline}>
--- a/ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/StatusCard.tsx
+++ b/ui/src/pages/AppsTab/AppDeploymentPage/DeploymentTab/StatusCard.tsx
@@ -35,6 +35,7 @@ export function StatusCard({ deployment, version, externalUrl }: Props) {
        {version && <><span>JAR</span><MonoText size="sm">{version.jarFilename}</MonoText></>}
        {version && <><span>Checksum</span><MonoText size="xs">{version.jarChecksum.substring(0, 12)}</MonoText></>}
        <span>Replicas</span><span>{running}/{total}</span>
+        <span>Strategy</span><span>{deployment.deploymentStrategy ?? '—'}</span>
        <span>URL</span>
        {deployment.status === 'RUNNING'
          ? <a href={externalUrl} target="_blank" rel="noreferrer"><MonoText size="sm">{externalUrl}</MonoText></a>
Author	SHA1	Message	Date
hsiegeln	007597715a	docs(rules): deployment strategies + generation suffix Refresh the three rules files to match the new executor behavior: - docker-orchestration.md: rewrite DeploymentExecutor Details with container naming scheme ({...}-{replica}-{generation}), strategy dispatch (blue-green vs rolling), and the new DEGRADED semantics (post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder bullets for the generation suffix + new cameleer.generation label. - app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets mirror the same. - core-classes.md: add DeploymentStrategy enum; note DEGRADED is now post-deploy-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:02:51 +02:00
hsiegeln	b6e54db6ec	ui(deploy): strategy hint on Resources tab + indicator on StatusCard Resources tab: add a hint under the Deploy Strategy dropdown that explains the blue-green vs rolling trade-off (resource peak, failure semantics), switching text based on the current selection. StatusCard: show the active deployment's strategy inline in the info grid so users can tell at a glance which path was taken for a given deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:00:44 +02:00
hsiegeln	e9f523f2b8	test(deploy): blue-green + rolling strategy ITs Four ITs covering strategy behavior: - BlueGreenStrategyIT#blueGreen_allHealthy_stopsOldAfterNew: old is stopped only after all new replicas are healthy. - BlueGreenStrategyIT#blueGreen_partialHealthy_preservesOldAndMarksFailed: strict all-healthy — one starting replica aborts the deploy and leaves the previous deployment RUNNING untouched. - RollingStrategyIT#rolling_allHealthy_replacesOneByOne: InOrder on stopContainer confirms old-0 stops before old-1 (the interleaving that distinguishes rolling from blue-green). - RollingStrategyIT#rolling_failsMidRollout_preservesRemainingOld: mid-rollout health failure stops only the in-flight new containers and the already-replaced old-0; old-1 stays untouched. Shortens healthchecktimeout to 2s via @TestPropertySource so failure paths complete in ~25s instead of ~60s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:00:00 +02:00
hsiegeln	653f983a08	deploy: rolling strategy (per-replica replacement) Replace the Phase 3 stub with a working rolling implementation. Flow: - Capture previous deployment's per-index container ids up front. - For i = 0..replicas-1: - Start new[i] (gen-suffixed name, coexists with old[i]). - Wait for new[i] healthy (new waitForOneHealthy helper). - On success: stop old[i] if present, continue. - On failure: stop in-flight new[0..i], leave un-replaced old[i+1..N] running, mark FAILED. Already-replaced old replicas are not restored — rolling is not reversible; user redeploys to recover. - After the loop: sweep any leftover old replicas (when replica count shrank) and mark the old deployment STOPPED. Resource peak: replicas + 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:53:52 +02:00
hsiegeln	459cdfe427	deploy: blue-green strategy (start → health-all → stop old) Phase 3 of deployment-strategies plan. Refactor executeAsync to dispatch on DeploymentStrategy.fromWire(config.deploymentStrategy()). Blue-green (default): - Start all N new replicas (gen-suffixed names coexist with old). - Wait for ALL healthy (strict — partial-healthy = FAILED, preserves previous deployment untouched). - Only then find + stop the previous deployment. - Final status is always RUNNING; DEGRADED is now reserved for post-deploy replica crashes (set by DockerEventMonitor). Rolling: stub — throws UnsupportedOperationException for now, gets its real implementation in Phase 4. Refactor details: - Extract DeployCtx record to carry 13 per-deploy values around. - Extract startReplica(ctx, i, stateOut) — shared by both strategy paths. - Extract persistSnapshotAndMarkRunning(ctx, primaryCid) — shared finalizer. - Rename waitForAnyHealthy → waitForAllHealthy (the name was misleading; the method already waited for all, just returned partial on timeout). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:51:24 +02:00
hsiegeln	652346dcd4	deploy: gen-suffixed container names + cameleer.generation label Append an 8-char generation id (first 8 chars of deployment UUID) to: - container name: {tenant}-{env}-{app}-{replica}-{gen} - CAMELEER_AGENT_INSTANCEID (so old+new agents are distinct in the registry) - Traefik cameleer.instance-id label And emit a new standalone cameleer.generation label so dashboards (Prometheus/Grafana) can pin deploy boundaries without regex on instance-id. Strategy branching comes next — this commit is foundation only; the interim destroy-then-start flow still runs regardless of strategy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:45:44 +02:00
hsiegeln	5304c8ee01	core(deploy): DeploymentStrategy enum with safe wire conversion Typed enum (BLUE_GREEN, ROLLING) with fromWire/toWire kebab-case translation. fromWire falls back to BLUE_GREEN for unknown or null input so the executor dispatch site never null-checks and no misconfigured container-config can throw at runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:42:35 +02:00
hsiegeln	2c82f29aef	docs(plans): deployment strategies (blue-green + rolling) plan 7-phase plan to replace the interim destroy-then-start flow (`f8dccaae`) with a strategy-aware executor. Adds gen-suffixed container names so old + new replicas can coexist, plus a cameleer.generation label for Prometheus/Grafana deploy-boundary annotations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:41:43 +02:00