fix(dirty-state): exclude live-pushed fields from deploy diff

Live-pushed config fields (taps, tapVersion, tracedProcessors, routeRecording) are applied via SSE CONFIG_UPDATE. They take effect on running agents without a redeploy, and agents fetch them from application_config on restart. They must therefore not contribute to the "pending deploy" diff against the last-successful-deployment snapshot.

Before this fix, applying a tap from the process diagram rolled out correctly in real time but then marked the app "Pending Deploy (1)", because DirtyStateCalculator compared every agentConfig field. This also contradicted the UI rule (ui.md) that the live tabs "never mark dirty".

Adds taps, tapVersion, tracedProcessors and routeRecording to AGENT_CONFIG_IGNORED_KEYS. Updates the nested-path test to use a staged field (sensitiveKeys) and adds a new test asserting that divergent live-push fields keep dirty=false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
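The ignored-keys rule can be sketched in Java as follows. This is a hypothetical illustration that assumes agentConfig is a nested map of JSON-like values. `AgentConfigDiff` and `differences` are invented names. The real DirtyStateCalculator, which requires an ObjectMapper, may walk the config differently. Only the four ignored keys come from this commit. Skipping them only at the top level is also an assumption.

```java
import java.util.*;

/**
 * Hypothetical sketch (not the real DirtyStateCalculator): reports dotted
 * paths whose values differ between a deployment snapshot and the current
 * agentConfig, skipping keys that are applied live via SSE CONFIG_UPDATE.
 */
final class AgentConfigDiff {

    // Live-pushed fields named in this commit; they never make the app dirty.
    static final Set<String> AGENT_CONFIG_IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    private AgentConfigDiff() {}

    /** Returns the dotted paths that differ, in sorted key order. */
    static List<String> differences(Map<String, ?> snapshot, Map<String, ?> current) {
        List<String> out = new ArrayList<>();
        diff("", snapshot, current, true, out);
        return out;
    }

    private static void diff(String prefix, Map<String, ?> a, Map<String, ?> b,
                             boolean topLevel, List<String> out) {
        // Union of keys from both sides, sorted so the output is deterministic.
        SortedSet<String> keys = new TreeSet<>();
        if (a != null) keys.addAll(a.keySet());
        if (b != null) keys.addAll(b.keySet());
        for (String key : keys) {
            if (topLevel && AGENT_CONFIG_IGNORED_KEYS.contains(key)) {
                continue; // live-pushed: takes effect without a redeploy
            }
            Object va = a == null ? null : a.get(key);
            Object vb = b == null ? null : b.get(key);
            String path = prefix.isEmpty() ? key : prefix + "." + key;
            if (va instanceof Map<?, ?> ma && vb instanceof Map<?, ?> mb) {
                // Both sides are objects: recurse to report nested paths.
                diff(path, castMap(ma), castMap(mb), false, out);
            } else if (!Objects.equals(va, vb)) {
                out.add(path);
            }
        }
    }

    @SuppressWarnings("unchecked")
    private static Map<String, ?> castMap(Map<?, ?> m) {
        return (Map<String, ?>) m;
    }
}
```

A staged field such as sensitiveKeys still shows up as a difference. Diverging taps or tapVersion values leave the result empty, which corresponds to dirty=false.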
Merge pull request 'feat(ui): show deployment status + rich pending-deploy tooltip on app header' (#151) from feature/deployment-status-badge into main
2026-04-24 14:42:07 +02:00 · 2026-04-24 13:50:00 +02:00 · 2026-04-24 13:49:51 +02:00 · 2026-04-24 13:49:24 +02:00 · 2026-04-24 13:47:04 +02:00 · 2026-04-24 11:22:27 +02:00
219 changed files with 24373 additions and 1888 deletions
--- a/.claude/rules/app-classes.md
+++ b/.claude/rules/app-classes.md
@@ -53,18 +53,18 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale

 ### Env-scoped (user-facing data & config)

- `AppController` — `/api/v1/environments/{envSlug}/apps`. GET list / POST create / GET `{appSlug}` / DELETE `{appSlug}` / GET `{appSlug}/versions` / POST `{appSlug}/versions` (JAR upload) / PUT `{appSlug}/container-config`. App slug uniqueness is per-env (`(env, app_slug)` is the natural key). `CreateAppRequest` body has no env (path), validates slug regex.
- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`.
- `ApplicationConfigController` — `/api/v1/environments/{envSlug}`. GET `/config` (list), GET/PUT `/apps/{appSlug}/config`, GET `/apps/{appSlug}/processor-routes`, POST `/apps/{appSlug}/config/test-expression`. PUT also pushes `CONFIG_UPDATE` to LIVE agents in this env.
+- `AppController` — `/api/v1/environments/{envSlug}/apps`. GET list / POST create / GET `{appSlug}` / DELETE `{appSlug}` / GET `{appSlug}/versions` / POST `{appSlug}/versions` (JAR upload) / PUT `{appSlug}/container-config` / GET `{appSlug}/dirty-state` (returns `DirtyStateResponse{dirty, lastSuccessfulDeploymentId, differences}` — compares current JAR+config against last RUNNING deployment snapshot; dirty=true when no snapshot exists). App slug uniqueness is per-env (`(env, app_slug)` is the natural key). `CreateAppRequest` body has no env (path), validates slug regex. Injects `DirtyStateCalculator` bean (registered in `RuntimeBeanConfig`, requires `ObjectMapper` with `JavaTimeModule`).
+- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`. All lifecycle ops (`POST /` deploy, `POST /{id}/stop`, `POST /{id}/promote`) audited under `AuditCategory.DEPLOYMENT`. Action codes: `deploy_app`, `stop_deployment`, `promote_deployment`. Acting user resolved via the `user:` prefix-strip convention; both SUCCESS and FAILURE branches write audit rows. `created_by` (TEXT, nullable) populated from `SecurityContextHolder` and surfaced on the `Deployment` DTO.
+- `ApplicationConfigController` — `/api/v1/environments/{envSlug}`. GET `/config` (list), GET/PUT `/apps/{appSlug}/config`, GET `/apps/{appSlug}/processor-routes`, POST `/apps/{appSlug}/config/test-expression`. PUT accepts `?apply=staged|live` (default `live`). `live` saves to DB and pushes `CONFIG_UPDATE` SSE to live agents in this env (existing behavior); `staged` saves to DB only, skipping the SSE push — used by the unified app deployment page. Audit action is `stage_app_config` for staged writes, `update_app_config` for live. Invalid `apply` values return 400.
 - `AppSettingsController` — `/api/v1/environments/{envSlug}`. GET `/app-settings` (list), GET/PUT/DELETE `/apps/{appSlug}/settings`. ADMIN/OPERATOR only.
- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`.
- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range; sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — same-millisecond tiebreak via the `insert_id` UUID column on `logs`.
+- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`. GET `/executions` accepts repeated `attr` query params: `attr=order` (key-exists), `attr=order:47` (exact), `attr=order:4*` (wildcard; `*` maps to SQL LIKE `%`). The first `:` splits key from value; later colons stay in the value. Invalid keys → 400. POST `/executions/search` accepts the same filters via `SearchRequest.attributeFilters` in the body.
+- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range, instanceIds (multi, comma-split, applied as `WHERE instance_id IN (...)` and AND-joined with the other filters; used by the Checkpoint detail drawer to scope logs to a deployment's replicas); sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"`, with the `insert_id` UUID column on `logs` serving as the same-millisecond tiebreak.
 - `RouteCatalogController` — GET `/api/v1/environments/{envSlug}/routes` (merged route catalog from registry + ClickHouse; env filter unconditional).
 - `RouteMetricsController` — GET `/api/v1/environments/{envSlug}/routes/metrics`, GET `/api/v1/environments/{envSlug}/routes/metrics/processors`.
 - `AgentListController` — GET `/api/v1/environments/{envSlug}/agents` (registered agents with runtime metrics, filtered to env).
 - `AgentEventsController` — GET `/api/v1/environments/{envSlug}/agents/events` (lifecycle events; cursor-paginated, returns `{ data, nextCursor, hasMore }`; order `(timestamp DESC, insert_id DESC)`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — `insert_id` is a stable UUID column used as a same-millisecond tiebreak).
 - `AgentMetricsController` — GET `/api/v1/environments/{envSlug}/agents/{agentId}/metrics` (JVM/Camel metrics). Rejects cross-env agents (404) as defence-in-depth.
- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` (env-scoped lookup). Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique).
+- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` returns the most recent diagram for (app, env, route) via `DiagramStore.findLatestContentHashForAppRoute`. Registry-independent — routes whose publishing agents were removed still resolve. Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique), the point-in-time path consumed by the exchange viewer via `ExecutionDetail.diagramContentHash`.
 - `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. `AgentLifecycleCondition` is allowlist-only — the `AgentLifecycleEventType` enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) plus the record compact ctor (non-empty `eventTypes`, `withinSeconds ≥ 1`) do the validation; custom agent-emitted event types are tracked in backlog issue #145. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
 - `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`; optional multi-value `state`, `severity`, tri-state `acked`, tri-state `read` query params; soft-deleted rows always excluded) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read` / POST `/bulk-ack` (VIEWER+) / DELETE `{id}` (OPERATOR+, soft-delete) / POST `/bulk-delete` (OPERATOR+) / POST `{id}/restore` (OPERATOR+, clears `deleted_at`). `requireLiveInstance` helper returns 404 on soft-deleted rows; `restore` explicitly fetches regardless of `deleted_at`. `BulkIdsRequest` is the shared body for bulk-read/ack/delete (`{ instanceIds }`). `AlertDto` includes `readAt`; `deletedAt` is intentionally NOT on the wire. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
 - `AlertSilenceController` — `/api/v1/environments/{envSlug}/alerts/silences`. GET list / POST create / DELETE `{id}`. 422 if `endsAt <= startsAt`. OPERATOR+ for mutations, VIEWER+ for list. Audit: `ALERT_SILENCE_CHANGE`.
@@ -72,7 +72,7 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale

 ### Env admin (env-slug-parameterized, not env-scoped data)

- `EnvironmentAdminController` — `/api/v1/admin/environments`. GET list / POST create / GET `{envSlug}` / PUT `{envSlug}` / DELETE `{envSlug}` / PUT `{envSlug}/default-container-config` / PUT `{envSlug}/jar-retention`. Slug immutable — PUT body has no slug field; any slug supplied is dropped by Jackson. Slug validated on POST.
+- `EnvironmentAdminController` — `/api/v1/admin/environments`. GET list / POST create / GET `{envSlug}` / PUT `{envSlug}` / DELETE `{envSlug}` / PUT `{envSlug}/default-container-config` / PUT `{envSlug}/jar-retention`. Slug immutable — PUT body has no slug field; any slug supplied is dropped by Jackson. Slug validated on POST. `UpdateEnvironmentRequest` carries `color` (nullable); unknown values rejected with 400 via `EnvironmentColor.isValid`. Null/absent color preserves the existing value.

 ### Agent-only (JWT-authoritative, intentionally flat)

@@ -109,6 +109,7 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 - `UsageAnalyticsController` — GET `/api/v1/admin/usage` (ClickHouse `usage_events`).
 - `ClickHouseAdminController` — GET `/api/v1/admin/clickhouse/**` (conditional on `infrastructureendpoints` flag).
 - `DatabaseAdminController` — GET `/api/v1/admin/database/**` (conditional on `infrastructureendpoints` flag).
+- `ServerMetricsAdminController` — `/api/v1/admin/server-metrics/**`. GET `/catalog`, GET `/instances`, POST `/query`. Generic read API over the `server_metrics` ClickHouse table so SaaS dashboards don't need direct CH access. Delegates to `ServerMetricsQueryStore` (impl `ClickHouseServerMetricsQueryStore`). Visibility matches ClickHouse/Database admin: `@ConditionalOnProperty(infrastructureendpoints, matchIfMissing=true)` + class-level `@PreAuthorize("hasRole('ADMIN')")`. Validation: metric/tag regex `^[a-zA-Z0-9._]+$`, statistic regex `^[a-z_]+$`, `to - from ≤ 31 days`, stepSeconds ∈ [10, 3600], response capped at 500 series. `IllegalArgumentException` → 400. `/query` supports `raw` + `delta` modes (delta does per-`server_instance_id` positive-clipped differences, then aggregates across instances). Derived `statistic=mean` for timers computes `sum(total|total_time)/sum(count)` per bucket.

 ### Other (flat)

@@ -118,10 +119,10 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 ## runtime/ — Docker orchestration

 - `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
+- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
 - `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
 - `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
+- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
 - `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
 - `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
 - `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
@@ -129,11 +130,13 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 ## metrics/ — Prometheus observability

 - `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
+- `ServerInstanceIdConfig` — `@Configuration`, exposes `@Bean("serverInstanceId") String`. Resolution precedence: `cameleer.server.instance-id` property → `HOSTNAME` env → `InetAddress.getLocalHost()` → random UUID. Fixed at boot; rotates across restarts so counters restart cleanly.
+- `ServerMetricsSnapshotScheduler` — `@Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")`. Walks `MeterRegistry.getMeters()` each tick and emits one `ServerMetricSample` per `Measurement` (Timer/DistributionSummary meters produce multiple rows, one per Micrometer `Statistic`). Skips non-finite values; logs and swallows store failures. Disabled via `cameleer.server.self-metrics.enabled=false` (`@ConditionalOnProperty`). Write-only: samples are read back through `ServerMetricsAdminController` (backed by `ClickHouseServerMetricsQueryStore`) or ad hoc via `/api/v1/admin/clickhouse/query`.

 ## storage/ — PostgreSQL repositories (JdbcTemplate)

 - `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
+- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId. Also carries `deployed_config_snapshot` JSONB (Flyway V3) populated by `DeploymentExecutor` via `saveDeployedConfigSnapshot(UUID, DeploymentConfigSnapshot)` on successful RUNNING transition. Consumed by `DirtyStateCalculator` for the `/apps/{slug}/dirty-state` endpoint and by the UI for checkpoint restore.
 - `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
 - `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`, `PostgresSensitiveKeysRepository`
 - `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`. Both `app_settings` and `application_config` are env-scoped (PK `(app_id, environment)` / `(application, environment)`); finders take `(app, env)` — no env-agnostic variants.
@@ -145,6 +148,8 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 - `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
 - `ClickHouseUsageTracker` — usage_events for billing
 - `ClickHouseRouteCatalogStore` — persistent route catalog with first_seen cache, warm-loaded on startup
+- `ClickHouseServerMetricsStore` — periodic dumps of the server's own Micrometer registry into the `server_metrics` table. Tenant-stamped (bound at the scheduler, not the bean); no `environment` column (server straddles envs). Batch-insert via `JdbcTemplate.batchUpdate` with `Map(String, String)` tag binding. Written by `ServerMetricsSnapshotScheduler`.
+- `ClickHouseServerMetricsQueryStore` — read side of `server_metrics` for dashboards. Implements `ServerMetricsQueryStore`. `catalog(from,to)` returns name+type+statistics+tagKeys, `listInstances(from,to)` returns server_instance_ids with first/last seen, `query(request)` builds bucketed time-series with `raw` or `delta` mode and supports a derived `mean` statistic for timers. All identifier inputs regex-validated; tenant_id always bound; max range 31 days; series count capped at 500. Exposed via `ServerMetricsAdminController`.

 ## search/ — ClickHouse search and log stores

@@ -171,6 +176,14 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale

 - `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions

+## alerting/eval/ — Rule evaluation
+
+- `AlertEvaluatorJob` — @Scheduled tick driver; per-rule claim/release via `AlertRuleRepository`, dispatches to per-kind `ConditionEvaluator`, persists advanced cursor on release via `AlertRule.withEvalState`.
+- `BatchResultApplier` — `@Component` that wraps a single rule's tick outcome (`EvalResult.Batch` = `firings` + `nextEvalState`) in one `@Transactional` boundary: instance upserts + notification enqueues + cursor advance commit atomically or roll back together. This is the exactly-once-per-exchange guarantee for `PER_EXCHANGE` fire mode.
+- `ConditionEvaluator` — interface; per-kind implementations: `ExchangeMatchEvaluator`, `AgentLifecycleEvaluator`, `AgentStateEvaluator`, `DeploymentStateEvaluator`, `JvmMetricEvaluator`, `LogPatternEvaluator`, `RouteMetricEvaluator`.
+- `AlertStateTransitions` — PER_EXCHANGE vs rule-level FSM helpers (fire/resolve/ack).
+- `PerKindCircuitBreaker` — trips noisy per-kind evaluators; `TickCache` — per-tick shared lookups (apps, envs, silences).
+
 ## http/ — Outbound HTTP client implementation

 - `SslContextBuilder` — composes SSL context from `OutboundHttpProperties` + `OutboundHttpRequestContext`. Supports SYSTEM_DEFAULT (JDK roots + configured CA extras), TRUST_ALL (short-circuit no-op TrustManager), TRUST_PATHS (JDK roots + system extras + per-request extras). Throws `IllegalArgumentException("CA file not found: ...")` on missing PEM.
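The `attr` rules listed for `SearchController` can be sketched as a small parser. They cover key-only, exact and wildcard forms, splitting on the first `:` only, and the key regex shared with alerting. `AttrFilterSketch` is a hypothetical stand-in for the real `AttributeFilter` record. The escaping of literal `%`, `_` and `\` before `*` is mapped to `%` is an assumption: the rules above only state the `*` → `%` mapping.

```java
import java.util.regex.Pattern;

/** Hypothetical parser for one `attr` query-param value (not the real AttributeFilter). */
record AttrFilterSketch(String key, String value) {

    // Keys are inlined into SQL, so they are allowlisted with the documented regex.
    private static final Pattern KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    AttrFilterSketch {
        if (key == null || !KEY.matcher(key).matches()) {
            // A controller would translate this into HTTP 400.
            throw new IllegalArgumentException("Invalid attribute key: " + key);
        }
    }

    /** "order" -> key-exists, "order:47" -> exact, "order:4*" -> wildcard; splits on the first ':' only. */
    static AttrFilterSketch parse(String raw) {
        int idx = raw.indexOf(':');
        return idx < 0
                ? new AttrFilterSketch(raw, null)
                : new AttrFilterSketch(raw.substring(0, idx), raw.substring(idx + 1));
    }

    boolean isWildcard() {
        return value != null && value.indexOf('*') >= 0;
    }

    /** Escapes LIKE metacharacters (assumed behaviour), then maps '*' to '%'. */
    String toLikePattern() {
        if (value == null) {
            throw new IllegalStateException("key-exists filter has no value");
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\', '%', '_' -> sb.append('\\').append(c); // keep literal
                case '*' -> sb.append('%');                      // wildcard
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `url:http://host:8080` parses to key `url` with value `http://host:8080`, because only the first colon splits.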
--- a/.claude/rules/cicd.md
+++ b/.claude/rules/cicd.md
@@ -8,8 +8,11 @@ paths:

 # CI/CD & Deployment

- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
+- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches. `paths-ignore` skips the whole pipeline for docs-only / `.planning/` / `.claude/` / `*.md` changes (push and PR triggers).
 - Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
+- Build caches (parallel `actions/cache@v4` steps in the `build` job): `~/.m2/repository` (key on all `pom.xml`), `~/.npm` (key on `ui/package-lock.json`), `ui/node_modules/.vite` (key on `ui/package-lock.json` + `ui/vite.config.ts`). UI install uses `npm ci --prefer-offline --no-audit --fund=false` so the npm cache is the primary source.
+- Maven build performance (set in `pom.xml` and `cameleer-server-app/pom.xml`): `useIncrementalCompilation=true` on the compiler plugin; Surefire uses `forkCount=1C` + `reuseForks=true` (one JVM per CPU core, reused across test classes); Failsafe keeps `forkCount=1` + `reuseForks=true`. Unit tests must not rely on per-class JVM isolation.
+- UI build script (`ui/package.json`): `build` is `vite build` only — the type-check pass was split out into `npm run typecheck` (run separately when you want a full `tsc --noEmit` sweep).
 - Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
 - `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
 - Registry: `gitea.siegeln.net/cameleer/cameleer-server` (container images)
--- a/.claude/rules/core-classes.md
+++ b/.claude/rules/core-classes.md
@@ -26,16 +26,18 @@ paths:

 - `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
 - `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
+- `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
+- `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
+- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName, createdBy (String, user_id reference; nullable for pre-V4 historical rows)
+- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED. `DEGRADED` is reserved for post-deploy drift (a replica died after RUNNING); `DeploymentExecutor` now marks partially healthy deploys FAILED, not DEGRADED.
 - `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
+- `DeploymentStrategy` — enum: BLUE_GREEN, ROLLING. Stored on `ResolvedContainerConfig.deploymentStrategy` as kebab-case string (`"blue-green"` / `"rolling"`). `fromWire(String)` is the only conversion entry point; unknown/null inputs fall back to BLUE_GREEN so the executor dispatch site never null-checks or throws.
+- `DeploymentService` — createDeployment (calls `deleteFailedByAppAndEnvironment` first so FAILED rows don't pile up; STOPPED rows are preserved as restorable checkpoints), markRunning, markFailed, markStopped
 - `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
 - `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
 - `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
 - `ContainerStatus` — record: state, running, exitCode, error
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
+- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks, externalRouting (default `true`; when `false`, `TraefikLabelBuilder` strips all `traefik.*` labels so the container is not publicly routed), certResolver (server-wide, sourced from `CAMELEER_SERVER_RUNTIME_CERTRESOLVER`; when blank the `tls.certresolver` label is omitted — use for dev installs with a static TLS store)
 - `RoutingMode` — enum for routing strategies
 - `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
 - `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
@@ -45,14 +47,15 @@ paths:
 ## search/ — Execution search and stats

 - `SearchService` — search, count, stats, statsForApp, statsForRoute, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys. `statsForRoute`/`timeseriesForRoute` take `(routeId, applicationId)` — app filter is applied to `stats_1m_route`.
- `SearchRequest` / `SearchResult` — search DTOs
+- `SearchRequest` / `SearchResult` — search DTOs. `SearchRequest.attributeFilters: List<AttributeFilter>` carries structured facet filters for execution attributes: key-only (exists), exact (key=value), or wildcard (`*` in value). The 21-arg legacy ctor is preserved to avoid call-site churn; the compact ctor normalises null → `List.of()`.
+- `AttributeFilter(key, value)` — record with key regex `^[a-zA-Z0-9._-]+$` (inlined into SQL, same constraint as alerting), `value == null` means key-exists, `value` containing `*` becomes a SQL LIKE pattern via `toLikePattern()`.
 - `ExecutionStats`, `ExecutionSummary` — stats aggregation records
 - `StatsTimeseries`, `TopError` — timeseries and error DTOs
 - `LogSearchRequest` / `LogSearchResponse` — log search DTOs. `LogSearchRequest.sources` / `levels` are `List<String>` (null-normalized, multi-value OR); `cursor` + `limit` + `sort` drive keyset pagination. Response carries `nextCursor` + `hasMore` + per-level `levelCounts`.

 ## storage/ — Storage abstractions

- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces
+- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces. `DiagramStore.findLatestContentHashForAppRoute(appId, routeId, env)` resolves the latest diagram by (app, env, route) without consulting the agent registry, so routes whose publishing agents were removed between app versions still resolve. `findContentHashForRoute(route, instance)` is retained for the ingestion path that stamps a per-execution `diagramContentHash` at ingest time (point-in-time link from `ExecutionDetail`/`ExecutionSummary`).
 - `RouteCatalogEntry` — record: applicationId, routeId, environment, firstSeen, lastSeen
 - `LogEntryResult` — log query result record
 - `model/` — `ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`
@@ -78,7 +81,7 @@ paths:
 - `AppSettings`, `AppSettingsRepository` — per-app-per-env settings config and persistence. Record carries `(applicationId, environment, …)`; repository methods are `findByApplicationAndEnvironment`, `findByEnvironment`, `save`, `delete(appId, env)`. `AppSettings.defaults(appId, env)` produces a default instance scoped to an environment.
 - `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
 - `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE`), `AuditRepository` — audit trail records and persistence
+- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE, ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE, DEPLOYMENT`), `AuditRepository` — audit trail records and persistence

 ## http/ — Outbound HTTP primitives (cross-cutting)

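The `DeploymentStrategy.fromWire` contract can be sketched as an enum. Wire values are kebab-case, and null or unknown input falls back to BLUE_GREEN. `DeploymentStrategySketch` and its `wire()` accessor are hypothetical names. Trimming and lower-casing the input before matching is also an assumption, not stated in the rules above.

```java
import java.util.Locale;

/** Hypothetical sketch of the kebab-case wire mapping (not the real DeploymentStrategy). */
enum DeploymentStrategySketch {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategySketch(String wire) {
        this.wire = wire;
    }

    /** Kebab-case form as stored on ResolvedContainerConfig.deploymentStrategy. */
    String wire() {
        return wire;
    }

    /** Single conversion entry point: null or unknown input falls back to BLUE_GREEN, never throws. */
    static DeploymentStrategySketch fromWire(String value) {
        if (value == null) {
            return BLUE_GREEN;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT); // assumed normalisation
        for (DeploymentStrategySketch s : values()) {
            if (s.wire.equals(normalized)) {
                return s;
            }
        }
        return BLUE_GREEN;
    }
}
```

Because `fromWire` never returns null, a dispatch site such as the executor's blue-green/rolling branch can switch on the result directly.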
--- a/.claude/rules/docker-orchestration.md
+++ b/.claude/rules/docker-orchestration.md
@@ -13,19 +13,28 @@ paths:
 When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

 - **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
+- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap. When `ResolvedContainerConfig.externalRouting()` is `false` (UI: Resources → External Routing, default `true`), the builder emits ONLY the identity labels (`managed-by`, `cameleer.*`) and skips every `traefik.*` label — the container stays on `cameleer-traefik` and the per-env network (so sibling containers can still reach it via Docker DNS) but is invisible to Traefik. The `tls.certresolver` label is emitted only when `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` is set to a non-blank resolver name (matching a resolver configured in the Traefik static config). When unset (dev installs backed by a static TLS store) only `tls=true` is emitted and Traefik serves the default cert from the TLS store.
 - **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
 - **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
 - **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
 - **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
-- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
+- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
 - **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).

 ## DeploymentExecutor Details

-Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
+Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
+
+**Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (the previous deterministic names, without a generation, collided and container creation failed with HTTP 409 Conflict). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name; the name exists only for operator visibility.
+
+**Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
+
+- **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
+- **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
+
+Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.

 ## Deployment Status Model

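A minimal sketch of the container-naming and instance-id scheme described above, assuming the generation is simply the first 8 characters of `UUID.toString()` (lowercase hex). The class and method names (`ReplicaNaming`, `generation`, `instanceId`, `containerName`) are hypothetical and are not the server's actual API.

```java
import java.util.UUID;

// Hypothetical helper illustrating the naming scheme; not the real DeploymentExecutor code.
final class ReplicaNaming {

    private ReplicaNaming() {}

    /** The "generation": first 8 characters of the deployment UUID. */
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    /** Agent and container-log identity: {envSlug}-{appSlug}-{replicaIndex}-{generation}. */
    static String instanceId(String envSlug, String appSlug, int replicaIndex, UUID deploymentId) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation(deploymentId);
    }

    /** Docker container name: {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}. */
    static String containerName(String tenantId, String envSlug, String appSlug,
                                int replicaIndex, UUID deploymentId) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex, deploymentId);
    }
}
```

Because each deployment carries its own UUID, replica 0 of the old and the new deployment get different names, which is what lets both exist side by side during the overlap.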
@@ -34,17 +43,13 @@ Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNET
 | `STOPPED` | Intentionally stopped or initial state |
 | `STARTING` | Deploy in progress |
 | `RUNNING` | All replicas healthy and serving |
-| `DEGRADED` | Some replicas healthy, some dead |
+| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
 | `STOPPING` | Graceful shutdown in progress |
-| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
+| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |

-**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
+**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.

-**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
-
-**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
-
-**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
+**Deployment retention**: `DeploymentService.createDeployment()` deletes FAILED deployments for the same app+environment before creating a new one, preventing failed-attempt buildup. STOPPED deployments are preserved as restorable checkpoints — the UI Checkpoints disclosure lists every deployment with a non-null `deployed_config_snapshot` (RUNNING, DEGRADED, STOPPED) minus the current one.

 ## JAR Management

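The strategy dispatch described in the DeploymentExecutor section could be backed by an enum along these lines. This is a sketch only: the wire spellings (`blue-green`, `rolling`) and the `peakReplicas` helper are assumptions, and the real `DeploymentStrategy` may differ. It shows the two documented properties: unknown values fall back to BLUE_GREEN instead of throwing, and the resource peak is 2N containers for blue/green versus N + 1 for rolling.

```java
import java.util.Locale;

// Sketch of a strategy enum; not the server's actual DeploymentStrategy class.
enum DeploymentStrategy {
    BLUE_GREEN, ROLLING;

    /** Parses a wire value; null, blank or unknown values fall back to BLUE_GREEN. */
    static DeploymentStrategy fromWire(String wire) {
        if (wire == null || wire.isBlank()) {
            return BLUE_GREEN;
        }
        String normalized = wire.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        for (DeploymentStrategy s : values()) {
            if (s.name().equals(normalized)) {
                return s;
            }
        }
        return BLUE_GREEN; // misconfiguration never throws at runtime
    }

    /** Peak container count while replacing a deployment of {@code replicas} replicas. */
    int peakReplicas(int replicas) {
        return switch (this) {
            case BLUE_GREEN -> 2 * replicas; // all new replicas start before any old one stops
            case ROLLING -> replicas + 1;    // at most one extra replica at a time
        };
    }
}
```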
--- a/.claude/rules/metrics.md
+++ b/.claude/rules/metrics.md
@@ -8,7 +8,9 @@ paths:

 # Prometheus Metrics

-Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
+Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component.
+
+The same `MeterRegistry` is also snapshotted to ClickHouse every 60 s by `ServerMetricsSnapshotScheduler` (see "Server self-metrics persistence" at the bottom of this file) — so historical server-health data survives restarts without an external Prometheus.

 ## Gauges (auto-polled)

@@ -83,3 +85,23 @@ Mean processing time = `camel.route.policy.total_time / camel.route.policy.count
 | `cameleer.sse.reconnects.count` | counter | `instanceId` |
 | `cameleer.taps.evaluated.count` | counter | `instanceId` |
 | `cameleer.metrics.exported.count` | counter | `instanceId` |
+
+## Server self-metrics persistence
+
+`ServerMetricsSnapshotScheduler` walks `MeterRegistry.getMeters()` every 60 s (configurable via `cameleer.server.self-metrics.interval-ms`) and writes one row per Micrometer `Measurement` to the ClickHouse `server_metrics` table. Full registry is captured — Spring Boot Actuator series (`jvm.*`, `process.*`, `http.server.requests`, `hikaricp.*`, `jdbc.*`, `tomcat.*`, `logback.events`, `system.*`) plus `cameleer.*` and `alerting_*`.
+
+**Table** (`cameleer-server-app/src/main/resources/clickhouse/init.sql`):
+
+```
+server_metrics(tenant_id, collected_at, server_instance_id,
+               metric_name, metric_type, statistic, metric_value,
+               tags Map(String,String), server_received_at)
+```
+
+- `metric_type` — lowercase Micrometer `Meter.Type` (counter, gauge, timer, distribution_summary, long_task_timer, other)
+- `statistic` — Micrometer `Statistic.getTagValueRepresentation()` (value, count, total, total_time, max, mean, active_tasks, duration). Timers emit 3 rows per tick (count + total_time + max); gauges/counters emit 1 (`statistic='value'` or `'count'`).
+- No `environment` column — the server is env-agnostic.
+- `tenant_id` threaded from `cameleer.server.tenant.id` (single-tenant per server).
+- `server_instance_id` resolved once at boot by `ServerInstanceIdConfig` (property → HOSTNAME → localhost → UUID fallback). Rotates across restarts so counter resets are unambiguous.
+- TTL: 90 days (vs 365 for `agent_metrics`). Write-only in v1 — no query endpoint or UI page. Inspect via ClickHouse admin: `/api/v1/admin/clickhouse/query` or direct SQL.
+- Toggle off entirely with `cameleer.server.self-metrics.enabled=false` (uses `@ConditionalOnProperty`).
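A minimal sketch of the `server_instance_id` fallback chain (property → HOSTNAME → localhost → UUID), assuming "localhost" means the local host name from `InetAddress.getLocalHost()` and that blank values count as missing. `ServerInstanceIdSketch` and its methods are hypothetical; the real `ServerInstanceIdConfig` may resolve things differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical sketch of the boot-time resolution; not the real ServerInstanceIdConfig.
final class ServerInstanceIdSketch {

    private ServerInstanceIdSketch() {}

    /** First non-blank of: configured property, HOSTNAME env value, local host name; else a random UUID. */
    static String resolve(String configuredProperty, String hostnameEnv, Supplier<String> localHostName) {
        if (notBlank(configuredProperty)) return configuredProperty.trim();
        if (notBlank(hostnameEnv)) return hostnameEnv.trim();
        String local = localHostName.get();
        if (notBlank(local)) return local.trim();
        return UUID.randomUUID().toString();
    }

    /** Local host name, or null if it cannot be determined. */
    static String localHostNameOrNull() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    private static boolean notBlank(String s) {
        return s != null && !s.isBlank();
    }
}
```

At boot, a call such as `resolve(configuredId, System.getenv("HOSTNAME"), ServerInstanceIdSketch::localHostNameOrNull)` (with `configuredId` standing in for the property value) would run once, and the result would be reused for every snapshot row.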
--- a/.claude/rules/ui.md
+++ b/.claude/rules/ui.md
@@ -10,13 +10,18 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
 - **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
 - **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
 - **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`). AgentHealth supports compact view (dense health-tinted cards) and expanded view (full GroupCard+DataTable per app). View mode persisted to localStorage.
-- **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
-  - Config sub-tabs: **Monitoring | Resources | Variables | Traces & Taps | Route Recording**
-  - Create app: full page at `/apps/new` (not a modal)
-  - Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
+- **Deployments** — unified app deployment page (`ui/src/pages/AppsTab/`)
+  - Routes: `/apps` (list, `AppListView` in `AppsTab.tsx`), `/apps/new` + `/apps/:slug` (both render `AppDeploymentPage`).
+  - Identity & Artifact section always visible; name editable pre-first-deploy, read-only after. JAR picker client-stages; new JAR + any form edits flip the primary button from `Save` to `Redeploy`. Environment fixed to the currently-selected env (no selector).
+  - Config sub-tabs: **Monitoring | Resources | Variables | Sensitive Keys | Deployment | ● Traces & Taps | ● Route Recording**. The four staged tabs feed dirty detection; the `●` live tabs apply in real-time (amber LiveBanner + default `?apply=live` on their writes) and never mark dirty.
+  - Primary action state machine: `Save` → `Uploading… N%` (during JAR upload; button shows percent with a tinted progress-fill overlay) → `Redeploy` → `Deploying…` during active deploy. Upload progress sourced from `useUploadJar` (XHR `upload.onprogress` → page-level `uploadPct` state). The button is disabled during `uploading` and `deploying`.
+  - Checkpoints render as a collapsible `CheckpointsTable` (default **collapsed**) **inside the Identity & Artifact `configGrid`** as an in-grid row (`Checkpoints | ▸ Expand (N)` / `▾ Collapse (N)`). `CheckpointsTable` returns a React.Fragment of grid-ready children so the label + trigger align with the other identity rows; when opened, a third grid child spans both columns via `grid-column: 1 / -1` so the 7-column table gets full width. Wired through `IdentitySection.checkpointsSlot` — `CheckpointDetailDrawer` stays in `IdentitySection.children` because it portals. Columns: Version · JAR (filename) · Deployed by · Deployed (relative `timeAgo` + user-locale sub-line via `new Date(iso).toLocaleString()`) · Strategy · Outcome · ›. Row click opens the drawer. Drawer tabs are ordered **Config | Logs** with `Config` as the default. Config panel has Snapshot / Diff vs current view modes. Replica filter in the Logs panel uses DS `Select`. Restore lives in the drawer footer (forces review). Visible row cap = `Environment.jarRetentionCount` (default 10 if 0/null); older rows accessible via "Show older (N)" expander. Currently-running deployment is excluded — represented separately by `StatusCard`. The empty-checkpoints case returns `null` (no row). The legacy `Checkpoints.tsx` row-list component is gone.
+  - Deployment tab: `StatusCard` + `DeploymentProgress` (during STARTING / FAILED) + flex-grow `StartupLogPanel` (no fixed maxHeight). Auto-activates when a deploy starts. The former `HistoryDisclosure` is retired — per-deployment config and logs live in the Checkpoints drawer. `StartupLogPanel` header mirrors the Runtime Application Log pattern: title + live/stopped badge + `N entries` + sort toggle (↑/↓, default **desc**) + refresh icon (`RefreshCw`). Sort drives the backend fetch via `useStartupLogs(…, sort)` so the 500-line limit returns the window closest to the user's interest; display order matches fetch order. Refresh scrolls to the latest edge (top for desc, bottom for asc). Sort + refresh buttons disable while a refetch is in flight. 3s polling while STARTING is unchanged.
+  - Unsaved-change router blocker uses DS `AlertDialog` (not `window.beforeunload`). Env switch intentionally discards edits without warning.

 **Admin pages** (ADMIN-only, under `/admin/`):
 - **Sensitive Keys** (`ui/src/pages/Admin/SensitiveKeysPage.tsx`) — global sensitive key masking config. Shows agent built-in defaults as outlined Badge reference, editable Tag pills for custom keys, amber-highlighted push-to-agents toggle. Keys add to (not replace) agent defaults. Per-app sensitive key additions managed via `ApplicationConfigController` API. Note: `AppConfigDetailPage.tsx` exists but is not routed in `router.tsx`.
+- **Server Metrics** (`ui/src/pages/Admin/ServerMetricsAdminPage.tsx`) — dashboard over the `server_metrics` ClickHouse table. Visibility matches Database/ClickHouse pages: gated on `capabilities.infrastructureEndpoints` in `buildAdminTreeNodes`; backend is `@ConditionalOnProperty(infrastructureendpoints) + @PreAuthorize('hasRole(ADMIN)')`. Uses the generic `/api/v1/admin/server-metrics/{catalog,instances,query}` API via `ui/src/api/queries/admin/serverMetrics.ts` hooks (`useServerMetricsCatalog`, `useServerMetricsInstances`, `useServerMetricsSeries`), all three of which take a `ServerMetricsRange = { from: Date; to: Date }`. Time range is driven by the global TopBar picker via `useGlobalFilters()` — no page-local selector; bucket size auto-scales through `stepSecondsFor(windowSeconds)` (10 s up to 1 h buckets). Toolbar is just server-instance badges. Sections: Server health (agents/ingestion/auth), JVM (memory/CPU/GC/threads), HTTP & DB pools, Alerting (conditional on catalog), Deployments (conditional on catalog). Each panel is a `ThemedChart` with `Line`/`Area` children from the design system; multi-series responses are flattened into overlap rows by bucket timestamp. Alerting and Deployments rows are hidden when their metrics aren't in the catalog (zero-deploy / alerting-disabled installs).

 ## Key UI Files

@@ -25,6 +30,8 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
 - `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
 - `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
 - `ui/src/components/ContentTabs.tsx` — main tab switcher
+- `ui/src/components/EnvironmentSwitcherButton.tsx` + `EnvironmentSwitcherModal.tsx` — explicit env picker (button in TopBar; DS `Modal`-based list). Replaces the retired `EnvironmentSelector` (All-Envs dropdown). When `envRecords.length > 0` and the stored `selectedEnv` no longer matches any env, `LayoutShell` opens the modal in `forced` mode (non-dismissible). Switcher pulls env records from `useEnvironments()` (admin endpoint; readable by VIEWER+).
+- `ui/src/components/env-colors.ts` + `ui/src/styles/env-colors.css` — 8-swatch preset palette for the per-environment color indicator. Tokens `--env-color-slate/red/amber/green/teal/blue/purple/pink` are defined for both light and dark themes. `envColorVar(name)` falls back to `slate` for unknown values. `LayoutShell` renders a 3px fixed top bar in the current env's color (z-index 900, below DS modals).
 - `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
 - `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
 - `ui/src/hooks/useScope.ts` — TabKey type, scope inference
@@ -33,6 +40,7 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
 - `ui/src/api/queries/agents.ts` — `useAgents` for agent list, `useInfiniteAgentEvents` for cursor-paginated timeline stream
 - `ui/src/hooks/useInfiniteStream.ts` — tanstack `useInfiniteQuery` wrapper with top-gated auto-refetch, flattened `items[]`, and `refresh()` invalidator
 - `ui/src/components/InfiniteScrollArea.tsx` — scrollable container with IntersectionObserver top/bottom sentinels. Streaming log/event views use this + `useInfiniteStream`. Bounded views (LogTab, StartupLogPanel) keep `useLogs`/`useStartupLogs`
+- `ui/src/components/SideDrawer.tsx` — project-local right-slide drawer (DS has Modal but no Drawer). Portal-rendered, ESC + transparent-backdrop click closes, sticky header/footer, sizes md/lg/xl. Currently consumed only by `CheckpointDetailDrawer` — promote to `@cameleer/design-system` once a second consumer appears.

 ## Alerts

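The Checkpoints selection rules (deployments with a config snapshot, minus the current one, capped at `Environment.jarRetentionCount` with a default of 10, remainder behind "Show older (N)") can be sketched as follows. It is written in Java to match the other examples even though the visible cap is applied in the UI; the record and class names, the newest-first ordering, and treating negative counts like 0 are assumptions.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical types; field names loosely follow the text, not the real API.
record DeploymentRow(UUID id, Instant deployedAt, boolean hasConfigSnapshot) {}

record CheckpointView(List<DeploymentRow> visible, int olderCount) {}

final class CheckpointSelection {

    static final int DEFAULT_CAP = 10;

    private CheckpointSelection() {}

    /** Cap from jarRetentionCount; null or 0 (and, in this sketch, negatives) fall back to 10. */
    static int visibleCap(Integer jarRetentionCount) {
        return (jarRetentionCount == null || jarRetentionCount <= 0) ? DEFAULT_CAP : jarRetentionCount;
    }

    static CheckpointView select(List<DeploymentRow> all, UUID currentId, Integer jarRetentionCount) {
        List<DeploymentRow> candidates = all.stream()
                .filter(DeploymentRow::hasConfigSnapshot)              // only restorable deployments
                .filter(d -> !d.id().equals(currentId))                // current one is shown by StatusCard
                .sorted(Comparator.comparing(DeploymentRow::deployedAt).reversed()) // newest first (assumed)
                .toList();
        int cap = visibleCap(jarRetentionCount);
        List<DeploymentRow> visible = candidates.subList(0, Math.min(cap, candidates.size()));
        return new CheckpointView(visible, candidates.size() - visible.size()); // olderCount feeds "Show older (N)"
    }
}
```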
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -5,8 +5,20 @@ on:
    branches: [main, 'feature/**', 'fix/**', 'feat/**']
    tags-ignore:
      - 'v*'
+    paths-ignore:
+      - '.planning/**'
+      - 'docs/**'
+      - '**/*.md'
+      - '.claude/**'
+      - 'AGENTS.md'
+      - 'CLAUDE.md'
  pull_request:
    branches: [main]
+    paths-ignore:
+      - '.planning/**'
+      - 'docs/**'
+      - '**/*.md'
+      - '.claude/**'
  delete:

 jobs:
@@ -45,11 +57,25 @@ jobs:
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-maven-

+      - name: Cache npm registry
+        uses: actions/cache@v4
+        with:
+          path: ~/.npm
+          key: ${{ runner.os }}-npm-${{ hashFiles('ui/package-lock.json') }}
+          restore-keys: ${{ runner.os }}-npm-
+
+      - name: Cache Vite build artifacts
+        uses: actions/cache@v4
+        with:
+          path: ui/node_modules/.vite
+          key: ${{ runner.os }}-vite-${{ hashFiles('ui/package-lock.json', 'ui/vite.config.ts') }}
+          restore-keys: ${{ runner.os }}-vite-
+
      - name: Build UI
        working-directory: ui
        run: |
          echo '//gitea.siegeln.net/api/packages/cameleer/npm/:_authToken=${REGISTRY_TOKEN}' >> .npmrc
-          npm ci
+          npm ci --prefer-offline --no-audit --fund=false
          npm run build
        env:
          REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,7 +1,7 @@
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence

-This project is indexed by GitNexus as **cameleer-server** (8778 symbols, 22647 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -22,8 +22,19 @@ Cameleer Server — observability server that receives, stores, and serves Camel
 ```bash
 mvn clean compile          # Compile all modules
 mvn clean verify           # Full build with tests
+mvn clean verify -DskipITs # Fast: unit tests only (no Testcontainers)
 ```

+### Faster local builds
+
+- **Surefire reuses forks** (`cameleer-server-app/pom.xml`): unit tests run with `forkCount=1C` + `reuseForks=true` — one JVM per CPU core, reused across classes. Test classes that mutate static state must clean up after themselves.
+- **Testcontainers reuse** — opt-in per developer. Add to `~/.testcontainers.properties`:
+  ```
+  testcontainers.reuse.enable=true
+  ```
+  Then `AbstractPostgresIT` containers persist across `mvn verify` runs (saves ~20s per run). Stop them manually when you need a clean DB: `docker rm -f $(docker ps -aq --filter label=org.testcontainers.reuse=true)`.
+- **UI build** dropped the `tsc --noEmit` step from `npm run build` to speed it up. Vite/esbuild only strip TypeScript types and do not type-check, so type errors no longer fail the build; run `npm run typecheck` explicitly when you want a full type-check pass.
+
 ## Run

 ```bash
@@ -85,7 +96,7 @@ When adding, removing, or renaming classes, controllers, endpoints, UI component
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence

-This project is indexed by GitNexus as **cameleer-server** (8778 symbols, 22647 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

--- a/HOWTO.md
+++ b/HOWTO.md
@@ -499,6 +499,7 @@ Key settings in `cameleer-server-app/src/main/resources/application.yml`. All cu
 | `cameleer.server.runtime.routingmode` | `path` | `CAMELEER_SERVER_RUNTIME_ROUTINGMODE` | `path` or `subdomain` Traefik routing |
 | `cameleer.server.runtime.routingdomain` | `localhost` | `CAMELEER_SERVER_RUNTIME_ROUTINGDOMAIN` | Domain for Traefik routing labels |
 | `cameleer.server.runtime.serverurl` | *(empty)* | `CAMELEER_SERVER_RUNTIME_SERVERURL` | Server URL injected into app containers |
+| `cameleer.server.runtime.certresolver` | *(empty)* | `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` | Traefik TLS cert resolver name (e.g. `letsencrypt`). Blank = omit the `tls.certresolver` label and let Traefik serve the default TLS-store cert |
 | `cameleer.server.runtime.agenthealthport` | `9464` | `CAMELEER_SERVER_RUNTIME_AGENTHEALTHPORT` | Agent health check port |
 | `cameleer.server.runtime.healthchecktimeout` | `60` | `CAMELEER_SERVER_RUNTIME_HEALTHCHECKTIMEOUT` | Health check timeout (seconds) |
 | `cameleer.server.runtime.container.memorylimit` | `512m` | `CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT` | Default memory limit for app containers |
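A minimal sketch of the `certresolver` rule above: `tls=true` is always emitted for routed containers, the `tls.certresolver` label only when a non-blank resolver name is configured, and no `traefik.*` labels at all when external routing is off. `TlsLabelSketch` and its router-name argument are hypothetical; a real label builder also emits rule, service and entrypoint labels, which are omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the TLS-related label decisions; not the real TraefikLabelBuilder.
final class TlsLabelSketch {

    private TlsLabelSketch() {}

    static Map<String, String> labels(String routerName, boolean externalRouting, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("managed-by", "cameleer-server");           // identity labels are always emitted
        if (!externalRouting) {
            return labels;                                      // invisible to Traefik: no traefik.* labels
        }
        String prefix = "traefik.http.routers." + routerName;
        labels.put("traefik.enable", "true");
        labels.put(prefix + ".tls", "true");
        if (certResolver != null && !certResolver.isBlank()) { // blank = serve the default TLS-store cert
            labels.put(prefix + ".tls.certresolver", certResolver.trim());
        }
        return labels;
    }
}
```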
--- a/cameleer-server-app/pom.xml
+++ b/cameleer-server-app/pom.xml
@@ -189,8 +189,8 @@
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
-                    <forkCount>1</forkCount>
-                    <reuseForks>false</reuseForks>
+                    <forkCount>1C</forkCount>
+                    <reuseForks>true</reuseForks>
                </configuration>
            </plugin>
            <plugin>
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java
@@ -16,7 +16,8 @@ public record AlertingProperties(
        Integer eventRetentionDays,
        Integer notificationRetentionDays,
        Integer webhookTimeoutMs,
-        Integer webhookMaxAttempts) {
+        Integer webhookMaxAttempts,
+        Integer perExchangeDeployBacklogCapSeconds) {

    public int effectiveEvaluatorTickIntervalMs() {
        int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
@@ -70,4 +71,9 @@ public record AlertingProperties(
    public int cbCooldownSeconds() {
        return circuitBreakerCooldownSeconds == null ? 60 : circuitBreakerCooldownSeconds;
    }
+
+    public int effectivePerExchangeDeployBacklogCapSeconds() {
+        // Default 24 h. Zero or negative = disabled (no clamp; the first run falls back to rule.createdAt, as before).
+        return perExchangeDeployBacklogCapSeconds == null ? 86_400 : perExchangeDeployBacklogCapSeconds;
+    }
 }
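`effectivePerExchangeDeployBacklogCapSeconds()` only returns the cap; the code that applies it is not part of this diff. Assuming it bounds how far back a PER_EXCHANGE rule's scan window may start, a clamp could look like the sketch below. `BacklogClamp` and `windowStart` are hypothetical names, and applying the clamp to an existing cursor as well as to the first run is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not the evaluator's actual code.
final class BacklogClamp {

    private BacklogClamp() {}

    /**
     * @param cursor        last evaluated position, or null on a rule's first run
     * @param ruleCreatedAt start used when there is no cursor yet
     * @param now           current evaluation time
     * @param capSeconds    value of effectivePerExchangeDeployBacklogCapSeconds(); <= 0 disables the clamp
     */
    static Instant windowStart(Instant cursor, Instant ruleCreatedAt, Instant now, int capSeconds) {
        Instant start = (cursor != null) ? cursor : ruleCreatedAt;
        if (capSeconds <= 0) {
            return start;                                    // disabled: no clamp
        }
        Instant earliest = now.minus(Duration.ofSeconds(capSeconds));
        return start.isBefore(earliest) ? earliest : start;  // never look back further than the cap
    }
}
```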
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertRuleController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertRuleController.java
@@ -22,6 +22,7 @@ import com.cameleer.server.core.alerting.AlertRuleRepository;
 import com.cameleer.server.core.alerting.AlertRuleTarget;
 import com.cameleer.server.core.alerting.ConditionKind;
 import com.cameleer.server.core.alerting.ExchangeMatchCondition;
+import com.cameleer.server.core.alerting.FireMode;
 import com.cameleer.server.core.alerting.WebhookBinding;
 import com.cameleer.server.core.outbound.OutboundConnection;
 import com.cameleer.server.core.outbound.OutboundConnectionService;
@@ -126,6 +127,7 @@ public class AlertRuleController {
            HttpServletRequest httpRequest) {

        validateAttributeKeys(req.condition());
+        validateBusinessRules(req);
        validateWebhooks(req.webhooks(), env.id());

        AlertRule draft = buildRule(null, env.id(), req, currentUserId());
@@ -147,6 +149,7 @@ public class AlertRuleController {

        AlertRule existing = requireRule(id, env.id());
        validateAttributeKeys(req.condition());
+        validateBusinessRules(req);
        validateWebhooks(req.webhooks(), env.id());

        AlertRule updated = buildRule(existing, env.id(), req, currentUserId());
@@ -258,6 +261,36 @@ public class AlertRuleController {
    // Helpers
    // -------------------------------------------------------------------------

+    /**
+     * Cross-field business-rule validation for {@link AlertRuleRequest}.
+     *
+     * <p>PER_EXCHANGE rules: re-notify and for-duration are nonsensical (each fire is its own
+     * exchange — there's no "still firing" window and nothing to re-notify about). Reject 400
+     * if either is non-zero.
+     *
+     * <p>All rules: reject 400 if both webhooks and targets are empty — such a rule can never
+     * notify anyone and is a pure footgun.
+     */
+    private void validateBusinessRules(AlertRuleRequest req) {
+        if (req.condition() instanceof ExchangeMatchCondition ex
+                && ex.fireMode() == FireMode.PER_EXCHANGE) {
+            if (req.reNotifyMinutes() != null && req.reNotifyMinutes() != 0) {
+                throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                        "reNotifyMinutes must be 0 for PER_EXCHANGE rules (re-notify does not apply)");
+            }
+            if (req.forDurationSeconds() != null && req.forDurationSeconds() != 0) {
+                throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                        "forDurationSeconds must be 0 for PER_EXCHANGE rules");
+            }
+        }
+        boolean noWebhooks = req.webhooks() == null || req.webhooks().isEmpty();
+        boolean noTargets  = req.targets()  == null || req.targets().isEmpty();
+        if (noWebhooks && noTargets) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                    "rule must have at least one webhook or target — otherwise it never notifies anyone");
+        }
+    }
+
    /**
     * Validates that all attribute keys in an {@link ExchangeMatchCondition} match
     * {@code ^[a-zA-Z0-9._-]+$}. Keys are inlined into ClickHouse SQL, making this
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluator.java
@@ -64,13 +64,13 @@ public class AgentLifecycleEvaluator implements ConditionEvaluator<AgentLifecycl
        List<AgentEventRecord> matches = eventRepo.findInWindow(
                envSlug, appSlug, agentId, typeNames, from, to, MAX_EVENTS_PER_TICK);

-        if (matches.isEmpty()) return new EvalResult.Batch(List.of());
+        if (matches.isEmpty()) return new EvalResult.Batch(List.of(), Map.of());

        List<EvalResult.Firing> firings = new ArrayList<>(matches.size());
        for (AgentEventRecord ev : matches) {
            firings.add(toFiring(ev));
        }
-        return new EvalResult.Batch(firings);
+        return new EvalResult.Batch(firings, Map.of());
    }

    private static EvalResult.Firing toFiring(AgentEventRecord ev) {
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java
@@ -47,6 +47,7 @@ public class AlertEvaluatorJob implements SchedulingConfigurer {
    private final NotificationContextBuilder contextBuilder;
    private final EnvironmentRepository environmentRepo;
    private final ObjectMapper objectMapper;
+    private final BatchResultApplier batchResultApplier;
    private final String instanceId;
    private final String tenantId;
    private final Clock clock;
@@ -64,26 +65,28 @@ public class AlertEvaluatorJob implements SchedulingConfigurer {
            NotificationContextBuilder contextBuilder,
            EnvironmentRepository environmentRepo,
            ObjectMapper objectMapper,
+            BatchResultApplier batchResultApplier,
            @Qualifier("alertingInstanceId") String instanceId,
            @Value("${cameleer.server.tenant.id:default}") String tenantId,
            Clock alertingClock,
            AlertingMetrics metrics) {

-        this.props           = props;
-        this.ruleRepo        = ruleRepo;
-        this.instanceRepo    = instanceRepo;
-        this.notificationRepo = notificationRepo;
-        this.evaluators      = evaluatorList.stream()
+        this.props              = props;
+        this.ruleRepo           = ruleRepo;
+        this.instanceRepo       = instanceRepo;
+        this.notificationRepo   = notificationRepo;
+        this.evaluators         = evaluatorList.stream()
                .collect(Collectors.toMap(ConditionEvaluator::kind, e -> e));
-        this.circuitBreaker  = circuitBreaker;
-        this.renderer        = renderer;
-        this.contextBuilder  = contextBuilder;
-        this.environmentRepo = environmentRepo;
-        this.objectMapper    = objectMapper;
-        this.instanceId      = instanceId;
-        this.tenantId        = tenantId;
-        this.clock           = alertingClock;
-        this.metrics         = metrics;
+        this.circuitBreaker     = circuitBreaker;
+        this.renderer           = renderer;
+        this.contextBuilder     = contextBuilder;
+        this.environmentRepo    = environmentRepo;
+        this.objectMapper       = objectMapper;
+        this.batchResultApplier = batchResultApplier;
+        this.instanceId         = instanceId;
+        this.tenantId           = tenantId;
+        this.clock              = alertingClock;
+        this.metrics            = metrics;
    }

    // -------------------------------------------------------------------------
@@ -112,21 +115,61 @@ public class AlertEvaluatorJob implements SchedulingConfigurer {

        for (AlertRule rule : claimed) {
            Instant nextRun = Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds());
+            if (circuitBreaker.isOpen(rule.conditionKind())) {
+                log.debug("Circuit breaker open for {}; skipping rule {}", rule.conditionKind(), rule.id());
+                reschedule(rule, nextRun);
+                continue;
+            }
+
+            EvalResult result;
            try {
-                if (circuitBreaker.isOpen(rule.conditionKind())) {
-                    log.debug("Circuit breaker open for {}; skipping rule {}", rule.conditionKind(), rule.id());
-                    continue;
-                }
-                EvalResult result = metrics.evalDuration(rule.conditionKind())
+                result = metrics.evalDuration(rule.conditionKind())
                        .recordCallable(() -> evaluateSafely(rule, ctx));
-                applyResult(rule, result);
-                circuitBreaker.recordSuccess(rule.conditionKind());
            } catch (Exception e) {
                metrics.evalError(rule.conditionKind(), rule.id());
                circuitBreaker.recordFailure(rule.conditionKind());
                log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
-            } finally {
+                // Evaluation itself failed — release the claim so the rule can be
+                // retried on the next tick. Cursor stays put.
                reschedule(rule, nextRun);
+                continue;
+            }
+
+            if (result instanceof EvalResult.Batch b) {
+                // Phase 2: the Batch path is atomic. The @Transactional apply() on
+                // BatchResultApplier wraps instance writes, notification enqueues,
+                // AND the cursor advance + releaseClaim into a single tx. A
+                // mid-batch fault rolls everything back — including the cursor —
+                // so the next tick replays the whole batch exactly once.
+                try {
+                    batchResultApplier.apply(rule, b, nextRun);
+                    circuitBreaker.recordSuccess(rule.conditionKind());
+                } catch (Exception e) {
+                    metrics.evalError(rule.conditionKind(), rule.id());
+                    circuitBreaker.recordFailure(rule.conditionKind());
+                    log.warn("Batch apply failed for rule {} ({}): {} — rolling back; next tick will retry",
+                            rule.id(), rule.conditionKind(), e.toString());
+                    // The transaction rolled back. Do NOT call reschedule here —
+                    // leaving claim + next_evaluation_at as they were means the
+                    // claim TTL takes over and the rule becomes due on its own.
+                    // Rethrowing is unnecessary for correctness — the cursor
+                    // stayed put, so exactly-once-per-exchange is preserved.
+                }
+            } else {
+                // Non-Batch path (FIRING / Clear / Error): classic apply + rule
+                // reschedule. Not wrapped in a single tx — semantics unchanged
+                // from pre-Phase-2.
+                try {
+                    applyResult(rule, result);
+                    circuitBreaker.recordSuccess(rule.conditionKind());
+                } catch (Exception e) {
+                    metrics.evalError(rule.conditionKind(), rule.id());
+                    circuitBreaker.recordFailure(rule.conditionKind());
+                    log.warn("applyResult failed for rule {} ({}): {}",
+                            rule.id(), rule.conditionKind(), e.toString());
+                } finally {
+                    reschedule(rule, nextRun);
+                }
            }
        }

@@ -171,14 +214,10 @@ public class AlertEvaluatorJob implements SchedulingConfigurer {
    // -------------------------------------------------------------------------

    private void applyResult(AlertRule rule, EvalResult result) {
-        if (result instanceof EvalResult.Batch b) {
-            // PER_EXCHANGE mode: each Firing in the batch creates its own AlertInstance
-            for (EvalResult.Firing f : b.firings()) {
-                applyBatchFiring(rule, f);
-            }
-            return;
-        }
-
+        // Note: the Batch path is handled by BatchResultApplier (transactional) —
+        // tick() routes Batch results there directly and never calls applyResult
+        // for them. This method only handles FIRING / Clear / Error state-machine
+        // transitions for the classic (non-PER_EXCHANGE) path.
        AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
        Instant now = Instant.now(clock);

@@ -199,19 +238,6 @@ public class AlertEvaluatorJob implements SchedulingConfigurer {
        });
    }

-    /**
-     * Batch (PER_EXCHANGE) mode: always create a fresh FIRING instance per Firing entry.
-     * No forDuration check — each exchange is its own event.
-     */
-    private void applyBatchFiring(AlertRule rule, EvalResult.Firing f) {
-        Instant now = Instant.now(clock);
-        AlertInstance instance = AlertStateTransitions.newInstance(rule, f, AlertState.FIRING, now)
-                .withRuleSnapshot(snapshotRule(rule));
-        AlertInstance enriched = enrichTitleMessage(rule, instance);
-        AlertInstance persisted = instanceRepo.save(enriched);
-        enqueueNotifications(rule, persisted, now);
-    }
-
    // -------------------------------------------------------------------------
    // Title / message rendering
    // -------------------------------------------------------------------------
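The Batch branch of `tick()` depends on one property: instance writes and the cursor advance commit together or not at all. A fault mid-batch therefore leaves the cursor where it was, and the next attempt replays the whole batch. Below is a minimal in-memory sketch of that all-or-nothing behaviour. `InMemoryAlertStore` and `applyBatch` are hypothetical names. The snapshot-and-restore only stands in for the database transaction that `@Transactional` provides in `BatchResultApplier`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in: the snapshot/restore below plays the role
// of the database transaction that @Transactional provides in BatchResultApplier.
final class InMemoryAlertStore {
    final List<String> instances = new ArrayList<>();
    String cursor = "";

    /**
     * Records one instance per exchange id, then advances the cursor.
     * If any write fails, both the instances and the cursor are restored.
     */
    void applyBatch(List<String> exchangeIds, String nextCursor, String failOn) {
        List<String> savedInstances = new ArrayList<>(instances);
        String savedCursor = cursor;
        try {
            for (String id : exchangeIds) {
                if (id.equals(failOn)) {
                    throw new IllegalStateException("simulated fault on " + id);
                }
                instances.add(id);
            }
            cursor = nextCursor; // the cursor moves only after every write succeeded
        } catch (RuntimeException e) {
            instances.clear();
            instances.addAll(savedInstances);
            cursor = savedCursor;
            throw e; // the caller decides how to retry, as tick() does
        }
    }
}
```

Suppose a first attempt fails on `ex-2`. It leaves no instances and the cursor unchanged. A retry with the same batch then records `ex-1`, `ex-2` and `ex-3` exactly once each.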
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/BatchResultApplier.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/BatchResultApplier.java
@@ -0,0 +1,144 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.alerting.notify.MustacheRenderer;
+import com.cameleer.server.app.alerting.notify.NotificationContextBuilder;
+import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+import org.springframework.transaction.annotation.Transactional;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * Applies a {@link EvalResult.Batch} result to persistent state inside a single
+ * transaction: instance writes, notification enqueues, and the rule's cursor
+ * advance + {@code releaseClaim} either all commit or all roll back together.
+ * <p>
+ * Lives in its own bean so the {@code @Transactional} annotation engages via the
+ * Spring proxy when invoked from {@link AlertEvaluatorJob#tick()}. Had this method
+ * stayed on {@code AlertEvaluatorJob} and been called as {@code this.apply(...)}, the
+ * self-invocation would bypass the proxy and silently disable the transaction.
+ * <p>
+ * Phase 2 of the per-exchange exactly-once plan (see
+ * {@code docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md}).
+ */
+@Component
+public class BatchResultApplier {
+
+    private static final Logger log = LoggerFactory.getLogger(BatchResultApplier.class);
+
+    private final AlertRuleRepository ruleRepo;
+    private final AlertInstanceRepository instanceRepo;
+    private final AlertNotificationRepository notificationRepo;
+    private final MustacheRenderer renderer;
+    private final NotificationContextBuilder contextBuilder;
+    private final EnvironmentRepository environmentRepo;
+    private final ObjectMapper objectMapper;
+    private final Clock clock;
+
+    public BatchResultApplier(
+            AlertRuleRepository ruleRepo,
+            AlertInstanceRepository instanceRepo,
+            AlertNotificationRepository notificationRepo,
+            MustacheRenderer renderer,
+            NotificationContextBuilder contextBuilder,
+            EnvironmentRepository environmentRepo,
+            ObjectMapper objectMapper,
+            Clock alertingClock) {
+        this.ruleRepo         = ruleRepo;
+        this.instanceRepo     = instanceRepo;
+        this.notificationRepo = notificationRepo;
+        this.renderer         = renderer;
+        this.contextBuilder   = contextBuilder;
+        this.environmentRepo  = environmentRepo;
+        this.objectMapper     = objectMapper;
+        this.clock            = alertingClock;
+    }
+
+    /**
+     * Atomically apply a Batch result for a single rule:
+     * <ol>
+     *   <li>persist a FIRING instance per firing + enqueue its notifications</li>
+     *   <li>advance the rule's cursor ({@code evalState}) iff the batch supplied one</li>
+     *   <li>release the claim with the new {@code nextRun} + {@code evalState}</li>
+     * </ol>
+     * Any unchecked exception thrown by the repository calls rolls back every write,
+     * including the cursor advance, so the rule is replayable on the next tick.
+     */
+    @Transactional
+    public void apply(AlertRule rule, EvalResult.Batch batch, Instant nextRun) {
+        for (EvalResult.Firing f : batch.firings()) {
+            applyBatchFiring(rule, f);
+        }
+        Map<String, Object> nextEvalState =
+                batch.nextEvalState().isEmpty() ? rule.evalState() : batch.nextEvalState();
+        ruleRepo.releaseClaim(rule.id(), nextRun, nextEvalState);
+    }
+
+    /**
+     * Batch (PER_EXCHANGE) mode: always create a fresh FIRING instance per Firing entry.
+     * No forDuration check — each exchange is its own event.
+     */
+    private void applyBatchFiring(AlertRule rule, EvalResult.Firing f) {
+        Instant now = Instant.now(clock);
+        AlertInstance instance = AlertStateTransitions.newInstance(rule, f, AlertState.FIRING, now)
+                .withRuleSnapshot(snapshotRule(rule));
+        AlertInstance enriched = enrichTitleMessage(rule, instance);
+        AlertInstance persisted = instanceRepo.save(enriched);
+        enqueueNotifications(rule, persisted, now);
+    }
+
+    private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        Map<String, Object> ctx = contextBuilder.build(rule, instance, env, null);
+        String title   = renderer.render(rule.notificationTitleTmpl(), ctx);
+        String message = renderer.render(rule.notificationMessageTmpl(), ctx);
+        return instance.withTitleMessage(title, message);
+    }
+
+    private void enqueueNotifications(AlertRule rule, AlertInstance instance, Instant now) {
+        for (WebhookBinding w : rule.webhooks()) {
+            Map<String, Object> payload = buildPayload(rule, instance);
+            notificationRepo.save(new AlertNotification(
+                    UUID.randomUUID(),
+                    instance.id(),
+                    w.id(),
+                    w.outboundConnectionId(),
+                    NotificationStatus.PENDING,
+                    0,
+                    now,
+                    null, null, null, null,
+                    payload,
+                    null,
+                    now));
+        }
+    }
+
+    private Map<String, Object> buildPayload(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        return contextBuilder.build(rule, instance, env, null);
+    }
+
+    @SuppressWarnings("unchecked")
+    private Map<String, Object> snapshotRule(AlertRule rule) {
+        try {
+            Map<String, Object> raw = objectMapper.convertValue(rule, Map.class);
+            // Map.copyOf (used in AlertInstance compact ctor) rejects null values —
+            // strip them so the snapshot is safe to store.
+            Map<String, Object> safe = new LinkedHashMap<>();
+            raw.forEach((k, v) -> { if (v != null) safe.put(k, v); });
+            return safe;
+        } catch (Exception e) {
+            log.warn("Failed to snapshot rule {}: {}", rule.id(), e.getMessage());
+            return Map.of("id", rule.id().toString(), "name", rule.name());
+        }
+    }
+}
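The Javadoc's warning about self-invocation can be reproduced without Spring. The sketch below uses a JDK dynamic proxy as a stand-in for the Spring AOP proxy. Spring may generate a CGLIB subclass instead, but self-calls bypass that proxy in the same way. `Applier`, `SelfCallingApplier` and `ProxyDemo` are hypothetical names. The handler records each call it intercepts, which is the point where a transactional proxy would open a transaction.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Applier {
    void tick();
    void apply();
}

// Plain target: tick() calls apply() on itself, i.e. this.apply().
class SelfCallingApplier implements Applier {
    public void tick() { apply(); }
    public void apply() { }
}

final class ProxyDemo {
    /** Wraps target in a JDK proxy that records the name of every call it intercepts. */
    static Applier intercepting(Applier target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // where a transactional proxy would begin a tx
            return method.invoke(target, args);
        };
        return (Applier) Proxy.newProxyInstance(
                Applier.class.getClassLoader(), new Class<?>[] {Applier.class}, handler);
    }
}
```

Calling `tick()` through the proxy records only `tick`. The inner `apply()` runs as a plain `this` call on the target and never passes through the handler. This is why `apply` lives on a separate bean.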
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalResult.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalResult.java
@@ -17,9 +17,14 @@ public sealed interface EvalResult {

    record Error(Throwable cause) implements EvalResult {}

-    record Batch(List<Firing> firings) implements EvalResult {
+    record Batch(List<Firing> firings, Map<String, Object> nextEvalState) implements EvalResult {
        public Batch {
            firings = firings == null ? List.of() : List.copyOf(firings);
+            nextEvalState = nextEvalState == null ? Map.of() : Map.copyOf(nextEvalState);
+        }
+        /** Convenience: a Batch with no cursor update (first-run empty, or no matches). */
+        public static Batch empty() {
+            return new Batch(List.of(), Map.of());
        }
    }
 }
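The compact constructor turns `null` into an empty collection and copies everything else into immutable collections. `Map.copyOf` rejects null values, which is also why `snapshotRule` strips them. The sketch below uses a hypothetical `BatchSketch` record with string firings for brevity. It shows both the normalization and the empty-map convention that `BatchResultApplier.apply` reads as "keep the rule's current `evalState`".

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for EvalResult.Batch, with String firings for brevity.
record BatchSketch(List<String> firings, Map<String, Object> nextEvalState) {
    BatchSketch {
        // null becomes empty; anything else is copied into an immutable collection.
        // Map.copyOf throws NullPointerException on null keys or values.
        firings = firings == null ? List.of() : List.copyOf(firings);
        nextEvalState = nextEvalState == null ? Map.of() : Map.copyOf(nextEvalState);
    }

    static BatchSketch empty() {
        return new BatchSketch(List.of(), Map.of());
    }

    /** Mirrors BatchResultApplier.apply: an empty nextEvalState keeps the current state. */
    Map<String, Object> stateToPersist(Map<String, Object> current) {
        return nextEvalState.isEmpty() ? current : nextEvalState;
    }
}
```

Using an empty map as the "no cursor update" signal keeps `Batch` free of an extra flag. The trade-off is that a batch cannot deliberately clear `evalState`.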
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluator.java
@@ -1,5 +1,6 @@
 package com.cameleer.server.app.alerting.eval;

+import com.cameleer.server.app.alerting.config.AlertingProperties;
 import com.cameleer.server.app.search.ClickHouseSearchIndex;
 import com.cameleer.server.core.alerting.AlertMatchSpec;
 import com.cameleer.server.core.alerting.AlertRule;
@@ -14,6 +15,7 @@ import org.springframework.stereotype.Component;

 import java.time.Instant;
 import java.util.ArrayList;
+import java.util.Comparator;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
@@ -23,10 +25,14 @@ public class ExchangeMatchEvaluator implements ConditionEvaluator<ExchangeMatchC

    private final ClickHouseSearchIndex searchIndex;
    private final EnvironmentRepository envRepo;
+    private final AlertingProperties alertingProperties;

-    public ExchangeMatchEvaluator(ClickHouseSearchIndex searchIndex, EnvironmentRepository envRepo) {
-        this.searchIndex = searchIndex;
-        this.envRepo     = envRepo;
+    public ExchangeMatchEvaluator(ClickHouseSearchIndex searchIndex,
+                                  EnvironmentRepository envRepo,
+                                  AlertingProperties alertingProperties) {
+        this.searchIndex        = searchIndex;
+        this.envRepo            = envRepo;
+        this.alertingProperties = alertingProperties;
    }

    @Override
@@ -85,19 +91,31 @@ public class ExchangeMatchEvaluator implements ConditionEvaluator<ExchangeMatchC
        String routeId = c.scope() != null ? c.scope().routeId() : null;
        ExchangeMatchCondition.ExchangeFilter filter = c.filter();

-        // Resolve cursor from evalState
-        Instant cursor = null;
-        Object raw = rule.evalState().get("lastExchangeTs");
+        // Resolve composite cursor: (startTime, executionId)
+        Instant cursorTs;
+        String cursorId;
+        Object raw = rule.evalState().get("lastExchangeCursor");
        if (raw instanceof String s && !s.isBlank()) {
-            try { cursor = Instant.parse(s); } catch (Exception ignored) {}
-        } else if (raw instanceof Instant i) {
-            cursor = i;
+            int pipe = s.indexOf('|');
+            if (pipe < 0) {
+                // Malformed — treat as first-run (with deploy-backlog-cap clamp).
+                cursorTs = firstRunCursorTs(rule, ctx);
+                cursorId = "";
+            } else {
+                cursorTs = Instant.parse(s.substring(0, pipe));
+                cursorId = s.substring(pipe + 1);
+            }
+        } else {
+            // First run — bounded by rule.createdAt, empty executionId so any real id sorts after it.
+            // Clamp to deploy-backlog-cap to avoid backlog flooding for long-lived rules on first
+            // post-deploy tick. Normal-advance path (valid cursor above) is intentionally unaffected.
+            cursorTs = firstRunCursorTs(rule, ctx);
+            cursorId = "";
        }

-        // Build SearchRequest — use cursor as timeFrom so we only see exchanges after last run
        var req = new SearchRequest(
                filter != null ? filter.status() : null,
-                cursor,                          // timeFrom = cursor (or null for first run)
+                cursorTs,                        // timeFrom
                ctx.now(),                       // timeTo
                null, null, null,                // durationMin/Max, correlationId
                null, null, null, null,          // text variants
@@ -110,23 +128,26 @@ public class ExchangeMatchEvaluator implements ConditionEvaluator<ExchangeMatchC
                50,
                "startTime",
                "asc",                           // asc so we process oldest first
+                cursorId.isEmpty() ? null : cursorId,  // afterExecutionId; null on first run makes timeFrom inclusive (>=)
                envSlug
        );

        SearchResult<ExecutionSummary> result = searchIndex.search(req);
        List<ExecutionSummary> matches = result.data();

-        if (matches.isEmpty()) return new EvalResult.Batch(List.of());
+        if (matches.isEmpty()) return EvalResult.Batch.empty();

-        // Find the latest startTime across all matches — becomes the next cursor
-        Instant latestTs = matches.stream()
-                .map(ExecutionSummary::startTime)
-                .max(Instant::compareTo)
-                .orElse(ctx.now());
+        // Ensure deterministic ordering for cursor advance
+        matches = new ArrayList<>(matches);
+        matches.sort(Comparator
+                .comparing(ExecutionSummary::startTime)
+                .thenComparing(ExecutionSummary::executionId));
+
+        ExecutionSummary last = matches.get(matches.size() - 1);
+        String nextCursorSerialized = last.startTime().toString() + "|" + last.executionId();

        List<EvalResult.Firing> firings = new ArrayList<>();
-        for (int i = 0; i < matches.size(); i++) {
-            ExecutionSummary ex = matches.get(i);
+        for (ExecutionSummary ex : matches) {
            Map<String, Object> ctx2 = new HashMap<>();
            ctx2.put("exchange", Map.of(
                    "id",        ex.executionId(),
@@ -135,15 +156,32 @@ public class ExchangeMatchEvaluator implements ConditionEvaluator<ExchangeMatchC
                    "startTime", ex.startTime() == null ? "" : ex.startTime().toString()
            ));
            ctx2.put("app", Map.of("slug", ex.applicationId() == null ? "" : ex.applicationId()));
-
-            // Attach the next-cursor to the last firing so the job can extract it
-            if (i == matches.size() - 1) {
-                ctx2.put("_nextCursor", latestTs);
-            }
-
            firings.add(new EvalResult.Firing(1.0, null, ctx2));
        }

-        return new EvalResult.Batch(firings);
+        Map<String, Object> nextEvalState = new HashMap<>(rule.evalState());
+        nextEvalState.put("lastExchangeCursor", nextCursorSerialized);
+        return new EvalResult.Batch(firings, nextEvalState);
+    }
+
+    /**
+     * First-run cursor timestamp: {@code rule.createdAt()}, clamped to
+     * {@code now - perExchangeDeployBacklogCapSeconds} so a long-lived PER_EXCHANGE rule
+     * doesn't scan from its creation date forward on first post-deploy tick.
+     * <p>
+     * Cap ≤ 0 disables the clamp (first-run falls back to {@code rule.createdAt()} verbatim).
+     * Applied only on first-run / malformed-cursor paths — the normal-advance path is
+     * intentionally unaffected so legitimate missed ticks are not silently skipped.
+     */
+    private Instant firstRunCursorTs(AlertRule rule, EvalContext ctx) {
+        Instant cursorTs = rule.createdAt();
+        int capSeconds = alertingProperties.effectivePerExchangeDeployBacklogCapSeconds();
+        if (capSeconds > 0) {
+            Instant capFloor = ctx.now().minusSeconds(capSeconds);
+            if (cursorTs == null || cursorTs.isBefore(capFloor)) {
+                cursorTs = capFloor;
+            }
+        }
+        return cursorTs;
    }
 }
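The evaluator's cursor is a keyset over `(startTime, executionId)`, serialized as `ts|id`. The `executionId` tie-breaker lets exchanges that share a `startTime` be split across ticks without being skipped or fired twice. This assumes the search applies `afterExecutionId` as a strict `(startTime, executionId) >` bound.

The sketch below uses a hypothetical `Cursor` record, and `firstRunTs` mirrors the clamp in `firstRunCursorTs`. One deliberate difference from the diff: the diff calls `Instant.parse` unguarded once it finds a pipe. This sketch also treats an unparseable timestamp as malformed and returns empty, so the caller can fall back to the first-run floor.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical composite keyset cursor over (startTime, executionId), stored as "ts|id".
record Cursor(Instant ts, String id) {
    static final Comparator<Cursor> ORDER =
            Comparator.comparing(Cursor::ts).thenComparing(Cursor::id);

    String serialize() {
        return ts + "|" + id;
    }

    /** Empty for blank, pipe-less, or unparseable input; callers then use the first-run floor. */
    static Optional<Cursor> parse(String s) {
        if (s == null || s.isBlank()) return Optional.empty();
        int pipe = s.indexOf('|');
        if (pipe < 0) return Optional.empty();
        try {
            Instant ts = Instant.parse(s.substring(0, pipe));
            return Optional.of(new Cursor(ts, s.substring(pipe + 1)));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** True when this cursor sorts strictly before other (earlier ts, or same ts and smaller id). */
    boolean isBefore(Cursor other) {
        return ORDER.compare(this, other) < 0;
    }

    /** First-run floor: createdAt, raised to now - capSeconds when capSeconds > 0. */
    static Instant firstRunTs(Instant createdAt, Instant now, int capSeconds) {
        if (capSeconds <= 0) return createdAt;
        Instant floor = now.minusSeconds(capSeconds);
        return (createdAt == null || createdAt.isBefore(floor)) ? floor : createdAt;
    }
}
```

For example, with a one-hour cap, a rule created long before the deploy starts scanning one hour before `now`. A rule created a minute ago keeps its own `createdAt`.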
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluator.java
@@ -61,7 +61,8 @@ public class LogPatternEvaluator implements ConditionEvaluator<LogPatternConditi
                    to,
                    null,   // cursor
                    1,      // limit (count query; value irrelevant)
-                    "desc"  // sort
+                    "desc", // sort
+                    null    // instanceIds
            );
            return logStore.countLogs(req);
        });
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/RuntimeBeanConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/RuntimeBeanConfig.java
@@ -9,6 +9,7 @@ import com.cameleer.server.core.runtime.AppService;
 import com.cameleer.server.core.runtime.AppVersionRepository;
 import com.cameleer.server.core.runtime.DeploymentRepository;
 import com.cameleer.server.core.runtime.DeploymentService;
+import com.cameleer.server.core.runtime.DirtyStateCalculator;
 import com.cameleer.server.core.runtime.EnvironmentRepository;
 import com.cameleer.server.core.runtime.EnvironmentService;
 import com.fasterxml.jackson.databind.ObjectMapper;
@@ -64,6 +65,11 @@ public class RuntimeBeanConfig {
        return new DeploymentService(deployRepo, appService, envService);
    }

+    @Bean
+    public DirtyStateCalculator dirtyStateCalculator(ObjectMapper objectMapper) {
+        return new DirtyStateCalculator(objectMapper);
+    }
+
    @Bean(name = "deploymentTaskExecutor")
    public Executor deploymentTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/StorageBeanConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/StorageBeanConfig.java
@@ -9,6 +9,8 @@ import com.cameleer.server.app.storage.ClickHouseRouteCatalogStore;
 import com.cameleer.server.core.storage.RouteCatalogStore;
 import com.cameleer.server.app.storage.ClickHouseMetricsQueryStore;
 import com.cameleer.server.app.storage.ClickHouseMetricsStore;
+import com.cameleer.server.app.storage.ClickHouseServerMetricsQueryStore;
+import com.cameleer.server.app.storage.ClickHouseServerMetricsStore;
 import com.cameleer.server.app.storage.ClickHouseStatsStore;
 import com.cameleer.server.core.admin.AuditRepository;
 import com.cameleer.server.core.admin.AuditService;
@@ -67,6 +69,19 @@ public class StorageBeanConfig {
        return new ClickHouseMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
    }

+    @Bean
+    public ServerMetricsStore clickHouseServerMetricsStore(
+            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        return new ClickHouseServerMetricsStore(clickHouseJdbc);
+    }
+
+    @Bean
+    public ServerMetricsQueryStore clickHouseServerMetricsQueryStore(
+            TenantProperties tenantProperties,
+            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        return new ClickHouseServerMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
+    }
+
    // ── Execution Store ──────────────────────────────────────────────────

    @Bean
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppController.java
@@ -1,14 +1,24 @@
 package com.cameleer.server.app.controller;

+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.dto.DirtyStateResponse;
+import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
 import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.runtime.App;
 import com.cameleer.server.core.runtime.AppService;
 import com.cameleer.server.core.runtime.AppVersion;
+import com.cameleer.server.core.runtime.AppVersionRepository;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentConfigSnapshot;
+import com.cameleer.server.core.runtime.DirtyStateCalculator;
+import com.cameleer.server.core.runtime.DirtyStateResult;
 import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.runtime.RuntimeType;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.MediaType;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.access.prepost.PreAuthorize;
@@ -22,8 +32,10 @@ import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;
 import org.springframework.web.multipart.MultipartFile;
+import org.springframework.web.server.ResponseStatusException;

 import java.io.IOException;
+import java.util.Comparator;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
@@ -40,9 +52,21 @@ import java.util.UUID;
 public class AppController {

    private final AppService appService;
+    private final AppVersionRepository appVersionRepository;
+    private final PostgresApplicationConfigRepository configRepository;
+    private final PostgresDeploymentRepository deploymentRepository;
+    private final DirtyStateCalculator dirtyCalc;

-    public AppController(AppService appService) {
+    public AppController(AppService appService,
+                         AppVersionRepository appVersionRepository,
+                         PostgresApplicationConfigRepository configRepository,
+                         PostgresDeploymentRepository deploymentRepository,
+                         DirtyStateCalculator dirtyCalc) {
        this.appService = appService;
+        this.appVersionRepository = appVersionRepository;
+        this.configRepository = configRepository;
+        this.deploymentRepository = deploymentRepository;
+        this.dirtyCalc = dirtyCalc;
    }

    @GetMapping
@@ -120,6 +144,47 @@ public class AppController {
        }
    }

+    @GetMapping("/{appSlug}/dirty-state")
+    @Operation(summary = "Check whether the app's current config differs from the last successful deploy",
+            description = "Returns dirty=true when the desired state (current JAR + agent config + container config) "
+                    + "would produce a changed deployment. When no successful deploy exists yet, dirty=true.")
+    @ApiResponse(responseCode = "200", description = "Dirty-state computed")
+    @ApiResponse(responseCode = "404", description = "App not found in this environment")
+    public ResponseEntity<DirtyStateResponse> getDirtyState(@EnvPath Environment env,
+                                                             @PathVariable String appSlug) {
+        App app;
+        try {
+            app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
+        } catch (IllegalArgumentException e) {
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "App not found");
+        }
+
+        // Latest JAR version: highest version number (findByAppId orders by version DESC; max() does not rely on it)
+        List<AppVersion> versions = appVersionRepository.findByAppId(app.id());
+        UUID latestVersionId = versions.stream()
+                .max(Comparator.comparingInt(AppVersion::version))
+                .map(AppVersion::id).orElse(null);
+
+        // Desired agent config
+        ApplicationConfig agentConfig = configRepository
+                .findByApplicationAndEnvironment(appSlug, env.slug())
+                .orElse(null);
+
+        // Container config
+        Map<String, Object> containerConfig = app.containerConfig();
+
+        // Last successful deployment snapshot
+        Deployment lastSuccessful = deploymentRepository
+                .findLatestSuccessfulByAppAndEnv(app.id(), env.id())
+                .orElse(null);
+        DeploymentConfigSnapshot snapshot = lastSuccessful != null ? lastSuccessful.deployedConfigSnapshot() : null;
+
+        DirtyStateResult result = dirtyCalc.compute(latestVersionId, agentConfig, containerConfig, snapshot);
+
+        String lastId = lastSuccessful != null ? lastSuccessful.id().toString() : null;
+        return ResponseEntity.ok(new DirtyStateResponse(result.dirty(), lastId, result.differences()));
+    }
+
    private static final java.util.regex.Pattern CUSTOM_ARGS_PATTERN =
            java.util.regex.Pattern.compile("^[-a-zA-Z0-9_.=:/\\s+\"']*$");

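`getDirtyState` delegates the comparison to `DirtyStateCalculator.compute`, whose body is not part of this diff. According to the commit message, live-pushed agent-config fields are excluded so they never mark the app "Pending Deploy". Those fields are `taps`, `tapVersion`, `tracedProcessors` and `routeRecording`.

The sketch below is a hypothetical flat-map version of that rule, and `DirtySketch` is not the real class. The real calculator also compares the JAR version and container config. Judging by its nested-path test, it walks nested paths as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical flat-map version of the dirty check; not the real DirtyStateCalculator.
final class DirtySketch {
    // Applied live via CONFIG_UPDATE (per the commit message), so never "pending deploy".
    static final Set<String> IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    /** Top-level keys whose values differ between desired and deployed config, sorted. */
    static List<String> differences(Map<String, Object> desired, Map<String, Object> deployed) {
        Set<String> keys = new TreeSet<>(desired.keySet());
        keys.addAll(deployed.keySet());
        List<String> diffs = new ArrayList<>();
        for (String key : keys) {
            if (IGNORED_KEYS.contains(key)) continue;
            if (!Objects.equals(desired.get(key), deployed.get(key))) diffs.add(key);
        }
        return diffs;
    }

    /** No successful deployment yet (deployed == null) always counts as dirty. */
    static boolean isDirty(Map<String, Object> desired, Map<String, Object> deployed) {
        return deployed == null || !differences(desired, deployed).isEmpty();
    }
}
```

If `taps` diverges but `sensitiveKeys` does not, the app stays clean, matching the new test the commit message describes. Changing a staged field such as `sensitiveKeys` makes it dirty.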
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApplicationConfigController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApplicationConfigController.java
@@ -24,6 +24,7 @@ import com.cameleer.server.core.storage.DiagramStore;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.Parameter;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import jakarta.servlet.http.HttpServletRequest;
@@ -33,6 +34,7 @@ import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.core.Authentication;
 import org.springframework.web.bind.annotation.*;
+import org.springframework.web.server.ResponseStatusException;

 import java.util.ArrayList;
 import java.util.List;
@@ -108,13 +110,23 @@ public class ApplicationConfigController {

    @PutMapping("/apps/{appSlug}/config")
    @Operation(summary = "Update application config for this environment",
-            description = "Saves config and pushes CONFIG_UPDATE to LIVE agents of this application in the given environment")
-    @ApiResponse(responseCode = "200", description = "Config saved and pushed")
+            description = "Saves config. When apply=live (default), also pushes CONFIG_UPDATE to LIVE agents. "
+                    + "When apply=staged, persists without a live push — the next successful deploy applies it.")
+    @ApiResponse(responseCode = "200", description = "Config saved (and pushed if apply=live)")
+    @ApiResponse(responseCode = "400", description = "Unknown apply value (must be 'staged' or 'live')")
    public ResponseEntity<ConfigUpdateResponse> updateConfig(@EnvPath Environment env,
                                                              @PathVariable String appSlug,
+                                                              @Parameter(name = "apply",
+                                                                      description = "When to apply: 'live' (default) saves and pushes CONFIG_UPDATE to live agents immediately; 'staged' saves without pushing — the next successful deploy applies it.")
+                                                              @RequestParam(name = "apply", defaultValue = "live") String apply,
                                                              @RequestBody ApplicationConfig config,
                                                              Authentication auth,
                                                              HttpServletRequest httpRequest) {
+        if (!"staged".equalsIgnoreCase(apply) && !"live".equalsIgnoreCase(apply)) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                    "Unknown apply value '" + apply + "' — must be 'staged' or 'live'");
+        }
+
        String updatedBy = auth != null ? auth.getName() : "system";

        config.setApplication(appSlug);
@@ -126,14 +138,24 @@ public class ApplicationConfigController {
        List<String> perAppKeys = extractSensitiveKeys(saved);
        List<String> mergedKeys = SensitiveKeysMerger.merge(globalKeys, perAppKeys);

-        CommandGroupResponse pushResult = pushConfigToAgentsWithMergedKeys(appSlug, env.slug(), saved, mergedKeys);
-        log.info("Config v{} saved for '{}', pushed to {} agent(s), {} responded",
-                saved.getVersion(), appSlug, pushResult.total(), pushResult.responded());
+        CommandGroupResponse pushResult;
+        if ("staged".equalsIgnoreCase(apply)) {
+            pushResult = new CommandGroupResponse(true, 0, 0, List.of(), List.of());
+            log.info("Config v{} staged for '{}' (no live push)", saved.getVersion(), appSlug);
+        } else {
+            pushResult = pushConfigToAgentsWithMergedKeys(appSlug, env.slug(), saved, mergedKeys);
+            log.info("Config v{} saved for '{}', pushed to {} agent(s), {} responded",
+                    saved.getVersion(), appSlug, pushResult.total(), pushResult.responded());
+        }

-        auditService.log("update_app_config", AuditCategory.CONFIG, appSlug,
+        auditService.log(
+                "staged".equalsIgnoreCase(apply) ? "stage_app_config" : "update_app_config",
+                AuditCategory.CONFIG, appSlug,
                Map.of("environment", env.slug(), "version", saved.getVersion(),
+                        "apply", apply.toLowerCase(),
                        "agentsPushed", pushResult.total(),
-                        "responded", pushResult.responded(), "timedOut", pushResult.timedOut().size()),
+                        "responded", pushResult.responded(),
+                        "timedOut", pushResult.timedOut().size()),
                AuditResult.SUCCESS, httpRequest);

        return ResponseEntity.ok(new ConfigUpdateResponse(saved, pushResult));
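The `apply` handling above compares the raw string case-insensitively in three places: validation, the push branch, and the audit action name. One alternative is to parse the value once into an enum. The sketch below uses a hypothetical `ApplyMode` type, which is not what the controller does. It also throws `IllegalArgumentException` where the controller throws `ResponseStatusException(BAD_REQUEST)`.

```java
import java.util.Locale;

/** Hypothetical enum; the controller in the diff works with raw strings instead. */
enum ApplyMode {
    LIVE, STAGED;

    /** Parses 'live' / 'staged' case-insensitively; anything else is rejected. */
    static ApplyMode parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("apply must be 'staged' or 'live'");
        }
        return switch (raw.toLowerCase(Locale.ROOT)) {
            case "live" -> LIVE;
            case "staged" -> STAGED;
            default -> throw new IllegalArgumentException(
                    "Unknown apply value '" + raw + "', must be 'staged' or 'live'");
        };
    }

    /** Audit action name, mirroring the strings used in the diff. */
    String auditAction() {
        return this == STAGED ? "stage_app_config" : "update_app_config";
    }
}
```

Lowercasing with `Locale.ROOT` gives the same result whatever the server's default locale is. Plain `toLowerCase()`, as used for the audit `apply` field, follows the default locale. Under a Turkish locale, for example, `"LIVE"` lowercases to `"lıve"`.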
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/CatalogController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/CatalogController.java
@@ -196,7 +196,16 @@ public class CatalogController {
            }

            Set<String> routeIds = routesByApp.getOrDefault(slug, Set.of());
-            List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
+
+            // Resolve the env slug for this row early so fromUri can survive
+            // cross-env queries (env==null) against managed apps.
+            String rowEnvSlug = envSlug;
+            if (app != null && rowEnvSlug.isEmpty()) {
+                try {
+                    rowEnvSlug = envService.getById(app.environmentId()).slug();
+                } catch (Exception ignored) {}
+            }
+            final String resolvedEnvSlug = rowEnvSlug;

            // Routes
            List<RouteSummary> routeSummaries = routeIds.stream()
@@ -204,7 +213,7 @@ public class CatalogController {
                        String key = slug + "/" + routeId;
                        long count = routeExchangeCounts.getOrDefault(key, 0L);
                        Instant lastSeen = routeLastSeen.get(key);
-                        String fromUri = resolveFromEndpointUri(routeId, agentIds);
+                        String fromUri = resolveFromEndpointUri(slug, routeId, resolvedEnvSlug);
                        String state = routeStateRegistry.getState(slug, routeId).name().toLowerCase();
                        String routeState = "started".equals(state) ? null : state;
                        return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
@@ -258,15 +267,9 @@ public class CatalogController {
            String healthTooltip = buildHealthTooltip(app != null, deployStatus, agentHealth, agents.size());

            String displayName = app != null ? app.displayName() : slug;
-            String appEnvSlug = envSlug;
-            if (app != null && appEnvSlug.isEmpty()) {
-                try {
-                    appEnvSlug = envService.getById(app.environmentId()).slug();
-                } catch (Exception ignored) {}
-            }

            catalog.add(new CatalogApp(
-                    slug, displayName, app != null, appEnvSlug,
+                    slug, displayName, app != null, resolvedEnvSlug,
                    health, healthTooltip, agents.size(), routeSummaries, agentSummaries,
                    totalExchanges, deploymentSummary
            ));
@@ -275,8 +278,11 @@ public class CatalogController {
        return ResponseEntity.ok(catalog);
    }

-    private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
-        return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
+    private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
+        if (environment == null || environment.isBlank()) {
+            return null;
+        }
+        return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
                .flatMap(diagramStore::findByContentHash)
                .map(RouteGraph::getRoot)
                .map(root -> root.getEndpointUri())
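`resolveFromEndpointUri` now asks the diagram store for the latest content hash per (application, route, environment). Previously it looked up the set of currently registered agents. With the new key, routes dropped from the running version still resolve. The method returns `null` when no environment slug could be resolved. The sketch below models that lookup in memory. `DiagramIndexSketch` and its "latest by stored-at time" rule are assumptions about the semantics, not the real `DiagramStore` implementation.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for the (app, route, env) -> latest content hash lookup. */
final class DiagramIndexSketch {
    private record Key(String app, String route, String env) {}
    private record Entry(String contentHash, Instant storedAt) {}

    private final Map<Key, Entry> latest = new HashMap<>();

    /** Records a diagram; keeps only the most recently stored hash per key. */
    void store(String app, String route, String env, String contentHash, Instant storedAt) {
        latest.merge(new Key(app, route, env), new Entry(contentHash, storedAt),
                (old, neu) -> neu.storedAt().isAfter(old.storedAt()) ? neu : old);
    }

    /** Mirrors the guard in the diff: without an environment there is no unambiguous answer. */
    Optional<String> findLatestContentHash(String app, String route, String env) {
        if (env == null || env.isBlank()) {
            return Optional.empty();
        }
        return Optional.ofNullable(latest.get(new Key(app, route, env))).map(Entry::contentHash);
    }
}
```

The `RouteCatalogController` variant in this diff passes `envSlug` through without the blank check. It therefore relies on its caller or the store to handle an empty environment.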
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java
@@ -2,8 +2,13 @@ package com.cameleer.server.app.controller;

 import com.cameleer.server.app.runtime.DeploymentExecutor;
 import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.admin.AuditCategory;
+import com.cameleer.server.core.admin.AuditResult;
+import com.cameleer.server.core.admin.AuditService;
 import com.cameleer.server.core.runtime.App;
 import com.cameleer.server.core.runtime.AppService;
+import com.cameleer.server.core.runtime.AppVersion;
+import com.cameleer.server.core.runtime.AppVersionRepository;
 import com.cameleer.server.core.runtime.Deployment;
 import com.cameleer.server.core.runtime.DeploymentService;
 import com.cameleer.server.core.runtime.Environment;
@@ -12,14 +17,18 @@ import com.cameleer.server.core.runtime.RuntimeOrchestrator;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.servlet.http.HttpServletRequest;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.security.core.context.SecurityContextHolder;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.PathVariable;
 import org.springframework.web.bind.annotation.PostMapping;
 import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;

 import java.util.List;
 import java.util.Map;
@@ -42,17 +51,23 @@ public class DeploymentController {
    private final RuntimeOrchestrator orchestrator;
    private final AppService appService;
    private final EnvironmentService environmentService;
+    private final AuditService auditService;
+    private final AppVersionRepository appVersionRepository;

    public DeploymentController(DeploymentService deploymentService,
                                 DeploymentExecutor deploymentExecutor,
                                 RuntimeOrchestrator orchestrator,
                                 AppService appService,
-                                 EnvironmentService environmentService) {
+                                 EnvironmentService environmentService,
+                                 AuditService auditService,
+                                 AppVersionRepository appVersionRepository) {
        this.deploymentService = deploymentService;
        this.deploymentExecutor = deploymentExecutor;
        this.orchestrator = orchestrator;
        this.appService = appService;
        this.environmentService = environmentService;
+        this.auditService = auditService;
+        this.appVersionRepository = appVersionRepository;
    }

    @GetMapping
@@ -86,13 +101,25 @@ public class DeploymentController {
    @ApiResponse(responseCode = "202", description = "Deployment accepted and starting")
    public ResponseEntity<Deployment> deploy(@EnvPath Environment env,
                                              @PathVariable String appSlug,
-                                              @RequestBody DeployRequest request) {
+                                              @RequestBody DeployRequest request,
+                                              HttpServletRequest httpRequest) {
        try {
            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
-            Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), env.id());
+            AppVersion appVersion = appVersionRepository.findById(request.appVersionId())
+                    .orElseThrow(() -> new IllegalArgumentException("AppVersion not found: " + request.appVersionId()));
+            Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), env.id(), currentUserId());
            deploymentExecutor.executeAsync(deployment);
+            auditService.log("deploy_app", AuditCategory.DEPLOYMENT, deployment.id().toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(),
+                            "appVersionId", request.appVersionId().toString(),
+                            "jarFilename", appVersion.jarFilename() != null ? appVersion.jarFilename() : "",
+                            "version", appVersion.version()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.accepted().body(deployment);
        } catch (IllegalArgumentException e) {
+            auditService.log("deploy_app", AuditCategory.DEPLOYMENT, null,
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", e.getMessage()),
+                    AuditResult.FAILURE, httpRequest);
            return ResponseEntity.notFound().build();
        }
    }
@@ -103,12 +130,19 @@ public class DeploymentController {
    @ApiResponse(responseCode = "404", description = "Deployment not found")
    public ResponseEntity<Deployment> stop(@EnvPath Environment env,
                                            @PathVariable String appSlug,
-                                            @PathVariable UUID deploymentId) {
+                                            @PathVariable UUID deploymentId,
+                                            HttpServletRequest httpRequest) {
        try {
            Deployment deployment = deploymentService.getById(deploymentId);
            deploymentExecutor.stopDeployment(deployment);
+            auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.ok(deploymentService.getById(deploymentId));
        } catch (IllegalArgumentException e) {
+            auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", e.getMessage()),
+                    AuditResult.FAILURE, httpRequest);
            return ResponseEntity.notFound().build();
        }
    }
@@ -122,18 +156,26 @@ public class DeploymentController {
    public ResponseEntity<?> promote(@EnvPath Environment env,
                                      @PathVariable String appSlug,
                                      @PathVariable UUID deploymentId,
-                                      @RequestBody PromoteRequest request) {
+                                      @RequestBody PromoteRequest request,
+                                      HttpServletRequest httpRequest) {
        try {
-            App sourceApp = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            Deployment source = deploymentService.getById(deploymentId);
            Environment targetEnv = environmentService.getBySlug(request.targetEnvironment());
            // Target must also have the app with the same slug
            App targetApp = appService.getByEnvironmentAndSlug(targetEnv.id(), appSlug);
-            Deployment promoted = deploymentService.promote(targetApp.id(), source.appVersionId(), targetEnv.id());
+            Deployment promoted = deploymentService.promote(targetApp.id(), source.appVersionId(), targetEnv.id(), currentUserId());
            deploymentExecutor.executeAsync(promoted);
+            auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, promoted.id().toString(),
+                    Map.of("sourceEnv", env.slug(), "targetEnv", request.targetEnvironment(),
+                            "appSlug", appSlug, "appVersionId", source.appVersionId().toString()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.accepted().body(promoted);
        } catch (IllegalArgumentException e) {
-            return ResponseEntity.status(org.springframework.http.HttpStatus.NOT_FOUND)
+            auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("sourceEnv", env.slug(), "targetEnv", request.targetEnvironment(),
+                            "appSlug", appSlug, "error", e.getMessage()),
+                    AuditResult.FAILURE, httpRequest);
+            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                    .body(Map.of("error", e.getMessage()));
        }
    }
@@ -157,6 +199,15 @@ public class DeploymentController {
        }
    }

+    private String currentUserId() {
+        var auth = SecurityContextHolder.getContext().getAuthentication();
+        if (auth == null || auth.getName() == null) {
+            throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
+        }
+        String name = auth.getName();
+        return name.startsWith("user:") ? name.substring(5) : name;
+    }
+
    public record DeployRequest(UUID appVersionId) {}
    public record PromoteRequest(String targetEnvironment) {}
 }
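`currentUserId()` normalizes the authenticated principal name by stripping a `user:` prefix. The result is then passed to `createDeployment` and `promote`. Below is a framework-free sketch of that rule, using a hypothetical `PrincipalNames` helper. It throws `IllegalStateException` where the controller throws a 401 `ResponseStatusException`.

```java
/** Hypothetical helper mirroring currentUserId() in the diff, without Spring Security. */
final class PrincipalNames {
    private static final String USER_PREFIX = "user:";

    private PrincipalNames() {}

    /** Strips a leading "user:" prefix; other principal names pass through unchanged. */
    static String toUserId(String principalName) {
        if (principalName == null) {
            // The controller throws ResponseStatusException(UNAUTHORIZED) here instead.
            throw new IllegalStateException("No authentication");
        }
        return principalName.startsWith(USER_PREFIX)
                ? principalName.substring(USER_PREFIX.length())
                : principalName;
    }
}
```

`ResponseStatusException` is not an `IllegalArgumentException`. A missing authentication in `deploy` or `promote` therefore escapes the `catch` blocks and surfaces as a 401 rather than a 404.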
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramRenderController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramRenderController.java
@@ -2,8 +2,6 @@ package com.cameleer.server.app.controller;

 import com.cameleer.common.graph.RouteGraph;
 import com.cameleer.server.app.web.EnvPath;
-import com.cameleer.server.core.agent.AgentInfo;
-import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.diagram.DiagramLayout;
 import com.cameleer.server.core.diagram.DiagramRenderer;
 import com.cameleer.server.core.runtime.Environment;
@@ -21,7 +19,6 @@ import org.springframework.web.bind.annotation.PathVariable;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;

-import java.util.List;
 import java.util.Optional;

 /**
@@ -42,14 +39,11 @@ public class DiagramRenderController {

    private final DiagramStore diagramStore;
    private final DiagramRenderer diagramRenderer;
-    private final AgentRegistryService registryService;

    public DiagramRenderController(DiagramStore diagramStore,
-                                    DiagramRenderer diagramRenderer,
-                                    AgentRegistryService registryService) {
+                                    DiagramRenderer diagramRenderer) {
        this.diagramStore = diagramStore;
        this.diagramRenderer = diagramRenderer;
-        this.registryService = registryService;
    }

    @GetMapping("/api/v1/diagrams/{contentHash}/render")
@@ -90,8 +84,8 @@ public class DiagramRenderController {

    @GetMapping("/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram")
    @Operation(summary = "Find the latest diagram for this app's route in this environment",
-            description = "Resolves agents in this env for this app, then looks up the latest diagram for the route "
-                    + "they reported. Env scope prevents a dev route from returning a prod diagram.")
+            description = "Returns the most recently stored diagram for (app, env, route). Independent of the "
+                    + "agent registry, so routes removed from the current app version still resolve.")
    @ApiResponse(responseCode = "200", description = "Diagram layout returned")
    @ApiResponse(responseCode = "404", description = "No diagram found")
    public ResponseEntity<DiagramLayout> findByAppAndRoute(
@@ -99,15 +93,7 @@ public class DiagramRenderController {
            @PathVariable String appSlug,
            @PathVariable String routeId,
            @RequestParam(defaultValue = "LR") String direction) {
-        List<String> agentIds = registryService.findByApplicationAndEnvironment(appSlug, env.slug()).stream()
-                .map(AgentInfo::instanceId)
-                .toList();
-
-        if (agentIds.isEmpty()) {
-            return ResponseEntity.notFound().build();
-        }
-
-        Optional<String> contentHash = diagramStore.findContentHashForRouteByAgents(routeId, agentIds);
+        Optional<String> contentHash = diagramStore.findLatestContentHashForAppRoute(appSlug, routeId, env.slug());
        if (contentHash.isEmpty()) {
            return ResponseEntity.notFound().build();
        }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EnvironmentAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EnvironmentAdminController.java
@@ -1,6 +1,7 @@
 package com.cameleer.server.app.controller;

 import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentColor;
 import com.cameleer.server.core.runtime.EnvironmentService;
 import com.cameleer.server.core.runtime.RuntimeType;
 import io.swagger.v3.oas.annotations.Operation;
@@ -58,16 +59,22 @@ public class EnvironmentAdminController {
    }

    @PutMapping("/{envSlug}")
-    @Operation(summary = "Update an environment's mutable fields (displayName, production, enabled)",
+    @Operation(summary = "Update an environment's mutable fields (displayName, production, enabled, color)",
            description = "Slug is immutable after creation and cannot be changed. "
-                    + "Any slug field in the request body is ignored.")
+                    + "Any slug field in the request body is ignored. "
+                    + "If color is null or absent, the existing color is preserved.")
    @ApiResponse(responseCode = "200", description = "Environment updated")
+    @ApiResponse(responseCode = "400", description = "Unknown color value")
    @ApiResponse(responseCode = "404", description = "Environment not found")
    public ResponseEntity<?> updateEnvironment(@PathVariable String envSlug,
                                                @RequestBody UpdateEnvironmentRequest request) {
        try {
            Environment current = environmentService.getBySlug(envSlug);
-            environmentService.update(current.id(), request.displayName(), request.production(), request.enabled());
+            String nextColor = request.color() == null ? current.color() : request.color();
+            if (!EnvironmentColor.isValid(nextColor)) {
+                return ResponseEntity.badRequest().body(Map.of("error", "unknown environment color: " + request.color()));
+            }
+            environmentService.update(current.id(), request.displayName(), request.production(), request.enabled(), nextColor);
            return ResponseEntity.ok(environmentService.getBySlug(envSlug));
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
@@ -149,6 +156,6 @@ public class EnvironmentAdminController {
    }

    public record CreateEnvironmentRequest(String slug, String displayName, boolean production) {}
-    public record UpdateEnvironmentRequest(String displayName, boolean production, boolean enabled) {}
+    public record UpdateEnvironmentRequest(String displayName, boolean production, boolean enabled, String color) {}
    public record JarRetentionRequest(Integer jarRetentionCount) {}
 }
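The color rule in `updateEnvironment` treats a null or absent `color` as "keep the current one". It then validates whichever value would be persisted. The sketch below passes in a hypothetical palette as a `Set`. The real allowed values live in `EnvironmentColor`, which this diff does not show.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the PUT /{envSlug} color rule; the real palette lives in EnvironmentColor. */
final class ColorUpdateSketch {
    private ColorUpdateSketch() {}

    /**
     * Returns the color to persist: the requested one if present, else the current one.
     * Optional.empty() signals "reject with 400".
     */
    static Optional<String> nextColor(String requested, String current, Set<String> palette) {
        String next = requested == null ? current : requested;
        // Set.of(...).contains(null) throws NullPointerException, so check null first.
        return next != null && palette.contains(next) ? Optional.of(next) : Optional.empty();
    }
}
```

Validation runs on the resolved value. As written, an environment whose stored color is no longer valid would therefore have an update without `color` rejected until a valid color is sent.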
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java
@@ -44,6 +44,7 @@ public class LogQueryController {
            @RequestParam(required = false) String exchangeId,
            @RequestParam(required = false) String logger,
            @RequestParam(required = false) String source,
+            @RequestParam(required = false) String instanceIds,
            @RequestParam(required = false) String from,
            @RequestParam(required = false) String to,
            @RequestParam(required = false) String cursor,
@@ -69,12 +70,21 @@ public class LogQueryController {
                    .toList();
        }

+        List<String> instanceIdList = List.of();
+        if (instanceIds != null && !instanceIds.isEmpty()) {
+            instanceIdList = Arrays.stream(instanceIds.split(","))
+                    .map(String::trim)
+                    .filter(s -> !s.isEmpty())
+                    .toList();
+        }
+
        Instant fromInstant = from != null ? Instant.parse(from) : null;
        Instant toInstant = to != null ? Instant.parse(to) : null;

        LogSearchRequest request = new LogSearchRequest(
                searchText, levels, application, instanceId, exchangeId,
-                logger, env.slug(), sources, fromInstant, toInstant, cursor, limit, sort);
+                logger, env.slug(), sources, fromInstant, toInstant, cursor, limit, sort,
+                instanceIdList);

        LogSearchResponse result = logIndex.search(request);

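The new `instanceIds` parameter takes a comma-separated list. Surrounding whitespace and blank entries are dropped before it reaches `LogSearchRequest`. Below is a stand-alone version of that parsing, in a hypothetical `CsvParamSketch` class.

```java
import java.util.Arrays;
import java.util.List;

final class CsvParamSketch {
    private CsvParamSketch() {}

    /** Splits a comma-separated query value, trimming entries and dropping empty ones. */
    static List<String> splitCsv(String raw) {
        if (raw == null || raw.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
    }
}
```

For example, `?instanceIds=orders-1, orders-2,` yields `[orders-1, orders-2]`. The instance ids here are made up.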
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteCatalogController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteCatalogController.java
@@ -132,13 +132,12 @@ public class RouteCatalogController {
            List<AgentInfo> agents = agentsByApp.getOrDefault(appId, List.of());

            Set<String> routeIds = routesByApp.getOrDefault(appId, Set.of());
-            List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
            List<RouteSummary> routeSummaries = routeIds.stream()
                    .map(routeId -> {
                        String key = appId + "/" + routeId;
                        long count = routeExchangeCounts.getOrDefault(key, 0L);
                        Instant lastSeen = routeLastSeen.get(key);
-                        String fromUri = resolveFromEndpointUri(routeId, agentIds);
+                        String fromUri = resolveFromEndpointUri(appId, routeId, envSlug);
                        String state = routeStateRegistry.getState(appId, routeId).name().toLowerCase();
                        String routeState = "started".equals(state) ? null : state;
                        return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
@@ -160,8 +159,8 @@ public class RouteCatalogController {
        return ResponseEntity.ok(catalog);
    }

-    private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
-        return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
+    private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
+        return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
                .flatMap(diagramStore::findByContentHash)
                .map(RouteGraph::getRoot)
                .map(root -> root.getEndpointUri())
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SearchController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SearchController.java
@@ -4,6 +4,7 @@ import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.admin.AppSettings;
 import com.cameleer.server.core.admin.AppSettingsRepository;
 import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.search.AttributeFilter;
 import com.cameleer.server.core.search.ExecutionStats;
 import com.cameleer.server.core.search.ExecutionSummary;
 import com.cameleer.server.core.search.SearchRequest;
@@ -14,6 +15,7 @@ import com.cameleer.server.core.search.TopError;
 import com.cameleer.server.core.storage.StatsStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.PostMapping;
@@ -21,8 +23,10 @@ import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;

 import java.time.Instant;
+import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;

@@ -57,11 +61,19 @@ public class SearchController {
            @RequestParam(name = "agentId", required = false) String instanceId,
            @RequestParam(required = false) String processorType,
            @RequestParam(required = false) String application,
+            @RequestParam(name = "attr", required = false) List<String> attr,
            @RequestParam(defaultValue = "0") int offset,
            @RequestParam(defaultValue = "50") int limit,
            @RequestParam(required = false) String sortField,
            @RequestParam(required = false) String sortDir) {

+        List<AttributeFilter> attributeFilters;
+        try {
+            attributeFilters = parseAttrParams(attr);
+        } catch (IllegalArgumentException e) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, e.getMessage(), e);
+        }
+
        SearchRequest request = new SearchRequest(
                status, timeFrom, timeTo,
                null, null,
@@ -71,12 +83,37 @@ public class SearchController {
                application, null,
                offset, limit,
                sortField, sortDir,
-                env.slug()
+                null,
+                env.slug(),
+                attributeFilters
        );

        return ResponseEntity.ok(searchService.search(request));
    }

+    /**
+     * Parses {@code attr} query params of the form {@code key} (key-only) or {@code key:value}
+     * (exact or wildcard via {@code *}). Splits on the first {@code :}; later colons are part of
+     * the value. Blank / null list → empty result. Key validation is delegated to
+     * {@link AttributeFilter}'s compact constructor, which throws {@link IllegalArgumentException}
+     * on invalid keys (mapped to 400 by the caller).
+     */
+    static List<AttributeFilter> parseAttrParams(List<String> raw) {
+        if (raw == null || raw.isEmpty()) return List.of();
+        List<AttributeFilter> out = new ArrayList<>(raw.size());
+        for (String entry : raw) {
+            if (entry == null || entry.isBlank()) continue;
+            int colon = entry.indexOf(':');
+            if (colon < 0) {
+                out.add(new AttributeFilter(entry.trim(), null));
+            } else {
+                out.add(new AttributeFilter(entry.substring(0, colon).trim(),
+                        entry.substring(colon + 1)));
+            }
+        }
+        return out;
+    }
+
    @PostMapping("/executions/search")
    @Operation(summary = "Advanced search with all filters",
            description = "Env from the path overrides any environment field in the body.")
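`parseAttrParams` accepts repeated `attr` parameters. It splits each one on its first colon, so values such as URLs keep their own colons. The sketch below reproduces that loop against a hypothetical `AttrFilter` record. The key pattern in its compact constructor is an assumption, since the real validation lives in `AttributeFilter`, which is not part of this chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for AttributeFilter; the key rule below is assumed, not taken from the source. */
record AttrFilter(String key, String value) {
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9_.-]+");

    AttrFilter {
        if (key == null || !KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
    }
}

final class AttrParamSketch {
    private AttrParamSketch() {}

    /** Splits each entry on its first ':'; later colons stay in the value. */
    static List<AttrFilter> parse(List<String> raw) {
        if (raw == null || raw.isEmpty()) return List.of();
        List<AttrFilter> out = new ArrayList<>(raw.size());
        for (String entry : raw) {
            if (entry == null || entry.isBlank()) continue;
            int colon = entry.indexOf(':');
            if (colon < 0) {
                out.add(new AttrFilter(entry.trim(), null)); // key-only filter, no value
            } else {
                out.add(new AttrFilter(entry.substring(0, colon).trim(), entry.substring(colon + 1)));
            }
        }
        return out;
    }
}
```

Keys are trimmed but values are not, matching the controller. For example, `attr=region: eu` would filter on the value `" eu"`, with a leading space.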
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ServerMetricsAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ServerMetricsAdminController.java
@@ -0,0 +1,148 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.core.storage.ServerMetricsQueryStore;
+import com.cameleer.server.core.storage.model.ServerInstanceInfo;
+import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
+import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
+import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.web.bind.annotation.ExceptionHandler;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.RequestBody;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RequestParam;
+import org.springframework.web.bind.annotation.RestController;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Generic read API over the ClickHouse {@code server_metrics} table. Lets
+ * SaaS control planes build server-health dashboards without requiring direct
+ * ClickHouse access.
+ *
+ * <p>Three endpoints cover all 17 panels in {@code docs/server-self-metrics.md}:
+ * <ul>
+ *   <li>{@code GET /catalog} — discover available metric names, types, statistics, and tags</li>
+ *   <li>{@code POST /query} — generic time-series query with aggregation, grouping, filtering, and counter-delta mode</li>
+ *   <li>{@code GET /instances} — list server instances (useful for partitioning counter math)</li>
+ * </ul>
+ *
+ * <p>Visibility matches {@code ClickHouseAdminController} / {@code DatabaseAdminController}:
+ * <ul>
+ *   <li>Conditional on {@code cameleer.server.security.infrastructureendpoints=true} (default).</li>
+ *   <li>Class-level {@code @PreAuthorize("hasRole('ADMIN')")} on top of the
+ *       {@code /api/v1/admin/**} catch-all in {@code SecurityConfig}.</li>
+ * </ul>
+ */
+@ConditionalOnProperty(
+    name = "cameleer.server.security.infrastructureendpoints",
+    havingValue = "true",
+    matchIfMissing = true
+)
+@RestController
+@RequestMapping("/api/v1/admin/server-metrics")
+@PreAuthorize("hasRole('ADMIN')")
+@Tag(name = "Server Self-Metrics",
+     description = "Read API over the server's own Micrometer registry snapshots (ADMIN only)")
+public class ServerMetricsAdminController {
+
+    /** Default lookback window for catalog/instances when from/to are omitted. */
+    private static final long DEFAULT_LOOKBACK_SECONDS = 3_600L;
+
+    private final ServerMetricsQueryStore store;
+
+    public ServerMetricsAdminController(ServerMetricsQueryStore store) {
+        this.store = store;
+    }
+
+    @GetMapping("/catalog")
+    @Operation(summary = "List metric names observed in the window",
+               description = "For each metric_name, returns metric_type, the set of statistics emitted, and the union of tag keys.")
+    public ResponseEntity<List<ServerMetricCatalogEntry>> catalog(
+            @RequestParam(required = false) String from,
+            @RequestParam(required = false) String to) {
+        Instant[] window = resolveWindow(from, to);
+        return ResponseEntity.ok(store.catalog(window[0], window[1]));
+    }
+
+    @GetMapping("/instances")
+    @Operation(summary = "List server_instance_id values observed in the window",
+               description = "Returns first/last seen timestamps — use to partition counter-delta computations.")
+    public ResponseEntity<List<ServerInstanceInfo>> instances(
+            @RequestParam(required = false) String from,
+            @RequestParam(required = false) String to) {
+        Instant[] window = resolveWindow(from, to);
+        return ResponseEntity.ok(store.listInstances(window[0], window[1]));
+    }
+
+    @PostMapping("/query")
+    @Operation(summary = "Generic time-series query",
+               description = "Returns bucketed series for a single metric_name. Supports aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter delta mode, and a derived 'mean' statistic for timers.")
+    public ResponseEntity<ServerMetricQueryResponse> query(@RequestBody QueryBody body) {
+        ServerMetricQueryRequest request = new ServerMetricQueryRequest(
+                body.metric(),
+                body.statistic(),
+                parseInstant(body.from(), "from"),
+                parseInstant(body.to(), "to"),
+                body.stepSeconds(),
+                body.groupByTags(),
+                body.filterTags(),
+                body.aggregation(),
+                body.mode(),
+                body.serverInstanceIds());
+        return ResponseEntity.ok(store.query(request));
+    }
+
+    @ExceptionHandler(IllegalArgumentException.class)
+    public ResponseEntity<Map<String, String>> handleBadRequest(IllegalArgumentException e) {
+        return ResponseEntity.badRequest().body(Map.of("error", e.getMessage() != null ? e.getMessage() : "Bad request"));
+    }
+
+    private static Instant[] resolveWindow(String from, String to) {
+        Instant toI = to != null ? parseInstant(to, "to") : Instant.now();
+        Instant fromI = from != null
+                ? parseInstant(from, "from")
+                : toI.minusSeconds(DEFAULT_LOOKBACK_SECONDS);
+        if (!fromI.isBefore(toI)) {
+            throw new IllegalArgumentException("from must be strictly before to");
+        }
+        return new Instant[]{fromI, toI};
+    }
+
+    private static Instant parseInstant(String raw, String field) {
+        if (raw == null || raw.isBlank()) {
+            throw new IllegalArgumentException(field + " is required");
+        }
+        try {
+            return Instant.parse(raw);
+        } catch (Exception e) {
+            throw new IllegalArgumentException(
+                    field + " must be an ISO-8601 instant (e.g. 2026-04-23T10:00:00Z)");
+        }
+    }
+
+    /**
+     * Request body for {@link #query(QueryBody)}. Uses ISO-8601 strings on
+     * the wire so the OpenAPI schema stays language-neutral.
+     */
+    public record QueryBody(
+            String metric,
+            String statistic,
+            String from,
+            String to,
+            Integer stepSeconds,
+            List<String> groupByTags,
+            Map<String, String> filterTags,
+            String aggregation,
+            String mode,
+            List<String> serverInstanceIds
+    ) {
+    }
+}
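`QueryBody` carries `from`/`to` as ISO-8601 strings, so any HTTP client can call `POST /api/v1/admin/server-metrics/query` without sharing Java types. Below is a minimal sketch using the JDK's `java.net.http` API. The base URL, the bearer-token header, and the metric and tag names (`http.server.requests`, `status`) are assumptions for illustration. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

class ServerMetricsQuerySketch {
    static final String QUERY_PATH = "/api/v1/admin/server-metrics/query";

    // JSON matching QueryBody. Instant.toString() yields ISO-8601 (e.g. 2026-04-23T10:00:00Z),
    // which is the format parseInstant expects. Metric and tag names are illustrative.
    static String queryJson(Instant from, Instant to) {
        return "{\"metric\":\"http.server.requests\",\"statistic\":\"count\","
                + "\"from\":\"" + from + "\",\"to\":\"" + to + "\","
                + "\"stepSeconds\":60,\"groupByTags\":[\"status\"],\"aggregation\":\"sum\"}";
    }

    // Builds (but does not send) the request. The bearer-token header is an assumption
    // about how an ADMIN caller authenticates.
    static HttpRequest buildQuery(String baseUrl, String token, Instant from, Instant to) {
        return HttpRequest.newBuilder(URI.create(baseUrl + QUERY_PATH))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson(from, to)))
                .build();
    }
}
```

Unlike `/catalog` and `/instances`, which default to the last hour, `/query` runs both `from` and `to` through `parseInstant`. Omitting either one therefore produces a 400 with `"from is required"` or `"to is required"`.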
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/DirtyStateResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/DirtyStateResponse.java
@@ -0,0 +1,12 @@
+package com.cameleer.server.app.dto;
+
+import com.cameleer.server.core.runtime.DirtyStateResult;
+
+import java.util.List;
+
+public record DirtyStateResponse(
+        boolean dirty,
+        String lastSuccessfulDeploymentId,
+        List<DirtyStateResult.Difference> differences
+) {
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/interceptor/AuditInterceptor.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/interceptor/AuditInterceptor.java
@@ -6,8 +6,10 @@ import com.cameleer.server.core.admin.AuditService;
 import jakarta.servlet.http.HttpServletRequest;
 import jakarta.servlet.http.HttpServletResponse;
 import org.springframework.stereotype.Component;
+import org.springframework.util.AntPathMatcher;
 import org.springframework.web.servlet.HandlerInterceptor;

+import java.util.List;
 import java.util.Map;
 import java.util.Set;

@@ -22,7 +24,9 @@ import java.util.Set;
 public class AuditInterceptor implements HandlerInterceptor {

    private static final Set<String> AUDITABLE_METHODS = Set.of("POST", "PUT", "DELETE");
-    private static final Set<String> EXCLUDED_PATHS = Set.of("/api/v1/search/executions");
+    private static final List<String> EXCLUDED_PATH_PATTERNS = List.of(
+            "/api/v1/environments/*/executions/search");
+    private static final AntPathMatcher PATH_MATCHER = new AntPathMatcher();

    private final AuditService auditService;

@@ -41,8 +45,10 @@ public class AuditInterceptor implements HandlerInterceptor {
        }

        String path = request.getRequestURI();
-        if (EXCLUDED_PATHS.contains(path)) {
-            return;
+        for (String pattern : EXCLUDED_PATH_PATTERNS) {
+            if (PATH_MATCHER.match(pattern, path)) {
+                return;
+            }
        }
        AuditResult result = response.getStatus() < 400 ? AuditResult.SUCCESS : AuditResult.FAILURE;

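The exclusion list moved from an exact `Set` lookup to Ant-style patterns because the search endpoint now sits under an environment slug. In `AntPathMatcher`, a single `*` matches within one path segment and never crosses `/`. The stdlib-only sketch below approximates only that rule to show which request URIs skip the audit log. It has no `**`, `?`, or `{var}` support, and it does not treat empty segments or trailing slashes the way Spring does.

```java
import java.util.List;

class AuditExclusionSketch {
    static final List<String> EXCLUDED_PATH_PATTERNS =
            List.of("/api/v1/environments/*/executions/search");

    // Simplified approximation of AntPathMatcher for patterns whose only wildcard is a
    // whole-segment '*': segment counts must match and '*' accepts any single segment.
    static boolean matches(String pattern, String path) {
        String[] p = pattern.split("/");
        String[] s = path.split("/");
        if (p.length != s.length) return false; // '*' never spans more than one segment
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(s[i])) return false;
        }
        return true;
    }

    static boolean isExcluded(String requestUri) {
        return EXCLUDED_PATH_PATTERNS.stream().anyMatch(p -> matches(p, requestUri));
    }
}
```

`getRequestURI()` includes the servlet context path. If the server were ever deployed under a non-root context path, the pattern would need that prefix as well.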
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/metrics/ServerInstanceIdConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/metrics/ServerInstanceIdConfig.java
@@ -0,0 +1,63 @@
+package com.cameleer.server.app.metrics;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.context.annotation.Bean;
+import org.springframework.context.annotation.Configuration;
+
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.UUID;
+
+/**
+ * Resolves a stable identifier for this server process, used as the
+ * {@code server_instance_id} on every server_metrics sample. The value is
+ * fixed at boot, so counters restart cleanly whenever the id rotates.
+ *
+ * <p>Precedence:
+ * <ol>
+ *   <li>{@code cameleer.server.instance-id} property / {@code CAMELEER_SERVER_INSTANCE_ID} env
+ *   <li>{@code HOSTNAME} env (populated by Docker/Kubernetes)
+ *   <li>{@link InetAddress#getLocalHost()} hostname
+ *   <li>Random UUID (fallback — only hit when DNS and env are both silent)
+ * </ol>
+ */
+@Configuration
+public class ServerInstanceIdConfig {
+
+    private static final Logger log = LoggerFactory.getLogger(ServerInstanceIdConfig.class);
+
+    @Bean("serverInstanceId")
+    public String serverInstanceId(
+            @Value("${cameleer.server.instance-id:}") String configuredId) {
+        if (!isBlank(configuredId)) {
+            log.info("Server instance id resolved from configuration: {}", configuredId);
+            return configuredId;
+        }
+
+        String hostnameEnv = System.getenv("HOSTNAME");
+        if (!isBlank(hostnameEnv)) {
+            log.info("Server instance id resolved from HOSTNAME env: {}", hostnameEnv);
+            return hostnameEnv;
+        }
+
+        try {
+            String localHost = InetAddress.getLocalHost().getHostName();
+            if (!isBlank(localHost)) {
+                log.info("Server instance id resolved from localhost lookup: {}", localHost);
+                return localHost;
+            }
+        } catch (UnknownHostException e) {
+            log.debug("InetAddress.getLocalHost() failed, falling back to UUID: {}", e.getMessage());
+        }
+
+        String fallback = UUID.randomUUID().toString();
+        log.warn("Server instance id could not be resolved; using random UUID {}", fallback);
+        return fallback;
+    }
+
+    private static boolean isBlank(String s) {
+        return s == null || s.isBlank();
+    }
+}
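`ServerInstanceIdConfig` reads its inputs (property, `HOSTNAME`, DNS) directly, which makes the precedence chain awkward to unit-test. One way to exercise the same order is to pass the inputs in. The sketch below does this with a small `HostLookup` functional interface, a hypothetical seam that is not part of the codebase.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

class InstanceIdResolverSketch {
    // Hypothetical seam so the DNS lookup can be replaced in tests.
    @FunctionalInterface
    interface HostLookup {
        String hostName() throws UnknownHostException;
    }

    // Precedence: configured id, then HOSTNAME, then local host name, then a random UUID.
    static String resolve(String configuredId, String hostnameEnv, HostLookup lookup) {
        if (!isBlank(configuredId)) return configuredId;
        if (!isBlank(hostnameEnv)) return hostnameEnv;
        try {
            String localHost = lookup.hostName();
            if (!isBlank(localHost)) return localHost;
        } catch (UnknownHostException e) {
            // fall through to the UUID, as the production bean does
        }
        return UUID.randomUUID().toString();
    }

    // Production-equivalent wiring using the real environment and resolver.
    static String resolveFromEnvironment(String configuredId) {
        return resolve(configuredId, System.getenv("HOSTNAME"),
                () -> InetAddress.getLocalHost().getHostName());
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Docker's default `HOSTNAME` is the short container id. That id stays the same when the same container restarts, so a counter reset does not always coincide with a new instance id.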
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/metrics/ServerMetricsSnapshotScheduler.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/metrics/ServerMetricsSnapshotScheduler.java
@@ -0,0 +1,106 @@
+package com.cameleer.server.app.metrics;
+
+import com.cameleer.server.core.storage.ServerMetricsStore;
+import com.cameleer.server.core.storage.model.ServerMetricSample;
+import io.micrometer.core.instrument.Measurement;
+import io.micrometer.core.instrument.Meter;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.Tag;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Qualifier;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
+import org.springframework.scheduling.annotation.Scheduled;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Periodically snapshots every meter in the server's {@link MeterRegistry}
+ * and writes the result to ClickHouse via {@link ServerMetricsStore}. This
+ * gives us historical server-health data (buffer depths, agent transitions,
+ * flush latency, JVM memory, HTTP response counts, etc.) without requiring
+ * an external Prometheus.
+ *
+ * <p>Each Micrometer {@link Meter#measure() measurement} becomes one row, so
+ * a single Timer produces rows for {@code count}, {@code total} (TOTAL_TIME), and
+ * {@code max} each tick. Counter values are cumulative since meter
+ * registration (Prometheus convention) — callers compute rate() themselves.
+ *
+ * <p>Disabled via {@code cameleer.server.self-metrics.enabled=false}.
+ */
+@Component
+@ConditionalOnProperty(
+        prefix = "cameleer.server.self-metrics",
+        name = "enabled",
+        havingValue = "true",
+        matchIfMissing = true)
+public class ServerMetricsSnapshotScheduler {
+
+    private static final Logger log = LoggerFactory.getLogger(ServerMetricsSnapshotScheduler.class);
+
+    private final MeterRegistry registry;
+    private final ServerMetricsStore store;
+    private final String tenantId;
+    private final String serverInstanceId;
+
+    public ServerMetricsSnapshotScheduler(
+            MeterRegistry registry,
+            ServerMetricsStore store,
+            @Value("${cameleer.server.tenant.id:default}") String tenantId,
+            @Qualifier("serverInstanceId") String serverInstanceId) {
+        this.registry = registry;
+        this.store = store;
+        this.tenantId = tenantId;
+        this.serverInstanceId = serverInstanceId;
+    }
+
+    @Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}",
+               initialDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")
+    public void snapshot() {
+        try {
+            Instant now = Instant.now();
+            List<ServerMetricSample> batch = new ArrayList<>();
+
+            for (Meter meter : registry.getMeters()) {
+                Meter.Id id = meter.getId();
+                Map<String, String> tags = flattenTags(id.getTagsAsIterable());
+                String type = id.getType().name().toLowerCase(java.util.Locale.ROOT);
+
+                for (Measurement m : meter.measure()) {
+                    double v = m.getValue();
+                    if (!Double.isFinite(v)) continue;
+                    batch.add(new ServerMetricSample(
+                            tenantId,
+                            now,
+                            serverInstanceId,
+                            id.getName(),
+                            type,
+                            m.getStatistic().getTagValueRepresentation(),
+                            v,
+                            tags));
+                }
+            }
+
+            if (!batch.isEmpty()) {
+                store.insertBatch(batch);
+                log.debug("Persisted {} server self-metric samples", batch.size());
+            }
+        } catch (Exception e) {
+            log.warn("Server self-metrics snapshot failed: {}", e.getMessage());
+        }
+    }
+
+    private static Map<String, String> flattenTags(Iterable<Tag> tags) {
+        Map<String, String> out = new LinkedHashMap<>();
+        for (Tag t : tags) {
+            out.put(t.getKey(), t.getValue());
+        }
+        return out;
+    }
+}
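The scheduler stores Micrometer counters as cumulative values, so consumers have to turn them into per-interval increases themselves. The `/instances` endpoint suggests partitioning that work by `server_instance_id`. The sketch below shows one way to do it. `Sample` is a hypothetical trimmed-down view of `ServerMetricSample`. The reset rule treats a drop in value as a process restart and counts the new value as the increase. This follows the spirit of Prometheus `increase()`, minus its extrapolation.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset of ServerMetricSample: one cumulative counter reading.
record Sample(String serverInstanceId, Instant timestamp, double value) {}

class CounterIncreaseSketch {
    // Total increase of one counter across all instances in the window.
    static double increase(List<Sample> samples) {
        Map<String, List<Sample>> byInstance = new HashMap<>();
        for (Sample s : samples) {
            byInstance.computeIfAbsent(s.serverInstanceId(), k -> new ArrayList<>()).add(s);
        }
        double total = 0;
        for (List<Sample> series : byInstance.values()) {
            series.sort(Comparator.comparing(Sample::timestamp));
            // The first reading per instance is only a baseline; each later reading adds
            // its delta, or its full value if the counter went backwards (reset).
            for (int i = 1; i < series.size(); i++) {
                double prev = series.get(i - 1).value();
                double cur = series.get(i).value();
                total += cur >= prev ? cur - prev : cur;
            }
        }
        return total;
    }
}
```

Partitioning by instance matters whenever more than one server writes the same metric. If two servers' readings were interleaved in a single series, every switch from the higher counter to the lower one would look like a reset.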
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java
@@ -1,6 +1,8 @@
 package com.cameleer.server.app.runtime;

+import com.cameleer.common.model.ApplicationConfig;
 import com.cameleer.server.app.metrics.ServerMetrics;
+import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
 import com.cameleer.server.app.storage.PostgresDeploymentRepository;
 import com.cameleer.server.core.runtime.*;
 import org.slf4j.Logger;
@@ -25,6 +27,7 @@ public class DeploymentExecutor {
    private final EnvironmentService envService;
    private final DeploymentRepository deploymentRepository;
    private final PostgresDeploymentRepository pgDeployRepo;
+    private final PostgresApplicationConfigRepository applicationConfigRepository;

    @Autowired(required = false)
    private DockerNetworkManager networkManager;
@@ -59,6 +62,9 @@ public class DeploymentExecutor {
    @Value("${cameleer.server.runtime.serverurl:}")
    private String globalServerUrl;

+    @Value("${cameleer.server.runtime.certresolver:}")
+    private String globalCertResolver;
+
    @Value("${cameleer.server.runtime.jardockervolume:}")
    private String jarDockerVolume;

@@ -75,15 +81,45 @@ public class DeploymentExecutor {
                              DeploymentService deploymentService,
                              AppService appService,
                              EnvironmentService envService,
-                              DeploymentRepository deploymentRepository) {
+                              DeploymentRepository deploymentRepository,
+                              PostgresApplicationConfigRepository applicationConfigRepository) {
        this.orchestrator = orchestrator;
        this.deploymentService = deploymentService;
        this.appService = appService;
        this.envService = envService;
        this.deploymentRepository = deploymentRepository;
        this.pgDeployRepo = (PostgresDeploymentRepository) deploymentRepository;
+        this.applicationConfigRepository = applicationConfigRepository;
    }

+    /** Deployment-scoped id suffix — distinguishes container names and
+     * CAMELEER_AGENT_INSTANCEID across redeploys so old + new replicas can
+     * coexist during a blue/green swap. First 8 chars of the deployment UUID. */
+    static String generationOf(Deployment deployment) {
+        return deployment.id().toString().substring(0, 8);
+    }
+
+    /**
+     * Per-deployment context assembled once at the top of executeAsync and passed
+     * into strategy handlers. Keeps the strategy methods readable instead of
+     * threading 13 positional args.
+     */
+    private record DeployCtx(
+            Deployment deployment,
+            App app,
+            Environment env,
+            ResolvedContainerConfig config,
+            String jarPath,
+            String resolvedRuntimeType,
+            String mainClass,
+            String generation,
+            String primaryNetwork,
+            List<String> additionalNets,
+            Map<String, String> baseEnvVars,
+            Map<String, String> prometheusLabels,
+            long deployStart
+    ) {}
+
    @Async("deploymentTaskExecutor")
    public void executeAsync(Deployment deployment) {
        long deployStart = System.currentTimeMillis();
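`generationOf` keeps only the first 8 hex characters of the deployment UUID. That suffix is what lets an old and a new replica with the same index run side by side during a swap. If deployment ids are random (version 4) UUIDs, those 8 characters are fully random, so two given deployments share a suffix with probability about 1 in 2^32. The sketch below reproduces the suffix rule and pairs it with a hypothetical container-name format. The actual gen-suffixed name built by `startReplica` is not visible in this diff; only the pre-change `<tenant>-<env>-<app>-<index>` shape is.

```java
import java.util.UUID;

class GenerationSuffixSketch {
    // Same rule as DeploymentExecutor.generationOf: first 8 chars of the UUID string.
    static String generationOf(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // Hypothetical name format: the pre-change "<tenant>-<env>-<app>-<index>" plus the generation.
    static String containerName(String tenant, String env, String app, int index, String generation) {
        return tenant + "-" + env + "-" + app + "-" + index + "-" + generation;
    }

    public static void main(String[] args) {
        String oldGen = generationOf(UUID.fromString("3f2a9c1e-7b4d-4c2a-9e1f-0a1b2c3d4e5f"));
        String newGen = generationOf(UUID.fromString("a81c04d2-1e5f-4a3b-8c7d-6e5f4a3b2c1d"));
        // Same replica index, different generations: both containers can exist at once.
        System.out.println(containerName("acme", "prod", "orders", 0, oldGen));
        System.out.println(containerName("acme", "prod", "orders", 0, newGen));
    }
}
```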
@@ -91,13 +127,15 @@ public class DeploymentExecutor {
            App app = appService.getById(deployment.appId());
            Environment env = envService.getById(deployment.environmentId());
            String jarPath = appService.resolveJarPath(deployment.appVersionId());
+            String generation = generationOf(deployment);

            var globalDefaults = new ConfigMerger.GlobalRuntimeDefaults(
                    parseMemoryLimitMb(globalMemoryLimit),
                    globalCpuShares,
                    globalRoutingMode,
                    globalRoutingDomain,
-                    globalServerUrl.isBlank() ? "http://cameleer-server:8081" : globalServerUrl
+                    globalServerUrl.isBlank() ? "http://cameleer-server:8081" : globalServerUrl,
+                    globalCertResolver.isBlank() ? null : globalCertResolver
            );
            ResolvedContainerConfig config = ConfigMerger.resolve(
                    globalDefaults, env.defaultContainerConfig(), app.containerConfig());
@@ -139,7 +177,6 @@ public class DeploymentExecutor {
            updateStage(deployment.id(), DeployStage.CREATE_NETWORK);
            // Primary network: use configured CAMELEER_DOCKER_NETWORK (tenant-isolated in SaaS mode)
            String primaryNetwork = dockerNetwork;
-            String envNet = null;
            List<String> additionalNets = new ArrayList<>();
            if (networkManager != null) {
                networkManager.ensureNetwork(primaryNetwork);
@@ -147,7 +184,7 @@ public class DeploymentExecutor {
                networkManager.ensureNetwork(DockerNetworkManager.TRAEFIK_NETWORK);
                additionalNets.add(DockerNetworkManager.TRAEFIK_NETWORK);
                // Per-environment network scoped to tenant to prevent cross-tenant collisions
-                envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
+                String envNet = DockerNetworkManager.envNetworkName(tenantId, env.slug());
                networkManager.ensureNetwork(envNet);
                additionalNets.add(envNet);
            }
@@ -162,111 +199,21 @@ public class DeploymentExecutor {
                }
            }

-            // === START REPLICAS ===
-            updateStage(deployment.id(), DeployStage.START_REPLICAS);
+            DeployCtx ctx = new DeployCtx(
+                    deployment, app, env, config, jarPath,
+                    resolvedRuntimeType, mainClass, generation,
+                    primaryNetwork, additionalNets,
+                    buildEnvVars(app, env, config),
+                    PrometheusLabelBuilder.build(resolvedRuntimeType),
+                    deployStart);

-            Map<String, String> baseEnvVars = buildEnvVars(app, env, config);
-            Map<String, String> prometheusLabels = PrometheusLabelBuilder.build(resolvedRuntimeType);
-
-            List<Map<String, Object>> replicaStates = new ArrayList<>();
-            List<String> newContainerIds = new ArrayList<>();
-
-            for (int i = 0; i < config.replicas(); i++) {
-                String instanceId = env.slug() + "-" + app.slug() + "-" + i;
-                String containerName = tenantId + "-" + instanceId;
-
-                // Per-replica labels (include replica index and instance-id)
-                Map<String, String> labels = TraefikLabelBuilder.build(app.slug(), env.slug(), tenantId, config, i);
-                labels.putAll(prometheusLabels);
-
-                // Per-replica env vars (set agent instance ID to match container log identity)
-                Map<String, String> replicaEnvVars = new LinkedHashMap<>(baseEnvVars);
-                replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
-
-                String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
-                ContainerRequest request = new ContainerRequest(
-                        containerName, baseImage, jarPath,
-                        volumeName, jarStoragePath,
-                        primaryNetwork,
-                        additionalNets,
-                        replicaEnvVars, labels,
-                        config.memoryLimitBytes(), config.memoryReserveBytes(),
-                        config.dockerCpuShares(), config.dockerCpuQuota(),
-                        config.exposedPorts(), agentHealthPort,
-                        "on-failure", 3,
-                        resolvedRuntimeType, config.customArgs(), mainClass
-                );
-
-                String containerId = orchestrator.startContainer(request);
-                newContainerIds.add(containerId);
-
-                // Connect to additional networks after container is started
-                for (String net : additionalNets) {
-                    if (networkManager != null) {
-                        networkManager.connectContainer(containerId, net);
-                    }
-                }
-
-                orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
-
-                replicaStates.add(Map.of(
-                        "index", i,
-                        "containerId", containerId,
-                        "containerName", containerName,
-                        "status", "STARTING"
-                ));
+            // Dispatch on strategy. Unknown values fall back to BLUE_GREEN via fromWire.
+            DeploymentStrategy strategy = DeploymentStrategy.fromWire(config.deploymentStrategy());
+            switch (strategy) {
+                case BLUE_GREEN -> deployBlueGreen(ctx);
+                case ROLLING -> deployRolling(ctx);
            }

-            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
-
-            // === HEALTH CHECK ===
-            updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
-            int healthyCount = waitForAnyHealthy(newContainerIds, healthCheckTimeout);
-
-            if (healthyCount == 0) {
-                for (String cid : newContainerIds) {
-                    try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
-                    catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
-                }
-                pgDeployRepo.updateDeployStage(deployment.id(), null);
-                deploymentService.markFailed(deployment.id(), "No replicas passed health check within " + healthCheckTimeout + "s");
-                serverMetrics.recordDeploymentOutcome("FAILED");
-                serverMetrics.recordDeploymentDuration(deployStart);
-                return;
-            }
-
-            replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
-            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
-
-            // === SWAP TRAFFIC ===
-            updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
-
-            Optional<Deployment> existing = deploymentRepository.findActiveByAppIdAndEnvironmentId(
-                    deployment.appId(), deployment.environmentId());
-            if (existing.isPresent() && !existing.get().id().equals(deployment.id())) {
-                stopDeploymentContainers(existing.get());
-                deploymentService.markStopped(existing.get().id());
-                log.info("Stopped previous deployment {} for replacement", existing.get().id());
-            }
-
-            // === COMPLETE ===
-            updateStage(deployment.id(), DeployStage.COMPLETE);
-
-            String primaryContainerId = newContainerIds.get(0);
-            DeploymentStatus finalStatus = healthyCount == config.replicas()
-                    ? DeploymentStatus.RUNNING : DeploymentStatus.DEGRADED;
-            deploymentService.markRunning(deployment.id(), primaryContainerId);
-            if (finalStatus == DeploymentStatus.DEGRADED) {
-                deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.DEGRADED,
-                        primaryContainerId, null);
-            }
-
-            pgDeployRepo.updateDeployStage(deployment.id(), null);
-            serverMetrics.recordDeploymentOutcome(finalStatus.name());
-            serverMetrics.recordDeploymentDuration(deployStart);
-            log.info("Deployment {} is {} ({}/{} replicas healthy)",
-                    deployment.id(), finalStatus, healthyCount, config.replicas());
-
        } catch (Exception e) {
            log.error("Deployment {} FAILED: {}", deployment.id(), e.getMessage(), e);
            pgDeployRepo.updateDeployStage(deployment.id(), null);
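The dispatch relies on `DeploymentStrategy.fromWire` mapping unknown strategy strings to `BLUE_GREEN`. Neither the enum nor its wire format appears in this diff, so the sketch below is a hypothetical re-creation. Accepting case-insensitive names with either `-` or `_` separators is an assumption about the wire values.

```java
import java.util.Locale;

// Hypothetical re-creation; the real enum and its accepted wire values are not shown.
enum DeploymentStrategy {
    BLUE_GREEN, ROLLING;

    static DeploymentStrategy fromWire(String wire) {
        if (wire == null || wire.isBlank()) return BLUE_GREEN;
        // Assumed normalization: "blue-green", "Blue_Green" and "ROLLING" all resolve.
        String normalized = wire.trim().replace('-', '_').toUpperCase(Locale.ROOT);
        for (DeploymentStrategy s : values()) {
            if (s.name().equals(normalized)) return s;
        }
        return BLUE_GREEN; // unknown values fall back, per the comment in executeAsync
    }
}
```

Falling back to blue/green rather than rejecting the value is the conservative choice. According to the `deployBlueGreen` Javadoc, a failed blue/green rollout leaves the previous deployment untouched. A failed rolling rollout, by contrast, may already have stopped some old replicas.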
@@ -276,6 +223,262 @@ public class DeploymentExecutor {
        }
    }

+    /**
+     * Blue/green strategy: start all N new replicas (coexisting with the old
+     * ones thanks to the gen-suffixed container names), wait for ALL healthy,
+     * then stop the previous deployment. Strict all-healthy — partial failure
+     * preserves the previous deployment untouched.
+     */
+    private void deployBlueGreen(DeployCtx ctx) {
+        ResolvedContainerConfig config = ctx.config();
+        Deployment deployment = ctx.deployment();
+
+        // === START REPLICAS ===
+        updateStage(deployment.id(), DeployStage.START_REPLICAS);
+        List<Map<String, Object>> replicaStates = new ArrayList<>();
+        List<String> newContainerIds = new ArrayList<>();
+        for (int i = 0; i < config.replicas(); i++) {
+            Map<String, Object> state = new LinkedHashMap<>();
+            String containerId = startReplica(ctx, i, state);
+            newContainerIds.add(containerId);
+            replicaStates.add(state);
+        }
+        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+        // === HEALTH CHECK ===
+        updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
+        int healthyCount = waitForAllHealthy(newContainerIds, healthCheckTimeout);
+
+        if (healthyCount < config.replicas()) {
+            // Strict abort: tear down new replicas, leave the previous deployment untouched.
+            for (String cid : newContainerIds) {
+                try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
+                catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
+            }
+            pgDeployRepo.updateDeployStage(deployment.id(), null);
+            String reason = String.format(
+                    "blue-green: %d/%d replicas healthy within %ds; preserving previous deployment",
+                    healthyCount, config.replicas(), healthCheckTimeout);
+            deploymentService.markFailed(deployment.id(), reason);
+            serverMetrics.recordDeploymentOutcome("FAILED");
+            serverMetrics.recordDeploymentDuration(ctx.deployStart());
+            return;
+        }
+
+        replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
+        pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+        // === SWAP TRAFFIC ===
+        // All new replicas are healthy; Traefik labels are already attracting
+        // traffic to them. Stop the previous deployment now — the swap is
+        // implicit in the label-driven load balancer.
+        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
+        Optional<Deployment> previous = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
+                deployment.appId(), deployment.environmentId(), deployment.id());
+        if (previous.isPresent()) {
+            log.info("blue-green: stopping previous deployment {} now that new replicas are healthy",
+                    previous.get().id());
+            stopDeploymentContainers(previous.get());
+            deploymentService.markStopped(previous.get().id());
+        }
+
+        // === COMPLETE ===
+        updateStage(deployment.id(), DeployStage.COMPLETE);
+        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
+        log.info("Deployment {} is RUNNING (blue-green, {}/{} replicas healthy)",
+                deployment.id(), healthyCount, config.replicas());
+    }
+
+    /**
+     * Rolling strategy: replace replicas one at a time — start new[i], wait
+     * healthy, stop old[i]. On any replica's health failure, stop the
+     * in-flight new container, leave remaining old replicas serving, mark
+     * FAILED. Already-replaced old containers are not restored (can't unring
+     * that bell) — user redeploys to recover.
+     *
+     * Resource peak: replicas + 1 (briefly while a new replica warms up
+     * before its counterpart is stopped).
+     */
+    private void deployRolling(DeployCtx ctx) {
+        ResolvedContainerConfig config = ctx.config();
+        Deployment deployment = ctx.deployment();
+
+        // Capture previous deployment's per-index container ids up front.
+        Optional<Deployment> previousOpt = deploymentRepository.findActiveByAppIdAndEnvironmentIdExcluding(
+                deployment.appId(), deployment.environmentId(), deployment.id());
+        Map<Integer, String> oldContainerByIndex = new LinkedHashMap<>();
+        if (previousOpt.isPresent() && previousOpt.get().replicaStates() != null) {
+            for (Map<String, Object> r : previousOpt.get().replicaStates()) {
+                Object idx = r.get("index");
+                Object cid = r.get("containerId");
+                if (idx instanceof Number n && cid instanceof String s) {
+                    oldContainerByIndex.put(n.intValue(), s);
+                }
+            }
+        }
+
+        // === START REPLICAS ===
+        updateStage(deployment.id(), DeployStage.START_REPLICAS);
+        List<Map<String, Object>> replicaStates = new ArrayList<>();
+        List<String> newContainerIds = new ArrayList<>();
+
+        for (int i = 0; i < config.replicas(); i++) {
+            // Start new replica i (gen-suffixed name; coexists with old[i]).
+            Map<String, Object> state = new LinkedHashMap<>();
+            String newCid = startReplica(ctx, i, state);
+            newContainerIds.add(newCid);
+            replicaStates.add(state);
+            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+            // === HEALTH CHECK (per-replica) ===
+            updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
+            boolean healthy = waitForOneHealthy(newCid, healthCheckTimeout);
+            if (!healthy) {
+                // Abort: stop this in-flight new replica AND any new replicas
+                // started so far. Already-stopped old replicas stay stopped
+                // (rolling is not reversible). Remaining un-replaced old
+                // replicas keep serving traffic.
+                for (String cid : newContainerIds) {
+                    try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
+                    catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
+                }
+                pgDeployRepo.updateDeployStage(deployment.id(), null);
+                String reason = String.format(
+                        "rolling: replica %d failed to reach healthy within %ds; %d previous replicas still running",
+                        i, healthCheckTimeout, oldContainerByIndex.size());
+                deploymentService.markFailed(deployment.id(), reason);
+                serverMetrics.recordDeploymentOutcome("FAILED");
+                serverMetrics.recordDeploymentDuration(ctx.deployStart());
+                return;
+            }
+
+            // Health check passed: update replica status to RUNNING, stop the
+            // corresponding old[i] if present, and continue with replica i+1.
+            replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
+            pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
+
+            String oldCid = oldContainerByIndex.remove(i);
+            if (oldCid != null) {
+                try {
+                    orchestrator.stopContainer(oldCid);
+                    orchestrator.removeContainer(oldCid);
+                    log.info("rolling: replaced replica {} (old={}, new={})", i, oldCid, newCid);
+                } catch (Exception e) {
+                    log.warn("rolling: failed to stop old replica {} ({}): {}", i, oldCid, e.getMessage());
+                }
+            }
+        }
+
+        // === SWAP TRAFFIC ===
+        // Old replicas with indices >= config.replicas() (i.e. the replica
+        // count shrank) are still running; sweep them now so the previous
+        // deployment can be marked STOPPED.
+        updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
+        for (Map.Entry<Integer, String> e : oldContainerByIndex.entrySet()) {
+            try {
+                orchestrator.stopContainer(e.getValue());
+                orchestrator.removeContainer(e.getValue());
+                log.info("rolling: stopped leftover old replica {} ({})", e.getKey(), e.getValue());
+            } catch (Exception ex) {
+                log.warn("rolling: failed to stop leftover old replica {}: {}", e.getKey(), ex.getMessage());
+            }
+        }
+        if (previousOpt.isPresent()) {
+            deploymentService.markStopped(previousOpt.get().id());
+        }
+
+        // === COMPLETE ===
+        updateStage(deployment.id(), DeployStage.COMPLETE);
+        persistSnapshotAndMarkRunning(ctx, newContainerIds.get(0));
+        log.info("Deployment {} is RUNNING (rolling, {}/{} replicas replaced)",
+                deployment.id(), config.replicas(), config.replicas());
+    }
+
+    /** Poll a single container until healthy or the timeout expires. Returns
+     * true on healthy, false on timeout or thread interrupt. */
+    private boolean waitForOneHealthy(String containerId, int timeoutSeconds) {
+        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
+        while (System.currentTimeMillis() < deadline) {
+            ContainerStatus status = orchestrator.getContainerStatus(containerId);
+            if ("healthy".equals(status.state())) return true;
+            try { Thread.sleep(2000); } catch (InterruptedException e) {
+                Thread.currentThread().interrupt();
+                return false;
+            }
+        }
+        return false;
+    }
+
+    /** Start one replica container with the gen-suffixed name and return its
+     * container id. Fills {@code stateOut} with the replicaStates JSONB row. */
+    private String startReplica(DeployCtx ctx, int i, Map<String, Object> stateOut) {
+        Environment env = ctx.env();
+        App app = ctx.app();
+        ResolvedContainerConfig config = ctx.config();
+
+        String instanceId = env.slug() + "-" + app.slug() + "-" + i + "-" + ctx.generation();
+        String containerName = tenantId + "-" + instanceId;
+
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                app.slug(), env.slug(), tenantId, config, i, ctx.generation());
+        labels.putAll(ctx.prometheusLabels());
+
+        Map<String, String> replicaEnvVars = new LinkedHashMap<>(ctx.baseEnvVars());
+        replicaEnvVars.put("CAMELEER_AGENT_INSTANCEID", instanceId);
+
+        String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
+        ContainerRequest request = new ContainerRequest(
+                containerName, baseImage, ctx.jarPath(),
+                volumeName, jarStoragePath,
+                ctx.primaryNetwork(),
+                ctx.additionalNets(),
+                replicaEnvVars, labels,
+                config.memoryLimitBytes(), config.memoryReserveBytes(),
+                config.dockerCpuShares(), config.dockerCpuQuota(),
+                config.exposedPorts(), agentHealthPort,
+                "on-failure", 3,
+                ctx.resolvedRuntimeType(), config.customArgs(), ctx.mainClass()
+        );
+
+        String containerId = orchestrator.startContainer(request);
+
+        // Connect to additional networks after container is started
+        for (String net : ctx.additionalNets()) {
+            if (networkManager != null) {
+                networkManager.connectContainer(containerId, net);
+            }
+        }
+
+        orchestrator.startLogCapture(containerId, instanceId, app.slug(), env.slug(), tenantId);
+
+        stateOut.put("index", i);
+        stateOut.put("containerId", containerId);
+        stateOut.put("containerName", containerName);
+        stateOut.put("status", "STARTING");
+        return containerId;
+    }
+
+    /** Persist the deployment snapshot and mark the deployment RUNNING.
+     * Finalizes the deploy in a single place shared by all strategy paths. */
+    private void persistSnapshotAndMarkRunning(DeployCtx ctx, String primaryContainerId) {
+        Deployment deployment = ctx.deployment();
+        ApplicationConfig agentConfig = applicationConfigRepository
+                .findByApplicationAndEnvironment(ctx.app().slug(), ctx.env().slug())
+                .orElse(null);
+        List<String> snapshotSensitiveKeys = agentConfig != null ? agentConfig.getSensitiveKeys() : null;
+        DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
+                deployment.appVersionId(),
+                agentConfig,
+                ctx.app().containerConfig(),
+                snapshotSensitiveKeys);
+        pgDeployRepo.saveDeployedConfigSnapshot(deployment.id(), snapshot);
+
+        deploymentService.markRunning(deployment.id(), primaryContainerId);
+        pgDeployRepo.updateDeployStage(deployment.id(), null);
+        serverMetrics.recordDeploymentOutcome("RUNNING");
+        serverMetrics.recordDeploymentDuration(ctx.deployStart());
+    }
+
    public void stopDeployment(Deployment deployment) {
        pgDeployRepo.updateTargetState(deployment.id(), "STOPPED");
        deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.STOPPING,
@@ -341,7 +544,10 @@ public class DeploymentExecutor {
        return envVars;
    }

-    private int waitForAnyHealthy(List<String> containerIds, int timeoutSeconds) {
+    /** Poll until all containers are healthy or the timeout expires. Returns
+     * the number of healthy containers when polling stops: equal to
+     * {@code containerIds.size()} on full success, fewer if the timeout won. */
+    private int waitForAllHealthy(List<String> containerIds, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
        int lastHealthy = 0;
        while (System.currentTimeMillis() < deadline) {
@@ -403,6 +609,10 @@ public class DeploymentExecutor {
        map.put("runtimeType", config.runtimeType());
        map.put("customArgs", config.customArgs());
        map.put("extraNetworks", config.extraNetworks());
+        map.put("externalRouting", config.externalRouting());
+        if (config.certResolver() != null) {
+            map.put("certResolver", config.certResolver());
+        }
        return map;
    }
 }
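The rolling path in `deployRolling` interleaves its replace-one-at-a-time loop with Docker calls, stage updates, and metrics. The sketch below isolates only that control flow so the abort semantics can be exercised without containers. `Replicas`, `RollingResult`, and `RollingSketch` are hypothetical names introduced for illustration; they are not part of the codebase, and persistence, metrics, and marking the previous deployment STOPPED are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the orchestrator calls deployRolling makes. */
interface Replicas {
    String start(int index);          // start a new replica, return its container id
    boolean waitHealthy(String id);   // true if healthy before the timeout
    void stop(String id);             // stop and remove a container
}

/** Outcome of one rolling replacement and the containers left running. */
record RollingResult(boolean succeeded, List<String> running) {}

final class RollingSketch {
    private RollingSketch() {}

    static RollingResult roll(Replicas r, Map<Integer, String> oldByIndex, int replicas) {
        // Old containers not yet replaced, ordered by replica index.
        Map<Integer, String> remainingOld = new TreeMap<>(oldByIndex);
        List<String> started = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            String id = r.start(i);
            started.add(id);
            if (!r.waitHealthy(id)) {
                // Abort: stop every new container started so far. Old replicas
                // with index >= i were never touched and keep serving.
                started.forEach(r::stop);
                return new RollingResult(false, new ArrayList<>(remainingOld.values()));
            }
            // New replica i is healthy: retire its old counterpart, if any.
            String old = remainingOld.remove(i);
            if (old != null) {
                r.stop(old);
            }
        }
        // Replica count shrank: sweep old replicas whose index has no successor.
        remainingOld.values().forEach(r::stop);
        return new RollingResult(true, started);
    }
}
```

On the failure path the sketch reports only the un-replaced old containers as running, which matches the executor: indices below the failing one end up with neither an old nor a new replica until the user redeploys.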
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/TraefikLabelBuilder.java
@@ -10,19 +10,28 @@ public final class TraefikLabelBuilder {
    private TraefikLabelBuilder() {}

    public static Map<String, String> build(String appSlug, String envSlug, String tenantId,
-                                              ResolvedContainerConfig config, int replicaIndex) {
+                                              ResolvedContainerConfig config, int replicaIndex,
+                                              String generation) {
+        // Traefik router/service keys stay generation-agnostic so load balancing
+        // spans old + new replicas during a blue/green overlap. instance-id and
+        // the new generation label carry the per-deploy identity.
        String svc = envSlug + "-" + appSlug;
-        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex;
+        String instanceId = envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation;
        Map<String, String> labels = new LinkedHashMap<>();

-        labels.put("traefik.enable", "true");
        labels.put("managed-by", "cameleer-server");
        labels.put("cameleer.tenant", tenantId);
        labels.put("cameleer.app", appSlug);
        labels.put("cameleer.environment", envSlug);
        labels.put("cameleer.replica", String.valueOf(replicaIndex));
+        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id", instanceId);

+        if (!config.externalRouting()) {
+            return labels;
+        }
+
+        labels.put("traefik.enable", "true");
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
                String.valueOf(config.appPort()));

@@ -46,7 +55,10 @@ public final class TraefikLabelBuilder {

        if (config.sslOffloading()) {
            labels.put("traefik.http.routers." + svc + ".tls", "true");
-            labels.put("traefik.http.routers." + svc + ".tls.certresolver", "default");
+            if (config.certResolver() != null && !config.certResolver().isBlank()) {
+                labels.put("traefik.http.routers." + svc + ".tls.certresolver",
+                        config.certResolver());
+            }
        }

        return labels;
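The builder now derives the Traefik service key from the env and app slugs only, while `cameleer.instance-id` carries the generation. Per the comment in `build`, that is what lets load balancing span old and new replicas during a blue/green overlap. The hypothetical `LabelSketch` below reproduces only those few labels (no router rules, TLS, or Prometheus labels) to show that two generations of the same replica index land in one service yet stay individually addressable. It assumes slugs contain no dots, since the service name is read back up to the first dot.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, trimmed mirror of TraefikLabelBuilder.build: only generation-related labels. */
final class LabelSketch {
    private LabelSketch() {}

    static Map<String, String> labels(String envSlug, String appSlug, int replicaIndex,
                                      String generation, boolean externalRouting, int appPort) {
        String svc = envSlug + "-" + appSlug; // no generation: shared across deploys
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("cameleer.generation", generation);
        labels.put("cameleer.instance-id",
                envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation);
        if (!externalRouting) {
            return labels; // internal-only: no Traefik labels at all
        }
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.services." + svc + ".loadbalancer.server.port",
                String.valueOf(appPort));
        return labels;
    }

    /** Extracts the Traefik service names declared by a label map. */
    static Set<String> serviceNames(Map<String, String> labels) {
        String prefix = "traefik.http.services.";
        Set<String> names = new TreeSet<>();
        for (String key : labels.keySet()) {
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());
                int dot = rest.indexOf('.');
                names.add(dot >= 0 ? rest.substring(0, dot) : rest);
            }
        }
        return names;
    }
}
```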
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java
@@ -122,6 +122,14 @@ public class ClickHouseLogStore implements LogIndex {
            baseParams.add(request.instanceId());
        }

+        if (!request.instanceIds().isEmpty()) {
+            String placeholders = String.join(", ", Collections.nCopies(request.instanceIds().size(), "?"));
+            baseConditions.add("instance_id IN (" + placeholders + ")");
+            for (String id : request.instanceIds()) {
+                baseParams.add(id);
+            }
+        }
+
        if (request.exchangeId() != null && !request.exchangeId().isEmpty()) {
            baseConditions.add("(exchange_id = ?" +
                    " OR (mapContains(mdc, 'cameleer.exchangeId') AND mdc['cameleer.exchangeId'] = ?)" +
@@ -281,6 +289,14 @@ public class ClickHouseLogStore implements LogIndex {
            params.add(request.instanceId());
        }

+        if (!request.instanceIds().isEmpty()) {
+            String placeholders = String.join(", ", Collections.nCopies(request.instanceIds().size(), "?"));
+            conditions.add("instance_id IN (" + placeholders + ")");
+            for (String id : request.instanceIds()) {
+                params.add(id);
+            }
+        }
+
        if (request.exchangeId() != null && !request.exchangeId().isEmpty()) {
            conditions.add("(exchange_id = ?" +
                    " OR (mapContains(mdc, 'cameleer.exchangeId') AND mdc['cameleer.exchangeId'] = ?)" +
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java
@@ -1,6 +1,7 @@
 package com.cameleer.server.app.search;

 import com.cameleer.server.core.alerting.AlertMatchSpec;
+import com.cameleer.server.core.search.AttributeFilter;
 import com.cameleer.server.core.search.ExecutionSummary;
 import com.cameleer.server.core.search.SearchRequest;
 import com.cameleer.server.core.search.SearchResult;
@@ -81,13 +82,24 @@ public class ClickHouseSearchIndex implements SearchIndex {
            String sortColumn = SORT_FIELD_MAP.getOrDefault(request.sortField(), "start_time");
            String sortDir = "asc".equalsIgnoreCase(request.sortDir()) ? "ASC" : "DESC";

+            // Composite-cursor callers (afterExecutionId set) need a deterministic tiebreak inside
+            // same-millisecond groups so the client-side last-row pick matches ClickHouse's row order.
+            // Without it, rows can be silently dropped when a page ends partway through a group of
+            // rows sharing one start_time: the cursor advances past the returned last row, and unreturned
+            // rows in that group with smaller execution_id values never reappear. Other callers (UI/stats)
+            // keep the unchanged single-column ORDER BY because they don't use the composite cursor.
+            String orderBy = sortColumn + " " + sortDir;
+            if (request.afterExecutionId() != null) {
+                orderBy += ", execution_id " + sortDir;
+            }
+
            String dataSql = "SELECT execution_id, route_id, instance_id, application_id, "
                    + "status, start_time, end_time, duration_ms, correlation_id, "
                    + "error_message, error_stacktrace, diagram_content_hash, attributes, "
                    + "has_trace_data, is_replay, "
                    + "input_body, output_body, input_headers, output_headers, root_cause_message "
                    + "FROM executions FINAL WHERE " + whereClause
-                    + " ORDER BY " + sortColumn + " " + sortDir
+                    + " ORDER BY " + orderBy
                    + " LIMIT ? OFFSET ?";

            List<Object> dataParams = new ArrayList<>(params);
@@ -124,7 +136,13 @@ public class ClickHouseSearchIndex implements SearchIndex {
        conditions.add("tenant_id = ?");
        params.add(tenantId);

-        if (request.timeFrom() != null) {
+        if (request.timeFrom() != null && request.afterExecutionId() != null) {
+            // composite predicate: strictly-after in (start_time, execution_id) tuple order
+            conditions.add("(start_time > ? OR (start_time = ? AND execution_id > ?))");
+            params.add(Timestamp.from(request.timeFrom()));
+            params.add(Timestamp.from(request.timeFrom()));
+            params.add(request.afterExecutionId());
+        } else if (request.timeFrom() != null) {
            conditions.add("start_time >= ?");
            params.add(Timestamp.from(request.timeFrom()));
        }
@@ -239,6 +257,23 @@ public class ClickHouseSearchIndex implements SearchIndex {
            params.add(likeTerm);
        }

+        // Structured attribute filters. Keys were validated at AttributeFilter construction
+        // time against ^[a-zA-Z0-9._-]+$ so they are safe to single-quote-inline; the JSON path
+        // argument of JSONExtractString does not accept a ? placeholder in ClickHouse JDBC
+        // (same constraint as countExecutionsForAlerting below). Values are parameter-bound.
+        for (AttributeFilter filter : request.attributeFilters()) {
+            String escapedKey = filter.key().replace("'", "\\'");
+            if (filter.isKeyOnly()) {
+                conditions.add("JSONHas(attributes, '" + escapedKey + "')");
+            } else if (filter.isWildcard()) {
+                conditions.add("JSONExtractString(attributes, '" + escapedKey + "') LIKE ?");
+                params.add(filter.toLikePattern());
+            } else {
+                conditions.add("JSONExtractString(attributes, '" + escapedKey + "') = ?");
+                params.add(filter.value());
+            }
+        }
+
        return String.join(" AND ", conditions);
    }

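The composite cursor added to `ClickHouseSearchIndex` is keyset pagination over the tuple `(start_time, execution_id)`. The in-memory sketch below mirrors the SQL predicate and the two-column sort so the no-dropped-rows property can be checked directly, including a run of rows that share one millisecond. `Exec` and `KeysetSketch` are hypothetical. The sketch assumes ascending order (the direction a strictly-greater predicate pairs with), treats the first page as unbounded in place of `start_time >= ?`, and compares ids with `String.compareTo`, which agrees with ClickHouse's byte-wise string ordering for ASCII ids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical row: start_time in epoch millis plus execution_id. */
record Exec(long startMillis, String id) {}

final class KeysetSketch {
    private KeysetSketch() {}

    /** Equivalent of ORDER BY start_time ASC, execution_id ASC. */
    static final Comparator<Exec> ORDER =
            Comparator.comparingLong(Exec::startMillis).thenComparing(Exec::id);

    /** Mirrors: start_time > ? OR (start_time = ? AND execution_id > ?). */
    static boolean strictlyAfter(Exec e, long cursorTime, String cursorId) {
        return e.startMillis() > cursorTime
                || (e.startMillis() == cursorTime && e.id().compareTo(cursorId) > 0);
    }

    /** Drains the table page by page, advancing the cursor to each page's last row. */
    static List<Exec> drain(List<Exec> table, int limit) {
        List<Exec> out = new ArrayList<>();
        Exec cursor = null;
        while (true) {
            final Exec c = cursor; // effectively-final copy for the lambda
            List<Exec> page = table.stream()
                    .filter(e -> c == null || strictlyAfter(e, c.startMillis(), c.id()))
                    .sorted(ORDER)
                    .limit(limit)
                    .toList();
            if (page.isEmpty()) {
                return out;
            }
            out.addAll(page);
            cursor = page.get(page.size() - 1);
        }
    }
}
```

Dropping the `.thenComparing(Exec::id)` tiebreak from `ORDER` reproduces the hazard the comment describes: rows in a shared millisecond can come back in an order where the last returned id is not the largest, and the predicate then skips the rest of that group.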
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseDiagramStore.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseDiagramStore.java
@@ -16,8 +16,6 @@ import java.security.MessageDigest;
 import java.security.NoSuchAlgorithmException;
 import java.sql.Timestamp;
 import java.time.Instant;
-import java.util.ArrayList;
-import java.util.Collections;
 import java.util.HashMap;
 import java.util.HexFormat;
 import java.util.List;
@@ -57,6 +55,12 @@ public class ClickHouseDiagramStore implements DiagramStore {
            ORDER BY created_at DESC LIMIT 1
            """;

+    private static final String SELECT_HASH_FOR_APP_ROUTE = """
+            SELECT content_hash FROM route_diagrams
+            WHERE tenant_id = ? AND application_id = ? AND environment = ? AND route_id = ?
+            ORDER BY created_at DESC LIMIT 1
+            """;
+
    private static final String SELECT_DEFINITIONS_FOR_APP = """
            SELECT DISTINCT route_id, definition FROM route_diagrams
            WHERE tenant_id = ? AND application_id = ? AND environment = ?
@@ -68,6 +72,8 @@ public class ClickHouseDiagramStore implements DiagramStore {

    // (routeId + "\0" + instanceId) → contentHash
    private final ConcurrentHashMap<String, String> hashCache = new ConcurrentHashMap<>();
+    // (applicationId + "\0" + environment + "\0" + routeId) → most recent contentHash
+    private final ConcurrentHashMap<String, String> appRouteHashCache = new ConcurrentHashMap<>();
    // contentHash → deserialized RouteGraph
    private final ConcurrentHashMap<String, RouteGraph> graphCache = new ConcurrentHashMap<>();

@@ -92,12 +98,37 @@ public class ClickHouseDiagramStore implements DiagramStore {
        } catch (Exception e) {
            log.warn("Failed to warm diagram hash cache — lookups will fall back to ClickHouse: {}", e.getMessage());
        }
+
+        try {
+            jdbc.query(
+                    "SELECT application_id, environment, route_id, " +
+                            "argMax(content_hash, created_at) AS content_hash " +
+                            "FROM route_diagrams WHERE tenant_id = ? " +
+                            "GROUP BY application_id, environment, route_id",
+                    rs -> {
+                        String key = appRouteCacheKey(
+                                rs.getString("application_id"),
+                                rs.getString("environment"),
+                                rs.getString("route_id"));
+                        appRouteHashCache.put(key, rs.getString("content_hash"));
+                    },
+                    tenantId);
+            log.info("Diagram app-route cache warmed: {} entries", appRouteHashCache.size());
+        } catch (Exception e) {
+            log.warn("Failed to warm diagram app-route cache — lookups will fall back to ClickHouse: {}", e.getMessage());
+        }
    }

    private static String cacheKey(String routeId, String instanceId) {
        return routeId + "\0" + instanceId;
    }

+    private static String appRouteCacheKey(String applicationId, String environment, String routeId) {
+        return (applicationId != null ? applicationId : "") + "\0"
+                + (environment != null ? environment : "") + "\0"
+                + (routeId != null ? routeId : "");
+    }
+
    @Override
    public void store(TaggedDiagram diagram) {
        try {
@@ -122,6 +153,7 @@ public class ClickHouseDiagramStore implements DiagramStore {

            // Update caches
            hashCache.put(cacheKey(routeId, agentId), contentHash);
+            appRouteHashCache.put(appRouteCacheKey(applicationId, environment, routeId), contentHash);
            graphCache.put(contentHash, graph);

            log.debug("Stored diagram for route={} agent={} with hash={}", routeId, agentId, contentHash);
@@ -170,33 +202,29 @@ public class ClickHouseDiagramStore implements DiagramStore {
    }

    @Override
-    public Optional<String> findContentHashForRouteByAgents(String routeId, List<String> agentIds) {
-        if (agentIds == null || agentIds.isEmpty()) {
+    public Optional<String> findLatestContentHashForAppRoute(String applicationId,
+                                                             String routeId,
+                                                             String environment) {
+        if (applicationId == null || applicationId.isBlank()
+                || routeId == null || routeId.isBlank()
+                || environment == null || environment.isBlank()) {
            return Optional.empty();
        }

-        // Try cache first — return first hit
-        for (String agentId : agentIds) {
-            String cached = hashCache.get(cacheKey(routeId, agentId));
-            if (cached != null) {
-                return Optional.of(cached);
-            }
+        String key = appRouteCacheKey(applicationId, environment, routeId);
+        String cached = appRouteHashCache.get(key);
+        if (cached != null) {
+            return Optional.of(cached);
        }

-        // Fall back to ClickHouse
-        String placeholders = String.join(", ", Collections.nCopies(agentIds.size(), "?"));
-        String sql = "SELECT content_hash FROM route_diagrams " +
-                "WHERE tenant_id = ? AND route_id = ? AND instance_id IN (" + placeholders + ") " +
-                "ORDER BY created_at DESC LIMIT 1";
-        var params = new ArrayList<Object>();
-        params.add(tenantId);
-        params.add(routeId);
-        params.addAll(agentIds);
-        List<Map<String, Object>> rows = jdbc.queryForList(sql, params.toArray());
+        List<Map<String, Object>> rows = jdbc.queryForList(
+                SELECT_HASH_FOR_APP_ROUTE, tenantId, applicationId, environment, routeId);
        if (rows.isEmpty()) {
            return Optional.empty();
        }
-        return Optional.of((String) rows.get(0).get("content_hash"));
+        String hash = (String) rows.get(0).get("content_hash");
+        appRouteHashCache.put(key, hash);
+        return Optional.of(hash);
    }

    @Override
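`findLatestContentHashForAppRoute` is a cache-aside lookup: `store()` writes through to `appRouteHashCache`, and a miss falls back to ClickHouse and fills the cache. The sketch below isolates that pattern; `LatestHashCache` is hypothetical, and its `loader` function stands in for the `SELECT_HASH_FOR_APP_ROUTE` query. One deliberate difference: the miss path uses `putIfAbsent`, so a hash written by a concurrent `store()` is not overwritten by an older row the reader fetched just before it.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache-aside holder for the latest content hash per (app, env, route). */
final class LatestHashCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the ClickHouse lookup, keyed by the composite cache key.
    private final Function<String, Optional<String>> loader;

    LatestHashCache(Function<String, Optional<String>> loader) {
        this.loader = loader;
    }

    /** Same shape as appRouteCacheKey: null parts become empty strings. */
    static String key(String app, String env, String route) {
        return nz(app) + "\0" + nz(env) + "\0" + nz(route);
    }

    private static String nz(String s) {
        return s != null ? s : "";
    }

    /** Write-through on store: the newest diagram always wins. */
    void onStore(String app, String env, String route, String hash) {
        cache.put(key(app, env, route), hash);
    }

    /** Hit returns immediately; a miss loads once and caches only a non-empty result. */
    Optional<String> latest(String app, String env, String route) {
        String k = key(app, env, route);
        String hit = cache.get(k);
        if (hit != null) {
            return Optional.of(hit);
        }
        Optional<String> loaded = loader.apply(k);
        if (loaded.isEmpty()) {
            return loaded; // don't cache misses; a later store() will populate
        }
        // Keep a value a concurrent onStore() may have written in the meantime.
        String winner = cache.putIfAbsent(k, loaded.get());
        return Optional.of(winner != null ? winner : loaded.get());
    }
}
```

With plain `put` on the miss path, as in the diff, that overwrite appears possible in a narrow window, and because entries never expire the stale hash would persist until the next `store()` for the same route.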
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseServerMetricsQueryStore.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseServerMetricsQueryStore.java
@@ -0,0 +1,408 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.server.core.storage.ServerMetricsQueryStore;
+import com.cameleer.server.core.storage.model.ServerInstanceInfo;
+import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
+import com.cameleer.server.core.storage.model.ServerMetricPoint;
+import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
+import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
+import com.cameleer.server.core.storage.model.ServerMetricSeries;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+import java.sql.Array;
+import java.sql.Timestamp;
+import java.time.Duration;
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.regex.Pattern;
+
+/**
+ * ClickHouse-backed {@link ServerMetricsQueryStore}.
+ *
+ * <p>Safety rules for every query:
+ * <ul>
+ *   <li>tenant_id always bound as a parameter — no cross-tenant reads.</li>
+ *   <li>Identifier-like inputs (metric name, statistic, tag keys) are
+ *       regex-validated; aggregation and mode must match fixed allow-lists.
+ *       Tag keys are also bound as JDBC parameters in {@code tags[?]} map
+ *       lookups, so they cannot inject SQL even if the regex were loosened.</li>
+ *   <li>Literal values ({@code from}, {@code to}, tag filter values,
+ *       server_instance_id allow-list) always go through {@code ?}.</li>
+ *   <li>The time range is capped at {@link #MAX_RANGE}.</li>
+ *   <li>Result cardinality is capped at {@link #MAX_SERIES} series.</li>
+ * </ul>
+ */
+public class ClickHouseServerMetricsQueryStore implements ServerMetricsQueryStore {
+
+    private static final Pattern SAFE_IDENTIFIER = Pattern.compile("^[a-zA-Z0-9._]+$");
+    private static final Pattern SAFE_STATISTIC  = Pattern.compile("^[a-z_]+$");
+
+    private static final Set<String> AGGREGATIONS = Set.of("avg", "sum", "max", "min", "latest");
+    private static final Set<String> MODES = Set.of("raw", "delta");
+
+    /** Maximum {@code to - from} window accepted by the API. */
+    static final Duration MAX_RANGE = Duration.ofDays(31);
+
+    /** Accepted bounds (out-of-range values are rejected) and default for {@code stepSeconds}. */
+    static final int MIN_STEP = 10;
+    static final int MAX_STEP = 3600;
+    static final int DEFAULT_STEP = 60;
+
+    /** Defence against group-by explosion — limit the series count per response. */
+    static final int MAX_SERIES = 500;
+
+    private final String tenantId;
+    private final JdbcTemplate jdbc;
+
+    public ClickHouseServerMetricsQueryStore(String tenantId, JdbcTemplate jdbc) {
+        this.tenantId = tenantId;
+        this.jdbc = jdbc;
+    }
+
+    // ── catalog ─────────────────────────────────────────────────────────
+
+    @Override
+    public List<ServerMetricCatalogEntry> catalog(Instant from, Instant to) {
+        requireRange(from, to);
+        String sql = """
+                SELECT
+                    metric_name,
+                    any(metric_type) AS metric_type,
+                    arraySort(groupUniqArray(statistic)) AS statistics,
+                    arraySort(arrayDistinct(arrayFlatten(groupArray(mapKeys(tags))))) AS tag_keys
+                FROM server_metrics
+                WHERE tenant_id = ?
+                  AND collected_at >= ?
+                  AND collected_at < ?
+                GROUP BY metric_name
+                ORDER BY metric_name
+                """;
+        return jdbc.query(sql, (rs, n) -> new ServerMetricCatalogEntry(
+                rs.getString("metric_name"),
+                rs.getString("metric_type"),
+                arrayToStringList(rs.getArray("statistics")),
+                arrayToStringList(rs.getArray("tag_keys"))
+        ), tenantId, Timestamp.from(from), Timestamp.from(to));
+    }
+
+    // ── instances ───────────────────────────────────────────────────────
+
+    @Override
+    public List<ServerInstanceInfo> listInstances(Instant from, Instant to) {
+        requireRange(from, to);
+        String sql = """
+                SELECT
+                    server_instance_id,
+                    min(collected_at) AS first_seen,
+                    max(collected_at) AS last_seen
+                FROM server_metrics
+                WHERE tenant_id = ?
+                  AND collected_at >= ?
+                  AND collected_at < ?
+                GROUP BY server_instance_id
+                ORDER BY last_seen DESC
+                """;
+        return jdbc.query(sql, (rs, n) -> new ServerInstanceInfo(
+                rs.getString("server_instance_id"),
+                rs.getTimestamp("first_seen").toInstant(),
+                rs.getTimestamp("last_seen").toInstant()
+        ), tenantId, Timestamp.from(from), Timestamp.from(to));
+    }
+
+    // ── query ───────────────────────────────────────────────────────────
+
+    @Override
+    public ServerMetricQueryResponse query(ServerMetricQueryRequest request) {
+        if (request == null) throw new IllegalArgumentException("request is required");
+        String metric = requireSafeIdentifier(request.metric(), "metric");
+        requireRange(request.from(), request.to());
+
+        String aggregation = request.aggregation() != null ? request.aggregation().toLowerCase() : "avg";
+        if (!AGGREGATIONS.contains(aggregation)) {
+            throw new IllegalArgumentException("aggregation must be one of " + AGGREGATIONS);
+        }
+
+        String mode = request.mode() != null ? request.mode().toLowerCase() : "raw";
+        if (!MODES.contains(mode)) {
+            throw new IllegalArgumentException("mode must be one of " + MODES);
+        }
+
+        int step = request.stepSeconds() != null ? request.stepSeconds() : DEFAULT_STEP;
+        if (step < MIN_STEP || step > MAX_STEP) {
+            throw new IllegalArgumentException(
+                    "stepSeconds must be in [" + MIN_STEP + "," + MAX_STEP + "]");
+        }
+
+        String statistic = request.statistic();
+        if (statistic != null && !SAFE_STATISTIC.matcher(statistic).matches()) {
+            throw new IllegalArgumentException("statistic contains unsafe characters");
+        }
+
+        List<String> groupByTags = request.groupByTags() != null
+                ? request.groupByTags() : List.of();
+        for (String t : groupByTags) requireSafeIdentifier(t, "groupByTag");
+
+        Map<String, String> filterTags = request.filterTags() != null
+                ? request.filterTags() : Map.of();
+        for (String t : filterTags.keySet()) requireSafeIdentifier(t, "filterTag key");
+
+        List<String> instanceAllowList = request.serverInstanceIds() != null
+                ? request.serverInstanceIds() : List.of();
+
+        boolean isDelta = "delta".equals(mode);
+        boolean isMean  = "mean".equals(statistic);
+
+        String sql = isDelta
+                ? buildDeltaSql(step, groupByTags, filterTags, instanceAllowList, statistic, isMean)
+                : buildRawSql(step, groupByTags, filterTags, instanceAllowList,
+                              statistic, aggregation, isMean);
+
+        List<Object> params = buildParams(groupByTags, metric, statistic, isMean,
+                                          request.from(), request.to(),
+                                          filterTags, instanceAllowList);
+
+        List<Row> rows = jdbc.query(sql, (rs, n) -> {
+            int idx = 1;
+            Instant bucket = rs.getTimestamp(idx++).toInstant();
+            List<String> tagValues = new ArrayList<>(groupByTags.size());
+            for (int g = 0; g < groupByTags.size(); g++) {
+                tagValues.add(rs.getString(idx++));
+            }
+            double value = rs.getDouble(idx);
+            return new Row(bucket, tagValues, value);
+        }, params.toArray());
+
+        return assembleSeries(rows, metric, statistic, aggregation, mode, step, groupByTags);
+    }
+
+    // ── SQL builders ────────────────────────────────────────────────────
+
+    /**
+     * Builds a single-pass SQL for raw mode:
+     * <pre>{@code
+     * SELECT bucket, tag0, ..., <agg>(metric_value) AS value
+     * FROM server_metrics WHERE ...
+     * GROUP BY bucket, tag0, ...
+     * ORDER BY bucket, tag0, ...
+     * }</pre>
+     * For {@code statistic=mean}, replaces the aggregate with
+     * {@code sumIf(metric_value, statistic IN ('total','total_time')) / nullIf(sumIf(metric_value, statistic='count'), 0)}.
+     */
+    private String buildRawSql(int step, List<String> groupByTags,
+                               Map<String, String> filterTags,
+                               List<String> instanceAllowList,
+                               String statistic, String aggregation, boolean isMean) {
+        StringBuilder s = new StringBuilder(512);
+        s.append("SELECT\n  toDateTime64(toStartOfInterval(collected_at, INTERVAL ")
+         .append(step).append(" SECOND), 3) AS bucket");
+        for (int i = 0; i < groupByTags.size(); i++) {
+            s.append(",\n  tags[?] AS tag").append(i);
+        }
+        s.append(",\n  ").append(isMean ? meanExpr() : scalarAggExpr(aggregation))
+         .append(" AS value\nFROM server_metrics\n");
+        appendWhereClause(s, filterTags, instanceAllowList, statistic, isMean);
+        s.append("GROUP BY bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append("\nORDER BY bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        return s.toString();
+    }
+
+    /**
+     * Builds a three-level SQL for delta mode. The inner query emits one row per
+     * (bucket, instance, tag-group) via {@code max(metric_value)}, or the mean
+     * expression for {@code statistic=mean}; the middle query computes positive-clipped
+     * per-instance differences with a window function; the outer query sums across instances.
+     */
+    private String buildDeltaSql(int step, List<String> groupByTags,
+                                 Map<String, String> filterTags,
+                                 List<String> instanceAllowList,
+                                 String statistic, boolean isMean) {
+        StringBuilder s = new StringBuilder(1024);
+        s.append("SELECT bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append(", sum(delta) AS value FROM (\n");
+
+        // Middle: per-instance positive-clipped delta using window.
+        s.append("  SELECT bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append(", server_instance_id, greatest(0, value - coalesce(any(value) OVER (")
+         .append("PARTITION BY server_instance_id");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append(" ORDER BY bucket ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), value)) AS delta FROM (\n");
+
+        // Inner: one representative value per (bucket, instance, tag-group).
+        s.append("    SELECT\n      toDateTime64(toStartOfInterval(collected_at, INTERVAL ")
+         .append(step).append(" SECOND), 3) AS bucket,\n      server_instance_id");
+        for (int i = 0; i < groupByTags.size(); i++) {
+            s.append(",\n      tags[?] AS tag").append(i);
+        }
+        s.append(",\n      ").append(isMean ? meanExpr() : "max(metric_value)")
+         .append(" AS value\n    FROM server_metrics\n");
+        appendWhereClause(s, filterTags, instanceAllowList, statistic, isMean);
+        s.append("    GROUP BY bucket, server_instance_id");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append("\n  ) AS bucketed\n) AS deltas\n");
+
+        s.append("GROUP BY bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        s.append("\nORDER BY bucket");
+        for (int i = 0; i < groupByTags.size(); i++) s.append(", tag").append(i);
+        return s.toString();
+    }
+
+    /**
+     * WHERE clause shared by the raw and delta SQL shapes. Its fixed indentation
+     * matches the innermost {@code FROM server_metrics} of the delta query; in the
+     * raw query it follows the single {@code FROM} and the extra indent is harmless.
+     */
+    private void appendWhereClause(StringBuilder s, Map<String, String> filterTags,
+                                   List<String> instanceAllowList,
+                                   String statistic, boolean isMean) {
+        s.append("    WHERE tenant_id = ?\n")
+         .append("      AND metric_name = ?\n");
+        if (isMean) {
+            s.append("      AND statistic IN ('count', 'total', 'total_time')\n");
+        } else if (statistic != null) {
+            s.append("      AND statistic = ?\n");
+        }
+        s.append("      AND collected_at >= ?\n")
+         .append("      AND collected_at < ?\n");
+        for (int i = 0; i < filterTags.size(); i++) {
+            s.append("      AND tags[?] = ?\n");
+        }
+        if (!instanceAllowList.isEmpty()) {
+            s.append("      AND server_instance_id IN (")
+             .append("?,".repeat(instanceAllowList.size() - 1)).append("?)\n");
+        }
+    }
+
+    /**
+     * SQL-positional params for both raw and delta queries (same relative
+     * order because the WHERE clause is emitted by {@link #appendWhereClause}
+     * only once, with the {@code tags[?]} select-list placeholders appearing
+     * earlier in the SQL text).
+     */
+    private List<Object> buildParams(List<String> groupByTags, String metric,
+                                     String statistic, boolean isMean,
+                                     Instant from, Instant to,
+                                     Map<String, String> filterTags,
+                                     List<String> instanceAllowList) {
+        List<Object> params = new ArrayList<>();
+        // SELECT-list tags[?] placeholders
+        params.addAll(groupByTags);
+        // WHERE
+        params.add(tenantId);
+        params.add(metric);
+        if (!isMean && statistic != null) params.add(statistic);
+        params.add(Timestamp.from(from));
+        params.add(Timestamp.from(to));
+        for (Map.Entry<String, String> e : filterTags.entrySet()) {
+            params.add(e.getKey());
+            params.add(e.getValue());
+        }
+        params.addAll(instanceAllowList);
+        return params;
+    }
+
+    private static String scalarAggExpr(String aggregation) {
+        return switch (aggregation) {
+            case "avg"    -> "avg(metric_value)";
+            case "sum"    -> "sum(metric_value)";
+            case "max"    -> "max(metric_value)";
+            case "min"    -> "min(metric_value)";
+            case "latest" -> "argMax(metric_value, collected_at)";
+            default       -> throw new IllegalStateException("unreachable: " + aggregation);
+        };
+    }
+
+    private static String meanExpr() {
+        return "sumIf(metric_value, statistic IN ('total', 'total_time'))"
+             + " / nullIf(sumIf(metric_value, statistic = 'count'), 0)";
+    }
+
+    // ── response assembly ───────────────────────────────────────────────
+
+    private ServerMetricQueryResponse assembleSeries(
+            List<Row> rows, String metric, String statistic,
+            String aggregation, String mode, int step, List<String> groupByTags) {
+
+        Map<List<String>, List<ServerMetricPoint>> bySignature = new LinkedHashMap<>();
+        for (Row r : rows) {
+            if (Double.isNaN(r.value) || Double.isInfinite(r.value)) continue;
+            bySignature.computeIfAbsent(r.tagValues, k -> new ArrayList<>())
+                       .add(new ServerMetricPoint(r.bucket, r.value));
+        }
+
+        if (bySignature.size() > MAX_SERIES) {
+            throw new IllegalArgumentException(
+                    "query produced " + bySignature.size()
+                            + " series; reduce groupByTags or tighten filterTags (max "
+                            + MAX_SERIES + ")");
+        }
+
+        List<ServerMetricSeries> series = new ArrayList<>(bySignature.size());
+        for (Map.Entry<List<String>, List<ServerMetricPoint>> e : bySignature.entrySet()) {
+            Map<String, String> tags = new LinkedHashMap<>();
+            for (int i = 0; i < groupByTags.size(); i++) {
+                tags.put(groupByTags.get(i), e.getKey().get(i));
+            }
+            series.add(new ServerMetricSeries(Collections.unmodifiableMap(tags), e.getValue()));
+        }
+
+        return new ServerMetricQueryResponse(metric,
+                statistic != null ? statistic : "value",
+                aggregation, mode, step, series);
+    }
+
+    // ── helpers ─────────────────────────────────────────────────────────
+
+    private static void requireRange(Instant from, Instant to) {
+        if (from == null || to == null) {
+            throw new IllegalArgumentException("from and to are required");
+        }
+        if (!from.isBefore(to)) {
+            throw new IllegalArgumentException("from must be strictly before to");
+        }
+        if (Duration.between(from, to).compareTo(MAX_RANGE) > 0) {
+            throw new IllegalArgumentException(
+                    "time range exceeds maximum of " + MAX_RANGE.toDays() + " days");
+        }
+    }
+
+    private static String requireSafeIdentifier(String value, String field) {
+        if (value == null || value.isBlank()) {
+            throw new IllegalArgumentException(field + " is required");
+        }
+        if (!SAFE_IDENTIFIER.matcher(value).matches()) {
+            throw new IllegalArgumentException(
+                    field + " contains unsafe characters (allowed: [a-zA-Z0-9._])");
+        }
+        return value;
+    }
+
+    private static List<String> arrayToStringList(Array array) {
+        if (array == null) return List.of();
+        try {
+            Object[] values = (Object[]) array.getArray();
+            Set<String> sorted = new TreeSet<>();
+            for (Object v : values) {
+                if (v != null) sorted.add(v.toString());
+            }
+            return List.copyOf(sorted);
+        } catch (Exception e) {
+            return List.of();
+        } finally {
+            try { array.free(); } catch (Exception ignore) { }
+        }
+    }
+
+    private record Row(Instant bucket, List<String> tagValues, double value) {
+    }
+}
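To make the delta-mode SQL easier to reason about (especially counter resets), here is a small in-memory model of what the three query levels compute. It is a hypothetical sketch, not part of this PR: `DeltaModel` and `Sample` are illustrative names, it ignores tag groups (it partitions by instance only), and it assumes the `coalesce(..., value)` fallback applies on each instance's first bucket so that bucket contributes 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical reference model of the delta-mode query; not the server's implementation. */
public final class DeltaModel {

    /** One raw sample: bucket start (epoch seconds), server instance id, metric value. */
    public record Sample(long bucket, String instance, double value) {}

    /** Returns bucket -> sum over instances of max(0, value - previous bucket's value). */
    public static SortedMap<Long, Double> deltas(List<Sample> samples) {
        // Inner level: one value per (bucket, instance), using max like max(metric_value).
        Map<String, TreeMap<Long, Double>> byInstance = new HashMap<>();
        for (Sample s : samples) {
            byInstance.computeIfAbsent(s.instance(), k -> new TreeMap<>())
                      .merge(s.bucket(), s.value(), Double::max);
        }
        SortedMap<Long, Double> out = new TreeMap<>();
        for (TreeMap<Long, Double> series : byInstance.values()) {
            Double prev = null; // preceding bucket's value for this instance, if any
            for (Map.Entry<Long, Double> e : series.entrySet()) {
                double v = e.getValue();
                double base = prev != null ? prev : v;      // assumed coalesce(prev, value)
                double delta = Math.max(0.0, v - base);     // greatest(0, ...) clips resets
                out.merge(e.getKey(), delta, Double::sum);  // outer level: sum across instances
                prev = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sample> s = List.of(
                new Sample(0, "a", 10), new Sample(60, "a", 15), new Sample(120, "a", 3),
                new Sample(0, "b", 100), new Sample(60, "b", 110));
        System.out.println(deltas(s)); // {0=0.0, 60=15.0, 120=0.0}
    }
}
```

Instance `a` resets from 15 to 3 at bucket 120; clipping to 0 keeps the reset from showing up as a negative rate.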
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseServerMetricsStore.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseServerMetricsStore.java
@@ -0,0 +1,46 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.server.core.storage.ServerMetricsStore;
+import com.cameleer.server.core.storage.model.ServerMetricSample;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+import java.sql.Timestamp;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class ClickHouseServerMetricsStore implements ServerMetricsStore {
+
+    private final JdbcTemplate jdbc;
+
+    public ClickHouseServerMetricsStore(JdbcTemplate jdbc) {
+        this.jdbc = jdbc;
+    }
+
+    @Override
+    public void insertBatch(List<ServerMetricSample> samples) {
+        if (samples.isEmpty()) return;
+
+        jdbc.batchUpdate("""
+                INSERT INTO server_metrics
+                    (tenant_id, collected_at, server_instance_id, metric_name,
+                     metric_type, statistic, metric_value, tags)
+                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+                """,
+                samples.stream().map(s -> new Object[]{
+                        s.tenantId(),
+                        Timestamp.from(s.collectedAt()),
+                        s.serverInstanceId(),
+                        s.metricName(),
+                        s.metricType(),
+                        s.statistic(),
+                        s.value(),
+                        tagsToClickHouseMap(s.tags())
+                }).toList());
+    }
+
+    private Map<String, String> tagsToClickHouseMap(Map<String, String> tags) {
+        if (tags == null || tags.isEmpty()) return new HashMap<>();
+        return new HashMap<>(tags);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresDeploymentRepository.java
@@ -1,6 +1,7 @@
 package com.cameleer.server.app.storage;

 import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentConfigSnapshot;
 import com.cameleer.server.core.runtime.DeploymentRepository;
 import com.cameleer.server.core.runtime.DeploymentStatus;
 import com.fasterxml.jackson.core.type.TypeReference;
@@ -21,7 +22,7 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
    private static final String SELECT_COLS =
            "id, app_id, app_version_id, environment_id, status, target_state, deployment_strategy, " +
            "replica_states, deploy_stage, container_id, container_name, error_message, " +
-            "resolved_config, deployed_at, stopped_at, created_at";
+            "resolved_config, deployed_config_snapshot, deployed_at, stopped_at, created_at, created_by";

    private final JdbcTemplate jdbc;
    private final ObjectMapper objectMapper;
@@ -62,6 +63,16 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
        return results.isEmpty() ? Optional.empty() : Optional.of(results.get(0));
    }

+    @Override
+    public Optional<Deployment> findActiveByAppIdAndEnvironmentIdExcluding(UUID appId, UUID environmentId, UUID excludeDeploymentId) {
+        var results = jdbc.query(
+                "SELECT " + SELECT_COLS + " FROM deployments WHERE app_id = ? AND environment_id = ? " +
+                "AND status IN ('STARTING', 'RUNNING', 'DEGRADED') AND id <> ? " +
+                "ORDER BY created_at DESC LIMIT 1",
+                (rs, rowNum) -> mapRow(rs), appId, environmentId, excludeDeploymentId);
+        return results.isEmpty() ? Optional.empty() : Optional.of(results.get(0));
+    }
+
    public List<Deployment> findByStatus(List<DeploymentStatus> statuses) {
        String placeholders = String.join(",", statuses.stream().map(s -> "'" + s.name() + "'").toList());
        return jdbc.query(
@@ -70,10 +81,10 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
    }

    @Override
-    public UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName) {
+    public UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName, String createdBy) {
        UUID id = UUID.randomUUID();
-        jdbc.update("INSERT INTO deployments (id, app_id, app_version_id, environment_id, container_name) VALUES (?, ?, ?, ?, ?)",
-                id, appId, appVersionId, environmentId, containerName);
+        jdbc.update("INSERT INTO deployments (id, app_id, app_version_id, environment_id, container_name, created_by) VALUES (?, ?, ?, ?, ?, ?)",
+                id, appId, appVersionId, environmentId, containerName, createdBy);
        return id;
    }

@@ -115,8 +126,8 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
    }

    @Override
-    public void deleteTerminalByAppAndEnvironment(UUID appId, UUID environmentId) {
-        jdbc.update("DELETE FROM deployments WHERE app_id = ? AND environment_id = ? AND status IN ('STOPPED', 'FAILED')",
+    public void deleteFailedByAppAndEnvironment(UUID appId, UUID environmentId) {
+        jdbc.update("DELETE FROM deployments WHERE app_id = ? AND environment_id = ? AND status = 'FAILED'",
                appId, environmentId);
    }

@@ -129,6 +140,27 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
        }
    }

+    public void saveDeployedConfigSnapshot(UUID id, DeploymentConfigSnapshot snapshot) {
+        try {
+            String json = snapshot != null ? objectMapper.writeValueAsString(snapshot) : null;
+            jdbc.update("UPDATE deployments SET deployed_config_snapshot = ?::jsonb WHERE id = ?", json, id);
+        } catch (Exception e) {
+            throw new RuntimeException("Failed to serialize deployed_config_snapshot", e);
+        }
+    }
+
+    public Optional<Deployment> findLatestSuccessfulByAppAndEnv(UUID appId, UUID envId) {
+        // DEGRADED deploys also carry a snapshot (the executor writes it before the RUNNING/DEGRADED
+        // split) and represent a config that reached the COMPLETE stage, so the user can restore it.
+        var results = jdbc.query(
+                "SELECT " + SELECT_COLS + " FROM deployments "
+                + "WHERE app_id = ? AND environment_id = ? "
+                + "AND status IN ('RUNNING', 'DEGRADED') AND deployed_config_snapshot IS NOT NULL "
+                + "ORDER BY deployed_at DESC NULLS LAST LIMIT 1",
+                (rs, rowNum) -> mapRow(rs), appId, envId);
+        return results.isEmpty() ? Optional.empty() : Optional.of(results.get(0));
+    }
+
    public Optional<Deployment> findByContainerId(String containerId) {
        var results = jdbc.query(
                "SELECT " + SELECT_COLS + " FROM deployments WHERE replica_states::text LIKE ? " +
@@ -158,6 +190,15 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
                throw new SQLException("Failed to deserialize resolved_config", e);
            }
        }
+        DeploymentConfigSnapshot deployedConfigSnapshot = null;
+        String snapshotJson = rs.getString("deployed_config_snapshot");
+        if (snapshotJson != null) {
+            try {
+                deployedConfigSnapshot = objectMapper.readValue(snapshotJson, DeploymentConfigSnapshot.class);
+            } catch (Exception e) {
+                throw new SQLException("Failed to deserialize deployed_config_snapshot", e);
+            }
+        }
        return new Deployment(
                UUID.fromString(rs.getString("id")),
                UUID.fromString(rs.getString("app_id")),
@@ -172,9 +213,11 @@ public class PostgresDeploymentRepository implements DeploymentRepository {
                rs.getString("container_name"),
                rs.getString("error_message"),
                resolvedConfig,
+                deployedConfigSnapshot,
                deployedAt != null ? deployedAt.toInstant() : null,
                stoppedAt != null ? stoppedAt.toInstant() : null,
-                rs.getTimestamp("created_at").toInstant()
+                rs.getTimestamp("created_at").toInstant(),
+                rs.getString("created_by")
        );
    }
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresEnvironmentRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/storage/PostgresEnvironmentRepository.java
@@ -1,6 +1,7 @@
 package com.cameleer.server.app.storage;

 import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentColor;
 import com.cameleer.server.core.runtime.EnvironmentRepository;
 import com.fasterxml.jackson.core.type.TypeReference;
 import com.fasterxml.jackson.databind.ObjectMapper;
@@ -24,7 +25,8 @@ public class PostgresEnvironmentRepository implements EnvironmentRepository {
        this.objectMapper = objectMapper;
    }

-    private static final String SELECT_COLS = "id, slug, display_name, production, enabled, default_container_config, jar_retention_count, created_at";
+    private static final String SELECT_COLS =
+            "id, slug, display_name, production, enabled, default_container_config, jar_retention_count, color, created_at";

    @Override
    public List<Environment> findAll() {
@@ -58,9 +60,9 @@ public class PostgresEnvironmentRepository implements EnvironmentRepository {
    }

    @Override
-    public void update(UUID id, String displayName, boolean production, boolean enabled) {
-        jdbc.update("UPDATE environments SET display_name = ?, production = ?, enabled = ?, updated_at = now() WHERE id = ?",
-                displayName, production, enabled, id);
+    public void update(UUID id, String displayName, boolean production, boolean enabled, String color) {
+        jdbc.update("UPDATE environments SET display_name = ?, production = ?, enabled = ?, color = ?, updated_at = now() WHERE id = ?",
+                displayName, production, enabled, color, id);
    }

    @Override
@@ -93,6 +95,10 @@ public class PostgresEnvironmentRepository implements EnvironmentRepository {
        } catch (Exception e) { /* use empty default */ }
        int retentionRaw = rs.getInt("jar_retention_count");
        Integer jarRetentionCount = rs.wasNull() ? null : retentionRaw;
+        String color = rs.getString("color");
+        if (color == null || color.isBlank()) {
+            color = EnvironmentColor.DEFAULT;
+        }
        return new Environment(
                UUID.fromString(rs.getString("id")),
                rs.getString("slug"),
@@ -101,6 +107,7 @@ public class PostgresEnvironmentRepository implements EnvironmentRepository {
                rs.getBoolean("enabled"),
                config,
                jarRetentionCount,
+                color,
                rs.getTimestamp("created_at").toInstant()
        );
    }
--- a/cameleer-server-app/src/main/resources/application.yml
+++ b/cameleer-server-app/src/main/resources/application.yml
@@ -18,6 +18,8 @@ spring:
  mvc:
    async:
      request-timeout: -1
+  mustache:
+    check-template-location: false
  jackson:
    serialization:
      write-dates-as-timestamps: false
@@ -53,6 +55,7 @@ cameleer:
      routingmode: ${CAMELEER_SERVER_RUNTIME_ROUTINGMODE:path}
      routingdomain: ${CAMELEER_SERVER_RUNTIME_ROUTINGDOMAIN:localhost}
      serverurl: ${CAMELEER_SERVER_RUNTIME_SERVERURL:}
+      certresolver: ${CAMELEER_SERVER_RUNTIME_CERTRESOLVER:}
      jardockervolume: ${CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME:}
    indexer:
      debouncems: ${CAMELEER_SERVER_INDEXER_DEBOUNCEMS:2000}
@@ -93,6 +96,10 @@ cameleer:
      notification-retention-days: ${CAMELEER_SERVER_ALERTING_NOTIFICATIONRETENTIONDAYS:30}
      webhook-timeout-ms: ${CAMELEER_SERVER_ALERTING_WEBHOOKTIMEOUTMS:5000}
      webhook-max-attempts: ${CAMELEER_SERVER_ALERTING_WEBHOOKMAXATTEMPTS:3}
+      # PER_EXCHANGE first-run cursor clamp: on the first tick with no persisted cursor, the
+      # evaluator scans no further back than (now - this cap). Prevents a one-time backlog flood
+      # for rules whose createdAt predates a migration. Set to 0 to disable and replay from createdAt.
+      per-exchange-deploy-backlog-cap-seconds: ${CAMELEER_SERVER_ALERTING_PEREXCHANGEDEPLOYBACKLOGCAPSECONDS:86400}
    outbound-http:
      trust-all: false
      trusted-ca-pem-paths: []
@@ -105,6 +112,10 @@ cameleer:
      url: ${CAMELEER_SERVER_CLICKHOUSE_URL:jdbc:clickhouse://localhost:8123/cameleer}
      username: ${CAMELEER_SERVER_CLICKHOUSE_USERNAME:default}
      password: ${CAMELEER_SERVER_CLICKHOUSE_PASSWORD:}
+    self-metrics:
+      enabled: ${CAMELEER_SERVER_SELFMETRICS_ENABLED:true}
+      interval-ms: ${CAMELEER_SERVER_SELFMETRICS_INTERVALMS:60000}
+    instance-id: ${CAMELEER_SERVER_INSTANCE_ID:}

 springdoc:
  api-docs:
--- a/cameleer-server-app/src/main/resources/clickhouse/init.sql
+++ b/cameleer-server-app/src/main/resources/clickhouse/init.sql
@@ -401,6 +401,29 @@ CREATE TABLE IF NOT EXISTS route_catalog (
 ENGINE = ReplacingMergeTree(last_seen)
 ORDER BY (tenant_id, environment, application_id, route_id);

+-- ── Server Self-Metrics ────────────────────────────────────────────────
+-- Periodic snapshot of the server's own Micrometer registry (written by
+-- ServerMetricsSnapshotScheduler). No `environment` column — the server
+-- straddles environments. `statistic` distinguishes Timer/DistributionSummary
+-- sub-measurements (count, total_time, max, mean) from plain counter/gauge values.
+
+CREATE TABLE IF NOT EXISTS server_metrics (
+    tenant_id          LowCardinality(String) DEFAULT 'default',
+    collected_at       DateTime64(3),
+    server_instance_id LowCardinality(String),
+    metric_name        LowCardinality(String),
+    metric_type        LowCardinality(String),
+    statistic          LowCardinality(String) DEFAULT 'value',
+    metric_value       Float64,
+    tags               Map(String, String) DEFAULT map(),
+    server_received_at DateTime64(3) DEFAULT now64(3)
+)
+ENGINE = MergeTree()
+PARTITION BY (tenant_id, toYYYYMM(collected_at))
+ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
+TTL toDateTime(collected_at) + INTERVAL 90 DAY DELETE
+SETTINGS index_granularity = 8192;
+
 -- insert_id tiebreak for keyset pagination (fixes same-millisecond cursor collision).
 -- IF NOT EXISTS on ADD COLUMN is idempotent. MATERIALIZE COLUMN is a background mutation,
 -- effectively a no-op once all parts are already materialized.
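The `statistic` column is what lets the query service derive a mean for timers: its `meanExpr()` divides the summed `total`/`total_time` rows by the summed `count` rows, and `nullIf(..., 0)` yields NULL instead of dividing by zero. The sketch below models that arithmetic in plain Java; `MeanFromStatistics` and `Stat` are hypothetical names, and an empty `OptionalDouble` stands in for SQL NULL.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical model of meanExpr(): sum(total|total_time) / nullIf(sum(count), 0). */
public final class MeanFromStatistics {

    /** One server_metrics row reduced to its statistic label and metric_value. */
    public record Stat(String statistic, double value) {}

    public static OptionalDouble mean(List<Stat> rows) {
        double total = 0, count = 0;
        for (Stat r : rows) {
            switch (r.statistic()) {
                case "total", "total_time" -> total += r.value(); // sumIf(..., statistic IN (...))
                case "count" -> count += r.value();               // sumIf(..., statistic = 'count')
                default -> { }                                    // other statistics do not contribute
            }
        }
        // nullIf(count, 0): no defined mean when nothing was counted.
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(total / count);
    }
}
```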
--- a/cameleer-server-app/src/main/resources/db/migration/V2__add_environment_color.sql
+++ b/cameleer-server-app/src/main/resources/db/migration/V2__add_environment_color.sql
@@ -0,0 +1,6 @@
+-- V2: per-environment color for UI indicator
+-- Added after V1 baseline (2026-04-22). 8-swatch preset palette; default 'slate'.
+
+ALTER TABLE environments
+    ADD COLUMN color VARCHAR(16) NOT NULL DEFAULT 'slate'
+        CHECK (color IN ('slate','red','amber','green','teal','blue','purple','pink'));
--- a/cameleer-server-app/src/main/resources/db/migration/V3__deployment_config_snapshot.sql
+++ b/cameleer-server-app/src/main/resources/db/migration/V3__deployment_config_snapshot.sql
@@ -0,0 +1,7 @@
+-- V3: per-deployment config snapshot for "last known good" + dirty detection
+-- Captures {jarVersionId, agentConfig, containerConfig} at the moment a
+-- deployment transitions to RUNNING. Historical rows are NULL; dirty detection
+-- treats NULL as "everything dirty" and the next successful Redeploy populates it.
+
+ALTER TABLE deployments
+    ADD COLUMN deployed_config_snapshot JSONB;
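This snapshot column feeds the "pending deploy" diff described in the commit message: live-pushed agentConfig fields (`taps`, `tapVersion`, `tracedProcessors`, `routeRecording`) must not count as dirty, and a NULL snapshot means everything is dirty. The following is a hypothetical, top-level-keys-only sketch of that rule; `DirtyKeysSketch` is an illustrative name and is not the project's `DirtyStateCalculator`, which also compares nested paths.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of snapshot-vs-current dirty detection (top-level keys only). */
public final class DirtyKeysSketch {

    /** Live-pushed fields, applied via SSE CONFIG_UPDATE, which never mark a deploy pending. */
    public static final Set<String> IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    /**
     * Keys whose values differ between the last successful snapshot and the current
     * config (a missing key compares as null). A null snapshot, as on pre-V3 rows,
     * reports every non-ignored current key as dirty.
     */
    public static SortedSet<String> dirtyKeys(Map<String, ?> snapshot, Map<String, ?> current) {
        SortedSet<String> keys = new TreeSet<>(current.keySet());
        if (snapshot != null) {
            keys.addAll(snapshot.keySet());
            keys.removeIf(k -> Objects.equals(snapshot.get(k), current.get(k)));
        }
        keys.removeAll(IGNORED_KEYS);
        return keys;
    }
}
```

With this rule, changing only `taps` leaves the result empty, while a change to a staged field such as `sensitiveKeys` still shows up.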
--- a/cameleer-server-app/src/main/resources/db/migration/V4__add_deployment_created_by.sql
+++ b/cameleer-server-app/src/main/resources/db/migration/V4__add_deployment_created_by.sql
@@ -0,0 +1,8 @@
+-- V4: add created_by column to deployments for audit trail
+-- Captures which user initiated a deployment. Nullable for backwards compatibility;
+-- pre-V4 historical deployments will have NULL.
+
+ALTER TABLE deployments
+    ADD COLUMN created_by TEXT REFERENCES users(user_id);
+
+CREATE INDEX idx_deployments_created_by ON deployments (created_by);
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/AbstractPostgresIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/AbstractPostgresIT.java
@@ -21,10 +21,12 @@ public abstract class AbstractPostgresIT {
        postgres = new PostgreSQLContainer<>("postgres:16")
                .withDatabaseName("cameleer")
                .withUsername("cameleer")
-                .withPassword("test");
+                .withPassword("test")
+                .withReuse(true);
        postgres.start();

-        clickhouse = new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
+        clickhouse = new ClickHouseContainer("clickhouse/clickhouse-server:24.12")
+                .withReuse(true);
        clickhouse.start();
    }

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
@@ -7,8 +7,10 @@ import com.cameleer.server.app.alerting.eval.AlertEvaluatorJob;
 import com.cameleer.server.app.alerting.notify.NotificationDispatchJob;
 import com.cameleer.server.app.outbound.crypto.SecretCipher;
 import com.cameleer.server.app.search.ClickHouseLogStore;
+import com.cameleer.server.app.storage.ClickHouseExecutionStore;
 import com.cameleer.server.core.alerting.*;
 import com.cameleer.server.core.ingestion.BufferedLogEntry;
+import com.cameleer.server.core.ingestion.MergedExecution;
 import com.cameleer.server.core.outbound.OutboundConnectionRepository;
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
@@ -62,6 +64,7 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
    @Autowired private AlertSilenceRepository silenceRepo;
    @Autowired private OutboundConnectionRepository outboundRepo;
    @Autowired private ClickHouseLogStore logStore;
+    @Autowired private ClickHouseExecutionStore executionStore;
    @Autowired private SecretCipher secretCipher;
    @Autowired private TestRestTemplate restTemplate;
    @Autowired private TestSecurityHelper securityHelper;
@@ -399,6 +402,102 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", reNotifyRuleId);
    }

+    /**
+     * Exactly-once-per-exchange end-to-end lifecycle.
+     * <p>
+     * 5 FAILED exchanges across 2 evaluator ticks must produce exactly
+     * 5 FIRING instances + 5 PENDING notifications (one per exchange, one webhook).
+     * A third tick with no new exchanges must be a no-op. Acking one instance
+     * must leave the other four untouched.
+     * <p>
+     * Exercises the full Phase-1+2+3 stack: evaluator cursor persistence across
+     * ticks, per-tick rollback isolation, and the ack-doesn't-cascade invariant.
+     * See: docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md
+     */
+    @Test
+    @Order(7)
+    void perExchange_5FailuresAcross2Ticks_exactlyOncePerExchange() {
+        // Relative-to-now timestamps so they fall inside the evaluator's
+        // [rule.createdAt .. ctx.now()] window. Using Instant.parse(...) would
+        // require reconciling with the mocked alertingClock AND rule.createdAt,
+        // which is wall-clock in createPerExchangeRuleWithWebhook.
+        Instant base = Instant.now().minusSeconds(30);
+
+        // Pin the mocked alertingClock to current wall time so ctx.now() is later than
+        // every seeded execution timestamp (base + 0..4s) AND later than rule.createdAt
+        // (now - 60s). Earlier tests may have left simulatedNow away from wall time
+        // (step 1 used wall time, but step 6 advanced it by 61s, and because tests run
+        // in order the last value lingers). Re-pinning here makes the window deterministic.
+        setSimulatedNow(Instant.now());
+
+        UUID perExRuleId = createPerExchangeRuleWithWebhook();
+
+        // ── Tick 1 — seed 3, tick ────────────────────────────────────────────
+        seedFailedExecution("ex1-exec-1", base);
+        seedFailedExecution("ex1-exec-2", base.plusSeconds(1));
+        seedFailedExecution("ex1-exec-3", base.plusSeconds(2));
+        evaluatorJob.tick();
+
+        // ── Tick 2 — seed 2 more, tick ───────────────────────────────────────
+        seedFailedExecution("ex1-exec-4", base.plusSeconds(3));
+        seedFailedExecution("ex1-exec-5", base.plusSeconds(4));
+        // Re-open the rule claim so it's due for tick 2.
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second', " +
+            "claimed_by = NULL, claimed_until = NULL WHERE id = ?", perExRuleId);
+        evaluatorJob.tick();
+
+        // Assert: 5 instances, 5 PENDING notifications.
+        List<UUID> instanceIds = instanceIdsForRule(perExRuleId);
+        assertThat(instanceIds)
+                .as("5 FAILED exchanges across 2 ticks must produce exactly 5 FIRING instances")
+                .hasSize(5);
+        List<AlertNotification> allNotifs = notificationsForRule(perExRuleId);
+        assertThat(allNotifs)
+                .as("5 instances × 1 webhook must produce exactly 5 notifications")
+                .hasSize(5);
+        assertThat(allNotifs.stream().allMatch(n -> n.status() == NotificationStatus.PENDING))
+                .as("all notifications must be PENDING before dispatch")
+                .isTrue();
+
+        // ── Dispatch all pending, then tick 3 — expect no change ────────────
+        dispatchAllPending();
+        // Re-open the rule claim so it's due for tick 3.
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second', " +
+            "claimed_by = NULL, claimed_until = NULL WHERE id = ?", perExRuleId);
+        evaluatorJob.tick();
+
+        assertThat(instanceIdsForRule(perExRuleId))
+                .as("tick 3 with no new exchanges must not create new instances")
+                .hasSize(5);
+        long pending = notificationsForRule(perExRuleId).stream()
+                .filter(n -> n.status() == NotificationStatus.PENDING)
+                .count();
+        assertThat(pending)
+                .as("tick 3 must not re-enqueue notifications — all prior were dispatched")
+                .isZero();
+
+        // ── Ack one — others unchanged ──────────────────────────────────────
+        UUID firstInstanceId = instanceIds.get(0);
+        instanceRepo.ack(firstInstanceId, "test-operator", Instant.now());
+
+        List<AlertInstance> all = instanceIds.stream()
+                .map(id -> instanceRepo.findById(id).orElseThrow())
+                .toList();
+        long ackedCount = all.stream().filter(i -> i.ackedBy() != null).count();
+        assertThat(ackedCount)
+                .as("ack on one instance must not cascade to peers")
+                .isEqualTo(1);
+
+        // Cleanup — the @AfterAll cleans by envId which covers us, but be explicit.
+        jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id IN " +
+            "(SELECT id FROM alert_instances WHERE rule_id = ?)", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_instances WHERE rule_id = ?", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_rule_targets WHERE rule_id = ?", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", perExRuleId);
+    }
+
    // ── Helpers ───────────────────────────────────────────────────────────────

    /** POST the main lifecycle rule via REST API. Returns the created rule ID. */
@@ -513,4 +612,96 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
        logStore.insertBufferedBatch(List.of(
            new BufferedLogEntry(tenantId, envSlug, "lc-agent-01", "lc-app", entry)));
    }
+
+    // ── Helpers for perExchange exactly-once test ────────────────────────────
+
+    private static final String PER_EX_APP_SLUG = "per-ex-app";
+
+    /**
+     * Create a PER_EXCHANGE rule bound to {@link #PER_EX_APP_SLUG} that fires on
+     * {@code status=FAILED} and enqueues one notification per match via the
+     * pre-seeded webhook connection ({@link #connId}). Returns the new rule id.
+     * <p>
+     * Replicates the pattern from {@code AlertEvaluatorJobIT#createPerExchangeRuleWithWebhook}
+     * but reuses this test's env + outbound connection.
+     */
+    private UUID createPerExchangeRuleWithWebhook() {
+        UUID rid = UUID.randomUUID();
+        Instant now = Instant.now();
+        var condition = new ExchangeMatchCondition(
+                new AlertScope(PER_EX_APP_SLUG, null, null),
+                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
+                FireMode.PER_EXCHANGE, null, null);
+        var webhook = new WebhookBinding(connId, null, null, Map.of());
+        var rule = new AlertRule(
+                rid, envId, "per-ex-lc-rule-" + rid, null,
+                AlertSeverity.WARNING, true, ConditionKind.EXCHANGE_MATCH, condition,
+                60, 0, 60,
+                "Exchange FAILED: {{exchange.id}}", "route={{exchange.routeId}}",
+                List.of(webhook), List.of(),
+                now.minusSeconds(5),               // due now
+                null, null, Map.of(),
+                now.minusSeconds(60), "test-operator",  // createdAt bounds first-run cursor
+                now.minusSeconds(60), "test-operator");
+        ruleRepo.save(rule);
+        return rid;
+    }
+
+    /**
+     * Seed one FAILED execution into ClickHouse, scoped to this test's tenant/env/app
+     * so it's picked up by a PER_EXCHANGE rule targeting {@link #PER_EX_APP_SLUG}.
+     */
+    private void seedFailedExecution(String executionId, Instant startTime) {
+        executionStore.insertExecutionBatch(List.of(new MergedExecution(
+                tenantId, 1L, executionId, "route-a", "inst-1", PER_EX_APP_SLUG, envSlug,
+                "FAILED", "", "exchange-" + executionId,
+                startTime, startTime.plusMillis(100), 100L,
+                "", "", "", "", "", "",     // error fields
+                "", "FULL",                 // diagramContentHash, engineLevel
+                "", "", "", "", "", "",     // bodies / headers / properties
+                "{}",                        // attributes (JSON)
+                "", "",                     // traceId, spanId
+                false, false,
+                null, null
+        )));
+    }
+
+    /** All instance ids for a rule, ordered by fired_at ascending (deterministic). */
+    private List<UUID> instanceIdsForRule(UUID rid) {
+        return jdbcTemplate.queryForList(
+                "SELECT id FROM alert_instances WHERE rule_id = ? ORDER BY fired_at ASC",
+                UUID.class, rid);
+    }
+
+    /** All notifications across every instance of a rule. */
+    private List<AlertNotification> notificationsForRule(UUID rid) {
+        List<UUID> ids = instanceIdsForRule(rid);
+        List<AlertNotification> out = new java.util.ArrayList<>();
+        for (UUID iid : ids) {
+            out.addAll(notificationRepo.listForInstance(iid));
+        }
+        return out;
+    }
+
+    /**
+     * Simulate a dispatch pass without hitting the real webhook: marks every
+     * PENDING notification in this test's environment as DELIVERED. Using
+     * {@code dispatchJob.tick()} would round-trip through WireMock and require
+     * extra plumbing; the exactly-once contract under test is about the
+     * evaluator re-enqueueing behaviour, not webhook delivery.
+     */
+    private void dispatchAllPending() {
+        Instant now = Instant.now();
+        // Drain PENDING notifications across the whole env (safe because the
+        // caller's follow-up assertions are scoped to this rule only).
+        List<UUID> pendingIds = jdbcTemplate.queryForList(
+                "SELECT n.id FROM alert_notifications n " +
+                "JOIN alert_instances i ON n.alert_instance_id = i.id " +
+                "WHERE i.environment_id = ? " +
+                "AND n.status = 'PENDING'::notification_status_enum",
+                UUID.class, envId);
+        for (UUID nid : pendingIds) {
+            notificationRepo.markDelivered(nid, 200, "OK", now);
+        }
+    }
 }
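The exactly-once contract that `AlertingFullLifecycleIT` pins above (one instance per FAILED exchange across ticks, nothing re-enqueued once the cursor has moved) can be modelled in isolation. The sketch below is a hypothetical, in-memory stand-in for the evaluator: `PerExchangeCursorSketch`, its `Execution` record, and the `(startTime, executionId)` cursor ordering are assumptions for illustration, not the server's actual `lastExchangeCursor` format. The first run is bounded below by `createdAt`, later runs by the stored cursor, and every run is bounded above by `now`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory model of a PER_EXCHANGE evaluator with a persisted cursor. */
public class PerExchangeCursorSketch {

    /** Assumed shape of a matched execution: id plus start time. */
    public record Execution(String id, Instant startTime) {}

    // Assumed cursor ordering: start time first, execution id as tie-breaker.
    private static final Comparator<Execution> ORDER =
            Comparator.comparing(Execution::startTime).thenComparing(Execution::id);

    private final Instant createdAt;
    private Execution cursor; // last processed execution; null before the first run

    public PerExchangeCursorSketch(Instant createdAt) {
        this.createdAt = createdAt;
    }

    /**
     * Fires once for every execution after the cursor (or at/after createdAt on
     * the first run) whose start time is not after now, then advances the cursor.
     * In the real job the cursor would be persisted when the claim is released;
     * here it is simply a field.
     */
    public List<Execution> tick(List<Execution> store, Instant now) {
        List<Execution> fired = new ArrayList<>();
        for (Execution e : store.stream().sorted(ORDER).toList()) {
            boolean afterLowerBound = (cursor == null)
                    ? !e.startTime().isBefore(createdAt)
                    : ORDER.compare(e, cursor) > 0;
            if (afterLowerBound && !e.startTime().isAfter(now)) {
                fired.add(e);
            }
        }
        if (!fired.isEmpty()) {
            cursor = fired.get(fired.size() - 1);
        }
        return fired;
    }

    public static void main(String[] args) {
        Instant base = Instant.parse("2026-04-22T10:00:00Z");
        PerExchangeCursorSketch evaluator = new PerExchangeCursorSketch(base.minusSeconds(60));
        List<Execution> store = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            store.add(new Execution("ex1-exec-" + i, base.plusSeconds(i - 1)));
        }
        System.out.println(evaluator.tick(store, base.plusSeconds(30)).size()); // 3
        store.add(new Execution("ex1-exec-4", base.plusSeconds(3)));
        store.add(new Execution("ex1-exec-5", base.plusSeconds(4)));
        System.out.println(evaluator.tick(store, base.plusSeconds(31)).size()); // 2
        System.out.println(evaluator.tick(store, base.plusSeconds(32)).size()); // 0
    }
}
```

Keying the cursor on `(startTime, executionId)` rather than on time alone is one way to keep two exchanges that share a timestamp from being skipped or fired twice; whether the server's cursor does this is not shown in this diff.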
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/controller/AlertRuleControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/controller/AlertRuleControllerIT.java
@@ -145,7 +145,7 @@ class AlertRuleControllerIT extends AbstractPostgresIT {
                {"name":"sqli-test","severity":"WARNING","conditionKind":"EXCHANGE_MATCH",
                 "condition":{"kind":"EXCHANGE_MATCH","scope":{},
                   "filter":{"status":"FAILED","attributes":{"foo'; DROP TABLE executions; --":"x"}},
-                   "fireMode":"PER_EXCHANGE","perExchangeLingerSeconds":60}}
+                   "fireMode":"PER_EXCHANGE"}}
                """;

        ResponseEntity<String> resp = restTemplate.exchange(
@@ -164,7 +164,8 @@ class AlertRuleControllerIT extends AbstractPostgresIT {
                {"name":"valid-attr","severity":"WARNING","conditionKind":"EXCHANGE_MATCH",
                 "condition":{"kind":"EXCHANGE_MATCH","scope":{},
                   "filter":{"status":"FAILED","attributes":{"order.type":"x"}},
-                   "fireMode":"PER_EXCHANGE","perExchangeLingerSeconds":60}}
+                   "fireMode":"PER_EXCHANGE"},
+                 "targets":[{"kind":"USER","targetId":"test-operator"}]}
                """;

        ResponseEntity<String> resp = restTemplate.exchange(
@@ -246,6 +247,61 @@ class AlertRuleControllerIT extends AbstractPostgresIT {
        assertThat(preview.getStatusCode()).isEqualTo(HttpStatus.OK);
    }

+    // --- PER_EXCHANGE cross-field validation + empty-targets validation ---
+    // RED tests: today's controller accepts these bodies; Task 3.3 adds the validator.
+
+    @Test
+    void createPerExchangeRule_withReNotifyMinutesNonZero_returns400() {
+        String body = perExchangeRuleBodyWithExtras(
+                "per-exchange-renotify",
+                /*reNotifyMinutes*/ 60,
+                /*forDurationSeconds*/ null);
+
+        ResponseEntity<String> resp = restTemplate.exchange(
+                "/api/v1/environments/" + envSlug + "/alerts/rules",
+                HttpMethod.POST,
+                new HttpEntity<>(body, securityHelper.authHeaders(operatorJwt)),
+                String.class);
+
+        assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+        assertThat(resp.getBody()).contains("reNotifyMinutes");
+    }
+
+    @Test
+    void createPerExchangeRule_withForDurationSecondsNonZero_returns400() {
+        String body = perExchangeRuleBodyWithExtras(
+                "per-exchange-forduration",
+                /*reNotifyMinutes*/ null,
+                /*forDurationSeconds*/ 60);
+
+        ResponseEntity<String> resp = restTemplate.exchange(
+                "/api/v1/environments/" + envSlug + "/alerts/rules",
+                HttpMethod.POST,
+                new HttpEntity<>(body, securityHelper.authHeaders(operatorJwt)),
+                String.class);
+
+        assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+        assertThat(resp.getBody()).contains("forDurationSeconds");
+    }
+
+    @Test
+    void createAnyRule_withEmptyWebhooksAndTargets_returns400() {
+        // baseValidPerExchangeRuleRequest() already produces no webhooks / no targets — that's
+        // precisely the "empty webhooks + empty targets" shape this test pins as a 400.
+        String body = baseValidPerExchangeRuleRequest("no-sinks");
+
+        ResponseEntity<String> resp = restTemplate.exchange(
+                "/api/v1/environments/" + envSlug + "/alerts/rules",
+                HttpMethod.POST,
+                new HttpEntity<>(body, securityHelper.authHeaders(operatorJwt)),
+                String.class);
+
+        assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+        assertThat(resp.getBody()).satisfiesAnyOf(
+                s -> assertThat(s).contains("webhook"),
+                s -> assertThat(s).contains("target"));
+    }
+
    // --- Unknown env returns 404 ---

    @Test
@@ -269,10 +325,49 @@ class AlertRuleControllerIT extends AbstractPostgresIT {
    }

    private static String routeMetricRuleBody(String name) {
+        // Includes a USER target so the rule passes the "at least one webhook or target" guard.
        return """
                {"name":"%s","severity":"WARNING","conditionKind":"ROUTE_METRIC",
                 "condition":{"kind":"ROUTE_METRIC","scope":{},
-                   "metric":"ERROR_RATE","comparator":"GT","threshold":0.05,"windowSeconds":60}}
+                   "metric":"ERROR_RATE","comparator":"GT","threshold":0.05,"windowSeconds":60},
+                 "targets":[{"kind":"USER","targetId":"test-operator"}]}
                """.formatted(name);
    }
+
+    /**
+     * Produces a baseline PER_EXCHANGE rule body: no webhooks, no targets,
+     * no reNotifyMinutes, no forDurationSeconds. The controller currently
+     * accepts this shape; once Task 3.3 lands, the empty sinks make it a 400.
+     */
+    private static String baseValidPerExchangeRuleRequest(String name) {
+        return """
+                {"name":"%s","severity":"WARNING","conditionKind":"EXCHANGE_MATCH",
+                 "condition":{"kind":"EXCHANGE_MATCH","scope":{},
+                   "filter":{"status":"FAILED","attributes":{}},
+                   "fireMode":"PER_EXCHANGE"}}
+                """.formatted(name);
+    }
+
+    /**
+     * Variant of {@link #baseValidPerExchangeRuleRequest(String)} that sets
+     * reNotifyMinutes and/or forDurationSeconds at the top-level request. Used to pin
+     * the PER_EXCHANGE cross-field validation contract (Task 3.3).
+     */
+    private static String perExchangeRuleBodyWithExtras(String name,
+                                                        Integer reNotifyMinutes,
+                                                        Integer forDurationSeconds) {
+        StringBuilder extras = new StringBuilder();
+        if (reNotifyMinutes != null) {
+            extras.append(",\"reNotifyMinutes\":").append(reNotifyMinutes);
+        }
+        if (forDurationSeconds != null) {
+            extras.append(",\"forDurationSeconds\":").append(forDurationSeconds);
+        }
+        return """
+                {"name":"%s","severity":"WARNING","conditionKind":"EXCHANGE_MATCH",
+                 "condition":{"kind":"EXCHANGE_MATCH","scope":{},
+                   "filter":{"status":"FAILED","attributes":{}},
+                   "fireMode":"PER_EXCHANGE"}%s}
+                """.formatted(name, extras.toString());
+    }
 }
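The three RED tests above pin a validation contract that Task 3.3 is expected to implement: for PER_EXCHANGE rules a non-zero `reNotifyMinutes` or `forDurationSeconds` is rejected, and any rule with neither webhooks nor targets is rejected. A minimal sketch of such a check follows; the class name, method signature, and error-message wording are hypothetical (the tests only require the 400 body to mention the offending field name, or `webhook`/`target`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical validator for the contract pinned by the Task 3.3 RED tests. */
public class AlertRuleRequestValidatorSketch {

    /**
     * Returns human-readable errors; an empty list means the request is valid.
     * A null or zero value for the PER_EXCHANGE-incompatible fields is accepted.
     */
    public static List<String> validate(boolean perExchange,
                                        Integer reNotifyMinutes,
                                        Integer forDurationSeconds,
                                        int webhookCount,
                                        int targetCount) {
        List<String> errors = new ArrayList<>();
        if (perExchange && reNotifyMinutes != null && reNotifyMinutes != 0) {
            errors.add("reNotifyMinutes must be 0 or absent for PER_EXCHANGE rules");
        }
        if (perExchange && forDurationSeconds != null && forDurationSeconds != 0) {
            errors.add("forDurationSeconds must be 0 or absent for PER_EXCHANGE rules");
        }
        // Applies to every condition kind, not just PER_EXCHANGE.
        if (webhookCount == 0 && targetCount == 0) {
            errors.add("rule must declare at least one webhook or target");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Mirrors the three request shapes exercised by the RED tests.
        System.out.println(validate(true, 60, null, 0, 1));
        System.out.println(validate(true, null, 60, 0, 1));
        System.out.println(validate(true, null, null, 0, 0));
    }
}
```

Collecting every violation instead of failing on the first one lets a single 400 body name all offending fields, which fits assertions such as `contains("reNotifyMinutes")`.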
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluatorTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluatorTest.java
@@ -37,7 +37,7 @@ class AgentLifecycleEvaluatorTest {
        events  = mock(AgentEventRepository.class);
        envRepo = mock(EnvironmentRepository.class);
        when(envRepo.findById(ENV_ID)).thenReturn(Optional.of(
-                new Environment(ENV_ID, ENV_SLUG, "Prod", true, true, Map.of(), 5, Instant.EPOCH)));
+                new Environment(ENV_ID, ENV_SLUG, "Prod", true, true, Map.of(), 5, "slate", Instant.EPOCH)));
        eval = new AgentLifecycleEvaluator(events, envRepo);
    }

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
@@ -2,20 +2,30 @@ package com.cameleer.server.app.alerting.eval;

 import com.cameleer.server.app.AbstractPostgresIT;
 import com.cameleer.server.app.search.ClickHouseLogStore;
+import com.cameleer.server.app.storage.ClickHouseExecutionStore;
 import com.cameleer.server.core.agent.AgentInfo;
 import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.agent.AgentState;
 import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.ingestion.MergedExecution;
 import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.beans.factory.annotation.Qualifier;
+import org.springframework.boot.test.context.TestConfiguration;
 import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.context.annotation.Bean;
+import org.springframework.context.annotation.Import;
+import org.springframework.context.annotation.Primary;
+import org.springframework.jdbc.core.JdbcTemplate;

 import java.time.Instant;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 import java.util.UUID;
+import java.util.concurrent.atomic.AtomicInteger;

 import static org.assertj.core.api.Assertions.assertThat;
 import static org.mockito.Mockito.when;
@@ -29,6 +39,7 @@ import static org.mockito.Mockito.when;
 * {@code AgentRegistryService} is mocked so tests can control which agents
 * are DEAD without depending on in-memory timing.
 */
+@Import(AlertEvaluatorJobIT.FaultInjectingNotificationRepoConfig.class)
 class AlertEvaluatorJobIT extends AbstractPostgresIT {

    @MockBean(name = "clickHouseLogStore") ClickHouseLogStore    clickHouseLogStore;
@@ -37,24 +48,37 @@ class AlertEvaluatorJobIT extends AbstractPostgresIT {
    @Autowired private AlertEvaluatorJob job;
    @Autowired private AlertRuleRepository ruleRepo;
    @Autowired private AlertInstanceRepository instanceRepo;
+    @Autowired private AlertNotificationRepository notificationRepo;
+    @Autowired private FaultInjectingNotificationRepository faultInjectingNotificationRepo;
+    @Autowired private ClickHouseExecutionStore executionStore;
+    @Autowired @Qualifier("clickHouseJdbcTemplate") private JdbcTemplate clickHouseJdbc;

    private UUID envId;
    private UUID ruleId;
    private static final String SYS_USER    = "sys-eval-it";
    private static final String APP_SLUG    = "orders";
    private static final String AGENT_ID    = "test-agent-01";
+    private String envSlug;

    @BeforeEach
    void setup() {
+        // ClickHouse — purge any executions left over from prior tests in the
+        // shared CH instance. Matches the house-style used across the CH IT
+        // suite (see ClickHouseExecutionStoreIT, ClickHouseStatsStoreIT, etc.).
+        // TRUNCATE is synchronous, unlike ALTER ... DELETE (mutations_sync=0).
+        clickHouseJdbc.execute("TRUNCATE TABLE executions");
+        clickHouseJdbc.execute("TRUNCATE TABLE processor_executions");
+
        // Default: empty registry — all evaluators return Clear
        when(agentRegistryService.findAll()).thenReturn(List.of());

-        envId  = UUID.randomUUID();
-        ruleId = UUID.randomUUID();
+        envId   = UUID.randomUUID();
+        ruleId  = UUID.randomUUID();
+        envSlug = "eval-it-env-" + envId;

        jdbcTemplate.update(
            "INSERT INTO environments (id, slug, display_name) VALUES (?, ?, ?)",
-            envId, "eval-it-env-" + envId, "Eval IT Env");
+            envId, envSlug, "Eval IT Env");
        jdbcTemplate.update(
            "INSERT INTO users (user_id, provider, email) VALUES (?, 'local', ?) ON CONFLICT (user_id) DO NOTHING",
            SYS_USER, SYS_USER + "@test.example.com");
@@ -76,12 +100,17 @@ class AlertEvaluatorJobIT extends AbstractPostgresIT {

    @AfterEach
    void cleanup() {
+        // Always reset the fault-injector — a prior @Test may have left it armed,
+        // and Spring reuses the same context (and thus the same decorator bean)
+        // across tests in this class.
+        faultInjectingNotificationRepo.clearFault();
        jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id IN " +
            "(SELECT id FROM alert_instances WHERE environment_id = ?)", envId);
        jdbcTemplate.update("DELETE FROM alert_instances WHERE environment_id = ?", envId);
        jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
        jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
        jdbcTemplate.update("DELETE FROM users WHERE user_id = ?", SYS_USER);
+        // ClickHouse `executions` / `processor_executions` are truncated in @BeforeEach (house style).
    }

    // -------------------------------------------------------------------------
@@ -94,6 +123,77 @@ class AlertEvaluatorJobIT extends AbstractPostgresIT {
                AgentState.DEAD, lastHeartbeat.minusSeconds(300), lastHeartbeat, null);
    }

+    /**
+     * Seed one FAILED execution row into the ClickHouse {@code executions} table,
+     * scoped to this test's tenant/env/app so it's picked up by a PER_EXCHANGE rule
+     * targeting {@code APP_SLUG}. Executions older than {@code rule.createdAt} are
+     * filtered out by the evaluator; callers must pick {@code startTime} accordingly.
+     */
+    private void seedFailedExecution(String executionId, Instant startTime) {
+        executionStore.insertExecutionBatch(List.of(new MergedExecution(
+                "default", 1L, executionId, "route-a", "inst-1", APP_SLUG, envSlug,
+                "FAILED", "", "exchange-" + executionId,
+                startTime, startTime.plusMillis(100), 100L,
+                "", "", "", "", "", "",     // errorMessage..rootCauseMessage
+                "", "FULL",                 // diagramContentHash, engineLevel
+                "", "", "", "", "", "",     // bodies / headers / properties
+                "{}",                        // attributes (JSON)
+                "", "",                     // traceId, spanId
+                false, false,
+                null, null
+        )));
+    }
+
+    /**
+     * Create a PER_EXCHANGE rule targeting {@code APP_SLUG} + status=FAILED with a
+     * single webhook binding. Returns the persisted rule id.
+     * <p>
+     * {@code createdAt} is set {@code 60s} before {@code Instant.now()} so the
+     * evaluator's first-run lower bound ({@code timeFrom = rule.createdAt}) picks
+     * up the seeded executions.
+     */
+    private UUID createPerExchangeRuleWithWebhook() {
+        UUID ruleId2 = UUID.randomUUID();
+        Instant now = Instant.now();
+        var condition = new ExchangeMatchCondition(
+                new AlertScope(APP_SLUG, null, null),
+                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
+                FireMode.PER_EXCHANGE, null, null);
+        var webhook = new WebhookBinding(UUID.randomUUID(), null, null, Map.of());
+        var rule = new AlertRule(
+                ruleId2, envId, "per-exchange-rule-" + ruleId2, null,
+                AlertSeverity.WARNING, true, ConditionKind.EXCHANGE_MATCH, condition,
+                60, 0, 60,
+                "Exchange FAILED: {{exchange.id}}", "route={{exchange.routeId}}",
+                List.of(webhook), List.of(),
+                now.minusSeconds(5),                 // due now
+                null, null, Map.of(),
+                now.minusSeconds(60), SYS_USER,      // createdAt — bounds first-run cursor
+                now.minusSeconds(60), SYS_USER);
+        ruleRepo.save(rule);
+        return ruleId2;
+    }
+
+    /** List all notifications enqueued for any instance of {@code ruleId}. */
+    private List<AlertNotification> listNotificationsByRule(UUID ruleId) {
+        List<UUID> instanceIds = jdbcTemplate.queryForList(
+                "SELECT id FROM alert_instances WHERE rule_id = ?",
+                UUID.class, ruleId);
+        List<AlertNotification> out = new java.util.ArrayList<>();
+        for (UUID iid : instanceIds) {
+            out.addAll(notificationRepo.listForInstance(iid));
+        }
+        return out;
+    }
+
+    /** Count all instances for {@code ruleId} (open or resolved). */
+    private int countInstancesByRule(UUID ruleId) {
+        Long c = jdbcTemplate.queryForObject(
+                "SELECT count(*) FROM alert_instances WHERE rule_id = ?",
+                Long.class, ruleId);
+        return c == null ? 0 : c.intValue();
+    }
+
    // -------------------------------------------------------------------------
    // Tests
    // -------------------------------------------------------------------------
@@ -238,4 +338,247 @@ class AlertEvaluatorJobIT extends AbstractPostgresIT {
        assertThat(snapshotAfter).contains("\"name\": \"dead-agent-rule\"");
        assertThat(snapshotAfter).contains("\"severity\": \"WARNING\"");
    }
+
+    // -------------------------------------------------------------------------
+    // PER_EXCHANGE regression pin — notifications must not re-enqueue for
+    // already-matched exchanges across tick boundaries (cursor must be persisted
+    // via releaseClaim, then read back on the next tick).
+    // See: docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md
+    // -------------------------------------------------------------------------
+
+    @Test
+    void tick2_noNewExchanges_enqueuesZeroAdditionalNotifications() {
+        // Arrange: 2 FAILED executions in ClickHouse, 1 PER_EXCHANGE rule with 1 webhook.
+        // Use relative-to-now timestamps so they fall within the evaluator's
+        // [rule.createdAt .. ctx.now()] window.
+        Instant t0 = Instant.now().minusSeconds(30);
+        seedFailedExecution("exec-1", t0);
+        seedFailedExecution("exec-2", t0.plusSeconds(1));
+        UUID perExRuleId = createPerExchangeRuleWithWebhook();
+
+        // Tick 1 — expect 2 instances, 2 notifications.
+        job.tick();
+        assertThat(countInstancesByRule(perExRuleId))
+                .as("tick 1 must create one FIRING instance per matched exchange")
+                .isEqualTo(2);
+        List<AlertNotification> afterTick1 = listNotificationsByRule(perExRuleId);
+        assertThat(afterTick1)
+                .as("tick 1 must enqueue one notification per instance (1 webhook × 2 instances)")
+                .hasSize(2);
+
+        // Simulate NotificationDispatchJob draining the queue.
+        Instant now = Instant.now();
+        afterTick1.forEach(n -> notificationRepo.markDelivered(n.id(), 200, "OK", now));
+
+        // Reopen the claim so the rule is due for Tick 2.
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second', " +
+            "claimed_by = NULL, claimed_until = NULL WHERE id = ?", perExRuleId);
+
+        // Tick 2 — no new ClickHouse rows; cursor should have advanced past exec-2
+        // during tick 1 and persisted via releaseClaim. Therefore: no new firings,
+        // no new notifications.
+        job.tick();
+
+        // Instance count unchanged.
+        assertThat(countInstancesByRule(perExRuleId))
+                .as("tick 2 must not create new instances — cursor persisted past exec-2")
+                .isEqualTo(2);
+
+        // THE BLEED: any new PENDING notification after tick 2 indicates the
+        // evaluator re-matched already-processed exchanges (cursor not persisted
+        // across ticks). Must be zero after the Phase 1 fix.
+        long pending = listNotificationsByRule(perExRuleId).stream()
+                .filter(n -> n.status() == NotificationStatus.PENDING)
+                .count();
+        assertThat(pending)
+                .as("tick 2 must NOT re-enqueue notifications for already-matched exchanges")
+                .isZero();
+
+        // Clean up the extra rule (setup-created rule is handled by @AfterEach).
+        jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id IN " +
+            "(SELECT id FROM alert_instances WHERE rule_id = ?)", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_instances WHERE rule_id = ?", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", perExRuleId);
+    }
+
+    // -------------------------------------------------------------------------
+    // Tick atomicity regression pin — a crash mid-batch must roll back every
+    // instance + notification write AND leave the cursor unchanged so the
+    // next tick re-processes the entire batch exactly once.
+    // See: docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md (Task 2.1)
+    // -------------------------------------------------------------------------
+
+    @Test
+    void tickRollback_faultOnSecondNotificationInsert_leavesCursorUnchanged() {
+        // Seed 3 FAILED executions so the rule's PER_EXCHANGE batch has 3 firings.
+        // Relative-to-now timestamps so they fall inside [rule.createdAt .. ctx.now()].
+        Instant t0 = Instant.now().minusSeconds(30);
+        seedFailedExecution("exec-1", t0);
+        seedFailedExecution("exec-2", t0.plusSeconds(1));
+        seedFailedExecution("exec-3", t0.plusSeconds(2));
+        UUID perExRuleId = createPerExchangeRuleWithWebhook();
+
+        var ruleBefore = ruleRepo.findById(perExRuleId).orElseThrow();
+        Object cursorBefore = ruleBefore.evalState().get("lastExchangeCursor"); // null on first run
+        Instant nextRunBefore = ruleBefore.nextEvaluationAt();
+
+        // Arm the fault injector: the 2nd notification save() throws.
+        // (Instance saves are NOT counted — the injector is scoped to notification saves.)
+        faultInjectingNotificationRepo.failOnSave(2);
+
+        // Today (Phase 1, non-transactional): the evaluator catches the exception
+        // per-rule and logs a warning — see AlertEvaluatorJob#tick's try/catch
+        // around applyResult. So tick() itself does NOT rethrow. That is exactly
+        // why this IT is RED pre-Phase-2: post-rollback asserts expect 0 instances
+        // and 0 notifications, but the current code will have persisted
+        // firing #1 (instance + notification) and firing #2's instance before the
+        // fault on firing #2's notification. Phase 2 wraps the per-rule body in
+        // @Transactional so the single-rule failure rolls back all of its writes.
+        try {
+            job.tick();
+        } catch (RuntimeException expectedAfterPhase2) {
+            // Phase 2 may choose to rethrow; either way the rollback assertions
+            // below are what pin the contract.
+            // intentionally empty — fault-injection swallow/rethrow tolerance; see comment above
+        }
+
+        // Post-rollback: zero instances, zero notifications, cursor unchanged,
+        // nextEvaluationAt unchanged (Phase 2 will hold the claim so the next tick retries).
+        assertThat(countInstancesByRule(perExRuleId))
+                .as("Phase 2 contract: mid-batch fault rolls back every instance write")
+                .isZero();
+        assertThat(listNotificationsByRule(perExRuleId))
+                .as("Phase 2 contract: mid-batch fault rolls back every notification write")
+                .isEmpty();
+        var ruleAfter = ruleRepo.findById(perExRuleId).orElseThrow();
+        assertThat(ruleAfter.evalState().get("lastExchangeCursor"))
+                .as("Phase 2 contract: cursor MUST NOT advance when the tick fails")
+                .isEqualTo(cursorBefore);
+        assertThat(ruleAfter.nextEvaluationAt())
+                .as("Phase 2 contract: nextEvaluationAt MUST NOT advance when the tick fails")
+                .isEqualTo(nextRunBefore);
+
+        // Recover: clear the fault, reopen the claim, tick again.
+        // All 3 firings must land on the second tick — exactly-once-per-exchange.
+        faultInjectingNotificationRepo.clearFault();
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second', " +
+            "claimed_by = NULL, claimed_until = NULL WHERE id = ?", perExRuleId);
+
+        job.tick();
+
+        assertThat(countInstancesByRule(perExRuleId))
+                .as("after recovery: all 3 exchanges produce exactly one instance each")
+                .isEqualTo(3);
+        assertThat(listNotificationsByRule(perExRuleId))
+                .as("after recovery: all 3 instances produce exactly one notification each")
+                .hasSize(3);
+        assertThat(ruleRepo.findById(perExRuleId).orElseThrow()
+                .evalState().get("lastExchangeCursor"))
+                .as("after recovery: cursor advanced past exec-3")
+                .isNotEqualTo(cursorBefore);
+
+        // Clean up the extra rule (setup-created rule is handled by @AfterEach).
+        jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id IN " +
+            "(SELECT id FROM alert_instances WHERE rule_id = ?)", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_instances WHERE rule_id = ?", perExRuleId);
+        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", perExRuleId);
+    }
+
+    // -------------------------------------------------------------------------
+    // Fault-injecting AlertNotificationRepository decorator
+    //
+    // Delegates every call to the real Postgres-backed repository except
+    // save(AlertNotification), which increments a counter and throws a
+    // RuntimeException only on the armed n-th call; later saves pass through.
+    // Only notification saves are counted; instance saves go through a separate
+    // repository and are unaffected.
+    // -------------------------------------------------------------------------
+
+    static class FaultInjectingNotificationRepository implements AlertNotificationRepository {
+        private final AlertNotificationRepository delegate;
+        private final AtomicInteger saveCount = new AtomicInteger(0);
+        private volatile int failOnNthSave = -1; // -1 = disabled
+
+        FaultInjectingNotificationRepository(AlertNotificationRepository delegate) {
+            this.delegate = delegate;
+        }
+
+        /** Arms the fault: the {@code n}-th call to {@link #save} (1-indexed) throws. */
+        void failOnSave(int n) {
+            saveCount.set(0);
+            failOnNthSave = n;
+        }
+
+        /** Disarms the fault and resets the counter. */
+        void clearFault() {
+            failOnNthSave = -1;
+            saveCount.set(0);
+        }
+
+        @Override
+        public AlertNotification save(AlertNotification n) {
+            int current = saveCount.incrementAndGet();
+            if (failOnNthSave > 0 && current == failOnNthSave) {
+                throw new RuntimeException(
+                        "FaultInjectingNotificationRepository: injected failure on save #" + current);
+            }
+            return delegate.save(n);
+        }
+
+        @Override
+        public Optional<AlertNotification> findById(UUID id) { return delegate.findById(id); }
+
+        @Override
+        public List<AlertNotification> listForInstance(UUID alertInstanceId) {
+            return delegate.listForInstance(alertInstanceId);
+        }
+
+        @Override
+        public List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds) {
+            return delegate.claimDueNotifications(instanceId, batchSize, claimTtlSeconds);
+        }
+
+        @Override
+        public void markDelivered(UUID id, int status, String snippet, Instant when) {
+            delegate.markDelivered(id, status, snippet, when);
+        }
+
+        @Override
+        public void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet) {
+            delegate.scheduleRetry(id, nextAttemptAt, status, snippet);
+        }
+
+        @Override
+        public void markFailed(UUID id, int status, String snippet) {
+            delegate.markFailed(id, status, snippet);
+        }
+
+        @Override
+        public void resetForRetry(UUID id, Instant nextAttemptAt) {
+            delegate.resetForRetry(id, nextAttemptAt);
+        }
+
+        @Override
+        public void deleteSettledBefore(Instant cutoff) {
+            delegate.deleteSettledBefore(cutoff);
+        }
+    }
+
+    /**
+     * {@link TestConfiguration} that installs the fault-injecting decorator as the
+     * {@code @Primary} {@link AlertNotificationRepository}. The real Postgres repo is
+     * still registered (via {@code AlertingBeanConfig}) and is injected into the
+     * decorator as the delegate, so every non-instrumented call path exercises real SQL.
+     */
+    @TestConfiguration
+    static class FaultInjectingNotificationRepoConfig {
+        @Bean
+        @Primary
+        FaultInjectingNotificationRepository faultInjectingNotificationRepository(
+                @Qualifier("alertNotificationRepository") AlertNotificationRepository realRepo) {
+            return new FaultInjectingNotificationRepository(realRepo);
+        }
+    }
 }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
@@ -157,7 +157,7 @@ class AlertStateTransitionsTest {

    @Test
    void batchResultAlwaysEmpty() {
-        var batch = new EvalResult.Batch(List.of(FIRING_RESULT));
+        var batch = new EvalResult.Batch(List.of(FIRING_RESULT), Map.of());
        var next = AlertStateTransitions.apply(null, batch, ruleWith(0), NOW);
        assertThat(next).isEmpty();
    }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluatorTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluatorTest.java
@@ -48,7 +48,7 @@ class DeploymentStateEvaluatorTest {
    private Deployment deployment(DeploymentStatus status) {
        return new Deployment(DEP_ID, APP_ID, UUID.randomUUID(), ENV_ID, status,
                null, null, List.of(), null, null, "orders-0", null,
-                Map.of(), NOW.minusSeconds(60), null, NOW.minusSeconds(120));
+                Map.of(), null, NOW.minusSeconds(60), null, NOW.minusSeconds(120), "test-user");
    }

    @Test
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluatorTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluatorTest.java
@@ -1,15 +1,18 @@
 package com.cameleer.server.app.alerting.eval;

+import com.cameleer.server.app.alerting.config.AlertingProperties;
 import com.cameleer.server.app.search.ClickHouseSearchIndex;
 import com.cameleer.server.core.alerting.*;
 import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.runtime.EnvironmentRepository;
 import com.cameleer.server.core.search.ExecutionSummary;
+import com.cameleer.server.core.search.SearchRequest;
 import com.cameleer.server.core.search.SearchResult;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.mockito.ArgumentCaptor;

+import java.time.Duration;
 import java.time.Instant;
 import java.util.List;
 import java.util.Map;
@@ -34,9 +37,11 @@ class ExchangeMatchEvaluatorTest {
    void setUp() {
        searchIndex = mock(ClickHouseSearchIndex.class);
        envRepo     = mock(EnvironmentRepository.class);
-        eval = new ExchangeMatchEvaluator(searchIndex, envRepo);
+        AlertingProperties props = new AlertingProperties(
+                null, null, null, null, null, null, null, null, null, null, null, null, null, null);
+        eval = new ExchangeMatchEvaluator(searchIndex, envRepo, props);

-        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, null);
+        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, "slate", null);
        when(envRepo.findById(ENV_ID)).thenReturn(Optional.of(env));
    }

@@ -45,10 +50,21 @@ class ExchangeMatchEvaluatorTest {
    }

    private AlertRule ruleWith(AlertCondition condition, Map<String, Object> evalState) {
+        return ruleWith(condition, evalState, null);
+    }
+
+    private AlertRule ruleWith(AlertCondition condition, Map<String, Object> evalState, Instant createdAt) {
        return new AlertRule(RULE_ID, ENV_ID, "test", null,
                AlertSeverity.WARNING, true, condition.kind(), condition,
                60, 0, 0, null, null, List.of(), List.of(),
-                null, null, null, evalState, null, null, null, null);
+                null, null, null, evalState, createdAt, null, null, null);
+    }
+
+    private ExchangeMatchCondition perExchangeCondition() {
+        return new ExchangeMatchCondition(
+                new AlertScope("orders", null, null),
+                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
+                FireMode.PER_EXCHANGE, null, null);
    }

    private ExecutionSummary summary(String id, Instant startTime, String status) {
@@ -64,7 +80,7 @@ class ExchangeMatchEvaluatorTest {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.COUNT_IN_WINDOW, 5, 300, null);
+                FireMode.COUNT_IN_WINDOW, 5, 300);

        when(searchIndex.countExecutionsForAlerting(any())).thenReturn(7L);

@@ -79,7 +95,7 @@ class ExchangeMatchEvaluatorTest {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.COUNT_IN_WINDOW, 5, 300, null);
+                FireMode.COUNT_IN_WINDOW, 5, 300);

        when(searchIndex.countExecutionsForAlerting(any())).thenReturn(3L);

@@ -92,7 +108,7 @@ class ExchangeMatchEvaluatorTest {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", "direct:pay", null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("orderId", "123")),
-                FireMode.COUNT_IN_WINDOW, 1, 120, null);
+                FireMode.COUNT_IN_WINDOW, 1, 120);

        when(searchIndex.countExecutionsForAlerting(any())).thenReturn(2L);

@@ -119,7 +135,7 @@ class ExchangeMatchEvaluatorTest {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.PER_EXCHANGE, null, null, 60);
+                FireMode.PER_EXCHANGE, null, null);

        when(searchIndex.search(any())).thenReturn(SearchResult.empty(0, 50));

@@ -133,7 +149,7 @@ class ExchangeMatchEvaluatorTest {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.PER_EXCHANGE, null, null, 60);
+                FireMode.PER_EXCHANGE, null, null);

        Instant t1 = NOW.minusSeconds(50);
        Instant t2 = NOW.minusSeconds(30);
@@ -153,11 +169,11 @@ class ExchangeMatchEvaluatorTest {
    }

    @Test
-    void perExchange_lastFiringCarriesNextCursor() {
+    void perExchange_batchCarriesNextCursorInEvalState() {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.PER_EXCHANGE, null, null, 60);
+                FireMode.PER_EXCHANGE, null, null);

        Instant t1 = NOW.minusSeconds(50);
        Instant t2 = NOW.minusSeconds(10); // latest
@@ -169,32 +185,119 @@ class ExchangeMatchEvaluatorTest {
        EvalResult r = eval.evaluate(condition, ruleWith(condition), new EvalContext("default", NOW, new TickCache()));
        var batch = (EvalResult.Batch) r;

-        // last firing carries the _nextCursor key with the latest startTime
-        EvalResult.Firing last = batch.firings().get(batch.firings().size() - 1);
-        assertThat(last.context()).containsKey("_nextCursor");
-        assertThat(last.context().get("_nextCursor")).isEqualTo(t2);
+        // The batch carries the composite cursor "<startTime>|<executionId>" in nextEvalState under "lastExchangeCursor"
+        assertThat(batch.nextEvalState()).containsKey("lastExchangeCursor");
+        assertThat(batch.nextEvalState().get("lastExchangeCursor"))
+                .isEqualTo(t2.toString() + "|ex-2");
    }

    @Test
-    void perExchange_usesLastExchangeTsFromEvalState() {
+    void perExchange_usesLastExchangeCursorFromEvalState() {
        var condition = new ExchangeMatchCondition(
                new AlertScope("orders", null, null),
                new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                FireMode.PER_EXCHANGE, null, null, 60);
+                FireMode.PER_EXCHANGE, null, null);

        Instant cursor = NOW.minusSeconds(120);
-        var rule = ruleWith(condition, Map.of("lastExchangeTs", cursor.toString()));
+        var rule = ruleWith(condition, Map.of("lastExchangeCursor", cursor.toString() + "|ex-prev"));

        when(searchIndex.search(any())).thenReturn(SearchResult.empty(0, 50));

        eval.evaluate(condition, rule, new EvalContext("default", NOW, new TickCache()));

-        // Verify the search request used the cursor as the lower-bound
+        // Verify the search request used the cursor tuple: timeFrom + afterExecutionId
        ArgumentCaptor<com.cameleer.server.core.search.SearchRequest> captor =
                ArgumentCaptor.forClass(com.cameleer.server.core.search.SearchRequest.class);
        verify(searchIndex).search(captor.capture());
-        // timeFrom should be the cursor value
        assertThat(captor.getValue().timeFrom()).isEqualTo(cursor);
+        assertThat(captor.getValue().afterExecutionId()).isEqualTo("ex-prev");
+    }
+
+    @Test
+    void cursorMonotonicity_sameMillisecondExchanges_fireExactlyOncePerTick() {
+        var t = Instant.parse("2026-04-22T10:00:00Z");
+        var exA = summary("exec-a", t, "FAILED");
+        var exB = summary("exec-b", t, "FAILED");
+        when(searchIndex.search(any())).thenReturn(new SearchResult<>(List.of(exA, exB), 2L, 0, 50));
+
+        ExchangeMatchCondition condition = perExchangeCondition();
+        AlertRule rule = ruleWith(condition, Map.of()).withEvalState(Map.of()); // first-run
+        EvalResult r1 = eval.evaluate(condition, rule,
+                new EvalContext("default", t.plusSeconds(1), new TickCache()));
+
+        assertThat(r1).isInstanceOf(EvalResult.Batch.class);
+        var batch1 = (EvalResult.Batch) r1;
+        assertThat(batch1.firings()).hasSize(2);
+        assertThat(batch1.nextEvalState()).containsKey("lastExchangeCursor");
+        // cursor is (t, "exec-b") since "exec-b" > "exec-a" lexicographically
+
+        // Tick 2: feed the advanced cursor back into the rule; expect zero firings
+        AlertRule advanced = rule.withEvalState(batch1.nextEvalState());
+        when(searchIndex.search(any())).thenReturn(new SearchResult<>(List.of(), 0L, 0, 50));
+        EvalResult r2 = eval.evaluate(condition, advanced,
+                new EvalContext("default", t.plusSeconds(2), new TickCache()));
+        assertThat(((EvalResult.Batch) r2).firings()).isEmpty();
+
+        // Tick 3: a third exchange at the same t with exec-c > exec-b; expect exactly one firing
+        var exC = summary("exec-c", t, "FAILED");
+        when(searchIndex.search(any())).thenReturn(new SearchResult<>(List.of(exC), 1L, 0, 50));
+        EvalResult r3 = eval.evaluate(condition, advanced,
+                new EvalContext("default", t.plusSeconds(3), new TickCache()));
+        assertThat(((EvalResult.Batch) r3).firings()).hasSize(1);
+        assertThat(((EvalResult.Batch) r3).nextEvalState()).containsKey("lastExchangeCursor");
+    }
+
+    @Test
+    void firstRun_boundedByRuleCreatedAt_notRetentionHistory() {
+        var created = Instant.parse("2026-04-22T09:00:00Z");
+        var after   = created.plus(Duration.ofMinutes(30));
+
+        // The evaluator must pass `timeFrom = created` to the search.
+        ArgumentCaptor<SearchRequest> cap = ArgumentCaptor.forClass(SearchRequest.class);
+        when(searchIndex.search(cap.capture())).thenReturn(
+                new SearchResult<>(List.of(summary("exec-after", after, "FAILED")), 1L, 0, 50));
+
+        ExchangeMatchCondition condition = perExchangeCondition();
+        AlertRule rule = ruleWith(condition, Map.of(), created).withEvalState(Map.of()); // no cursor
+        EvalResult r = eval.evaluate(condition, rule,
+                new EvalContext("default", after.plusSeconds(10), new TickCache()));
+
+        SearchRequest req = cap.getValue();
+        assertThat(req.timeFrom()).isEqualTo(created);
+        assertThat(((EvalResult.Batch) r).firings()).hasSize(1);
+    }
+
+    @Test
+    void firstRun_clampsCursorToDeployBacklogCap_whenRuleCreatedLongAgo() {
+        // Rule created 7 days ago, cap default is 24h; expect timeFrom to be now - 24h, not rule.createdAt.
+        Instant now = Instant.parse("2026-04-22T12:00:00Z");
+        Instant createdLongAgo = now.minus(Duration.ofDays(7));
+        Instant expectedClampFloor = now.minusSeconds(86_400); // 24h
+
+        ArgumentCaptor<SearchRequest> cap = ArgumentCaptor.forClass(SearchRequest.class);
+        when(searchIndex.search(cap.capture())).thenReturn(new SearchResult<>(List.of(), 0L, 0, 50));
+
+        ExchangeMatchCondition condition = perExchangeCondition();
+        AlertRule rule = ruleWith(condition, Map.of(), createdLongAgo);
+        eval.evaluate(condition, rule, new EvalContext("default", now, new TickCache()));
+
+        SearchRequest req = cap.getValue();
+        assertThat(req.timeFrom()).isEqualTo(expectedClampFloor);
+    }
+
+    @Test
+    void firstRun_usesCreatedAt_whenWithinDeployBacklogCap() {
+        Instant now = Instant.parse("2026-04-22T12:00:00Z");
+        Instant createdRecent = now.minus(Duration.ofHours(1)); // 1h < 24h cap
+
+        ArgumentCaptor<SearchRequest> cap = ArgumentCaptor.forClass(SearchRequest.class);
+        when(searchIndex.search(cap.capture())).thenReturn(new SearchResult<>(List.of(), 0L, 0, 50));
+
+        ExchangeMatchCondition condition = perExchangeCondition();
+        AlertRule rule = ruleWith(condition, Map.of(), createdRecent);
+        eval.evaluate(condition, rule, new EvalContext("default", now, new TickCache()));
+
+        assertThat(cap.getValue().timeFrom()).isEqualTo(createdRecent);
    }

    @Test
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluatorTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluatorTest.java
@@ -35,7 +35,7 @@ class LogPatternEvaluatorTest {
        envRepo  = mock(EnvironmentRepository.class);
        eval = new LogPatternEvaluator(logStore, envRepo);

-        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, null);
+        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, "slate", null);
        when(envRepo.findById(ENV_ID)).thenReturn(Optional.of(env));
    }

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/RouteMetricEvaluatorTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/RouteMetricEvaluatorTest.java
@@ -36,7 +36,7 @@ class RouteMetricEvaluatorTest {
        envRepo    = mock(EnvironmentRepository.class);
        eval = new RouteMetricEvaluator(statsStore, envRepo);

-        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, null);
+        var env = new Environment(ENV_ID, "prod", "Production", false, true, null, null, "slate", null);
        when(envRepo.findById(ENV_ID)).thenReturn(Optional.of(env));
    }

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
@@ -28,7 +28,7 @@ class NotificationContextBuilderTest {
    // ---- helpers ----

    private Environment env() {
-        return new Environment(ENV_ID, "prod", "Production", true, true, Map.of(), 5, Instant.EPOCH);
+        return new Environment(ENV_ID, "prod", "Production", true, true, Map.of(), 5, "slate", Instant.EPOCH);
    }

    private AlertRule rule(ConditionKind kind) {
@@ -39,7 +39,7 @@ class NotificationContextBuilderTest {
            case EXCHANGE_MATCH   -> new ExchangeMatchCondition(
                                         new AlertScope("my-app", "route-1", null),
                                         new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
-                                         FireMode.PER_EXCHANGE, null, null, 30);
+                                         FireMode.PER_EXCHANGE, null, null);
            case AGENT_STATE      -> new AgentStateCondition(
                                         new AlertScope(null, null, null),
                                         "DEAD", 0);
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/WebhookDispatcherIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/WebhookDispatcherIT.java
@@ -50,7 +50,7 @@ class WebhookDispatcherIT {
                new ApacheOutboundHttpClientFactory(props, new SslContextBuilder()),
                cipher,
                new MustacheRenderer(),
-                new AlertingProperties(null, null, null, null, null, null, null, null, null, null, null, null, null),
+                new AlertingProperties(null, null, null, null, null, null, null, null, null, null, null, null, null, null),
                new ObjectMapper()
        );
    }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/retention/AlertingRetentionJobIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/retention/AlertingRetentionJobIT.java
@@ -167,7 +167,7 @@ class AlertingRetentionJobIT extends AbstractPostgresIT {
            // effectiveEventRetentionDays = 90, effectiveNotificationRetentionDays = 30
            new com.cameleer.server.app.alerting.config.AlertingProperties(
                null, null, null, null, null, null, null, null, null,
-                90, 30, null, null),
+                90, 30, null, null, null),
            instanceRepo,
            notificationRepo,
            fixedClock);
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/SchemaBootstrapIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/SchemaBootstrapIT.java
@@ -52,10 +52,14 @@ class SchemaBootstrapIT extends AbstractPostgresIT {

    @Test
    void alerting_enums_exist() {
+        // Scope to the current schema's namespace: with Testcontainers reuse, enums from a
+        // previous run's tenant_default schema can otherwise appear alongside public.
        var enums = jdbcTemplate.queryForList("""
-            SELECT typname FROM pg_type
-             WHERE typname IN ('severity_enum','condition_kind_enum','alert_state_enum',
-                               'target_kind_enum','notification_status_enum')
+            SELECT t.typname FROM pg_type t
+              JOIN pg_namespace n ON n.oid = t.typnamespace
+             WHERE t.typname IN ('severity_enum','condition_kind_enum','alert_state_enum',
+                                 'target_kind_enum','notification_status_enum')
+               AND n.nspname = current_schema()
            """, String.class);
        assertThat(enums).containsExactlyInAnyOrder(
            "severity_enum", "condition_kind_enum", "alert_state_enum",
@@ -86,6 +90,7 @@ class SchemaBootstrapIT extends AbstractPostgresIT {
            SELECT column_name FROM information_schema.columns
             WHERE table_name = 'alert_instances'
               AND column_name IN ('read_at','deleted_at')
+               AND table_schema = current_schema()
            """, String.class);
        assertThat(cols).containsExactlyInAnyOrder("read_at", "deleted_at");
    }
@@ -96,13 +101,16 @@ class SchemaBootstrapIT extends AbstractPostgresIT {
            SELECT COUNT(*)::int FROM pg_indexes
             WHERE indexname = 'alert_instances_open_rule_uq'
               AND tablename = 'alert_instances'
+               AND schemaname = current_schema()
            """, Integer.class);
        assertThat(count).isEqualTo(1);

        Boolean isUnique = jdbcTemplate.queryForObject("""
            SELECT indisunique FROM pg_index
-              JOIN pg_class ON pg_class.oid = pg_index.indexrelid
-             WHERE pg_class.relname = 'alert_instances_open_rule_uq'
+              JOIN pg_class c ON c.oid = pg_index.indexrelid
+              JOIN pg_namespace n ON n.oid = c.relnamespace
+             WHERE c.relname = 'alert_instances_open_rule_uq'
+               AND n.nspname = current_schema()
            """, Boolean.class);
        assertThat(isUnique).isTrue();
    }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/AppDirtyStateIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/AppDirtyStateIT.java
@@ -0,0 +1,239 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.dto.DirtyStateResponse;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.core.runtime.ContainerStatus;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentStatus;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.MediaType;
+import org.springframework.test.annotation.DirtiesContext;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.awaitility.Awaitility.await;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.when;
+
+/**
+ * Integration tests for GET /api/v1/environments/{envSlug}/apps/{appSlug}/dirty-state.
+ *
+ * <p>Uses {@code @MockBean RuntimeOrchestrator} (same pattern as DeploymentSnapshotIT).
+ * {@code @DirtiesContext} prevents context-cache conflicts when both IT classes are loaded together.</p>
+ */
+@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_CLASS)
+class AppDirtyStateIT extends AbstractPostgresIT {
+
+    @MockBean
+    RuntimeOrchestrator runtimeOrchestrator;
+
+    @Autowired
+    private TestRestTemplate restTemplate;
+
+    @Autowired
+    private ObjectMapper objectMapper;
+
+    @Autowired
+    private TestSecurityHelper securityHelper;
+
+    @Autowired
+    private PostgresDeploymentRepository deploymentRepository;
+
+    private String operatorJwt;
+
+    @BeforeEach
+    void setUp() {
+        operatorJwt = securityHelper.operatorToken();
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
+
+        // Ensure test-operator exists in the users table (required by the deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 1: no deployment ever → dirty=true, lastSuccessfulDeploymentId=null
+    // -----------------------------------------------------------------------
+
+    @Test
+    void dirtyState_noDeployEver_returnsDirtyTrue() throws Exception {
+        String appSlug = "ds-nodeploy-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps",
+                String.format("{\"slug\": \"%s\", \"displayName\": \"DS No Deploy\"}", appSlug),
+                operatorJwt);
+        uploadJar(appSlug, ("fake-jar-" + appSlug).getBytes());
+        put("/api/v1/environments/default/apps/" + appSlug + "/config",
+                "{\"samplingRate\": 0.5}", operatorJwt);
+
+        DirtyStateResponse body = getDirtyState("default", appSlug);
+
+        assertThat(body.dirty()).isTrue();
+        assertThat(body.lastSuccessfulDeploymentId()).isNull();
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 2: after a successful deploy with matching desired state → dirty=false
+    // -----------------------------------------------------------------------
+
+    @Test
+    void dirtyState_afterSuccessfulDeploy_matchingDesiredState_returnsDirtyFalse() throws Exception {
+        String fakeContainerId = "fake-cid-" + UUID.randomUUID();
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+        when(runtimeOrchestrator.startContainer(any())).thenReturn(fakeContainerId);
+        when(runtimeOrchestrator.getContainerStatus(fakeContainerId))
+                .thenReturn(new ContainerStatus("healthy", true, 0, null));
+
+        String appSlug = "ds-clean-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps",
+                String.format("{\"slug\": \"%s\", \"displayName\": \"DS Clean\"}", appSlug),
+                operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config",
+                "{\"runtimeType\": \"spring-boot\", \"appPort\": 8081}", operatorJwt);
+        String versionId = uploadJar(appSlug, ("fake-jar-clean-" + appSlug).getBytes());
+        put("/api/v1/environments/default/apps/" + appSlug + "/config",
+                "{\"samplingRate\": 0.25}", operatorJwt);
+
+        // Deploy and wait for RUNNING
+        JsonNode deploy = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId),
+                operatorJwt);
+        String deploymentId = deploy.path("id").asText();
+
+        await().atMost(30, TimeUnit.SECONDS).pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deploymentId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found"));
+                    assertThat(d.status()).isEqualTo(DeploymentStatus.RUNNING);
+                });
+
+        // Desired state matches what was deployed → dirty=false
+        DirtyStateResponse body = getDirtyState("default", appSlug);
+
+        assertThat(body.dirty()).isFalse();
+        assertThat(body.differences()).isEmpty();
+        assertThat(body.lastSuccessfulDeploymentId()).isEqualTo(deploymentId);
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 3: after successful deploy, config changed → dirty=true
+    // -----------------------------------------------------------------------
+
+    @Test
+    void dirtyState_afterSuccessfulDeploy_configChanged_returnsDirtyTrue() throws Exception {
+        String fakeContainerId = "fake-cid2-" + UUID.randomUUID();
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+        when(runtimeOrchestrator.startContainer(any())).thenReturn(fakeContainerId);
+        when(runtimeOrchestrator.getContainerStatus(fakeContainerId))
+                .thenReturn(new ContainerStatus("healthy", true, 0, null));
+
+        String appSlug = "ds-dirty-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps",
+                String.format("{\"slug\": \"%s\", \"displayName\": \"DS Dirty\"}", appSlug),
+                operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config",
+                "{\"runtimeType\": \"spring-boot\", \"appPort\": 8081}", operatorJwt);
+        String versionId = uploadJar(appSlug, ("fake-jar-dirty-" + appSlug).getBytes());
+        put("/api/v1/environments/default/apps/" + appSlug + "/config",
+                "{\"samplingRate\": 0.1}", operatorJwt);
+
+        // Deploy and wait for RUNNING
+        JsonNode deploy = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId),
+                operatorJwt);
+        String deploymentId = deploy.path("id").asText();
+
+        await().atMost(30, TimeUnit.SECONDS).pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deploymentId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found"));
+                    assertThat(d.status()).isEqualTo(DeploymentStatus.RUNNING);
+                });
+
+        // Change samplingRate after deploy
+        put("/api/v1/environments/default/apps/" + appSlug + "/config",
+                "{\"samplingRate\": 0.9}", operatorJwt);
+
+        // Now desired state differs from snapshot → dirty=true
+        DirtyStateResponse body = getDirtyState("default", appSlug);
+
+        assertThat(body.dirty()).isTrue();
+        assertThat(body.lastSuccessfulDeploymentId()).isEqualTo(deploymentId);
+        assertThat(body.differences()).isNotEmpty();
+        assertThat(body.differences())
+                .anyMatch(d -> d.field().contains("samplingRate"));
+    }
+
+    // -----------------------------------------------------------------------
+    // Helpers
+    // -----------------------------------------------------------------------
+
+    private DirtyStateResponse getDirtyState(String envSlug, String appSlug) {
+        HttpHeaders headers = securityHelper.authHeaders(operatorJwt);
+        var response = restTemplate.exchange(
+                "/api/v1/environments/" + envSlug + "/apps/" + appSlug + "/dirty-state",
+                HttpMethod.GET,
+                new HttpEntity<>(headers),
+                DirtyStateResponse.class);
+        assertThat(response.getStatusCode().value()).isEqualTo(200);
+        return response.getBody();
+    }
+
+    private JsonNode post(String path, String json, String jwt) throws Exception {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        var response = restTemplate.exchange(
+                path, HttpMethod.POST,
+                new HttpEntity<>(json, headers),
+                String.class);
+        return objectMapper.readTree(response.getBody());
+    }
+
+    private void put(String path, String json, String jwt) {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        restTemplate.exchange(path, HttpMethod.PUT, new HttpEntity<>(json, headers), String.class);
+    }
+
+    private String uploadJar(String appSlug, byte[] content) throws Exception {
+        ByteArrayResource resource = new ByteArrayResource(content) {
+            @Override
+            public String getFilename() { return "app.jar"; }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + operatorJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+
+        var response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions",
+                HttpMethod.POST,
+                new HttpEntity<>(body, headers),
+                String.class);
+
+        JsonNode versionNode = objectMapper.readTree(response.getBody());
+        return versionNode.path("id").asText();
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/ApplicationConfigControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/ApplicationConfigControllerIT.java
@@ -0,0 +1,200 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
+import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.agent.CommandType;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.SpyBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+import org.springframework.test.annotation.DirtiesContext;
+import org.springframework.test.annotation.DirtiesContext.ClassMode;
+
+import java.util.List;
+import java.util.UUID;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+
+@DirtiesContext(classMode = ClassMode.AFTER_CLASS)
+class ApplicationConfigControllerIT extends AbstractPostgresIT {
+
+    /**
+     * Spy on the real AgentRegistryService bean so we can verify whether
+     * addGroupCommandWithReplies was invoked (live) or skipped (staged).
+     */
+    @SpyBean
+    AgentRegistryService registryService;
+
+    @Autowired private TestRestTemplate restTemplate;
+    @Autowired private TestSecurityHelper securityHelper;
+    @Autowired private PostgresApplicationConfigRepository configRepository;
+
+    private String operatorJwt;
+    /** Unique env slug per test to avoid cross-test pollution. */
+    private String envSlug;
+    private UUID envId;
+    /** Unique app slug per test run to avoid cross-test row collisions. */
+    private String appSlug;
+
+    @BeforeEach
+    void setUp() {
+        operatorJwt = securityHelper.operatorToken();
+        envSlug  = "cfg-it-" + UUID.randomUUID().toString().substring(0, 8);
+        envId    = UUID.randomUUID();
+        appSlug  = "paygw-" + UUID.randomUUID().toString().substring(0, 8);
+
+        jdbcTemplate.update(
+                "INSERT INTO environments (id, slug, display_name) VALUES (?, ?, ?) ON CONFLICT (id) DO NOTHING",
+                envId, envSlug, envSlug);
+    }
+
+    @AfterEach
+    void cleanUp() {
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = ?", envSlug);
+        jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
+    }
+
+    // ── helpers ──────────────────────────────────────────────────────────────
+
+    private void registerLiveAgent(String agentId) {
+        // Use the bootstrap HTTP endpoint — same pattern as AgentCommandControllerIT.
+        String body = """
+                {
+                  "instanceId": "%s",
+                  "applicationId": "%s",
+                  "environmentId": "%s",
+                  "version": "1.0.0",
+                  "routeIds": ["route-1"],
+                  "capabilities": {}
+                }
+                """.formatted(agentId, appSlug, envSlug);
+        restTemplate.postForEntity(
+                "/api/v1/agents/register",
+                new HttpEntity<>(body, securityHelper.bootstrapHeaders()),
+                String.class);
+    }
+
+    private ResponseEntity<String> putConfig(String apply) {
+        String url = "/api/v1/environments/" + envSlug + "/apps/" + appSlug + "/config"
+                + (apply != null ? "?apply=" + apply : "");
+        String body = """
+                {"samplingRate": 0.1, "metricsEnabled": true}
+                """;
+        return restTemplate.exchange(url, HttpMethod.PUT,
+                new HttpEntity<>(body, securityHelper.authHeaders(operatorJwt)), String.class);
+    }
+
+    // ── tests ─────────────────────────────────────────────────────────────────
+
+    @Test
+    void putConfig_staged_savesButDoesNotPush() {
+        // Given — one LIVE agent registered for (appSlug, envSlug)
+        String agentId = "staged-agent-" + UUID.randomUUID().toString().substring(0, 8);
+        registerLiveAgent(agentId);
+
+        // When — PUT with apply=staged
+        ResponseEntity<String> response = putConfig("staged");
+
+        // Then — HTTP 200
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        // And — DB has the new config
+        ApplicationConfig saved = configRepository
+                .findByApplicationAndEnvironment(appSlug, envSlug)
+                .orElseThrow(() -> new AssertionError("Config not found in DB"));
+        assertThat(saved.getSamplingRate()).isEqualTo(0.1);
+
+        // And — NO CONFIG_UPDATE was pushed to any agent
+        verify(registryService, never())
+                .addGroupCommandWithReplies(eq(appSlug), eq(envSlug), eq(CommandType.CONFIG_UPDATE), any());
+    }
+
+    @Test
+    void putConfig_live_savesAndPushes() {
+        // Given — one LIVE agent registered for (appSlug, envSlug)
+        String agentId = "live-agent-" + UUID.randomUUID().toString().substring(0, 8);
+        registerLiveAgent(agentId);
+
+        // When — PUT without apply param (default is live)
+        ResponseEntity<String> response = putConfig(null);
+
+        // Then — HTTP 200
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        // And — DB has the new config
+        ApplicationConfig saved = configRepository
+                .findByApplicationAndEnvironment(appSlug, envSlug)
+                .orElseThrow(() -> new AssertionError("Config not found in DB"));
+        assertThat(saved.getSamplingRate()).isEqualTo(0.1);
+
+        // And — CONFIG_UPDATE was pushed (addGroupCommandWithReplies called once)
+        verify(registryService)
+                .addGroupCommandWithReplies(eq(appSlug), eq(envSlug), eq(CommandType.CONFIG_UPDATE), any());
+    }
+
+    @Test
+    void putConfig_liveExplicit_savesAndPushes() {
+        // Same as above but with explicit apply=live
+        String agentId = "live-explicit-" + UUID.randomUUID().toString().substring(0, 8);
+        registerLiveAgent(agentId);
+
+        ResponseEntity<String> response = putConfig("live");
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+        verify(registryService)
+                .addGroupCommandWithReplies(eq(appSlug), eq(envSlug), eq(CommandType.CONFIG_UPDATE), any());
+    }
+
+    @Test
+    void putConfig_unknownApplyValue_returns400() {
+        ResponseEntity<String> response = putConfig("BOGUS");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+
+        int auditCount = jdbcTemplate.queryForObject(
+                "SELECT COUNT(*) FROM audit_log WHERE target = ?", Integer.class, appSlug);
+        assertThat(auditCount).isZero();
+    }
+
+    @Test
+    void putConfig_staged_auditActionIsStagedAppConfig() {
+        registerLiveAgent("audit-agent-" + UUID.randomUUID().toString().substring(0, 8));
+
+        ResponseEntity<String> response = putConfig("staged");
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        List<String> actions = jdbcTemplate.queryForList(
+                "SELECT action FROM audit_log WHERE target = ? ORDER BY timestamp DESC",
+                String.class, appSlug);
+        assertThat(actions).hasSize(1);
+        assertThat(actions.get(0)).isEqualTo("stage_app_config");
+    }
+
+    @Test
+    void putConfig_live_auditActionIsUpdateAppConfig() {
+        registerLiveAgent("audit-agent-live-" + UUID.randomUUID().toString().substring(0, 8));
+
+        ResponseEntity<String> response = putConfig(null);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        List<String> actions = jdbcTemplate.queryForList(
+                "SELECT action FROM audit_log WHERE target = ? ORDER BY timestamp DESC",
+                String.class, appSlug);
+        assertThat(actions).hasSize(1);
+        assertThat(actions.get(0)).isEqualTo("update_app_config");
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerAuditIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerAuditIT.java
@@ -0,0 +1,253 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.MediaType;
+import org.springframework.http.ResponseEntity;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class DeploymentControllerAuditIT extends AbstractPostgresIT {
+
+    @Autowired
+    private TestRestTemplate restTemplate;
+
+    @Autowired
+    private ObjectMapper objectMapper;
+
+    @Autowired
+    private TestSecurityHelper securityHelper;
+
+    private String aliceJwt;
+    private String adminJwt;
+    private String appSlug;
+    private String versionId;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        // Mint JWT for alice (OPERATOR) — subject must start with "user:" for JwtAuthenticationFilter
+        aliceJwt = securityHelper.createToken("user:alice", "user", List.of("OPERATOR"));
+        adminJwt = securityHelper.adminToken();
+
+        // Clean up deployment-related tables and test-created environments
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM environments WHERE slug LIKE 'promote-target-%'");
+        jdbcTemplate.update("DELETE FROM audit_log");
+
+        // Ensure alice exists in the users table (required for deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('alice', 'local', 'Alice Test') ON CONFLICT (user_id) DO NOTHING");
+
+        // Create app in the seeded "default" environment
+        appSlug = "audit-test-" + UUID.randomUUID().toString().substring(0, 8);
+        String appJson = String.format("""
+                {"slug": "%s", "displayName": "Audit Test App"}
+                """, appSlug);
+        ResponseEntity<String> appResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps", HttpMethod.POST,
+                new HttpEntity<>(appJson, authHeaders(aliceJwt)),
+                String.class);
+        assertThat(appResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
+
+        // Upload a JAR version
+        byte[] jarContent = "fake-jar-for-audit-test".getBytes();
+        ByteArrayResource resource = new ByteArrayResource(jarContent) {
+            @Override
+            public String getFilename() {
+                return "audit-test.jar";
+            }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + aliceJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+        ResponseEntity<String> versionResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions", HttpMethod.POST,
+                new HttpEntity<>(body, headers),
+                String.class);
+        assertThat(versionResponse.getStatusCode().is2xxSuccessful()).isTrue();
+        versionId = objectMapper.readTree(versionResponse.getBody()).path("id").asText();
+    }
+
+    @Test
+    void deploy_writes_audit_row_with_DEPLOYMENT_category_and_alice_actor() throws Exception {
+        String json = String.format("""
+                {"appVersionId": "%s"}
+                """, versionId);
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
+                new HttpEntity<>(json, authHeaders(aliceJwt)),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
+
+        Map<String, Object> row = queryAuditRow("deploy_app");
+        assertThat(row).isNotNull();
+        assertThat(row.get("username")).isEqualTo("alice");
+        assertThat(row.get("action")).isEqualTo("deploy_app");
+        assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
+        assertThat(row.get("result")).isEqualTo("SUCCESS");
+        assertThat(row.get("target")).isNotNull();
+        assertThat(row.get("target").toString()).isNotBlank();
+    }
+
+    @Test
+    void stop_writes_audit_row() throws Exception {
+        // First deploy
+        String deployJson = String.format("""
+                {"appVersionId": "%s"}
+                """, versionId);
+        ResponseEntity<String> deployResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
+                new HttpEntity<>(deployJson, authHeaders(aliceJwt)),
+                String.class);
+        assertThat(deployResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
+        String deploymentId = objectMapper.readTree(deployResponse.getBody()).path("id").asText();
+
+        // Clear audit log to isolate stop audit row
+        jdbcTemplate.update("DELETE FROM audit_log");
+
+        // Stop the deployment
+        ResponseEntity<String> stopResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments/" + deploymentId + "/stop",
+                HttpMethod.POST,
+                new HttpEntity<>(authHeadersNoBody(aliceJwt)),
+                String.class);
+        assertThat(stopResponse.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        Map<String, Object> row = queryAuditRow("stop_deployment");
+        assertThat(row).isNotNull();
+        assertThat(row.get("username")).isEqualTo("alice");
+        assertThat(row.get("action")).isEqualTo("stop_deployment");
+        assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
+        assertThat(row.get("result")).isEqualTo("SUCCESS");
+        assertThat(row.get("target").toString()).isEqualTo(deploymentId);
+    }
+
+    @Test
+    void promote_writes_audit_row() throws Exception {
+        // Create a second environment for promotion target
+        String targetEnvSlug = "promote-target-" + UUID.randomUUID().toString().substring(0, 8);
+        String envJson = String.format("""
+                {"slug": "%s", "displayName": "Promote Target Env"}
+                """, targetEnvSlug);
+        ResponseEntity<String> envResponse = restTemplate.exchange(
+                "/api/v1/admin/environments", HttpMethod.POST,
+                new HttpEntity<>(envJson, authHeaders(adminJwt)),
+                String.class);
+        assertThat(envResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
+
+        // Create the same app slug in the target environment
+        String appJson = String.format("""
+                {"slug": "%s", "displayName": "Audit Test App (target)"}
+                """, appSlug);
+        ResponseEntity<String> targetAppResponse = restTemplate.exchange(
+                "/api/v1/environments/" + targetEnvSlug + "/apps", HttpMethod.POST,
+                new HttpEntity<>(appJson, authHeaders(aliceJwt)),
+                String.class);
+        assertThat(targetAppResponse.getStatusCode()).isEqualTo(HttpStatus.CREATED);
+
+        // Deploy in source (default) env
+        String deployJson = String.format("""
+                {"appVersionId": "%s"}
+                """, versionId);
+        ResponseEntity<String> deployResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
+                new HttpEntity<>(deployJson, authHeaders(aliceJwt)),
+                String.class);
+        assertThat(deployResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
+        String deploymentId = objectMapper.readTree(deployResponse.getBody()).path("id").asText();
+
+        // Clear audit log to isolate promote audit row
+        jdbcTemplate.update("DELETE FROM audit_log");
+
+        // Promote to target env
+        String promoteJson = String.format("""
+                {"targetEnvironment": "%s"}
+                """, targetEnvSlug);
+        ResponseEntity<String> promoteResponse = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments/" + deploymentId + "/promote",
+                HttpMethod.POST,
+                new HttpEntity<>(promoteJson, authHeaders(aliceJwt)),
+                String.class);
+        assertThat(promoteResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
+
+        Map<String, Object> row = queryAuditRow("promote_deployment");
+        assertThat(row).isNotNull();
+        assertThat(row.get("username")).isEqualTo("alice");
+        assertThat(row.get("action")).isEqualTo("promote_deployment");
+        assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
+        assertThat(row.get("result")).isEqualTo("SUCCESS");
+        assertThat(row.get("target")).isNotNull();
+        assertThat(row.get("target").toString()).isNotBlank();
+    }
+
+    @Test
+    void deploy_with_unknown_appVersion_writes_FAILURE_audit_row() throws Exception {
+        String unknownVersionId = UUID.randomUUID().toString();
+        String json = String.format("""
+                {"appVersionId": "%s"}
+                """, unknownVersionId);
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments", HttpMethod.POST,
+                new HttpEntity<>(json, authHeaders(aliceJwt)),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
+
+        Map<String, Object> row = queryAuditRow("deploy_app");
+        assertThat(row).isNotNull();
+        assertThat(row.get("username")).isEqualTo("alice");
+        assertThat(row.get("action")).isEqualTo("deploy_app");
+        assertThat(row.get("category")).isEqualTo("DEPLOYMENT");
+        assertThat(row.get("result")).isEqualTo("FAILURE");
+    }
+
+    // ---- helpers ----
+
+    private HttpHeaders authHeaders(String jwt) {
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + jwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.APPLICATION_JSON);
+        return headers;
+    }
+
+    private HttpHeaders authHeadersNoBody(String jwt) {
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + jwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        return headers;
+    }
+
+    /** Query the most recent audit_log row for the given action. Returns null if not found. */
+    private Map<String, Object> queryAuditRow(String action) {
+        List<Map<String, Object>> rows = jdbcTemplate.queryForList(
+                "SELECT username, action, category, target, result FROM audit_log WHERE action = ? ORDER BY timestamp DESC LIMIT 1",
+                action);
+        return rows.isEmpty() ? null : rows.get(0);
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DeploymentControllerIT.java
@@ -48,6 +48,10 @@ class DeploymentControllerIT extends AbstractPostgresIT {
        jdbcTemplate.update("DELETE FROM app_versions");
        jdbcTemplate.update("DELETE FROM apps");

+        // Ensure test-operator exists in users table (required for deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
+
        // Get default environment ID
        ResponseEntity<String> envResponse = restTemplate.exchange(
                "/api/v1/admin/environments", HttpMethod.GET,
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DiagramRenderControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/DiagramRenderControllerIT.java
@@ -166,6 +166,157 @@ class DiagramRenderControllerIT extends AbstractPostgresIT {
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
    }

+    @Test
+    void findByAppAndRoute_returnsLatestDiagram_noLiveAgentPrereq() {
+        // The env-scoped /routes/{routeId}/diagram endpoint no longer depends
+        // on the agent registry — routes whose publishing agents have been
+        // removed must still resolve. The seed step stored a diagram for
+        // route "render-test-route" under app "test-group" / env "default",
+        // so the lookup must succeed without the registry-driven "find
+        // agents for app" step, which used to be a hard 404 prerequisite.
+        HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);
+        headers.set("Accept", "application/json");
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
+                HttpMethod.GET,
+                new HttpEntity<>(headers),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+        assertThat(response.getBody()).contains("nodes");
+        assertThat(response.getBody()).contains("edges");
+    }
+
+    @Test
+    void findByAppAndRoute_returns404ForUnknownRoute() {
+        HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);
+        headers.set("Accept", "application/json");
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/test-group/routes/nonexistent-route/diagram",
+                HttpMethod.GET,
+                new HttpEntity<>(headers),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.NOT_FOUND);
+    }
+
+    @Test
+    void exchangeDiagramHash_pinsPointInTimeEvenAfterNewerVersion() throws Exception {
+        // Point-in-time guarantee: an execution's stored diagramContentHash
+        // must keep resolving to the route shape captured at execution time,
+        // even after a newer diagram version for the same route is stored.
+        // Content-hash addressing + never-delete of route_diagrams makes this
+        // automatic — this test locks the invariant in.
+        HttpHeaders viewerHeaders = securityHelper.authHeadersNoBody(viewerJwt);
+        viewerHeaders.set("Accept", "application/json");
+
+        // Snapshot the pinned v1 render via the flat content-hash endpoint
+        // BEFORE a newer version is stored, so the post-v2 fetch can compare
+        // byte-for-byte.
+        ResponseEntity<String> pinnedBefore = restTemplate.exchange(
+                "/api/v1/diagrams/{hash}/render",
+                HttpMethod.GET,
+                new HttpEntity<>(viewerHeaders),
+                String.class,
+                contentHash);
+        assertThat(pinnedBefore.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        // Also snapshot the by-route "latest" render for the same route.
+        ResponseEntity<String> latestBefore = restTemplate.exchange(
+                "/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
+                HttpMethod.GET,
+                new HttpEntity<>(viewerHeaders),
+                String.class);
+        assertThat(latestBefore.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        // Store a materially different v2 for the same (app, env, route).
+        // The renderer walks the `root` tree (not the legacy flat `nodes`
+        // list that the seed payload uses), so v2 uses the tree shape and
+        // will render non-empty output — letting us detect the version flip.
+        String newerDiagramJson = """
+                {
+                  "routeId": "render-test-route",
+                  "description": "v2 with extra step",
+                  "version": 2,
+                  "root": {
+                    "id": "n1",
+                    "type": "ENDPOINT",
+                    "label": "timer:tick-v2",
+                    "children": [
+                      {
+                        "id": "n2",
+                        "type": "BEAN",
+                        "label": "myBeanV2",
+                        "children": [
+                          {
+                            "id": "n3",
+                            "type": "TO",
+                            "label": "log:out-v2",
+                            "children": [
+                              {"id": "n4", "type": "TO", "label": "log:audit"}
+                            ]
+                          }
+                        ]
+                      }
+                    ]
+                  },
+                  "edges": [
+                    {"source": "n1", "target": "n2", "edgeType": "FLOW"},
+                    {"source": "n2", "target": "n3", "edgeType": "FLOW"},
+                    {"source": "n3", "target": "n4", "edgeType": "FLOW"}
+                  ]
+                }
+                """;
+        restTemplate.postForEntity(
+                "/api/v1/data/diagrams",
+                new HttpEntity<>(newerDiagramJson, securityHelper.authHeaders(jwt)),
+                String.class);
+
+        // Invariant 1: The execution's stored diagramContentHash must not
+        // drift — exchanges stay pinned to the version captured at ingest.
+        ResponseEntity<String> searchAfter = restTemplate.exchange(
+                "/api/v1/environments/default/executions?correlationId=render-probe-corr",
+                HttpMethod.GET,
+                new HttpEntity<>(viewerHeaders),
+                String.class);
+        JsonNode search = objectMapper.readTree(searchAfter.getBody());
+        String execId = search.get("data").get(0).get("executionId").asText();
+        ResponseEntity<String> exec = restTemplate.exchange(
+                "/api/v1/executions/" + execId,
+                HttpMethod.GET,
+                new HttpEntity<>(viewerHeaders),
+                String.class);
+        JsonNode execBody = objectMapper.readTree(exec.getBody());
+        assertThat(execBody.path("diagramContentHash").asText()).isEqualTo(contentHash);
+
+        // Invariant 2: The pinned render (by contentHash) must be byte-identical
+        // before and after v2 is stored — content-hash addressing is stable.
+        ResponseEntity<String> pinnedAfter = restTemplate.exchange(
+                "/api/v1/diagrams/{hash}/render",
+                HttpMethod.GET,
+                new HttpEntity<>(viewerHeaders),
+                String.class,
+                contentHash);
+        assertThat(pinnedAfter.getStatusCode()).isEqualTo(HttpStatus.OK);
+        assertThat(pinnedAfter.getBody()).isEqualTo(pinnedBefore.getBody());
+
+        // Invariant 3: The by-route "latest" endpoint must now surface v2,
+        // so its body differs from the pre-v2 snapshot. Poll for up to 20s
+        // because the diagram-ingest flush completes asynchronously.
+        await().atMost(20, SECONDS).untilAsserted(() -> {
+            ResponseEntity<String> latestAfter = restTemplate.exchange(
+                    "/api/v1/environments/default/apps/test-group/routes/render-test-route/diagram",
+                    HttpMethod.GET,
+                    new HttpEntity<>(viewerHeaders),
+                    String.class);
+            assertThat(latestAfter.getStatusCode()).isEqualTo(HttpStatus.OK);
+            assertThat(latestAfter.getBody()).isNotEqualTo(latestBefore.getBody());
+            assertThat(latestAfter.getBody()).contains("myBeanV2");
+        });
+    }
+
    @Test
    void getWithNoAcceptHeader_defaultsToSvg() {
        HttpHeaders headers = securityHelper.authHeadersNoBody(viewerJwt);
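The three invariants in the render test follow from content-hash addressing. An execution records the hash of the diagram it ran against, a lookup by hash can only ever return the bytes that produced that hash, and only the per-route "latest" pointer moves when a new version is stored. The sketch below is a minimal in-memory model of that idea. It assumes SHA-256 over the raw JSON, and `ContentAddressedDiagrams` is hypothetical. The server's real hash function, canonicalisation, and storage are not shown in this diff.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** Hypothetical in-memory model of content-addressed diagram storage. */
final class ContentAddressedDiagrams {
    private final Map<String, String> diagramsByHash = new HashMap<>(); // hash -> diagram JSON
    private final Map<String, String> latestByRoute = new HashMap<>();  // routeId -> hash

    /** Assumption: SHA-256 over the UTF-8 bytes, hex-encoded (64 chars). */
    static String contentHash(String json) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(json.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    /** Stores a version, moves the route's "latest" pointer, and returns the hash. */
    String store(String routeId, String json) {
        String hash = contentHash(json);
        diagramsByHash.putIfAbsent(hash, json); // identical content shares one entry
        latestByRoute.put(routeId, hash);       // only this pointer changes on a new version
        return hash;
    }

    /** Pinned lookup: stable forever for a given hash. */
    String byHash(String hash) {
        return diagramsByHash.get(hash);
    }

    /** "Latest" lookup: follows the route pointer, so it changes when v2 arrives. */
    String latest(String routeId) {
        String hash = latestByRoute.get(routeId);
        return hash == null ? null : diagramsByHash.get(hash);
    }
}
```

Storing v2 never rewrites the v1 entry, which is why the pinned render can be compared byte-for-byte while the by-route endpoint is expected to change.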
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/EnvironmentAdminControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/EnvironmentAdminControllerIT.java
@@ -88,9 +88,80 @@ class EnvironmentAdminControllerIT extends AbstractPostgresIT {
        assertThat(body.path("displayName").asText()).isEqualTo("Staging");
        assertThat(body.path("production").asBoolean()).isFalse();
        assertThat(body.path("enabled").asBoolean()).isTrue();
+        assertThat(body.path("color").asText()).isEqualTo("slate");
        assertThat(body.has("id")).isTrue();
    }

+    @Test
+    void updateEnvironment_withValidColor_persists() throws Exception {
+        restTemplate.exchange(
+                "/api/v1/admin/environments", HttpMethod.POST,
+                new HttpEntity<>("""
+                        {"slug": "color-ok", "displayName": "Color OK", "production": false}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/admin/environments/color-ok", HttpMethod.PUT,
+                new HttpEntity<>("""
+                        {"displayName": "Color OK", "production": false, "enabled": true, "color": "amber"}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.path("color").asText()).isEqualTo("amber");
+    }
+
+    @Test
+    void updateEnvironment_withNullColor_preservesExisting() throws Exception {
+        restTemplate.exchange(
+                "/api/v1/admin/environments", HttpMethod.POST,
+                new HttpEntity<>("""
+                        {"slug": "color-preserve", "displayName": "Keep", "production": false}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+        // Set color to teal
+        restTemplate.exchange(
+                "/api/v1/admin/environments/color-preserve", HttpMethod.PUT,
+                new HttpEntity<>("""
+                        {"displayName": "Keep", "production": false, "enabled": true, "color": "teal"}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        // Update without color field → teal preserved
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/admin/environments/color-preserve", HttpMethod.PUT,
+                new HttpEntity<>("""
+                        {"displayName": "Still Keep", "production": false, "enabled": true}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.path("displayName").asText()).isEqualTo("Still Keep");
+        assertThat(body.path("color").asText()).isEqualTo("teal");
+    }
+
+    @Test
+    void updateEnvironment_withUnknownColor_returns400() throws Exception {
+        restTemplate.exchange(
+                "/api/v1/admin/environments", HttpMethod.POST,
+                new HttpEntity<>("""
+                        {"slug": "color-bad", "displayName": "Bad", "production": false}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        ResponseEntity<String> response = restTemplate.exchange(
+                "/api/v1/admin/environments/color-bad", HttpMethod.PUT,
+                new HttpEntity<>("""
+                        {"displayName": "Bad", "production": false, "enabled": true, "color": "neon"}
+                        """, securityHelper.authHeaders(adminJwt)),
+                String.class);
+
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+    }
+
    @Test
    void updateEnvironment_asAdmin_returns200() throws Exception {
        // Create an environment first
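The color tests pin down a partial-update rule for `PUT /api/v1/admin/environments/{slug}`. A request without `color` keeps the stored value, a known color replaces it, and an unknown one is rejected with 400. Newly created environments report `slate`. The sketch below models that rule. `EnvironmentColors` and its palette are hypothetical. Only `slate`, `amber` and `teal` are confirmed valid by these tests, and the real palette lives server-side.

```java
import java.util.Set;

/** Hypothetical model of the color semantics the EnvironmentAdminControllerIT tests exercise. */
final class EnvironmentColors {
    // Assumption: illustrative palette; the tests only prove these three are accepted.
    static final Set<String> PALETTE = Set.of("slate", "amber", "teal");
    static final String DEFAULT_COLOR = "slate";

    /**
     * Resolves the color to persist. A null (absent) color preserves the existing value,
     * a palette color replaces it, and anything else throws. A controller would map the
     * exception to 400 Bad Request.
     */
    static String resolve(String requested, String existing) {
        if (requested == null) {
            return existing != null ? existing : DEFAULT_COLOR;
        }
        if (!PALETTE.contains(requested)) {
            throw new IllegalArgumentException("unknown color: " + requested);
        }
        return requested;
    }
}
```

Treating null as "leave unchanged" rather than "reset to default" is what lets clients omit the field on unrelated updates, as `updateEnvironment_withNullColor_preservesExisting` checks.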
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/SearchControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/SearchControllerIT.java
@@ -166,6 +166,42 @@ class SearchControllerIT extends AbstractPostgresIT {
                    """, i, i, i, i, i));
        }

+        // Executions 11-12 carry structured attributes for the attribute-filter tests.
+        ingest("""
+                {
+                  "exchangeId": "ex-search-attr-1",
+                  "applicationId": "test-group",
+                  "instanceId": "test-agent-search-it",
+                  "routeId": "search-route-attr-1",
+                  "correlationId": "corr-attr-alpha",
+                  "status": "COMPLETED",
+                  "startTime": "2026-03-12T10:00:00Z",
+                  "endTime": "2026-03-12T10:00:00.050Z",
+                  "durationMs": 50,
+                  "attributes": {"order": "12345", "tenant": "acme"},
+                  "chunkSeq": 0,
+                  "final": true,
+                  "processors": []
+                }
+                """);
+        ingest("""
+                {
+                  "exchangeId": "ex-search-attr-2",
+                  "applicationId": "test-group",
+                  "instanceId": "test-agent-search-it",
+                  "routeId": "search-route-attr-2",
+                  "correlationId": "corr-attr-beta",
+                  "status": "COMPLETED",
+                  "startTime": "2026-03-12T10:01:00Z",
+                  "endTime": "2026-03-12T10:01:00.050Z",
+                  "durationMs": 50,
+                  "attributes": {"order": "99999"},
+                  "chunkSeq": 0,
+                  "final": true,
+                  "processors": []
+                }
+                """);
+
        // Wait for async ingestion + search indexing via REST (no raw SQL).
        // Probe the last seeded execution to avoid false positives from
        // other test classes that may have written into the shared CH tables.
@@ -174,6 +210,11 @@ class SearchControllerIT extends AbstractPostgresIT {
            JsonNode body = objectMapper.readTree(r.getBody());
            assertThat(body.get("total").asLong()).isGreaterThanOrEqualTo(1);
        });
+        await().atMost(30, SECONDS).untilAsserted(() -> {
+            ResponseEntity<String> r = searchGet("?correlationId=corr-attr-beta");
+            JsonNode body = objectMapper.readTree(r.getBody());
+            assertThat(body.get("total").asLong()).isGreaterThanOrEqualTo(1);
+        });
    }

    @Test
@@ -371,6 +412,69 @@ class SearchControllerIT extends AbstractPostgresIT {
        assertThat(body.get("limit").asInt()).isEqualTo(50);
    }

+    @Test
+    void attrParam_exactMatch_filtersToMatchingExecution() throws Exception {
+        ResponseEntity<String> response = searchGet("?attr=order:12345&correlationId=corr-attr-alpha");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.get("total").asLong()).isEqualTo(1);
+        assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
+    }
+
+    @Test
+    void attrParam_keyOnly_matchesAnyExecutionCarryingTheKey() throws Exception {
+        ResponseEntity<String> response = searchGet("?attr=tenant&correlationId=corr-attr-alpha");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.get("total").asLong()).isEqualTo(1);
+        assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
+    }
+
+    @Test
+    void attrParam_multipleValues_produceIntersection() throws Exception {
+        // order:99999 AND key-only tenant should yield zero: ex-search-attr-2 has order=99999 but no tenant.
+        ResponseEntity<String> response = searchGet("?attr=order:99999&attr=tenant");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.get("total").asLong()).isZero();
+    }
+
+    @Test
+    void attrParam_invalidKey_returns400() throws Exception {
+        ResponseEntity<String> response = searchGet("?attr=bad%20key:x");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+    }
+
+    @Test
+    void attributeFilters_inPostBody_filtersCorrectly() throws Exception {
+        ResponseEntity<String> response = searchPost("""
+                {
+                  "attributeFilters": [
+                    {"key": "order", "value": "12345"}
+                  ],
+                  "correlationId": "corr-attr-alpha"
+                }
+                """);
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.get("total").asLong()).isEqualTo(1);
+        assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
+    }
+
+    @Test
+    void attrParam_wildcardValue_matchesOnPrefix() throws Exception {
+        ResponseEntity<String> response = searchGet("?attr=order:1*&correlationId=corr-attr-alpha");
+        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = objectMapper.readTree(response.getBody());
+        assertThat(body.get("total").asLong()).isEqualTo(1);
+        assertThat(body.get("data").get(0).get("correlationId").asText()).isEqualTo("corr-attr-alpha");
+    }
+
    // --- Helper methods ---

    private void ingest(String json) {
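The attribute tests exercise a small filter grammar:

- `key` alone requires the key to be present.
- `key:value` is an exact match.
- A trailing `*` turns the value into a prefix match.
- Repeated `attr` parameters intersect.
- A malformed key is a 400.

The POST body's `attributeFilters` entries carry the same `key`/`value` pair. The sketch below models that grammar in memory. `AttrFilter` is hypothetical, and its key charset is an assumption. The server's real validation rule and its translation into a ClickHouse query are not shown in this diff.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical parser/matcher for the attr=key[:value] syntax the tests use. */
record AttrFilter(String key, String value) {
    // Assumption: a conservative key charset; "bad key" (with a space) must be rejected.
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9_.-]+");

    /** Parses "key" (value == null, key-only) or "key:value". */
    static AttrFilter parse(String raw) {
        int colon = raw.indexOf(':');
        String key = colon < 0 ? raw : raw.substring(0, colon);
        String value = colon < 0 ? null : raw.substring(colon + 1);
        if (!KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
        return new AttrFilter(key, value);
    }

    boolean matches(Map<String, String> attributes) {
        String actual = attributes.get(key);
        if (actual == null) return false;   // the key must be present in every mode
        if (value == null) return true;     // key-only: any value matches
        if (value.endsWith("*")) {          // prefix wildcard, e.g. "1*"
            return actual.startsWith(value.substring(0, value.length() - 1));
        }
        return actual.equals(value);        // exact match
    }

    /** Multiple filters combine with AND, i.e. the result is an intersection. */
    static boolean matchesAll(List<AttrFilter> filters, Map<String, String> attributes) {
        return filters.stream().allMatch(f -> f.matches(attributes));
    }
}
```

Applied to the two seeded executions, `order:99999` plus `tenant` matches neither one. Only ex-search-attr-2 has `order=99999`, and only ex-search-attr-1 has a `tenant` key. This is the zero-result case `attrParam_multipleValues_produceIntersection` expects.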
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/ServerMetricsAdminControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/controller/ServerMetricsAdminControllerIT.java
@@ -0,0 +1,314 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.util.Map;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class ServerMetricsAdminControllerIT extends AbstractPostgresIT {
+
+    @Autowired
+    private TestRestTemplate restTemplate;
+
+    @Autowired
+    private TestSecurityHelper securityHelper;
+
+    private final ObjectMapper mapper = new ObjectMapper();
+
+    private HttpHeaders adminJson;
+    private HttpHeaders adminGet;
+    private HttpHeaders viewerGet;
+
+    @BeforeEach
+    void seedAndAuth() {
+        adminJson = securityHelper.adminHeaders();
+        adminGet  = securityHelper.authHeadersNoBody(securityHelper.adminToken());
+        viewerGet = securityHelper.authHeadersNoBody(securityHelper.viewerToken());
+
+        // Start each test from an empty table. ClickHouse writes go through a separate
+        // JdbcTemplate bean ("clickHouseJdbcTemplate"), so fetch that bean by name and
+        // insert through the same template the metrics store uses.
+        org.springframework.jdbc.core.JdbcTemplate ch = clickhouseJdbc();
+        ch.execute("TRUNCATE TABLE server_metrics");
+
+        Instant t0 = Instant.parse("2026-04-23T10:00:00Z");
+        // Gauge: cameleer.agents.connected, two states, two buckets.
+        insert(ch, "default", t0, "srv-A", "cameleer.agents.connected", "gauge", "value", 3.0,
+                Map.of("state", "live"));
+        insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.agents.connected", "gauge", "value", 4.0,
+                Map.of("state", "live"));
+        insert(ch, "default", t0, "srv-A", "cameleer.agents.connected", "gauge", "value", 1.0,
+                Map.of("state", "stale"));
+        insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.agents.connected", "gauge", "value", 0.0,
+                Map.of("state", "stale"));
+
+        // Counter: cumulative drops, +5 per minute on srv-A.
+        insert(ch, "default", t0,                  "srv-A", "cameleer.ingestion.drops", "counter", "count",  0.0, Map.of("reason", "buffer_full"));
+        insert(ch, "default", t0.plusSeconds(60),  "srv-A", "cameleer.ingestion.drops", "counter", "count",  5.0, Map.of("reason", "buffer_full"));
+        insert(ch, "default", t0.plusSeconds(120), "srv-A", "cameleer.ingestion.drops", "counter", "count", 10.0, Map.of("reason", "buffer_full"));
+        // Simulated restart to srv-B: counter resets to 0, then climbs to 2.
+        insert(ch, "default", t0.plusSeconds(180), "srv-B", "cameleer.ingestion.drops", "counter", "count",  0.0, Map.of("reason", "buffer_full"));
+        insert(ch, "default", t0.plusSeconds(240), "srv-B", "cameleer.ingestion.drops", "counter", "count",  2.0, Map.of("reason", "buffer_full"));
+
+        // Timer mean inputs: two buckets (count=2, total_time=30; then count=4, total_time=100).
+        insert(ch, "default", t0,                 "srv-A", "cameleer.ingestion.flush.duration", "timer", "count",      2.0, Map.of("type", "execution"));
+        insert(ch, "default", t0,                 "srv-A", "cameleer.ingestion.flush.duration", "timer", "total_time", 30.0, Map.of("type", "execution"));
+        insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.ingestion.flush.duration", "timer", "count",      4.0, Map.of("type", "execution"));
+        insert(ch, "default", t0.plusSeconds(60), "srv-A", "cameleer.ingestion.flush.duration", "timer", "total_time", 100.0, Map.of("type", "execution"));
+    }
+
+    // ── catalog ─────────────────────────────────────────────────────────
+
+    @Test
+    void catalog_listsSeededMetricsWithStatisticsAndTagKeys() throws Exception {
+        ResponseEntity<String> r = restTemplate.exchange(
+                "/api/v1/admin/server-metrics/catalog?from=2026-04-23T09:00:00Z&to=2026-04-23T11:00:00Z",
+                HttpMethod.GET, new HttpEntity<>(adminGet), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = mapper.readTree(r.getBody());
+        assertThat(body.isArray()).isTrue();
+
+        JsonNode drops = findByField(body, "metricName", "cameleer.ingestion.drops");
+        assertThat(drops.get("metricType").asText()).isEqualTo("counter");
+        assertThat(asStringList(drops.get("statistics"))).contains("count");
+        assertThat(asStringList(drops.get("tagKeys"))).contains("reason");
+
+        JsonNode timer = findByField(body, "metricName", "cameleer.ingestion.flush.duration");
+        assertThat(asStringList(timer.get("statistics"))).contains("count", "total_time");
+    }
+
+    // ── instances ───────────────────────────────────────────────────────
+
+    @Test
+    void instances_listsDistinctServerInstanceIdsWithFirstAndLastSeen() throws Exception {
+        ResponseEntity<String> r = restTemplate.exchange(
+                "/api/v1/admin/server-metrics/instances?from=2026-04-23T09:00:00Z&to=2026-04-23T11:00:00Z",
+                HttpMethod.GET, new HttpEntity<>(adminGet), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = mapper.readTree(r.getBody());
+        assertThat(body.isArray()).isTrue();
+        assertThat(body.size()).isEqualTo(2);
+        // Ordered by last_seen DESC — srv-B saw a later row.
+        assertThat(body.get(0).get("serverInstanceId").asText()).isEqualTo("srv-B");
+        assertThat(body.get(1).get("serverInstanceId").asText()).isEqualTo("srv-A");
+    }
+
+    // ── query — gauge with group-by-tag ─────────────────────────────────
+
+    @Test
+    void query_gaugeWithGroupByTag_returnsSeriesPerTagValue() throws Exception {
+        String requestBody = """
+                {
+                  "metric": "cameleer.agents.connected",
+                  "statistic": "value",
+                  "from": "2026-04-23T09:59:00Z",
+                  "to":   "2026-04-23T10:02:00Z",
+                  "stepSeconds": 60,
+                  "groupByTags": ["state"],
+                  "aggregation": "avg",
+                  "mode": "raw"
+                }
+                """;
+
+        ResponseEntity<String> r = restTemplate.postForEntity(
+                "/api/v1/admin/server-metrics/query",
+                new HttpEntity<>(requestBody, adminJson), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = mapper.readTree(r.getBody());
+        assertThat(body.get("metric").asText()).isEqualTo("cameleer.agents.connected");
+        assertThat(body.get("statistic").asText()).isEqualTo("value");
+        assertThat(body.get("mode").asText()).isEqualTo("raw");
+        assertThat(body.get("stepSeconds").asInt()).isEqualTo(60);
+
+        JsonNode series = body.get("series");
+        assertThat(series.isArray()).isTrue();
+        assertThat(series.size()).isEqualTo(2);
+
+        JsonNode live = findByTag(series, "state", "live");
+        assertThat(live.get("points").size()).isEqualTo(2);
+        assertThat(live.get("points").get(0).get("v").asDouble()).isEqualTo(3.0);
+        assertThat(live.get("points").get(1).get("v").asDouble()).isEqualTo(4.0);
+    }
+
+    // ── query — counter delta across instance rotation ──────────────────
+
+    @Test
+    void query_counterDelta_clipsNegativesAcrossInstanceRotation() throws Exception {
+        String requestBody = """
+                {
+                  "metric": "cameleer.ingestion.drops",
+                  "statistic": "count",
+                  "from": "2026-04-23T09:59:00Z",
+                  "to":   "2026-04-23T10:05:00Z",
+                  "stepSeconds": 60,
+                  "groupByTags": ["reason"],
+                  "aggregation": "sum",
+                  "mode": "delta"
+                }
+                """;
+
+        ResponseEntity<String> r = restTemplate.postForEntity(
+                "/api/v1/admin/server-metrics/query",
+                new HttpEntity<>(requestBody, adminJson), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = mapper.readTree(r.getBody());
+        JsonNode reason = findByTag(body.get("series"), "reason", "buffer_full");
+        // Deltas: 0 (first bucket on srv-A), 5, 5, 0 (srv-B restarts at 0; -10 clipped), 2.
+        // Summing the points must give 12: the counter reset is not subtracted.
+        double sum = 0;
+        for (JsonNode p : reason.get("points")) sum += p.get("v").asDouble();
+        assertThat(sum).isEqualTo(12.0);
+        // No individual point may be negative.
+        for (JsonNode p : reason.get("points")) {
+            assertThat(p.get("v").asDouble()).isGreaterThanOrEqualTo(0.0);
+        }
+    }
+
+    // ── query — derived 'mean' statistic for timers ─────────────────────
+
+    @Test
+    void query_timerMeanStatistic_computesTotalOverCountPerBucket() throws Exception {
+        String requestBody = """
+                {
+                  "metric": "cameleer.ingestion.flush.duration",
+                  "statistic": "mean",
+                  "from": "2026-04-23T09:59:00Z",
+                  "to":   "2026-04-23T10:02:00Z",
+                  "stepSeconds": 60,
+                  "groupByTags": ["type"],
+                  "aggregation": "avg",
+                  "mode": "raw"
+                }
+                """;
+
+        ResponseEntity<String> r = restTemplate.postForEntity(
+                "/api/v1/admin/server-metrics/query",
+                new HttpEntity<>(requestBody, adminJson), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.OK);
+
+        JsonNode body = mapper.readTree(r.getBody());
+        JsonNode points = findByTag(body.get("series"), "type", "execution").get("points");
+        // Bucket 0: 30 / 2 = 15.0
+        // Bucket 1: 100 / 4 = 25.0
+        assertThat(points.get(0).get("v").asDouble()).isEqualTo(15.0);
+        assertThat(points.get(1).get("v").asDouble()).isEqualTo(25.0);
+    }
+
+    // ── query — input validation ────────────────────────────────────────
+
+    @Test
+    void query_rejectsUnsafeMetricName() {
+        String requestBody = """
+                {
+                  "metric": "cameleer.agents; DROP TABLE server_metrics",
+                  "from": "2026-04-23T09:59:00Z",
+                  "to":   "2026-04-23T10:02:00Z"
+                }
+                """;
+
+        ResponseEntity<String> r = restTemplate.postForEntity(
+                "/api/v1/admin/server-metrics/query",
+                new HttpEntity<>(requestBody, adminJson), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+    }
+
+    @Test
+    void query_rejectsRangeBeyondMax() {
+        String requestBody = """
+                {
+                  "metric": "cameleer.agents.connected",
+                  "from": "2026-01-01T00:00:00Z",
+                  "to":   "2026-04-23T00:00:00Z"
+                }
+                """;
+
+        ResponseEntity<String> r = restTemplate.postForEntity(
+                "/api/v1/admin/server-metrics/query",
+                new HttpEntity<>(requestBody, adminJson), String.class);
+        assertThat(r.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
+    }
+
+    // ── authorization ───────────────────────────────────────────────────
+
+    @Test
+    void allEndpoints_requireAdminRole() {
+        ResponseEntity<String> catalog = restTemplate.exchange(
+                "/api/v1/admin/server-metrics/catalog",
+                HttpMethod.GET, new HttpEntity<>(viewerGet), String.class);
+        assertThat(catalog.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
+
+        ResponseEntity<String> instances = restTemplate.exchange(
+                "/api/v1/admin/server-metrics/instances",
+                HttpMethod.GET, new HttpEntity<>(viewerGet), String.class);
+        assertThat(instances.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
+
+        HttpHeaders viewerPost = securityHelper.authHeaders(securityHelper.viewerToken());
+        ResponseEntity<String> query = restTemplate.exchange(
+                "/api/v1/admin/server-metrics/query",
+                HttpMethod.POST, new HttpEntity<>("{}", viewerPost), String.class);
+        assertThat(query.getStatusCode()).isEqualTo(HttpStatus.FORBIDDEN);
+    }
+
+    // ── helpers ─────────────────────────────────────────────────────────
+
+    private org.springframework.jdbc.core.JdbcTemplate clickhouseJdbc() {
+        return org.springframework.test.util.AopTestUtils.getTargetObject(
+                applicationContext.getBean("clickHouseJdbcTemplate"));
+    }
+
+    @Autowired
+    private org.springframework.context.ApplicationContext applicationContext;
+
+    private static void insert(org.springframework.jdbc.core.JdbcTemplate jdbc,
+                               String tenantId, Instant collectedAt, String serverInstanceId,
+                               String metricName, String metricType, String statistic,
+                               double value, Map<String, String> tags) {
+        jdbc.update("""
+                INSERT INTO server_metrics
+                    (tenant_id, collected_at, server_instance_id,
+                     metric_name, metric_type, statistic, metric_value, tags)
+                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+                """,
+                tenantId, Timestamp.from(collectedAt), serverInstanceId,
+                metricName, metricType, statistic, value, tags);
+    }
+
+    private static JsonNode findByField(JsonNode array, String field, String value) {
+        for (JsonNode n : array) {
+            if (value.equals(n.path(field).asText())) return n;
+        }
+        throw new AssertionError("no element with " + field + "=" + value);
+    }
+
+    private static JsonNode findByTag(JsonNode seriesArray, String tagKey, String tagValue) {
+        for (JsonNode s : seriesArray) {
+            if (tagValue.equals(s.path("tags").path(tagKey).asText())) return s;
+        }
+        throw new AssertionError("no series with tag " + tagKey + "=" + tagValue);
+    }
+
+    private static java.util.List<String> asStringList(JsonNode arr) {
+        java.util.List<String> out = new java.util.ArrayList<>();
+        if (arr != null) for (JsonNode n : arr) out.add(n.asText());
+        return out;
+    }
+}
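The server-metrics query tests encode three pieces of logic:

- Cumulative counters become per-bucket deltas, with resets clipped to zero. The series 0, 5, 10, 0, 2 becomes 0, 5, 5, 0, 2, which sums to 12.
- The derived `mean` statistic is `total_time / count` per bucket: 30 / 2 = 15 and 100 / 4 = 25.
- Metric names are checked against a safe pattern before they reach a query.

The sketch below models these rules. `ServerMetricsMath` and its name pattern are hypothetical. The sketch also differences the series as one time-ordered sequence. Whether the server partitions by `server_instance_id` before differencing is not visible here, though both approaches yield 12 for this fixture.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers mirroring what the server-metrics query tests assert. */
final class ServerMetricsMath {
    // Assumption: dotted lowercase identifiers only, so input such as
    // "cameleer.agents; DROP TABLE server_metrics" is rejected before any SQL is built.
    private static final Pattern SAFE_METRIC = Pattern.compile("[a-z0-9_]+(\\.[a-z0-9_]+)*");

    static boolean isSafeMetricName(String name) {
        return name != null && SAFE_METRIC.matcher(name).matches();
    }

    /**
     * Turns a cumulative counter series into per-bucket deltas. A decrease (counter reset,
     * e.g. a new server instance starting from 0) yields 0 rather than a negative value.
     * The first bucket has no predecessor and yields 0.
     */
    static double[] clippedDeltas(double[] cumulative) {
        double[] out = new double[cumulative.length];
        for (int i = 1; i < cumulative.length; i++) {
            out[i] = Math.max(0.0, cumulative[i] - cumulative[i - 1]);
        }
        return out;
    }

    /** Timer mean for one bucket: total_time / count; NaN when the bucket has no samples. */
    static double mean(double totalTime, double count) {
        return count == 0 ? Double.NaN : totalTime / count;
    }
}
```

Clipping instead of summing raw differences keeps an instance rotation from showing up as a large negative "drop" in the chart.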
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/metrics/ServerMetricsSnapshotSchedulerTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/metrics/ServerMetricsSnapshotSchedulerTest.java
@@ -0,0 +1,130 @@
+package com.cameleer.server.app.metrics;
+
+import com.cameleer.server.core.storage.ServerMetricsStore;
+import com.cameleer.server.core.storage.model.ServerMetricSample;
+import io.micrometer.core.instrument.Counter;
+import io.micrometer.core.instrument.Gauge;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.Timer;
+import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
+import org.junit.jupiter.api.Test;
+
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class ServerMetricsSnapshotSchedulerTest {
+
+    @Test
+    void snapshot_capturesCounterGaugeAndTimerMeasurements() {
+        MeterRegistry registry = new SimpleMeterRegistry();
+
+        Counter counter = Counter.builder("cameleer.test.counter")
+                .tag("env", "dev")
+                .register(registry);
+        counter.increment(3);
+
+        AtomicInteger gaugeSource = new AtomicInteger(42);
+        Gauge.builder("cameleer.test.gauge", gaugeSource, AtomicInteger::doubleValue)
+                .register(registry);
+
+        Timer timer = Timer.builder("cameleer.test.timer").register(registry);
+        timer.record(Duration.ofMillis(5));
+        timer.record(Duration.ofMillis(15));
+
+        RecordingStore store = new RecordingStore();
+        ServerMetricsSnapshotScheduler scheduler =
+                new ServerMetricsSnapshotScheduler(registry, store, "tenant-7", "server-A");
+
+        scheduler.snapshot();
+
+        assertThat(store.batches).hasSize(1);
+        List<ServerMetricSample> samples = store.batches.get(0);
+
+        // Every sample is stamped with tenant + instance + finite value
+        assertThat(samples).allSatisfy(s -> {
+            assertThat(s.tenantId()).isEqualTo("tenant-7");
+            assertThat(s.serverInstanceId()).isEqualTo("server-A");
+            assertThat(Double.isFinite(s.value())).isTrue();
+            assertThat(s.collectedAt()).isNotNull();
+        });
+
+        // Counter -> 1 row with statistic=count, value=3, tag propagated
+        List<ServerMetricSample> counterRows = samples.stream()
+                .filter(s -> s.metricName().equals("cameleer.test.counter"))
+                .toList();
+        assertThat(counterRows).hasSize(1);
+        assertThat(counterRows.get(0).statistic()).isEqualTo("count");
+        assertThat(counterRows.get(0).metricType()).isEqualTo("counter");
+        assertThat(counterRows.get(0).value()).isEqualTo(3.0);
+        assertThat(counterRows.get(0).tags()).containsEntry("env", "dev");
+
+        // Gauge -> 1 row with statistic=value
+        List<ServerMetricSample> gaugeRows = samples.stream()
+                .filter(s -> s.metricName().equals("cameleer.test.gauge"))
+                .toList();
+        assertThat(gaugeRows).hasSize(1);
+        assertThat(gaugeRows.get(0).statistic()).isEqualTo("value");
+        assertThat(gaugeRows.get(0).metricType()).isEqualTo("gauge");
+        assertThat(gaugeRows.get(0).value()).isEqualTo(42.0);
+
+        // Timer -> emits multiple statistics (count, total_time, max)
+        List<ServerMetricSample> timerRows = samples.stream()
+                .filter(s -> s.metricName().equals("cameleer.test.timer"))
+                .toList();
+        assertThat(timerRows).isNotEmpty();
+        // Micrometer's Statistic.TOTAL_TIME renders as "total" via getTagValueRepresentation()
+        // but as "total_time" when the enum name is lowercased. Accept either to stay decoupled.
+        assertThat(timerRows).extracting(ServerMetricSample::statistic)
+                .contains("count", "max");
+        assertThat(timerRows).extracting(ServerMetricSample::statistic)
+                .containsAnyOf("total_time", "total");
+        assertThat(timerRows).allSatisfy(s ->
+                assertThat(s.metricType()).isEqualTo("timer"));
+        ServerMetricSample count = timerRows.stream()
+                .filter(s -> s.statistic().equals("count"))
+                .findFirst().orElseThrow();
+        assertThat(count.value()).isEqualTo(2.0);
+    }
+
+    @Test
+    void snapshot_withEmptyRegistry_doesNotWriteBatch() {
+        MeterRegistry registry = new SimpleMeterRegistry();
+        // No meters registered: a fresh SimpleMeterRegistry starts empty, so there is nothing to write.
+        RecordingStore store = new RecordingStore();
+        ServerMetricsSnapshotScheduler scheduler =
+                new ServerMetricsSnapshotScheduler(registry, store, "t", "s");
+
+        scheduler.snapshot();
+
+        assertThat(store.batches).isEmpty();
+    }
+
+    @Test
+    void snapshot_swallowsStoreFailures() {
+        MeterRegistry registry = new SimpleMeterRegistry();
+        Counter.builder("cameleer.test").register(registry).increment();
+
+        ServerMetricsStore throwingStore = batch -> {
+            throw new RuntimeException("clickhouse down");
+        };
+
+        ServerMetricsSnapshotScheduler scheduler =
+                new ServerMetricsSnapshotScheduler(registry, throwingStore, "t", "s");
+
+        // Must not propagate — the scheduler thread would otherwise die.
+        scheduler.snapshot();
+    }
+
+    private static final class RecordingStore implements ServerMetricsStore {
+        final List<List<ServerMetricSample>> batches = new ArrayList<>();
+
+        @Override
+        public void insertBatch(List<ServerMetricSample> samples) {
+            batches.add(List.copyOf(samples));
+        }
+    }
+}
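The three scheduler tests above fix two behaviours: an empty registry produces no insert, and a failing store must never propagate out of `snapshot()`. Below is a minimal, JDK-only sketch of a task with that contract. `SnapshotTask`, `collect`, and `store` are hypothetical names, with `collect` standing in for walking the `MeterRegistry` and `store` for `ServerMetricsStore.insertBatch`. It is not the actual `ServerMetricsSnapshotScheduler` code.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for the scheduler behaviour the tests pin down.
 * `collect` plays the role of reading the MeterRegistry; `store` plays the
 * role of ServerMetricsStore.insertBatch.
 */
final class SnapshotTask<T> {
    private final Supplier<List<T>> collect;
    private final Consumer<List<T>> store;
    private int swallowedFailures;

    SnapshotTask(Supplier<List<T>> collect, Consumer<List<T>> store) {
        this.collect = collect;
        this.store = store;
    }

    /** One scheduled tick: skips empty batches and never throws. */
    void snapshot() {
        try {
            List<T> batch = collect.get();
            if (batch.isEmpty()) {
                return; // empty registry: do not issue an empty insert
            }
            store.accept(List.copyOf(batch)); // hand the store an immutable copy
        } catch (RuntimeException e) {
            // A real scheduler would log here; the point is that the
            // exception never escapes to the thread running the schedule.
            swallowedFailures++;
        }
    }

    int swallowedFailures() {
        return swallowedFailures;
    }
}
```

Catching inside the task, rather than relying on the scheduler's error handling, keeps the behaviour independent of how the task is scheduled. With `ScheduledExecutorService.scheduleAtFixedRate`, for example, an uncaught exception suppresses all subsequent executions.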
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/controller/OutboundConnectionAdminControllerIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/controller/OutboundConnectionAdminControllerIT.java
@@ -34,6 +34,10 @@ class OutboundConnectionAdminControllerIT extends AbstractPostgresIT {
    @org.junit.jupiter.api.AfterEach
    void cleanupRows() {
        jdbcTemplate.update("DELETE FROM outbound_connections WHERE tenant_id = 'default'");
+        // Delete deployments created by our test users: sibling ITs (DeploymentControllerIT
+        // etc.) may have left rows whose created_by FK would block the user deletion below.
+        jdbcTemplate.update(
+            "DELETE FROM deployments WHERE created_by IN ('test-admin','test-operator','test-viewer')");
        jdbcTemplate.update("DELETE FROM users WHERE user_id IN ('test-admin','test-operator','test-viewer')");
    }

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/BlueGreenStrategyIT.java
@@ -0,0 +1,194 @@
+package com.cameleer.server.app.runtime;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.core.runtime.ContainerStatus;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentStatus;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.MediaType;
+import org.springframework.test.context.TestPropertySource;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.awaitility.Awaitility.await;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+/**
+ * Verifies the blue-green deployment strategy: start all new replicas → health-check
+ * all of them → stop the old ones. Strictly all-healthy: if any new replica fails its
+ * health check, the previous deployment is left untouched.
+ */
+@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
+class BlueGreenStrategyIT extends AbstractPostgresIT {
+
+    @MockBean
+    RuntimeOrchestrator runtimeOrchestrator;
+
+    @Autowired private TestRestTemplate restTemplate;
+    @Autowired private ObjectMapper objectMapper;
+    @Autowired private TestSecurityHelper securityHelper;
+    @Autowired private PostgresDeploymentRepository deploymentRepository;
+
+    private String operatorJwt;
+    private String appSlug;
+    private String versionId;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        operatorJwt = securityHelper.operatorToken();
+
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
+
+        // Ensure test-operator exists in users table (required for deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
+
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+
+        appSlug = "bg-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps", String.format("""
+                {"slug": "%s", "displayName": "BG App"}
+                """, appSlug), operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
+                {"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "blue-green"}
+                """, operatorJwt);
+        versionId = uploadJar(appSlug, ("bg-jar-" + appSlug).getBytes());
+    }
+
+    @Test
+    void blueGreen_allHealthy_stopsOldAfterNew() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
+
+        // The previous deployment was stopped once both new replicas were healthy
+        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
+        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
+
+        verify(runtimeOrchestrator).stopContainer("old-0");
+        verify(runtimeOrchestrator).stopContainer("old-1");
+        verify(runtimeOrchestrator, never()).stopContainer("new-0");
+        verify(runtimeOrchestrator, never()).stopContainer("new-1");
+
+        // New deployment has both new replicas recorded
+        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
+        assertThat(second.replicaStates()).hasSize(2);
+    }
+
+    @Test
+    void blueGreen_partialHealthy_preservesOldAndMarksFailed() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.FAILED);
+
+        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
+        assertThat(second.errorMessage())
+                .contains("blue-green")
+                .contains("1/2");
+
+        // Previous deployment stays RUNNING — blue-green's safety promise.
+        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
+        assertThat(first.status()).isEqualTo(DeploymentStatus.RUNNING);
+
+        verify(runtimeOrchestrator, never()).stopContainer("old-0");
+        verify(runtimeOrchestrator, never()).stopContainer("old-1");
+        // Cleanup ran on both new replicas.
+        verify(runtimeOrchestrator).stopContainer("new-0");
+        verify(runtimeOrchestrator).stopContainer("new-1");
+    }
+
+    // ---- helpers ----
+
+    private String triggerDeploy() throws Exception {
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
+        return deployResponse.path("id").asText();
+    }
+
+    private void awaitStatus(String deployId, DeploymentStatus expected) {
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
+                    assertThat(d.status()).isEqualTo(expected);
+                });
+    }
+
+    private JsonNode post(String path, String json, String jwt) throws Exception {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        var response = restTemplate.exchange(path, HttpMethod.POST,
+                new HttpEntity<>(json, headers), String.class);
+        return objectMapper.readTree(response.getBody());
+    }
+
+    private void put(String path, String json, String jwt) {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        restTemplate.exchange(path, HttpMethod.PUT,
+                new HttpEntity<>(json, headers), String.class);
+    }
+
+    private String uploadJar(String appSlug, byte[] content) throws Exception {
+        ByteArrayResource resource = new ByteArrayResource(content) {
+            @Override public String getFilename() { return "app.jar"; }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + operatorJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+
+        var response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions",
+                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
+        JsonNode versionNode = objectMapper.readTree(response.getBody());
+        return versionNode.path("id").asText();
+    }
+}
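The two blue-green tests encode a simple decision rule: start every new replica, require all of them to be healthy, and only then stop the old ones; otherwise clean up the new replicas and leave the old deployment running. A minimal sketch of that rule follows, assuming a simplified orchestrator port. `Containers`, `Outcome`, `BlueGreen.rollOut`, and the exact error-message wording are hypothetical (the IT only checks that the message contains "blue-green" and "1/2"). Health is evaluated with a single predicate call here, whereas the real executor polls until `cameleer.server.runtime.healthchecktimeout` expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified orchestrator port; the real RuntimeOrchestrator differs.
interface Containers {
    String start(int replicaIndex);
    void stop(String containerId);
}

// running = container IDs serving traffic after the attempt.
record Outcome(boolean success, List<String> running, String error) {}

final class BlueGreen {
    /**
     * Start every new replica, health-check all of them, and only then stop
     * the old ones. Any unhealthy new replica aborts the rollout: the new
     * containers are cleaned up and the old ones keep serving.
     */
    static Outcome rollOut(Containers c, List<String> oldIds, int replicas,
                           Predicate<String> isHealthy) {
        List<String> fresh = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            fresh.add(c.start(i));
        }
        long healthy = fresh.stream().filter(isHealthy).count();
        if (healthy < replicas) {
            fresh.forEach(c::stop); // clean up new replicas; old deployment untouched
            return new Outcome(false, oldIds,
                    "blue-green: " + healthy + "/" + replicas + " replicas healthy");
        }
        oldIds.forEach(c::stop); // cut over only once every new replica is healthy
        return new Outcome(true, fresh, null);
    }
}
```

The cost of this safety is capacity: for the duration of the health check, both the old and the new replica sets run side by side.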
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/DeploymentSnapshotIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/DeploymentSnapshotIT.java
@@ -0,0 +1,289 @@
+package com.cameleer.server.app.runtime;
+
+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.core.runtime.ContainerStatus;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentStatus;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.MediaType;
+import org.springframework.test.context.TestPropertySource;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.awaitility.Awaitility.await;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.when;
+
+/**
+ * Verifies that DeploymentExecutor writes DeploymentConfigSnapshot on successful
+ * RUNNING transition and does NOT write it on a FAILED path (both the
+ * startContainer-throws path and the health-check-fails path).
+ */
+@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
+class DeploymentSnapshotIT extends AbstractPostgresIT {
+
+    @MockBean
+    RuntimeOrchestrator runtimeOrchestrator;
+
+    @Autowired
+    private TestRestTemplate restTemplate;
+
+    @Autowired
+    private ObjectMapper objectMapper;
+
+    @Autowired
+    private TestSecurityHelper securityHelper;
+
+    @Autowired
+    private PostgresDeploymentRepository deploymentRepository;
+
+    private String operatorJwt;
+    private String adminJwt;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        operatorJwt = securityHelper.operatorToken();
+        adminJwt = securityHelper.adminToken();
+
+        // Clean up between tests
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
+
+        // Ensure test-operator exists in users table (required for deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 1: snapshot is populated when deployment reaches RUNNING
+    // -----------------------------------------------------------------------
+
+    @Test
+    void snapshot_isPopulated_whenDeploymentReachesRunning() throws Exception {
+        // --- given: mock orchestrator that simulates a healthy single-replica container ---
+        String fakeContainerId = "fake-container-" + UUID.randomUUID();
+
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn(fakeContainerId);
+        when(runtimeOrchestrator.getContainerStatus(fakeContainerId))
+                .thenReturn(new ContainerStatus("healthy", true, 0, null));
+
+        // --- given: an app whose container config sets runtimeType explicitly, so auto-detection is not needed ---
+        String appSlug = "snap-success-" + UUID.randomUUID().toString().substring(0, 8);
+        String containerConfigJson = """
+                {"runtimeType": "spring-boot", "appPort": 8081}
+                """;
+        String createAppJson = String.format("""
+                {"slug": "%s", "displayName": "Snapshot Success App"}
+                """, appSlug);
+
+        JsonNode createdApp = post("/api/v1/environments/default/apps", createAppJson, operatorJwt);
+        String appId = createdApp.path("id").asText();
+
+        // --- given: update containerConfig to set runtimeType ---
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config",
+                containerConfigJson, operatorJwt);
+
+        // --- given: upload a JAR (fake bytes, but AppService still writes a real file to disk) ---
+        String versionId = uploadJar(appSlug, ("fake-jar-bytes-" + appSlug).getBytes());
+
+        // --- given: save agentConfig with samplingRate = 0.25 ---
+        String configJson = """
+                {"samplingRate": 0.25}
+                """;
+        put("/api/v1/environments/default/apps/" + appSlug + "/config", configJson, operatorJwt);
+
+        // --- when: trigger deploy ---
+        String deployJson = String.format("""
+                {"appVersionId": "%s"}
+                """, versionId);
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                deployJson, operatorJwt);
+        String deploymentId = deployResponse.path("id").asText();
+
+        // --- await RUNNING (async executor) ---
+        AtomicReference<Deployment> deploymentRef = new AtomicReference<>();
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deploymentId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deploymentId));
+                    assertThat(d.status()).isEqualTo(DeploymentStatus.RUNNING);
+                    deploymentRef.set(d);
+                });
+
+        // --- then: snapshot is populated ---
+        Deployment deployed = deploymentRef.get();
+        assertThat(deployed.deployedConfigSnapshot()).isNotNull();
+        assertThat(deployed.deployedConfigSnapshot().jarVersionId())
+                .isEqualTo(UUID.fromString(versionId));
+        assertThat(deployed.deployedConfigSnapshot().agentConfig()).isNotNull();
+        assertThat(deployed.deployedConfigSnapshot().agentConfig().getSamplingRate())
+                .isEqualTo(0.25);
+        assertThat(deployed.deployedConfigSnapshot().containerConfig())
+                .containsEntry("runtimeType", "spring-boot")
+                .containsEntry("appPort", 8081);
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 2: snapshot is NOT populated when startContainer throws
+    // -----------------------------------------------------------------------
+
+    @Test
+    void snapshot_isNotPopulated_whenDeploymentFails() throws Exception {
+        // --- given: mock orchestrator that throws on startContainer ---
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenThrow(new RuntimeException("Simulated container start failure"));
+
+        // --- given: create app with explicit runtimeType ---
+        String appSlug = "snap-fail-" + UUID.randomUUID().toString().substring(0, 8);
+        String createAppJson = String.format("""
+                {"slug": "%s", "displayName": "Snapshot Fail App"}
+                """, appSlug);
+        post("/api/v1/environments/default/apps", createAppJson, operatorJwt);
+
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config",
+                """
+                {"runtimeType": "spring-boot", "appPort": 8081}
+                """, operatorJwt);
+
+        String versionId = uploadJar(appSlug, ("fake-jar-fail-" + appSlug).getBytes());
+
+        // --- when: trigger deploy ---
+        String deployJson = String.format("""
+                {"appVersionId": "%s"}
+                """, versionId);
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                deployJson, operatorJwt);
+        String deploymentId = deployResponse.path("id").asText();
+
+        // --- await FAILED (async executor catches exception and marks failed) ---
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deploymentId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deploymentId));
+                    assertThat(d.status()).isEqualTo(DeploymentStatus.FAILED);
+                });
+
+        // --- then: snapshot is null ---
+        Deployment failed = deploymentRepository.findById(UUID.fromString(deploymentId)).orElseThrow();
+        assertThat(failed.deployedConfigSnapshot()).isNull();
+    }
+
+    // -----------------------------------------------------------------------
+    // Test 3: snapshot is NOT populated when the health check never passes.
+    // This exercises the early-exit path in DeploymentExecutor: startContainer
+    // succeeds, but no replica ever reports healthy, so waitForAnyHealthy
+    // returns 0 and the executor exits before the snapshot is written.
+    // -----------------------------------------------------------------------
+
+    @Test
+    void snapshot_isNotPopulated_whenHealthCheckFails() throws Exception {
+        // --- given: container starts but never becomes healthy ---
+        String fakeContainerId = "fake-unhealthy-" + UUID.randomUUID();
+
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+        when(runtimeOrchestrator.startContainer(any())).thenReturn(fakeContainerId);
+        when(runtimeOrchestrator.getContainerStatus(fakeContainerId))
+                .thenReturn(new ContainerStatus("starting", true, 0, null));
+
+        String appSlug = "snap-unhealthy-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps", String.format("""
+                {"slug": "%s", "displayName": "Snapshot Unhealthy App"}
+                """, appSlug), operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config",
+                """
+                {"runtimeType": "spring-boot", "appPort": 8081}
+                """, operatorJwt);
+        String versionId = uploadJar(appSlug, ("fake-jar-unhealthy-" + appSlug).getBytes());
+
+        // --- when: trigger deploy ---
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
+        String deploymentId = deployResponse.path("id").asText();
+
+        // --- await FAILED (healthchecktimeout overridden to 2s in @TestPropertySource) ---
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deploymentId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deploymentId));
+                    assertThat(d.status()).isEqualTo(DeploymentStatus.FAILED);
+                });
+
+        // --- then: snapshot is null (snapshot-write is gated behind health check) ---
+        Deployment failed = deploymentRepository.findById(UUID.fromString(deploymentId)).orElseThrow();
+        assertThat(failed.deployedConfigSnapshot()).isNull();
+    }
+
+    // -----------------------------------------------------------------------
+    // Helpers
+    // -----------------------------------------------------------------------
+
+    private JsonNode post(String path, String json, String jwt) throws Exception {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        var response = restTemplate.exchange(
+                path, HttpMethod.POST,
+                new HttpEntity<>(json, headers),
+                String.class);
+        return objectMapper.readTree(response.getBody());
+    }
+
+    private void put(String path, String json, String jwt) {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        restTemplate.exchange(
+                path, HttpMethod.PUT,
+                new HttpEntity<>(json, headers),
+                String.class);
+    }
+
+    private String uploadJar(String appSlug, byte[] content) throws Exception {
+        ByteArrayResource resource = new ByteArrayResource(content) {
+            @Override
+            public String getFilename() { return "app.jar"; }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + operatorJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+
+        var response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions",
+                HttpMethod.POST,
+                new HttpEntity<>(body, headers),
+                String.class);
+
+        JsonNode versionNode = objectMapper.readTree(response.getBody());
+        return versionNode.path("id").asText();
+    }
+}
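Taken together, the three snapshot tests pin down one rule: a `DeploymentConfigSnapshot` is captured only on the RUNNING transition, and both failure paths (start throws, health check never passes) leave it null. A minimal sketch of that gate follows. `AttemptStatus`, `ConfigSnapshot`, `DeployResult`, and `SnapshotGate.finish` are hypothetical names with simplified fields, not the actual `DeploymentExecutor` code.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical shapes; the real Deployment / DeploymentConfigSnapshot types differ.
enum AttemptStatus { RUNNING, FAILED }

record ConfigSnapshot(UUID jarVersionId, double samplingRate,
                      Map<String, Object> containerConfig) {}

record DeployResult(AttemptStatus status, ConfigSnapshot snapshot) {}

final class SnapshotGate {
    /**
     * Resolve a deploy attempt. The candidate snapshot is kept only when the
     * container started and at least one replica became healthy; every
     * failure path returns a null snapshot.
     */
    static DeployResult finish(ConfigSnapshot candidate,
                               boolean containerStarted, int healthyReplicas) {
        if (!containerStarted || healthyReplicas == 0) {
            return new DeployResult(AttemptStatus.FAILED, null);
        }
        return new DeployResult(AttemptStatus.RUNNING, candidate);
    }
}
```

Keeping failed attempts snapshot-free is what lets the last successful deployment's snapshot act as the baseline for the pending-deploy diff.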
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/RollingStrategyIT.java
@@ -0,0 +1,198 @@
+package com.cameleer.server.app.runtime;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.app.TestSecurityHelper;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.core.runtime.ContainerStatus;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentStatus;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.mockito.InOrder;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.boot.test.mock.mockito.MockBean;
+import org.springframework.boot.test.web.client.TestRestTemplate;
+import org.springframework.core.io.ByteArrayResource;
+import org.springframework.http.HttpEntity;
+import org.springframework.http.HttpHeaders;
+import org.springframework.http.HttpMethod;
+import org.springframework.http.MediaType;
+import org.springframework.test.context.TestPropertySource;
+import org.springframework.util.LinkedMultiValueMap;
+import org.springframework.util.MultiValueMap;
+
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.awaitility.Awaitility.await;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.inOrder;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+/**
+ * Verifies the rolling deployment strategy: per-replica start → health → stop
+ * old. Mid-rollout health failure preserves remaining un-replaced old replicas;
+ * already-stopped old replicas are not restored.
+ */
+@TestPropertySource(properties = "cameleer.server.runtime.healthchecktimeout=2")
+class RollingStrategyIT extends AbstractPostgresIT {
+
+    @MockBean
+    RuntimeOrchestrator runtimeOrchestrator;
+
+    @Autowired private TestRestTemplate restTemplate;
+    @Autowired private ObjectMapper objectMapper;
+    @Autowired private TestSecurityHelper securityHelper;
+    @Autowired private PostgresDeploymentRepository deploymentRepository;
+
+    private String operatorJwt;
+    private String appSlug;
+    private String versionId;
+
+    @BeforeEach
+    void setUp() throws Exception {
+        operatorJwt = securityHelper.operatorToken();
+
+        jdbcTemplate.update("DELETE FROM deployments");
+        jdbcTemplate.update("DELETE FROM app_versions");
+        jdbcTemplate.update("DELETE FROM apps");
+        jdbcTemplate.update("DELETE FROM application_config WHERE environment = 'default'");
+
+        // Ensure test-operator exists in users table (required for deployments.created_by FK)
+        jdbcTemplate.update(
+                "INSERT INTO users (user_id, provider, display_name) VALUES ('test-operator', 'local', 'Test Operator') ON CONFLICT (user_id) DO NOTHING");
+
+        when(runtimeOrchestrator.isEnabled()).thenReturn(true);
+
+        appSlug = "roll-" + UUID.randomUUID().toString().substring(0, 8);
+        post("/api/v1/environments/default/apps", String.format("""
+                {"slug": "%s", "displayName": "Rolling App"}
+                """, appSlug), operatorJwt);
+        put("/api/v1/environments/default/apps/" + appSlug + "/container-config", """
+                {"runtimeType": "spring-boot", "appPort": 8081, "replicas": 2, "deploymentStrategy": "rolling"}
+                """, operatorJwt);
+        versionId = uploadJar(appSlug, ("roll-jar-" + appSlug).getBytes());
+    }
+
+    @Test
+    void rolling_allHealthy_replacesOneByOne() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(healthy);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.RUNNING);
+
+        // Rolling invariant: replicas are replaced one at a time, so a new
+        // replica is started between the two old stops. Blue-green would start
+        // both new replicas first and then stop old-0 and old-1 back to back.
+        InOrder inOrder = inOrder(runtimeOrchestrator);
+        inOrder.verify(runtimeOrchestrator).stopContainer("old-0");
+        inOrder.verify(runtimeOrchestrator).startContainer(any());
+        inOrder.verify(runtimeOrchestrator).stopContainer("old-1");
+
+        // Total of 4 startContainer calls: 2 for the first deploy, 2 for the rolling redeploy.
+        verify(runtimeOrchestrator, times(4)).startContainer(any());
+        // New replicas were not stopped — they're the running ones now.
+        verify(runtimeOrchestrator, never()).stopContainer("new-0");
+        verify(runtimeOrchestrator, never()).stopContainer("new-1");
+
+        Deployment first = deploymentRepository.findById(UUID.fromString(firstDeployId)).orElseThrow();
+        assertThat(first.status()).isEqualTo(DeploymentStatus.STOPPED);
+    }
+
+    @Test
+    void rolling_failsMidRollout_preservesRemainingOld() throws Exception {
+        when(runtimeOrchestrator.startContainer(any()))
+                .thenReturn("old-0", "old-1", "new-0", "new-1");
+        ContainerStatus healthy = new ContainerStatus("healthy", true, 0, null);
+        ContainerStatus starting = new ContainerStatus("starting", true, 0, null);
+        when(runtimeOrchestrator.getContainerStatus("old-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("old-1")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-0")).thenReturn(healthy);
+        when(runtimeOrchestrator.getContainerStatus("new-1")).thenReturn(starting);
+
+        String firstDeployId = triggerDeploy();
+        awaitStatus(firstDeployId, DeploymentStatus.RUNNING);
+
+        String secondDeployId = triggerDeploy();
+        awaitStatus(secondDeployId, DeploymentStatus.FAILED);
+
+        Deployment second = deploymentRepository.findById(UUID.fromString(secondDeployId)).orElseThrow();
+        assertThat(second.errorMessage())
+                .contains("rolling")
+                .contains("replica 1");
+
+        // old-0 was replaced before the failure; old-1 was never touched.
+        verify(runtimeOrchestrator).stopContainer("old-0");
+        verify(runtimeOrchestrator, never()).stopContainer("old-1");
+        // Cleanup stops both new replicas started so far.
+        verify(runtimeOrchestrator).stopContainer("new-0");
+        verify(runtimeOrchestrator).stopContainer("new-1");
+    }
+
+    // ---- helpers (same pattern as BlueGreenStrategyIT) ----
+
+    private String triggerDeploy() throws Exception {
+        JsonNode deployResponse = post(
+                "/api/v1/environments/default/apps/" + appSlug + "/deployments",
+                String.format("{\"appVersionId\": \"%s\"}", versionId), operatorJwt);
+        return deployResponse.path("id").asText();
+    }
+
+    private void awaitStatus(String deployId, DeploymentStatus expected) {
+        await().atMost(30, TimeUnit.SECONDS)
+                .pollInterval(500, TimeUnit.MILLISECONDS)
+                .untilAsserted(() -> {
+                    Deployment d = deploymentRepository.findById(UUID.fromString(deployId))
+                            .orElseThrow(() -> new AssertionError("Deployment not found: " + deployId));
+                    assertThat(d.status()).isEqualTo(expected);
+                });
+    }
+
+    private JsonNode post(String path, String json, String jwt) throws Exception {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        var response = restTemplate.exchange(path, HttpMethod.POST,
+                new HttpEntity<>(json, headers), String.class);
+        return objectMapper.readTree(response.getBody());
+    }
+
+    private void put(String path, String json, String jwt) {
+        HttpHeaders headers = securityHelper.authHeaders(jwt);
+        restTemplate.exchange(path, HttpMethod.PUT,
+                new HttpEntity<>(json, headers), String.class);
+    }
+
+    private String uploadJar(String appSlug, byte[] content) throws Exception {
+        ByteArrayResource resource = new ByteArrayResource(content) {
+            @Override public String getFilename() { return "app.jar"; }
+        };
+        MultiValueMap<String, Object> body = new LinkedMultiValueMap<>();
+        body.add("file", resource);
+
+        HttpHeaders headers = new HttpHeaders();
+        headers.set("Authorization", "Bearer " + operatorJwt);
+        headers.set("X-Cameleer-Protocol-Version", "1");
+        headers.setContentType(MediaType.MULTIPART_FORM_DATA);
+
+        var response = restTemplate.exchange(
+                "/api/v1/environments/default/apps/" + appSlug + "/versions",
+                HttpMethod.POST, new HttpEntity<>(body, headers), String.class);
+        JsonNode versionNode = objectMapper.readTree(response.getBody());
+        return versionNode.path("id").asText();
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/TraefikLabelBuilderTest.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/runtime/TraefikLabelBuilderTest.java
@@ -0,0 +1,90 @@
+package com.cameleer.server.app.runtime;
+
+import com.cameleer.server.core.runtime.ResolvedContainerConfig;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+import java.util.Map;
+
+import static org.junit.jupiter.api.Assertions.*;
+
+class TraefikLabelBuilderTest {
+
+    private static ResolvedContainerConfig config(boolean externalRouting, String certResolver) {
+        return new ResolvedContainerConfig(
+                512, null, 500, null,
+                8080, List.of(), Map.of(),
+                true, true,
+                "path", "example.com", "https://cameleer.example.com",
+                1, "blue-green",
+                true, true,
+                "spring-boot", "", List.of(),
+                externalRouting,
+                certResolver
+        );
+    }
+
+    @Test
+    void build_emitsTraefikLabelsWhenExternalRoutingEnabled() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(true, null), 0, "abcdef01");
+
+        assertEquals("true", labels.get("traefik.enable"));
+        assertEquals("8080", labels.get("traefik.http.services.dev-myapp.loadbalancer.server.port"));
+        assertEquals("PathPrefix(`/dev/myapp/`)", labels.get("traefik.http.routers.dev-myapp.rule"));
+    }
+
+    @Test
+    void build_omitsAllTraefikLabelsWhenExternalRoutingDisabled() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(false, null), 0, "abcdef01");
+
+        long traefikLabelCount = labels.keySet().stream()
+                .filter(k -> k.startsWith("traefik."))
+                .count();
+        assertEquals(0, traefikLabelCount, "expected no traefik.* labels but found: " + labels);
+    }
+
+    @Test
+    void build_preservesIdentityLabelsWhenExternalRoutingDisabled() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(false, null), 2, "abcdef01");
+
+        assertEquals("cameleer-server", labels.get("managed-by"));
+        assertEquals("acme", labels.get("cameleer.tenant"));
+        assertEquals("myapp", labels.get("cameleer.app"));
+        assertEquals("dev", labels.get("cameleer.environment"));
+        assertEquals("2", labels.get("cameleer.replica"));
+        assertEquals("abcdef01", labels.get("cameleer.generation"));
+        assertEquals("dev-myapp-2-abcdef01", labels.get("cameleer.instance-id"));
+    }
+
+    @Test
+    void build_emitsCertResolverLabelWhenConfigured() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(true, "letsencrypt"), 0, "abcdef01");
+
+        assertEquals("true", labels.get("traefik.http.routers.dev-myapp.tls"));
+        assertEquals("letsencrypt", labels.get("traefik.http.routers.dev-myapp.tls.certresolver"));
+    }
+
+    @Test
+    void build_omitsCertResolverLabelWhenNull() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(true, null), 0, "abcdef01");
+
+        assertEquals("true", labels.get("traefik.http.routers.dev-myapp.tls"),
+                "sslOffloading=true should still mark the router TLS-enabled");
+        assertNull(labels.get("traefik.http.routers.dev-myapp.tls.certresolver"),
+                "cert resolver label must be omitted when none is configured");
+    }
+
+    @Test
+    void build_omitsCertResolverLabelWhenBlank() {
+        Map<String, String> labels = TraefikLabelBuilder.build(
+                "myapp", "dev", "acme", config(true, "   "), 0, "abcdef01");
+
+        assertNull(labels.get("traefik.http.routers.dev-myapp.tls.certresolver"),
+                "whitespace-only cert resolver must be treated as unset");
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
@@ -79,7 +79,8 @@ class ClickHouseLogStoreCountIT {
                base.plusSeconds(30),
                null,
                100,
-                "desc"));
+                "desc",
+                null));

        assertThat(count).isEqualTo(3);
    }
@@ -102,7 +103,8 @@ class ClickHouseLogStoreCountIT {
                base.plusSeconds(30),
                null,
                100,
-                "desc"));
+                "desc",
+                null));

        assertThat(count).isZero();
    }
@@ -120,7 +122,7 @@ class ClickHouseLogStoreCountIT {
                null, List.of("ERROR"), "orders", null, null, null,
                "dev", List.of(),
                base.minusSeconds(1), base.plusSeconds(60),
-                null, 100, "desc"));
+                null, 100, "desc", null));

        assertThat(devCount).isEqualTo(2);
    }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreIT.java
@@ -53,7 +53,7 @@ class ClickHouseLogStoreIT {
    }

    private LogSearchRequest req(String application) {
-        return new LogSearchRequest(null, null, application, null, null, null, null, null, null, null, null, 100, "desc");
+        return new LogSearchRequest(null, null, application, null, null, null, null, null, null, null, null, 100, "desc", null);
    }

    // ── Tests ─────────────────────────────────────────────────────────────
@@ -99,7 +99,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
+                null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).level()).isEqualTo("ERROR");
@@ -116,7 +116,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, List.of("WARN", "ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
+                null, List.of("WARN", "ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(2);
    }
@@ -130,7 +130,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                "order #12345", null, "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
+                "order #12345", null, "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).message()).contains("order #12345");
@@ -147,7 +147,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, null, "my-app", null, "exchange-abc", null, null, null, null, null, null, 100, "desc"));
+                null, null, "my-app", null, "exchange-abc", null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).message()).isEqualTo("msg with exchange");
@@ -170,7 +170,7 @@ class ClickHouseLogStoreIT {
        Instant to = Instant.parse("2026-03-31T13:00:00Z");

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, null, null, null, from, to, null, 100, "desc"));
+                null, null, "my-app", null, null, null, null, null, from, to, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).message()).isEqualTo("noon");
@@ -188,7 +188,7 @@ class ClickHouseLogStoreIT {

        // No application filter — should return both
        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, null, null, null, null, null, null, null, null, null, null, 100, "desc"));
+                null, null, null, null, null, null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(2);
    }
@@ -202,7 +202,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, "OrderProcessor", null, null, null, null, null, 100, "desc"));
+                null, null, "my-app", null, null, "OrderProcessor", null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).loggerName()).contains("OrderProcessor");
@@ -221,7 +221,7 @@ class ClickHouseLogStoreIT {

        // Page 1: limit 2
        LogSearchResponse page1 = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, null, null, null, null, null, null, 2, "desc"));
+                null, null, "my-app", null, null, null, null, null, null, null, null, 2, "desc", null));

        assertThat(page1.data()).hasSize(2);
        assertThat(page1.hasMore()).isTrue();
@@ -230,7 +230,7 @@ class ClickHouseLogStoreIT {

        // Page 2: use cursor
        LogSearchResponse page2 = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, null, null, null, null, null, page1.nextCursor(), 2, "desc"));
+                null, null, "my-app", null, null, null, null, null, null, null, page1.nextCursor(), 2, "desc", null));

        assertThat(page2.data()).hasSize(2);
        assertThat(page2.hasMore()).isTrue();
@@ -238,7 +238,7 @@ class ClickHouseLogStoreIT {

        // Page 3: last page
        LogSearchResponse page3 = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, null, null, null, null, null, page2.nextCursor(), 2, "desc"));
+                null, null, "my-app", null, null, null, null, null, null, null, page2.nextCursor(), 2, "desc", null));

        assertThat(page3.data()).hasSize(1);
        assertThat(page3.hasMore()).isFalse();
@@ -257,7 +257,7 @@ class ClickHouseLogStoreIT {

        // Filter for ERROR only, but counts should include all levels
        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc"));
+                null, List.of("ERROR"), "my-app", null, null, null, null, null, null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.levelCounts()).containsEntry("INFO", 2L);
@@ -275,7 +275,7 @@ class ClickHouseLogStoreIT {
        ));

        LogSearchResponse result = store.search(new LogSearchRequest(
-                null, null, "my-app", null, null, null, null, null, null, null, null, 100, "asc"));
+                null, null, "my-app", null, null, null, null, null, null, null, null, 100, "asc", null));

        assertThat(result.data()).hasSize(3);
        assertThat(result.data().get(0).message()).isEqualTo("msg-1");
@@ -340,7 +340,7 @@ class ClickHouseLogStoreIT {

        LogSearchResponse result = store.search(new LogSearchRequest(
                null, null, "my-app", null, null, null, null,
-                List.of("container"), null, null, null, 100, "desc"));
+                List.of("container"), null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(1);
        assertThat(result.data().get(0).message()).isEqualTo("container msg");
@@ -365,7 +365,7 @@ class ClickHouseLogStoreIT {

        LogSearchResponse result = store.search(new LogSearchRequest(
                null, null, "my-app", null, null, null, null,
-                List.of("app", "container"), null, null, null, 100, "desc"));
+                List.of("app", "container"), null, null, null, 100, "desc", null));

        assertThat(result.data()).hasSize(2);
        assertThat(result.data()).extracting(LogEntryResult::message)
@@ -388,7 +388,7 @@ class ClickHouseLogStoreIT {
        for (int page = 0; page < 10; page++) {
            LogSearchResponse resp = store.search(new LogSearchRequest(
                    null, null, "my-app", null, null, null, null, null,
-                    null, null, cursor, 2, "desc"));
+                    null, null, cursor, 2, "desc", null));
            for (LogEntryResult r : resp.data()) {
                assertThat(seen.add(r.message())).as("duplicate row returned: " + r.message()).isTrue();
            }
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreInstanceIdsIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreInstanceIdsIT.java
@@ -0,0 +1,196 @@
+package com.cameleer.server.app.search;
+
+import com.cameleer.server.core.ingestion.BufferedLogEntry;
+import com.cameleer.server.core.search.LogSearchRequest;
+import com.cameleer.server.core.search.LogSearchResponse;
+import com.cameleer.common.model.LogEntry;
+import com.cameleer.server.app.ClickHouseTestHelper;
+import com.zaxxer.hikari.HikariDataSource;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.testcontainers.clickhouse.ClickHouseContainer;
+import org.testcontainers.junit.jupiter.Container;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import java.time.Instant;
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/**
+ * Integration test for the {@code instanceIds} multi-value filter on
+ * {@link ClickHouseLogStore#search(LogSearchRequest)}.
+ *
+ * <p>Three rows are seeded with distinct {@code instance_id} values:
+ * <ul>
+ *   <li>{@code prod-app1-0-aaa11111} — included in filter</li>
+ *   <li>{@code prod-app1-1-aaa11111} — included in filter</li>
+ *   <li>{@code prod-app1-0-bbb22222} — excluded from filter</li>
+ * </ul>
+ */
+@Testcontainers
+class ClickHouseLogStoreInstanceIdsIT {
+
+    @Container
+    static final ClickHouseContainer clickhouse =
+            new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
+
+    private JdbcTemplate jdbc;
+    private ClickHouseLogStore store;
+
+    private static final String TENANT  = "default";
+    private static final String ENV     = "prod";
+    private static final String APP     = "app1";
+    private static final String INST_A  = "prod-app1-0-aaa11111";
+    private static final String INST_B  = "prod-app1-1-aaa11111";
+    private static final String INST_C  = "prod-app1-0-bbb22222";
+
+    @BeforeEach
+    void setUp() throws Exception {
+        HikariDataSource ds = new HikariDataSource();
+        ds.setJdbcUrl(clickhouse.getJdbcUrl());
+        ds.setUsername(clickhouse.getUsername());
+        ds.setPassword(clickhouse.getPassword());
+
+        jdbc  = new JdbcTemplate(ds);
+        ClickHouseTestHelper.executeInitSql(jdbc);
+        jdbc.execute("TRUNCATE TABLE logs");
+
+        store = new ClickHouseLogStore(TENANT, jdbc);
+
+        Instant base = Instant.parse("2026-04-23T09:00:00Z");
+        seedLog(INST_A, base,               "msg-from-replica-0-gen-aaa");
+        seedLog(INST_B, base.plusSeconds(1), "msg-from-replica-1-gen-aaa");
+        seedLog(INST_C, base.plusSeconds(2), "msg-from-replica-0-gen-bbb");
+    }
+
+    @AfterEach
+    void tearDown() {
+        jdbc.execute("TRUNCATE TABLE logs");
+    }
+
+    private void seedLog(String instanceId, Instant ts, String message) {
+        LogEntry entry = new LogEntry(ts, "INFO", "com.example.Svc", message, "main", null, null);
+        store.insertBufferedBatch(List.of(
+                new BufferedLogEntry(TENANT, ENV, instanceId, APP, entry)));
+    }
+
+    // ── Tests ─────────────────────────────────────────────────────────────
+
+    @Test
+    void search_instanceIds_returnsOnlyMatchingInstances() {
+        LogSearchResponse result = store.search(new LogSearchRequest(
+                null,
+                List.of(),
+                APP,
+                null,
+                null,
+                null,
+                ENV,
+                List.of(),
+                null,
+                null,
+                null,
+                100,
+                "desc",
+                List.of(INST_A, INST_B)));
+
+        assertThat(result.data()).hasSize(2);
+        assertThat(result.data())
+                .extracting(r -> r.instanceId())
+                .containsExactlyInAnyOrder(INST_A, INST_B);
+        assertThat(result.data())
+                .extracting(r -> r.instanceId())
+                .doesNotContain(INST_C);
+    }
+
+    @Test
+    void search_emptyInstanceIds_returnsAllRows() {
+        LogSearchResponse result = store.search(new LogSearchRequest(
+                null,
+                List.of(),
+                APP,
+                null,
+                null,
+                null,
+                ENV,
+                List.of(),
+                null,
+                null,
+                null,
+                100,
+                "desc",
+                List.of()));
+
+        assertThat(result.data()).hasSize(3);
+    }
+
+    @Test
+    void search_nullInstanceIds_returnsAllRows() {
+        LogSearchResponse result = store.search(new LogSearchRequest(
+                null,
+                List.of(),
+                APP,
+                null,
+                null,
+                null,
+                ENV,
+                List.of(),
+                null,
+                null,
+                null,
+                100,
+                "desc",
+                null));
+
+        assertThat(result.data()).hasSize(3);
+    }
+
+    @Test
+    void search_instanceIds_singleValue_filtersToOneReplica() {
+        LogSearchResponse result = store.search(new LogSearchRequest(
+                null,
+                List.of(),
+                APP,
+                null,
+                null,
+                null,
+                ENV,
+                List.of(),
+                null,
+                null,
+                null,
+                100,
+                "desc",
+                List.of(INST_C)));
+
+        assertThat(result.data()).hasSize(1);
+        assertThat(result.data().get(0).instanceId()).isEqualTo(INST_C);
+        assertThat(result.data().get(0).message()).isEqualTo("msg-from-replica-0-gen-bbb");
+    }
+
+    @Test
+    void search_instanceIds_doesNotConflictWithSingularInstanceId() {
+        // Singular instanceId=INST_A AND instanceIds=[INST_B] → intersection = empty
+        // (both conditions apply: instance_id = A AND instance_id IN (B))
+        LogSearchResponse result = store.search(new LogSearchRequest(
+                null,
+                List.of(),
+                APP,
+                INST_A,         // singular
+                null,
+                null,
+                ENV,
+                List.of(),
+                null,
+                null,
+                null,
+                100,
+                "desc",
+                List.of(INST_B)));  // plural — no overlap
+
+        assertThat(result.data()).isEmpty();
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexIT.java
@@ -2,6 +2,7 @@ package com.cameleer.server.app.search;

 import com.cameleer.server.app.storage.ClickHouseExecutionStore;
 import com.cameleer.server.core.ingestion.MergedExecution;
+import com.cameleer.server.core.search.AttributeFilter;
 import com.cameleer.server.core.search.ExecutionSummary;
 import com.cameleer.server.core.search.SearchRequest;
 import com.cameleer.server.core.search.SearchResult;
@@ -62,7 +63,7 @@ class ClickHouseSearchIndexIT {
                500L,
                "", "", "", "", "", "",
                "hash-abc", "FULL",
-                "{\"order\":\"12345\"}", "", "", "", "", "", "{\"env\":\"prod\"}",
+                "", "", "", "", "", "", "{\"order\":\"12345\",\"tenant\":\"acme\"}",
                "", "",
                false, false,
                null, null
@@ -79,7 +80,7 @@ class ClickHouseSearchIndexIT {
                "java.lang.NPE\n  at Foo.bar(Foo.java:42)",
                "NullPointerException", "RUNTIME", "", "",
                "", "FULL",
-                "", "", "", "", "", "", "",
+                "", "", "", "", "", "", "{\"order\":\"99999\"}",
                "", "",
                false, false,
                null, null
@@ -118,7 +119,7 @@ class ClickHouseSearchIndexIT {
    void search_withNoFilters_returnsAllExecutions() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -130,7 +131,7 @@ class ClickHouseSearchIndexIT {
    void search_byStatus_filtersCorrectly() {
        SearchRequest request = new SearchRequest(
                "FAILED", null, null, null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -145,7 +146,7 @@ class ClickHouseSearchIndexIT {
        // Time window covering exec-1 and exec-2 but not exec-3
        SearchRequest request = new SearchRequest(
                null, baseTime, baseTime.plusMillis(1500), null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -158,7 +159,7 @@ class ClickHouseSearchIndexIT {
    void search_fullTextSearch_findsInErrorMessage() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, "NullPointerException", null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -170,7 +171,7 @@ class ClickHouseSearchIndexIT {
    void search_fullTextSearch_findsInInputBody() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, "12345", null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -182,7 +183,7 @@ class ClickHouseSearchIndexIT {
    void search_textInBody_searchesProcessorBodies() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, "Hello World", null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -194,7 +195,7 @@ class ClickHouseSearchIndexIT {
    void search_textInHeaders_searchesProcessorHeaders() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, "secret-token", null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -206,7 +207,7 @@ class ClickHouseSearchIndexIT {
    void search_textInErrors_searchesErrorFields() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, null, "Foo.bar",
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -218,7 +219,7 @@ class ClickHouseSearchIndexIT {
    void search_withHighlight_returnsSnippet() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, "NullPointerException", null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -230,7 +231,7 @@ class ClickHouseSearchIndexIT {
    void search_pagination_works() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 2, null, null, null);
+                null, null, null, null, null, 0, 2, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -244,7 +245,7 @@ class ClickHouseSearchIndexIT {
    void search_byApplication_filtersCorrectly() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, null, null,
-                null, null, null, "other-app", null, 0, 50, null, null, null);
+                null, null, null, "other-app", null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -256,7 +257,7 @@ class ClickHouseSearchIndexIT {
    void search_byAgentIds_filtersCorrectly() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, null, null, null, null, null,
-                null, null, null, null, List.of("agent-b"), 0, 50, null, null, null);
+                null, null, null, null, List.of("agent-b"), 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -268,7 +269,7 @@ class ClickHouseSearchIndexIT {
    void count_returnsMatchingCount() {
        SearchRequest request = new SearchRequest(
                "COMPLETED", null, null, null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        long count = searchIndex.count(request);

@@ -279,7 +280,7 @@ class ClickHouseSearchIndexIT {
    void search_multipleStatusFilter_works() {
        SearchRequest request = new SearchRequest(
                "COMPLETED,FAILED", null, null, null, null, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -290,7 +291,7 @@ class ClickHouseSearchIndexIT {
    void search_byCorrelationId_filtersCorrectly() {
        SearchRequest request = new SearchRequest(
                null, null, null, null, null, "corr-1", null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

@@ -302,7 +303,62 @@ class ClickHouseSearchIndexIT {
    void search_byDurationRange_filtersCorrectly() {
        SearchRequest request = new SearchRequest(
                null, null, null, 300L, 600L, null, null, null, null, null,
-                null, null, null, null, null, 0, 50, null, null, null);
+                null, null, null, null, null, 0, 50, null, null, null, null);
+
+        SearchResult<ExecutionSummary> result = searchIndex.search(request);
+
+        assertThat(result.total()).isEqualTo(1);
+        assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
+    }
+
+    @Test
+    void search_byAttributeFilter_exactMatch_matchesExec1() {
+        SearchRequest request = new SearchRequest(
+                null, null, null, null, null, null, null, null, null, null,
+                null, null, null, null, null, 0, 50, null, null, null, null,
+                List.of(new AttributeFilter("order", "12345")));
+
+        SearchResult<ExecutionSummary> result = searchIndex.search(request);
+
+        assertThat(result.total()).isEqualTo(1);
+        assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
+    }
+
+    @Test
+    void search_byAttributeFilter_keyOnly_matchesExec1AndExec2() {
+        SearchRequest request = new SearchRequest(
+                null, null, null, null, null, null, null, null, null, null,
+                null, null, null, null, null, 0, 50, null, null, null, null,
+                List.of(new AttributeFilter("order", null)));
+
+        SearchResult<ExecutionSummary> result = searchIndex.search(request);
+
+        assertThat(result.total()).isEqualTo(2);
+        assertThat(result.data()).extracting(ExecutionSummary::executionId)
+                .containsExactlyInAnyOrder("exec-1", "exec-2");
+    }
+
+    @Test
+    void search_byAttributeFilter_wildcardValue_matchesExec1Only() {
+        SearchRequest request = new SearchRequest(
+                null, null, null, null, null, null, null, null, null, null,
+                null, null, null, null, null, 0, 50, null, null, null, null,
+                List.of(new AttributeFilter("order", "123*")));
+
+        SearchResult<ExecutionSummary> result = searchIndex.search(request);
+
+        assertThat(result.total()).isEqualTo(1);
+        assertThat(result.data().get(0).executionId()).isEqualTo("exec-1");
+    }
+
+    @Test
+    void search_byAttributeFilter_multipleFiltersAreAnded() {
+        SearchRequest request = new SearchRequest(
+                null, null, null, null, null, null, null, null, null, null,
+                null, null, null, null, null, 0, 50, null, null, null, null,
+                List.of(
+                        new AttributeFilter("order", "12345"),
+                        new AttributeFilter("tenant", "acme")));

        SearchResult<ExecutionSummary> result = searchIndex.search(request);

--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseChunkPipelineIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseChunkPipelineIT.java
@@ -157,7 +157,7 @@ class ClickHouseChunkPipelineIT {
                null, null, null, null, null, null,
                "ORD-123", null, null, null,
                null, null, null, null, null,
-                0, 50, null, null, null));
+                0, 50, null, null, null, null));
        assertThat(result.total()).isEqualTo(1);
        assertThat(result.data().get(0).executionId()).isEqualTo("pipeline-1");
        assertThat(result.data().get(0).status()).isEqualTo("COMPLETED");
@@ -168,7 +168,7 @@ class ClickHouseChunkPipelineIT {
                null, null, null, null, null, null,
                null, "ABC-123", null, null,
                null, null, null, null, null,
-                0, 50, null, null, null));
+                0, 50, null, null, null, null));
        assertThat(bodyResult.total()).isEqualTo(1);

        // Verify iteration data in processor_executions
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseDiagramStoreIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseDiagramStoreIT.java
@@ -155,21 +155,51 @@ class ClickHouseDiagramStoreIT {
    }

    @Test
-    void findContentHashForRouteByAgents_returnsHash() {
-        RouteGraph graph = buildGraph("route-4", "node-z");
-        store.store(tagged("agent-10", "app-b", graph));
-        store.store(tagged("agent-20", "app-b", graph));
+    void findLatestContentHashForAppRoute_returnsLatestAcrossInstances() throws InterruptedException {
+        // v1 published by one agent, v2 by a different agent. The app+env+route
+        // resolver must pick v2 regardless of which instance produced it, and
+        // must keep working even if neither instance is "live" anywhere.
+        RouteGraph v1 = buildGraph("evolving-route", "n-a");
+        v1.setDescription("v1");
+        RouteGraph v2 = buildGraph("evolving-route", "n-a", "n-b");
+        v2.setDescription("v2");

-        Optional<String> result = store.findContentHashForRouteByAgents(
-                "route-4", java.util.List.of("agent-10", "agent-20"));
+        store.store(new TaggedDiagram("publisher-old", "versioned-app", "default", v1));
+        Thread.sleep(10);
+        store.store(new TaggedDiagram("publisher-new", "versioned-app", "default", v2));

-        assertThat(result).isPresent();
+        Optional<String> hashOpt = store.findLatestContentHashForAppRoute(
+                "versioned-app", "evolving-route", "default");
+        assertThat(hashOpt).isPresent();
+
+        RouteGraph retrieved = store.findByContentHash(hashOpt.get()).orElseThrow();
+        assertThat(retrieved.getDescription()).isEqualTo("v2");
    }

    @Test
-    void findContentHashForRouteByAgents_emptyListReturnsEmpty() {
-        Optional<String> result = store.findContentHashForRouteByAgents("route-x", java.util.List.of());
-        assertThat(result).isEmpty();
+    void findLatestContentHashForAppRoute_isolatesByAppAndEnv() {
+        RouteGraph graph = buildGraph("shared-route", "node-1");
+        store.store(new TaggedDiagram("a1", "app-alpha", "dev", graph));
+        store.store(new TaggedDiagram("a2", "app-beta", "prod", graph));
+
+        // Same route id exists across two (app, env) combos. The resolver must
+        // return empty for a mismatch on either dimension.
+        assertThat(store.findLatestContentHashForAppRoute("app-alpha", "shared-route", "dev"))
+                .isPresent();
+        assertThat(store.findLatestContentHashForAppRoute("app-alpha", "shared-route", "prod"))
+                .isEmpty();
+        assertThat(store.findLatestContentHashForAppRoute("app-beta", "shared-route", "dev"))
+                .isEmpty();
+        assertThat(store.findLatestContentHashForAppRoute("app-gamma", "shared-route", "dev"))
+                .isEmpty();
+    }
+
+    @Test
+    void findLatestContentHashForAppRoute_emptyInputsReturnEmpty() {
+        assertThat(store.findLatestContentHashForAppRoute(null, "r", "default")).isEmpty();
+        assertThat(store.findLatestContentHashForAppRoute("app", null, "default")).isEmpty();
+        assertThat(store.findLatestContentHashForAppRoute("app", "r", null)).isEmpty();
+        assertThat(store.findLatestContentHashForAppRoute("", "r", "default")).isEmpty();
    }

    @Test
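The three `findLatestContentHashForAppRoute` tests above define the resolver's contract. It returns the newest hash for an (app, route, env) triple regardless of which instance published it, returns nothing when either the app or the env does not match, and short-circuits on null or empty inputs. Below is a minimal in-memory sketch of that contract. It assumes a hypothetical `DiagramRow` record with a publish timestamp; the real store answers this with a ClickHouse query that is not shown here. The sketch also treats a blank route or env the same as a blank app, which the tests only check for the app argument.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical row shape for illustration only; the real store persists diagrams in ClickHouse.
record DiagramRow(String instanceId, String applicationId, String environment,
                  String routeId, String contentHash, Instant createdAt) {}

final class LatestHashResolver {
    private LatestHashResolver() {}

    /** Newest content hash for (app, route, env), whichever instance published it. */
    static Optional<String> findLatestContentHashForAppRoute(
            List<DiagramRow> rows, String app, String routeId, String env) {
        // Null or blank inputs yield empty, mirroring findLatestContentHashForAppRoute_emptyInputsReturnEmpty.
        if (isBlank(app) || isBlank(routeId) || isBlank(env)) {
            return Optional.empty();
        }
        return rows.stream()
                // Match on all three dimensions; instanceId is deliberately ignored.
                .filter(r -> app.equals(r.applicationId())
                        && routeId.equals(r.routeId())
                        && env.equals(r.environment()))
                // The latest publish wins, so v2 beats v1 even when a different agent produced it.
                .max(Comparator.comparing(DiagramRow::createdAt))
                .map(DiagramRow::contentHash);
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Because the lookup key is (app, route, env) and not an agent list, the resolver keeps working after every publishing instance has gone away. The removed `findContentHashForRouteByAgents` could not do that.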
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseServerMetricsStoreIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/ClickHouseServerMetricsStoreIT.java
@@ -0,0 +1,117 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.server.core.storage.model.ServerMetricSample;
+import com.zaxxer.hikari.HikariDataSource;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.testcontainers.clickhouse.ClickHouseContainer;
+import org.testcontainers.junit.jupiter.Container;
+import org.testcontainers.junit.jupiter.Testcontainers;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+@Testcontainers
+class ClickHouseServerMetricsStoreIT {
+
+    @Container
+    static final ClickHouseContainer clickhouse =
+            new ClickHouseContainer("clickhouse/clickhouse-server:24.12");
+
+    private JdbcTemplate jdbc;
+    private ClickHouseServerMetricsStore store;
+
+    @BeforeEach
+    void setUp() {
+        HikariDataSource ds = new HikariDataSource();
+        ds.setJdbcUrl(clickhouse.getJdbcUrl());
+        ds.setUsername(clickhouse.getUsername());
+        ds.setPassword(clickhouse.getPassword());
+
+        jdbc = new JdbcTemplate(ds);
+
+        jdbc.execute("""
+            CREATE TABLE IF NOT EXISTS server_metrics (
+                tenant_id          LowCardinality(String) DEFAULT 'default',
+                collected_at       DateTime64(3),
+                server_instance_id LowCardinality(String),
+                metric_name        LowCardinality(String),
+                metric_type        LowCardinality(String),
+                statistic          LowCardinality(String) DEFAULT 'value',
+                metric_value       Float64,
+                tags               Map(String, String) DEFAULT map(),
+                server_received_at DateTime64(3) DEFAULT now64(3)
+            )
+            ENGINE = MergeTree()
+            ORDER BY (tenant_id, collected_at, server_instance_id, metric_name, statistic)
+            """);
+
+        jdbc.execute("TRUNCATE TABLE server_metrics");
+
+        store = new ClickHouseServerMetricsStore(jdbc);
+    }
+
+    @Test
+    void insertBatch_roundTripsAllColumns() {
+        Instant ts = Instant.parse("2026-04-23T12:00:00Z");
+        store.insertBatch(List.of(
+                new ServerMetricSample("tenant-a", ts, "srv-1",
+                        "cameleer.ingestion.drops", "counter", "count", 17.0,
+                        Map.of("reason", "buffer_full")),
+                new ServerMetricSample("tenant-a", ts, "srv-1",
+                        "jvm.memory.used", "gauge", "value", 1_048_576.0,
+                        Map.of("area", "heap", "id", "G1 Eden Space"))
+        ));
+
+        Integer count = jdbc.queryForObject(
+                "SELECT count() FROM server_metrics WHERE tenant_id = 'tenant-a'",
+                Integer.class);
+        assertThat(count).isEqualTo(2);
+
+        Double dropsValue = jdbc.queryForObject(
+                """
+                SELECT metric_value FROM server_metrics
+                WHERE tenant_id = 'tenant-a'
+                  AND server_instance_id = 'srv-1'
+                  AND metric_name = 'cameleer.ingestion.drops'
+                  AND statistic = 'count'
+                """,
+                Double.class);
+        assertThat(dropsValue).isEqualTo(17.0);
+
+        String heapArea = jdbc.queryForObject(
+                """
+                SELECT tags['area'] FROM server_metrics
+                WHERE tenant_id = 'tenant-a'
+                  AND metric_name = 'jvm.memory.used'
+                """,
+                String.class);
+        assertThat(heapArea).isEqualTo("heap");
+    }
+
+    @Test
+    void insertBatch_emptyList_doesNothing() {
+        store.insertBatch(List.of());
+
+        Integer count = jdbc.queryForObject(
+                "SELECT count() FROM server_metrics", Integer.class);
+        assertThat(count).isEqualTo(0);
+    }
+
+    @Test
+    void insertBatch_nullTags_storesEmptyMap() {
+        store.insertBatch(List.of(
+                new ServerMetricSample("default", Instant.parse("2026-04-23T12:00:00Z"),
+                        "srv-2", "process.cpu.usage", "gauge", "value", 0.12, null)
+        ));
+
+        Integer count = jdbc.queryForObject(
+                "SELECT count() FROM server_metrics WHERE server_instance_id = 'srv-2'",
+                Integer.class);
+        assertThat(count).isEqualTo(1);
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/PostgresDeploymentRepositoryCreatedByIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/PostgresDeploymentRepositoryCreatedByIT.java
@@ -0,0 +1,77 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentService;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+import java.util.UUID;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class PostgresDeploymentRepositoryCreatedByIT extends AbstractPostgresIT {
+
+    @Autowired DeploymentService deploymentService;
+    @Autowired JdbcTemplate jdbc;
+
+    private UUID appId;
+    private UUID envId;
+    private UUID versionId;
+
+    @BeforeEach
+    void seedAppAndVersion() {
+        // Clean up to avoid conflicts across test runs
+        jdbc.update("DELETE FROM deployments");
+        jdbc.update("DELETE FROM app_versions");
+        jdbc.update("DELETE FROM apps");
+        jdbc.update("DELETE FROM users WHERE user_id IN ('alice', 'bob')");
+
+        envId = jdbc.queryForObject(
+            "SELECT id FROM environments WHERE slug = 'default'", UUID.class);
+
+        // Seed users (alice, bob) — use the bare user_id convention; provider is NOT NULL
+        jdbc.update("INSERT INTO users (user_id, provider) VALUES (?, 'LOCAL') " +
+                    "ON CONFLICT (user_id) DO NOTHING", "alice");
+        jdbc.update("INSERT INTO users (user_id, provider) VALUES (?, 'LOCAL') " +
+                    "ON CONFLICT (user_id) DO NOTHING", "bob");
+
+        // Seed app
+        appId = UUID.randomUUID();
+        jdbc.update("INSERT INTO apps (id, environment_id, slug, display_name) " +
+                    "VALUES (?, ?, 'test-app', 'Test App')",
+                    appId, envId);
+
+        // Seed version
+        versionId = UUID.randomUUID();
+        jdbc.update("INSERT INTO app_versions (id, app_id, version, jar_path, jar_checksum) " +
+                    "VALUES (?, ?, 1, '/tmp/x.jar', 'abc')",
+                    versionId, appId);
+    }
+
+    @AfterEach
+    void cleanup() {
+        jdbc.update("DELETE FROM deployments");
+        jdbc.update("DELETE FROM app_versions");
+        jdbc.update("DELETE FROM apps");
+        jdbc.update("DELETE FROM users WHERE user_id IN ('alice', 'bob')");
+    }
+
+    @Test
+    void createDeployment_persists_createdBy_and_returns_it() {
+        Deployment d = deploymentService.createDeployment(appId, versionId, envId, "alice");
+        assertThat(d.createdBy()).isEqualTo("alice");
+        String fromDb = jdbc.queryForObject(
+            "SELECT created_by FROM deployments WHERE id = ?", String.class, d.id());
+        assertThat(fromDb).isEqualTo("alice");
+    }
+
+    @Test
+    void promote_persists_createdBy() {
+        Deployment promoted = deploymentService.promote(appId, versionId, envId, "bob");
+        assertThat(promoted.createdBy()).isEqualTo("bob");
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/PostgresDeploymentRepositoryIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/PostgresDeploymentRepositoryIT.java
@@ -0,0 +1,129 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.AbstractPostgresIT;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentConfigSnapshot;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+
+import java.util.Map;
+import java.util.UUID;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class PostgresDeploymentRepositoryIT extends AbstractPostgresIT {
+
+    @Autowired PostgresDeploymentRepository repository;
+
+    private UUID envId;
+    private UUID appId;
+    private UUID appVersionId;
+
+    @BeforeEach
+    void setup() {
+        envId = UUID.randomUUID();
+        jdbcTemplate.update(
+                "INSERT INTO environments (id, slug, display_name) VALUES (?, ?, ?)",
+                envId, "test-env-" + envId, "Test Env");
+
+        appId = UUID.randomUUID();
+        jdbcTemplate.update(
+                "INSERT INTO apps (id, environment_id, slug, display_name) VALUES (?, ?, ?, ?)",
+                appId, envId, "app-it-" + appId, "App IT");
+
+        appVersionId = UUID.randomUUID();
+        jdbcTemplate.update(
+                "INSERT INTO app_versions (id, app_id, version, jar_path, jar_checksum) VALUES (?, ?, ?, ?, ?)",
+                appVersionId, appId, 1, "/tmp/app.jar", "deadbeef");
+    }
+
+    @AfterEach
+    void cleanup() {
+        jdbcTemplate.update("DELETE FROM deployments WHERE app_id = ?", appId);
+        jdbcTemplate.update("DELETE FROM app_versions WHERE app_id = ?", appId);
+        jdbcTemplate.update("DELETE FROM apps WHERE id = ?", appId);
+        jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
+    }
+
+    @Test
+    void deployedConfigSnapshot_roundtrips() {
+        // given — create a deployment then store a snapshot
+        ApplicationConfig agentConfig = new ApplicationConfig();
+        agentConfig.setApplication("app-it");
+        agentConfig.setEnvironment("staging");
+        agentConfig.setVersion(3);
+        agentConfig.setSamplingRate(0.5);
+
+        UUID jarVersionId = UUID.randomUUID();
+        DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
+                jarVersionId,
+                agentConfig,
+                Map.of("memoryLimitMb", 1024, "replicas", 2),
+                null
+        );
+
+        // pre-V4 rows: no creator (createdBy is nullable)
+        UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container", null);
+        repository.saveDeployedConfigSnapshot(deploymentId, snapshot);
+
+        // when — load it back
+        Deployment loaded = repository.findById(deploymentId).orElseThrow();
+
+        // then
+        assertThat(loaded.deployedConfigSnapshot().jarVersionId()).isEqualTo(jarVersionId);
+        assertThat(loaded.deployedConfigSnapshot().agentConfig().getSamplingRate()).isEqualTo(0.5);
+        assertThat(loaded.deployedConfigSnapshot().containerConfig()).containsEntry("memoryLimitMb", 1024);
+    }
+
+    @Test
+    void deployedConfigSnapshot_nullByDefault() {
+        // deployments created without a snapshot must return null (not throw)
+        UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-null", null);
+
+        Deployment loaded = repository.findById(deploymentId).orElseThrow();
+
+        assertThat(loaded.deployedConfigSnapshot()).isNull();
+    }
+
+    @Test
+    void deleteFailedByAppAndEnvironment_keepsStoppedAndActive() {
+        // given: one STOPPED (checkpoint), one FAILED, one RUNNING
+        UUID stoppedId = repository.create(appId, appVersionId, envId, "stopped", null);
+        repository.updateStatus(stoppedId, com.cameleer.server.core.runtime.DeploymentStatus.STOPPED, null, null);
+
+        UUID failedId = repository.create(appId, appVersionId, envId, "failed", null);
+        repository.updateStatus(failedId, com.cameleer.server.core.runtime.DeploymentStatus.FAILED, null, "boom");
+
+        UUID runningId = repository.create(appId, appVersionId, envId, "running", null);
+        repository.updateStatus(runningId, com.cameleer.server.core.runtime.DeploymentStatus.RUNNING, "c1", null);
+
+        // when
+        repository.deleteFailedByAppAndEnvironment(appId, envId);
+
+        // then: STOPPED and RUNNING survive; FAILED is gone
+        assertThat(repository.findById(stoppedId)).isPresent();
+        assertThat(repository.findById(runningId)).isPresent();
+        assertThat(repository.findById(failedId)).isEmpty();
+    }
+
+    @Test
+    void deployedConfigSnapshot_canBeClearedToNull() {
+        UUID jarVersionId = UUID.randomUUID();
+        DeploymentConfigSnapshot snapshot = new DeploymentConfigSnapshot(
+                jarVersionId,
+                new ApplicationConfig(),
+                Map.of(),
+                null
+        );
+
+        UUID deploymentId = repository.create(appId, appVersionId, envId, "test-container-clear", null);
+        repository.saveDeployedConfigSnapshot(deploymentId, snapshot);
+        repository.saveDeployedConfigSnapshot(deploymentId, null);
+
+        Deployment loaded = repository.findById(deploymentId).orElseThrow();
+        assertThat(loaded.deployedConfigSnapshot()).isNull();
+    }
+}
--- a/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/V4DeploymentCreatedByMigrationIT.java
+++ b/cameleer-server-app/src/test/java/com/cameleer/server/app/storage/V4DeploymentCreatedByMigrationIT.java
@@ -0,0 +1,58 @@
+package com.cameleer.server.app.storage;
+
+import com.cameleer.server.app.AbstractPostgresIT;
+import org.junit.jupiter.api.Test;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+import java.util.List;
+import java.util.Map;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+class V4DeploymentCreatedByMigrationIT extends AbstractPostgresIT {
+
+    @Autowired JdbcTemplate jdbc;
+
+    @Test
+    void created_by_column_exists_with_correct_type_and_nullable() {
+        // Scope to current schema — Testcontainers reuse can otherwise leave
+        // a previous run's tenant_default schema visible alongside public.
+        List<Map<String, Object>> cols = jdbc.queryForList(
+            "SELECT column_name, data_type, is_nullable " +
+            "FROM information_schema.columns " +
+            "WHERE table_name = 'deployments' AND column_name = 'created_by' " +
+            "  AND table_schema = current_schema()"
+        );
+        assertThat(cols).hasSize(1);
+        assertThat(cols.get(0)).containsEntry("data_type", "text");
+        assertThat(cols.get(0)).containsEntry("is_nullable", "YES");
+    }
+
+    @Test
+    void created_by_index_exists() {
+        Integer count = jdbc.queryForObject(
+            "SELECT count(*)::int FROM pg_indexes " +
+            "WHERE tablename = 'deployments' AND indexname = 'idx_deployments_created_by' " +
+            "  AND schemaname = current_schema()",
+            Integer.class
+        );
+        assertThat(count).isEqualTo(1);
+    }
+
+    @Test
+    void created_by_has_fk_to_users() {
+        Integer count = jdbc.queryForObject(
+            "SELECT count(*)::int FROM information_schema.table_constraints tc " +
+            "JOIN information_schema.constraint_column_usage ccu " +
+            "  ON tc.constraint_name = ccu.constraint_name " +
+            "WHERE tc.table_name = 'deployments' " +
+            "  AND tc.constraint_type = 'FOREIGN KEY' " +
+            "  AND ccu.table_name = 'users' " +
+            "  AND ccu.column_name = 'user_id' " +
+            "  AND tc.table_schema = current_schema()",
+            Integer.class
+        );
+        assertThat(count).isGreaterThanOrEqualTo(1);
+    }
+}
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java
@@ -3,5 +3,6 @@ package com.cameleer.server.core.admin;
 public enum AuditCategory {
    INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
    OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
-    ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
+    ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE,
+    DEPLOYMENT
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertRule.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertRule.java
@@ -35,4 +35,15 @@ public record AlertRule(
        targets   = targets   == null ? List.of() : List.copyOf(targets);
        evalState = evalState == null ? Map.of()  : Map.copyOf(evalState);
    }
+
+    public AlertRule withEvalState(Map<String, Object> newEvalState) {
+        return new AlertRule(
+                id, environmentId, name, description, severity, enabled,
+                conditionKind, condition, evaluationIntervalSeconds,
+                forDurationSeconds, reNotifyMinutes,
+                notificationTitleTmpl, notificationMessageTmpl,
+                webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil,
+                newEvalState,
+                createdAt, createdBy, updatedAt, updatedBy);
+    }
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ExchangeMatchCondition.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ExchangeMatchCondition.java
@@ -9,8 +9,7 @@ public record ExchangeMatchCondition(
        ExchangeFilter filter,
        FireMode fireMode,
        Integer threshold,                // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
-        Integer windowSeconds,            // required when COUNT_IN_WINDOW
-        Integer perExchangeLingerSeconds  // required when PER_EXCHANGE
+        Integer windowSeconds             // required when COUNT_IN_WINDOW
 ) implements AlertCondition {

    public ExchangeMatchCondition {
@@ -18,8 +17,6 @@ public record ExchangeMatchCondition(
            throw new IllegalArgumentException("fireMode is required (PER_EXCHANGE or COUNT_IN_WINDOW)");
        if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
            throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
-        if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
-            throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
    }

    @Override
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/ConfigMerger.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/ConfigMerger.java
@@ -33,7 +33,9 @@ public final class ConfigMerger {
                boolVal(appConfig, envConfig, "replayEnabled", true),
                stringVal(appConfig, envConfig, "runtimeType", "auto"),
                stringVal(appConfig, envConfig, "customArgs", ""),
-                stringList(appConfig, envConfig, "extraNetworks")
+                stringList(appConfig, envConfig, "extraNetworks"),
+                boolVal(appConfig, envConfig, "externalRouting", true),
+                global.certResolver()
        );
    }

@@ -107,6 +109,7 @@ public final class ConfigMerger {
            int cpuRequest,
            String routingMode,
            String routingDomain,
-            String serverUrl
+            String serverUrl,
+            String certResolver
    ) {}
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Deployment.java
@@ -19,14 +19,23 @@ public record Deployment(
        String containerName,
        String errorMessage,
        Map<String, Object> resolvedConfig,
+        DeploymentConfigSnapshot deployedConfigSnapshot,
        Instant deployedAt,
        Instant stoppedAt,
-        Instant createdAt
+        Instant createdAt,
+        String createdBy
 ) {
    public Deployment withStatus(DeploymentStatus newStatus) {
        return new Deployment(id, appId, appVersionId, environmentId, newStatus,
                targetState, deploymentStrategy, replicaStates, deployStage,
                containerId, containerName, errorMessage, resolvedConfig,
-                deployedAt, stoppedAt, createdAt);
+                deployedConfigSnapshot, deployedAt, stoppedAt, createdAt, createdBy);
+    }
+
+    public Deployment withDeployedConfigSnapshot(DeploymentConfigSnapshot snapshot) {
+        return new Deployment(id, appId, appVersionId, environmentId, status,
+                targetState, deploymentStrategy, replicaStates, deployStage,
+                containerId, containerName, errorMessage, resolvedConfig,
+                snapshot, deployedAt, stoppedAt, createdAt, createdBy);
    }
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentConfigSnapshot.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentConfigSnapshot.java
@@ -0,0 +1,22 @@
+package com.cameleer.server.core.runtime;
+
+import com.cameleer.common.model.ApplicationConfig;
+
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * Snapshot of the config that was deployed, captured at the moment a deployment
+ * transitions to RUNNING. Used for "last known good" restore (checkpoints) and
+ * for dirty-state detection on the deployment page.
+ *
+ * <p>This is persisted as JSONB in {@code deployments.deployed_config_snapshot}.</p>
+ */
+public record DeploymentConfigSnapshot(
+        UUID jarVersionId,
+        ApplicationConfig agentConfig,
+        Map<String, Object> containerConfig,
+        List<String> sensitiveKeys
+) {
+}
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentRepository.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentRepository.java
@@ -9,9 +9,11 @@ public interface DeploymentRepository {
    List<Deployment> findByEnvironmentId(UUID environmentId);
    Optional<Deployment> findById(UUID id);
    Optional<Deployment> findActiveByAppIdAndEnvironmentId(UUID appId, UUID environmentId);
-    UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName);
+    Optional<Deployment> findActiveByAppIdAndEnvironmentIdExcluding(UUID appId, UUID environmentId, UUID excludeDeploymentId);
+    UUID create(UUID appId, UUID appVersionId, UUID environmentId, String containerName, String createdBy);
    void updateStatus(UUID id, DeploymentStatus status, String containerId, String errorMessage);
    void markDeployed(UUID id);
    void markStopped(UUID id);
-    void deleteTerminalByAppAndEnvironment(UUID appId, UUID environmentId);
+    /** Delete FAILED deployments for this (app, env). STOPPED deployments are preserved as checkpoints. */
+    void deleteFailedByAppAndEnvironment(UUID appId, UUID environmentId);
 }
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentService.java
@@ -23,19 +23,19 @@ public class DeploymentService {
    public Deployment getById(UUID id) { return deployRepo.findById(id).orElseThrow(() -> new IllegalArgumentException("Deployment not found: " + id)); }

    /** Create a deployment record. Actual container start is handled by DeploymentExecutor (async). */
-    public Deployment createDeployment(UUID appId, UUID appVersionId, UUID environmentId) {
+    public Deployment createDeployment(UUID appId, UUID appVersionId, UUID environmentId, String createdBy) {
        App app = appService.getById(appId);
        Environment env = envService.getById(environmentId);
        String containerName = env.slug() + "-" + app.slug();

-        deployRepo.deleteTerminalByAppAndEnvironment(appId, environmentId);
-        UUID deploymentId = deployRepo.create(appId, appVersionId, environmentId, containerName);
+        deployRepo.deleteFailedByAppAndEnvironment(appId, environmentId);
+        UUID deploymentId = deployRepo.create(appId, appVersionId, environmentId, containerName, createdBy);
        return deployRepo.findById(deploymentId).orElseThrow();
    }

    /** Promote: deploy the same app version to a different environment. */
-    public Deployment promote(UUID appId, UUID appVersionId, UUID targetEnvironmentId) {
-        return createDeployment(appId, appVersionId, targetEnvironmentId);
+    public Deployment promote(UUID appId, UUID appVersionId, UUID targetEnvironmentId, String createdBy) {
+        return createDeployment(appId, appVersionId, targetEnvironmentId, createdBy);
    }

    public void markRunning(UUID deploymentId, String containerId) {
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DeploymentStrategy.java
@@ -0,0 +1,31 @@
+package com.cameleer.server.core.runtime;
+
+/**
+ * Supported deployment strategies. Persisted as a kebab-case string on
+ * ApplicationConfig / ResolvedContainerConfig; {@link #fromWire(String)} is
+ * the only conversion entry point and falls back to {@link #BLUE_GREEN} for
+ * unknown or null input so the executor never has to null-check.
+ */
+public enum DeploymentStrategy {
+    BLUE_GREEN("blue-green"),
+    ROLLING("rolling");
+
+    private final String wire;
+
+    DeploymentStrategy(String wire) {
+        this.wire = wire;
+    }
+
+    public String toWire() {
+        return wire;
+    }
+
+    public static DeploymentStrategy fromWire(String value) {
+        if (value == null) return BLUE_GREEN;
+        String normalized = value.trim().toLowerCase(java.util.Locale.ROOT);
+        for (DeploymentStrategy s : values()) {
+            if (s.wire.equals(normalized)) return s;
+        }
+        return BLUE_GREEN;
+    }
+}
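`fromWire` is the only parse path for the persisted strategy string, so its fallback and case-folding rules matter. Below is a minimal standalone sketch that copies the enum, assuming `Locale.ROOT` is used for lowercasing. With the platform default locale, a Turkish JVM lowercases `"ROLLING"` to `"rollıng"` (dotless ı), which silently falls back to `BLUE_GREEN`. `DeploymentStrategyDemo` is a hypothetical driver, not part of the change.

```java
import java.util.Locale;

// Standalone copy of DeploymentStrategy for illustration; fromWire uses Locale.ROOT.
enum DeploymentStrategy {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategy(String wire) {
        this.wire = wire;
    }

    String toWire() {
        return wire;
    }

    static DeploymentStrategy fromWire(String value) {
        if (value == null) return BLUE_GREEN;
        // Locale.ROOT: a Turkish default locale would otherwise lowercase "I" to dotless "ı".
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (DeploymentStrategy s : values()) {
            if (s.wire.equals(normalized)) return s;
        }
        return BLUE_GREEN; // unknown input: fall back instead of returning null
    }
}

class DeploymentStrategyDemo {
    public static void main(String[] args) {
        System.out.println(DeploymentStrategy.fromWire(" Rolling ")); // ROLLING
        System.out.println(DeploymentStrategy.fromWire("canary"));    // BLUE_GREEN
        System.out.println(DeploymentStrategy.fromWire(null));        // BLUE_GREEN

        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            // Default-locale lowercasing breaks the match; Locale.ROOT does not.
            System.out.println("ROLLING".toLowerCase().equals("rolling")); // false
            System.out.println(DeploymentStrategy.fromWire("ROLLING"));    // ROLLING
        } finally {
            Locale.setDefault(saved);
        }
    }
}
```

Because unknown values fall back instead of throwing, a typo in stored config degrades to blue-green rather than failing the deployment. That matches the Javadoc's "executor never has to null-check" intent.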
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DirtyStateCalculator.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DirtyStateCalculator.java
@@ -0,0 +1,103 @@
+package com.cameleer.server.core.runtime;
+
+import com.cameleer.common.model.ApplicationConfig;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.UUID;
+
+/**
+ * Compares the app's current desired state (JAR + agent config + container config) to the
+ * config snapshot from the last successful deployment, producing a structured dirty result.
+ *
+ * <p>Pure logic — no IO, no Spring. Safe to unit-test as a POJO.
+ * Caller must supply an {@link ObjectMapper} configured with {@code JavaTimeModule} so that
+ * {@code ApplicationConfig.updatedAt} (an {@link java.time.Instant}) serialises correctly.</p>
+ */
+public class DirtyStateCalculator {
+
+    // Live-pushed fields are excluded from the deploy diff: they take effect via SSE
+    // CONFIG_UPDATE without a redeploy (agents also re-fetch them on restart), so a difference
+    // from the last successful deployment snapshot is not "pending deploy". Per the UI rules
+    // (ui.md), the Traces & Taps and Route Recording tabs apply with ?apply=live and "never mark dirty".
+    private static final Set<String> AGENT_CONFIG_IGNORED_KEYS = Set.of(
+            "version", "updatedAt", "updatedBy", "environment", "application",
+            "taps", "tapVersion", "tracedProcessors", "routeRecording"
+    );
+
+    private final ObjectMapper mapper;
+
+    public DirtyStateCalculator(ObjectMapper mapper) {
+        this.mapper = mapper;
+    }
+
+    private JsonNode scrubAgentConfig(JsonNode node) {
+        if (!(node instanceof ObjectNode obj)) return node;
+        ObjectNode copy = obj.deepCopy();
+        for (String k : AGENT_CONFIG_IGNORED_KEYS) copy.remove(k);
+        return copy;
+    }
+
+    public DirtyStateResult compute(UUID desiredJarVersionId,
+                                    ApplicationConfig desiredAgentConfig,
+                                    Map<String, Object> desiredContainerConfig,
+                                    DeploymentConfigSnapshot snapshot) {
+        List<DirtyStateResult.Difference> diffs = new ArrayList<>();
+
+        if (snapshot == null) {
+            diffs.add(new DirtyStateResult.Difference("snapshot", "(none)", "(none)"));
+            return new DirtyStateResult(true, diffs);
+        }
+
+        if (!Objects.equals(desiredJarVersionId, snapshot.jarVersionId())) {
+            diffs.add(new DirtyStateResult.Difference("jarVersionId",
+                    String.valueOf(desiredJarVersionId), String.valueOf(snapshot.jarVersionId())));
+        }
+
+        compareJson("agentConfig",
+                scrubAgentConfig(mapper.valueToTree(desiredAgentConfig)),
+                scrubAgentConfig(mapper.valueToTree(snapshot.agentConfig())),
+                diffs);
+        compareJson("containerConfig", mapper.valueToTree(desiredContainerConfig),
+                mapper.valueToTree(snapshot.containerConfig()), diffs);
+
+        return new DirtyStateResult(!diffs.isEmpty(), diffs);
+    }
+
+    private void compareJson(String prefix, JsonNode desired, JsonNode deployed,
+                             List<DirtyStateResult.Difference> diffs) {
+        if (!(desired instanceof ObjectNode desiredObj) || !(deployed instanceof ObjectNode deployedObj)) {
+            if (!Objects.equals(desired, deployed)) {
+                diffs.add(new DirtyStateResult.Difference(prefix,
+                        nodeToString(desired), nodeToString(deployed)));
+            }
+            return;
+        }
+        TreeSet<String> keys = new TreeSet<>();
+        desiredObj.fieldNames().forEachRemaining(keys::add);
+        deployedObj.fieldNames().forEachRemaining(keys::add);
+        for (String key : keys) {
+            JsonNode d = desiredObj.get(key);
+            JsonNode p = deployedObj.get(key);
+            if (Objects.equals(d, p)) continue;
+            if (d instanceof ObjectNode && p instanceof ObjectNode) {
+                compareJson(prefix + "." + key, d, p, diffs);
+            } else {
+                diffs.add(new DirtyStateResult.Difference(prefix + "." + key, nodeToString(d), nodeToString(p)));
+            }
+        }
+    }
+
+    private static String nodeToString(JsonNode n) {
+        if (n == null) return "(none)";
+        if (n.isValueNode()) return n.asText();
+        return n.toString();  // arrays/objects: compact JSON
+    }
+}
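The scrub-then-diff behaviour can be shown without Jackson. The sketch below makes some simplifying assumptions: it swaps `JsonNode`/`ObjectNode` for plain `Map<String, Object>` trees, and it renders values with `String.valueOf` instead of compact JSON. It drops the same top-level keys as `AGENT_CONFIG_IGNORED_KEYS`, then walks both maps in sorted key order and emits dotted paths. `DirtyDiffSketch` and its methods are hypothetical names. The demo mirrors the commit's tests: a tap-only change stays clean, while a `sensitiveKeys` change is reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class DirtyDiffSketch {

    record Difference(String field, String staged, String deployed) {}

    // Same key set as AGENT_CONFIG_IGNORED_KEYS: metadata plus live-pushed fields.
    static final Set<String> IGNORED_KEYS = Set.of(
            "version", "updatedAt", "updatedBy", "environment", "application",
            "taps", "tapVersion", "tracedProcessors", "routeRecording");

    // Top-level scrub only, like scrubAgentConfig: nested keys are not filtered.
    static Map<String, Object> scrub(Map<String, Object> config) {
        Map<String, Object> copy = new LinkedHashMap<>(config);
        copy.keySet().removeAll(IGNORED_KEYS);
        return copy;
    }

    // Recursive, sorted-key comparison reporting dotted paths such as "agentConfig.sensitiveKeys".
    @SuppressWarnings("unchecked")
    static void compare(String prefix, Map<String, Object> desired, Map<String, Object> deployed,
                        List<Difference> diffs) {
        TreeSet<String> keys = new TreeSet<>(desired.keySet());
        keys.addAll(deployed.keySet());
        for (String key : keys) {
            Object d = desired.get(key);
            Object p = deployed.get(key);
            if (Objects.equals(d, p)) continue;
            if (d instanceof Map && p instanceof Map) {
                compare(prefix + "." + key, (Map<String, Object>) d, (Map<String, Object>) p, diffs);
            } else {
                diffs.add(new Difference(prefix + "." + key, render(d), render(p)));
            }
        }
    }

    static String render(Object value) {
        return value == null ? "(none)" : String.valueOf(value);
    }

    static List<Difference> agentConfigDiff(Map<String, Object> desired, Map<String, Object> deployed) {
        List<Difference> diffs = new ArrayList<>();
        compare("agentConfig", scrub(desired), scrub(deployed), diffs);
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, Object> deployed = Map.of(
                "version", 3, "taps", List.of(), "sensitiveKeys", List.of("password"));
        // Only live-pushed and metadata fields changed: not pending deploy.
        Map<String, Object> tapApplied = Map.of(
                "version", 4, "taps", List.of("tap-1"), "sensitiveKeys", List.of("password"));
        // A staged field changed: pending deploy.
        Map<String, Object> staged = Map.of(
                "version", 4, "taps", List.of(), "sensitiveKeys", List.of("password", "token"));

        System.out.println(agentConfigDiff(tapApplied, deployed).isEmpty()); // true
        System.out.println(agentConfigDiff(staged, deployed));
    }
}
```

Note that the scrub only strips top-level keys. A live-pushed field nested under another object would still count as a difference, which is also how `scrubAgentConfig` behaves.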
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DirtyStateResult.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/DirtyStateResult.java
@@ -0,0 +1,7 @@
+package com.cameleer.server.core.runtime;
+
+import java.util.List;
+
+public record DirtyStateResult(boolean dirty, List<Difference> differences) {
+    public record Difference(String field, String staged, String deployed) {}
+}
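`DirtyStateResult` carries everything a "Pending Deploy (N)" badge and its tooltip would need. The sketch below is purely hypothetical: it assumes the badge count is `differences().size()` and that each tooltip line reads `field: deployed -> staged`. Neither assumption is confirmed by this diff, and the actual badge and tooltip may be rendered in the UI rather than in Java. `PendingDeploySummary` is an invented name, and the record is copied locally so the sketch compiles on its own.

```java
import java.util.List;
import java.util.stream.Collectors;

// Local copy of the record so the sketch compiles on its own.
record DirtyStateResult(boolean dirty, List<Difference> differences) {
    record Difference(String field, String staged, String deployed) {}
}

class PendingDeploySummary {

    // Hypothetical: badge text built from the difference count, empty when clean.
    static String badge(DirtyStateResult result) {
        return result.dirty() ? "Pending Deploy (" + result.differences().size() + ")" : "";
    }

    // Hypothetical: one tooltip line per difference, deployed value first.
    static String tooltip(DirtyStateResult result) {
        return result.differences().stream()
                .map(d -> d.field() + ": " + d.deployed() + " -> " + d.staged())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        DirtyStateResult result = new DirtyStateResult(true, List.of(
                new DirtyStateResult.Difference("agentConfig.sensitiveKeys", "[password, token]", "[password]")));
        System.out.println(badge(result));   // Pending Deploy (1)
        System.out.println(tooltip(result)); // agentConfig.sensitiveKeys: [password] -> [password, token]
    }
}
```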
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Environment.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/Environment.java
@@ -12,5 +12,6 @@ public record Environment(
        boolean enabled,
        Map<String, Object> defaultContainerConfig,
        Integer jarRetentionCount,
+        String color,
        Instant createdAt
 ) {}
--- a/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/EnvironmentColor.java
+++ b/cameleer-server-core/src/main/java/com/cameleer/server/core/runtime/EnvironmentColor.java
@@ -0,0 +1,28 @@
+package com.cameleer.server.core.runtime;
+
+import java.util.Set;
+
+/**
+ * Preset palette for the per-environment UI color indicator. Stored as a plain
+ * lowercase string on {@link Environment#color()}. The eight values are
+ * CHECK-constrained in PostgreSQL (V2 migration) and validated again here on
+ * the write path so the controller can return a 400 with a readable message.
+ *
+ * <p>Unknown values are silently tolerated on read (the UI falls back to
+ * {@link #DEFAULT}), so a manual DB tweak won't break rendering — but the API
+ * refuses to persist anything outside this set.
+ */
+public final class EnvironmentColor {
+
+    public static final String DEFAULT = "slate";
+
+    public static final Set<String> VALUES = Set.of(
+            "slate", "red", "amber", "green", "teal", "blue", "purple", "pink"
+    );
+
+    private EnvironmentColor() {}
+
+    public static boolean isValid(String color) {
+        return color != null && VALUES.contains(color);
+    }
+}
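The Javadoc describes two asymmetric rules: reject unknown colors on write (so the API can answer 400), and tolerate them on read by falling back to `DEFAULT`. Below is a minimal sketch of both. The helper names `requireValid` and `forDisplay` and the `EnvironmentColorPolicy` class are hypothetical. It assumes the write path signals rejection with `IllegalArgumentException`, which a controller would map to 400; the real read-side fallback lives in the UI, not in Java. The palette class is copied locally so the block compiles alone.

```java
import java.util.Set;
import java.util.TreeSet;

// Local copy of EnvironmentColor so the sketch compiles on its own.
final class EnvironmentColor {
    static final String DEFAULT = "slate";
    static final Set<String> VALUES = Set.of(
            "slate", "red", "amber", "green", "teal", "blue", "purple", "pink");

    private EnvironmentColor() {}

    static boolean isValid(String color) {
        return color != null && VALUES.contains(color);
    }
}

class EnvironmentColorPolicy {

    // Hypothetical write-path guard: a controller could translate the exception into a 400.
    static String requireValid(String color) {
        if (!EnvironmentColor.isValid(color)) {
            throw new IllegalArgumentException("Unknown environment color '" + color
                    + "'; expected one of " + new TreeSet<>(EnvironmentColor.VALUES));
        }
        return color;
    }

    // Hypothetical read-path fallback, mirroring what the Javadoc says the UI does.
    static String forDisplay(String stored) {
        return EnvironmentColor.isValid(stored) ? stored : EnvironmentColor.DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(requireValid("teal"));  // teal
        System.out.println(forDisplay("magenta")); // slate
        System.out.println(forDisplay(null));      // slate
        try {
            requireValid("Red"); // values are lowercase and the check is case-sensitive
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Sorting the palette into a `TreeSet` only keeps the error message stable, because `Set.of` has unspecified iteration order.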