fix(dirty-state): exclude live-pushed fields from deploy diff

Live-pushed config fields (taps, tapVersion, tracedProcessors, routeRecording) apply via SSE CONFIG_UPDATE; they take effect on running agents without a redeploy and are fetched on agent restart from application_config. They must not contribute to the "pending deploy" diff against the last-successful-deployment snapshot.

Before this fix, applying a tap from the process diagram correctly rolled out in real time but then marked the app "Pending Deploy (1)" because DirtyStateCalculator compared every agentConfig field. This also contradicted the UI rule (ui.md) that the live tabs "never mark dirty".

Adds taps, tapVersion, tracedProcessors, routeRecording to AGENT_CONFIG_IGNORED_KEYS. Updates the nested-path test to use a staged field (sensitiveKeys) and adds a new test asserting that divergent live-push fields keep dirty=false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge pull request 'feat(ui): show deployment status + rich pending-deploy tooltip on app header' (#151) from feature/deployment-status-badge into main
2026-04-24 14:42:07 +02:00 · 2026-04-24 13:50:00 +02:00 · 2026-04-24 13:49:51 +02:00 · 2026-04-24 13:49:24 +02:00 · 2026-04-24 13:47:04 +02:00 · 2026-04-24 11:22:27 +02:00
567 changed files with 79233 additions and 6069 deletions
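The ignore-list behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's `DirtyStateCalculator`: the class name `DirtyStateSketch`, the flat `Map` representation of the agent config, and the method names are assumptions made for the example. Only the four ignored key names and the staged `sensitiveKeys` field come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the real calculator and its JSON handling are not shown here.
final class DirtyStateSketch {
    // Live-pushed fields named in the commit message.
    static final Set<String> AGENT_CONFIG_IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    // Returns the sorted keys whose values differ, skipping live-pushed keys.
    static List<String> differences(Map<String, Object> current, Map<String, Object> snapshot) {
        Set<String> keys = new TreeSet<>(current.keySet());
        keys.addAll(snapshot.keySet());
        List<String> diffs = new ArrayList<>();
        for (String key : keys) {
            if (AGENT_CONFIG_IGNORED_KEYS.contains(key)) {
                continue; // applied live via SSE, so never part of the deploy diff
            }
            if (!Objects.equals(current.get(key), snapshot.get(key))) {
                diffs.add(key);
            }
        }
        return diffs;
    }

    // Dirty only when at least one non-ignored key differs from the snapshot.
    static boolean isDirty(Map<String, Object> current, Map<String, Object> snapshot) {
        return !differences(current, snapshot).isEmpty();
    }
}
```

Under these assumptions, a config that differs from the snapshot only in `taps` or `tapVersion` is reported as clean, while a changed `sensitiveKeys` value is still reported as a difference.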
--- a/.claude/rules/app-classes.md
+++ b/.claude/rules/app-classes.md
@@ -7,52 +7,122 @@ paths:

 `cameleer-server-app/src/main/java/com/cameleer/server/app/`

+## URL taxonomy
+
+User-facing data and config endpoints live under `/api/v1/environments/{envSlug}/...`. Env is a path segment, never a query param. The `envSlug` is resolved to an `Environment` bean via the `@EnvPath` argument resolver (`web/EnvironmentPathResolver.java`) — 404 on unknown slug.
+
+**Slugs are immutable after creation** for both environments and apps. Slug regex: `^[a-z0-9][a-z0-9-]{0,63}$`. Validated in `EnvironmentService.create` and `AppService.createApp`. Update endpoints (`PUT`) do not accept a slug field; Jackson drops it as an unknown property.
+
+### Flat-endpoint allow-list
+
+These paths intentionally stay flat (no `/environments/{envSlug}` prefix). Every new endpoint should be env-scoped unless it appears here and the reason is documented.
+
+| Path prefix | Why flat |
+|---|---|
+| `/api/v1/data/**` | Agent ingestion. JWT `env` claim is authoritative; URL-embedded env would invite spoofing. |
+| `/api/v1/agents/register`, `/refresh`, `/{id}/heartbeat`, `/{id}/events` (SSE), `/{id}/deregister`, `/{id}/commands`, `/{id}/commands/{id}/ack`, `/{id}/replay` | Agent self-service; JWT-bound. |
+| `/api/v1/agents/commands`, `/api/v1/agents/groups/{group}/commands` | Operator fan-out; target scope is explicit in query params. |
+| `/api/v1/agents/config` | Agent-authoritative config read; JWT → registry → (app, env). |
+| `/api/v1/admin/{users,roles,groups,oidc,license,audit,rbac/stats,claim-mappings,thresholds,sensitive-keys,usage,clickhouse,database,environments,outbound-connections}` | Truly cross-env admin. Env CRUD URLs use `{envSlug}`, not UUID. |
+| `/api/v1/catalog`, `/api/v1/catalog/{applicationId}` | Cross-env discovery is the purpose. Env is an optional filter via `?environment=`. |
+| `/api/v1/executions/{execId}`, `/processors/**` | Exchange IDs are globally unique; permalinks. |
+| `/api/v1/diagrams/{contentHash}/render`, `POST /api/v1/diagrams/render` | Content-addressed or stateless. |
+| `/api/v1/alerts/notifications/{id}/retry` | Notification IDs are globally unique; no env routing needed. |
+| `/api/v1/auth/**` | Pre-auth; no env context exists. |
+| `/api/v1/health`, `/prometheus`, `/api-docs/**`, `/swagger-ui/**` | Server metadata. |
+
+## Tenant isolation invariant
+
+ClickHouse is shared across tenants. Every ClickHouse query must filter by `tenant_id` (from `CAMELEER_SERVER_TENANT_ID` env var, resolved via `TenantContext`/config) in addition to `environment`. New controllers added under `/environments/{envSlug}/...` must preserve this — the env filter from the path does not replace the tenant filter.
+
+## User ID conventions
+
+`users.user_id` stores the **bare** identifier:
+- Local users: `<username>` (e.g. `admin`, `alice`)
+- OIDC users: `oidc:<sub>` (e.g. `oidc:c7a93b…`)
+
+JWT subjects carry a `user:` namespace prefix (`user:admin`, `user:oidc:<sub>`) so `JwtAuthenticationFilter` can distinguish user tokens from agent tokens. All three write paths upsert the **bare** form:
+
+- `UiAuthController.login` — computes `userId = request.username()`, signs with `subject = "user:" + userId`.
+- `OidcAuthController.callback` — `userId = "oidc:" + oidcUser.subject()`, signs with `subject = "user:" + userId`.
+- `UserAdminController.createUser` — `userId = request.username()`.
+
+Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `AlertSilenceController`, `OutboundConnectionAdminController`) strip `"user:"` from `SecurityContextHolder.authentication.name` before using it as an FK. All FKs to `users(user_id)` (e.g. `alert_rules.created_by`, `outbound_connections.created_by`, `alert_reads.user_id`, `user_roles.user_id`, `user_groups.user_id`) therefore reference the bare form. If you add a new controller that needs the acting user id for an FK insert, follow the same strip pattern.
+
 ## controller/ — REST endpoints

- `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
- `AgentSseController` — GET /sse (Server-Sent Events connection)
- `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
- `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
- `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
- `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
- `ExecutionController` — GET /api/v1/executions (search + detail)
- `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
- `LogQueryController` — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
- `LogIngestionController` — POST /api/v1/data/logs (accepts `List<LogEntry>` JSON array, each entry has `source`: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
- `CatalogController` — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than `discoveryttldays` with no live agents.
- `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
- `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
- `RoleAdminController` — CRUD /api/v1/admin/roles
- `GroupAdminController` — CRUD /api/v1/admin/groups
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `SensitiveKeysAdminController` — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true` to fan out merged keys to all LIVE agents. Stored in `server_config` table (key `sensitive_keys`).
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `ClaimMappingAdminController` — CRUD /api/v1/admin/claim-mappings, POST /test (accepts inline rules + claims for preview without saving)
- `LicenseAdminController` — GET/POST /api/v1/admin/license
- `AgentEventsController` — GET /api/v1/agent-events (agent state change history)
- `AgentMetricsController` — GET /api/v1/agent-metrics (JVM/Camel metrics per agent instance)
- `AppSettingsController` — GET/PUT /api/v1/apps/{appId}/settings
- `ApplicationConfigController` — GET/PUT /api/v1/apps/{appId}/config (traced processors, route recording, sensitive keys per app)
- `ClickHouseAdminController` — GET /api/v1/admin/clickhouse (ClickHouse admin, conditional on infrastructure endpoints)
- `DatabaseAdminController` — GET /api/v1/admin/database (PG admin, conditional on infrastructure endpoints)
- `DetailController` — GET /api/v1/detail (execution detail with processor tree)
- `EventIngestionController` — POST /api/v1/data/events (agent event ingestion)
- `RbacStatsController` — GET /api/v1/admin/rbac/stats
- `RouteCatalogController` — GET /api/v1/routes/catalog (merged route catalog from registry + ClickHouse)
- `RouteMetricsController` — GET /api/v1/route-metrics (per-route Camel metrics)
- `ThresholdAdminController` — CRUD /api/v1/admin/thresholds
- `UsageAnalyticsController` — GET /api/v1/admin/usage (ClickHouse usage_events)
+### Env-scoped (user-facing data & config)
+
+- `AppController` — `/api/v1/environments/{envSlug}/apps`. GET list / POST create / GET `{appSlug}` / DELETE `{appSlug}` / GET `{appSlug}/versions` / POST `{appSlug}/versions` (JAR upload) / PUT `{appSlug}/container-config` / GET `{appSlug}/dirty-state` (returns `DirtyStateResponse{dirty, lastSuccessfulDeploymentId, differences}` — compares current JAR+config against last RUNNING deployment snapshot; dirty=true when no snapshot exists). App slug uniqueness is per-env (`(env, app_slug)` is the natural key). `CreateAppRequest` body has no env (path), validates slug regex. Injects `DirtyStateCalculator` bean (registered in `RuntimeBeanConfig`, requires `ObjectMapper` with `JavaTimeModule`).
+- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`. All lifecycle ops (`POST /` deploy, `POST /{id}/stop`, `POST /{id}/promote`) audited under `AuditCategory.DEPLOYMENT`. Action codes: `deploy_app`, `stop_deployment`, `promote_deployment`. Acting user resolved via the `user:` prefix-strip convention; both SUCCESS and FAILURE branches write audit rows. `created_by` (TEXT, nullable) populated from `SecurityContextHolder` and surfaced on the `Deployment` DTO.
+- `ApplicationConfigController` — `/api/v1/environments/{envSlug}`. GET `/config` (list), GET/PUT `/apps/{appSlug}/config`, GET `/apps/{appSlug}/processor-routes`, POST `/apps/{appSlug}/config/test-expression`. PUT accepts `?apply=staged|live` (default `live`). `live` saves to DB and pushes `CONFIG_UPDATE` SSE to live agents in this env (existing behavior); `staged` saves to DB only, skipping the SSE push — used by the unified app deployment page. Audit action is `stage_app_config` for staged writes, `update_app_config` for live. Invalid `apply` values return 400.
+- `AppSettingsController` — `/api/v1/environments/{envSlug}`. GET `/app-settings` (list), GET/PUT/DELETE `/apps/{appSlug}/settings`. ADMIN/OPERATOR only.
+- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`. GET `/executions` accepts repeat `attr` query params: `attr=order` (key-exists), `attr=order:47` (exact), `attr=order:4*` (wildcard — `*` maps to SQL LIKE `%`). First `:` splits key/value; later colons stay in the value. Invalid keys → 400. POST `/executions/search` accepts the same filters via `SearchRequest.attributeFilters` in the body.
+- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range, instanceIds (multi, comma-split, AND-joined as WHERE instance_id IN (...) — used by the Checkpoint detail drawer to scope logs to a deployment's replicas); sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — same-millisecond tiebreak via the `insert_id` UUID column on `logs`.
+- `RouteCatalogController` — GET `/api/v1/environments/{envSlug}/routes` (merged route catalog from registry + ClickHouse; env filter unconditional).
+- `RouteMetricsController` — GET `/api/v1/environments/{envSlug}/routes/metrics`, GET `/api/v1/environments/{envSlug}/routes/metrics/processors`.
+- `AgentListController` — GET `/api/v1/environments/{envSlug}/agents` (registered agents with runtime metrics, filtered to env).
+- `AgentEventsController` — GET `/api/v1/environments/{envSlug}/agents/events` (lifecycle events; cursor-paginated, returns `{ data, nextCursor, hasMore }`; order `(timestamp DESC, insert_id DESC)`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — `insert_id` is a stable UUID column used as a same-millisecond tiebreak).
+- `AgentMetricsController` — GET `/api/v1/environments/{envSlug}/agents/{agentId}/metrics` (JVM/Camel metrics). Rejects cross-env agents (404) as defence-in-depth.
+- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` returns the most recent diagram for (app, env, route) via `DiagramStore.findLatestContentHashForAppRoute`. Registry-independent — routes whose publishing agents were removed still resolve. Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique), the point-in-time path consumed by the exchange viewer via `ExecutionDetail.diagramContentHash`.
+- `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. `AgentLifecycleCondition` is allowlist-only — the `AgentLifecycleEventType` enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) plus the record compact ctor (non-empty `eventTypes`, `withinSeconds ≥ 1`) do the validation; custom agent-emitted event types are tracked in backlog issue #145. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
+- `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`; optional multi-value `state`, `severity`, tri-state `acked`, tri-state `read` query params; soft-deleted rows always excluded) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read` / POST `/bulk-ack` (VIEWER+) / DELETE `{id}` (OPERATOR+, soft-delete) / POST `/bulk-delete` (OPERATOR+) / POST `{id}/restore` (OPERATOR+, clears `deleted_at`). `requireLiveInstance` helper returns 404 on soft-deleted rows; `restore` explicitly fetches regardless of `deleted_at`. `BulkIdsRequest` is the shared body for bulk-read/ack/delete (`{ instanceIds }`). `AlertDto` includes `readAt`; `deletedAt` is intentionally NOT on the wire. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
+- `AlertSilenceController` — `/api/v1/environments/{envSlug}/alerts/silences`. GET list / POST create / DELETE `{id}`. 422 if `endsAt <= startsAt`. OPERATOR+ for mutations, VIEWER+ for list. Audit: `ALERT_SILENCE_CHANGE`.
+- `AlertNotificationController` — Dual-path (no class-level prefix). GET `/api/v1/environments/{envSlug}/alerts/{alertId}/notifications` (VIEWER+); POST `/api/v1/alerts/notifications/{id}/retry` (OPERATOR+, flat — notification IDs globally unique). Retry resets attempts to 0 and sets `nextAttemptAt = now`.
+
+### Env admin (env-slug-parameterized, not env-scoped data)
+
+- `EnvironmentAdminController` — `/api/v1/admin/environments`. GET list / POST create / GET `{envSlug}` / PUT `{envSlug}` / DELETE `{envSlug}` / PUT `{envSlug}/default-container-config` / PUT `{envSlug}/jar-retention`. Slug immutable — PUT body has no slug field; any slug supplied is dropped by Jackson. Slug validated on POST. `UpdateEnvironmentRequest` carries `color` (nullable); unknown values rejected with 400 via `EnvironmentColor.isValid`. Null/absent color preserves the existing value.
+
+### Agent-only (JWT-authoritative, intentionally flat)
+
+- `AgentRegistrationController` — POST `/register` (requires `environmentId` in body; 400 if missing), POST `/{id}/refresh` (rejects tokens with no `env` claim), POST `/{id}/heartbeat` (env from body preferred, JWT fallback; 400 if neither), POST `/{id}/deregister`.
+- `AgentSseController` — GET `/{id}/events` (SSE connection).
+- `AgentCommandController` — POST `/{agentId}/commands`, POST `/groups/{group}/commands`, POST `/commands` (broadcast), POST `/{agentId}/commands/{commandId}/ack`, POST `/{agentId}/replay`.
+- `AgentConfigController` — GET `/api/v1/agents/config`. Agent-authoritative config read: resolves (app, env) from JWT subject → registry (registry miss falls back to JWT env claim; no registry entry → 404 since application can't be derived).
+
+### Ingestion (agent-only, JWT-authoritative)
+
+- `LogIngestionController` — POST `/api/v1/data/logs` (accepts `List<LogEntry>`; WARNs on missing identity, unregistered agents, empty payloads, buffer-full drops).
+- `EventIngestionController` — POST `/api/v1/data/events`.
+- `ChunkIngestionController` — POST `/api/v1/data/executions`. Accepts a single `ExecutionChunk` or an array (fields include `exchangeId`, `applicationId`, `instanceId`, `routeId`, `status`, `startTime`, `endTime`, `durationMs`, `chunkSeq`, `final`, `processors: FlatProcessorRecord[]`). The accumulator merges non-final chunks by exchangeId and emits the merged envelope on the final chunk or on stale timeout. Legacy `ExecutionController` / `RouteExecution` shape is retired.
+- `MetricsController` — POST `/api/v1/data/metrics`.
+- `DiagramController` — POST `/api/v1/data/diagrams` (resolves applicationId + environment from the agent registry keyed on JWT subject; stamps both on the stored `TaggedDiagram`).
+
+### Cross-env discovery (flat)
+
+- `CatalogController` — GET `/api/v1/catalog` (merges managed apps + in-memory agents + CH stats; optional `?environment=` filter). DELETE `/api/v1/catalog/{applicationId}` (ADMIN: dismiss app, purge all CH data + PG record).
+
+### Admin (cross-env, flat)
+
+- `UserAdminController` — CRUD `/api/v1/admin/users`, POST `/{id}/roles`, POST `/{id}/set-password`.
+- `RoleAdminController` — CRUD `/api/v1/admin/roles`.
+- `GroupAdminController` — CRUD `/api/v1/admin/groups`.
+- `OidcConfigAdminController` — GET/POST `/api/v1/admin/oidc`, POST `/test`.
+- `OutboundConnectionAdminController` — `/api/v1/admin/outbound-connections`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/test` / GET `{id}/usage`. RBAC: list/get/usage ADMIN|OPERATOR; mutations + test ADMIN.
+- `SensitiveKeysAdminController` — GET/PUT `/api/v1/admin/sensitive-keys`. GET returns 200 or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true`. Fan-out iterates every distinct `(application, environment)` slice — intentional global baseline + per-env overrides.
+- `ClaimMappingAdminController` — CRUD `/api/v1/admin/claim-mappings`, POST `/test`.
+- `LicenseAdminController` — GET/POST `/api/v1/admin/license`.
+- `ThresholdAdminController` — CRUD `/api/v1/admin/thresholds`.
+- `AuditLogController` — GET `/api/v1/admin/audit`.
+- `RbacStatsController` — GET `/api/v1/admin/rbac/stats`.
+- `UsageAnalyticsController` — GET `/api/v1/admin/usage` (ClickHouse `usage_events`).
+- `ClickHouseAdminController` — GET `/api/v1/admin/clickhouse/**` (conditional on `infrastructureendpoints` flag).
+- `DatabaseAdminController` — GET `/api/v1/admin/database/**` (conditional on `infrastructureendpoints` flag).
+- `ServerMetricsAdminController` — `/api/v1/admin/server-metrics/**`. GET `/catalog`, GET `/instances`, POST `/query`. Generic read API over the `server_metrics` ClickHouse table so SaaS dashboards don't need direct CH access. Delegates to `ServerMetricsQueryStore` (impl `ClickHouseServerMetricsQueryStore`). Visibility matches ClickHouse/Database admin: `@ConditionalOnProperty(infrastructureendpoints, matchIfMissing=true)` + class-level `@PreAuthorize("hasRole('ADMIN')")`. Validation: metric/tag regex `^[a-zA-Z0-9._]+$`, statistic regex `^[a-z_]+$`, `to - from ≤ 31 days`, stepSeconds ∈ [10, 3600], response capped at 500 series. `IllegalArgumentException` → 400. `/query` supports `raw` + `delta` modes (delta does per-`server_instance_id` positive-clipped differences, then aggregates across instances). Derived `statistic=mean` for timers computes `sum(total|total_time)/sum(count)` per bucket.
+
+### Other (flat)
+
+- `DetailController` — GET `/api/v1/executions/{executionId}` + processor snapshot endpoints.
+- `MetricsController` — exposes `/api/v1/metrics` and `/api/v1/prometheus` (server-side Prometheus scrape endpoint).

 ## runtime/ — Docker orchestration

 - `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
+- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
 - `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
 - `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
+- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
 - `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
 - `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
 - `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
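The generation-suffixed naming described for `DeploymentExecutor` and `TraefikLabelBuilder` in this hunk can be sketched as below. The class and method names are hypothetical, and the sketch assumes that "first 8 chars of the deployment UUID" refers to the canonical string form returned by `UUID.toString()`.

```java
import java.util.UUID;

// Hypothetical helper; it only illustrates the naming scheme from the rule file.
final class ReplicaNamingSketch {
    // Assumed: generation = first 8 characters of the canonical UUID string.
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // {envSlug}-{appSlug}-{replicaIndex}-{generation}
    static String instanceId(String envSlug, String appSlug, int replicaIndex, UUID deploymentId) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation(deploymentId);
    }

    // {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}
    static String containerName(String tenantId, String envSlug, String appSlug,
                                int replicaIndex, UUID deploymentId) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex, deploymentId);
    }
}
```

Because the generation differs per deployment, replica 0 of the old deployment and replica 0 of the new one get distinct container names and can coexist during a blue/green swap.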
@@ -60,14 +130,16 @@ paths:
 ## metrics/ — Prometheus observability

 - `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
+- `ServerInstanceIdConfig` — `@Configuration`, exposes `@Bean("serverInstanceId") String`. Resolution precedence: `cameleer.server.instance-id` property → `HOSTNAME` env → `InetAddress.getLocalHost()` → random UUID. Fixed at boot; rotates across restarts so counters restart cleanly.
+- `ServerMetricsSnapshotScheduler` — `@Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")`. Walks `MeterRegistry.getMeters()` each tick, emits one `ServerMetricSample` per `Measurement` (Timer/DistributionSummary produce multiple rows per meter — one per Micrometer `Statistic`). Skips non-finite values; logs and swallows store failures. Disabled via `cameleer.server.self-metrics.enabled=false` (`@ConditionalOnProperty`). Write-only — no query endpoint yet; inspect via `/api/v1/admin/clickhouse/query`.

 ## storage/ — PostgreSQL repositories (JdbcTemplate)

 - `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
+- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId. Also carries `deployed_config_snapshot` JSONB (Flyway V3) populated by `DeploymentExecutor` via `saveDeployedConfigSnapshot(UUID, DeploymentConfigSnapshot)` on successful RUNNING transition. Consumed by `DirtyStateCalculator` for the `/apps/{slug}/dirty-state` endpoint and by the UI for checkpoint restore.
 - `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
 - `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`, `PostgresSensitiveKeysRepository`
- `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`
+- `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`. Both `app_settings` and `application_config` are env-scoped (PK `(app_id, environment)` / `(application, environment)`); finders take `(app, env)` — no env-agnostic variants.

 ## storage/ — ClickHouse stores

@@ -75,6 +147,9 @@ paths:
 - `ClickHouseStatsStore` — pre-aggregated stats, punchcard
 - `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
 - `ClickHouseUsageTracker` — usage_events for billing
+- `ClickHouseRouteCatalogStore` — persistent route catalog with first_seen cache, warm-loaded on startup
+- `ClickHouseServerMetricsStore` — periodic dumps of the server's own Micrometer registry into the `server_metrics` table. Tenant-stamped (bound at the scheduler, not the bean); no `environment` column (server straddles envs). Batch-insert via `JdbcTemplate.batchUpdate` with `Map(String, String)` tag binding. Written by `ServerMetricsSnapshotScheduler`.
+- `ClickHouseServerMetricsQueryStore` — read side of `server_metrics` for dashboards. Implements `ServerMetricsQueryStore`. `catalog(from,to)` returns name+type+statistics+tagKeys, `listInstances(from,to)` returns server_instance_ids with first/last seen, `query(request)` builds bucketed time-series with `raw` or `delta` mode and supports a derived `mean` statistic for timers. All identifier inputs regex-validated; tenant_id always bound; max range 31 days; series count capped at 500. Exposed via `ServerMetricsAdminController`.

 ## search/ — ClickHouse search and log stores

@@ -83,10 +158,11 @@ paths:

 ## security/ — Spring Security

- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
+- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional. `/api/v1/admin/outbound-connections/**` GETs permit OPERATOR in addition to ADMIN (defense-in-depth at controller level); mutations remain ADMIN-only. Alerting matchers: GET `/environments/*/alerts/**` VIEWER+; POST/PUT/DELETE rules and silences OPERATOR+; ack/read/bulk-read VIEWER+; POST `/alerts/notifications/*/retry` OPERATOR+.
 - `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
 - `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
- `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
+- `UiAuthController` — `/api/v1/auth` (login, refresh, me). Upserts `users.user_id = request.username()` (bare); signs JWTs with `subject = "user:" + userId`. `refresh`/`me` strip the `"user:"` prefix from incoming subjects via `stripSubjectPrefix()` before any DB/RBAC lookup.
+- `OidcAuthController` — `/api/v1/auth/oidc` (login-uri, token-exchange, logout). Upserts `users.user_id = "oidc:" + oidcUser.subject()` (no `user:` prefix); signs JWTs with `subject = "user:oidc:" + oidcUser.subject()`. `applyClaimMappings` + `getSystemRoleNames` calls all use the bare `oidc:<sub>` form.
 - `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
 - `OidcProviderHelper` — OIDC discovery, JWK source cache
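The bare-id versus `user:`-prefixed subject convention used by `UiAuthController` and `OidcAuthController` can be illustrated with a small sketch. The rule file names `stripSubjectPrefix()` but does not show its body, so the behaviour below (in particular, leaving subjects without the prefix unchanged) is an assumption, and `UserIdSketch` is a hypothetical class.

```java
// Hypothetical sketch of the user-id convention; not the project's implementation.
final class UserIdSketch {
    static final String USER_PREFIX = "user:";

    // JWT subject for a stored (bare) user id, e.g. "admin" -> "user:admin".
    static String toSubject(String bareUserId) {
        return USER_PREFIX + bareUserId;
    }

    // Bare id for FK/RBAC lookups. Assumed: subjects without the prefix pass through.
    static String stripSubjectPrefix(String subject) {
        if (subject != null && subject.startsWith(USER_PREFIX)) {
            return subject.substring(USER_PREFIX.length());
        }
        return subject;
    }
}
```

Only the leading `user:` is removed, so an OIDC subject such as `user:oidc:<sub>` maps back to the bare `oidc:<sub>` form stored in `users.user_id`.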

@@ -100,6 +176,31 @@ paths:

 - `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions

+## alerting/eval/ — Rule evaluation
+
+- `AlertEvaluatorJob` — @Scheduled tick driver; per-rule claim/release via `AlertRuleRepository`, dispatches to per-kind `ConditionEvaluator`, persists advanced cursor on release via `AlertRule.withEvalState`.
+- `BatchResultApplier` — `@Component` that wraps a single rule's tick outcome (`EvalResult.Batch` = `firings` + `nextEvalState`) in one `@Transactional` boundary: instance upserts + notification enqueues + cursor advance commit atomically or roll back together. This is the exactly-once-per-exchange guarantee for `PER_EXCHANGE` fire mode.
+- `ConditionEvaluator` — interface; per-kind implementations: `ExchangeMatchEvaluator`, `AgentLifecycleEvaluator`, `AgentStateEvaluator`, `DeploymentStateEvaluator`, `JvmMetricEvaluator`, `LogPatternEvaluator`, `RouteMetricEvaluator`.
+- `AlertStateTransitions` — PER_EXCHANGE vs rule-level FSM helpers (fire/resolve/ack).
+- `PerKindCircuitBreaker` — trips noisy per-kind evaluators; `TickCache` — per-tick shared lookups (apps, envs, silences).
+
+## http/ — Outbound HTTP client implementation
+
+- `SslContextBuilder` — composes SSL context from `OutboundHttpProperties` + `OutboundHttpRequestContext`. Supports SYSTEM_DEFAULT (JDK roots + configured CA extras), TRUST_ALL (short-circuit no-op TrustManager), TRUST_PATHS (JDK roots + system extras + per-request extras). Throws `IllegalArgumentException("CA file not found: ...")` on missing PEM.
+- `ApacheOutboundHttpClientFactory` — Apache HttpClient 5 impl of `OutboundHttpClientFactory`. Memoizes clients per `CacheKey(trustAll, caPaths, mode, connectTimeout, readTimeout)`. Applies `NoopHostnameVerifier` when trust-all is active.
+- `config/OutboundHttpConfig` — `@ConfigurationProperties("cameleer.server.outbound-http")`. Exposes beans: `OutboundHttpProperties`, `SslContextBuilder`, `OutboundHttpClientFactory`. `@PostConstruct` logs WARN on trust-all and throws if configured CA paths don't exist.
+
+## outbound/ — Admin-managed outbound connections (implementation)
+
+- `crypto/SecretCipher` — AES-GCM symmetric cipher with key derived via HMAC-SHA256(jwtSecret, "cameleer-outbound-secret-v1"). Ciphertext format: base64(IV(12 bytes) || GCM output with 128-bit tag). `encrypt` throws `IllegalStateException`; `decrypt` throws `IllegalArgumentException` on tamper/wrong-key/malformed.
+- `storage/PostgresOutboundConnectionRepository` — JdbcTemplate impl. `save()` upserts by id; JSONB serialization via ObjectMapper; UUID arrays via `ConnectionCallback`. Reads `created_by`/`updated_by` as String (= users.user_id TEXT).
+- `OutboundConnectionServiceImpl` — service layer. Tenant bound at construction via `cameleer.server.tenant.id` property. Uniqueness check via `findByName`. Narrowing-envs guard: rejects update that removes envs while rules reference the connection (rulesReferencing stubbed in Plan 01, wired in Plan 02). Delete guard: rejects if referenced by rules.
+- `controller/OutboundConnectionAdminController` — REST controller. Class-level `@PreAuthorize("hasRole('ADMIN')")` defaults; GETs relaxed to ADMIN|OPERATOR. Resolves acting user id via the user-id convention (strip `"user:"` from `authentication.name` → matches `users.user_id` FK). Audit via `AuditCategory.OUTBOUND_CONNECTION_CHANGE`.
+- `dto/OutboundConnectionRequest` — Bean Validation: `@NotBlank` name, `@Pattern("^https://.+")` url, `@NotNull` method/tlsTrustMode/auth. Compact ctor throws `IllegalArgumentException` if TRUST_PATHS with empty paths list.
+- `dto/OutboundConnectionDto` — response DTO. `hmacSecretSet: boolean` instead of the ciphertext; `authKind: OutboundAuthKind` instead of the full auth config.
+- `dto/OutboundConnectionTestResult` — result of POST `/{id}/test`: status, latencyMs, responseSnippet (first 512 chars), tlsProtocol/cipherSuite/peerCertSubject (protocol is "TLS" stub; enriched in Plan 02 follow-up), error (nullable).
+- `config/OutboundBeanConfig` — registers `OutboundConnectionRepository`, `SecretCipher`, `OutboundConnectionService` beans.
+
 ## config/ — Spring beans

 - `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
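
Reviewer sketch for the `crypto/SecretCipher` entry above: a minimal, self-contained illustration of the described scheme (AES-GCM, key derived with HMAC-SHA256, ciphertext framed as base64 of a 12-byte IV followed by the GCM output with a 128-bit tag). The class name `SecretCipherSketch`, the choice of `jwtSecret` as the HMAC key with the label as the message, and the exception messages are assumptions made for illustration; the real class may differ.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch, not the project's SecretCipher.
final class SecretCipherSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final String LABEL = "cameleer-outbound-secret-v1";

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    SecretCipherSketch(String jwtSecret) {
        try {
            // Assumption: jwtSecret is the HMAC key and the label is the message.
            // The 32-byte MAC output is used directly as an AES-256 key.
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(jwtSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            this.key = new SecretKeySpec(mac.doFinal(LABEL.getBytes(StandardCharsets.UTF_8)), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key derivation failed", e);
        }
    }

    String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // fresh IV per message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] out = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Frame as IV || (ciphertext + tag), then base64.
            byte[] framed = new byte[iv.length + out.length];
            System.arraycopy(iv, 0, framed, 0, iv.length);
            System.arraycopy(out, 0, framed, iv.length, out.length);
            return Base64.getEncoder().encodeToString(framed);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    String decrypt(String ciphertext) {
        try {
            // Base64 decoding itself throws IllegalArgumentException on malformed input.
            byte[] framed = Base64.getDecoder().decode(ciphertext);
            if (framed.length < IV_BYTES + TAG_BITS / 8) {
                throw new IllegalArgumentException("ciphertext too short");
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, framed, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(framed, IV_BYTES, framed.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            // Covers tag mismatch (tampering or wrong key).
            throw new IllegalArgumentException("decryption failed", e);
        }
    }
}
```

Because the IV is random, encrypting the same secret twice yields different ciphertexts, so a stored value can be compared only after decryption.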
--- a/.claude/rules/cicd.md
+++ b/.claude/rules/cicd.md
@@ -8,8 +8,11 @@ paths:

 # CI/CD & Deployment

- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
+- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches. `paths-ignore` skips the whole pipeline for docs-only / `.planning/` / `.claude/` / `*.md` changes (push and PR triggers).
 - Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
+- Build caches (parallel `actions/cache@v4` steps in the `build` job): `~/.m2/repository` (key on all `pom.xml`), `~/.npm` (key on `ui/package-lock.json`), `ui/node_modules/.vite` (key on `ui/package-lock.json` + `ui/vite.config.ts`). UI install uses `npm ci --prefer-offline --no-audit --fund=false` so the npm cache is the primary source.
+- Maven build performance (set in `pom.xml` and `cameleer-server-app/pom.xml`): `useIncrementalCompilation=true` on the compiler plugin; Surefire uses `forkCount=1C` + `reuseForks=true` (one JVM per CPU core, reused across test classes); Failsafe keeps `forkCount=1` + `reuseForks=true`. Unit tests must not rely on per-class JVM isolation.
+- UI build script (`ui/package.json`): `build` is `vite build` only — the type-check pass was split out into `npm run typecheck` (run separately when you want a full `tsc --noEmit` sweep).
 - Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
 - `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
 - Registry: `gitea.siegeln.net/cameleer/cameleer-server` (container images)
--- a/.claude/rules/core-classes.md
+++ b/.claude/rules/core-classes.md
@@ -17,7 +17,8 @@ paths:
 - `CommandType` — enum for command types (config-update, deep-trace, replay, route-control, etc.)
 - `CommandStatus` — enum for command acknowledgement states
 - `CommandReply` — record: command execution result from agent
- `AgentEventRecord`, `AgentEventRepository` — event persistence
+- `AgentEventRecord`, `AgentEventRepository` — event persistence. `AgentEventRepository.queryPage(...)` is cursor-paginated (`AgentEventPage{data, nextCursor, hasMore}`); the legacy non-paginated `query(...)` path is gone. `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` returns matching events ordered by `(timestamp ASC, insert_id ASC)` — consumed by `AgentLifecycleEvaluator`.
+- `AgentEventPage` — record: `(List<AgentEventRecord> data, String nextCursor, boolean hasMore)` returned by `AgentEventRepository.queryPage`
 - `AgentEventListener` — callback interface for agent events
 - `RouteStateRegistry` — tracks per-agent route states

@@ -25,16 +26,18 @@ paths:

 - `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
 - `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
+- `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
+- `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
+- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName, createdBy (String, user_id reference; nullable for pre-V4 historical rows)
+- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED. `DEGRADED` is reserved for post-deploy drift (a replica died after RUNNING); `DeploymentExecutor` now marks partial-healthy deploys FAILED, not DEGRADED.
 - `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
+- `DeploymentStrategy` — enum: BLUE_GREEN, ROLLING. Stored on `ResolvedContainerConfig.deploymentStrategy` as kebab-case string (`"blue-green"` / `"rolling"`). `fromWire(String)` is the only conversion entry point; unknown/null inputs fall back to BLUE_GREEN so the executor dispatch site never null-checks or throws.
+- `DeploymentService` — createDeployment (calls `deleteFailedByAppAndEnvironment` first so FAILED rows don't pile up; STOPPED rows are preserved as restorable checkpoints), markRunning, markFailed, markStopped
 - `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
 - `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
 - `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
 - `ContainerStatus` — record: state, running, exitCode, error
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
+- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks, externalRouting (default `true`; when `false`, `TraefikLabelBuilder` strips all `traefik.*` labels so the container is not publicly routed), certResolver (server-wide, sourced from `CAMELEER_SERVER_RUNTIME_CERTRESOLVER`; when blank the `tls.certresolver` label is omitted — use for dev installs with a static TLS store)
 - `RoutingMode` — enum for routing strategies
 - `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
 - `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
@@ -43,15 +46,17 @@ paths:

 ## search/ — Execution search and stats

- `SearchService` — search, count, stats, statsForApp, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys
- `SearchRequest` / `SearchResult` — search DTOs
+- `SearchService` — search, count, stats, statsForApp, statsForRoute, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys. `statsForRoute`/`timeseriesForRoute` take `(routeId, applicationId)` — app filter is applied to `stats_1m_route`.
+- `SearchRequest` / `SearchResult` — search DTOs. `SearchRequest.attributeFilters: List<AttributeFilter>` carries structured facet filters for execution attributes — key-only (exists), exact (key=value), or wildcard (`*` in value). The 21-arg legacy ctor is preserved to avoid call-site churn; the compact ctor normalises null → `List.of()`.
+- `AttributeFilter(key, value)` — record with key regex `^[a-zA-Z0-9._-]+$` (inlined into SQL, same constraint as alerting), `value == null` means key-exists, `value` containing `*` becomes a SQL LIKE pattern via `toLikePattern()`.
 - `ExecutionStats`, `ExecutionSummary` — stats aggregation records
 - `StatsTimeseries`, `TopError` — timeseries and error DTOs
- `LogSearchRequest` / `LogSearchResponse` — log search DTOs
+- `LogSearchRequest` / `LogSearchResponse` — log search DTOs. `LogSearchRequest.sources` / `levels` are `List<String>` (null-normalized, multi-value OR); `cursor` + `limit` + `sort` drive keyset pagination. Response carries `nextCursor` + `hasMore` + per-level `levelCounts`.

 ## storage/ — Storage abstractions

- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
+- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces. `DiagramStore.findLatestContentHashForAppRoute(appId, routeId, env)` resolves the latest diagram by (app, env, route) without consulting the agent registry, so routes whose publishing agents were removed between app versions still resolve. `findContentHashForRoute(route, instance)` is retained for the ingestion path that stamps a per-execution `diagramContentHash` at ingest time (point-in-time link from `ExecutionDetail`/`ExecutionSummary`).
+- `RouteCatalogEntry` — record: applicationId, routeId, environment, firstSeen, lastSeen
 - `LogEntryResult` — log query result record
 - `model/` — `ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`

@@ -73,10 +78,25 @@ paths:
 - `SensitiveKeysConfig` — record: keys (List<String>, immutable)
 - `SensitiveKeysRepository` — interface: find(), save()
 - `SensitiveKeysMerger` — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
- `AppSettings`, `AppSettingsRepository` — per-app settings config and persistence
+- `AppSettings`, `AppSettingsRepository` — per-app-per-env settings config and persistence. Record carries `(applicationId, environment, …)`; repository methods are `findByApplicationAndEnvironment`, `findByEnvironment`, `save`, `delete(appId, env)`. `AppSettings.defaults(appId, env)` produces a default instance scoped to an environment.
 - `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
 - `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory`, `AuditRepository` — audit trail records and persistence
+- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE, ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE, DEPLOYMENT`), `AuditRepository` — audit trail records and persistence
+
+## http/ — Outbound HTTP primitives (cross-cutting)
+
+- `OutboundHttpClientFactory` — interface: `clientFor(context)` returns memoized `CloseableHttpClient`
+- `OutboundHttpProperties` — record: `trustAll, trustedCaPemPaths, defaultConnectTimeout, defaultReadTimeout, proxyUrl, proxyUsername, proxyPassword`
+- `OutboundHttpRequestContext` — record of per-call TLS/timeout overrides; `systemDefault()` static factory
+- `TrustMode` — enum: `SYSTEM_DEFAULT | TRUST_ALL | TRUST_PATHS`
+
+## outbound/ — Admin-managed outbound connections
+
+- `OutboundConnection` — record: id, tenantId, name, description, url, method, defaultHeaders, defaultBodyTmpl, tlsTrustMode, tlsCaPemPaths, hmacSecretCiphertext, auth, allowedEnvironmentIds, createdAt, createdBy (String user_id), updatedAt, updatedBy (String user_id). `isAllowedInEnvironment(envId)` returns true when allowed-envs list is empty OR contains the env.
+- `OutboundAuth` — sealed interface + records: `None | Bearer(tokenCiphertext) | Basic(username, passwordCiphertext)`. Jackson `@JsonTypeInfo(use = DEDUCTION)` — wire shape has no discriminator, subtype inferred from fields.
+- `OutboundAuthKind`, `OutboundMethod` — enums
+- `OutboundConnectionRepository` — CRUD by (tenantId, id): save/findById/findByName/listByTenant/delete
+- `OutboundConnectionService` — create/update/delete/get/list with uniqueness + narrow-envs + delete-if-referenced guards. `rulesReferencing(id)` stubbed in Plan 01 (returns `[]`); populated in Plan 02 against `AlertRuleRepository`.

 ## security/ — Auth

@@ -90,8 +110,8 @@ paths:

 ## ingestion/ — Buffered data pipeline

- `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
- `ChunkAccumulator` — batches data for efficient flush
+- `IngestionService` — diagram + metrics facade (`ingestDiagram`, `acceptMetrics`, `getMetricsBuffer`). Execution ingestion used to go through here via the legacy `RouteExecution` shape; `ChunkAccumulator` now owns writes from the chunked pipeline, so the `ingestExecution` path and its `ExecutionStore.upsert` / `upsertProcessors` dependencies were removed.
+- `ChunkAccumulator` — batches data for efficient flush; owns the execution write path (chunks → buffers → flush scheduler → `ClickHouseExecutionStore.insertExecutionBatch`).
 - `WriteBuffer` — bounded ring buffer for async flush
 - `BufferedLogEntry` — log entry wrapper with metadata
- `MergedExecution`, `TaggedExecution`, `TaggedDiagram` — tagged ingestion records
+- `MergedExecution`, `TaggedDiagram` — tagged ingestion records. `TaggedDiagram` carries `(instanceId, applicationId, environment, graph)` — env is resolved from the agent registry in the controller and stamped on the ClickHouse `route_diagrams` row.
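
Reviewer sketch for the `AttributeFilter(key, value)` entry above: one way the three filter shapes (key-exists, exact, wildcard) and the `*` to SQL LIKE conversion could look. The record name `AttributeFilterSketch`, the helper methods `isKeyOnly`/`isWildcard`, and the decision to escape `%`, `_` and `\` before mapping `*` to `%` are assumptions; only the key regex, the null-means-exists rule and the `toLikePattern()` name come from the text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the project's AttributeFilter.
record AttributeFilterSketch(String key, String value) {
    // Key regex as documented; it keeps the key safe to inline into SQL.
    private static final Pattern KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    AttributeFilterSketch {
        if (key == null || !KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
    }

    // value == null means "the key exists", regardless of its value.
    boolean isKeyOnly() { return value == null; }

    // A '*' anywhere in the value switches from exact match to LIKE.
    boolean isWildcard() { return value != null && value.contains("*"); }

    // Assumption: escape LIKE metacharacters first, then map '*' to '%'.
    String toLikePattern() {
        if (value == null) return null;
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\', '%', '_' -> sb.append('\\').append(c);
                case '*' -> sb.append('%');
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The value itself would still be bound as a query parameter; only the key is constrained tightly enough to be inlined.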
--- a/.claude/rules/docker-orchestration.md
+++ b/.claude/rules/docker-orchestration.md
@@ -13,19 +13,28 @@ paths:
 When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

 - **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
+- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Per-replica identity labels: `cameleer.replica` (index), `cameleer.generation` (8-char deployment UUID prefix — pin Prometheus/Grafana deploy boundaries with this), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Traefik router/service keys deliberately omit the generation so load balancing spans old + new replicas during a blue/green overlap. When `ResolvedContainerConfig.externalRouting()` is `false` (UI: Resources → External Routing, default `true`), the builder emits ONLY the identity labels (`managed-by`, `cameleer.*`) and skips every `traefik.*` label — the container stays on `cameleer-traefik` and the per-env network (so sibling containers can still reach it via Docker DNS) but is invisible to Traefik. The `tls.certresolver` label is emitted only when `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` is set to a non-blank resolver name (matching a resolver configured in the Traefik static config). When unset (dev installs backed by a static TLS store) only `tls=true` is emitted and Traefik serves the default cert from the TLS store.
 - **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
 - **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
 - **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
 - **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
+- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level. Instance-id changes per deployment — cross-deploy queries aggregate on `application + environment` (and optionally `replica_index`).
 - **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).

 ## DeploymentExecutor Details

-Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
+Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}-{generation}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
+
+**Container naming** — `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 characters of the deployment UUID. The generation suffix lets old + new replicas coexist during a blue/green swap (deterministic names without a generation used to collide and fail container creation with a 409 Conflict). All lookups across the executor, `DockerEventMonitor`, and `ContainerLogForwarder` key on container **id**, not name — the name is for operator visibility only.
+
+**Strategy dispatch** — `DeploymentStrategy.fromWire(config.deploymentStrategy())` branches the executor. Unknown values fall back to BLUE_GREEN so misconfiguration never throws at runtime.
+
+- **Blue/green** (default): start all N new replicas → wait for ALL healthy → stop the previous deployment. Resource peak ≈ 2× replicas for the health-check window. Partial health aborts with status FAILED; the previous deployment is preserved untouched (user's safety net).
+- **Rolling**: replace replicas one at a time — start new[i] → wait healthy → stop old[i] → next. Resource peak = replicas + 1. Mid-rollout health failure stops in-flight new containers and aborts; already-replaced old replicas are NOT restored (not reversible) but un-replaced old[i+1..N] keep serving traffic. User redeploys to recover.
+
+Traffic routing is implicit: Traefik labels (`cameleer.app`, `cameleer.environment`) are generation-agnostic, so new replicas attract load balancing as soon as they come up healthy — no explicit swap step.

 ## Deployment Status Model

@@ -34,17 +43,13 @@ Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNET
 | `STOPPED` | Intentionally stopped or initial state |
 | `STARTING` | Deploy in progress |
 | `RUNNING` | All replicas healthy and serving |
-| `DEGRADED` | Some replicas healthy, some dead |
+| `DEGRADED` | Post-deploy: a replica died after the deploy was marked RUNNING. Set by `DockerEventMonitor` reconciliation, never by `DeploymentExecutor` directly. |
 | `STOPPING` | Graceful shutdown in progress |
-| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
+| `FAILED` | Terminal failure (pre-flight, health check, or crash). Partial-healthy deploys now mark FAILED — DEGRADED is reserved for post-deploy drift. |

-**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
+**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage). Rolling reuses the same stage labels inside the per-replica loop; the UI progress bar shows the most recent stage.

-**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
-
-**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
-
-**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
+**Deployment retention**: `DeploymentService.createDeployment()` deletes FAILED deployments for the same app+environment before creating a new one, preventing failed-attempt buildup. STOPPED deployments are preserved as restorable checkpoints — the UI Checkpoints disclosure lists every deployment with a non-null `deployed_config_snapshot` (RUNNING, DEGRADED, STOPPED) minus the current one.

 ## JAR Management

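
Reviewer sketch for the strategy dispatch and naming rules in this file: a total `fromWire` conversion that falls back to BLUE_GREEN, plus the generation-suffixed instance id and container name. The type names `DeploymentStrategySketch` and `ReplicaNamesSketch`, the exact-match comparison in `fromWire`, and the helper method names are assumptions; the wire strings, the fallback, the 8-character generation and the name layouts come from the text.

```java
import java.util.UUID;

// Hypothetical sketch, not the project's DeploymentStrategy.
enum DeploymentStrategySketch {
    BLUE_GREEN("blue-green"),
    ROLLING("rolling");

    private final String wire;

    DeploymentStrategySketch(String wire) { this.wire = wire; }

    String toWire() { return wire; }

    // Total function: null or unknown input falls back to BLUE_GREEN,
    // so callers never need a null check or a try/catch.
    static DeploymentStrategySketch fromWire(String value) {
        if (value != null) {
            for (DeploymentStrategySketch s : values()) {
                if (s.wire.equals(value)) return s;
            }
        }
        return BLUE_GREEN;
    }
}

// Hypothetical helper for the documented name layouts.
final class ReplicaNamesSketch {
    private ReplicaNamesSketch() {}

    // generation = first 8 characters of the deployment UUID.
    static String generation(UUID deploymentId) {
        return deploymentId.toString().substring(0, 8);
    }

    // {envSlug}-{appSlug}-{replicaIndex}-{generation}
    static String instanceId(String envSlug, String appSlug, int replicaIndex, UUID deploymentId) {
        return envSlug + "-" + appSlug + "-" + replicaIndex + "-" + generation(deploymentId);
    }

    // {tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}
    static String containerName(String tenantId, String envSlug, String appSlug,
                                int replicaIndex, UUID deploymentId) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex, deploymentId);
    }
}
```

Two deployments of the same app and replica index produce different names because their UUID prefixes differ, which is what lets old and new replicas overlap.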
--- a/.claude/rules/metrics.md
+++ b/.claude/rules/metrics.md
@@ -8,7 +8,9 @@ paths:

 # Prometheus Metrics

-Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
+Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component.
+
+The same `MeterRegistry` is also snapshotted to ClickHouse every 60 s by `ServerMetricsSnapshotScheduler` (see "Server self-metrics persistence" at the bottom of this file) — so historical server-health data survives restarts without an external Prometheus.

 ## Gauges (auto-polled)

@@ -83,3 +85,23 @@ Mean processing time = `camel.route.policy.total_time / camel.route.policy.count
 | `cameleer.sse.reconnects.count` | counter | `instanceId` |
 | `cameleer.taps.evaluated.count` | counter | `instanceId` |
 | `cameleer.metrics.exported.count` | counter | `instanceId` |
+
+## Server self-metrics persistence
+
+`ServerMetricsSnapshotScheduler` walks `MeterRegistry.getMeters()` every 60 s (configurable via `cameleer.server.self-metrics.interval-ms`) and writes one row per Micrometer `Measurement` to the ClickHouse `server_metrics` table. Full registry is captured — Spring Boot Actuator series (`jvm.*`, `process.*`, `http.server.requests`, `hikaricp.*`, `jdbc.*`, `tomcat.*`, `logback.events`, `system.*`) plus `cameleer.*` and `alerting_*`.
+
+**Table** (`cameleer-server-app/src/main/resources/clickhouse/init.sql`):
+
+```
+server_metrics(tenant_id, collected_at, server_instance_id,
+               metric_name, metric_type, statistic, metric_value,
+               tags Map(String,String), server_received_at)
+```
+
+- `metric_type` — lowercase Micrometer `Meter.Type` (counter, gauge, timer, distribution_summary, long_task_timer, other)
+- `statistic` — Micrometer `Statistic.getTagValueRepresentation()` (value, count, total, total_time, max, mean, active_tasks, duration). Timers emit 3 rows per tick (count + total_time + max); gauges/counters emit 1 (`statistic='value'` or `'count'`).
+- No `environment` column — the server is env-agnostic.
+- `tenant_id` threaded from `cameleer.server.tenant.id` (single-tenant per server).
+- `server_instance_id` resolved once at boot by `ServerInstanceIdConfig` (property → HOSTNAME → localhost → UUID fallback). Rotates across restarts so counter resets are unambiguous.
+- TTL: 90 days (vs 365 for `agent_metrics`). Write-only in v1 — no query endpoint or UI page. Inspect via ClickHouse admin: `/api/v1/admin/clickhouse/query` or direct SQL.
+- Toggle off entirely with `cameleer.server.self-metrics.enabled=false` (uses `@ConditionalOnProperty`).
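
Reviewer sketch for the `server_instance_id` fallback chain described above (property, then `HOSTNAME`, then the local host name, then a random UUID). The class `ServerInstanceIdSketch`, its method names, and the reading of "localhost" as `InetAddress.getLocalHost().getHostName()` are assumptions; the real `ServerInstanceIdConfig` may resolve each step differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical sketch, not the project's ServerInstanceIdConfig.
final class ServerInstanceIdSketch {
    private ServerInstanceIdSketch() {}

    // Pure resolution step: the first non-blank candidate wins, else a random UUID.
    static String resolve(String configured, String hostnameEnv, Supplier<String> localHostName) {
        if (notBlank(configured)) return configured.trim();
        if (notBlank(hostnameEnv)) return hostnameEnv.trim();
        String local = localHostName.get();
        if (notBlank(local)) return local.trim();
        return UUID.randomUUID().toString();
    }

    // Wires the pure step to the process environment and the JVM's view of the host.
    static String resolveFromRuntime(String configured) {
        return resolve(configured, System.getenv("HOSTNAME"), () -> {
            try {
                return InetAddress.getLocalHost().getHostName();
            } catch (UnknownHostException e) {
                return null; // fall through to the UUID fallback
            }
        });
    }

    private static boolean notBlank(String s) {
        return s != null && !s.isBlank();
    }
}
```

Keeping the lookup order in a pure method makes each fallback step testable without touching the environment.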
--- a/.claude/rules/ui.md
+++ b/.claude/rules/ui.md
@@ -9,14 +9,19 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments

 - **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
 - **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
- **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
- **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
-  - Config sub-tabs: **Monitoring | Resources | Variables | Traces & Taps | Route Recording**
-  - Create app: full page at `/apps/new` (not a modal)
-  - Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
+- **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`). AgentHealth supports compact view (dense health-tinted cards) and expanded view (full GroupCard+DataTable per app). View mode persisted to localStorage.
+- **Deployments** — unified app deployment page (`ui/src/pages/AppsTab/`)
+  - Routes: `/apps` (list, `AppListView` in `AppsTab.tsx`), `/apps/new` + `/apps/:slug` (both render `AppDeploymentPage`).
+  - Identity & Artifact section always visible; name editable pre-first-deploy, read-only after. JAR picker client-stages; new JAR + any form edits flip the primary button from `Save` to `Redeploy`. Environment fixed to the currently-selected env (no selector).
+  - Config sub-tabs: **Monitoring | Resources | Variables | Sensitive Keys | Deployment | ● Traces & Taps | ● Route Recording**. The four staged tabs feed dirty detection; the `●` live tabs apply in real-time (amber LiveBanner + default `?apply=live` on their writes) and never mark dirty.
+  - Primary action state machine: `Save` → `Uploading… N%` (during JAR upload; button shows percent with a tinted progress-fill overlay) → `Redeploy` → `Deploying…` during active deploy. Upload progress sourced from `useUploadJar` (XHR `upload.onprogress` → page-level `uploadPct` state). The button is disabled during `uploading` and `deploying`.
+  - Checkpoints render as a collapsible `CheckpointsTable` (default **collapsed**) **inside the Identity & Artifact `configGrid`** as an in-grid row (`Checkpoints | ▸ Expand (N)` / `▾ Collapse (N)`). `CheckpointsTable` returns a React.Fragment of grid-ready children so the label + trigger align with the other identity rows; when opened, a third grid child spans both columns via `grid-column: 1 / -1` so the 7-column table gets full width. Wired through `IdentitySection.checkpointsSlot` — `CheckpointDetailDrawer` stays in `IdentitySection.children` because it portals. Columns: Version · JAR (filename) · Deployed by · Deployed (relative `timeAgo` + user-locale sub-line via `new Date(iso).toLocaleString()`) · Strategy · Outcome · ›. Row click opens the drawer. Drawer tabs are ordered **Config | Logs** with `Config` as the default. Config panel has Snapshot / Diff vs current view modes. Replica filter in the Logs panel uses DS `Select`. Restore lives in the drawer footer (forces review). Visible row cap = `Environment.jarRetentionCount` (default 10 if 0/null); older rows accessible via "Show older (N)" expander. Currently-running deployment is excluded — represented separately by `StatusCard`. The empty-checkpoints case returns `null` (no row). The legacy `Checkpoints.tsx` row-list component is gone.
+  - Deployment tab: `StatusCard` + `DeploymentProgress` (during STARTING / FAILED) + flex-grow `StartupLogPanel` (no fixed maxHeight). Auto-activates when a deploy starts. The former `HistoryDisclosure` is retired — per-deployment config and logs live in the Checkpoints drawer. `StartupLogPanel` header mirrors the Runtime Application Log pattern: title + live/stopped badge + `N entries` + sort toggle (↑/↓, default **desc**) + refresh icon (`RefreshCw`). Sort drives the backend fetch via `useStartupLogs(…, sort)` so the 500-line limit returns the window closest to the user's interest; display order matches fetch order. Refresh scrolls to the latest edge (top for desc, bottom for asc). Sort + refresh buttons disable while a refetch is in flight. 3s polling while STARTING is unchanged.
+  - Unsaved-change router blocker uses DS `AlertDialog` (not `window.beforeunload`). Env switch intentionally discards edits without warning.

 **Admin pages** (ADMIN-only, under `/admin/`):
 - **Sensitive Keys** (`ui/src/pages/Admin/SensitiveKeysPage.tsx`) — global sensitive key masking config. Shows agent built-in defaults as outlined Badge reference, editable Tag pills for custom keys, amber-highlighted push-to-agents toggle. Keys add to (not replace) agent defaults. Per-app sensitive key additions managed via `ApplicationConfigController` API. Note: `AppConfigDetailPage.tsx` exists but is not routed in `router.tsx`.
+- **Server Metrics** (`ui/src/pages/Admin/ServerMetricsAdminPage.tsx`) — dashboard over the `server_metrics` ClickHouse table. Visibility matches Database/ClickHouse pages: gated on `capabilities.infrastructureEndpoints` in `buildAdminTreeNodes`; backend is `@ConditionalOnProperty(infrastructureendpoints) + @PreAuthorize('hasRole(ADMIN)')`. Uses the generic `/api/v1/admin/server-metrics/{catalog,instances,query}` API via `ui/src/api/queries/admin/serverMetrics.ts` hooks (`useServerMetricsCatalog`, `useServerMetricsInstances`, `useServerMetricsSeries`), all three of which take a `ServerMetricsRange = { from: Date; to: Date }`. Time range is driven by the global TopBar picker via `useGlobalFilters()` — no page-local selector; bucket size auto-scales through `stepSecondsFor(windowSeconds)` (10 s up to 1 h buckets). Toolbar is just server-instance badges. Sections: Server health (agents/ingestion/auth), JVM (memory/CPU/GC/threads), HTTP & DB pools, Alerting (conditional on catalog), Deployments (conditional on catalog). Each panel is a `ThemedChart` with `Line`/`Area` children from the design system; multi-series responses are flattened into overlap rows by bucket timestamp. Alerting and Deployments rows are hidden when their metrics aren't in the catalog (zero-deploy / alerting-disabled installs).

 ## Key UI Files

@@ -25,11 +30,38 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
 - `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
 - `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
 - `ui/src/components/ContentTabs.tsx` — main tab switcher
+- `ui/src/components/EnvironmentSwitcherButton.tsx` + `EnvironmentSwitcherModal.tsx` — explicit env picker (button in TopBar; DS `Modal`-based list). Replaces the retired `EnvironmentSelector` (All-Envs dropdown). When `envRecords.length > 0` and the stored `selectedEnv` no longer matches any env, `LayoutShell` opens the modal in `forced` mode (non-dismissible). Switcher pulls env records from `useEnvironments()` (admin endpoint; readable by VIEWER+).
+- `ui/src/components/env-colors.ts` + `ui/src/styles/env-colors.css` — 8-swatch preset palette for the per-environment color indicator. Tokens `--env-color-slate/red/amber/green/teal/blue/purple/pink` are defined for both light and dark themes. `envColorVar(name)` falls back to `slate` for unknown values. `LayoutShell` renders a 3px fixed top bar in the current env's color (z-index 900, below DS modals).
 - `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
 - `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
 - `ui/src/hooks/useScope.ts` — TabKey type, scope inference
 - `ui/src/components/StartupLogPanel.tsx` — deployment startup log viewer (container logs from ClickHouse, polls 3s while STARTING)
- `ui/src/api/queries/logs.ts` — `useStartupLogs` hook for container startup log polling, `useLogs`/`useApplicationLogs` for general log search
+- `ui/src/api/queries/logs.ts` — `useStartupLogs` hook for container startup log polling, `useLogs`/`useApplicationLogs` for bounded log search (single page), `useInfiniteApplicationLogs` for streaming log views (cursor-paginated, server-side source/level filters)
+- `ui/src/api/queries/agents.ts` — `useAgents` for agent list, `useInfiniteAgentEvents` for cursor-paginated timeline stream
+- `ui/src/hooks/useInfiniteStream.ts` — tanstack `useInfiniteQuery` wrapper with top-gated auto-refetch, flattened `items[]`, and `refresh()` invalidator
+- `ui/src/components/InfiniteScrollArea.tsx` — scrollable container with IntersectionObserver top/bottom sentinels. Streaming log/event views use this + `useInfiniteStream`. Bounded views (LogTab, StartupLogPanel) keep `useLogs`/`useStartupLogs`
+- `ui/src/components/SideDrawer.tsx` — project-local right-slide drawer (DS has Modal but no Drawer). Portal-rendered, ESC + transparent-backdrop click closes, sticky header/footer, sizes md/lg/xl. Currently consumed only by `CheckpointDetailDrawer` — promote to `@cameleer/design-system` once a second consumer appears.
+
+## Alerts
+
+- **Sidebar section** (`buildAlertsTreeNodes` in `ui/src/components/sidebar-utils.ts`) — Inbox, Rules, Silences.
+- **Routes** in `ui/src/router.tsx`: `/alerts` (redirect to inbox), `/alerts/inbox`, `/alerts/rules`, `/alerts/rules/new`, `/alerts/rules/:id`, `/alerts/silences`. No redirects for the retired `/alerts/all` and `/alerts/history` — stale URLs 404 per the clean-break policy.
+- **Pages** under `ui/src/pages/Alerts/`:
+  - `InboxPage.tsx` — single filterable inbox. Filters: severity (multi), state (PENDING/FIRING/RESOLVED, default FIRING), Hide acked toggle (default on), Hide read toggle (default on). Row actions: Acknowledge, Mark read, Silence rule… (duration quick menu), Delete (OPERATOR+, soft-delete with undo toast wired to `useRestoreAlert`). Bulk toolbar (selection-driven): Acknowledge N · Mark N read · Silence rules · Delete N (ConfirmDialog; OPERATOR+).
+  - `SilenceRuleMenu.tsx` — DS `Dropdown`-based duration picker (1h / 8h / 24h / Custom…). Used by the row-level and bulk silence actions. "Custom…" navigates to `/alerts/silences?ruleId=<id>`.
+  - `RulesListPage.tsx` — CRUD + enable/disable toggle + env-promotion dropdown (pure UI prefill, no new endpoint).
+  - `RuleEditor/RuleEditorWizard.tsx` — 5-step wizard (Scope / Condition / Trigger / Notify / Review). `form-state.ts` is the single source of truth (`initialForm` / `toRequest` / `validateStep`). Seven condition-form subcomponents under `RuleEditor/condition-forms/` — including `AgentLifecycleForm.tsx` (multi-select event-type chips for the six-entry `AgentLifecycleEventType` allowlist + lookback-window input).
+  - `SilencesPage.tsx` — matcher-based create + end-early. Reads `?ruleId=` search param to prefill the Rule ID field (driven by InboxPage's "Silence rule… → Custom…" flow).
+  - `AlertRow.tsx` shared list row; `alerts-page.module.css` shared styling.
+- **Components**:
+  - `NotificationBell.tsx` — polls `/alerts/unread-count` every 30 s (paused when tab hidden via TanStack Query `refetchIntervalInBackground: false`).
+  - `AlertStateChip.tsx`, `SeverityBadge.tsx` — shared state/severity indicators.
+  - `MustacheEditor/` — CodeMirror 6 editor with variable autocomplete + inline linter. Shared between rule title/message, webhook body/header overrides, and (future) Admin Outbound Connection editor (reduced-context mode for URL).
+  - `MustacheEditor/alert-variables.ts` — variable registry aligned with `NotificationContextBuilder.java`. Add new leaves here whenever the backend context grows.
+- **API queries** under `ui/src/api/queries/`: `alerts.ts`, `alertRules.ts`, `alertSilences.ts`, `alertNotifications.ts`, `alertMeta.ts`. All env-scoped via `useSelectedEnv` from `alertMeta`.
+- **CMD-K**: `buildAlertSearchData` in `LayoutShell.tsx` registers `alert` and `alertRule` result categories. Badges convey severity + state. Palette navigates directly to the deep-link path — no sidebar-reveal state for alerts.
+- **Sidebar accordion**: entering `/alerts/*` collapses Applications + Admin + Starred (mirrors Admin accordion).
+- **Top-nav**: `<NotificationBell />` is the first child of `<TopBar>`, sitting alongside `SearchTrigger` + status `ButtonGroup` + `TimeRangeDropdown` + `AutoRefreshToggle`.

 ## UI Styling

--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -5,8 +5,20 @@ on:
    branches: [main, 'feature/**', 'fix/**', 'feat/**']
    tags-ignore:
      - 'v*'
+    paths-ignore:
+      - '.planning/**'
+      - 'docs/**'
+      - '**/*.md'
+      - '.claude/**'
+      - 'AGENTS.md'
+      - 'CLAUDE.md'
  pull_request:
    branches: [main]
+    paths-ignore:
+      - '.planning/**'
+      - 'docs/**'
+      - '**/*.md'
+      - '.claude/**'
  delete:

 jobs:
@@ -45,11 +57,25 @@ jobs:
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-maven-

+      - name: Cache npm registry
+        uses: actions/cache@v4
+        with:
+          path: ~/.npm
+          key: ${{ runner.os }}-npm-${{ hashFiles('ui/package-lock.json') }}
+          restore-keys: ${{ runner.os }}-npm-
+
+      - name: Cache Vite build artifacts
+        uses: actions/cache@v4
+        with:
+          path: ui/node_modules/.vite
+          key: ${{ runner.os }}-vite-${{ hashFiles('ui/package-lock.json', 'ui/vite.config.ts') }}
+          restore-keys: ${{ runner.os }}-vite-
+
      - name: Build UI
        working-directory: ui
        run: |
          echo '//gitea.siegeln.net/api/packages/cameleer/npm/:_authToken=${REGISTRY_TOKEN}' >> .npmrc
-          npm ci
+          npm ci --prefer-offline --no-audit --fund=false
          npm run build
        env:
          REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
--- a/.planning/it-triage-report.md
+++ b/.planning/it-triage-report.md
@@ -0,0 +1,120 @@
+# IT Triage Report — 2026-04-21
+
+Branch: `main`, starting HEAD `90460705` (chore: refresh GitNexus index stats).
+
+## Summary
+
+- **Starting state**: 65 IT failures (46 F + 19 E) out of 555 tests on a clean build. Side-note: `target/classes` incremental-build staleness from the `90083f88` V1..V18 → V1 schema collapse makes the number look worse (every context load dies on `Flyway V2__claim_mapping.sql failed`). A fresh `mvn clean verify` gives the real 65.
+- **Final state**: **12 failures across 3 test classes** (`AgentSseControllerIT`, `SseSigningIT`, `ClickHouseStatsStoreIT`). **53 failures closed across 14 test classes.**
+- **11 commits landed on local `main`** (not pushed).
+- No new env vars, endpoints, tables, or columns added. `V1__init.sql` untouched. No tests rewritten to pass-by-weakening — every assertion change is accompanied by a comment explaining the contract it now captures.
+
+## Commits (in order)
+
+| SHA | Test classes | What changed |
+|---|---|---|
+| `7436a37b` | AgentRegistrationControllerIT | environmentId, flat→env URL, heartbeat auto-heal, absolute sseEndpoint |
+| `97a6b2e0` | AgentCommandControllerIT | environmentId, CommandGroupResponse new shape (200 w/ aggregate replies) |
+| `e955302f` | BootstrapTokenIT / JwtRefreshIT / RegistrationSecurityIT / SseSigningIT / AgentSseControllerIT | environmentId in register bodies; AGENT-role smoke target; drop flaky iat-coupled assertion |
+| `10e2b699` | SecurityFilterIT | env-scoped agent list URL |
+| `9bda4d8f` | FlywayMigrationIT, ConfigEnvIsolationIT | decouple from shared Testcontainers Postgres state |
+| `36571013` | (docs) | first version of this report |
+| `dfacedb0` | DetailControllerIT | **Cluster B template**: ExecutionChunk envelope + REST-driven lookup |
+| `87bada1f` | ExecutionControllerIT, MetricsControllerIT | Chunk payloads + REST flush-visibility probes |
+| `a6e7458a` | DiagramControllerIT, DiagramRenderControllerIT | Env-scoped render + execution-detail-derived content hash for flat SVG path |
+| `56844799` | SearchControllerIT | 10 seed payloads → ExecutionChunk; fix AGENT→VIEWER token on search GET |
+| `d5adaaab` | DiagramLinkingIT, IngestionSchemaIT | REST for diagramContentHash + processor-tree/snapshot assertions |
+| `8283d531` | ClickHouseChunkPipelineIT, ClickHouseExecutionReadIT | Replace removed `/clickhouse/V2_.sql` with consolidated init.sql; correct `iteration` vs `loopIndex` on seq-based tree path |
+| `95f90f43` | ForwardCompatIT, ProtocolVersionIT, BackpressureIT | Chunk payload; fix wrong property-key prefix in BackpressureIT (+ MetricsFlushScheduler's separate `ingestion.flush-interval-ms` key) |
+| `b55221e9` | SensitiveKeysAdminControllerIT | assert pushResult shape, not exact 0 (shared registry across ITs) |
+
+## The single biggest insight
+
+**`ExecutionController` (legacy PG path) is dead code.** It's `@ConditionalOnMissingBean(ChunkAccumulator.class)` and `ChunkAccumulator` is registered **unconditionally** in `StorageBeanConfig.java:92`, so `ExecutionController` never binds. Even if it did, `IngestionService.upsert` → `ClickHouseExecutionStore.upsert` throws `UnsupportedOperationException("ClickHouse writes use the chunked pipeline")`: the only `ExecutionStore` impl in `src/main/java` is ClickHouse, and the Postgres variant lives in a planning doc only.
+
+Practical consequences for every IT that was exercising `/api/v1/data/executions`:
+1. `ChunkIngestionController` owns the URL and expects an `ExecutionChunk` envelope (`exchangeId`, `applicationId`, `instanceId`, `routeId`, `status`, `startTime`, `endTime`, `durationMs`, `chunkSeq`, `final`, `processors: FlatProcessorRecord[]`) — the legacy `RouteExecution` shape was being silently degraded to an empty/degenerate chunk.
+2. The test payload changes are accompanied by assertion changes that now go through REST endpoints instead of raw SQL against the (ClickHouse-resident) `executions` / `processor_executions` / `route_diagrams` / `agent_metrics` tables.
+3. **Recommendation for cleanup**: remove `ExecutionController` + the `upsert` path in `IngestionService` + the stubbed `ClickHouseExecutionStore.upsert` throwers. Separate PR. Happy to file.
+
+## Cluster breakdown
+
+**Cluster A — missing `environmentId` in register bodies (DONE)**
+Root cause: `POST /api/v1/agents/register` now 400s without `environmentId`, and the test payloads were minted before this requirement existed. Fixed across all agent-registering ITs plus side-cleanups (flaky iat-coupled assertion in JwtRefreshIT, wrong RBAC target in can-access tests, absolute vs relative sseEndpoint).
+
+**Cluster B — ingestion payload drift (DONE per user direction)**
+All controller + storage ITs that posted `RouteExecution` JSON now post `ExecutionChunk` envelopes. All CH-side assertions now go through REST endpoints (`/api/v1/environments/{env}/executions` search + `/api/v1/executions/{id}` detail + `/agents/{id}/metrics` + `/apps/{app}/routes/{route}/diagram`). DiagramRenderControllerIT's SVG tests still need a content hash → reads it off the execution-detail REST response rather than querying `route_diagrams`.
+
+**Cluster C — flat URL drift (DONE)**
+`/api/v1/agents` → `/api/v1/environments/{envSlug}/agents`. Two test classes touched.
+
+**Cluster D — heartbeat auto-heal contract (DONE)**
+`heartbeatUnknownAgent_returns404` was renamed and now asserts the 200 auto-heal path that `fb54f9cb` made the contract.
+
+**Cluster E — individual drifts (DONE except three parked)**
+
+| Test class | Status |
+|---|---|
+| FlywayMigrationIT | DONE (decouple from shared PG state) |
+| ConfigEnvIsolationIT.findByEnvironment_excludesOtherEnvs | DONE (unique slug prefix) |
+| ForwardCompatIT | DONE (chunk payload) |
+| ProtocolVersionIT | DONE (chunk payload) |
+| BackpressureIT | DONE (property-key prefix fix — see note below) |
+| SensitiveKeysAdminControllerIT | DONE (assert shape not count) |
+| ClickHouseChunkPipelineIT | DONE (consolidated init.sql) |
+| ClickHouseExecutionReadIT | DONE (iteration vs loopIndex mapping) |
+
+## PARKED — what you'll want to look at next
+
+### 1. ClickHouseStatsStoreIT (8 failures) — timezone bug in production code
+
+`ClickHouseStatsStore.buildStatsSql` uses `lit(Instant)` which formats as `'yyyy-MM-dd HH:mm:ss'` in UTC but with no timezone marker. ClickHouse parses that literal in the session timezone when comparing against the `DateTime`-typed `bucket` column in `stats_1m_*`. On a non-UTC CH host (e.g. CEST docker on a CEST laptop), each filter bound is shifted by the tz offset and misses every row the MV bucketed.
+
+I confirmed this by instrumenting the test: `toDateTime(bucket)` returned `12:00:00` for a row inserted with `start_time=10:00:00Z` (i.e. the stored UTC Unix timestamp but displayed in CEST), and the filter literal `'2026-03-31 10:05:00'` was being parsed as CEST → UTC 08:05 → excluded all rows.
+
+**I didn't fix this** because the repair is in `src/main/java`, not the test. Two reasonable options:
+- **Test-side**: pin the container TZ via `.withEnv("TZ", "UTC")` + include `use_time_zone=UTC` in the JDBC URL. I tried both; neither was sufficient on its own, because the CH server reads `timezone` from its own config, not `$TZ`. Getting all three layers (container env, CH server config, JDBC driver) aligned needs dedicated effort.
+- **Production-side (preferred)**: change `lit(Instant)` to emit `toDateTime('...', 'UTC')`, or declare `bucket` with a timezone-qualified column type such as `DateTime('UTC')` (or `DateTime64(3, 'UTC')` for millisecond precision). That's a store change; would be caught by a matching unit test.
+
+I did add the explicit `'default'` env to the seed `INSERT`s per your directive, but reverted it locally because the timezone bug swallowed the fix. The raw unchanged test is what's committed.
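
The production-side option above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual `ClickHouseStatsStore` code: the class name `UtcLiteralSketch` and both method names are hypothetical. It contrasts the bare literal described in the report with a zone-explicit `toDateTime(..., 'UTC')` expression.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; the real ClickHouseStatsStore.lit(Instant) may look different. */
final class UtcLiteralSketch {
    // Formats the instant as UTC wall-clock text, matching the pattern named in the report.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Bare literal: UTC text with no zone marker, so the server's session timezone decides its meaning. */
    static String bareLiteral(Instant t) {
        return "'" + FMT.format(t) + "'";
    }

    /** Zone-explicit literal: the text is parsed as UTC regardless of the session timezone. */
    static String utcLiteral(Instant t) {
        return "toDateTime('" + FMT.format(t) + "', 'UTC')";
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2026-03-31T10:05:00Z");
        System.out.println(bareLiteral(t)); // prints '2026-03-31 10:05:00'
        System.out.println(utcLiteral(t));  // prints toDateTime('2026-03-31 10:05:00', 'UTC')
    }
}
```

With the bare form, a CEST session would read `10:05:00` as 08:05 UTC, which is the mismatch observed in the instrumented test. The explicit form removes that dependency on the session timezone.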
+
+### 2. AgentSseControllerIT (3 failures) & SseSigningIT (1 failure) — SSE connection timing
+
+All failing assertions are `awaitConnection(5000)` timeouts or `ConditionTimeoutException` on SSE stream observation. Not related to any spec drift I could identify — the SSE server is up (other tests in the same classes connect fine), and auth/JWT is accepted. Looks like a real race on either the SseConnectionManager registration or on the HTTP client's first-read flush. Needs a dedicated debug session with a minimal reproducer; not something I wanted to hack around with sleeps.
+
+Specific tests:
+- `AgentSseControllerIT.sseConnect_unknownAgent_returns404` — 5s `CompletableFuture.get` timeout on an HTTP GET that should return 404 synchronously. Suggests the client is waiting on body data that never arrives (SSE stream opens even on 404?).
+- `AgentSseControllerIT.lastEventIdHeader_connectionSucceeds` — `stream.awaitConnection(5000)` false.
+- `AgentSseControllerIT.pingKeepalive_receivedViaSseStream` — waits for an event line in the stream snapshot, never sees it.
+- `SseSigningIT.deepTraceEvent_containsValidSignature` — same pattern.
+
+The sibling tests (`SseSigningIT.configUpdateEvent_containsValidEd25519Signature`) pass in isolation, which strongly suggests order-dependent flakiness rather than a protocol break.
+
+## Final verify command
+
+```bash
+mvn -pl cameleer-server-app -am -Dit.test='!SchemaBootstrapIT' -Dtest='!*' -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false verify
+```
+
+Reports land in `cameleer-server-app/target/failsafe-reports/`. Expect **12 failures** in the three classes above. Everything else is green.
+
+## Side notes worth flagging
+
+- **Property-key inconsistency in the main code** — surfaced via BackpressureIT. `IngestionConfig` is bound under `cameleer.server.ingestion.*`, but `MetricsFlushScheduler.@Scheduled` reads `ingestion.flush-interval-ms` (no prefix, hyphenated). In production this means the flush-interval in `application.yml` isn't actually being honoured by the metrics flush — it stays at the 1s fallback. Separate cleanup.
+- **Shared Testcontainers PG across IT classes** — several of the "cross-test state" fixes (FlywayMigrationIT, ConfigEnvIsolationIT, SensitiveKeysAdminControllerIT) are symptoms of one underlying issue: `AbstractPostgresIT` uses a singleton PG container, and nothing cleans between test classes. Could do with a global `@Sql("/test-reset.sql")` on `@BeforeAll`, but out of scope here.
+- **Agent registry shared across ITs** — same class of issue. Doesn't bite until a test explicitly inspects registry membership (SensitiveKeys `pushResult.total`).
+
+## Follow-up (2026-04-22) — 12 parked failures closed
+
+All three parked clusters now green. 560/560 tests passing.
+
+- **ClickHouseStatsStoreIT (8 failures)** — fixed in `a9a6b465`. Two-layer TZ fix: JVM default TZ pinned to UTC in `CameleerServerApplication.main()` (the ClickHouse JDBC 0.9.7 driver formats `java.sql.Timestamp` via `Timestamp.toString()`, which uses JVM default TZ — a CEST JVM shipping to a UTC CH server stored off-by-offset Unix timestamps), plus column-level `bucket DateTime('UTC')` on all `stats_1m_*` tables with explicit `toDateTime(..., 'UTC')` casts in MV projections and `ClickHouseStatsStore.lit(Instant)` as defence in depth.
+- **MetricsFlushScheduler property-key drift** — fixed in `a6944911`. Scheduler now reads `${cameleer.server.ingestion.flush-interval-ms:1000}` (the SpEL-via-`@ingestionConfig` approach doesn't work because `@EnableConfigurationProperties` uses a compound bean name). BackpressureIT workaround property removed.
+- **SSE flakiness (4 failures, `AgentSseControllerIT` + `SseSigningIT`)**: fixed in `41df042e`. Triage's "order-dependent flakiness" theory was wrong; all four reproduced in isolation. Three root causes: (a) `AgentSseController.events` auto-heal was over-permissive (spoofing vector), fixed with a JWT-subject-equals-path-id check; (b) `SseConnectionManager.pingAll` read an unprefixed property key (`agent-registry.ping-interval-ms`), the same family of bug as the `MetricsFlushScheduler` fix in `a6944911`; (c) SSE response headers didn't flush until the first `emitter.send()`, so `awaitConnection(5s)` assertions timed out under the 15s ping cadence, fixed by sending an initial `: connected` comment on `connect()`. Full diagnosis in `.planning/sse-flakiness-diagnosis.md`.
+
+Plus the two prod-code cleanups from the ExecutionController-removal follow-ons:
+
+- **Dead `SearchIndexer` subsystem** — removed in `98cbf8f3`. `ExecutionUpdatedEvent` had no publisher after `0f635576`, so the whole indexer + stats + `/admin/clickhouse/pipeline` endpoint + UI pipeline card carried zero signal.
+- **Unused `TaggedExecution` record** — removed in `06c6f53b`.
+
+Final verify: `mvn -pl cameleer-server-app -am -Dit.test='!SchemaBootstrapIT' ... verify` → **Tests run: 560, Failures: 0, Errors: 0, Skipped: 0**.
--- a/.planning/sse-flakiness-diagnosis.md
+++ b/.planning/sse-flakiness-diagnosis.md
@@ -0,0 +1,81 @@
+# SSE Flakiness — Root-Cause Analysis
+
+**Date:** 2026-04-21
+**Tests:** `AgentSseControllerIT.sseConnect_unknownAgent_returns404`, `.lastEventIdHeader_connectionSucceeds`, `.pingKeepalive_receivedViaSseStream`, `SseSigningIT.deepTraceEvent_containsValidSignature`
+
+## Summary
+
+Not order-dependent flakiness (triage report was wrong). Three distinct root causes (one security bug, one production config bug, and one test-timing dependency), all reproducible when running the classes in isolation.
+
+## Reproduction
+
+```bash
+mvn -pl cameleer-server-app -am -Dit.test='AgentSseControllerIT' -Dtest='!*' \
+    -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false verify
+```
+
+Result: 3 failures out of 7 tests with a cold CH container. Not order-dependent.
+
+## Root causes
+
+### 1. `AgentSseController.events` auto-heal is over-permissive (security bug)
+
+**File:** `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentSseController.java:63-76`
+
+```java
+AgentInfo agent = registryService.findById(id);
+if (agent == null) {
+    var jwtResult = ...;
+    if (jwtResult != null) {     // ← only checks JWT presence
+        registryService.register(id, id, application, env, ...);
+    } else {
+        throw 404;
+    }
+}
+```
+
+**Bug:** auto-heal registers *any* path id when any valid JWT is present, regardless of whether the JWT subject matches the path id. A holder of agent X's JWT can open SSE for any path-id Y, silently spoofing Y.
+
+**Surface symptom:** `sseConnect_unknownAgent_returns404` sends a JWT for `test-agent-sse-it` and requests SSE for `unknown-sse-agent`. Auto-heal kicks in, returns 200 with an infinite empty stream. Test's `statusFuture.get(5s)` — which uses `BodyHandlers.ofString()` and waits for the full body — times out instead of getting a synchronous 404.
+
+**Fix:** only auto-heal when `jwtResult.subject().equals(id)`.
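
The guard described in the fix can be sketched in isolation. This is a minimal sketch, assuming the validated JWT exposes its subject as a string; `AutoHealGuardSketch`, `JwtResult`, and `mayAutoHeal` are hypothetical names for illustration, not the controller's real types.

```java
import java.util.Objects;

final class AutoHealGuardSketch {
    /** Hypothetical stand-in for the validated-JWT result; only the subject matters here. */
    record JwtResult(String subject) {}

    /** Auto-heal is allowed only when a valid JWT is present and its subject equals the path id. */
    static boolean mayAutoHeal(JwtResult jwt, String pathId) {
        return jwt != null && Objects.equals(jwt.subject(), pathId);
    }

    public static void main(String[] args) {
        JwtResult jwt = new JwtResult("test-agent-sse-it");
        System.out.println(mayAutoHeal(jwt, "test-agent-sse-it"));  // prints true
        System.out.println(mayAutoHeal(jwt, "unknown-sse-agent"));  // prints false
        System.out.println(mayAutoHeal(null, "test-agent-sse-it")); // prints false
    }
}
```

In the second case the caller would fall through to the 404 branch, which is what `sseConnect_unknownAgent_returns404` expects.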
+
+### 2. `SseConnectionManager.pingAll` reads an unprefixed property key (production bug)
+
+**File:** `cameleer-server-app/src/main/java/com/cameleer/server/app/agent/SseConnectionManager.java:172`
+
+```java
+@Scheduled(fixedDelayString = "${agent-registry.ping-interval-ms:15000}")
+```
+
+**Bug:** `AgentRegistryConfig` is `@ConfigurationProperties(prefix = "cameleer.server.agentregistry")`. The scheduler reads an unprefixed `agent-registry.*` key that the YAML never defines — so the default 15s always applies, regardless of config. Same family of bug as the `MetricsFlushScheduler` fix in commit `a6944911`.
+
+**Fix:** `${cameleer.server.agentregistry.ping-interval-ms:15000}`.
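
To see why the default always wins with the unprefixed key, here is a toy resolver for a single `${key:default}` expression. It is not Spring's placeholder implementation, only an illustration of the lookup; `PlaceholderSketch`, `resolve`, and the configured value `2000` are hypothetical.

```java
import java.util.Map;

final class PlaceholderSketch {
    /** Toy resolver for one "${key:default}" expression; not Spring's implementation. */
    static String resolve(String expr, Map<String, String> props) {
        String body = expr.substring(2, expr.length() - 1); // strip "${" and "}"
        int colon = body.indexOf(':');
        String key = colon < 0 ? body : body.substring(0, colon);
        String fallback = colon < 0 ? null : body.substring(colon + 1);
        return props.getOrDefault(key, fallback);
    }

    public static void main(String[] args) {
        // Hypothetical configuration: only the prefixed key is ever defined.
        Map<String, String> config =
                Map.of("cameleer.server.agentregistry.ping-interval-ms", "2000");

        // Unprefixed key is never defined, so the fallback applies.
        System.out.println(resolve("${agent-registry.ping-interval-ms:15000}", config)); // prints 15000
        // Prefixed key matches the configured property.
        System.out.println(resolve("${cameleer.server.agentregistry.ping-interval-ms:15000}", config)); // prints 2000
    }
}
```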
+
+### 3. SSE response body doesn't flush until first event (test timing dependency)
+
+**File:** `cameleer-server-app/src/main/java/com/cameleer/server/app/agent/SseConnectionManager.java:connect()`
+
+Spring's `SseEmitter` holds the response open but doesn't flush headers to the client until the first `emitter.send()`. Until then, clients using `HttpResponse.BodyHandlers.ofInputStream()` block on the first byte.
+
+**Surface symptom:**
+- `lastEventIdHeader_connectionSucceeds` — asserts `awaitConnection(5000)` is `true`. The latch counts down in `.thenAccept(response -> ...)`, which in practice only fires once body bytes start flowing (JDK 21 behaviour with SSE streams). Default ping cadence is 15s → 5s assertion times out.
+- `pingKeepalive_receivedViaSseStream` — waits 5s for a `:ping` line. The scheduler runs every 15s (both by default, and because of bug #2, unconditionally).
+- `SseSigningIT.deepTraceEvent_containsValidSignature` — same family: `awaitConnection(5000).isTrue()`.
+
+**Fix:** send an initial `: connected` comment as part of `connect()`. Spring flushes on the first `.send()`, so an immediate comment forces the response headers + first byte to hit the wire, which triggers the client's `thenAccept` callback. Also solves the ping-test: the initial comment is observed as a keepalive line within the test's polling window.
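
The wire-level idea behind the fix can be shown without Spring. In the SSE format a line starting with `:` is a comment that clients ignore, so it can serve as an immediate first byte. The sketch below is an assumption-laden illustration with hypothetical names (`SseHandshakeSketch`, `sendComment`, `isComment`); the real change lives in `SseConnectionManager.connect()` and goes through `SseEmitter`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class SseHandshakeSketch {
    /** Writes an SSE comment frame (": text" plus a blank line) and flushes so the bytes leave immediately. */
    static void sendComment(Writer out, String text) {
        try {
            out.write(": " + text + "\n\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lines beginning with ':' are comments in the SSE format and carry no event data. */
    static boolean isComment(String line) {
        return line.startsWith(":");
    }

    public static void main(String[] args) {
        StringWriter wire = new StringWriter();
        sendComment(wire, "connected");
        String firstLine = wire.toString().split("\n")[0];
        System.out.println(firstLine);            // prints : connected
        System.out.println(isComment(firstLine)); // prints true
    }
}
```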
+
+## Hypothesis ladder (ruled out)
+
+- **Order-dependent singleton leak** — ruled out: every failure reproduces when the class is run solo.
+- **Tomcat async thread pool exhaustion** — ruled out: `SseEmitter(Long.MAX_VALUE)` does hold threads, but the 7-test class doesn't reach Tomcat's defaults.
+- **SseConnectionManager emitter-map contamination** — ruled out: each test uses a unique agent id (UUID-suffixed), and the `@Component` is the same instance across tests but the emitter map is keyed by agent id, no collisions.
+
+## Verification
+
+```bash
+mvn -pl cameleer-server-app -am -Dit.test='AgentSseControllerIT,SseSigningIT' ... verify
+# Tests run: 9, Failures: 0, Errors: 0, Skipped: 0
+```
+
+All 9 tests green with the three fixes applied.
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,7 +1,7 @@
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence

-This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -22,8 +22,19 @@ Cameleer Server — observability server that receives, stores, and serves Camel
 ```bash
 mvn clean compile          # Compile all modules
 mvn clean verify           # Full build with tests
+mvn clean verify -DskipITs # Fast: unit tests only (no Testcontainers)
 ```

+### Faster local builds
+
+- **Surefire reuses forks** (`cameleer-server-app/pom.xml`): unit tests run with `forkCount=1C` + `reuseForks=true` — one JVM per CPU core, reused across classes. Test classes that mutate static state must clean up after themselves.
+- **Testcontainers reuse** — opt-in per developer. Add to `~/.testcontainers.properties`:
+  ```
+  testcontainers.reuse.enable=true
+  ```
+  Then `AbstractPostgresIT` containers persist across `mvn verify` runs (saves ~20s per run). Stop them manually when you need a clean DB: `docker rm -f $(docker ps -aq --filter label=org.testcontainers.reuse=true)`.
+- **UI build** dropped `tsc --noEmit` from `npm run build`. Vite/esbuild only transpiles TypeScript and does not type-check during bundling, so run `npm run typecheck` explicitly when you want a full type-check pass.
+
 ## Run

 ```bash
@@ -37,8 +48,11 @@ java -jar cameleer-server-app/target/cameleer-server-app-1.0-SNAPSHOT.jar
 - Depends on `com.cameleer:cameleer-common` from Gitea Maven registry
 - Jackson `JavaTimeModule` for `Instant` deserialization
 - Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
- Environment filtering: all data queries filter by the selected environment. All commands target only agents in the selected environment. Backend endpoints accept optional `environment` query parameter; null = all environments (backward compatible).
- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim > `"default"`). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when registry has incomplete data.
+- URL taxonomy: user-facing data, config, and query endpoints live under `/api/v1/environments/{envSlug}/...`. Env is a path segment, resolved via the `@EnvPath` argument resolver (404 on unknown slug). Flat endpoints are only for: agent self-service (JWT-authoritative), cross-env admin (RBAC, OIDC, audit, license, thresholds, env CRUD), cross-env discovery (`/catalog`), content-addressed lookups (`/diagrams/{contentHash}/render`, `/executions/{id}`), and auth. See `.claude/rules/app-classes.md` for the full allow-list.
+- Slug immutability: environment and app slugs are immutable after creation (both appear in URLs, Docker network names, container names, and ClickHouse partition keys). Slug regex `^[a-z0-9][a-z0-9-]{0,63}$` is enforced on POST; update endpoints silently drop any slug field in the request body via Jackson's default unknown-property handling.
+- App uniqueness: `(environment_id, app_slug)` is the natural key. The same app slug can legitimately exist in multiple environments; `AppService.getByEnvironmentAndSlug(envId, slug)` is the canonical lookup for controllers. Bare `getBySlug(slug)` remains for internal use but is ambiguous across envs.
+- Environment filtering: all data queries filter by the selected environment. All commands target only agents in the selected environment. Env is required on every env-scoped endpoint (path param); the legacy `?environment=` query form is retired.
+- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim; no silent default — missing env on heartbeat auto-heal returns 400). Registration (`POST /api/v1/agents/register`) requires `environmentId` in the request body; missing or blank returns 400. Capabilities and route states updated on every heartbeat (protocol v2). Route catalog merges three sources: in-memory agent registry, persistent `route_catalog` table (ClickHouse), and `stats_1m_route` execution stats. The persistent catalog tracks `first_seen`/`last_seen` per route per environment, updated on every registration and heartbeat. Routes appear in the sidebar when their lifecycle overlaps the selected time window (`first_seen <= to AND last_seen >= from`), so historical routes remain visible even after being dropped from newer app versions.
 - Multi-tenancy: each server instance serves one tenant (configured via `CAMELEER_SERVER_TENANT_ID`, default: `"default"`). Environments (dev/staging/prod) are first-class. PostgreSQL isolated via schema-per-tenant (`?currentSchema=tenant_{id}`) and `ApplicationName=tenant_{id}` on the JDBC URL. ClickHouse shared DB with `tenant_id` + `environment` columns, partitioned by `(tenant_id, toYYYYMM(timestamp))`.
 - Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in `clickhouse/*.sql`, run idempotently on startup by `ClickHouseSchemaInitializer`. Use `IF NOT EXISTS` for CREATE and ADD PROJECTION.
 - Log exchange correlation: `ClickHouseLogStore` extracts `exchange_id` from log entry MDC, preferring `cameleer.exchangeId` over `camel.exchangeId` (fallback for older agents). For `ON_COMPLETION` exchange copies, the agent sets `cameleer.exchangeId` to the parent's exchange ID via `CORRELATION_ID`.
@@ -48,25 +62,29 @@ java -jar cameleer-server-app/target/cameleer-server-app-1.0-SNAPSHOT.jar
 - OIDC: Optional external identity provider support (token exchange pattern). Configured via admin API/UI, stored in database (`server_config` table). Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when `CAMELEER_SERVER_SECURITY_OIDCISSUERURI` is set. Scope-based role mapping via `SystemRole.normalizeScope()`. System roles synced on every OIDC login via `applyClaimMappings()` in `OidcAuthController` (calls `clearManagedAssignments` + `assignManagedRole` on `RbacService`) — always overwrites managed role assignments; uses managed assignment origin to avoid touching group-inherited or directly-assigned roles. Supports ES384, ES256, RS256.
 - OIDC role extraction: `OidcTokenExchanger` reads roles from the **access_token** first (JWT with `at+jwt` type), then falls back to id_token. `OidcConfig` includes `audience` (RFC 8707 resource indicator) and `additionalScopes`. All provider-specific configuration is external — no provider-specific code in the server.
 - Sensitive keys: Global enforced baseline for masking sensitive data in agent payloads. Merge rule: `final = global UNION per-app` (case-insensitive dedup, per-app can only add, never remove global keys).
- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`
+- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`. `users.user_id` is the **bare** identifier — local users as `<username>`, OIDC users as `oidc:<sub>`. JWT `sub` carries the `user:` namespace prefix so `JwtAuthenticationFilter` can tell user tokens from agent tokens; write paths (`UiAuthController`, `OidcAuthController`, `UserAdminController`) all upsert unprefixed, and env-scoped read-path controllers strip the `user:` prefix before using the value as an FK to `users.user_id` / `user_roles.user_id`. Alerting / outbound FKs (`alert_rules.created_by`, `outbound_connections.created_by`, …) therefore all reference the bare form.
 - Usage analytics: ClickHouse `usage_events` table tracks authenticated UI requests, flushed every 5s

 ## Database Migrations

 PostgreSQL (Flyway): `cameleer-server-app/src/main/resources/db/migration/`
- V1 — RBAC (users, roles, groups, audit_log)
- V2 — Claim mappings (OIDC)
- V3 — Runtime management (apps, environments, deployments, app_versions)
- V4 — Environment config (default_container_config JSONB)
- V5 — App container config (container_config JSONB on apps)
- V6 — JAR retention policy (jar_retention_count on environments)
- V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
- V8 — Deployment active config (resolved_config JSONB on deployments)
- V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
- V10 — Runtime type detection (detected_runtime_type, detected_main_class on app_versions)
+- V1 — Consolidated baseline schema. All prior V1–V18 evolution was collapsed before first prod deploy. Contains: RBAC (users, roles, groups, user_roles, user_groups, group_roles, claim_mapping_rules), runtime management (environments, apps, app_versions, deployments), env-scoped application config (application_config PK `(application, environment)`, app_settings PK `(application_id, environment)`), audit_log, outbound_connections, server_config, and the full alerting subsystem (alert_rules, alert_rule_targets, alert_instances, alert_silences, alert_notifications). Seeds the 4 system roles (AGENT/VIEWER/OPERATOR/ADMIN), the `Admins` group with ADMIN role, and a default environment. Invariants covered by `SchemaBootstrapIT`.

 ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)

+## Regenerating OpenAPI schema (SPA types)
+
+After any change to REST controller paths, request/response DTOs, or `@PathVariable`/`@RequestParam`/`@RequestBody` signatures, regenerate the TypeScript types the SPA consumes. Required for every controller-level change.
+
+```bash
+# Backend must be running on :8081
+cd ui && npm run generate-api:live   # fetches fresh openapi.json AND regenerates schema.d.ts
+# OR, if openapi.json was updated by other means:
+cd ui && npm run generate-api        # regenerates schema.d.ts from existing openapi.json
+```
+
+After regeneration, `ui/src/api/schema.d.ts` and `ui/src/api/openapi.json` will update. The TypeScript compiler then surfaces every SPA call site that needs updating — fix all compile errors before testing in the browser. Commit the regenerated files with the controller change.
+
 ## Maintaining .claude/rules/

 When adding, removing, or renaming classes, controllers, endpoints, UI components, or metrics, update the corresponding `.claude/rules/` file as part of the same change. The rule files are the class/API map that future sessions rely on — stale rules cause wrong assumptions. Treat rule file updates like updating an import: part of the change, not a separate task.
@@ -74,3 +92,105 @@ When adding, removing, or renaming classes, controllers, endpoints, UI component
 ## Disabled Skills

 - Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.
+
+<!-- gitnexus:start -->
+# GitNexus — Code Intelligence
+
+This project is indexed by GitNexus as **cameleer-server** (9731 symbols, 24987 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+
+> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
+
+## Always Do
+
+- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
+- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
+- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
+- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
+- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
+
+## When Debugging
+
+1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
+2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
+3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
+4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
+
+## When Refactoring
+
+- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
+- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
+- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
+
+## Never Do
+
+- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
+- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
+- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
+- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
+
+## Tools Quick Reference
+
+| Tool | When to use | Command |
+|------|-------------|---------|
+| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
+| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
+| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
+| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
+| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
+| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
+
+## Impact Risk Levels
+
+| Depth | Meaning | Action |
+|-------|---------|--------|
+| d=1 | WILL BREAK — direct callers/importers | MUST update these |
+| d=2 | LIKELY AFFECTED — indirect deps | Should test |
+| d=3 | MAY NEED TESTING — transitive | Test if critical path |
+
+## Resources
+
+| Resource | Use for |
+|----------|---------|
+| `gitnexus://repo/cameleer-server/context` | Codebase overview, check index freshness |
+| `gitnexus://repo/cameleer-server/clusters` | All functional areas |
+| `gitnexus://repo/cameleer-server/processes` | All execution flows |
+| `gitnexus://repo/cameleer-server/process/{name}` | Step-by-step execution trace |
+
+## Self-Check Before Finishing
+
+Before completing any code modification task, verify:
+1. `gitnexus_impact` was run for all modified symbols
+2. No HIGH/CRITICAL risk warnings were ignored
+3. `gitnexus_detect_changes()` confirms changes match expected scope
+4. All d=1 (WILL BREAK) dependents were updated
+
+## Keeping the Index Fresh
+
+After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
+
+```bash
+npx gitnexus analyze
+```
+
+If the index previously included embeddings, preserve them by adding `--embeddings`:
+
+```bash
+npx gitnexus analyze --embeddings
+```
+
+To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
+
+> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
+
+## CLI
+
+| Task | Read this skill file |
+|------|---------------------|
+| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
+| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
+| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
+| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
+| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
+| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
+
+<!-- gitnexus:end -->
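
The route-catalog rule in the CLAUDE.md changes above (`first_seen <= to AND last_seen >= from`) is a closed-interval overlap test. A minimal Java sketch of that predicate follows, assuming the timestamps are available as `java.time.Instant`; the class and method names are hypothetical and do not come from the repository.

```java
/** Hypothetical helper illustrating the sidebar visibility rule; not the project's actual code. */
final class RouteLifecycleWindow {

    private RouteLifecycleWindow() {}

    /**
     * True when the route lifecycle [firstSeen, lastSeen] overlaps the
     * selected time window [from, to]. Equivalent to the SQL condition
     * first_seen <= to AND last_seen >= from.
     */
    static boolean isVisible(java.time.Instant firstSeen, java.time.Instant lastSeen,
                             java.time.Instant from, java.time.Instant to) {
        return !firstSeen.isAfter(to) && !lastSeen.isBefore(from);
    }
}
```

Because both comparisons are inclusive, a route whose `last_seen` equals the window start still counts as visible, and a route dropped before the window began does not.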
--- a/12
+++ b/12
@@ -1,10 +1,14 @@
 FROM --platform=$BUILDPLATFORM maven:3.9-eclipse-temurin-17 AS build
 WORKDIR /build

-# Configure Gitea Maven Registry for cameleer-common dependency
-ARG REGISTRY_TOKEN
-RUN mkdir -p ~/.m2 && \
-    echo '<settings><servers><server><id>gitea</id><username>cameleer</username><password>'${REGISTRY_TOKEN}'</password></server></servers></settings>' > ~/.m2/settings.xml
+# Optional auth for Gitea Maven Registry. The `cameleer/cameleer-common` package
+# is published publicly, so empty token → anonymous pull (no settings.xml).
+# Private packages require a non-empty token.
+ARG REGISTRY_TOKEN=""
+RUN if [ -n "$REGISTRY_TOKEN" ]; then \
+      mkdir -p ~/.m2 && \
+      printf '<settings><servers><server><id>gitea</id><username>cameleer</username><password>%s</password></server></servers></settings>\n' "$REGISTRY_TOKEN" > ~/.m2/settings.xml; \
+    fi

 COPY pom.xml .
 COPY cameleer-server-core/pom.xml cameleer-server-core/
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -19,38 +19,99 @@ mvn clean compile          # compile only
 mvn clean verify           # compile + run all tests (needs Docker for integration tests)
 ```

-## Infrastructure Setup
+## Start a brand-new local environment (Docker)

-Start PostgreSQL:
+The repo ships a `docker-compose.yml` with the full stack: PostgreSQL, ClickHouse, the Spring Boot server, and the nginx-served SPA. All dev defaults are baked into the compose file — no `.env` file or extra config needed for a first run.

 ```bash
+# 1. Clean slate (safe on a first run; a no-op when no volumes exist)
+docker compose down -v
+
+# 2. Build + start everything. First run rebuilds both images (~2–4 min).
+docker compose up -d --build
+
+# 3. Watch the server come up (health check goes green in ~60–90s after Flyway + ClickHouse init)
+docker compose logs -f cameleer-server
+#   ready when you see "Started CameleerServerApplication in ...".
+#   Ctrl+C when ready — containers keep running.
+
+# 4. Smoke test
+curl -s http://localhost:8081/api/v1/health     # → {"status":"UP"}
+```
+
+Open the UI at **http://localhost:8080** (nginx) and log in with **admin / admin**.
+
+| Service    | Host port | URL / notes |
+|------------|-----------|-------------|
+| Web UI (nginx) | 8080 | http://localhost:8080 — proxies `/api` to the server |
+| Server API | 8081 | http://localhost:8081/api/v1/health, http://localhost:8081/api/v1/swagger-ui.html |
+| PostgreSQL | 5432 | user `cameleer`, password `cameleer_dev`, db `cameleer` |
+| ClickHouse | 8123 (HTTP), 9000 (native) | user `default`, no password, db `cameleer` |
+
+**Dev credentials baked into compose (do not use in production):**
+
+| Purpose | Value |
+|---|---|
+| UI login | `admin` / `admin` |
+| Bootstrap token (agent registration) | `dev-bootstrap-token-for-local-agent-registration` |
+| JWT secret | `dev-jwt-secret-32-bytes-min-0123456789abcdef0123456789abcdef` |
+| `CAMELEER_SERVER_RUNTIME_ENABLED` | `false` (Docker-in-Docker app orchestration off for the local stack) |
+
+Override any of these by editing `docker-compose.yml` or passing `-e KEY=value` to `docker compose run`.
+
+### Common lifecycle commands
+
+```bash
+# Stop everything but keep volumes (quick restart later)
+docker compose stop
+
+# Start again after a stop
+docker compose start
+
+# Apply changes to the server code / UI — rebuild just what changed
+docker compose up -d --build cameleer-server
+docker compose up -d --build cameleer-ui
+
+# Wipe the environment completely (drops PG + ClickHouse volumes — all data gone)
+docker compose down -v
+
+# Fresh Flyway run by dropping just the PG volume (keeps ClickHouse data)
+docker compose down
+docker volume rm cameleer-server_cameleer-pgdata
 docker compose up -d
 ```

-This starts PostgreSQL 16. The database schema is applied automatically via Flyway migrations on server startup. ClickHouse tables are created by the schema initializer on startup.
+### Infra-only mode (backend via `mvn` / UI via Vite)

-| Service    | Port | Purpose              |
-|------------|------|----------------------|
-| PostgreSQL | 5432 | JDBC (Spring JDBC)   |
-
-PostgreSQL credentials: `cameleer` / `cameleer_dev`, database `cameleer`.
-
-## Run the Server
+If you want to iterate on backend/UI code without rebuilding the server image on every change, start just the databases and run the server + UI locally:

 ```bash
+# 1. Only infra containers
+docker compose up -d cameleer-postgres cameleer-clickhouse
+
+# 2. Build and run the server jar against those containers
 mvn clean package -DskipTests
-SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/cameleer \
+SPRING_DATASOURCE_URL="jdbc:postgresql://localhost:5432/cameleer?currentSchema=tenant_default&ApplicationName=tenant_default" \
 SPRING_DATASOURCE_USERNAME=cameleer \
 SPRING_DATASOURCE_PASSWORD=cameleer_dev \
-CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN=my-secret-token \
+SPRING_FLYWAY_USER=cameleer \
+SPRING_FLYWAY_PASSWORD=cameleer_dev \
+CAMELEER_SERVER_CLICKHOUSE_URL="jdbc:clickhouse://localhost:8123/cameleer" \
+CAMELEER_SERVER_CLICKHOUSE_USERNAME=default \
+CAMELEER_SERVER_CLICKHOUSE_PASSWORD= \
+CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN=dev-bootstrap-token-for-local-agent-registration \
+CAMELEER_SERVER_SECURITY_JWTSECRET=dev-jwt-secret-32-bytes-min-0123456789abcdef0123456789abcdef \
+CAMELEER_SERVER_RUNTIME_ENABLED=false \
+CAMELEER_SERVER_TENANT_ID=default \
 java -jar cameleer-server-app/target/cameleer-server-app-1.0-SNAPSHOT.jar
+
+# 3. In another terminal — Vite dev server on :5173 (proxies /api → :8081)
+cd ui && npm install && npm run dev
 ```

-> **Note:** The Docker image no longer includes default database credentials. When running via `docker run`, pass `-e SPRING_DATASOURCE_URL=...` etc. The docker-compose setup provides these automatically.
+Database schema is applied automatically: PostgreSQL via Flyway migrations on server startup, ClickHouse tables via `ClickHouseSchemaInitializer`. No manual DDL needed.

-The server starts on **port 8081**. The `CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN` environment variable is **required** — the server fails fast on startup if it is not set.
-
-For token rotation without downtime, set `CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKENPREVIOUS` to the old token while rolling out the new one. The server accepts both during the overlap window.
+`CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN` is **required** for agent registration — the server fails fast on startup if it's not set. For token rotation without downtime, set `CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKENPREVIOUS` to the old token while rolling out the new one — the server accepts both during the overlap window.

 ## API Endpoints

@@ -438,6 +499,7 @@ Key settings in `cameleer-server-app/src/main/resources/application.yml`. All cu
 | `cameleer.server.runtime.routingmode` | `path` | `CAMELEER_SERVER_RUNTIME_ROUTINGMODE` | `path` or `subdomain` Traefik routing |
 | `cameleer.server.runtime.routingdomain` | `localhost` | `CAMELEER_SERVER_RUNTIME_ROUTINGDOMAIN` | Domain for Traefik routing labels |
 | `cameleer.server.runtime.serverurl` | *(empty)* | `CAMELEER_SERVER_RUNTIME_SERVERURL` | Server URL injected into app containers |
+| `cameleer.server.runtime.certresolver` | *(empty)* | `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` | Traefik TLS cert resolver name (e.g. `letsencrypt`). Blank = omit the `tls.certresolver` label and let Traefik serve the default TLS-store cert |
 | `cameleer.server.runtime.agenthealthport` | `9464` | `CAMELEER_SERVER_RUNTIME_AGENTHEALTHPORT` | Agent health check port |
 | `cameleer.server.runtime.healthchecktimeout` | `60` | `CAMELEER_SERVER_RUNTIME_HEALTHCHECKTIMEOUT` | Health check timeout (seconds) |
 | `cameleer.server.runtime.container.memorylimit` | `512m` | `CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT` | Default memory limit for app containers |
--- a/cameleer-server-app/pom.xml
+++ b/cameleer-server-app/pom.xml
@@ -82,6 +82,11 @@
            <artifactId>org.eclipse.xtext.xbase.lib</artifactId>
            <version>2.37.0</version>
        </dependency>
+        <dependency>
+            <groupId>com.samskivert</groupId>
+            <artifactId>jmustache</artifactId>
+            <version>1.16</version>
+        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-validation</artifactId>
@@ -144,6 +149,12 @@
            <artifactId>awaitility</artifactId>
            <scope>test</scope>
        </dependency>
+        <dependency>
+            <groupId>org.wiremock</groupId>
+            <artifactId>wiremock-standalone</artifactId>
+            <version>3.9.1</version>
+            <scope>test</scope>
+        </dependency>
    </dependencies>

    <build>
@@ -178,8 +189,8 @@
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
-                    <forkCount>1</forkCount>
-                    <reuseForks>false</reuseForks>
+                    <forkCount>1C</forkCount>
+                    <reuseForks>true</reuseForks>
                </configuration>
            </plugin>
            <plugin>
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/CameleerServerApplication.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/CameleerServerApplication.java
@@ -8,6 +8,8 @@ import org.springframework.boot.context.properties.EnableConfigurationProperties
 import org.springframework.scheduling.annotation.EnableAsync;
 import org.springframework.scheduling.annotation.EnableScheduling;

+import java.util.TimeZone;
+
 /**
 * Main entry point for the Cameleer Server application.
 * <p>
@@ -23,6 +25,11 @@ import org.springframework.scheduling.annotation.EnableScheduling;
 public class CameleerServerApplication {

    public static void main(String[] args) {
+        // Pin JVM default TZ to UTC. The ClickHouse JDBC driver formats
+        // java.sql.Timestamp via toString() which uses JVM default TZ; a
+        // non-UTC JVM would then send CH timestamps off by the TZ offset.
+        // Pinning the default to UTC is standard practice for observability servers.
+        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        SpringApplication.run(CameleerServerApplication.class, args);
    }
 }
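
The comment in `main` relies on `java.sql.Timestamp.toString()` rendering in the JVM default time zone. A small sketch of that effect, using only the JDK; the class name `TimestampZoneDemo` is hypothetical, and whether the ClickHouse driver takes this path is taken from the comment above rather than verified here.

```java
/** Hypothetical demo of how the JVM default time zone changes Timestamp.toString(). */
final class TimestampZoneDemo {

    private TimestampZoneDemo() {}

    /** Renders epoch millis via java.sql.Timestamp.toString() under the given default zone. */
    static String render(long epochMillis, String zoneId) {
        java.util.TimeZone previous = java.util.TimeZone.getDefault();
        try {
            java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone(zoneId));
            return new java.sql.Timestamp(epochMillis).toString();
        } finally {
            // Restore the original default so the demo has no lasting side effect.
            java.util.TimeZone.setDefault(previous);
        }
    }
}
```

With this sketch, `render(0L, "UTC")` yields `1970-01-01 00:00:00.0` while `render(0L, "GMT+02:00")` yields `1970-01-01 02:00:00.0`: the same instant, shifted by the zone offset once it is turned into text.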
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/agent/AgentLifecycleMonitor.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/agent/AgentLifecycleMonitor.java
@@ -55,7 +55,8 @@ public class AgentLifecycleMonitor {
                if (before != null && before != agent.state()) {
                    String eventType = mapTransitionEvent(before, agent.state());
                    if (eventType != null) {
-                        agentEventService.recordEvent(agent.instanceId(), agent.applicationId(), eventType,
+                        agentEventService.recordEvent(agent.instanceId(), agent.applicationId(),
+                                agent.environmentId(), eventType,
                                agent.displayName() + " " + before + " -> " + agent.state());
                        serverMetrics.recordAgentTransition(eventType);
                    }
@@ -69,7 +70,7 @@ public class AgentLifecycleMonitor {
    private String mapTransitionEvent(AgentState from, AgentState to) {
        if (from == AgentState.LIVE && to == AgentState.STALE) return "WENT_STALE";
        if (from == AgentState.STALE && to == AgentState.DEAD) return "WENT_DEAD";
-        if (from == AgentState.STALE && to == AgentState.LIVE) return "RECOVERED";
+        if (to == AgentState.LIVE && (from == AgentState.STALE || from == AgentState.DEAD)) return "RECOVERED";
        return null;
    }
 }
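
The revised mapping can be exercised in isolation. The sketch below copies the logic from the hunk above into a standalone class; `TransitionEvents` and its nested `AgentState` enum are stand-ins for illustration, not the project's types.

```java
/** Standalone sketch of the transition-to-event mapping; AgentState is a local stand-in. */
final class TransitionEvents {

    enum AgentState { LIVE, STALE, DEAD }

    private TransitionEvents() {}

    /** Returns the event type for a state change, or null when the change is not recorded. */
    static String mapTransitionEvent(AgentState from, AgentState to) {
        if (from == AgentState.LIVE && to == AgentState.STALE) return "WENT_STALE";
        if (from == AgentState.STALE && to == AgentState.DEAD) return "WENT_DEAD";
        // Recovery covers both STALE -> LIVE and DEAD -> LIVE after this change.
        if (to == AgentState.LIVE && (from == AgentState.STALE || from == AgentState.DEAD)) return "RECOVERED";
        return null;
    }
}
```

Before the change, a DEAD agent that came back LIVE produced no event; with the widened condition it reports `RECOVERED`, the same as a STALE agent.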
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/agent/SseConnectionManager.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/agent/SseConnectionManager.java
@@ -80,6 +80,17 @@ public class SseConnectionManager implements AgentEventListener {
            log.debug("SSE connection error for agent {}: {}", agentId, ex.getMessage());
        });

+        // Send an initial keepalive comment so Spring flushes the response
+        // headers immediately. Without this, clients blocking on the first
+        // body byte can hang for a full ping interval before observing the
+        // connection. This surfaced in ITs that assert awaitConnection().
+        try {
+            emitter.send(SseEmitter.event().comment("connected"));
+        } catch (IOException e) {
+            log.debug("Initial keepalive failed for agent {}: {}", agentId, e.getMessage());
+            emitters.remove(agentId, emitter);
+        }
+
        log.info("SSE connection established for agent {}", agentId);

        return emitter;
@@ -169,7 +180,7 @@ public class SseConnectionManager implements AgentEventListener {
    /**
     * Scheduled ping keepalive to all connected agents.
     */
-    @Scheduled(fixedDelayString = "${agent-registry.ping-interval-ms:15000}")
+    @Scheduled(fixedDelayString = "${cameleer.server.agentregistry.ping-interval-ms:15000}")
    void pingAll() {
        if (!emitters.isEmpty()) {
            sendPingToAll();
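
The initial keepalive is an SSE comment. In the Server-Sent Events wire format, a line beginning with a colon is a comment that clients ignore, so it can force the headers out without delivering an event. A minimal sketch of that frame, independent of Spring; `SseFrames` is a hypothetical helper and not how `SseEmitter` is implemented.

```java
/** Hypothetical helper showing the wire form of an SSE comment frame. */
final class SseFrames {

    private SseFrames() {}

    /** A line starting with ':' is a comment; the trailing blank line ends the frame. */
    static String comment(String text) {
        if (text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("comment must be a single line");
        }
        return ":" + text + "\n\n";
    }
}
```

For example, `SseFrames.comment("connected")` produces the bytes `:connected` followed by two newlines, which an SSE client reads and discards.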
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
@@ -0,0 +1,78 @@
+package com.cameleer.server.app.alerting.config;
+
+import com.cameleer.server.app.alerting.eval.PerKindCircuitBreaker;
+import com.cameleer.server.app.alerting.metrics.AlertingMetrics;
+import com.cameleer.server.app.alerting.storage.*;
+import com.cameleer.server.core.alerting.AlertInstanceRepository;
+import com.cameleer.server.core.alerting.AlertNotificationRepository;
+import com.cameleer.server.core.alerting.AlertRuleRepository;
+import com.cameleer.server.core.alerting.AlertSilenceRepository;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.boot.context.properties.EnableConfigurationProperties;
+import org.springframework.context.annotation.Bean;
+import org.springframework.context.annotation.Configuration;
+import org.springframework.jdbc.core.JdbcTemplate;
+
+import java.net.InetAddress;
+import java.time.Clock;
+
+@Configuration
+@EnableConfigurationProperties(AlertingProperties.class)
+public class AlertingBeanConfig {
+
+    private static final Logger log = LoggerFactory.getLogger(AlertingBeanConfig.class);
+
+    @Bean
+    public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        return new PostgresAlertRuleRepository(jdbc, om);
+    }
+
+    @Bean
+    public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        return new PostgresAlertInstanceRepository(jdbc, om);
+    }
+
+    @Bean
+    public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        return new PostgresAlertSilenceRepository(jdbc, om);
+    }
+
+    @Bean
+    public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        return new PostgresAlertNotificationRepository(jdbc, om);
+    }
+
+    @Bean
+    public Clock alertingClock() {
+        return Clock.systemDefaultZone();
+    }
+
+    @Bean("alertingInstanceId")
+    public String alertingInstanceId() {
+        String hostname;
+        try {
+            hostname = InetAddress.getLocalHost().getHostName();
+        } catch (Exception e) {
+            hostname = "unknown";
+        }
+        return hostname + ":" + ProcessHandle.current().pid();
+    }
+
+    @Bean
+    public PerKindCircuitBreaker perKindCircuitBreaker(AlertingProperties props,
+                                                       AlertingMetrics alertingMetrics) {
+        if (props.evaluatorTickIntervalMs() != null
+                && props.evaluatorTickIntervalMs() < 5000) {
+            log.warn("cameleer.server.alerting.evaluatorTickIntervalMs={} is below the 5000 ms floor; clamping to 5000 ms",
+                    props.evaluatorTickIntervalMs());
+        }
+        PerKindCircuitBreaker breaker = new PerKindCircuitBreaker(
+                props.cbFailThreshold(),
+                props.cbWindowSeconds(),
+                props.cbCooldownSeconds());
+        breaker.setMetrics(alertingMetrics);
+        return breaker;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java
@@ -0,0 +1,79 @@
+package com.cameleer.server.app.alerting.config;
+
+import org.springframework.boot.context.properties.ConfigurationProperties;
+
+@ConfigurationProperties("cameleer.server.alerting")
+public record AlertingProperties(
+        Integer evaluatorTickIntervalMs,
+        Integer evaluatorBatchSize,
+        Integer claimTtlSeconds,
+        Integer notificationTickIntervalMs,
+        Integer notificationBatchSize,
+        Boolean inTickCacheEnabled,
+        Integer circuitBreakerFailThreshold,
+        Integer circuitBreakerWindowSeconds,
+        Integer circuitBreakerCooldownSeconds,
+        Integer eventRetentionDays,
+        Integer notificationRetentionDays,
+        Integer webhookTimeoutMs,
+        Integer webhookMaxAttempts,
+        Integer perExchangeDeployBacklogCapSeconds) {
+
+    public int effectiveEvaluatorTickIntervalMs() {
+        int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
+        return Math.max(5000, raw);  // floor: no faster than 5 s
+    }
+
+    public int effectiveEvaluatorBatchSize() {
+        return evaluatorBatchSize == null ? 20 : evaluatorBatchSize;
+    }
+
+    public int effectiveClaimTtlSeconds() {
+        return claimTtlSeconds == null ? 30 : claimTtlSeconds;
+    }
+
+    public int effectiveNotificationTickIntervalMs() {
+        return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs;
+    }
+
+    public int effectiveNotificationBatchSize() {
+        return notificationBatchSize == null ? 50 : notificationBatchSize;
+    }
+
+    public boolean effectiveInTickCacheEnabled() {
+        return inTickCacheEnabled == null || inTickCacheEnabled;
+    }
+
+    public int effectiveEventRetentionDays() {
+        return eventRetentionDays == null ? 90 : eventRetentionDays;
+    }
+
+    public int effectiveNotificationRetentionDays() {
+        return notificationRetentionDays == null ? 30 : notificationRetentionDays;
+    }
+
+    public int effectiveWebhookTimeoutMs() {
+        return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs;
+    }
+
+    public int effectiveWebhookMaxAttempts() {
+        return webhookMaxAttempts == null ? 3 : webhookMaxAttempts;
+    }
+
+    public int cbFailThreshold() {
+        return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold;
+    }
+
+    public int cbWindowSeconds() {
+        return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds;
+    }
+
+    public int cbCooldownSeconds() {
+        return circuitBreakerCooldownSeconds == null ? 60 : circuitBreakerCooldownSeconds;
+    }
+
+    public int effectivePerExchangeDeployBacklogCapSeconds() {
+        // Default 24 h. Zero or negative disables the cap (no clamp; first run uses rule.createdAt, as it does today).
+        return perExchangeDeployBacklogCapSeconds == null ? 86_400 : perExchangeDeployBacklogCapSeconds;
+    }
+}
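The `effective*()` accessors above share one pattern: a null property falls back to a default, and the evaluator tick interval is additionally floored at 5 s. A minimal sketch of that behavior, assuming a trimmed-down record (`TickSettings` is a hypothetical name used only for illustration; it is not a class in this change):

```java
// Hypothetical stand-in for the properties record in the diff above.
// Only two of its fields are reproduced here.
record TickSettings(Integer evaluatorTickIntervalMs, Integer evaluatorBatchSize) {

    /** Null means "use the default"; values below the 5 s floor are raised to it. */
    int effectiveEvaluatorTickIntervalMs() {
        int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
        return Math.max(5000, raw);
    }

    /** Null means "use the default"; no floor is applied to the batch size. */
    int effectiveEvaluatorBatchSize() {
        return evaluatorBatchSize == null ? 20 : evaluatorBatchSize;
    }

    public static void main(String[] args) {
        // Unset property: default applies.
        System.out.println(new TickSettings(null, null).effectiveEvaluatorTickIntervalMs());
        // Configured below the floor: raised to 5000.
        System.out.println(new TickSettings(1000, null).effectiveEvaluatorTickIntervalMs());
        // Configured above the floor: kept as-is.
        System.out.println(new TickSettings(10000, 5).effectiveEvaluatorTickIntervalMs());
    }
}
```

Keeping the raw fields nullable and resolving defaults in accessors means an unset property and an explicitly configured one stay distinguishable, while callers only ever see a usable `int`.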
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertController.java
@@ -0,0 +1,165 @@
+package com.cameleer.server.app.alerting.controller;
+
+import com.cameleer.server.app.alerting.dto.AlertDto;
+import com.cameleer.server.app.alerting.dto.BulkIdsRequest;
+import com.cameleer.server.app.alerting.dto.UnreadCountResponse;
+import com.cameleer.server.app.alerting.notify.InAppInboxQuery;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertInstanceRepository;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.AlertState;
+import com.cameleer.server.core.runtime.Environment;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.validation.Valid;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.security.core.context.SecurityContextHolder;
+import org.springframework.web.bind.annotation.DeleteMapping;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PathVariable;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.RequestBody;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RequestParam;
+import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.UUID;
+
+/**
+ * REST controller for the in-app alert inbox (env-scoped).
+ * VIEWER+ can read their own inbox; OPERATOR+ can soft-delete and restore alerts.
+ */
+@RestController
+@RequestMapping("/api/v1/environments/{envSlug}/alerts")
+@Tag(name = "Alerts Inbox", description = "In-app alert inbox, ack and read tracking (env-scoped)")
+@PreAuthorize("hasAnyRole('VIEWER','OPERATOR','ADMIN')")
+public class AlertController {
+
+    private final InAppInboxQuery inboxQuery;
+    private final AlertInstanceRepository instanceRepo;
+
+    public AlertController(InAppInboxQuery inboxQuery,
+                           AlertInstanceRepository instanceRepo) {
+        this.inboxQuery = inboxQuery;
+        this.instanceRepo = instanceRepo;
+    }
+
+    @GetMapping
+    public List<AlertDto> list(
+            @EnvPath Environment env,
+            @RequestParam(defaultValue = "50") int limit,
+            @RequestParam(required = false) List<AlertState> state,
+            @RequestParam(required = false) List<AlertSeverity> severity,
+            @RequestParam(required = false) Boolean acked,
+            @RequestParam(required = false) Boolean read) {
+        String userId = currentUserId();
+        int effectiveLimit = Math.max(1, Math.min(limit, 200));  // clamp to 1..200
+        return inboxQuery.listInbox(env.id(), userId, state, severity, acked, read, effectiveLimit)
+                .stream().map(AlertDto::from).toList();
+    }
+
+    @GetMapping("/unread-count")
+    public UnreadCountResponse unreadCount(@EnvPath Environment env) {
+        return inboxQuery.countUnread(env.id(), currentUserId());
+    }
+
+    @GetMapping("/{id}")
+    public AlertDto get(@EnvPath Environment env, @PathVariable UUID id) {
+        AlertInstance instance = requireLiveInstance(id, env.id());
+        return AlertDto.from(instance);
+    }
+
+    @PostMapping("/{id}/ack")
+    public AlertDto ack(@EnvPath Environment env, @PathVariable UUID id) {
+        AlertInstance instance = requireLiveInstance(id, env.id());
+        String userId = currentUserId();
+        instanceRepo.ack(id, userId, Instant.now());
+        // Re-fetch to return fresh state
+        return AlertDto.from(instanceRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND)));
+    }
+
+    @PostMapping("/{id}/read")
+    public void read(@EnvPath Environment env, @PathVariable UUID id) {
+        requireLiveInstance(id, env.id());
+        instanceRepo.markRead(id, Instant.now());
+    }
+
+    @PostMapping("/bulk-read")
+    public void bulkRead(@EnvPath Environment env,
+                         @Valid @RequestBody BulkIdsRequest req) {
+        List<UUID> filtered = inEnvLiveIds(req.instanceIds(), env.id());
+        if (!filtered.isEmpty()) {
+            instanceRepo.bulkMarkRead(filtered, Instant.now());
+        }
+    }
+
+    @PostMapping("/bulk-ack")
+    public void bulkAck(@EnvPath Environment env,
+                        @Valid @RequestBody BulkIdsRequest req) {
+        List<UUID> filtered = inEnvLiveIds(req.instanceIds(), env.id());
+        if (!filtered.isEmpty()) {
+            instanceRepo.bulkAck(filtered, currentUserId(), Instant.now());
+        }
+    }
+
+    @DeleteMapping("/{id}")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<Void> delete(@EnvPath Environment env, @PathVariable UUID id) {
+        requireLiveInstance(id, env.id());
+        instanceRepo.softDelete(id, Instant.now());
+        return ResponseEntity.noContent().build();
+    }
+
+    @PostMapping("/bulk-delete")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public void bulkDelete(@EnvPath Environment env,
+                           @Valid @RequestBody BulkIdsRequest req) {
+        List<UUID> filtered = inEnvLiveIds(req.instanceIds(), env.id());
+        if (!filtered.isEmpty()) {
+            instanceRepo.bulkSoftDelete(filtered, Instant.now());
+        }
+    }
+
+    @PostMapping("/{id}/restore")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<Void> restore(@EnvPath Environment env, @PathVariable UUID id) {
+        // Unlike requireLiveInstance, restore explicitly targets soft-deleted rows
+        AlertInstance inst = instanceRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND, "Alert not found"));
+        if (!inst.environmentId().equals(env.id()))
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Alert not found in env");
+        instanceRepo.restore(id);
+        return ResponseEntity.noContent().build();
+    }
+
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+
+    private AlertInstance requireLiveInstance(UUID id, UUID envId) {
+        AlertInstance i = instanceRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND, "Alert not found"));
+        if (!i.environmentId().equals(envId) || i.deletedAt() != null)
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Alert not found in env");
+        return i;
+    }
+
+    private List<UUID> inEnvLiveIds(List<UUID> ids, UUID envId) {
+        return instanceRepo.filterInEnvLive(ids, envId);
+    }
+
+    private String currentUserId() {
+        var auth = SecurityContextHolder.getContext().getAuthentication();
+        if (auth == null || auth.getName() == null) {
+            throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
+        }
+        String name = auth.getName();
+        return name.startsWith("user:") ? name.substring(5) : name;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertNotificationController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertNotificationController.java
@@ -0,0 +1,77 @@
+package com.cameleer.server.app.alerting.controller;
+
+import com.cameleer.server.app.alerting.dto.AlertNotificationDto;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.alerting.AlertNotification;
+import com.cameleer.server.core.alerting.AlertNotificationRepository;
+import com.cameleer.server.core.alerting.NotificationStatus;
+import com.cameleer.server.core.runtime.Environment;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.http.HttpStatus;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PathVariable;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.UUID;
+
+/**
+ * REST controller for alert notifications.
+ * <p>
+ * Env-scoped: GET /api/v1/environments/{envSlug}/alerts/{alertId}/notifications lists outbound
+ * notifications for a given alert instance.
+ * <p>
+ * Flat: POST /api/v1/alerts/notifications/{id}/retry — globally unique notification IDs;
+ * flat path matches the /executions/{id} precedent. OPERATOR+ only.
+ */
+@RestController
+@Tag(name = "Alert Notifications", description = "Outbound webhook notification management")
+public class AlertNotificationController {
+
+    private final AlertNotificationRepository notificationRepo;
+
+    public AlertNotificationController(AlertNotificationRepository notificationRepo) {
+        this.notificationRepo = notificationRepo;
+    }
+
+    /**
+     * Lists notifications for a specific alert instance. The env slug is resolved (unknown slug = 404),
+     * but alertId is not checked against that environment here. VIEWER+.
+     */
+    @GetMapping("/api/v1/environments/{envSlug}/alerts/{alertId}/notifications")
+    @PreAuthorize("hasAnyRole('VIEWER','OPERATOR','ADMIN')")
+    public List<AlertNotificationDto> listForInstance(
+            @EnvPath Environment env,
+            @PathVariable UUID alertId) {
+        return notificationRepo.listForInstance(alertId)
+                .stream().map(AlertNotificationDto::from).toList();
+    }
+
+    /**
+     * Retries a failed notification — resets attempts and schedules it for immediate retry.
+     * Notification IDs are globally unique (flat path, matches /executions/{id} precedent).
+     * OPERATOR+ only.
+     */
+    @PostMapping("/api/v1/alerts/notifications/{id}/retry")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public AlertNotificationDto retry(@PathVariable UUID id) {
+        AlertNotification notification = notificationRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND,
+                        "Notification not found: " + id));
+
+        if (notification.status() == NotificationStatus.PENDING) {
+            return AlertNotificationDto.from(notification);
+        }
+
+        // Reset for retry: status -> PENDING, attempts -> 0, next_attempt_at -> now
+        notificationRepo.resetForRetry(id, Instant.now());
+
+        return AlertNotificationDto.from(notificationRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND)));
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertRuleController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertRuleController.java
@@ -0,0 +1,402 @@
+package com.cameleer.server.app.alerting.controller;
+
+import com.cameleer.server.app.alerting.dto.AlertRuleRequest;
+import com.cameleer.server.app.alerting.dto.AlertRuleResponse;
+import com.cameleer.server.app.alerting.dto.RenderPreviewRequest;
+import com.cameleer.server.app.alerting.dto.RenderPreviewResponse;
+import com.cameleer.server.app.alerting.dto.TestEvaluateRequest;
+import com.cameleer.server.app.alerting.dto.TestEvaluateResponse;
+import com.cameleer.server.app.alerting.dto.WebhookBindingRequest;
+import com.cameleer.server.app.alerting.eval.ConditionEvaluator;
+import com.cameleer.server.app.alerting.eval.EvalContext;
+import com.cameleer.server.app.alerting.eval.EvalResult;
+import com.cameleer.server.app.alerting.eval.TickCache;
+import com.cameleer.server.app.alerting.notify.MustacheRenderer;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.admin.AuditCategory;
+import com.cameleer.server.core.admin.AuditResult;
+import com.cameleer.server.core.admin.AuditService;
+import com.cameleer.server.core.alerting.AlertCondition;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertRuleRepository;
+import com.cameleer.server.core.alerting.AlertRuleTarget;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.ExchangeMatchCondition;
+import com.cameleer.server.core.alerting.FireMode;
+import com.cameleer.server.core.alerting.WebhookBinding;
+import com.cameleer.server.core.outbound.OutboundConnection;
+import com.cameleer.server.core.outbound.OutboundConnectionService;
+import com.cameleer.server.core.runtime.Environment;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.servlet.http.HttpServletRequest;
+import jakarta.validation.Valid;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.security.core.context.SecurityContextHolder;
+import org.springframework.web.bind.annotation.DeleteMapping;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PathVariable;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.PutMapping;
+import org.springframework.web.bind.annotation.RequestBody;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import java.util.regex.Pattern;
+
+/**
+ * REST controller for alert rules (env-scoped).
+ * <p>
+ * CRITICAL: {@link ExchangeMatchCondition#filter()} attribute KEYS are inlined into ClickHouse SQL.
+ * They are validated here at save time to match {@code ^[a-zA-Z0-9._-]+$} before any SQL is built.
+ */
+@RestController
+@RequestMapping("/api/v1/environments/{envSlug}/alerts/rules")
+@Tag(name = "Alert Rules", description = "Alert rule management (env-scoped)")
+@PreAuthorize("hasAnyRole('VIEWER','OPERATOR','ADMIN')")
+public class AlertRuleController {
+
+    /**
+     * Attribute KEY allowlist. Keys are inlined into ClickHouse SQL via
+     * {@code JSONExtractString(attributes, '<key>')}, so this pattern is a hard security gate.
+     * Values are always parameter-bound and safe.
+     */
+    private static final Pattern ATTR_KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");
+
+    private final AlertRuleRepository ruleRepo;
+    private final OutboundConnectionService connectionService;
+    private final AuditService auditService;
+    private final MustacheRenderer renderer;
+    private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
+    private final Clock clock;
+    private final String tenantId;
+
+    @SuppressWarnings("SpringJavaInjectionPointsAutowiringInspection")
+    public AlertRuleController(AlertRuleRepository ruleRepo,
+                               OutboundConnectionService connectionService,
+                               AuditService auditService,
+                               MustacheRenderer renderer,
+                               List<ConditionEvaluator<?>> evaluatorList,
+                               Clock alertingClock,
+                               @Value("${cameleer.server.tenant.id:default}") String tenantId) {
+        this.ruleRepo = ruleRepo;
+        this.connectionService = connectionService;
+        this.auditService = auditService;
+        this.renderer = renderer;
+        this.evaluators = new java.util.EnumMap<>(ConditionKind.class);
+        for (ConditionEvaluator<?> e : evaluatorList) {
+            this.evaluators.put(e.kind(), e);
+        }
+        this.clock = alertingClock;
+        this.tenantId = tenantId;
+    }
+
+    // -------------------------------------------------------------------------
+    // List / Get
+    // -------------------------------------------------------------------------
+
+    @GetMapping
+    public List<AlertRuleResponse> list(@EnvPath Environment env) {
+        return ruleRepo.listByEnvironment(env.id())
+                .stream().map(AlertRuleResponse::from).toList();
+    }
+
+    @GetMapping("/{id}")
+    public AlertRuleResponse get(@EnvPath Environment env, @PathVariable UUID id) {
+        AlertRule rule = requireRule(id, env.id());
+        return AlertRuleResponse.from(rule);
+    }
+
+    // -------------------------------------------------------------------------
+    // Create / Update / Delete
+    // -------------------------------------------------------------------------
+
+    @PostMapping
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<AlertRuleResponse> create(
+            @EnvPath Environment env,
+            @Valid @RequestBody AlertRuleRequest req,
+            HttpServletRequest httpRequest) {
+
+        validateAttributeKeys(req.condition());
+        validateBusinessRules(req);
+        validateWebhooks(req.webhooks(), env.id());
+
+        AlertRule draft = buildRule(null, env.id(), req, currentUserId());
+        AlertRule saved = ruleRepo.save(draft);
+
+        auditService.log("ALERT_RULE_CREATE", AuditCategory.ALERT_RULE_CHANGE,
+                saved.id().toString(), Map.of("name", saved.name()), AuditResult.SUCCESS, httpRequest);
+
+        return ResponseEntity.status(HttpStatus.CREATED).body(AlertRuleResponse.from(saved));
+    }
+
+    @PutMapping("/{id}")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public AlertRuleResponse update(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            @Valid @RequestBody AlertRuleRequest req,
+            HttpServletRequest httpRequest) {
+
+        AlertRule existing = requireRule(id, env.id());
+        validateAttributeKeys(req.condition());
+        validateBusinessRules(req);
+        validateWebhooks(req.webhooks(), env.id());
+
+        AlertRule updated = buildRule(existing, env.id(), req, currentUserId());
+        AlertRule saved = ruleRepo.save(updated);
+
+        auditService.log("ALERT_RULE_UPDATE", AuditCategory.ALERT_RULE_CHANGE,
+                id.toString(), Map.of("name", saved.name()), AuditResult.SUCCESS, httpRequest);
+
+        return AlertRuleResponse.from(saved);
+    }
+
+    @DeleteMapping("/{id}")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<Void> delete(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            HttpServletRequest httpRequest) {
+
+        requireRule(id, env.id());
+        ruleRepo.delete(id);
+
+        auditService.log("ALERT_RULE_DELETE", AuditCategory.ALERT_RULE_CHANGE,
+                id.toString(), Map.of(), AuditResult.SUCCESS, httpRequest);
+
+        return ResponseEntity.noContent().build();
+    }
+
+    // -------------------------------------------------------------------------
+    // Enable / Disable
+    // -------------------------------------------------------------------------
+
+    @PostMapping("/{id}/enable")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public AlertRuleResponse enable(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            HttpServletRequest httpRequest) {
+
+        AlertRule rule = requireRule(id, env.id());
+        AlertRule updated = withEnabled(rule, true);
+        AlertRule saved = ruleRepo.save(updated);
+
+        auditService.log("ALERT_RULE_ENABLE", AuditCategory.ALERT_RULE_CHANGE,
+                id.toString(), Map.of("name", saved.name()), AuditResult.SUCCESS, httpRequest);
+
+        return AlertRuleResponse.from(saved);
+    }
+
+    @PostMapping("/{id}/disable")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public AlertRuleResponse disable(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            HttpServletRequest httpRequest) {
+
+        AlertRule rule = requireRule(id, env.id());
+        AlertRule updated = withEnabled(rule, false);
+        AlertRule saved = ruleRepo.save(updated);
+
+        auditService.log("ALERT_RULE_DISABLE", AuditCategory.ALERT_RULE_CHANGE,
+                id.toString(), Map.of("name", saved.name()), AuditResult.SUCCESS, httpRequest);
+
+        return AlertRuleResponse.from(saved);
+    }
+
+    // -------------------------------------------------------------------------
+    // Render Preview
+    // -------------------------------------------------------------------------
+
+    @PostMapping("/{id}/render-preview")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public RenderPreviewResponse renderPreview(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            @RequestBody RenderPreviewRequest req) {
+
+        AlertRule rule = requireRule(id, env.id());
+        Map<String, Object> ctx = req.context();
+        String title   = renderer.render(rule.notificationTitleTmpl(),   ctx);
+        String message = renderer.render(rule.notificationMessageTmpl(), ctx);
+        return new RenderPreviewResponse(title, message);
+    }
+
+    // -------------------------------------------------------------------------
+    // Test Evaluate
+    // -------------------------------------------------------------------------
+
+    @PostMapping("/{id}/test-evaluate")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    @SuppressWarnings({"rawtypes", "unchecked"})
+    public TestEvaluateResponse testEvaluate(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            @RequestBody TestEvaluateRequest req) {
+
+        AlertRule rule = requireRule(id, env.id());
+        ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
+        if (evaluator == null) {
+            throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY,
+                    "No evaluator registered for condition kind: " + rule.conditionKind());
+        }
+
+        EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), new TickCache());
+        EvalResult result = evaluator.evaluate(rule.condition(), rule, ctx);
+        return TestEvaluateResponse.from(result);
+    }
+
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+
+    /**
+     * Cross-field business-rule validation for {@link AlertRuleRequest}.
+     *
+     * <p>PER_EXCHANGE rules: re-notify and for-duration are nonsensical (each fire is its own
+     * exchange — there's no "still firing" window and nothing to re-notify about). Reject 400
+     * if either is non-zero.
+     *
+     * <p>All rules: reject 400 if both webhooks and targets are empty — such a rule can never
+     * notify anyone and is a pure footgun.
+     */
+    private void validateBusinessRules(AlertRuleRequest req) {
+        if (req.condition() instanceof ExchangeMatchCondition ex
+                && ex.fireMode() == FireMode.PER_EXCHANGE) {
+            if (req.reNotifyMinutes() != null && req.reNotifyMinutes() != 0) {
+                throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                        "reNotifyMinutes must be 0 for PER_EXCHANGE rules (re-notify does not apply)");
+            }
+            if (req.forDurationSeconds() != null && req.forDurationSeconds() != 0) {
+                throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                        "forDurationSeconds must be 0 for PER_EXCHANGE rules");
+            }
+        }
+        boolean noWebhooks = req.webhooks() == null || req.webhooks().isEmpty();
+        boolean noTargets  = req.targets()  == null || req.targets().isEmpty();
+        if (noWebhooks && noTargets) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                    "rule must have at least one webhook or target — otherwise it never notifies anyone");
+        }
+    }
+
+    /**
+     * Validates that all attribute keys in an {@link ExchangeMatchCondition} match
+     * {@code ^[a-zA-Z0-9._-]+$}. Keys are inlined into ClickHouse SQL, making this
+     * a mandatory SQL-injection prevention gate.
+     */
+    private void validateAttributeKeys(AlertCondition condition) {
+        if (condition instanceof ExchangeMatchCondition emc && emc.filter() != null) {
+            for (String key : emc.filter().attributes().keySet()) {
+                if (!ATTR_KEY.matcher(key).matches()) {
+                    throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY,
+                            "Invalid attribute key (must match [a-zA-Z0-9._-]+): " + key);
+                }
+            }
+        }
+    }
+
+    /**
+     * Validates that each webhook outboundConnectionId exists and is allowed in this environment.
+     */
+    private void validateWebhooks(List<WebhookBindingRequest> webhooks, UUID envId) {
+        if (webhooks == null) {
+            return; // targets-only rule: validateBusinessRules permits null webhooks
+        }
+        for (WebhookBindingRequest wb : webhooks) {
+            OutboundConnection conn;
+            try {
+                conn = connectionService.get(wb.outboundConnectionId());
+            } catch (Exception ex) {
+                throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY,
+                        "outboundConnectionId not found: " + wb.outboundConnectionId());
+            }
+            if (!conn.isAllowedInEnvironment(envId)) {
+                throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY,
+                        "outboundConnection " + wb.outboundConnectionId()
+                                + " is not allowed in this environment");
+            }
+        }
+    }
+
+    private AlertRule requireRule(UUID id, UUID envId) {
+        AlertRule rule = ruleRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND,
+                        "Alert rule not found: " + id));
+        if (!rule.environmentId().equals(envId)) {
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND,
+                    "Alert rule not found in this environment: " + id);
+        }
+        return rule;
+    }
+
+    private AlertRule buildRule(AlertRule existing, UUID envId, AlertRuleRequest req, String userId) {
+        UUID id = existing != null ? existing.id() : UUID.randomUUID();
+        Instant now = Instant.now(clock);
+        Instant createdAt = existing != null ? existing.createdAt() : now;
+        String createdBy  = existing != null ? existing.createdBy()  : userId;
+        boolean enabled   = existing != null ? existing.enabled() : true;
+
+        List<WebhookBinding> webhooks = (req.webhooks() == null ? List.<WebhookBindingRequest>of() : req.webhooks()).stream()
+                .map(wb -> new WebhookBinding(
+                        UUID.randomUUID(),
+                        wb.outboundConnectionId(),
+                        wb.bodyOverride(),
+                        wb.headerOverrides()))
+                .toList();
+
+        List<AlertRuleTarget> targets = req.targets() == null ? List.of() : req.targets();
+
+        int evalInterval = req.evaluationIntervalSeconds() != null
+                ? req.evaluationIntervalSeconds() : 60;
+        int forDuration = req.forDurationSeconds() != null
+                ? req.forDurationSeconds() : 0;
+        int reNotify = req.reNotifyMinutes() != null
+                ? req.reNotifyMinutes() : 0;
+
+        String titleTmpl   = req.notificationTitleTmpl()   != null ? req.notificationTitleTmpl()   : "";
+        String messageTmpl = req.notificationMessageTmpl() != null ? req.notificationMessageTmpl() : "";
+
+        return new AlertRule(
+                id, envId, req.name(), req.description(),
+                req.severity(), enabled,
+                req.conditionKind(), req.condition(),
+                evalInterval, forDuration, reNotify,
+                titleTmpl, messageTmpl,
+                webhooks, targets,
+                now, null, null, Map.of(),
+                createdAt, createdBy, now, userId);
+    }
+
+    private AlertRule withEnabled(AlertRule r, boolean enabled) {
+        Instant now = Instant.now(clock);
+        return new AlertRule(
+                r.id(), r.environmentId(), r.name(), r.description(),
+                r.severity(), enabled, r.conditionKind(), r.condition(),
+                r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
+                r.notificationTitleTmpl(), r.notificationMessageTmpl(),
+                r.webhooks(), r.targets(),
+                r.nextEvaluationAt(), r.claimedBy(), r.claimedUntil(), r.evalState(),
+                r.createdAt(), r.createdBy(), now, currentUserId());
+    }
+
+    private String currentUserId() {
+        var auth = SecurityContextHolder.getContext().getAuthentication();
+        if (auth == null || auth.getName() == null) {
+            throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
+        }
+        String name = auth.getName();
+        return name.startsWith("user:") ? name.substring(5) : name;
+    }
+}
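The attribute-key allowlist in `AlertRuleController` is described as a hard security gate because keys are inlined into ClickHouse SQL. A minimal sketch of how that pattern behaves on a few inputs, using only `java.util.regex` (`AttrKeyGate` and `isAllowed` are hypothetical names for illustration; the controller itself throws a 422 rather than returning a boolean):

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the ATTR_KEY check from the diff above.
final class AttrKeyGate {

    // Same expression as in the controller: letters, digits, dot, underscore, hyphen.
    private static final Pattern ATTR_KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    /** Returns true only when the whole key consists of allowlisted characters. */
    static boolean isAllowed(String key) {
        return key != null && ATTR_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "order.id",          // plain dotted key
                "tenant_id",         // underscore is allowlisted
                "x-region",          // hyphen is allowlisted
                "a') OR 1=1 --",     // quote, parenthesis and spaces are rejected
                "key with space",    // whitespace is rejected
                "");                 // empty key is rejected ('+' needs one char)
        for (String key : samples) {
            System.out.println("'" + key + "' -> " + isAllowed(key));
        }
    }
}
```

Because a quote character can never pass this check, a key that passes cannot close the string literal it is inlined into. Values do not need the same treatment, since the diff states they are always parameter-bound.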
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertSilenceController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/controller/AlertSilenceController.java
@@ -0,0 +1,151 @@
+package com.cameleer.server.app.alerting.controller;
+
+import com.cameleer.server.app.alerting.dto.AlertSilenceRequest;
+import com.cameleer.server.app.alerting.dto.AlertSilenceResponse;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.admin.AuditCategory;
+import com.cameleer.server.core.admin.AuditResult;
+import com.cameleer.server.core.admin.AuditService;
+import com.cameleer.server.core.alerting.AlertSilence;
+import com.cameleer.server.core.alerting.AlertSilenceRepository;
+import com.cameleer.server.core.runtime.Environment;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.servlet.http.HttpServletRequest;
+import jakarta.validation.Valid;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.security.core.context.SecurityContextHolder;
+import org.springframework.web.bind.annotation.DeleteMapping;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PathVariable;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.PutMapping;
+import org.springframework.web.bind.annotation.RequestBody;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * REST controller for alert silences (env-scoped).
+ * VIEWER+ can list; OPERATOR+ can create/update/delete.
+ */
+@RestController
+@RequestMapping("/api/v1/environments/{envSlug}/alerts/silences")
+@Tag(name = "Alert Silences", description = "Alert silence management (env-scoped)")
+@PreAuthorize("hasAnyRole('VIEWER','OPERATOR','ADMIN')")
+public class AlertSilenceController {
+
+    private final AlertSilenceRepository silenceRepo;
+    private final AuditService auditService;
+
+    public AlertSilenceController(AlertSilenceRepository silenceRepo,
+                                  AuditService auditService) {
+        this.silenceRepo = silenceRepo;
+        this.auditService = auditService;
+    }
+
+    @GetMapping
+    public List<AlertSilenceResponse> list(@EnvPath Environment env) {
+        return silenceRepo.listByEnvironment(env.id())
+                .stream().map(AlertSilenceResponse::from).toList();
+    }
+
+    @PostMapping
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<AlertSilenceResponse> create(
+            @EnvPath Environment env,
+            @Valid @RequestBody AlertSilenceRequest req,
+            HttpServletRequest httpRequest) {
+
+        validateTimeRange(req);
+
+        AlertSilence silence = new AlertSilence(
+                UUID.randomUUID(), env.id(), req.matcher(), req.reason(),
+                req.startsAt(), req.endsAt(),
+                currentUserId(), Instant.now());
+
+        AlertSilence saved = silenceRepo.save(silence);
+
+        auditService.log("ALERT_SILENCE_CREATE", AuditCategory.ALERT_SILENCE_CHANGE,
+                saved.id().toString(), Map.of(), AuditResult.SUCCESS, httpRequest);
+
+        return ResponseEntity.status(HttpStatus.CREATED).body(AlertSilenceResponse.from(saved));
+    }
+
+    @PutMapping("/{id}")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public AlertSilenceResponse update(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            @Valid @RequestBody AlertSilenceRequest req,
+            HttpServletRequest httpRequest) {
+
+        AlertSilence existing = requireSilence(id, env.id());
+        validateTimeRange(req);
+
+        AlertSilence updated = new AlertSilence(
+                existing.id(), env.id(), req.matcher(), req.reason(),
+                req.startsAt(), req.endsAt(),
+                existing.createdBy(), existing.createdAt());
+
+        AlertSilence saved = silenceRepo.save(updated);
+
+        auditService.log("ALERT_SILENCE_UPDATE", AuditCategory.ALERT_SILENCE_CHANGE,
+                id.toString(), Map.of(), AuditResult.SUCCESS, httpRequest);
+
+        return AlertSilenceResponse.from(saved);
+    }
+
+    @DeleteMapping("/{id}")
+    @PreAuthorize("hasAnyRole('OPERATOR','ADMIN')")
+    public ResponseEntity<Void> delete(
+            @EnvPath Environment env,
+            @PathVariable UUID id,
+            HttpServletRequest httpRequest) {
+
+        requireSilence(id, env.id());
+        silenceRepo.delete(id);
+
+        auditService.log("ALERT_SILENCE_DELETE", AuditCategory.ALERT_SILENCE_CHANGE,
+                id.toString(), Map.of(), AuditResult.SUCCESS, httpRequest);
+
+        return ResponseEntity.noContent().build();
+    }
+
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+
+    private void validateTimeRange(AlertSilenceRequest req) {
+        if (!req.endsAt().isAfter(req.startsAt())) {
+            throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY,
+                    "endsAt must be after startsAt");
+        }
+    }
+
+    private AlertSilence requireSilence(UUID id, UUID envId) {
+        AlertSilence silence = silenceRepo.findById(id)
+                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND,
+                        "Alert silence not found: " + id));
+        if (!silence.environmentId().equals(envId)) {
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND,
+                    "Alert silence not found in this environment: " + id);
+        }
+        return silence;
+    }
+
+    private String currentUserId() {
+        var auth = SecurityContextHolder.getContext().getAuthentication();
+        if (auth == null || auth.getName() == null) {
+            throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
+        }
+        String name = auth.getName();
+        return name.startsWith("user:") ? name.substring(5) : name;
+    }
+}
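Review note: the two small rules the controller relies on (a silence window must end strictly after it starts, and a `user:` prefix is stripped from the principal name) can be sketched in isolation. `SilenceRules` and its method names are hypothetical and not part of this commit. The sketch only mirrors the logic of `validateTimeRange` and `currentUserId` shown in the diff.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; it mirrors the checks in AlertSilenceController.
final class SilenceRules {
    private SilenceRules() {}

    /** True only when endsAt is strictly later than startsAt; equal instants are rejected. */
    static boolean isValidWindow(Instant startsAt, Instant endsAt) {
        return endsAt.isAfter(startsAt);
    }

    /** Drops a leading "user:" prefix from a principal name, leaving other names untouched. */
    static String stripUserPrefix(String principalName) {
        return principalName.startsWith("user:") ? principalName.substring(5) : principalName;
    }
}
```

Under these assumptions a zero-length window (`startsAt` equal to `endsAt`) is invalid, which matches the controller answering 422 for that case.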
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertDto.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertDto.java
@@ -0,0 +1,36 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.AlertState;
+
+import java.time.Instant;
+import java.util.Map;
+import java.util.UUID;
+
+public record AlertDto(
+        UUID id,
+        UUID ruleId,
+        UUID environmentId,
+        AlertState state,
+        AlertSeverity severity,
+        String title,
+        String message,
+        Instant firedAt,
+        Instant ackedAt,
+        String ackedBy,
+        Instant resolvedAt,
+        Instant readAt,     // global "has anyone read this"
+        boolean silenced,
+        Double currentValue,
+        Double threshold,
+        Map<String, Object> context
+) {
+    public static AlertDto from(AlertInstance i) {
+        return new AlertDto(
+                i.id(), i.ruleId(), i.environmentId(), i.state(), i.severity(),
+                i.title(), i.message(), i.firedAt(), i.ackedAt(), i.ackedBy(),
+                i.resolvedAt(), i.readAt(), i.silenced(),
+                i.currentValue(), i.threshold(), i.context());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertNotificationDto.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertNotificationDto.java
@@ -0,0 +1,29 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertNotification;
+import com.cameleer.server.core.alerting.NotificationStatus;
+
+import java.time.Instant;
+import java.util.UUID;
+
+public record AlertNotificationDto(
+        UUID id,
+        UUID alertInstanceId,
+        UUID webhookId,
+        UUID outboundConnectionId,
+        NotificationStatus status,
+        int attempts,
+        Instant nextAttemptAt,
+        Integer lastResponseStatus,
+        String lastResponseSnippet,
+        Instant deliveredAt,
+        Instant createdAt
+) {
+    public static AlertNotificationDto from(AlertNotification n) {
+        return new AlertNotificationDto(
+                n.id(), n.alertInstanceId(), n.webhookId(), n.outboundConnectionId(),
+                n.status(), n.attempts(), n.nextAttemptAt(),
+                n.lastResponseStatus(), n.lastResponseSnippet(),
+                n.deliveredAt(), n.createdAt());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertRuleRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertRuleRequest.java
@@ -0,0 +1,32 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertCondition;
+import com.cameleer.server.core.alerting.AlertRuleTarget;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.ConditionKind;
+import jakarta.validation.Valid;
+import jakarta.validation.constraints.NotBlank;
+import jakarta.validation.constraints.NotNull;
+
+import java.util.List;
+import java.util.UUID;
+
+public record AlertRuleRequest(
+        @NotBlank String name,
+        String description,
+        @NotNull AlertSeverity severity,
+        @NotNull ConditionKind conditionKind,
+        @NotNull @Valid AlertCondition condition,
+        Integer evaluationIntervalSeconds,
+        Integer forDurationSeconds,
+        Integer reNotifyMinutes,
+        String notificationTitleTmpl,
+        String notificationMessageTmpl,
+        List<WebhookBindingRequest> webhooks,
+        List<AlertRuleTarget> targets
+) {
+    public AlertRuleRequest {
+        webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
+        targets  = targets  == null ? List.of() : List.copyOf(targets);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertRuleResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertRuleResponse.java
@@ -0,0 +1,46 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertCondition;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertRuleTarget;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.ConditionKind;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.UUID;
+
+public record AlertRuleResponse(
+        UUID id,
+        UUID environmentId,
+        String name,
+        String description,
+        AlertSeverity severity,
+        boolean enabled,
+        ConditionKind conditionKind,
+        AlertCondition condition,
+        int evaluationIntervalSeconds,
+        int forDurationSeconds,
+        int reNotifyMinutes,
+        String notificationTitleTmpl,
+        String notificationMessageTmpl,
+        List<WebhookBindingResponse> webhooks,
+        List<AlertRuleTarget> targets,
+        Instant createdAt,
+        String createdBy,
+        Instant updatedAt,
+        String updatedBy
+) {
+    public static AlertRuleResponse from(AlertRule r) {
+        List<WebhookBindingResponse> webhooks = r.webhooks().stream()
+                .map(WebhookBindingResponse::from)
+                .toList();
+        return new AlertRuleResponse(
+                r.id(), r.environmentId(), r.name(), r.description(),
+                r.severity(), r.enabled(), r.conditionKind(), r.condition(),
+                r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
+                r.notificationTitleTmpl(), r.notificationMessageTmpl(),
+                webhooks, r.targets(),
+                r.createdAt(), r.createdBy(), r.updatedAt(), r.updatedBy());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertSilenceRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertSilenceRequest.java
@@ -0,0 +1,14 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.SilenceMatcher;
+import jakarta.validation.Valid;
+import jakarta.validation.constraints.NotNull;
+
+import java.time.Instant;
+
+public record AlertSilenceRequest(
+        @NotNull @Valid SilenceMatcher matcher,
+        String reason,
+        @NotNull Instant startsAt,
+        @NotNull Instant endsAt
+) {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertSilenceResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/AlertSilenceResponse.java
@@ -0,0 +1,24 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertSilence;
+import com.cameleer.server.core.alerting.SilenceMatcher;
+
+import java.time.Instant;
+import java.util.UUID;
+
+public record AlertSilenceResponse(
+        UUID id,
+        UUID environmentId,
+        SilenceMatcher matcher,
+        String reason,
+        Instant startsAt,
+        Instant endsAt,
+        String createdBy,
+        Instant createdAt
+) {
+    public static AlertSilenceResponse from(AlertSilence s) {
+        return new AlertSilenceResponse(
+                s.id(), s.environmentId(), s.matcher(), s.reason(),
+                s.startsAt(), s.endsAt(), s.createdBy(), s.createdAt());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/BulkIdsRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/BulkIdsRequest.java
@@ -0,0 +1,10 @@
+package com.cameleer.server.app.alerting.dto;
+
+import jakarta.validation.constraints.NotNull;
+import jakarta.validation.constraints.Size;
+
+import java.util.List;
+import java.util.UUID;
+
+/** Shared body for bulk-read / bulk-ack / bulk-delete requests. */
+public record BulkIdsRequest(@NotNull @Size(min = 1, max = 500) List<UUID> instanceIds) {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/RenderPreviewRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/RenderPreviewRequest.java
@@ -0,0 +1,13 @@
+package com.cameleer.server.app.alerting.dto;
+
+import java.util.Map;
+
+/**
+ * Canned context for rendering a Mustache template preview without firing a real alert.
+ * All fields are optional — missing context keys render as empty string.
+ */
+public record RenderPreviewRequest(Map<String, Object> context) {
+    public RenderPreviewRequest {
+        context = context == null ? Map.of() : Map.copyOf(context);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/RenderPreviewResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/RenderPreviewResponse.java
@@ -0,0 +1,3 @@
+package com.cameleer.server.app.alerting.dto;
+
+public record RenderPreviewResponse(String title, String message) {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/TestEvaluateRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/TestEvaluateRequest.java
@@ -0,0 +1,8 @@
+package com.cameleer.server.app.alerting.dto;
+
+/**
+ * Request body for POST {id}/test-evaluate.
+ * Currently empty — the evaluator runs against live data using the saved rule definition.
+ * Reserved for future overrides (e.g., custom time window).
+ */
+public record TestEvaluateRequest() {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/TestEvaluateResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/TestEvaluateResponse.java
@@ -0,0 +1,24 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.app.alerting.eval.EvalResult;
+
+/**
+ * Result of a one-shot evaluator run against live data (does not persist any state).
+ */
+public record TestEvaluateResponse(String resultKind, String detail) {
+
+    public static TestEvaluateResponse from(EvalResult result) {
+        if (result instanceof EvalResult.Firing f) {
+            return new TestEvaluateResponse("FIRING",
+                    "currentValue=" + f.currentValue() + " threshold=" + f.threshold());
+        } else if (result instanceof EvalResult.Clear) {
+            return new TestEvaluateResponse("CLEAR", null);
+        } else if (result instanceof EvalResult.Error e) {
+            return new TestEvaluateResponse("ERROR",
+                    e.cause() != null ? e.cause().getMessage() : "unknown error");
+        } else if (result instanceof EvalResult.Batch b) {
+            return new TestEvaluateResponse("BATCH", b.firings().size() + " firing(s)");
+        }
+        return new TestEvaluateResponse("UNKNOWN", result.getClass().getSimpleName());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/UnreadCountResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/UnreadCountResponse.java
@@ -0,0 +1,29 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.AlertSeverity;
+
+import java.util.EnumMap;
+import java.util.Map;
+
+/**
+ * Response shape for {@code GET /alerts/unread-count}.
+ * <p>
+ * {@code total} is the sum of {@code bySeverity} values. The UI branches bell colour on
+ * the highest severity present, so callers can inspect the map directly.
+ */
+public record UnreadCountResponse(long total, Map<AlertSeverity, Long> bySeverity) {
+
+    public UnreadCountResponse {
+        // Defensive copy + fill in missing severities as 0 so the UI never sees null/undefined.
+        EnumMap<AlertSeverity, Long> normalized = new EnumMap<>(AlertSeverity.class);
+        for (AlertSeverity s : AlertSeverity.values()) normalized.put(s, 0L);
+        if (bySeverity != null) bySeverity.forEach((k, v) -> normalized.put(k, v == null ? 0L : v));
+        bySeverity = Map.copyOf(normalized);
+    }
+
+    public static UnreadCountResponse from(Map<AlertSeverity, Long> counts) {
+        long total = counts == null ? 0L
+                : counts.values().stream().filter(v -> v != null).mapToLong(Long::longValue).sum();
+        return new UnreadCountResponse(total, counts == null ? Map.of() : counts);
+    }
+}
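Review note: a minimal usage sketch of the normalization done in the compact constructor above. `DemoSeverity` and `DemoUnreadCount` are hypothetical stand-ins, because the constants of the real `AlertSeverity` enum are not visible in this diff. The logic is copied from `UnreadCountResponse`.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for AlertSeverity; the real constants are an assumption.
enum DemoSeverity { CRITICAL, WARNING, INFO }

// Same shape as UnreadCountResponse, reduced to what the sketch needs.
record DemoUnreadCount(long total, Map<DemoSeverity, Long> bySeverity) {

    DemoUnreadCount {
        // Start with every severity at 0, then overlay the supplied counts.
        EnumMap<DemoSeverity, Long> normalized = new EnumMap<>(DemoSeverity.class);
        for (DemoSeverity s : DemoSeverity.values()) normalized.put(s, 0L);
        if (bySeverity != null) bySeverity.forEach((k, v) -> normalized.put(k, v == null ? 0L : v));
        // Map.copyOf returns an unmodifiable map; its iteration order is unspecified.
        bySeverity = Map.copyOf(normalized);
    }

    static DemoUnreadCount from(Map<DemoSeverity, Long> counts) {
        long total = counts == null ? 0L
                : counts.values().stream().filter(v -> v != null).mapToLong(Long::longValue).sum();
        return new DemoUnreadCount(total, counts == null ? Map.of() : counts);
    }
}
```

With a single supplied entry, the result still carries one key per severity, so a client can read any severity without a null check. Callers that need a stable display order should iterate the enum constants rather than the map.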
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/WebhookBindingRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/WebhookBindingRequest.java
@@ -0,0 +1,16 @@
+package com.cameleer.server.app.alerting.dto;
+
+import jakarta.validation.constraints.NotNull;
+
+import java.util.Map;
+import java.util.UUID;
+
+public record WebhookBindingRequest(
+        @NotNull UUID outboundConnectionId,
+        String bodyOverride,
+        Map<String, String> headerOverrides
+) {
+    public WebhookBindingRequest {
+        headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/WebhookBindingResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/dto/WebhookBindingResponse.java
@@ -0,0 +1,18 @@
+package com.cameleer.server.app.alerting.dto;
+
+import com.cameleer.server.core.alerting.WebhookBinding;
+
+import java.util.Map;
+import java.util.UUID;
+
+public record WebhookBindingResponse(
+        UUID id,
+        UUID outboundConnectionId,
+        String bodyOverride,
+        Map<String, String> headerOverrides
+) {
+    public static WebhookBindingResponse from(WebhookBinding wb) {
+        return new WebhookBindingResponse(
+                wb.id(), wb.outboundConnectionId(), wb.bodyOverride(), wb.headerOverrides());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentLifecycleEvaluator.java
@@ -0,0 +1,95 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.agent.AgentEventRecord;
+import com.cameleer.server.core.agent.AgentEventRepository;
+import com.cameleer.server.core.alerting.AgentLifecycleCondition;
+import com.cameleer.server.core.alerting.AgentLifecycleEventType;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertScope;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Evaluator for {@link AgentLifecycleCondition}.
+ * <p>
+ * Each matching row in {@code agent_events} produces its own {@link EvalResult.Firing}
+ * in an {@link EvalResult.Batch}, so every {@code (agent, eventType, timestamp)}
+ * tuple gets its own {@code AlertInstance} — operationally distinct outages /
+ * restarts / shutdowns are independently ackable. Deduplication across ticks
+ * is enforced by {@code alert_instances_open_rule_uq} via the canonical
+ * {@code _subjectFingerprint} key in the instance context (see V16 migration).
+ */
+@Component
+public class AgentLifecycleEvaluator implements ConditionEvaluator<AgentLifecycleCondition> {
+
+    /** Hard cap on rows returned per tick — prevents a flood of stale events from overwhelming the job. */
+    private static final int MAX_EVENTS_PER_TICK = 500;
+
+    private final AgentEventRepository eventRepo;
+    private final EnvironmentRepository envRepo;
+
+    public AgentLifecycleEvaluator(AgentEventRepository eventRepo, EnvironmentRepository envRepo) {
+        this.eventRepo = eventRepo;
+        this.envRepo   = envRepo;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.AGENT_LIFECYCLE; }
+
+    @Override
+    public EvalResult evaluate(AgentLifecycleCondition c, AlertRule rule, EvalContext ctx) {
+        String envSlug = envRepo.findById(rule.environmentId())
+                .map(e -> e.slug())
+                .orElse(null);
+        if (envSlug == null) return EvalResult.Clear.INSTANCE;
+
+        AlertScope scope = c.scope();
+        String appSlug  = scope != null ? scope.appSlug()  : null;
+        String agentId  = scope != null ? scope.agentId()  : null;
+
+        List<String> typeNames = c.eventTypes().stream()
+                .map(AgentLifecycleEventType::name)
+                .toList();
+
+        Instant from = ctx.now().minusSeconds(c.withinSeconds());
+        Instant to   = ctx.now();
+
+        List<AgentEventRecord> matches = eventRepo.findInWindow(
+                envSlug, appSlug, agentId, typeNames, from, to, MAX_EVENTS_PER_TICK);
+
+        if (matches.isEmpty()) return new EvalResult.Batch(List.of(), Map.of());
+
+        List<EvalResult.Firing> firings = new ArrayList<>(matches.size());
+        for (AgentEventRecord ev : matches) {
+            firings.add(toFiring(ev));
+        }
+        return new EvalResult.Batch(firings, Map.of());
+    }
+
+    private static EvalResult.Firing toFiring(AgentEventRecord ev) {
+        String fingerprint = (ev.instanceId() == null ? "" : ev.instanceId())
+                + ":" + (ev.eventType() == null ? "" : ev.eventType())
+                + ":" + (ev.timestamp() == null ? "0" : Long.toString(ev.timestamp().toEpochMilli()));
+
+        Map<String, Object> context = new LinkedHashMap<>();
+        context.put("agent", Map.of(
+                "id",  ev.instanceId()    == null ? "" : ev.instanceId(),
+                "app", ev.applicationId() == null ? "" : ev.applicationId()
+        ));
+        context.put("event", Map.of(
+                "type",      ev.eventType() == null ? "" : ev.eventType(),
+                "timestamp", ev.timestamp() == null ? "" : ev.timestamp().toString(),
+                "detail",    ev.detail()    == null ? "" : ev.detail()
+        ));
+        context.put("_subjectFingerprint", fingerprint);
+
+        return new EvalResult.Firing(1.0, null, context);
+    }
+}
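Review note: the `_subjectFingerprint` built in `toFiring` is what lets the same event be recognised across ticks. The sketch below isolates that string construction. `DemoAgentEvent`, `DemoFingerprint` and the event type name `WENT_DEAD` are hypothetical. The format `instanceId:eventType:epochMillis` is taken from the diff.

```java
import java.time.Instant;

// Hypothetical stand-in for AgentEventRecord, reduced to the three fields used by the fingerprint.
record DemoAgentEvent(String instanceId, String eventType, Instant timestamp) {}

final class DemoFingerprint {
    private DemoFingerprint() {}

    /** Builds "instanceId:eventType:epochMillis"; null parts fall back to "" (or "0" for the timestamp). */
    static String of(DemoAgentEvent ev) {
        return (ev.instanceId() == null ? "" : ev.instanceId())
                + ":" + (ev.eventType() == null ? "" : ev.eventType())
                + ":" + (ev.timestamp() == null ? "0" : Long.toString(ev.timestamp().toEpochMilli()));
    }
}
```

Because the fingerprint depends only on the event's own fields, re-reading the same row on a later tick yields the same key, while two events of the same agent and type at different timestamps yield different keys.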
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java
@@ -0,0 +1,61 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.agent.AgentInfo;
+import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.agent.AgentState;
+import com.cameleer.server.core.alerting.AgentStateCondition;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertScope;
+import com.cameleer.server.core.alerting.ConditionKind;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+@Component
+public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {
+
+    private final AgentRegistryService registry;
+
+    public AgentStateEvaluator(AgentRegistryService registry) {
+        this.registry = registry;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
+
+    @Override
+    public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
+        AgentState target = AgentState.valueOf(c.state());
+        Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
+
+        List<AgentInfo> hits = registry.findAll().stream()
+                .filter(a -> matchesScope(a, c.scope()))
+                .filter(a -> a.state() == target)
+                .filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
+                .toList();
+
+        if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
+
+        AgentInfo first = hits.get(0);
+        return new EvalResult.Firing(
+                (double) hits.size(), null,
+                Map.of(
+                        "agent", Map.of(
+                                "id",    first.instanceId(),
+                                "name",  first.displayName(),
+                                "state", first.state().name()
+                        ),
+                        "app", Map.of("slug", first.applicationId())
+                )
+        );
+    }
+
+    private static boolean matchesScope(AgentInfo a, AlertScope s) {
+        if (s == null) return true;
+        if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
+        if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
+        return true;
+    }
+}
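Review note: `matchesScope` treats a null scope, and each null field inside a scope, as a wildcard. A small sketch of that rule follows. `DemoScope`, `DemoAgent` and `DemoScopeMatcher` are hypothetical stand-ins for `AlertScope` and `AgentInfo` and model only the fields used here.

```java
// Hypothetical stand-ins; the real AlertScope and AgentInfo carry more fields.
record DemoScope(String appSlug, String agentId) {}
record DemoAgent(String applicationId, String instanceId) {}

final class DemoScopeMatcher {
    private DemoScopeMatcher() {}

    /** A null scope or a null scope field matches anything; non-null fields must be equal. */
    static boolean matches(DemoAgent a, DemoScope s) {
        if (s == null) return true;
        if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
        if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
        return true;
    }
}
```

A scope with only `appSlug` set therefore selects every agent of that app, and adding `agentId` narrows it to a single instance.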
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java
@@ -0,0 +1,315 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.alerting.config.AlertingProperties;
+import com.cameleer.server.app.alerting.metrics.AlertingMetrics;
+import com.cameleer.server.app.alerting.notify.MustacheRenderer;
+import com.cameleer.server.app.alerting.notify.NotificationContextBuilder;
+import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Qualifier;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.scheduling.annotation.SchedulingConfigurer;
+import org.springframework.scheduling.config.ScheduledTaskRegistrar;
+import org.springframework.stereotype.Component;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+import java.util.stream.Collectors;
+
+/**
+ * Claim-polling evaluator job.
+ * <p>
+ * On each tick, claims a batch of due {@link AlertRule}s via {@code FOR UPDATE SKIP LOCKED},
+ * invokes the matching {@link ConditionEvaluator}, applies the {@link AlertStateTransitions}
+ * state machine, persists any new/updated {@link AlertInstance}, enqueues webhook
+ * {@link AlertNotification}s on first-fire, and releases the claim.
+ */
+@Component
+public class AlertEvaluatorJob implements SchedulingConfigurer {
+
+    private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);
+
+    private final AlertingProperties props;
+    private final AlertRuleRepository ruleRepo;
+    private final AlertInstanceRepository instanceRepo;
+    private final AlertNotificationRepository notificationRepo;
+    private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
+    private final PerKindCircuitBreaker circuitBreaker;
+    private final MustacheRenderer renderer;
+    private final NotificationContextBuilder contextBuilder;
+    private final EnvironmentRepository environmentRepo;
+    private final ObjectMapper objectMapper;
+    private final BatchResultApplier batchResultApplier;
+    private final String instanceId;
+    private final String tenantId;
+    private final Clock clock;
+    private final AlertingMetrics metrics;
+
+    @SuppressWarnings("SpringJavaInjectionPointsAutowiringInspection")
+    public AlertEvaluatorJob(
+            AlertingProperties props,
+            AlertRuleRepository ruleRepo,
+            AlertInstanceRepository instanceRepo,
+            AlertNotificationRepository notificationRepo,
+            List<ConditionEvaluator<?>> evaluatorList,
+            PerKindCircuitBreaker circuitBreaker,
+            MustacheRenderer renderer,
+            NotificationContextBuilder contextBuilder,
+            EnvironmentRepository environmentRepo,
+            ObjectMapper objectMapper,
+            BatchResultApplier batchResultApplier,
+            @Qualifier("alertingInstanceId") String instanceId,
+            @Value("${cameleer.server.tenant.id:default}") String tenantId,
+            Clock alertingClock,
+            AlertingMetrics metrics) {
+
+        this.props              = props;
+        this.ruleRepo           = ruleRepo;
+        this.instanceRepo       = instanceRepo;
+        this.notificationRepo   = notificationRepo;
+        this.evaluators         = evaluatorList.stream()
+                .collect(Collectors.toMap(ConditionEvaluator::kind, e -> e));
+        this.circuitBreaker     = circuitBreaker;
+        this.renderer           = renderer;
+        this.contextBuilder     = contextBuilder;
+        this.environmentRepo    = environmentRepo;
+        this.objectMapper       = objectMapper;
+        this.batchResultApplier = batchResultApplier;
+        this.instanceId         = instanceId;
+        this.tenantId           = tenantId;
+        this.clock              = alertingClock;
+        this.metrics            = metrics;
+    }
+
+    // -------------------------------------------------------------------------
+    // SchedulingConfigurer — register the tick as a fixed-delay task
+    // -------------------------------------------------------------------------
+
+    @Override
+    public void configureTasks(ScheduledTaskRegistrar registrar) {
+        registrar.addFixedDelayTask(this::tick, props.effectiveEvaluatorTickIntervalMs());
+    }
+
+    // -------------------------------------------------------------------------
+    // Tick: public so that same-package tests and cross-package lifecycle ITs can call it directly
+    // -------------------------------------------------------------------------
+
+    public void tick() {
+        List<AlertRule> claimed = ruleRepo.claimDueRules(
+                instanceId,
+                props.effectiveEvaluatorBatchSize(),
+                props.effectiveClaimTtlSeconds());
+
+        if (claimed.isEmpty()) return;
+
+        TickCache cache = new TickCache();
+        EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);
+
+        for (AlertRule rule : claimed) {
+            Instant nextRun = Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds());
+            if (circuitBreaker.isOpen(rule.conditionKind())) {
+                log.debug("Circuit breaker open for {}; skipping rule {}", rule.conditionKind(), rule.id());
+                reschedule(rule, nextRun);
+                continue;
+            }
+
+            EvalResult result;
+            try {
+                result = metrics.evalDuration(rule.conditionKind())
+                        .recordCallable(() -> evaluateSafely(rule, ctx));
+            } catch (Exception e) {
+                metrics.evalError(rule.conditionKind(), rule.id());
+                circuitBreaker.recordFailure(rule.conditionKind());
+                log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
+                // Evaluation itself failed: release the claim so the rule is retried
+                // at nextRun (one evaluation interval from now). Cursor stays put.
+                reschedule(rule, nextRun);
+                continue;
+            }
+
+            if (result instanceof EvalResult.Batch b) {
+                // Phase 2: the Batch path is atomic. The @Transactional apply() on
+                // BatchResultApplier wraps instance writes, notification enqueues,
+                // AND the cursor advance + releaseClaim into a single tx. A
+                // mid-batch fault rolls everything back, including the cursor, so
+                // a later tick (once the claim TTL expires) replays the whole batch.
+                try {
+                    batchResultApplier.apply(rule, b, nextRun);
+                    circuitBreaker.recordSuccess(rule.conditionKind());
+                } catch (Exception e) {
+                    metrics.evalError(rule.conditionKind(), rule.id());
+                    circuitBreaker.recordFailure(rule.conditionKind());
+                    log.warn("Batch apply failed for rule {} ({}): {} — rolling back; next tick will retry",
+                            rule.id(), rule.conditionKind(), e.toString());
+                    // The transaction rolled back. Do NOT call reschedule here —
+                    // leaving claim + next_evaluation_at as they were means the
+                    // claim TTL takes over and the rule becomes due on its own.
+                    // Rethrowing is unnecessary for correctness — the cursor
+                    // stayed put, so exactly-once-per-exchange is preserved.
+                }
+            } else {
+                // Non-Batch path (Firing / Clear / Error): classic apply + rule
+                // reschedule. Not wrapped in a single tx; semantics unchanged
+                // from pre-Phase-2.
+                try {
+                    applyResult(rule, result);
+                    circuitBreaker.recordSuccess(rule.conditionKind());
+                } catch (Exception e) {
+                    metrics.evalError(rule.conditionKind(), rule.id());
+                    circuitBreaker.recordFailure(rule.conditionKind());
+                    log.warn("applyResult failed for rule {} ({}): {}",
+                            rule.id(), rule.conditionKind(), e.toString());
+                } finally {
+                    reschedule(rule, nextRun);
+                }
+            }
+        }
+
+        sweepReNotify();
+    }
+
+    // -------------------------------------------------------------------------
+    // Re-notification cadence sweep
+    // -------------------------------------------------------------------------
+
+    private void sweepReNotify() {
+        Instant now = Instant.now(clock);
+        List<AlertInstance> due = instanceRepo.listFiringDueForReNotify(now);
+        for (AlertInstance i : due) {
+            try {
+                AlertRule rule = i.ruleId() == null ? null : ruleRepo.findById(i.ruleId()).orElse(null);
+                if (rule == null || rule.reNotifyMinutes() <= 0) continue;
+                enqueueNotifications(rule, i, now);
+                instanceRepo.save(i.withLastNotifiedAt(now));
+                log.debug("Re-notify enqueued for instance {} (rule {})", i.id(), i.ruleId());
+            } catch (Exception e) {
+                log.warn("Re-notify sweep error for instance {}: {}", i.id(), e.toString());
+            }
+        }
+    }
+
+    // -------------------------------------------------------------------------
+    // Evaluation
+    // -------------------------------------------------------------------------
+
+    @SuppressWarnings({"rawtypes", "unchecked"})
+    private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
+        ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
+        if (evaluator == null) {
+            throw new IllegalStateException("No evaluator registered for " + rule.conditionKind());
+        }
+        return evaluator.evaluate(rule.condition(), rule, ctx);
+    }
+
+    // -------------------------------------------------------------------------
+    // State machine application
+    // -------------------------------------------------------------------------
+
+    private void applyResult(AlertRule rule, EvalResult result) {
+        // Note: the Batch path is handled by BatchResultApplier (transactional).
+        // tick() routes Batch results there directly and never calls applyResult
+        // for them. This method only handles Firing / Clear / Error state-machine
+        // transitions for the classic (non-PER_EXCHANGE) path.
+        AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
+        Instant now = Instant.now(clock);
+
+        AlertStateTransitions.apply(current, result, rule, now).ifPresent(next -> {
+            // Determine whether this is a newly created instance transitioning to FIRING
+            boolean isFirstFire = current == null && next.state() == AlertState.FIRING;
+            boolean promotedFromPending = current != null
+                    && current.state() == AlertState.PENDING
+                    && next.state() == AlertState.FIRING;
+
+            AlertInstance withSnapshot = next.withRuleSnapshot(snapshotRule(rule));
+            AlertInstance enriched = enrichTitleMessage(rule, withSnapshot);
+            AlertInstance persisted = instanceRepo.save(enriched);
+
+            if (isFirstFire || promotedFromPending) {
+                enqueueNotifications(rule, persisted, now);
+            }
+        });
+    }
+
+    // -------------------------------------------------------------------------
+    // Title / message rendering
+    // -------------------------------------------------------------------------
+
+    private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        Map<String, Object> ctx = contextBuilder.build(rule, instance, env, null);
+        String title   = renderer.render(rule.notificationTitleTmpl(), ctx);
+        String message = renderer.render(rule.notificationMessageTmpl(), ctx);
+        return instance.withTitleMessage(title, message);
+    }
+
+    // -------------------------------------------------------------------------
+    // Notification enqueue
+    // -------------------------------------------------------------------------
+
+    private void enqueueNotifications(AlertRule rule, AlertInstance instance, Instant now) {
+        for (WebhookBinding w : rule.webhooks()) {
+            Map<String, Object> payload = buildPayload(rule, instance);
+            notificationRepo.save(new AlertNotification(
+                    UUID.randomUUID(),
+                    instance.id(),
+                    w.id(),
+                    w.outboundConnectionId(),
+                    NotificationStatus.PENDING,
+                    0,
+                    now,
+                    null, null, null, null,
+                    payload,
+                    null,
+                    now));
+        }
+    }
+
+    private Map<String, Object> buildPayload(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        return contextBuilder.build(rule, instance, env, null);
+    }
+
+    // -------------------------------------------------------------------------
+    // Claim release
+    // -------------------------------------------------------------------------
+
+    private void reschedule(AlertRule rule, Instant nextRun) {
+        ruleRepo.releaseClaim(rule.id(), nextRun, rule.evalState());
+    }
+
+    // -------------------------------------------------------------------------
+    // Rule snapshot helper (used by applyResult; package-private so tests can call it)
+    // -------------------------------------------------------------------------
+
+    @SuppressWarnings("unchecked")
+    Map<String, Object> snapshotRule(AlertRule rule) {
+        try {
+            Map<String, Object> raw = objectMapper.convertValue(rule, Map.class);
+            // Map.copyOf (used in AlertInstance compact ctor) rejects null values —
+            // strip them so the snapshot is safe to store.
+            Map<String, Object> safe = new java.util.LinkedHashMap<>();
+            raw.forEach((k, v) -> { if (v != null) safe.put(k, v); });
+            return safe;
+        } catch (Exception e) {
+            log.warn("Failed to snapshot rule {}: {}", rule.id(), e.getMessage());
+            return Map.of("id", rule.id().toString(), "name", rule.name());
+        }
+    }
+
+    // -------------------------------------------------------------------------
+    // Visible for testing
+    // -------------------------------------------------------------------------
+
+    /** Returns the evaluator map (for inspection in tests). */
+    Map<ConditionKind, ConditionEvaluator<?>> evaluators() {
+        return evaluators;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java
@@ -0,0 +1,141 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertRuleTarget;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.AlertState;
+import com.cameleer.server.core.alerting.TargetKind;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+
+/**
+ * Pure, stateless state-machine for alert instance transitions.
+ * <p>
+ * Given the current open instance (nullable) and an EvalResult, returns the new/updated
+ * AlertInstance or {@link Optional#empty()} when no action is needed.
+ * <p>
+ * Batch results are handled by {@link BatchResultApplier}; this helper returns empty for them.
+ */
+public final class AlertStateTransitions {
+
+    private AlertStateTransitions() {}
+
+    /**
+     * Apply an EvalResult to the current open AlertInstance.
+     *
+     * @param current the open instance for this rule (PENDING / FIRING), or null if none
+     * @param result  the evaluator outcome
+     * @param rule    the rule being evaluated
+     * @param now     wall-clock instant for the current tick
+     * @return the new or updated AlertInstance, or empty when nothing should change
+     */
+    public static Optional<AlertInstance> apply(
+            AlertInstance current, EvalResult result, AlertRule rule, Instant now) {
+
+        if (result instanceof EvalResult.Clear)            return onClear(current, now);
+        if (result instanceof EvalResult.Firing f)          return onFiring(current, f, rule, now);
+        // EvalResult.Error and EvalResult.Batch: no action (Batch is handled by BatchResultApplier)
+        return Optional.empty();
+    }
+
+    // -------------------------------------------------------------------------
+    // Clear branch
+    // -------------------------------------------------------------------------
+
+    private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
+        if (current == null) return Optional.empty();                   // no open instance — no-op
+        if (current.state() == AlertState.RESOLVED) return Optional.empty(); // already resolved
+        // Any open state (PENDING / FIRING) → RESOLVED
+        return Optional.of(current
+                .withState(AlertState.RESOLVED)
+                .withResolvedAt(now));
+    }
+
+    // -------------------------------------------------------------------------
+    // Firing branch
+    // -------------------------------------------------------------------------
+
+    private static Optional<AlertInstance> onFiring(
+            AlertInstance current, EvalResult.Firing f, AlertRule rule, Instant now) {
+
+        if (current == null) {
+            // No open instance — create a new one
+            AlertState initial = rule.forDurationSeconds() > 0
+                    ? AlertState.PENDING
+                    : AlertState.FIRING;
+            return Optional.of(newInstance(rule, f, initial, now));
+        }
+
+        return switch (current.state()) {
+            case PENDING -> {
+                // Check whether the forDuration window has elapsed
+                Instant promoteAt = current.firedAt().plusSeconds(rule.forDurationSeconds());
+                if (!promoteAt.isAfter(now)) {
+                    // Promote to FIRING and reset firedAt to the promotion time
+                    yield Optional.of(current
+                            .withState(AlertState.FIRING)
+                            .withFiredAt(now));
+                }
+                // Still within forDuration — stay PENDING, nothing to persist
+                yield Optional.empty();
+            }
+            // FIRING: nothing to change; re-notification is handled by AlertEvaluatorJob.sweepReNotify()
+            case FIRING -> Optional.empty();
+            // RESOLVED should never appear as the "current open" instance, but guard anyway
+            case RESOLVED -> Optional.empty();
+        };
+    }
+
+    // -------------------------------------------------------------------------
+    // Factory helpers
+    // -------------------------------------------------------------------------
+
+    /**
+     * Creates a brand-new AlertInstance from a rule + Firing result.
+     * title/message are left empty here; the job enriches them via MustacheRenderer after.
+     */
+    static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
+        List<AlertRuleTarget> targets = rule.targets() != null ? rule.targets() : List.of();
+        List<String> targetUserIds = targets.stream()
+                .filter(t -> t.kind() == TargetKind.USER)
+                .map(AlertRuleTarget::targetId)
+                .toList();
+        List<UUID> targetGroupIds = targets.stream()
+                .filter(t -> t.kind() == TargetKind.GROUP)
+                .map(t -> UUID.fromString(t.targetId()))
+                .toList();
+        List<String> targetRoleNames = targets.stream()
+                .filter(t -> t.kind() == TargetKind.ROLE)
+                .map(AlertRuleTarget::targetId)
+                .toList();
+
+        return new AlertInstance(
+                UUID.randomUUID(),
+                rule.id(),
+                Map.of(),                          // ruleSnapshot — caller (job) fills in via ObjectMapper
+                rule.environmentId(),
+                state,
+                rule.severity() != null ? rule.severity() : AlertSeverity.WARNING,
+                now,                               // firedAt
+                null,                              // ackedAt
+                null,                              // ackedBy
+                null,                              // resolvedAt
+                null,                              // lastNotifiedAt
+                null,                              // readAt
+                null,                              // deletedAt
+                false,                             // silenced
+                f.currentValue(),
+                f.threshold(),
+                f.context() != null ? f.context() : Map.of(),
+                "",                                // title — rendered by job
+                "",                                // message — rendered by job
+                targetUserIds,
+                targetGroupIds,
+                targetRoleNames);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/BatchResultApplier.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/BatchResultApplier.java
@@ -0,0 +1,144 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.alerting.notify.MustacheRenderer;
+import com.cameleer.server.app.alerting.notify.NotificationContextBuilder;
+import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+import org.springframework.transaction.annotation.Transactional;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * Applies a {@link EvalResult.Batch} result to persistent state inside a single
+ * transaction: instance writes, notification enqueues, and the rule's cursor
+ * advance + {@code releaseClaim} either all commit or all roll back together.
+ * <p>
+ * Lives in its own bean so the {@code @Transactional} annotation engages via the
+ * Spring proxy when invoked from {@link AlertEvaluatorJob#tick()}; calling it as
+ * {@code this.apply(...)} from {@code AlertEvaluatorJob} (a bean calling its own
+ * method) would bypass the proxy and silently disable the transaction.
+ * <p>
+ * Phase 2 of the per-exchange exactly-once plan (see
+ * {@code docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md}).
+ */
+@Component
+public class BatchResultApplier {
+
+    private static final Logger log = LoggerFactory.getLogger(BatchResultApplier.class);
+
+    private final AlertRuleRepository ruleRepo;
+    private final AlertInstanceRepository instanceRepo;
+    private final AlertNotificationRepository notificationRepo;
+    private final MustacheRenderer renderer;
+    private final NotificationContextBuilder contextBuilder;
+    private final EnvironmentRepository environmentRepo;
+    private final ObjectMapper objectMapper;
+    private final Clock clock;
+
+    public BatchResultApplier(
+            AlertRuleRepository ruleRepo,
+            AlertInstanceRepository instanceRepo,
+            AlertNotificationRepository notificationRepo,
+            MustacheRenderer renderer,
+            NotificationContextBuilder contextBuilder,
+            EnvironmentRepository environmentRepo,
+            ObjectMapper objectMapper,
+            Clock alertingClock) {
+        this.ruleRepo         = ruleRepo;
+        this.instanceRepo     = instanceRepo;
+        this.notificationRepo = notificationRepo;
+        this.renderer         = renderer;
+        this.contextBuilder   = contextBuilder;
+        this.environmentRepo  = environmentRepo;
+        this.objectMapper     = objectMapper;
+        this.clock            = alertingClock;
+    }
+
+    /**
+     * Atomically apply a Batch result for a single rule:
+     * <ol>
+     *   <li>persist a FIRING instance per firing + enqueue its notifications</li>
+     *   <li>advance the rule's cursor ({@code evalState}) iff the batch supplied one</li>
+     *   <li>release the claim with the new {@code nextRun} + {@code evalState}</li>
+     * </ol>
+     * Any unchecked exception thrown from the repo calls rolls back every write,
+     * including the cursor advance, so the rule is replayable on a later tick.
+     */
+    @Transactional
+    public void apply(AlertRule rule, EvalResult.Batch batch, Instant nextRun) {
+        for (EvalResult.Firing f : batch.firings()) {
+            applyBatchFiring(rule, f);
+        }
+        Map<String, Object> nextEvalState =
+                batch.nextEvalState().isEmpty() ? rule.evalState() : batch.nextEvalState();
+        ruleRepo.releaseClaim(rule.id(), nextRun, nextEvalState);
+    }
+
+    /**
+     * Batch (PER_EXCHANGE) mode: always create a fresh FIRING instance per Firing entry.
+     * No forDuration check — each exchange is its own event.
+     */
+    private void applyBatchFiring(AlertRule rule, EvalResult.Firing f) {
+        Instant now = Instant.now(clock);
+        AlertInstance instance = AlertStateTransitions.newInstance(rule, f, AlertState.FIRING, now)
+                .withRuleSnapshot(snapshotRule(rule));
+        AlertInstance enriched = enrichTitleMessage(rule, instance);
+        AlertInstance persisted = instanceRepo.save(enriched);
+        enqueueNotifications(rule, persisted, now);
+    }
+
+    private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        Map<String, Object> ctx = contextBuilder.build(rule, instance, env, null);
+        String title   = renderer.render(rule.notificationTitleTmpl(), ctx);
+        String message = renderer.render(rule.notificationMessageTmpl(), ctx);
+        return instance.withTitleMessage(title, message);
+    }
+
+    private void enqueueNotifications(AlertRule rule, AlertInstance instance, Instant now) {
+        for (WebhookBinding w : rule.webhooks()) {
+            Map<String, Object> payload = buildPayload(rule, instance);
+            notificationRepo.save(new AlertNotification(
+                    UUID.randomUUID(),
+                    instance.id(),
+                    w.id(),
+                    w.outboundConnectionId(),
+                    NotificationStatus.PENDING,
+                    0,
+                    now,
+                    null, null, null, null,
+                    payload,
+                    null,
+                    now));
+        }
+    }
+
+    private Map<String, Object> buildPayload(AlertRule rule, AlertInstance instance) {
+        Environment env = environmentRepo.findById(rule.environmentId()).orElse(null);
+        return contextBuilder.build(rule, instance, env, null);
+    }
+
+    @SuppressWarnings("unchecked")
+    private Map<String, Object> snapshotRule(AlertRule rule) {
+        try {
+            Map<String, Object> raw = objectMapper.convertValue(rule, Map.class);
+            // Map.copyOf (used in AlertInstance compact ctor) rejects null values —
+            // strip them so the snapshot is safe to store.
+            Map<String, Object> safe = new LinkedHashMap<>();
+            raw.forEach((k, v) -> { if (v != null) safe.put(k, v); });
+            return safe;
+        } catch (Exception e) {
+            log.warn("Failed to snapshot rule {}: {}", rule.id(), e.getMessage());
+            return Map.of("id", rule.id().toString(), "name", rule.name());
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ConditionEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ConditionEvaluator.java
@@ -0,0 +1,12 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.alerting.AlertCondition;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+
+public interface ConditionEvaluator<C extends AlertCondition> {
+
+    ConditionKind kind();
+
+    EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluator.java
@@ -0,0 +1,58 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.DeploymentStateCondition;
+import com.cameleer.server.core.runtime.App;
+import com.cameleer.server.core.runtime.AppRepository;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentRepository;
+import org.springframework.stereotype.Component;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+@Component
+public class DeploymentStateEvaluator implements ConditionEvaluator<DeploymentStateCondition> {
+
+    private final AppRepository appRepo;
+    private final DeploymentRepository deploymentRepo;
+
+    public DeploymentStateEvaluator(AppRepository appRepo, DeploymentRepository deploymentRepo) {
+        this.appRepo        = appRepo;
+        this.deploymentRepo = deploymentRepo;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
+
+    @Override
+    public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
+        String appSlug = c.scope() != null ? c.scope().appSlug() : null;
+        App app = (appSlug != null)
+                ? appRepo.findByEnvironmentIdAndSlug(rule.environmentId(), appSlug).orElse(null)
+                : null;
+
+        if (app == null) return EvalResult.Clear.INSTANCE;
+
+        Set<String> wanted = Set.copyOf(c.states());
+        List<Deployment> hits = deploymentRepo.findByAppId(app.id()).stream()
+                .filter(d -> wanted.contains(d.status().name()))
+                .toList();
+
+        if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
+
+        Deployment d = hits.get(0);
+        return new EvalResult.Firing(
+                (double) hits.size(), null,
+                Map.of(
+                        "deployment", Map.of(
+                                "id",     d.id().toString(),
+                                "status", d.status().name()
+                        ),
+                        "app", Map.of("slug", app.slug())
+                )
+        );
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalContext.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalContext.java
@@ -0,0 +1,5 @@
+package com.cameleer.server.app.alerting.eval;
+
+import java.time.Instant;
+
+public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalResult.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/EvalResult.java
@@ -0,0 +1,30 @@
+package com.cameleer.server.app.alerting.eval;
+
+import java.util.List;
+import java.util.Map;
+
+public sealed interface EvalResult {
+
+    record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
+        public Firing {
+            context = context == null ? Map.of() : Map.copyOf(context);
+        }
+    }
+
+    record Clear() implements EvalResult {
+        public static final Clear INSTANCE = new Clear();
+    }
+
+    record Error(Throwable cause) implements EvalResult {}
+
+    record Batch(List<Firing> firings, Map<String, Object> nextEvalState) implements EvalResult {
+        public Batch {
+            firings = firings == null ? List.of() : List.copyOf(firings);
+            nextEvalState = nextEvalState == null ? Map.of() : Map.copyOf(nextEvalState);
+        }
+        /** Convenience: a Batch with no cursor update (first-run empty, or no matches). */
+        public static Batch empty() {
+            return new Batch(List.of(), Map.of());
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ExchangeMatchEvaluator.java
@@ -0,0 +1,187 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.alerting.config.AlertingProperties;
+import com.cameleer.server.app.search.ClickHouseSearchIndex;
+import com.cameleer.server.core.alerting.AlertMatchSpec;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.ExchangeMatchCondition;
+import com.cameleer.server.core.alerting.FireMode;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.cameleer.server.core.search.ExecutionSummary;
+import com.cameleer.server.core.search.SearchRequest;
+import com.cameleer.server.core.search.SearchResult;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+@Component
+public class ExchangeMatchEvaluator implements ConditionEvaluator<ExchangeMatchCondition> {
+
+    private final ClickHouseSearchIndex searchIndex;
+    private final EnvironmentRepository envRepo;
+    private final AlertingProperties alertingProperties;
+
+    public ExchangeMatchEvaluator(ClickHouseSearchIndex searchIndex,
+                                  EnvironmentRepository envRepo,
+                                  AlertingProperties alertingProperties) {
+        this.searchIndex        = searchIndex;
+        this.envRepo            = envRepo;
+        this.alertingProperties = alertingProperties;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }
+
+    @Override
+    public EvalResult evaluate(ExchangeMatchCondition c, AlertRule rule, EvalContext ctx) {
+        String envSlug = envRepo.findById(rule.environmentId())
+                .map(e -> e.slug())
+                .orElse(null);
+
+        return switch (c.fireMode()) {
+            case COUNT_IN_WINDOW -> evaluateCount(c, rule, ctx, envSlug);
+            case PER_EXCHANGE    -> evaluatePerExchange(c, rule, ctx, envSlug);
+        };
+    }
+
+    // ── COUNT_IN_WINDOW ───────────────────────────────────────────────────────
+
+    private EvalResult evaluateCount(ExchangeMatchCondition c, AlertRule rule,
+                                     EvalContext ctx, String envSlug) {
+        String appSlug = c.scope() != null ? c.scope().appSlug() : null;
+        String routeId = c.scope() != null ? c.scope().routeId() : null;
+        ExchangeMatchCondition.ExchangeFilter filter = c.filter();
+
+        var spec = new AlertMatchSpec(
+                ctx.tenantId(),
+                envSlug,
+                appSlug,
+                routeId,
+                filter != null ? filter.status() : null,
+                filter != null ? filter.attributes() : Map.of(),
+                ctx.now().minusSeconds(c.windowSeconds()),
+                ctx.now(),
+                null
+        );
+
+        long count = searchIndex.countExecutionsForAlerting(spec);
+        if (count <= c.threshold()) return EvalResult.Clear.INSTANCE;
+
+        return new EvalResult.Firing(
+                (double) count,
+                c.threshold().doubleValue(),
+                Map.of(
+                        "app",   Map.of("slug", appSlug == null ? "" : appSlug),
+                        "route", Map.of("id",   routeId == null ? "" : routeId)
+                )
+        );
+    }
+
+    // ── PER_EXCHANGE ──────────────────────────────────────────────────────────
+
+    private EvalResult evaluatePerExchange(ExchangeMatchCondition c, AlertRule rule,
+                                           EvalContext ctx, String envSlug) {
+        String appSlug = c.scope() != null ? c.scope().appSlug() : null;
+        String routeId = c.scope() != null ? c.scope().routeId() : null;
+        ExchangeMatchCondition.ExchangeFilter filter = c.filter();
+
+        // Resolve composite cursor: (startTime, executionId)
+        Instant cursorTs;
+        String cursorId;
+        Object raw = rule.evalState().get("lastExchangeCursor");
+        if (raw instanceof String s && !s.isBlank()) {
+            int pipe = s.indexOf('|');
+            if (pipe < 0) {
+                // Malformed (no '|' separator): treat as first-run (with deploy-backlog-cap clamp).
+                cursorTs = firstRunCursorTs(rule, ctx);
+                cursorId = "";
+            } else {
+                cursorTs = Instant.parse(s.substring(0, pipe));
+                cursorId = s.substring(pipe + 1);
+            }
+        } else {
+            // First run — bounded by rule.createdAt, empty executionId so any real id sorts after it.
+            // Clamp to deploy-backlog-cap to avoid backlog flooding for long-lived rules on the first
+            // post-deploy tick. The normal-advance path (valid cursor above) is intentionally unaffected.
+            cursorTs = firstRunCursorTs(rule, ctx);
+            cursorId = "";
+        }
+
+        var req = new SearchRequest(
+                filter != null ? filter.status() : null,
+                cursorTs,                        // timeFrom
+                ctx.now(),                       // timeTo
+                null, null, null,                // durationMin/Max, correlationId
+                null, null, null, null,          // text variants
+                routeId,
+                null,                            // instanceId
+                null,                            // processorType
+                appSlug,
+                null,                            // instanceIds
+                0,
+                50,
+                "startTime",
+                "asc",                           // asc so we process oldest first
+                cursorId.isEmpty() ? null : cursorId,  // afterExecutionId — null on first run enables >=
+                envSlug
+        );
+
+        SearchResult<ExecutionSummary> result = searchIndex.search(req);
+        List<ExecutionSummary> matches = result.data();
+
+        if (matches.isEmpty()) return EvalResult.Batch.empty();
+
+        // Ensure deterministic ordering for cursor advance
+        matches = new ArrayList<>(matches);
+        matches.sort(Comparator
+                .comparing(ExecutionSummary::startTime)
+                .thenComparing(ExecutionSummary::executionId));
+
+        ExecutionSummary last = matches.get(matches.size() - 1);
+        String nextCursorSerialized = last.startTime().toString() + "|" + last.executionId();
+
+        List<EvalResult.Firing> firings = new ArrayList<>();
+        for (ExecutionSummary ex : matches) {
+            Map<String, Object> ctx2 = new HashMap<>();
+            ctx2.put("exchange", Map.of(
+                    "id",        ex.executionId(),
+                    "routeId",   ex.routeId() == null   ? "" : ex.routeId(),
+                    "status",    ex.status()  == null   ? "" : ex.status(),
+                    "startTime", ex.startTime() == null ? "" : ex.startTime().toString()
+            ));
+            ctx2.put("app", Map.of("slug", ex.applicationId() == null ? "" : ex.applicationId()));
+            firings.add(new EvalResult.Firing(1.0, null, ctx2));
+        }
+
+        Map<String, Object> nextEvalState = new HashMap<>(rule.evalState());
+        nextEvalState.put("lastExchangeCursor", nextCursorSerialized);
+        return new EvalResult.Batch(firings, nextEvalState);
+    }
+
+    /**
+     * First-run cursor timestamp: {@code rule.createdAt()}, clamped to
+     * {@code now - perExchangeDeployBacklogCapSeconds} so a long-lived PER_EXCHANGE rule
+     * doesn't scan from its creation date forward on the first post-deploy tick.
+     * <p>
+     * Cap ≤ 0 disables the clamp (first-run falls back to {@code rule.createdAt()} verbatim).
+     * Applied only on first-run / malformed-cursor paths — the normal-advance path is
+     * intentionally unaffected so legitimate missed ticks are not silently skipped.
+     */
+    private Instant firstRunCursorTs(AlertRule rule, EvalContext ctx) {
+        Instant cursorTs = rule.createdAt();
+        int capSeconds = alertingProperties.effectivePerExchangeDeployBacklogCapSeconds();
+        if (capSeconds > 0) {
+            Instant capFloor = ctx.now().minusSeconds(capSeconds);
+            if (cursorTs == null || cursorTs.isBefore(capFloor)) {
+                cursorTs = capFloor;
+            }
+        }
+        return cursorTs;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/JvmMetricEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/JvmMetricEvaluator.java
@@ -0,0 +1,77 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.alerting.AggregationOp;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.JvmMetricCondition;
+import com.cameleer.server.core.storage.MetricsQueryStore;
+import com.cameleer.server.core.storage.model.MetricTimeSeries;
+import org.springframework.stereotype.Component;
+
+import java.util.List;
+import java.util.Map;
+import java.util.OptionalDouble;
+
+@Component
+public class JvmMetricEvaluator implements ConditionEvaluator<JvmMetricCondition> {
+
+    private final MetricsQueryStore metricsStore;
+
+    public JvmMetricEvaluator(MetricsQueryStore metricsStore) {
+        this.metricsStore = metricsStore;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
+
+    @Override
+    public EvalResult evaluate(JvmMetricCondition c, AlertRule rule, EvalContext ctx) {
+        String agentId = c.scope() != null ? c.scope().agentId() : null;
+        if (agentId == null) return EvalResult.Clear.INSTANCE;
+
+        Map<String, List<MetricTimeSeries.Bucket>> series = metricsStore.queryTimeSeries(
+                agentId,
+                List.of(c.metric()),
+                ctx.now().minusSeconds(c.windowSeconds()),
+                ctx.now(),
+                1
+        );
+
+        List<MetricTimeSeries.Bucket> buckets = series.get(c.metric());
+        if (buckets == null || buckets.isEmpty()) return EvalResult.Clear.INSTANCE;
+
+        OptionalDouble aggregated = aggregate(buckets, c.aggregation());
+        if (aggregated.isEmpty()) return EvalResult.Clear.INSTANCE;
+
+        double actual = aggregated.getAsDouble();
+
+        boolean fire = switch (c.comparator()) {
+            case GT  -> actual >  c.threshold();
+            case GTE -> actual >= c.threshold();
+            case LT  -> actual <  c.threshold();
+            case LTE -> actual <= c.threshold();
+            case EQ  -> actual == c.threshold();
+        };
+
+        if (!fire) return EvalResult.Clear.INSTANCE;
+
+        return new EvalResult.Firing(actual, c.threshold(),
+                Map.of(
+                        "metric", c.metric(),
+                        "agent",  Map.of("id", agentId)
+                )
+        );
+    }
+
+    private OptionalDouble aggregate(List<MetricTimeSeries.Bucket> buckets, AggregationOp op) {
+        return switch (op) {
+            case MAX    -> buckets.stream().mapToDouble(MetricTimeSeries.Bucket::value).max();
+            case MIN    -> buckets.stream().mapToDouble(MetricTimeSeries.Bucket::value).min();
+            case AVG    -> buckets.stream().mapToDouble(MetricTimeSeries.Bucket::value).average();
+            case LATEST -> buckets.stream()
+                    .max(java.util.Comparator.comparing(MetricTimeSeries.Bucket::time))
+                    .map(b -> OptionalDouble.of(b.value()))
+                    .orElse(OptionalDouble.empty());
+        };
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/LogPatternEvaluator.java
@@ -0,0 +1,82 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.search.ClickHouseLogStore;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.LogPatternCondition;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.cameleer.server.core.search.LogSearchRequest;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+@Component
+public class LogPatternEvaluator implements ConditionEvaluator<LogPatternCondition> {
+
+    private final ClickHouseLogStore logStore;
+    private final EnvironmentRepository envRepo;
+
+    public LogPatternEvaluator(ClickHouseLogStore logStore, EnvironmentRepository envRepo) {
+        this.logStore = logStore;
+        this.envRepo  = envRepo;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
+
+    @Override
+    public EvalResult evaluate(LogPatternCondition c, AlertRule rule, EvalContext ctx) {
+        String envSlug = envRepo.findById(rule.environmentId())
+                .map(e -> e.slug())
+                .orElse(null);
+
+        String appSlug = c.scope() != null ? c.scope().appSlug() : null;
+
+        Instant from = ctx.now().minusSeconds(c.windowSeconds());
+        Instant to   = ctx.now();
+
+        // Build a stable cache key so identical queries within the same tick are coalesced.
+        String cacheKey = String.join("|",
+                envSlug  == null ? "" : envSlug,
+                appSlug  == null ? "" : appSlug,
+                c.level() == null ? "" : c.level(),
+                c.pattern() == null ? "" : c.pattern(),
+                from.toString(),
+                to.toString()
+        );
+
+        long count = ctx.tickCache().getOrCompute(cacheKey, () -> {
+            var req = new LogSearchRequest(
+                    c.pattern(),
+                    c.level() != null ? List.of(c.level()) : List.of(),
+                    appSlug,
+                    null,   // instanceId
+                    null,   // exchangeId
+                    null,   // logger
+                    envSlug,
+                    null,   // sources
+                    from,
+                    to,
+                    null,   // cursor
+                    1,      // limit (count query; value irrelevant)
+                    "desc", // sort
+                    null    // instanceIds
+            );
+            return logStore.countLogs(req);
+        });
+
+        if (count <= c.threshold()) return EvalResult.Clear.INSTANCE;
+
+        return new EvalResult.Firing(
+                (double) count,
+                (double) c.threshold(),
+                Map.of(
+                        "app",     Map.of("slug",    appSlug  == null ? "" : appSlug),
+                        "pattern", c.pattern() == null ? "" : c.pattern(),
+                        "level",   c.level()   == null ? "" : c.level()
+                )
+        );
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/PerKindCircuitBreaker.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/PerKindCircuitBreaker.java
@@ -0,0 +1,72 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.app.alerting.metrics.AlertingMetrics;
+import com.cameleer.server.core.alerting.ConditionKind;
+
+import java.time.Clock;
+import java.time.Duration;
+import java.time.Instant;
+import java.util.ArrayDeque;
+import java.util.Deque;
+import java.util.concurrent.ConcurrentHashMap;
+
+public class PerKindCircuitBreaker {
+
+    private record State(Deque<Instant> failures, Instant openUntil) {}
+
+    private final int threshold;
+    private final Duration window;
+    private final Duration cooldown;
+    private final Clock clock;
+    private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();
+
+    /** Optional metrics — set via {@link #setMetrics} after construction (avoids circular bean deps). */
+    private volatile AlertingMetrics metrics;
+
+    /** Production constructor — uses system clock. */
+    public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds) {
+        this(threshold, windowSeconds, cooldownSeconds, Clock.systemDefaultZone());
+    }
+
+    /** Test constructor — allows a fixed/controllable clock. */
+    public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
+        this.threshold = threshold;
+        this.window    = Duration.ofSeconds(windowSeconds);
+        this.cooldown  = Duration.ofSeconds(cooldownSeconds);
+        this.clock     = clock;
+    }
+
+    /** Wire metrics after construction to avoid circular Spring dependency. */
+    public void setMetrics(AlertingMetrics metrics) {
+        this.metrics = metrics;
+    }
+
+    public void recordFailure(ConditionKind kind) {
+        final boolean[] justOpened = {false};
+        byKind.compute(kind, (k, s) -> {
+            Deque<Instant> deque = (s == null) ? new ArrayDeque<>() : new ArrayDeque<>(s.failures());
+            Instant now    = Instant.now(clock);
+            Instant cutoff = now.minus(window);
+            while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
+            deque.addLast(now);
+            boolean wasOpen = s != null && s.openUntil() != null && now.isBefore(s.openUntil());
+            Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null;
+            if (openUntil != null && !wasOpen) {
+                justOpened[0] = true;
+            }
+            return new State(deque, openUntil);
+        });
+        if (justOpened[0] && metrics != null) {
+            metrics.circuitOpened(kind);
+        }
+    }
+
+    public boolean isOpen(ConditionKind kind) {
+        State s = byKind.get(kind);
+        return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
+    }
+
+    public void recordSuccess(ConditionKind kind) {
+        byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/RouteMetricEvaluator.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/RouteMetricEvaluator.java
@@ -0,0 +1,79 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.RouteMetricCondition;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import com.cameleer.server.core.search.ExecutionStats;
+import com.cameleer.server.core.storage.StatsStore;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.Map;
+
+@Component
+public class RouteMetricEvaluator implements ConditionEvaluator<RouteMetricCondition> {
+
+    private final StatsStore statsStore;
+    private final EnvironmentRepository envRepo;
+
+    public RouteMetricEvaluator(StatsStore statsStore, EnvironmentRepository envRepo) {
+        this.statsStore = statsStore;
+        this.envRepo    = envRepo;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
+
+    @Override
+    public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
+        Instant from = ctx.now().minusSeconds(c.windowSeconds());
+        Instant to   = ctx.now();
+
+        String envSlug = envRepo.findById(rule.environmentId())
+                .map(e -> e.slug())
+                .orElse(null);
+
+        String appSlug  = c.scope() != null ? c.scope().appSlug()  : null;
+        String routeId  = c.scope() != null ? c.scope().routeId()  : null;
+
+        ExecutionStats stats;
+        if (routeId != null) {
+            stats = statsStore.statsForRoute(from, to, routeId, appSlug, envSlug);
+        } else if (appSlug != null) {
+            stats = statsStore.statsForApp(from, to, appSlug, envSlug);
+        } else {
+            stats = statsStore.stats(from, to, envSlug);
+        }
+
+        double actual = switch (c.metric()) {
+            case ERROR_RATE     -> errorRate(stats);
+            case AVG_DURATION_MS -> (double) stats.avgDurationMs();
+            case P99_LATENCY_MS -> (double) stats.p99LatencyMs();
+            case THROUGHPUT     -> (double) stats.totalCount();
+            case ERROR_COUNT    -> (double) stats.failedCount();
+        };
+
+        boolean fire = switch (c.comparator()) {
+            case GT  -> actual >  c.threshold();
+            case GTE -> actual >= c.threshold();
+            case LT  -> actual <  c.threshold();
+            case LTE -> actual <= c.threshold();
+            case EQ  -> actual == c.threshold();
+        };
+
+        if (!fire) return EvalResult.Clear.INSTANCE;
+
+        return new EvalResult.Firing(actual, c.threshold(),
+                Map.of(
+                        "route", Map.of("id",   routeId  == null ? "" : routeId),
+                        "app",   Map.of("slug",  appSlug  == null ? "" : appSlug)
+                )
+        );
+    }
+
+    private double errorRate(ExecutionStats s) {
+        long total = s.totalCount();
+        return total == 0 ? 0.0 : (double) s.failedCount() / total;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/TickCache.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/TickCache.java
@@ -0,0 +1,14 @@
+package com.cameleer.server.app.alerting.eval;
+
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.function.Supplier;
+
+public class TickCache {
+
+    private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
+
+    @SuppressWarnings("unchecked")
+    public <T> T getOrCompute(String key, Supplier<T> supplier) {
+        return (T) map.computeIfAbsent(key, k -> supplier.get());
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/metrics/AlertingMetrics.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/metrics/AlertingMetrics.java
@@ -0,0 +1,279 @@
+package com.cameleer.server.app.alerting.metrics;
+
+import com.cameleer.server.core.alerting.AlertState;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.alerting.NotificationStatus;
+import io.micrometer.core.instrument.Counter;
+import io.micrometer.core.instrument.Gauge;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.Timer;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.stereotype.Component;
+
+import java.time.Duration;
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.function.Supplier;
+
+/**
+ * Micrometer-based metrics for the alerting subsystem.
+ * <p>
+ * Counters:
+ * <ul>
+ *   <li>{@code alerting_eval_errors_total{kind}} — evaluation errors by condition kind</li>
+ *   <li>{@code alerting_circuit_opened_total{kind}} — circuit breaker open transitions by kind</li>
+ *   <li>{@code alerting_notifications_total{status}} — notification outcomes by status</li>
+ * </ul>
+ * Timers:
+ * <ul>
+ *   <li>{@code alerting_eval_duration_seconds{kind}} — per-kind evaluation latency</li>
+ *   <li>{@code alerting_webhook_delivery_duration_seconds} — webhook POST latency</li>
+ * </ul>
+ * Gauges (read from PostgreSQL, cached for {@link #DEFAULT_GAUGE_TTL} to amortise
+ * Prometheus scrapes that may fire every few seconds):
+ * <ul>
+ *   <li>{@code alerting_rules_total{state=enabled|disabled}} — rule counts from {@code alert_rules}</li>
+ *   <li>{@code alerting_instances_total{state}} — instance counts grouped from {@code alert_instances}</li>
+ * </ul>
+ */
+@Component
+public class AlertingMetrics {
+
+    private static final Logger log = LoggerFactory.getLogger(AlertingMetrics.class);
+
+    /** Default time-to-live for the gauge-supplier caches. */
+    static final Duration DEFAULT_GAUGE_TTL = Duration.ofSeconds(30);
+
+    private final MeterRegistry registry;
+
+    // Cached counters per kind (lazy-initialized)
+    private final ConcurrentMap<String, Counter> evalErrorCounters   = new ConcurrentHashMap<>();
+    private final ConcurrentMap<String, Counter> circuitOpenCounters = new ConcurrentHashMap<>();
+    private final ConcurrentMap<String, Timer>   evalDurationTimers  = new ConcurrentHashMap<>();
+
+    // Notification outcome counter per status
+    private final ConcurrentMap<String, Counter> notificationCounters = new ConcurrentHashMap<>();
+
+    // Shared delivery timer
+    private final Timer webhookDeliveryTimer;
+
+    // TTL-cached gauge suppliers, held as fields so tests can force a read cycle.
+    private final TtlCache enabledRulesCache;
+    private final TtlCache disabledRulesCache;
+    private final Map<AlertState, TtlCache> instancesByStateCaches;
+
+    /**
+     * Production constructor: wraps the Postgres-backed gauge suppliers in a
+     * 30-second TTL cache so Prometheus scrapes don't cause per-scrape DB queries.
+     */
+    @Autowired
+    public AlertingMetrics(MeterRegistry registry, JdbcTemplate jdbc) {
+        this(registry,
+             () -> countRules(jdbc, true),
+             () -> countRules(jdbc, false),
+             state -> countInstances(jdbc, state),
+             DEFAULT_GAUGE_TTL,
+             Instant::now);
+    }
+
+    /**
+     * Test-friendly constructor accepting the three gauge suppliers that are
+     * exercised by {@code AlertingMetricsCachingTest}. The
+     * {@code instancesSupplier} is used for every {@link AlertState}.
+     */
+    AlertingMetrics(MeterRegistry registry,
+                    Supplier<Long> enabledRulesSupplier,
+                    Supplier<Long> disabledRulesSupplier,
+                    Supplier<Long> instancesSupplier,
+                    Duration gaugeTtl,
+                    Supplier<Instant> clock) {
+        this(registry,
+             enabledRulesSupplier,
+             disabledRulesSupplier,
+             state -> instancesSupplier.get(),
+             gaugeTtl,
+             clock);
+    }
+
+    /**
+     * Core constructor: accepts per-state instance supplier so production can
+     * query PostgreSQL with a different value per {@link AlertState}.
+     */
+    private AlertingMetrics(MeterRegistry registry,
+                            Supplier<Long> enabledRulesSupplier,
+                            Supplier<Long> disabledRulesSupplier,
+                            java.util.function.Function<AlertState, Long> instancesSupplier,
+                            Duration gaugeTtl,
+                            Supplier<Instant> clock) {
+        this.registry = registry;
+
+        // ── Static timers ───────────────────────────────────────────────
+        this.webhookDeliveryTimer = Timer.builder("alerting_webhook_delivery_duration_seconds")
+                .description("Latency of outbound webhook POST requests")
+                .register(registry);
+
+        // ── Gauge: rules by enabled/disabled (cached) ───────────────────
+        this.enabledRulesCache  = new TtlCache(enabledRulesSupplier,  gaugeTtl, clock);
+        this.disabledRulesCache = new TtlCache(disabledRulesSupplier, gaugeTtl, clock);
+
+        Gauge.builder("alerting_rules_total", enabledRulesCache, TtlCache::getAsDouble)
+                .tag("state", "enabled")
+                .description("Number of enabled alert rules")
+                .register(registry);
+        Gauge.builder("alerting_rules_total", disabledRulesCache, TtlCache::getAsDouble)
+                .tag("state", "disabled")
+                .description("Number of disabled alert rules")
+                .register(registry);
+
+        // ── Gauges: alert instances by state (cached) ───────────────────
+        this.instancesByStateCaches = new EnumMap<>(AlertState.class);
+        for (AlertState state : AlertState.values()) {
+            AlertState captured = state;
+            TtlCache cache = new TtlCache(() -> instancesSupplier.apply(captured), gaugeTtl, clock);
+            this.instancesByStateCaches.put(state, cache);
+            Gauge.builder("alerting_instances_total", cache, TtlCache::getAsDouble)
+                    .tag("state", state.name().toLowerCase())
+                    .description("Number of alert instances by state")
+                    .register(registry);
+        }
+    }
+
+    // ── Public API ──────────────────────────────────────────────────────
+
+    /**
+     * Increment the per-kind evaluation error counter; {@code ruleId} is only logged at debug level.
+     */
+    public void evalError(ConditionKind kind, UUID ruleId) {
+        String key = kind.name();
+        evalErrorCounters.computeIfAbsent(key, k ->
+            Counter.builder("alerting_eval_errors_total")
+                   .tag("kind", kind.name())
+                   .description("Alerting evaluation errors by condition kind")
+                   .register(registry))
+            .increment();
+        log.debug("Alerting eval error for kind={} ruleId={}", kind, ruleId);
+    }
+
+    /**
+     * Increment the circuit-breaker opened counter for the given condition kind.
+     */
+    public void circuitOpened(ConditionKind kind) {
+        String key = kind.name();
+        circuitOpenCounters.computeIfAbsent(key, k ->
+            Counter.builder("alerting_circuit_opened_total")
+                   .tag("kind", kind.name())
+                   .description("Circuit breaker open transitions by condition kind")
+                   .register(registry))
+            .increment();
+    }
+
+    /**
+     * Return the eval duration timer for the given condition kind (creates lazily if absent).
+     */
+    public Timer evalDuration(ConditionKind kind) {
+        return evalDurationTimers.computeIfAbsent(kind.name(), k ->
+            Timer.builder("alerting_eval_duration_seconds")
+                 .tag("kind", kind.name())
+                 .description("Alerting condition evaluation latency by kind")
+                 .register(registry));
+    }
+
+    /**
+     * The shared webhook delivery duration timer.
+     */
+    public Timer webhookDeliveryDuration() {
+        return webhookDeliveryTimer;
+    }
+
+    /**
+     * Increment the notification outcome counter for the given status.
+     */
+    public void notificationOutcome(NotificationStatus status) {
+        String key = status.name();
+        notificationCounters.computeIfAbsent(key, k ->
+            Counter.builder("alerting_notifications_total")
+                   .tag("status", status.name().toLowerCase())
+                   .description("Alerting notification outcomes by status")
+                   .register(registry))
+            .increment();
+    }
+
+    /**
+     * Force a read of every TTL-cached gauge supplier. Used by tests to simulate
+     * a Prometheus scrape without needing a real registry scrape pipeline.
+     */
+    void snapshotAllGauges() {
+        List<TtlCache> all = new ArrayList<>();
+        all.add(enabledRulesCache);
+        all.add(disabledRulesCache);
+        all.addAll(instancesByStateCaches.values());
+        for (TtlCache c : all) {
+            c.getAsDouble();
+        }
+    }
+
+    // ── Gauge suppliers (queried at most once per TTL) ──────────────────
+
+    private static long countRules(JdbcTemplate jdbc, boolean enabled) {
+        try {
+            Long count = jdbc.queryForObject(
+                "SELECT COUNT(*) FROM alert_rules WHERE enabled = ?", Long.class, enabled);
+            return count == null ? 0L : count;
+        } catch (Exception e) {
+            log.debug("alerting_rules gauge query failed: {}", e.getMessage());
+            return 0L;
+        }
+    }
+
+    private static long countInstances(JdbcTemplate jdbc, AlertState state) {
+        try {
+            Long count = jdbc.queryForObject(
+                "SELECT COUNT(*) FROM alert_instances WHERE state = ?::alert_state_enum",
+                Long.class, state.name());
+            return count == null ? 0L : count;
+        } catch (Exception e) {
+            log.debug("alerting_instances gauge query failed: {}", e.getMessage());
+            return 0L;
+        }
+    }
+
+    /**
+     * Lightweight TTL cache around a {@code Supplier<Long>}. Every call to
+     * {@link #getAsDouble()} either returns the cached value (if {@code clock.get()
+     * - lastRead < ttl}) or invokes the delegate and refreshes the cache.
+     *
+     * <p>Used to amortise Postgres queries behind Prometheus gauges over a
+     * 30-second TTL (see {@link AlertingMetrics#DEFAULT_GAUGE_TTL}).
+     */
+    static final class TtlCache {
+        private final Supplier<Long> delegate;
+        private final Duration ttl;
+        private final Supplier<Instant> clock;
+        private volatile Instant lastRead = Instant.MIN;
+        private volatile long cached = 0L;
+
+        TtlCache(Supplier<Long> delegate, Duration ttl, Supplier<Instant> clock) {
+            this.delegate = delegate;
+            this.ttl = ttl;
+            this.clock = clock;
+        }
+
+        synchronized double getAsDouble() {
+            Instant now = clock.get();
+            if (lastRead == Instant.MIN || Duration.between(lastRead, now).compareTo(ttl) >= 0) {
+                cached = delegate.get();
+                lastRead = now;
+            }
+            return cached;
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/HmacSigner.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/HmacSigner.java
@@ -0,0 +1,35 @@
+package com.cameleer.server.app.alerting.notify;
+
+import org.springframework.stereotype.Component;
+
+import javax.crypto.Mac;
+import javax.crypto.spec.SecretKeySpec;
+import java.nio.charset.StandardCharsets;
+import java.util.HexFormat;
+
+/**
+ * Computes HMAC-SHA256 webhook signatures.
+ * <p>
+ * Output format: {@code sha256=<lowercase hex>}
+ */
+@Component
+public class HmacSigner {
+
+    /**
+     * Signs {@code body} with {@code secret} using HmacSHA256.
+     *
+     * @param secret plain-text secret (UTF-8 encoded)
+     * @param body   request body bytes to sign
+     * @return {@code "sha256=" + hex(hmac)}
+     */
+    public String sign(String secret, byte[] body) {
+        try {
+            Mac mac = Mac.getInstance("HmacSHA256");
+            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
+            byte[] digest = mac.doFinal(body);
+            return "sha256=" + HexFormat.of().formatHex(digest);
+        } catch (Exception e) {
+            throw new IllegalStateException("HMAC signing failed", e);
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/InAppInboxQuery.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/InAppInboxQuery.java
@@ -0,0 +1,107 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.cameleer.server.app.alerting.dto.UnreadCountResponse;
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertInstanceRepository;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.AlertState;
+import com.cameleer.server.core.rbac.RbacService;
+import org.springframework.stereotype.Component;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * Server-side query helper for the in-app alert inbox.
+ * <p>
+ * {@link #listInbox} returns alerts the user is allowed to see (targeted directly or via group/role).
+ * {@link #countUnread} is memoized per {@code (envId, userId)} for 5 seconds to avoid hammering
+ * the database on every page render. The memo caches the full per-severity breakdown so
+ * the UI can branch bell colour on the highest unread severity without a second call.
+ */
+@Component
+public class InAppInboxQuery {
+
+    private static final long MEMO_TTL_MS = 5_000L;
+
+    private final AlertInstanceRepository instanceRepo;
+    private final RbacService rbacService;
+    private final Clock clock;
+
+    /** Cache key for the unread count memo. */
+    private record Key(UUID envId, String userId) {}
+
+    /** Cache entry: cached response + expiry timestamp. */
+    private record Entry(UnreadCountResponse response, Instant expiresAt) {}
+
+    private final ConcurrentHashMap<Key, Entry> memo = new ConcurrentHashMap<>();
+
+    public InAppInboxQuery(AlertInstanceRepository instanceRepo,
+                           RbacService rbacService,
+                           Clock alertingClock) {
+        this.instanceRepo = instanceRepo;
+        this.rbacService  = rbacService;
+        this.clock        = alertingClock;
+    }
+
+    /**
+     * Lists inbox alerts visible to the user. Optional {@code states}, {@code severities},
+     * {@code acked}, and {@code read} narrow the result set; a {@code null} or empty list means
+     * "no filter on that dimension". {@code acked}/{@code read} are tri-state:
+     * {@code null} = no filter, {@code TRUE} = only acked/read, {@code FALSE} = only unacked/unread.
+     */
+    public List<AlertInstance> listInbox(UUID envId,
+                                         String userId,
+                                         List<AlertState> states,
+                                         List<AlertSeverity> severities,
+                                         Boolean acked,
+                                         Boolean read,
+                                         int limit) {
+        List<String> groupIds   = resolveGroupIds(userId);
+        List<String> roleNames  = resolveRoleNames(userId);
+        return instanceRepo.listForInbox(envId, groupIds, userId, roleNames,
+                states, severities, acked, read, limit);
+    }
+
+    /**
+     * Returns the unread (un-acked) alert count for the user, broken down by severity.
+     * <p>
+     * Memoized for 5 seconds per {@code (envId, userId)}.
+     */
+    public UnreadCountResponse countUnread(UUID envId, String userId) {
+        Key key = new Key(envId, userId);
+        Instant now = Instant.now(clock);
+        Entry cached = memo.get(key);
+        if (cached != null && now.isBefore(cached.expiresAt())) {
+            return cached.response();
+        }
+        List<String> groupIds  = resolveGroupIds(userId);
+        List<String> roleNames = resolveRoleNames(userId);
+        Map<AlertSeverity, Long> bySeverity = instanceRepo.countUnreadBySeverity(envId, userId, groupIds, roleNames);
+        UnreadCountResponse response = UnreadCountResponse.from(bySeverity);
+        memo.put(key, new Entry(response, now.plusMillis(MEMO_TTL_MS)));
+        return response;
+    }
+
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+
+    private List<String> resolveGroupIds(String userId) {
+        return rbacService.getEffectiveGroupsForUser(userId)
+                .stream()
+                .map(g -> g.id().toString())
+                .toList();
+    }
+
+    private List<String> resolveRoleNames(String userId) {
+        return rbacService.getEffectiveRolesForUser(userId)
+                .stream()
+                .map(r -> r.name())
+                .toList();
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java
@@ -0,0 +1,92 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.samskivert.mustache.Mustache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Renders Mustache templates against a context map.
+ * <p>
+ * Contract:
+ * <ul>
+ *   <li>Unresolved {@code {{x.y.z}}} tokens render as the literal {@code {{x.y.z}}} and log WARN.</li>
+ *   <li>Malformed templates (e.g. an unclosed <code>&#123;&#123;</code>) return the original template string and log WARN.</li>
+ *   <li>Never throws on template content.</li>
+ * </ul>
+ */
+@Component
+public class MustacheRenderer {
+
+    private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
+
+    /** Matches {{path}} variable tokens and captures the path (trimmed by the caller). Skips tags whose first non-blank character is #, /, !, > or {. */
+    private static final Pattern TOKEN = Pattern.compile("\\{\\{\\s*([^#/!>{\\s][^}]*)\\s*\\}\\}");
+
+    /** Sentinel prefix/suffix to survive Mustache compilation so we can post-replace. */
+    private static final String SENTINEL_PREFIX = "\u0000TPL\u0001";
+    private static final String SENTINEL_SUFFIX = "\u0001LPT\u0000";
+
+    public String render(String template, Map<String, Object> ctx) {
+        if (template == null) return "";
+        try {
+            // 1) Walk all {{path}} tokens. Unresolved ones are replaced with a unique sentinel.
+            Map<String, String> literals = new LinkedHashMap<>();
+            StringBuilder pre = new StringBuilder();
+            Matcher m = TOKEN.matcher(template);
+            int sentinelIdx = 0;
+            boolean anyUnresolved = false;
+            while (m.find()) {
+                String path = m.group(1).trim();
+                if (resolvePath(ctx, path) == null) {
+                    anyUnresolved = true;
+                    String sentinelKey = SENTINEL_PREFIX + sentinelIdx++ + SENTINEL_SUFFIX;
+                    literals.put(sentinelKey, "{{" + path + "}}");
+                    m.appendReplacement(pre, Matcher.quoteReplacement(sentinelKey));
+                }
+            }
+            m.appendTail(pre);
+            if (anyUnresolved) {
+                log.warn("MustacheRenderer: unresolved template variables; rendering as literals. template={}",
+                        template.length() > 200 ? template.substring(0, 200) + "..." : template);
+            }
+
+            // 2) Compile & render the pre-processed template (sentinels are plain text — not Mustache tags).
+            String rendered = Mustache.compiler()
+                    .defaultValue("")
+                    .escapeHTML(false)
+                    .compile(pre.toString())
+                    .execute(ctx);
+
+            // 3) Restore the sentinel placeholders back to their original {{path}} literals.
+            for (Map.Entry<String, String> e : literals.entrySet()) {
+                rendered = rendered.replace(e.getKey(), e.getValue());
+            }
+            return rendered;
+        } catch (Exception e) {
+            log.warn("MustacheRenderer: template render failed, returning raw template: {}", e.getMessage());
+            return template;
+        }
+    }
+
+    /**
+     * Resolves a dotted path like "alert.state" against a nested Map context.
+     * Returns null if any segment is missing or the value is null.
+     */
+    @SuppressWarnings("unchecked")
+    Object resolvePath(Map<String, Object> ctx, String path) {
+        if (ctx == null || path == null || path.isBlank()) return null;
+        String[] parts = path.split("\\.");
+        Object current = ctx.get(parts[0]);
+        for (int i = 1; i < parts.length; i++) {
+            if (!(current instanceof Map)) return null;
+            current = ((Map<String, Object>) current).get(parts[i]);
+        }
+        return current;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java
@@ -0,0 +1,126 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.runtime.Environment;
+import org.springframework.stereotype.Component;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * Builds the Mustache template context map from an AlertRule + AlertInstance + Environment.
+ * <p>
+ * Always present: {@code env}, {@code rule}, {@code alert}. Conditionally present based on {@code rule.conditionKind()}:
+ * <ul>
+ *   <li>AGENT_STATE      → {@code agent}, {@code app}</li>
+ *   <li>AGENT_LIFECYCLE  → {@code agent}, {@code event}</li>
+ *   <li>DEPLOYMENT_STATE → {@code deployment}, {@code app}</li>
+ *   <li>ROUTE_METRIC     → {@code route}, {@code app}</li>
+ *   <li>EXCHANGE_MATCH   → {@code exchange}, {@code app}, {@code route}</li>
+ *   <li>LOG_PATTERN      → {@code log}, {@code app}</li>
+ *   <li>JVM_METRIC       → {@code metric}, {@code agent}, {@code app}</li>
+ * </ul>
+ * Values absent from {@code instance.context()} render as empty string so Mustache templates
+ * remain valid even for env-wide rules that have no app/route scope.
+ */
+@Component
+public class NotificationContextBuilder {
+
+    public Map<String, Object> build(AlertRule rule, AlertInstance instance, Environment env, String uiOrigin) {
+        Map<String, Object> ctx = new LinkedHashMap<>();
+
+        // --- env subtree ---
+        ctx.put("env", Map.of(
+                "slug", env.slug(),
+                "id",   env.id().toString()
+        ));
+
+        // --- rule subtree ---
+        ctx.put("rule", Map.of(
+                "id",          rule.id().toString(),
+                "name",        rule.name(),
+                "severity",    rule.severity().name(),
+                "description", rule.description() == null ? "" : rule.description()
+        ));
+
+        // --- alert subtree ---
+        String base = uiOrigin == null ? "" : uiOrigin;
+        ctx.put("alert", Map.of(
+                "id",           instance.id().toString(),
+                "state",        instance.state().name(),
+                "firedAt",      instance.firedAt().toString(),
+                "resolvedAt",   instance.resolvedAt()    == null ? "" : instance.resolvedAt().toString(),
+                "ackedBy",      instance.ackedBy()       == null ? "" : instance.ackedBy(),
+                "link",         base + "/alerts/inbox/" + instance.id(),
+                "currentValue", instance.currentValue()  == null ? "" : instance.currentValue().toString(),
+                "threshold",    instance.threshold()     == null ? "" : instance.threshold().toString()
+        ));
+
+        // --- per-kind conditional subtrees ---
+        if (rule.conditionKind() != null) {
+            switch (rule.conditionKind()) {
+                case AGENT_STATE -> {
+                    ctx.put("agent", subtree(instance, "agent.id", "agent.name", "agent.state"));
+                    ctx.put("app",   subtree(instance, "app.slug", "app.id"));
+                }
+                case AGENT_LIFECYCLE -> {
+                    ctx.put("agent", subtree(instance, "agent.id", "agent.app"));
+                    ctx.put("event", subtree(instance, "event.type", "event.timestamp", "event.detail"));
+                }
+                case DEPLOYMENT_STATE -> {
+                    ctx.put("deployment", subtree(instance, "deployment.id", "deployment.status"));
+                    ctx.put("app",        subtree(instance, "app.slug", "app.id"));
+                }
+                case ROUTE_METRIC -> {
+                    ctx.put("route", subtree(instance, "route.id", "route.uri"));
+                    ctx.put("app",   subtree(instance, "app.slug", "app.id"));
+                }
+                case EXCHANGE_MATCH -> {
+                    ctx.put("exchange", subtree(instance, "exchange.id", "exchange.status"));
+                    ctx.put("app",      subtree(instance, "app.slug", "app.id"));
+                    ctx.put("route",    subtree(instance, "route.id", "route.uri"));
+                }
+                case LOG_PATTERN -> {
+                    ctx.put("log", subtree(instance, "log.pattern", "log.matchCount"));
+                    ctx.put("app", subtree(instance, "app.slug", "app.id"));
+                }
+                case JVM_METRIC -> {
+                    ctx.put("metric", subtree(instance, "metric.name", "metric.value"));
+                    ctx.put("agent",  subtree(instance, "agent.id", "agent.name"));
+                    ctx.put("app",    subtree(instance, "app.slug", "app.id"));
+                }
+            }
+        }
+
+        return ctx;
+    }
+
+    /**
+     * Extracts a flat subtree from {@code instance.context()} using dotted key paths.
+     * Each path like {@code "agent.id"} becomes the leaf key {@code "id"} in the returned map.
+     * Missing or null values are stored as empty string.
+     */
+    private Map<String, Object> subtree(AlertInstance instance, String... dottedPaths) {
+        Map<String, Object> sub = new LinkedHashMap<>();
+        Map<String, Object> ic = instance.context();
+        for (String path : dottedPaths) {
+            String leafKey = path.contains(".") ? path.substring(path.lastIndexOf('.') + 1) : path;
+            Object val = resolveContext(ic, path);
+            sub.put(leafKey, val == null ? "" : val.toString());
+        }
+        return sub;
+    }
+
+    @SuppressWarnings("unchecked")
+    private Object resolveContext(Map<String, Object> ctx, String path) {
+        if (ctx == null) return null;
+        String[] parts = path.split("\\.");
+        Object current = ctx.get(parts[0]);
+        for (int i = 1; i < parts.length; i++) {
+            if (!(current instanceof Map)) return null;
+            current = ((Map<String, Object>) current).get(parts[i]);
+        }
+        return current;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationDispatchJob.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationDispatchJob.java
@@ -0,0 +1,181 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.cameleer.server.app.alerting.config.AlertingProperties;
+import com.cameleer.server.app.alerting.metrics.AlertingMetrics;
+import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.outbound.OutboundConnectionRepository;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Qualifier;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.scheduling.annotation.SchedulingConfigurer;
+import org.springframework.scheduling.config.ScheduledTaskRegistrar;
+import org.springframework.stereotype.Component;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Claim-polling outbox loop that dispatches {@link AlertNotification} records.
+ * <p>
+ * On each tick, claims a batch of due notifications, resolves the backing
+ * {@link AlertInstance} and {@link com.cameleer.server.core.outbound.OutboundConnection},
+ * checks active silences, delegates to {@link WebhookDispatcher}, and persists the outcome.
+ * <p>
+ * Retry backoff: {@code retryAfter × attempts} (30 s, 60 s, 90 s, …).
+ * Once the attempt count reaches {@link AlertingProperties#effectiveWebhookMaxAttempts()},
+ * the notification is marked FAILED permanently.
+ */
+@Component
+public class NotificationDispatchJob implements SchedulingConfigurer {
+
+    private static final Logger log = LoggerFactory.getLogger(NotificationDispatchJob.class);
+
+    private final AlertingProperties props;
+    private final AlertNotificationRepository notificationRepo;
+    private final AlertInstanceRepository instanceRepo;
+    private final AlertRuleRepository ruleRepo;
+    private final AlertSilenceRepository silenceRepo;
+    private final OutboundConnectionRepository outboundRepo;
+    private final EnvironmentRepository envRepo;
+    private final WebhookDispatcher dispatcher;
+    private final SilenceMatcherService silenceMatcher;
+    private final NotificationContextBuilder contextBuilder;
+    private final String instanceId;
+    private final String tenantId;
+    private final Clock clock;
+    private final String uiOrigin;
+    private final AlertingMetrics metrics;
+
+    @SuppressWarnings("SpringJavaInjectionPointsAutowiringInspection")
+    public NotificationDispatchJob(
+            AlertingProperties props,
+            AlertNotificationRepository notificationRepo,
+            AlertInstanceRepository instanceRepo,
+            AlertRuleRepository ruleRepo,
+            AlertSilenceRepository silenceRepo,
+            OutboundConnectionRepository outboundRepo,
+            EnvironmentRepository envRepo,
+            WebhookDispatcher dispatcher,
+            SilenceMatcherService silenceMatcher,
+            NotificationContextBuilder contextBuilder,
+            @Qualifier("alertingInstanceId") String instanceId,
+            @Value("${cameleer.server.tenant.id:default}") String tenantId,
+            Clock alertingClock,
+            @Value("${cameleer.server.ui-origin:#{null}}") String uiOrigin,
+            AlertingMetrics metrics) {
+
+        this.props           = props;
+        this.notificationRepo = notificationRepo;
+        this.instanceRepo    = instanceRepo;
+        this.ruleRepo        = ruleRepo;
+        this.silenceRepo     = silenceRepo;
+        this.outboundRepo    = outboundRepo;
+        this.envRepo         = envRepo;
+        this.dispatcher      = dispatcher;
+        this.silenceMatcher  = silenceMatcher;
+        this.contextBuilder  = contextBuilder;
+        this.instanceId      = instanceId;
+        this.tenantId        = tenantId;
+        this.clock           = alertingClock;
+        this.uiOrigin        = uiOrigin;
+        this.metrics         = metrics;
+    }
+
+    // -------------------------------------------------------------------------
+    // SchedulingConfigurer
+    // -------------------------------------------------------------------------
+
+    @Override
+    public void configureTasks(ScheduledTaskRegistrar registrar) {
+        registrar.addFixedDelayTask(this::tick, props.effectiveNotificationTickIntervalMs());
+    }
+
+    // -------------------------------------------------------------------------
+    // Tick — accessible for tests across packages
+    // -------------------------------------------------------------------------
+
+    public void tick() {
+        List<AlertNotification> claimed = notificationRepo.claimDueNotifications(
+                instanceId,
+                props.effectiveNotificationBatchSize(),
+                props.effectiveClaimTtlSeconds());
+
+        for (AlertNotification n : claimed) {
+            try {
+                processOne(n);
+            } catch (Exception e) {
+                log.warn("Notification dispatch error for {}: {}", n.id(), e.toString());
+                notificationRepo.scheduleRetry(n.id(), Instant.now(clock).plusSeconds(30), -1, e.getMessage());
+            }
+        }
+    }
+
+    // -------------------------------------------------------------------------
+    // Per-notification processing
+    // -------------------------------------------------------------------------
+
+    private void processOne(AlertNotification n) {
+        // 1. Resolve alert instance
+        AlertInstance instance = instanceRepo.findById(n.alertInstanceId()).orElse(null);
+        if (instance == null) {
+            notificationRepo.markFailed(n.id(), 0, "instance deleted");
+            return;
+        }
+
+        // 2. Resolve outbound connection
+        var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
+        if (conn == null) {
+            notificationRepo.markFailed(n.id(), 0, "outbound connection deleted");
+            return;
+        }
+
+        // 3. Resolve rule and environment (may be null after deletion)
+        AlertRule rule = instance.ruleId() == null ? null
+                : ruleRepo.findById(instance.ruleId()).orElse(null);
+        Environment env = envRepo.findById(instance.environmentId()).orElse(null);
+
+        // 4. Build Mustache context (guard: rule or env may be null after deletion)
+        Map<String, Object> context = (rule != null && env != null)
+                ? contextBuilder.build(rule, instance, env, uiOrigin)
+                : Map.of();
+
+        // 5. Silence check
+        List<AlertSilence> activeSilences = silenceRepo.listActive(instance.environmentId(), Instant.now(clock));
+        for (AlertSilence s : activeSilences) {
+            if (silenceMatcher.matches(s.matcher(), instance, rule)) {
+                instanceRepo.markSilenced(instance.id(), true);
+                notificationRepo.markFailed(n.id(), 0, "silenced");
+                return;
+            }
+        }
+
+        // 6. Dispatch
+        WebhookDispatcher.Outcome outcome = dispatcher.dispatch(n, rule, instance, conn, context);
+
+        NotificationStatus outcomeStatus = outcome.status();
+        if (outcomeStatus == NotificationStatus.DELIVERED) {
+            Instant now = Instant.now(clock);
+            notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), now);
+            instanceRepo.save(instance.withLastNotifiedAt(now));
+            metrics.notificationOutcome(NotificationStatus.DELIVERED);
+        } else if (outcomeStatus == NotificationStatus.FAILED) {
+            notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
+            metrics.notificationOutcome(NotificationStatus.FAILED);
+        } else {
+            // null status = transient failure (5xx / network / timeout) → retry
+            int attempts = n.attempts() + 1;
+            if (attempts >= props.effectiveWebhookMaxAttempts()) {
+                notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
+                metrics.notificationOutcome(NotificationStatus.FAILED);
+            } else {
+                Instant next = Instant.now(clock).plus(outcome.retryAfter().multipliedBy(attempts));
+                notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
+            }
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java
@@ -0,0 +1,58 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.SilenceMatcher;
+import org.springframework.stereotype.Component;
+
+/**
+ * Evaluates whether an active silence matches an alert instance at notification-dispatch time.
+ * <p>
+ * Each non-null field on the matcher is an additional AND constraint. A null field is a wildcard.
+ * Matching is purely in-process — no I/O.
+ */
+@Component
+public class SilenceMatcherService {
+
+    /**
+     * Returns {@code true} if the silence covers this alert instance.
+     *
+     * @param matcher  the silence's matching spec (never null)
+     * @param instance the alert instance to test (never null)
+     * @param rule     the alert rule; may be null when the rule was deleted after instance creation.
+     *                 Scope-based matchers (appSlug, routeId, agentId) return false when rule is null
+     *                 because the scope cannot be verified.
+     */
+    public boolean matches(SilenceMatcher matcher, AlertInstance instance, AlertRule rule) {
+        // ruleId constraint
+        if (matcher.ruleId() != null && !matcher.ruleId().equals(instance.ruleId())) {
+            return false;
+        }
+
+        // severity constraint
+        if (matcher.severity() != null && matcher.severity() != instance.severity()) {
+            return false;
+        }
+
+        // scope-based constraints require a rule with a condition to derive scope from
+        boolean needsScope = matcher.appSlug() != null || matcher.routeId() != null || matcher.agentId() != null;
+        if (needsScope && (rule == null || rule.condition() == null)) {
+            return false;
+        }
+
+        if (rule != null && rule.condition() != null) {
+            var scope = rule.condition().scope();
+            if (matcher.appSlug() != null && !matcher.appSlug().equals(scope.appSlug())) {
+                return false;
+            }
+            if (matcher.routeId() != null && !matcher.routeId().equals(scope.routeId())) {
+                return false;
+            }
+            if (matcher.agentId() != null && !matcher.agentId().equals(scope.agentId())) {
+                return false;
+            }
+        }
+
+        return true;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/WebhookDispatcher.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/WebhookDispatcher.java
@@ -0,0 +1,213 @@
+package com.cameleer.server.app.alerting.notify;
+
+import com.cameleer.server.app.alerting.config.AlertingProperties;
+import com.cameleer.server.app.outbound.crypto.SecretCipher;
+import com.cameleer.server.core.alerting.AlertInstance;
+import com.cameleer.server.core.alerting.AlertNotification;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.NotificationStatus;
+import com.cameleer.server.core.alerting.WebhookBinding;
+import com.cameleer.server.core.http.OutboundHttpClientFactory;
+import com.cameleer.server.core.http.OutboundHttpRequestContext;
+import com.cameleer.server.core.outbound.OutboundConnection;
+import com.cameleer.server.core.outbound.OutboundMethod;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.hc.client5.http.classic.methods.HttpPatch;
+import org.apache.hc.client5.http.classic.methods.HttpPost;
+import org.apache.hc.client5.http.classic.methods.HttpPut;
+import org.apache.hc.client5.http.classic.methods.HttpUriRequestBase;
+import org.apache.hc.core5.http.io.entity.EntityUtils;
+import org.apache.hc.core5.http.io.entity.StringEntity;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+
+import java.nio.charset.StandardCharsets;
+import java.time.Duration;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * Renders, signs, and dispatches webhook notifications over HTTP.
+ * <p>
+ * Classification:
+ * <ul>
+ *   <li>2xx → {@link NotificationStatus#DELIVERED}</li>
+ *   <li>4xx → {@link NotificationStatus#FAILED} (retry won't help)</li>
+ *   <li>Any other status (3xx, 5xx), network error, or timeout → {@code null} status (caller retries up to max attempts)</li>
+ * </ul>
+ */
+@Component
+public class WebhookDispatcher {
+
+    private static final Logger log = LoggerFactory.getLogger(WebhookDispatcher.class);
+
+    /** Base delay that callers multiply by the attempt count: 30s, 60s, 90s, … */
+    static final Duration BASE_RETRY_DELAY = Duration.ofSeconds(30);
+
+    private static final int SNIPPET_LIMIT = 512;
+    private static final String DEFAULT_CONTENT_TYPE = "application/json";
+
+    private final OutboundHttpClientFactory clientFactory;
+    private final SecretCipher secretCipher;
+    private final MustacheRenderer renderer;
+    private final AlertingProperties props;
+    private final ObjectMapper objectMapper;
+
+    public WebhookDispatcher(OutboundHttpClientFactory clientFactory,
+                             SecretCipher secretCipher,
+                             MustacheRenderer renderer,
+                             AlertingProperties props,
+                             ObjectMapper objectMapper) {
+        this.clientFactory = clientFactory;
+        this.secretCipher  = secretCipher;
+        this.renderer      = renderer;
+        this.props         = props;
+        this.objectMapper  = objectMapper;
+    }
+
+    public record Outcome(
+            NotificationStatus status,
+            int httpStatus,
+            String snippet,
+            Duration retryAfter) {}
+
+    /**
+     * Dispatch a single webhook notification.
+     *
+     * @param notif    the outbox record (contains webhookId used to find per-rule overrides)
+     * @param rule     the alert rule (may be null when rule was deleted)
+     * @param instance the alert instance
+     * @param conn     the resolved outbound connection
+     * @param context  the Mustache rendering context
+     */
+    public Outcome dispatch(AlertNotification notif,
+                            AlertRule rule,
+                            AlertInstance instance,
+                            OutboundConnection conn,
+                            Map<String, Object> context) {
+        try {
+            // 1. Determine per-binding overrides
+            WebhookBinding binding = findBinding(rule, notif);
+
+            // 2. Render URL
+            String url = renderer.render(conn.url(), context);
+
+            // 3. Build body
+            String body = buildBody(conn, binding, context);
+
+            // 4. Build headers
+            Map<String, String> headers = buildHeaders(conn, binding, context);
+
+            // 5. HMAC sign if configured
+            if (conn.hmacSecretCiphertext() != null) {
+                String secret = secretCipher.decrypt(conn.hmacSecretCiphertext());
+                String sig = new HmacSigner().sign(secret, body.getBytes(StandardCharsets.UTF_8));
+                headers.put("X-Cameleer-Signature", sig);
+            }
+
+            // 6. Build HTTP request
+            Duration timeout = Duration.ofMillis(props.effectiveWebhookTimeoutMs());
+            OutboundHttpRequestContext ctx = new OutboundHttpRequestContext(
+                    conn.tlsTrustMode(), conn.tlsCaPemPaths(), timeout, timeout);
+
+            var client = clientFactory.clientFor(ctx);
+            HttpUriRequestBase request = buildRequest(conn.method(), url);
+            for (var e : headers.entrySet()) {
+                request.setHeader(e.getKey(), e.getValue());
+            }
+            request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
+
+            // 7. Execute and classify
+            try (var response = client.execute(request)) {
+                int code = response.getCode();
+                String snippet = snippet(response.getEntity() != null
+                        ? EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8)
+                        : "");
+
+                if (code >= 200 && code < 300) {
+                    return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
+                } else if (code >= 400 && code < 500) {
+                    return new Outcome(NotificationStatus.FAILED, code, snippet, null);
+                } else {
+                    return new Outcome(null, code, snippet, BASE_RETRY_DELAY);
+                }
+            }
+
+        } catch (Exception e) {
+            log.warn("WebhookDispatcher: network/timeout error dispatching notification {}: {}",
+                    notif.id(), e.getMessage());
+            return new Outcome(null, 0, snippet(e.getMessage()), BASE_RETRY_DELAY);
+        }
+    }
+
+    // -------------------------------------------------------------------------
+    // Helpers
+    // -------------------------------------------------------------------------
+
+    private WebhookBinding findBinding(AlertRule rule, AlertNotification notif) {
+        if (rule == null || notif.webhookId() == null) return null;
+        return rule.webhooks().stream()
+                .filter(w -> w.id().equals(notif.webhookId()))
+                .findFirst()
+                .orElse(null);
+    }
+
+    private String buildBody(OutboundConnection conn, WebhookBinding binding, Map<String, Object> context) {
+        // Priority: per-binding override > connection default > built-in JSON envelope
+        String tmpl = null;
+        if (binding != null && binding.bodyOverride() != null) {
+            tmpl = binding.bodyOverride();
+        } else if (conn.defaultBodyTmpl() != null) {
+            tmpl = conn.defaultBodyTmpl();
+        }
+
+        if (tmpl != null) {
+            return renderer.render(tmpl, context);
+        }
+
+        // Built-in default: serialize the entire context map as JSON
+        try {
+            return objectMapper.writeValueAsString(context);
+        } catch (Exception e) {
+            log.warn("WebhookDispatcher: failed to serialize context as JSON, using empty object", e);
+            return "{}";
+        }
+    }
+
+    private Map<String, String> buildHeaders(OutboundConnection conn, WebhookBinding binding,
+                                             Map<String, Object> context) {
+        Map<String, String> headers = new LinkedHashMap<>();
+
+        // Default content-type
+        headers.put("Content-Type", DEFAULT_CONTENT_TYPE);
+
+        // Connection-level default headers (keys are literal, values are Mustache-rendered)
+        for (var e : conn.defaultHeaders().entrySet()) {
+            headers.put(e.getKey(), renderer.render(e.getValue(), context));
+        }
+
+        // Per-binding overrides (also Mustache-rendered values)
+        if (binding != null) {
+            for (var e : binding.headerOverrides().entrySet()) {
+                headers.put(e.getKey(), renderer.render(e.getValue(), context));
+            }
+        }
+
+        return headers;
+    }
+
+    private HttpUriRequestBase buildRequest(OutboundMethod method, String url) {
+        if (method == null) method = OutboundMethod.POST;
+        return switch (method) {
+            case PUT   -> new HttpPut(url);
+            case PATCH -> new HttpPatch(url);
+            default    -> new HttpPost(url);
+        };
+    }
+
+    private String snippet(String text) {
+        if (text == null) return "";
+        return text.length() <= SNIPPET_LIMIT ? text : text.substring(0, SNIPPET_LIMIT);
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/retention/AlertingRetentionJob.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/retention/AlertingRetentionJob.java
@@ -0,0 +1,63 @@
+package com.cameleer.server.app.alerting.retention;
+
+import com.cameleer.server.app.alerting.config.AlertingProperties;
+import com.cameleer.server.core.alerting.AlertInstanceRepository;
+import com.cameleer.server.core.alerting.AlertNotificationRepository;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.scheduling.annotation.Scheduled;
+import org.springframework.stereotype.Component;
+
+import java.time.Clock;
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+
+/**
+ * Nightly retention job for alerting data.
+ * <p>
+ * Deletes RESOLVED {@link com.cameleer.server.core.alerting.AlertInstance} rows older than
+ * {@code cameleer.server.alerting.eventRetentionDays} and DELIVERED/FAILED
+ * {@link com.cameleer.server.core.alerting.AlertNotification} rows older than
+ * {@code cameleer.server.alerting.notificationRetentionDays}.
+ * <p>
+ * Duplicate runs across replicas are tolerable — the DELETEs are idempotent.
+ */
+@Component
+public class AlertingRetentionJob {
+
+    private static final Logger log = LoggerFactory.getLogger(AlertingRetentionJob.class);
+
+    private final AlertingProperties props;
+    private final AlertInstanceRepository alertInstanceRepo;
+    private final AlertNotificationRepository alertNotificationRepo;
+    private final Clock clock;
+
+    public AlertingRetentionJob(AlertingProperties props,
+                                AlertInstanceRepository alertInstanceRepo,
+                                AlertNotificationRepository alertNotificationRepo,
+                                Clock alertingClock) {
+        this.props = props;
+        this.alertInstanceRepo = alertInstanceRepo;
+        this.alertNotificationRepo = alertNotificationRepo;
+        this.clock = alertingClock;
+    }
+
+    @Scheduled(cron = "0 0 3 * * *") // 03:00 every day, in the server's default time zone
+    public void cleanup() {
+        log.info("Alerting retention job started");
+
+        Instant now = Instant.now(clock);
+
+        Instant instanceCutoff = now.minus(props.effectiveEventRetentionDays(), ChronoUnit.DAYS);
+        alertInstanceRepo.deleteResolvedBefore(instanceCutoff);
+        log.info("Alerting retention: deleted RESOLVED instances older than {} ({} days)",
+                instanceCutoff, props.effectiveEventRetentionDays());
+
+        Instant notificationCutoff = now.minus(props.effectiveNotificationRetentionDays(), ChronoUnit.DAYS);
+        alertNotificationRepo.deleteSettledBefore(notificationCutoff);
+        log.info("Alerting retention: deleted settled notifications older than {} ({} days)",
+                notificationCutoff, props.effectiveNotificationRetentionDays());
+
+        log.info("Alerting retention job completed");
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java
@@ -0,0 +1,377 @@
+package com.cameleer.server.app.alerting.storage;
+
+import com.cameleer.server.core.alerting.*;
+import com.fasterxml.jackson.core.type.TypeReference;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.dao.DuplicateKeyException;
+import org.springframework.jdbc.core.ConnectionCallback;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.jdbc.core.RowMapper;
+
+import java.sql.Array;
+import java.sql.SQLException;
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.util.*;
+
+public class PostgresAlertInstanceRepository implements AlertInstanceRepository {
+
+    private static final Logger log = LoggerFactory.getLogger(PostgresAlertInstanceRepository.class);
+
+    private final JdbcTemplate jdbc;
+    private final ObjectMapper om;
+
+    public PostgresAlertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        this.jdbc = jdbc;
+        this.om = om;
+    }
+
+    @Override
+    public AlertInstance save(AlertInstance i) {
+        String sql = """
+            INSERT INTO alert_instances (
+                id, rule_id, rule_snapshot, environment_id, state, severity,
+                fired_at, acked_at, acked_by, resolved_at, last_notified_at,
+                read_at, deleted_at,
+                silenced, current_value, threshold, context, title, message,
+                target_user_ids, target_group_ids, target_role_names)
+            VALUES (?, ?, ?::jsonb, ?, ?::alert_state_enum, ?::severity_enum,
+                    ?, ?, ?, ?, ?,
+                    ?, ?,
+                    ?, ?, ?, ?::jsonb, ?, ?,
+                    ?, ?, ?)
+            ON CONFLICT (id) DO UPDATE SET
+                state             = EXCLUDED.state,
+                acked_at          = EXCLUDED.acked_at,
+                acked_by          = EXCLUDED.acked_by,
+                resolved_at       = EXCLUDED.resolved_at,
+                last_notified_at  = EXCLUDED.last_notified_at,
+                read_at           = EXCLUDED.read_at,
+                deleted_at        = EXCLUDED.deleted_at,
+                silenced          = EXCLUDED.silenced,
+                current_value     = EXCLUDED.current_value,
+                threshold         = EXCLUDED.threshold,
+                context           = EXCLUDED.context,
+                title             = EXCLUDED.title,
+                message           = EXCLUDED.message,
+                target_user_ids   = EXCLUDED.target_user_ids,
+                target_group_ids  = EXCLUDED.target_group_ids,
+                target_role_names = EXCLUDED.target_role_names
+            """;
+        Array userIds  = toTextArray(i.targetUserIds());
+        Array groupIds = toUuidArray(i.targetGroupIds());
+        Array roleNames = toTextArray(i.targetRoleNames());
+
+        try {
+            jdbc.update(sql,
+                i.id(), i.ruleId(), writeJson(i.ruleSnapshot()),
+                i.environmentId(), i.state().name(), i.severity().name(),
+                ts(i.firedAt()), ts(i.ackedAt()), i.ackedBy(),
+                ts(i.resolvedAt()), ts(i.lastNotifiedAt()),
+                ts(i.readAt()), ts(i.deletedAt()),
+                i.silenced(), i.currentValue(), i.threshold(),
+                writeJson(i.context()), i.title(), i.message(),
+                userIds, groupIds, roleNames);
+        } catch (DuplicateKeyException e) {
+            log.info("Skipped duplicate open alert_instance for rule {}: {}", i.ruleId(), e.getMessage());
+            return findOpenForRule(i.ruleId()).orElse(i);
+        }
+        return i;
+    }
+
+    @Override
+    public Optional<AlertInstance> findById(UUID id) {
+        var list = jdbc.query("SELECT * FROM alert_instances WHERE id = ?", rowMapper(), id);
+        return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
+    }
+
+    @Override
+    public Optional<AlertInstance> findOpenForRule(UUID ruleId) {
+        var list = jdbc.query("""
+            SELECT * FROM alert_instances
+             WHERE rule_id = ?
+               AND state IN ('PENDING','FIRING')
+               AND deleted_at IS NULL
+             LIMIT 1
+            """, rowMapper(), ruleId);
+        return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
+    }
+
+    @Override
+    public List<AlertInstance> listForInbox(UUID environmentId,
+                                            List<String> userGroupIdFilter,
+                                            String userId,
+                                            List<String> userRoleNames,
+                                            List<AlertState> states,
+                                            List<AlertSeverity> severities,
+                                            Boolean acked,
+                                            Boolean read,
+                                            int limit) {
+        Array groupArray = toUuidArrayFromStrings(userGroupIdFilter);
+        Array roleArray  = toTextArray(userRoleNames);
+
+        StringBuilder sql = new StringBuilder("""
+            SELECT * FROM alert_instances
+             WHERE environment_id = ?
+               AND (
+                   ? = ANY(target_user_ids)
+                   OR target_group_ids && ?
+                   OR target_role_names && ?
+               )
+            """);
+        List<Object> args = new ArrayList<>(List.of(environmentId, userId, groupArray, roleArray));
+
+        if (states != null && !states.isEmpty()) {
+            Array stateArray = toTextArray(states.stream().map(Enum::name).toList());
+            sql.append(" AND state::text = ANY(?)");
+            args.add(stateArray);
+        }
+        if (severities != null && !severities.isEmpty()) {
+            Array severityArray = toTextArray(severities.stream().map(Enum::name).toList());
+            sql.append(" AND severity::text = ANY(?)");
+            args.add(severityArray);
+        }
+        if (acked != null) {
+            sql.append(acked ? " AND acked_at IS NOT NULL" : " AND acked_at IS NULL");
+        }
+        if (read != null) {
+            sql.append(read ? " AND read_at IS NOT NULL" : " AND read_at IS NULL");
+        }
+        sql.append(" AND deleted_at IS NULL");
+        sql.append(" ORDER BY fired_at DESC LIMIT ?");
+        args.add(limit);
+
+        return jdbc.query(sql.toString(), rowMapper(), args.toArray());
+    }
+
+    @Override
+    public Map<AlertSeverity, Long> countUnreadBySeverity(UUID environmentId,
+                                                          String userId,
+                                                          List<String> groupIds,
+                                                          List<String> roleNames) {
+        Array groupArray = toUuidArrayFromStrings(groupIds);
+        Array roleArray  = toTextArray(roleNames);
+        String sql = """
+            SELECT severity::text AS severity, COUNT(*) AS cnt
+              FROM alert_instances
+             WHERE environment_id = ?
+               AND read_at IS NULL
+               AND deleted_at IS NULL
+               AND (
+                   ? = ANY(target_user_ids)
+                   OR target_group_ids && ?
+                   OR target_role_names && ?
+               )
+             GROUP BY severity
+            """;
+        EnumMap<AlertSeverity, Long> counts = new EnumMap<>(AlertSeverity.class);
+        for (AlertSeverity s : AlertSeverity.values()) counts.put(s, 0L);
+        jdbc.query(sql, (org.springframework.jdbc.core.RowCallbackHandler) rs -> counts.put(
+            AlertSeverity.valueOf(rs.getString("severity")), rs.getLong("cnt")
+        ), environmentId, userId, groupArray, roleArray);
+        return counts;
+    }
+
+    @Override
+    public void ack(UUID id, String userId, Instant when) {
+        jdbc.update("""
+            UPDATE alert_instances
+               SET acked_at = ?, acked_by = ?
+             WHERE id = ? AND acked_at IS NULL AND deleted_at IS NULL
+            """, Timestamp.from(when), userId, id);
+    }
+
+    @Override
+    public void markRead(UUID id, Instant when) {
+        jdbc.update("UPDATE alert_instances SET read_at = ? WHERE id = ? AND read_at IS NULL AND deleted_at IS NULL",
+                Timestamp.from(when), id);
+    }
+
+    @Override
+    public void bulkMarkRead(List<UUID> ids, Instant when) {
+        if (ids == null || ids.isEmpty()) return;
+        Array idArray = jdbc.execute((ConnectionCallback<Array>) c ->
+            c.createArrayOf("uuid", ids.toArray()));
+        jdbc.update("""
+            UPDATE alert_instances SET read_at = ?
+             WHERE id = ANY(?) AND read_at IS NULL AND deleted_at IS NULL
+            """, Timestamp.from(when), idArray);
+    }
+
+    @Override
+    public void softDelete(UUID id, Instant when) {
+        jdbc.update("UPDATE alert_instances SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
+                Timestamp.from(when), id);
+    }
+
+    @Override
+    public void bulkSoftDelete(List<UUID> ids, Instant when) {
+        if (ids == null || ids.isEmpty()) return;
+        Array idArray = jdbc.execute((ConnectionCallback<Array>) c ->
+            c.createArrayOf("uuid", ids.toArray()));
+        jdbc.update("""
+            UPDATE alert_instances SET deleted_at = ?
+             WHERE id = ANY(?) AND deleted_at IS NULL
+            """, Timestamp.from(when), idArray);
+    }
+
+    @Override
+    public void restore(UUID id) {
+        jdbc.update("UPDATE alert_instances SET deleted_at = NULL WHERE id = ?", id);
+    }
+
+    @Override
+    public void bulkAck(List<UUID> ids, String userId, Instant when) {
+        if (ids == null || ids.isEmpty()) return;
+        Array idArray = jdbc.execute((ConnectionCallback<Array>) c ->
+            c.createArrayOf("uuid", ids.toArray()));
+        jdbc.update("""
+            UPDATE alert_instances SET acked_at = ?, acked_by = ?
+             WHERE id = ANY(?) AND acked_at IS NULL AND deleted_at IS NULL
+            """, Timestamp.from(when), userId, idArray);
+    }
+
+    @Override
+    public void resolve(UUID id, Instant when) {
+        jdbc.update("""
+            UPDATE alert_instances
+               SET state = 'RESOLVED'::alert_state_enum,
+                   resolved_at = ?
+             WHERE id = ?
+            """, Timestamp.from(when), id);
+    }
+
+    @Override
+    public void markSilenced(UUID id, boolean silenced) {
+        jdbc.update("UPDATE alert_instances SET silenced = ? WHERE id = ?", silenced, id);
+    }
+
+    @Override
+    public List<AlertInstance> listFiringDueForReNotify(Instant now) {
+        return jdbc.query("""
+            SELECT ai.* FROM alert_instances ai
+              JOIN alert_rules ar ON ar.id = ai.rule_id
+             WHERE ai.state = 'FIRING'::alert_state_enum
+               AND ai.silenced = false
+               AND ar.enabled = true
+               AND ar.re_notify_minutes > 0
+               AND ai.last_notified_at IS NOT NULL
+               AND ai.last_notified_at + make_interval(mins => ar.re_notify_minutes) <= ?
+            """, rowMapper(), Timestamp.from(now));
+    }
+
+    @Override
+    public List<UUID> filterInEnvLive(List<UUID> ids, UUID environmentId) {
+        if (ids == null || ids.isEmpty()) return List.of();
+        Array idArray = jdbc.execute((ConnectionCallback<Array>) c ->
+            c.createArrayOf("uuid", ids.toArray()));
+        return jdbc.query("""
+            SELECT id FROM alert_instances
+             WHERE id = ANY(?) AND environment_id = ? AND deleted_at IS NULL
+            """, (rs, i) -> (UUID) rs.getObject("id"), idArray, environmentId);
+    }
+
+    @Override
+    public void deleteResolvedBefore(Instant cutoff) {
+        jdbc.update("""
+            DELETE FROM alert_instances
+             WHERE state = 'RESOLVED'::alert_state_enum
+               AND resolved_at < ?
+            """, Timestamp.from(cutoff));
+    }
+
+    // -------------------------------------------------------------------------
+
+    private RowMapper<AlertInstance> rowMapper() {
+        return (rs, i) -> {
+            try {
+                Map<String, Object> snapshot = om.readValue(
+                    rs.getString("rule_snapshot"), new TypeReference<>() {});
+                Map<String, Object> context = om.readValue(
+                    rs.getString("context"), new TypeReference<>() {});
+
+                Timestamp ackedAt = rs.getTimestamp("acked_at");
+                Timestamp resolvedAt = rs.getTimestamp("resolved_at");
+                Timestamp lastNotifiedAt = rs.getTimestamp("last_notified_at");
+                Timestamp readAt = rs.getTimestamp("read_at");
+                Timestamp deletedAt = rs.getTimestamp("deleted_at");
+
+                Object cvObj = rs.getObject("current_value");
+                Double currentValue = cvObj == null ? null : ((Number) cvObj).doubleValue();
+                Object thObj = rs.getObject("threshold");
+                Double threshold = thObj == null ? null : ((Number) thObj).doubleValue();
+
+                UUID ruleId = (UUID) rs.getObject("rule_id"); // nullable; casting null is safe
+
+                return new AlertInstance(
+                    (UUID) rs.getObject("id"),
+                    ruleId,
+                    snapshot,
+                    (UUID) rs.getObject("environment_id"),
+                    AlertState.valueOf(rs.getString("state")),
+                    AlertSeverity.valueOf(rs.getString("severity")),
+                    rs.getTimestamp("fired_at").toInstant(),
+                    ackedAt == null ? null : ackedAt.toInstant(),
+                    rs.getString("acked_by"),
+                    resolvedAt == null ? null : resolvedAt.toInstant(),
+                    lastNotifiedAt == null ? null : lastNotifiedAt.toInstant(),
+                    readAt == null ? null : readAt.toInstant(),
+                    deletedAt == null ? null : deletedAt.toInstant(),
+                    rs.getBoolean("silenced"),
+                    currentValue,
+                    threshold,
+                    context,
+                    rs.getString("title"),
+                    rs.getString("message"),
+                    readTextArray(rs.getArray("target_user_ids")),
+                    readUuidArray(rs.getArray("target_group_ids")),
+                    readTextArray(rs.getArray("target_role_names")));
+            } catch (Exception e) {
+                throw new IllegalStateException("Failed to map alert_instances row", e);
+            }
+        };
+    }
+
+    private String writeJson(Object o) {
+        try { return om.writeValueAsString(o); }
+        catch (Exception e) { throw new IllegalStateException("Failed to serialize JSON", e); }
+    }
+
+    private Timestamp ts(Instant instant) {
+        return instant == null ? null : Timestamp.from(instant);
+    }
+
+    private Array toTextArray(List<String> items) {
+        return jdbc.execute((ConnectionCallback<Array>) conn ->
+            conn.createArrayOf("text", items == null ? new Object[0] : items.toArray()));
+    }
+
+    private Array toUuidArray(List<UUID> ids) {
+        return jdbc.execute((ConnectionCallback<Array>) conn ->
+            conn.createArrayOf("uuid", ids == null ? new Object[0] : ids.toArray()));
+    }
+
+    private Array toUuidArrayFromStrings(List<String> ids) {
+        return jdbc.execute((ConnectionCallback<Array>) conn ->
+            conn.createArrayOf("uuid", ids == null ? new Object[0]
+                : ids.stream().map(UUID::fromString).toArray()));
+    }
+
+    private List<String> readTextArray(Array arr) throws SQLException {
+        if (arr == null) return List.of();
+        Object[] raw = (Object[]) arr.getArray();
+        List<String> out = new ArrayList<>(raw.length);
+        for (Object o : raw) out.add((String) o);
+        return out;
+    }
+
+    private List<UUID> readUuidArray(Array arr) throws SQLException {
+        if (arr == null) return List.of();
+        Object[] raw = (Object[]) arr.getArray();
+        List<UUID> out = new ArrayList<>(raw.length);
+        for (Object o : raw) out.add((UUID) o);
+        return out;
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertNotificationRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertNotificationRepository.java
@@ -0,0 +1,200 @@
+package com.cameleer.server.app.alerting.storage;
+
+import com.cameleer.server.core.alerting.AlertNotification;
+import com.cameleer.server.core.alerting.AlertNotificationRepository;
+import com.cameleer.server.core.alerting.NotificationStatus;
+import com.fasterxml.jackson.core.type.TypeReference;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.jdbc.core.RowMapper;
+
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+
+public class PostgresAlertNotificationRepository implements AlertNotificationRepository {
+
+    private final JdbcTemplate jdbc;
+    private final ObjectMapper om;
+
+    public PostgresAlertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        this.jdbc = jdbc;
+        this.om = om;
+    }
+
+    @Override
+    public AlertNotification save(AlertNotification n) {
+        jdbc.update("""
+            INSERT INTO alert_notifications (
+                id, alert_instance_id, webhook_id, outbound_connection_id,
+                status, attempts, next_attempt_at, claimed_by, claimed_until,
+                last_response_status, last_response_snippet, payload, delivered_at, created_at)
+            VALUES (?, ?, ?, ?,
+                    ?::notification_status_enum, ?, ?, ?, ?,
+                    ?, ?, ?::jsonb, ?, ?)
+            ON CONFLICT (id) DO UPDATE SET
+                status                = EXCLUDED.status,
+                attempts              = EXCLUDED.attempts,
+                next_attempt_at       = EXCLUDED.next_attempt_at,
+                claimed_by            = EXCLUDED.claimed_by,
+                claimed_until         = EXCLUDED.claimed_until,
+                last_response_status  = EXCLUDED.last_response_status,
+                last_response_snippet = EXCLUDED.last_response_snippet,
+                payload               = EXCLUDED.payload,
+                delivered_at          = EXCLUDED.delivered_at
+            """,
+            n.id(), n.alertInstanceId(), n.webhookId(), n.outboundConnectionId(),
+            n.status().name(), n.attempts(), Timestamp.from(n.nextAttemptAt()),
+            n.claimedBy(), n.claimedUntil() == null ? null : Timestamp.from(n.claimedUntil()),
+            n.lastResponseStatus(), n.lastResponseSnippet(),
+            writeJson(n.payload()),
+            n.deliveredAt() == null ? null : Timestamp.from(n.deliveredAt()),
+            Timestamp.from(n.createdAt()));
+        return n;
+    }
+
+    @Override
+    public Optional<AlertNotification> findById(UUID id) {
+        var list = jdbc.query("SELECT * FROM alert_notifications WHERE id = ?", rowMapper(), id);
+        return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
+    }
+
+    @Override
+    public List<AlertNotification> listForInstance(UUID alertInstanceId) {
+        return jdbc.query("""
+            SELECT * FROM alert_notifications
+             WHERE alert_instance_id = ?
+             ORDER BY created_at DESC
+            """, rowMapper(), alertInstanceId);
+    }
+
+    @Override
+    public List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds) {
+        String sql = """
+            UPDATE alert_notifications
+               SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
+             WHERE id IN (
+                 SELECT id FROM alert_notifications
+                  WHERE status = 'PENDING'::notification_status_enum
+                    AND next_attempt_at <= now()
+                    AND (claimed_until IS NULL OR claimed_until < now())
+                  ORDER BY next_attempt_at
+                  LIMIT ?
+                  FOR UPDATE SKIP LOCKED
+             )
+             RETURNING *
+            """;
+        return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
+    }
+
+    @Override
+    public void markDelivered(UUID id, int status, String snippet, Instant when) {
+        jdbc.update("""
+            UPDATE alert_notifications
+               SET status = 'DELIVERED'::notification_status_enum,
+                   last_response_status  = ?,
+                   last_response_snippet = ?,
+                   delivered_at          = ?,
+                   claimed_by            = NULL,
+                   claimed_until         = NULL
+             WHERE id = ?
+            """, status, snippet, Timestamp.from(when), id);
+    }
+
+    @Override
+    public void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet) {
+        jdbc.update("""
+            UPDATE alert_notifications
+               SET attempts              = attempts + 1,
+                   next_attempt_at       = ?,
+                   last_response_status  = ?,
+                   last_response_snippet = ?,
+                   claimed_by            = NULL,
+                   claimed_until         = NULL
+             WHERE id = ?
+            """, Timestamp.from(nextAttemptAt), status, snippet, id);
+    }
+
+    @Override
+    public void resetForRetry(UUID id, Instant nextAttemptAt) {
+        jdbc.update("""
+            UPDATE alert_notifications
+               SET attempts              = 0,
+                   status                = 'PENDING'::notification_status_enum,
+                   next_attempt_at       = ?,
+                   claimed_by            = NULL,
+                   claimed_until         = NULL,
+                   last_response_status  = NULL,
+                   last_response_snippet = NULL
+             WHERE id = ?
+            """, Timestamp.from(nextAttemptAt), id);
+    }
+
+    @Override
+    public void markFailed(UUID id, int status, String snippet) {
+        jdbc.update("""
+            UPDATE alert_notifications
+               SET status                = 'FAILED'::notification_status_enum,
+                   attempts              = attempts + 1,
+                   last_response_status  = ?,
+                   last_response_snippet = ?,
+                   claimed_by            = NULL,
+                   claimed_until         = NULL
+             WHERE id = ?
+            """, status, snippet, id);
+    }
+
+    @Override
+    public void deleteSettledBefore(Instant cutoff) {
+        jdbc.update("""
+            DELETE FROM alert_notifications
+             WHERE status IN ('DELIVERED'::notification_status_enum, 'FAILED'::notification_status_enum)
+               AND created_at < ?
+            """, Timestamp.from(cutoff));
+    }
+
+    // -------------------------------------------------------------------------
+
+    private RowMapper<AlertNotification> rowMapper() {
+        return (rs, i) -> {
+            try {
+                Map<String, Object> payload = om.readValue(
+                    rs.getString("payload"), new TypeReference<>() {});
+                Timestamp claimedUntil = rs.getTimestamp("claimed_until");
+                Timestamp deliveredAt = rs.getTimestamp("delivered_at");
+                Object lastStatus = rs.getObject("last_response_status");
+
+                Object webhookIdObj = rs.getObject("webhook_id");
+                UUID webhookId = webhookIdObj == null ? null : (UUID) webhookIdObj;
+                Object connIdObj = rs.getObject("outbound_connection_id");
+                UUID connId = connIdObj == null ? null : (UUID) connIdObj;
+
+                return new AlertNotification(
+                    (UUID) rs.getObject("id"),
+                    (UUID) rs.getObject("alert_instance_id"),
+                    webhookId,
+                    connId,
+                    NotificationStatus.valueOf(rs.getString("status")),
+                    rs.getInt("attempts"),
+                    rs.getTimestamp("next_attempt_at").toInstant(),
+                    rs.getString("claimed_by"),
+                    claimedUntil == null ? null : claimedUntil.toInstant(),
+                    lastStatus == null ? null : ((Number) lastStatus).intValue(),
+                    rs.getString("last_response_snippet"),
+                    payload,
+                    deliveredAt == null ? null : deliveredAt.toInstant(),
+                    rs.getTimestamp("created_at").toInstant());
+            } catch (Exception e) {
+                throw new IllegalStateException("Failed to map alert_notifications row", e);
+            }
+        };
+    }
+
+    private String writeJson(Object o) {
+        try { return om.writeValueAsString(o); }
+        catch (Exception e) { throw new IllegalStateException("Failed to serialize JSON", e); }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java
@@ -0,0 +1,223 @@
+package com.cameleer.server.app.alerting.storage;
+
+import com.cameleer.server.core.alerting.*;
+import com.fasterxml.jackson.core.type.TypeReference;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.jdbc.core.RowMapper;
+
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.util.*;
+
+public class PostgresAlertRuleRepository implements AlertRuleRepository {
+
+    private final JdbcTemplate jdbc;
+    private final ObjectMapper om;
+
+    public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        this.jdbc = jdbc;
+        this.om = om;
+    }
+
+    @Override
+    public AlertRule save(AlertRule r) {
+        String sql = """
+            INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
+                condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
+                re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
+                webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
+                created_at, created_by, updated_at, updated_by)
+            VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
+                ?, ?, ?, ?::jsonb, ?, ?, ?, ?)
+            ON CONFLICT (id) DO UPDATE SET
+                name = EXCLUDED.name, description = EXCLUDED.description,
+                severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
+                condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
+                evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
+                for_duration_seconds = EXCLUDED.for_duration_seconds,
+                re_notify_minutes = EXCLUDED.re_notify_minutes,
+                notification_title_tmpl = EXCLUDED.notification_title_tmpl,
+                notification_message_tmpl = EXCLUDED.notification_message_tmpl,
+                webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
+                updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
+            """;
+        jdbc.update(sql,
+            r.id(), r.environmentId(), r.name(), r.description(),
+            r.severity().name(), r.enabled(), r.conditionKind().name(),
+            writeJson(r.condition()),
+            r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
+            r.notificationTitleTmpl(), r.notificationMessageTmpl(),
+            writeJson(r.webhooks()),
+            Timestamp.from(r.nextEvaluationAt()),
+            r.claimedBy(),
+            r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
+            writeJson(r.evalState()),
+            Timestamp.from(r.createdAt()), r.createdBy(),
+            Timestamp.from(r.updatedAt()), r.updatedBy());
+        saveTargets(r.id(), r.targets());
+        return r;
+    }
+
+    private void saveTargets(UUID ruleId, List<AlertRuleTarget> targets) {
+        jdbc.update("DELETE FROM alert_rule_targets WHERE rule_id = ?", ruleId);
+        if (targets == null || targets.isEmpty()) return;
+        jdbc.batchUpdate(
+            "INSERT INTO alert_rule_targets (id, rule_id, target_kind, target_id) VALUES (?, ?, ?::target_kind_enum, ?)",
+            targets, targets.size(), (ps, t) -> {
+                ps.setObject(1, t.id() != null ? t.id() : UUID.randomUUID());
+                ps.setObject(2, ruleId);
+                ps.setString(3, t.kind().name());
+                ps.setString(4, t.targetId());
+            });
+    }
+
+    @Override
+    public Optional<AlertRule> findById(UUID id) {
+        var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
+        if (list.isEmpty()) return Optional.empty();
+        return Optional.of(withTargets(list).get(0));
+    }
+
+    @Override
+    public List<AlertRule> listByEnvironment(UUID environmentId) {
+        var list = jdbc.query(
+            "SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
+            rowMapper(), environmentId);
+        return withTargets(list);
+    }
+
+    @Override
+    public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
+        String sql = """
+            SELECT * FROM alert_rules
+             WHERE webhooks @> ?::jsonb
+             ORDER BY created_at DESC
+            """;
+        String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
+        return jdbc.query(sql, rowMapper(), predicate);
+    }
+
+    @Override
+    public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
+        String sql = """
+            SELECT id FROM alert_rules
+             WHERE webhooks @> ?::jsonb
+            """;
+        String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
+        return jdbc.queryForList(sql, UUID.class, predicate);
+    }
+
+    @Override
+    public void delete(UUID id) {
+        jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
+    }
+
+    @Override
+    public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
+        String sql = """
+            UPDATE alert_rules
+               SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
+             WHERE id IN (
+                 SELECT id FROM alert_rules
+                  WHERE enabled = true
+                    AND next_evaluation_at <= now()
+                    AND (claimed_until IS NULL OR claimed_until < now())
+                  ORDER BY next_evaluation_at
+                  LIMIT ?
+                  FOR UPDATE SKIP LOCKED
+             )
+             RETURNING *
+            """;
+        List<AlertRule> rules = jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
+        return withTargets(rules);
+    }
+
+    /** Batch-loads targets for the given rules and returns new rule instances with targets populated. */
+    private List<AlertRule> withTargets(List<AlertRule> rules) {
+        if (rules.isEmpty()) return rules;
+        // Build IN clause
+        String inClause = rules.stream()
+            .map(r -> "'" + r.id() + "'")
+            .collect(java.util.stream.Collectors.joining(","));
+        String sql = "SELECT * FROM alert_rule_targets WHERE rule_id IN (" + inClause + ")";
+        Map<UUID, List<AlertRuleTarget>> byRuleId = new HashMap<>();
+        jdbc.query(sql, rs -> {
+            UUID ruleId = (UUID) rs.getObject("rule_id");
+            AlertRuleTarget t = new AlertRuleTarget(
+                (UUID) rs.getObject("id"),
+                ruleId,
+                TargetKind.valueOf(rs.getString("target_kind")),
+                rs.getString("target_id"));
+            byRuleId.computeIfAbsent(ruleId, k -> new ArrayList<>()).add(t);
+        });
+        return rules.stream()
+            .map(r -> new AlertRule(
+                r.id(), r.environmentId(), r.name(), r.description(),
+                r.severity(), r.enabled(), r.conditionKind(), r.condition(),
+                r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
+                r.notificationTitleTmpl(), r.notificationMessageTmpl(),
+                r.webhooks(), byRuleId.getOrDefault(r.id(), List.of()),
+                r.nextEvaluationAt(), r.claimedBy(), r.claimedUntil(), r.evalState(),
+                r.createdAt(), r.createdBy(), r.updatedAt(), r.updatedBy()))
+            .toList();
+    }
+
+    @Override
+    public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
+        jdbc.update("""
+            UPDATE alert_rules
+               SET claimed_by = NULL, claimed_until = NULL,
+                   next_evaluation_at = ?, eval_state = ?::jsonb
+             WHERE id = ?
+            """,
+            Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
+    }
+
+    private RowMapper<AlertRule> rowMapper() {
+        return (rs, i) -> {
+            try {
+                ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
+                AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class);
+                List<WebhookBinding> webhooks = om.readValue(
+                    rs.getString("webhooks"), new TypeReference<>() {});
+                Map<String, Object> evalState = om.readValue(
+                    rs.getString("eval_state"), new TypeReference<>() {});
+
+                Timestamp cu = rs.getTimestamp("claimed_until");
+                return new AlertRule(
+                    (UUID) rs.getObject("id"),
+                    (UUID) rs.getObject("environment_id"),
+                    rs.getString("name"),
+                    rs.getString("description"),
+                    AlertSeverity.valueOf(rs.getString("severity")),
+                    rs.getBoolean("enabled"),
+                    kind, cond,
+                    rs.getInt("evaluation_interval_seconds"),
+                    rs.getInt("for_duration_seconds"),
+                    rs.getInt("re_notify_minutes"),
+                    rs.getString("notification_title_tmpl"),
+                    rs.getString("notification_message_tmpl"),
+                    webhooks, List.of(),
+                    rs.getTimestamp("next_evaluation_at").toInstant(),
+                    rs.getString("claimed_by"),
+                    cu == null ? null : cu.toInstant(),
+                    evalState,
+                    rs.getTimestamp("created_at").toInstant(),
+                    rs.getString("created_by"),
+                    rs.getTimestamp("updated_at").toInstant(),
+                    rs.getString("updated_by"));
+            } catch (Exception e) {
+                throw new IllegalStateException("Failed to map alert_rules row", e);
+            }
+        };
+    }
+
+    private String writeJson(Object o) {
+        try {
+            return om.writeValueAsString(o);
+        } catch (Exception e) {
+            throw new IllegalStateException("Failed to serialize to JSON", e);
+        }
+    }
+}
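Review note on `withTargets`: the IN clause is assembled by concatenating quoted UUID strings. Because the values are typed `UUID` this is not injectable, but a placeholder-based variant keeps every value bound. The following is a minimal sketch, not part of this PR; `InClause`, `placeholders`, and `targetsSql` are hypothetical names introduced only for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical helper (not in the PR): builds a bound-parameter IN clause.
final class InClause {
    private InClause() {}

    /** Returns one "?" per element, comma-separated, e.g. "?,?,?" for count = 3. */
    static String placeholders(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return String.join(",", Collections.nCopies(count, "?"));
    }

    /** Builds the target lookup statement with one placeholder per rule id. */
    static String targetsSql(List<UUID> ruleIds) {
        return "SELECT * FROM alert_rule_targets WHERE rule_id IN ("
                + placeholders(ruleIds.size()) + ")";
    }
}
```

With this shape, the rule ids would be passed as varargs, for example `jdbc.query(sql, rowCallbackHandler, ruleIds.toArray())`, instead of being embedded in the SQL text.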
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertSilenceRepository.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertSilenceRepository.java
@@ -0,0 +1,101 @@
+package com.cameleer.server.app.alerting.storage;
+
+import com.cameleer.server.core.alerting.AlertSilence;
+import com.cameleer.server.core.alerting.AlertSilenceRepository;
+import com.cameleer.server.core.alerting.AlertSeverity;
+import com.cameleer.server.core.alerting.SilenceMatcher;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.jdbc.core.RowMapper;
+
+import java.sql.Timestamp;
+import java.time.Instant;
+import java.util.List;
+import java.util.Optional;
+import java.util.UUID;
+
+public class PostgresAlertSilenceRepository implements AlertSilenceRepository {
+
+    private final JdbcTemplate jdbc;
+    private final ObjectMapper om;
+
+    public PostgresAlertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
+        this.jdbc = jdbc;
+        this.om = om;
+    }
+
+    @Override
+    public AlertSilence save(AlertSilence s) {
+        jdbc.update("""
+            INSERT INTO alert_silences (id, environment_id, matcher, reason, starts_at, ends_at, created_by, created_at)
+            VALUES (?, ?, ?::jsonb, ?, ?, ?, ?, ?)
+            ON CONFLICT (id) DO UPDATE SET
+                matcher    = EXCLUDED.matcher,
+                reason     = EXCLUDED.reason,
+                starts_at  = EXCLUDED.starts_at,
+                ends_at    = EXCLUDED.ends_at
+            """,
+            s.id(), s.environmentId(), writeJson(s.matcher()),
+            s.reason(),
+            Timestamp.from(s.startsAt()), Timestamp.from(s.endsAt()),
+            s.createdBy(), Timestamp.from(s.createdAt()));
+        return s;
+    }
+
+    @Override
+    public Optional<AlertSilence> findById(UUID id) {
+        var list = jdbc.query("SELECT * FROM alert_silences WHERE id = ?", rowMapper(), id);
+        return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
+    }
+
+    @Override
+    public List<AlertSilence> listActive(UUID environmentId, Instant when) {
+        Timestamp t = Timestamp.from(when);
+        return jdbc.query("""
+            SELECT * FROM alert_silences
+             WHERE environment_id = ?
+               AND starts_at <= ? AND ends_at >= ?
+             ORDER BY starts_at
+            """, rowMapper(), environmentId, t, t);
+    }
+
+    @Override
+    public List<AlertSilence> listByEnvironment(UUID environmentId) {
+        return jdbc.query("""
+            SELECT * FROM alert_silences
+             WHERE environment_id = ?
+             ORDER BY starts_at DESC
+            """, rowMapper(), environmentId);
+    }
+
+    @Override
+    public void delete(UUID id) {
+        jdbc.update("DELETE FROM alert_silences WHERE id = ?", id);
+    }
+
+    // -------------------------------------------------------------------------
+
+    private RowMapper<AlertSilence> rowMapper() {
+        return (rs, i) -> {
+            try {
+                SilenceMatcher matcher = om.readValue(rs.getString("matcher"), SilenceMatcher.class);
+                return new AlertSilence(
+                    (UUID) rs.getObject("id"),
+                    (UUID) rs.getObject("environment_id"),
+                    matcher,
+                    rs.getString("reason"),
+                    rs.getTimestamp("starts_at").toInstant(),
+                    rs.getTimestamp("ends_at").toInstant(),
+                    rs.getString("created_by"),
+                    rs.getTimestamp("created_at").toInstant());
+            } catch (Exception e) {
+                throw new IllegalStateException("Failed to map alert_silences row", e);
+            }
+        };
+    }
+
+    private String writeJson(Object o) {
+        try { return om.writeValueAsString(o); }
+        catch (Exception e) { throw new IllegalStateException("Failed to serialize JSON", e); }
+    }
+}
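Review note on `listActive`: the query treats a silence as active when `starts_at <= when` and `ends_at >= when`, so both bounds are inclusive. A minimal in-memory sketch of the same predicate, which may be useful for a unit test; `SilenceWindow` and `isActive` are hypothetical names, not part of this PR.

```java
import java.time.Instant;

// Hypothetical mirror of the SQL predicate in listActive (inclusive on both ends).
final class SilenceWindow {
    private SilenceWindow() {}

    static boolean isActive(Instant startsAt, Instant endsAt, Instant when) {
        // startsAt <= when  AND  endsAt >= when
        return !startsAt.isAfter(when) && !endsAt.isBefore(when);
    }
}
```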
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseSchemaInitializer.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseSchemaInitializer.java
@@ -26,9 +26,14 @@ public class ClickHouseSchemaInitializer {

    @EventListener(ApplicationReadyEvent.class)
    public void initializeSchema() {
+        runScript("clickhouse/init.sql");
+        runScript("clickhouse/alerting_projections.sql");
+    }
+
+    private void runScript(String classpathResource) {
        try {
            PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
-            Resource script = resolver.getResource("classpath:clickhouse/init.sql");
+            Resource script = resolver.getResource("classpath:" + classpathResource);

            String sql = script.getContentAsString(StandardCharsets.UTF_8);
            log.info("Executing ClickHouse schema: {}", script.getFilename());
@@ -41,13 +46,28 @@ public class ClickHouseSchemaInitializer {
                        .filter(line -> !line.isEmpty())
                        .reduce("", (a, b) -> a + b);
                if (!withoutComments.isEmpty()) {
-                    clickHouseJdbc.execute(trimmed);
+                    String upper = withoutComments.toUpperCase(java.util.Locale.ROOT);
+                    boolean isBestEffort = upper.contains("MATERIALIZE PROJECTION")
+                            || upper.contains("ADD PROJECTION");
+                    try {
+                        clickHouseJdbc.execute(trimmed);
+                    } catch (Exception e) {
+                        if (isBestEffort) {
+                            // ADD PROJECTION on ReplacingMergeTree requires a session setting not available
+                            // via JDBC pool; MATERIALIZE can fail on empty tables — both are non-fatal.
+                            log.warn("Projection DDL step skipped (non-fatal): {} — {}",
+                                    trimmed.substring(0, Math.min(trimmed.length(), 120)), e.getMessage());
+                        } else {
+                            throw e;
+                        }
+                    }
                }
            }

-            log.info("ClickHouse schema initialization complete");
+            log.info("ClickHouse schema script complete: {}", script.getFilename());
        } catch (Exception e) {
-            log.error("ClickHouse schema initialization failed — server will continue but ClickHouse features may not work", e);
+            log.error("ClickHouse schema script failed [{}] — server will continue but ClickHouse features may not work",
+                    classpathResource, e);
        }
    }
 }
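Review note on the best-effort check: the hunk above classifies a statement as non-fatal when it contains `ADD PROJECTION` or `MATERIALIZE PROJECTION`. The classification can be sketched in isolation as below; `ProjectionDdl` and `isBestEffort` are hypothetical names, and the use of `Locale.ROOT` is an assumption added here so that lower-case input is upper-cased the same way under any default locale.

```java
import java.util.Locale;

// Hypothetical extraction of the best-effort check from runScript (not in the PR).
final class ProjectionDdl {
    private ProjectionDdl() {}

    /** True for projection DDL whose failure the initializer logs and skips. */
    static boolean isBestEffort(String statement) {
        String upper = statement.toUpperCase(Locale.ROOT);
        return upper.contains("MATERIALIZE PROJECTION")
                || upper.contains("ADD PROJECTION");
    }
}
```

Any other statement that fails is rethrown and ends up in the outer `catch`, which logs the script as failed.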
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/RuntimeBeanConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/RuntimeBeanConfig.java
@@ -9,6 +9,7 @@ import com.cameleer.server.core.runtime.AppService;
 import com.cameleer.server.core.runtime.AppVersionRepository;
 import com.cameleer.server.core.runtime.DeploymentRepository;
 import com.cameleer.server.core.runtime.DeploymentService;
+import com.cameleer.server.core.runtime.DirtyStateCalculator;
 import com.cameleer.server.core.runtime.EnvironmentRepository;
 import com.cameleer.server.core.runtime.EnvironmentService;
 import com.fasterxml.jackson.databind.ObjectMapper;
@@ -64,6 +65,11 @@ public class RuntimeBeanConfig {
        return new DeploymentService(deployRepo, appService, envService);
    }

+    @Bean
+    public DirtyStateCalculator dirtyStateCalculator(ObjectMapper objectMapper) {
+        return new DirtyStateCalculator(objectMapper);
+    }
+
    @Bean(name = "deploymentTaskExecutor")
    public Executor deploymentTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/StorageBeanConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/StorageBeanConfig.java
@@ -5,8 +5,12 @@ import com.cameleer.server.app.search.ClickHouseLogStore;
 import com.cameleer.server.app.storage.ClickHouseAgentEventRepository;
 import com.cameleer.server.app.storage.ClickHouseUsageTracker;
 import com.cameleer.server.app.storage.ClickHouseDiagramStore;
+import com.cameleer.server.app.storage.ClickHouseRouteCatalogStore;
+import com.cameleer.server.core.storage.RouteCatalogStore;
 import com.cameleer.server.app.storage.ClickHouseMetricsQueryStore;
 import com.cameleer.server.app.storage.ClickHouseMetricsStore;
+import com.cameleer.server.app.storage.ClickHouseServerMetricsQueryStore;
+import com.cameleer.server.app.storage.ClickHouseServerMetricsStore;
 import com.cameleer.server.app.storage.ClickHouseStatsStore;
 import com.cameleer.server.core.admin.AuditRepository;
 import com.cameleer.server.core.admin.AuditService;
@@ -14,7 +18,6 @@ import com.cameleer.server.core.agent.AgentEventRepository;
 import com.cameleer.server.core.agent.AgentInfo;
 import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.detail.DetailService;
-import com.cameleer.server.core.indexing.SearchIndexer;
 import com.cameleer.server.app.ingestion.ExecutionFlushScheduler;
 import com.cameleer.server.app.search.ClickHouseSearchIndex;
 import com.cameleer.server.app.storage.ClickHouseExecutionStore;
@@ -41,26 +44,15 @@ public class StorageBeanConfig {
        return new DetailService(executionStore);
    }

-    @Bean(destroyMethod = "shutdown")
-    public SearchIndexer searchIndexer(ExecutionStore executionStore, SearchIndex searchIndex,
-                                        @Value("${cameleer.server.indexer.debouncems:2000}") long debounceMs,
-                                        @Value("${cameleer.server.indexer.queuesize:10000}") int queueSize) {
-        return new SearchIndexer(executionStore, searchIndex, debounceMs, queueSize);
-    }
-
    @Bean
    public AuditService auditService(AuditRepository auditRepository) {
        return new AuditService(auditRepository);
    }

    @Bean
-    public IngestionService ingestionService(ExecutionStore executionStore,
-                                              DiagramStore diagramStore,
-                                              WriteBuffer<MetricsSnapshot> metricsBuffer,
-                                              SearchIndexer searchIndexer,
-                                              @Value("${cameleer.server.ingestion.bodysizelimit:16384}") int bodySizeLimit) {
-        return new IngestionService(executionStore, diagramStore, metricsBuffer,
-                searchIndexer::onExecutionUpdated, bodySizeLimit);
+    public IngestionService ingestionService(DiagramStore diagramStore,
+                                              WriteBuffer<MetricsSnapshot> metricsBuffer) {
+        return new IngestionService(diagramStore, metricsBuffer);
    }

    @Bean
@@ -77,6 +69,19 @@ public class StorageBeanConfig {
        return new ClickHouseMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
    }

+    @Bean
+    public ServerMetricsStore clickHouseServerMetricsStore(
+            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        return new ClickHouseServerMetricsStore(clickHouseJdbc);
+    }
+
+    @Bean
+    public ServerMetricsQueryStore clickHouseServerMetricsQueryStore(
+            TenantProperties tenantProperties,
+            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        return new ClickHouseServerMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
+    }
+
    // ── Execution Store ──────────────────────────────────────────────────

    @Bean
@@ -121,9 +126,14 @@ public class StorageBeanConfig {
    }

    @Bean
-    public SearchIndex clickHouseSearchIndex(
+    public ClickHouseSearchIndex clickHouseSearchIndex(
            TenantProperties tenantProperties,
            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        // Return type is the concrete class so Spring exposes the bean under both
+        // SearchIndex (for callers of the interface) and ClickHouseSearchIndex (for ExchangeMatchEvaluator,
+        // which calls countExecutionsForAlerting, a method that exists only on the concrete type).
+        // Declaring the return type as the interface hides the concrete type from autowire
+        // matching and crash-loops the app on startup.
        return new ClickHouseSearchIndex(tenantProperties.getId(), clickHouseJdbc);
    }

@@ -145,6 +155,15 @@ public class StorageBeanConfig {
        return new ClickHouseDiagramStore(tenantProperties.getId(), clickHouseJdbc);
    }

+    // ── ClickHouse Route Catalog Store ───────────────────────────────
+
+    @Bean
+    public RouteCatalogStore clickHouseRouteCatalogStore(
+            TenantProperties tenantProperties,
+            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
+        return new ClickHouseRouteCatalogStore(tenantProperties.getId(), clickHouseJdbc);
+    }
+
    // ── ClickHouse Agent Event Repository ─────────────────────────────

    @Bean
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/config/WebConfig.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/config/WebConfig.java
@@ -3,10 +3,14 @@ package com.cameleer.server.app.config;
 import com.cameleer.server.app.analytics.UsageTrackingInterceptor;
 import com.cameleer.server.app.interceptor.AuditInterceptor;
 import com.cameleer.server.app.interceptor.ProtocolVersionInterceptor;
+import com.cameleer.server.app.web.EnvironmentPathResolver;
 import org.springframework.context.annotation.Configuration;
+import org.springframework.web.method.support.HandlerMethodArgumentResolver;
 import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
 import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

+import java.util.List;
+
 /**
 * Web MVC configuration.
 */
@@ -16,13 +20,21 @@ public class WebConfig implements WebMvcConfigurer {
    private final ProtocolVersionInterceptor protocolVersionInterceptor;
    private final AuditInterceptor auditInterceptor;
    private final UsageTrackingInterceptor usageTrackingInterceptor;
+    private final EnvironmentPathResolver environmentPathResolver;

    public WebConfig(ProtocolVersionInterceptor protocolVersionInterceptor,
                     AuditInterceptor auditInterceptor,
-                     @org.springframework.lang.Nullable UsageTrackingInterceptor usageTrackingInterceptor) {
+                     @org.springframework.lang.Nullable UsageTrackingInterceptor usageTrackingInterceptor,
+                     EnvironmentPathResolver environmentPathResolver) {
        this.protocolVersionInterceptor = protocolVersionInterceptor;
        this.auditInterceptor = auditInterceptor;
        this.usageTrackingInterceptor = usageTrackingInterceptor;
+        this.environmentPathResolver = environmentPathResolver;
+    }
+
+    @Override
+    public void addArgumentResolvers(List<HandlerMethodArgumentResolver> resolvers) {
+        resolvers.add(environmentPathResolver);
    }

    @Override
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentCommandController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentCommandController.java
@@ -223,7 +223,8 @@ public class AgentCommandController {
        if (body != null && body.status() != null) {
            AgentInfo agent = registryService.findById(id);
            String application = agent != null ? agent.applicationId() : "unknown";
-            agentEventService.recordEvent(id, application, "COMMAND_" + body.status(),
+            String environment = agent != null ? agent.environmentId() : null;
+            agentEventService.recordEvent(id, application, environment, "COMMAND_" + body.status(),
                    "Command " + commandId + ": " + body.message());
            log.debug("Command {} ack from agent {}: {} - {}", commandId, id, body.status(), body.message());
        }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentConfigController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentConfigController.java
@@ -0,0 +1,115 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.dto.AppConfigResponse;
+import com.cameleer.server.app.security.JwtAuthenticationFilter;
+import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
+import com.cameleer.server.core.admin.SensitiveKeysConfig;
+import com.cameleer.server.core.admin.SensitiveKeysMerger;
+import com.cameleer.server.core.admin.SensitiveKeysRepository;
+import com.cameleer.server.core.agent.AgentInfo;
+import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.security.JwtService.JwtValidationResult;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.responses.ApiResponse;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.servlet.http.HttpServletRequest;
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.core.Authentication;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RestController;
+
+import java.util.List;
+
+/**
+ * Agent-authoritative application config read. Env and application are derived
+ * from the agent's JWT + registry entry — not from the URL or query params, so
+ * agents cannot spoof env.
+ */
+@RestController
+@RequestMapping("/api/v1/agents")
+@Tag(name = "Agent Config", description = "Agent-authoritative config read (AGENT only)")
+public class AgentConfigController {
+
+    private final PostgresApplicationConfigRepository configRepository;
+    private final AgentRegistryService registryService;
+    private final SensitiveKeysRepository sensitiveKeysRepository;
+    private final ObjectMapper objectMapper;
+
+    public AgentConfigController(PostgresApplicationConfigRepository configRepository,
+                                 AgentRegistryService registryService,
+                                 SensitiveKeysRepository sensitiveKeysRepository,
+                                 ObjectMapper objectMapper) {
+        this.configRepository = configRepository;
+        this.registryService = registryService;
+        this.sensitiveKeysRepository = sensitiveKeysRepository;
+        this.objectMapper = objectMapper;
+    }
+
+    @GetMapping("/config")
+    @Operation(summary = "Get application config for the calling agent",
+            description = "Resolves (application, environment) from the agent's registry entry "
+                    + "(heartbeat-authoritative). On a registry miss only the environment can be read from the JWT, "
+                    + "so the request returns 404. Returns 401 if the caller has no authenticated identity.")
+    @ApiResponse(responseCode = "200", description = "Config returned")
+    @ApiResponse(responseCode = "404", description = "Calling agent could not be resolved")
+    public ResponseEntity<AppConfigResponse> getConfigForAgent(Authentication auth,
+                                                                HttpServletRequest request) {
+        String instanceId = auth != null ? auth.getName() : null;
+        if (instanceId == null || instanceId.isBlank()) {
+            return ResponseEntity.status(HttpStatus.UNAUTHORIZED).build();
+        }
+
+        AgentInfo agent = registryService.findById(instanceId);
+        String application;
+        String environment;
+        if (agent != null) {
+            application = agent.applicationId();
+            environment = agent.environmentId();
+        } else {
+            // Registry miss — fall back to JWT env claim; application can't be
+            // derived from JWT alone, so without a registry entry we 404.
+            environment = environmentFromJwt(request);
+            application = null;
+        }
+
+        if (application == null || application.isBlank() || environment == null || environment.isBlank()) {
+            return ResponseEntity.notFound().build();
+        }
+
+        ApplicationConfig config = configRepository.findByApplicationAndEnvironment(application, environment)
+                .orElse(ApplicationConfigController.defaultConfig(application, environment));
+
+        List<String> globalKeys = sensitiveKeysRepository.find()
+                .map(SensitiveKeysConfig::keys)
+                .orElse(null);
+        List<String> merged = SensitiveKeysMerger.merge(globalKeys, extractSensitiveKeys(config));
+
+        return ResponseEntity.ok(new AppConfigResponse(config, globalKeys, merged));
+    }
+
+    private static String environmentFromJwt(HttpServletRequest request) {
+        Object attr = request.getAttribute(JwtAuthenticationFilter.JWT_RESULT_ATTR);
+        if (attr instanceof JwtValidationResult result) {
+            return result.environment();
+        }
+        return null;
+    }
+
+    private List<String> extractSensitiveKeys(ApplicationConfig config) {
+        try {
+            JsonNode node = objectMapper.valueToTree(config);
+            JsonNode keysNode = node.get("sensitiveKeys");
+            if (keysNode == null || keysNode.isNull() || !keysNode.isArray()) {
+                return null;
+            }
+            return objectMapper.convertValue(keysNode, new com.fasterxml.jackson.core.type.TypeReference<List<String>>() {});
+        } catch (Exception e) {
+            return null;
+        }
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentEventsController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentEventsController.java
@@ -1,7 +1,11 @@
 package com.cameleer.server.app.controller;

+import com.cameleer.server.app.dto.AgentEventPageResponse;
 import com.cameleer.server.app.dto.AgentEventResponse;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.agent.AgentEventPage;
 import com.cameleer.server.core.agent.AgentEventService;
+import com.cameleer.server.core.runtime.Environment;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
@@ -12,11 +16,10 @@ import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;

 import java.time.Instant;
-import java.util.List;

 @RestController
-@RequestMapping("/api/v1/agents/events-log")
-@Tag(name = "Agent Events", description = "Agent lifecycle event log")
+@RequestMapping("/api/v1/environments/{envSlug}/agents/events")
+@Tag(name = "Agent Events", description = "Agent lifecycle event log (env-scoped)")
 public class AgentEventsController {

    private final AgentEventService agentEventService;
@@ -26,25 +29,26 @@ public class AgentEventsController {
    }

    @GetMapping
-    @Operation(summary = "Query agent events",
-            description = "Returns agent lifecycle events, optionally filtered by app and/or agent ID")
-    @ApiResponse(responseCode = "200", description = "Events returned")
-    public ResponseEntity<List<AgentEventResponse>> getEvents(
+    @Operation(summary = "Query agent events in this environment",
+            description = "Cursor-paginated. Returns newest first. Pass nextCursor back as ?cursor= for the next page.")
+    @ApiResponse(responseCode = "200", description = "Event page returned")
+    public ResponseEntity<AgentEventPageResponse> getEvents(
+            @EnvPath Environment env,
            @RequestParam(required = false) String appId,
            @RequestParam(required = false) String agentId,
-            @RequestParam(required = false) String environment,
            @RequestParam(required = false) String from,
            @RequestParam(required = false) String to,
+            @RequestParam(required = false) String cursor,
            @RequestParam(defaultValue = "50") int limit) {

        Instant fromInstant = from != null ? Instant.parse(from) : null;
        Instant toInstant = to != null ? Instant.parse(to) : null;

-        var events = agentEventService.queryEvents(appId, agentId, environment, fromInstant, toInstant, limit)
-                .stream()
-                .map(AgentEventResponse::from)
-                .toList();
+        AgentEventPage page = agentEventService.queryEventPage(
+                appId, agentId, env.slug(), fromInstant, toInstant, cursor, limit);

-        return ResponseEntity.ok(events);
+        var data = page.data().stream().map(AgentEventResponse::from).toList();
+
+        return ResponseEntity.ok(new AgentEventPageResponse(data, page.nextCursor(), page.hasMore()));
    }
 }
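
The `@Operation` description above says the endpoint is cursor-paginated, newest first, with `nextCursor` passed back as `?cursor=`. A minimal sketch of how a caller might drain such an endpoint follows. `EventPage`, `CursorPager`, and `drain` are hypothetical names for illustration; only the `data` / `nextCursor` / `hasMore` shape is taken from the `AgentEventPageResponse` constructor call in the diff, and events are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical mirror of the response shape: data, nextCursor, hasMore.
record EventPage(List<String> data, String nextCursor, boolean hasMore) {}

final class CursorPager {
    private CursorPager() {}

    /**
     * Calls fetch with a null cursor first, then with each returned
     * nextCursor, until hasMore is false or no cursor comes back.
     * maxPages bounds the loop so a misbehaving server cannot spin it forever.
     */
    static List<String> drain(Function<String, EventPage> fetch, int maxPages) {
        List<String> all = new ArrayList<>();
        String cursor = null; // null means "first page" in this sketch
        for (int i = 0; i < maxPages; i++) {
            EventPage page = fetch.apply(cursor);
            all.addAll(page.data());
            if (!page.hasMore() || page.nextCursor() == null) {
                break;
            }
            cursor = page.nextCursor();
        }
        return all;
    }
}
```

In a real client, `fetch` would issue the HTTP GET and deserialize the body; here it is left as a function parameter so the loop can be exercised without a server.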
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentListController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentListController.java
@@ -0,0 +1,163 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.app.dto.AgentInstanceResponse;
+import com.cameleer.server.app.dto.ErrorResponse;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.agent.AgentInfo;
+import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.agent.AgentState;
+import com.cameleer.server.core.runtime.Environment;
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.media.Content;
+import io.swagger.v3.oas.annotations.media.Schema;
+import io.swagger.v3.oas.annotations.responses.ApiResponse;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.http.ResponseEntity;
+import org.springframework.jdbc.core.JdbcTemplate;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RequestParam;
+import org.springframework.web.bind.annotation.RestController;
+
+import java.time.Instant;
+import java.time.temporal.ChronoUnit;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Read-only user-facing list of agents in an environment. Agent self-service
+ * endpoints (register/heartbeat/refresh/deregister/events/commands) remain
+ * flat at /api/v1/agents/... — those are JWT-authoritative and env is
+ * derived from the token.
+ */
+@RestController
+@RequestMapping("/api/v1/environments/{envSlug}/agents")
+@Tag(name = "Agent List", description = "List registered agents in an environment")
+public class AgentListController {
+
+    private static final Logger log = LoggerFactory.getLogger(AgentListController.class);
+
+    private final AgentRegistryService registryService;
+    private final JdbcTemplate jdbc;
+
+    public AgentListController(AgentRegistryService registryService,
+                                @org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc) {
+        this.registryService = registryService;
+        this.jdbc = jdbc;
+    }
+
+    @GetMapping
+    @Operation(summary = "List all agents in this environment",
+            description = "Returns registered agents with runtime metrics, optionally filtered by status and/or application")
+    @ApiResponse(responseCode = "200", description = "Agent list returned")
+    @ApiResponse(responseCode = "400", description = "Invalid status filter",
+            content = @Content(schema = @Schema(implementation = ErrorResponse.class)))
+    public ResponseEntity<List<AgentInstanceResponse>> listAgents(
+            @EnvPath Environment env,
+            @RequestParam(required = false) String status,
+            @RequestParam(required = false) String application) {
+        List<AgentInfo> agents;
+
+        if (status != null) {
+            try {
+                AgentState stateFilter = AgentState.valueOf(status.toUpperCase());
+                agents = registryService.findByState(stateFilter);
+            } catch (IllegalArgumentException e) {
+                return ResponseEntity.badRequest().build();
+            }
+        } else {
+            agents = registryService.findAll();
+        }
+
+        // Filter by env (from path — always applied)
+        agents = agents.stream()
+                .filter(a -> env.slug().equals(a.environmentId()))
+                .toList();
+
+        if (application != null && !application.isBlank()) {
+            agents = agents.stream()
+                    .filter(a -> application.equals(a.applicationId()))
+                    .toList();
+        }
+
+        Map<String, double[]> agentMetrics = queryAgentMetrics();
+        Map<String, Double> cpuByInstance = queryAgentCpuUsage();
+        final List<AgentInfo> finalAgents = agents;
+
+        List<AgentInstanceResponse> response = finalAgents.stream()
+                .map(a -> {
+                    AgentInstanceResponse dto = AgentInstanceResponse.from(a);
+                    double[] m = agentMetrics.get(a.applicationId());
+                    if (m != null) {
+                        long appAgentCount = finalAgents.stream()
+                                .filter(ag -> ag.applicationId().equals(a.applicationId())).count();
+                        double agentTps = appAgentCount > 0 ? m[0] / appAgentCount : 0;
+                        double errorRate = m[1];
+                        int activeRoutes = (int) m[2];
+                        dto = dto.withMetrics(agentTps, errorRate, activeRoutes);
+                    }
+                    Double cpu = cpuByInstance.get(a.instanceId());
+                    if (cpu != null) {
+                        dto = dto.withCpuUsage(cpu);
+                    }
+                    return dto;
+                })
+                .toList();
+        return ResponseEntity.ok(response);
+    }
+
+    private Map<String, double[]> queryAgentMetrics() {
+        Map<String, double[]> result = new HashMap<>();
+        Instant now = Instant.now();
+        Instant from1m = now.minus(1, ChronoUnit.MINUTES);
+        try {
+            jdbc.query(
+                    "SELECT application_id, " +
+                            "uniqMerge(total_count) AS total, " +
+                            "uniqIfMerge(failed_count) AS failed, " +
+                            "COUNT(DISTINCT route_id) AS active_routes " +
+                            "FROM stats_1m_route WHERE bucket >= " + lit(from1m) + " AND bucket < " + lit(now) +
+                            " GROUP BY application_id",
+                    rs -> {
+                        long total = rs.getLong("total");
+                        long failed = rs.getLong("failed");
+                        double tps = total / 60.0;
+                        double errorRate = total > 0 ? (double) failed / total : 0.0;
+                        int activeRoutes = rs.getInt("active_routes");
+                        result.put(rs.getString("application_id"), new double[]{tps, errorRate, activeRoutes});
+                    });
+        } catch (Exception e) {
+            log.debug("Could not query agent metrics: {}", e.getMessage());
+        }
+        return result;
+    }
+
+    private Map<String, Double> queryAgentCpuUsage() {
+        Map<String, Double> result = new HashMap<>();
+        Instant now = Instant.now();
+        Instant from2m = now.minus(2, ChronoUnit.MINUTES);
+        try {
+            jdbc.query(
+                    "SELECT instance_id, avg(metric_value) AS cpu_avg " +
+                            "FROM agent_metrics " +
+                            "WHERE metric_name = 'process.cpu.usage.value'" +
+                            " AND collected_at >= " + lit(from2m) + " AND collected_at < " + lit(now) +
+                            " GROUP BY instance_id",
+                    rs -> {
+                        result.put(rs.getString("instance_id"), rs.getDouble("cpu_avg"));
+                    });
+        } catch (Exception e) {
+            log.debug("Could not query agent CPU usage: {}", e.getMessage());
+        }
+        return result;
+    }
+
+    private static String lit(Instant instant) {
+        return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
+                .withZone(java.time.ZoneOffset.UTC)
+                .format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
+    }
+}
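
The `lit` helper above formats an `Instant` as a quoted UTC timestamp truncated to whole seconds; the comment removed from `AgentRegistrationController` in this same diff explains that literal SQL is used because the ClickHouse JDBC driver's prepared-statement wrapping breaks the `-Merge` combinators. A standalone sketch of the same formatting, assuming a hypothetical holder class `ClickHouseLiterals`:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical holder class; the formatting logic mirrors lit(Instant) in the diff.
final class ClickHouseLiterals {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    private ClickHouseLiterals() {}

    /** Formats an Instant as a single-quoted UTC date-time literal, dropping sub-second precision. */
    static String lit(Instant instant) {
        return "'" + FMT.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }
}
```

For example, `ClickHouseLiterals.lit(Instant.parse("2026-04-24T12:42:07.987Z"))` yields `'2026-04-24 12:42:07'`. String concatenation is only reasonable here because both values are server-generated `Instant`s; request parameters should not be spliced into the query this way.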
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentMetricsController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentMetricsController.java
@@ -2,8 +2,13 @@ package com.cameleer.server.app.controller;

 import com.cameleer.server.app.dto.AgentMetricsResponse;
 import com.cameleer.server.app.dto.MetricBucket;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.agent.AgentInfo;
+import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.storage.MetricsQueryStore;
 import com.cameleer.server.core.storage.model.MetricTimeSeries;
+import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.*;

 import java.time.Instant;
@@ -12,17 +17,21 @@ import java.util.*;
 import java.util.stream.Collectors;

 @RestController
-@RequestMapping("/api/v1/agents/{agentId}/metrics")
+@RequestMapping("/api/v1/environments/{envSlug}/agents/{agentId}/metrics")
 public class AgentMetricsController {

    private final MetricsQueryStore metricsQueryStore;
+    private final AgentRegistryService registryService;

-    public AgentMetricsController(MetricsQueryStore metricsQueryStore) {
+    public AgentMetricsController(MetricsQueryStore metricsQueryStore,
+                                   AgentRegistryService registryService) {
        this.metricsQueryStore = metricsQueryStore;
+        this.registryService = registryService;
    }

    @GetMapping
-    public AgentMetricsResponse getMetrics(
+    public ResponseEntity<AgentMetricsResponse> getMetrics(
+            @EnvPath Environment env,
            @PathVariable String agentId,
            @RequestParam String names,
            @RequestParam(required = false) Instant from,
@@ -30,6 +39,13 @@ public class AgentMetricsController {
            @RequestParam(defaultValue = "60") int buckets,
            @RequestParam(defaultValue = "gauge") String mode) {

+        // Defence in depth: if the agent is currently in the registry, reject
+        // cross-env requests (the path env does not match the agent's env).
+        AgentInfo agent = registryService.findById(agentId);
+        if (agent != null && !env.slug().equals(agent.environmentId())) {
+            return ResponseEntity.notFound().build();
+        }
+
        if (from == null) from = Instant.now().minus(1, ChronoUnit.HOURS);
        if (to == null) to = Instant.now();

@@ -48,6 +64,6 @@ public class AgentMetricsController {
                        (a, b) -> a,
                        LinkedHashMap::new));

-        return new AgentMetricsResponse(result);
+        return ResponseEntity.ok(new AgentMetricsResponse(result));
    }
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentRegistrationController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentRegistrationController.java
@@ -19,6 +19,7 @@ import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.agent.AgentState;
 import com.cameleer.server.core.agent.RouteStateRegistry;
 import com.cameleer.server.core.security.Ed25519SigningService;
+import com.cameleer.server.core.storage.RouteCatalogStore;
 import com.cameleer.server.core.security.InvalidTokenException;
 import com.cameleer.server.core.security.JwtService;
 import io.swagger.v3.oas.annotations.Operation;
@@ -68,6 +69,7 @@ public class AgentRegistrationController {
    private final AuditService auditService;
    private final JdbcTemplate jdbc;
    private final RouteStateRegistry routeStateRegistry;
+    private final RouteCatalogStore routeCatalogStore;

    public AgentRegistrationController(AgentRegistryService registryService,
                                        AgentRegistryConfig config,
@@ -77,7 +79,8 @@ public class AgentRegistrationController {
                                        AgentEventService agentEventService,
                                        AuditService auditService,
                                        @org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
-                                        RouteStateRegistry routeStateRegistry) {
+                                        RouteStateRegistry routeStateRegistry,
+                                        RouteCatalogStore routeCatalogStore) {
        this.registryService = registryService;
        this.config = config;
        this.bootstrapTokenValidator = bootstrapTokenValidator;
@@ -87,6 +90,7 @@ public class AgentRegistrationController {
        this.auditService = auditService;
        this.jdbc = jdbc;
        this.routeStateRegistry = routeStateRegistry;
+        this.routeCatalogStore = routeCatalogStore;
    }

    @PostMapping("/register")
@@ -113,9 +117,15 @@ public class AgentRegistrationController {
        if (request.instanceId() == null || request.instanceId().isBlank()) {
            return ResponseEntity.badRequest().build();
        }
+        if (request.environmentId() == null || request.environmentId().isBlank()) {
+            String remote = httpRequest.getRemoteAddr();
+            log.warn("Agent registration rejected (no environmentId): instanceId={} remote={}",
+                    request.instanceId(), remote);
+            return ResponseEntity.badRequest().build();
+        }

        String application = request.applicationId() != null ? request.applicationId() : "default";
-        String environmentId = request.environmentId() != null ? request.environmentId() : "default";
+        String environmentId = request.environmentId();
        List<String> routeIds = request.routeIds() != null ? request.routeIds() : List.of();
        var capabilities = request.capabilities() != null ? request.capabilities() : Collections.<String, Object>emptyMap();

@@ -125,15 +135,20 @@ public class AgentRegistrationController {
                request.instanceId(), request.instanceId(), application, environmentId,
                request.version(), routeIds, capabilities);

+        // Persist routes in catalog for server-restart recovery
+        if (!routeIds.isEmpty()) {
+            routeCatalogStore.upsert(application, environmentId, routeIds);
+        }
+
        if (reRegistration) {
            log.info("Agent re-registered: {} (application={}, routes={}, capabilities={})",
                    request.instanceId(), application, routeIds.size(), capabilities.keySet());
-            agentEventService.recordEvent(request.instanceId(), application, "RE_REGISTERED",
+            agentEventService.recordEvent(request.instanceId(), application, environmentId, "RE_REGISTERED",
                    "Agent re-registered with " + routeIds.size() + " routes");
        } else {
            log.info("Agent registered: {} (application={}, routes={})",
                    request.instanceId(), application, routeIds.size());
-            agentEventService.recordEvent(request.instanceId(), application, "REGISTERED",
+            agentEventService.recordEvent(request.instanceId(), application, environmentId, "REGISTERED",
                    "Agent registered: " + request.instanceId());
        }

@@ -197,9 +212,15 @@ public class AgentRegistrationController {
        List<String> roles = result.roles().isEmpty()
                ? List.of("AGENT") : result.roles();
        String application = result.application() != null ? result.application() : "default";
+        String environment = result.environment();

-        // Try to get application + environment from registry (agent may not be registered after server restart)
-        String environment = result.environment() != null ? result.environment() : "default";
+        // The refresh token must carry an env claim; an agent without one should never have been issued a token.
+        if (environment == null || environment.isBlank()) {
+            log.warn("Refresh token has no environment claim: agentId={}", agentId);
+            return ResponseEntity.status(401).build();
+        }
+
+        // Prefer registry if agent is still registered (covers env edits on re-registration)
        AgentInfo agent = registryService.findById(agentId);
        if (agent != null) {
            application = agent.applicationId();
@@ -233,9 +254,14 @@ public class AgentRegistrationController {
                    JwtAuthenticationFilter.JWT_RESULT_ATTR);
            if (jwtResult != null) {
                String application = jwtResult.application() != null ? jwtResult.application() : "default";
-                // Prefer environment from heartbeat body (most current), fall back to JWT claim
-                String env = heartbeatEnv != null ? heartbeatEnv
-                        : jwtResult.environment() != null ? jwtResult.environment() : "default";
+                // Env: prefer heartbeat body (current), else JWT claim. No silent default.
+                String env = (heartbeatEnv != null && !heartbeatEnv.isBlank())
+                        ? heartbeatEnv
+                        : jwtResult.environment();
+                if (env == null || env.isBlank()) {
+                    log.warn("Heartbeat auto-heal rejected (no environment on JWT or body): agentId={}", id);
+                    return ResponseEntity.badRequest().build();
+                }
                Map<String, Object> caps = capabilities != null ? capabilities : Map.of();
                List<String> healRouteIds = routeIds != null ? routeIds : List.of();
                registryService.register(id, id, application, env, "unknown",
@@ -248,15 +274,20 @@ public class AgentRegistrationController {
            }
        }

-        if (request != null && request.getRouteStates() != null && !request.getRouteStates().isEmpty()) {
+        if (routeIds != null && !routeIds.isEmpty()) {
            AgentInfo agent = registryService.findById(id);
            if (agent != null) {
-                for (var entry : request.getRouteStates().entrySet()) {
-                    RouteStateRegistry.RouteState state = parseRouteState(entry.getValue());
-                    if (state != null) {
-                        routeStateRegistry.setState(agent.applicationId(), entry.getKey(), state);
+                // Update route states from heartbeat
+                if (request != null && request.getRouteStates() != null) {
+                    for (var entry : request.getRouteStates().entrySet()) {
+                        RouteStateRegistry.RouteState state = parseRouteState(entry.getValue());
+                        if (state != null) {
+                            routeStateRegistry.setState(agent.applicationId(), entry.getKey(), state);
+                        }
                    }
                }
+                // Persist routes in catalog for server-restart recovery
+                routeCatalogStore.upsert(agent.applicationId(), agent.environmentId(), routeIds);
            }
        }

@@ -284,103 +315,14 @@ public class AgentRegistrationController {
            return ResponseEntity.notFound().build();
        }
        String applicationId = agent.applicationId();
+        String environment = agent.environmentId();
        registryService.deregister(id);
-        agentEventService.recordEvent(id, applicationId, "DEREGISTERED", "Agent deregistered");
+        agentEventService.recordEvent(id, applicationId, environment, "DEREGISTERED", "Agent deregistered");
        auditService.log(id, "agent_deregister", AuditCategory.AGENT, id, null, AuditResult.SUCCESS, httpRequest);
        return ResponseEntity.ok().build();
    }

-    @GetMapping
-    @Operation(summary = "List all agents",
-            description = "Returns all registered agents with runtime metrics, optionally filtered by status and/or application")
-    @ApiResponse(responseCode = "200", description = "Agent list returned")
-    @ApiResponse(responseCode = "400", description = "Invalid status filter",
-            content = @Content(schema = @Schema(implementation = ErrorResponse.class)))
-    public ResponseEntity<List<AgentInstanceResponse>> listAgents(
-            @RequestParam(required = false) String status,
-            @RequestParam(required = false) String application,
-            @RequestParam(required = false) String environment) {
-        List<AgentInfo> agents;
-
-        if (status != null) {
-            try {
-                AgentState stateFilter = AgentState.valueOf(status.toUpperCase());
-                agents = registryService.findByState(stateFilter);
-            } catch (IllegalArgumentException e) {
-                return ResponseEntity.badRequest().build();
-            }
-        } else {
-            agents = registryService.findAll();
-        }
-
-        // Apply application filter if specified
-        if (application != null && !application.isBlank()) {
-            agents = agents.stream()
-                    .filter(a -> application.equals(a.applicationId()))
-                    .toList();
-        }
-
-        // Apply environment filter if specified
-        if (environment != null && !environment.isBlank()) {
-            agents = agents.stream()
-                    .filter(a -> environment.equals(a.environmentId()))
-                    .toList();
-        }
-
-        // Enrich with runtime metrics from continuous aggregates
-        Map<String, double[]> agentMetrics = queryAgentMetrics();
-        final List<AgentInfo> finalAgents = agents;
-
-        List<AgentInstanceResponse> response = finalAgents.stream()
-                .map(a -> {
-                    AgentInstanceResponse dto = AgentInstanceResponse.from(a);
-                    double[] m = agentMetrics.get(a.applicationId());
-                    if (m != null) {
-                        long appAgentCount = finalAgents.stream()
-                                .filter(ag -> ag.applicationId().equals(a.applicationId())).count();
-                        double agentTps = appAgentCount > 0 ? m[0] / appAgentCount : 0;
-                        double errorRate = m[1];
-                        int activeRoutes = (int) m[2];
-                        return dto.withMetrics(agentTps, errorRate, activeRoutes);
-                    }
-                    return dto;
-                })
-                .toList();
-        return ResponseEntity.ok(response);
-    }
-
-    private Map<String, double[]> queryAgentMetrics() {
-        Map<String, double[]> result = new HashMap<>();
-        Instant now = Instant.now();
-        Instant from1m = now.minus(1, ChronoUnit.MINUTES);
-        try {
-            // Literal SQL — ClickHouse JDBC driver wraps prepared statements in sub-queries
-            // that strip AggregateFunction column types, breaking -Merge combinators
-            jdbc.query(
-                    "SELECT application_id, " +
-                            "uniqMerge(total_count) AS total, " +
-                            "uniqIfMerge(failed_count) AS failed, " +
-                            "COUNT(DISTINCT route_id) AS active_routes " +
-                            "FROM stats_1m_route WHERE bucket >= " + lit(from1m) + " AND bucket < " + lit(now) +
-                            " GROUP BY application_id",
-                    rs -> {
-                        long total = rs.getLong("total");
-                        long failed = rs.getLong("failed");
-                        double tps = total / 60.0;
-                        double errorRate = total > 0 ? (double) failed / total : 0.0;
-                        int activeRoutes = rs.getInt("active_routes");
-                        result.put(rs.getString("application_id"), new double[]{tps, errorRate, activeRoutes});
-                    });
-        } catch (Exception e) {
-            log.debug("Could not query agent metrics: {}", e.getMessage());
-        }
-        return result;
-    }
-
-    /** Format an Instant as a ClickHouse DateTime literal. */
-    private static String lit(Instant instant) {
-        return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
-                .withZone(java.time.ZoneOffset.UTC)
-                .format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
-    }
+    // Agent list moved to AgentListController at /api/v1/environments/{envSlug}/agents.
+    // Agent register/refresh/heartbeat/deregister remain here at /api/v1/agents/** —
+    // these are JWT-authoritative and intentionally flat (env from token).
 }
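
The heartbeat auto-heal hunk above replaces the old `"default"` fallback with an explicit rule: prefer a non-blank env from the heartbeat body, else the JWT claim, else reject. A minimal sketch of that precedence as a pure function, assuming a hypothetical helper `EnvResolution.resolve` that is not part of the diff:

```java
import java.util.Optional;

// Hypothetical helper; the controller in the diff inlines this logic.
final class EnvResolution {
    private EnvResolution() {}

    /**
     * Returns the heartbeat env when it is non-blank, otherwise the JWT env
     * when it is non-blank, otherwise empty. There is no silent default.
     */
    static Optional<String> resolve(String heartbeatEnv, String jwtEnv) {
        if (heartbeatEnv != null && !heartbeatEnv.isBlank()) {
            return Optional.of(heartbeatEnv);
        }
        if (jwtEnv != null && !jwtEnv.isBlank()) {
            return Optional.of(jwtEnv);
        }
        return Optional.empty();
    }
}
```

An empty result corresponds to the branch that logs a warning and returns 400 in the controller. Extracting the rule like this would be one way to unit-test it without a servlet request, though the diff does not do so.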
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentSseController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentSseController.java
@@ -62,10 +62,13 @@ public class AgentSseController {

        AgentInfo agent = registryService.findById(id);
        if (agent == null) {
-            // Auto-heal: re-register agent from JWT claims after server restart
+            // Auto-heal re-registers an agent from JWT claims after a server
+            // restart, but only when the JWT subject matches the path id.
+            // Otherwise a holder of any valid agent JWT could spoof an
+            // arbitrary agentId in the URL.
            var jwtResult = (JwtService.JwtValidationResult) httpRequest.getAttribute(
                    JwtAuthenticationFilter.JWT_RESULT_ATTR);
-            if (jwtResult != null) {
+            if (jwtResult != null && id.equals(jwtResult.subject())) {
                String application = jwtResult.application() != null ? jwtResult.application() : "default";
                String env = jwtResult.environment() != null ? jwtResult.environment() : "default";
                registryService.register(id, id, application, env, "unknown", List.of(), Map.of());
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApiExceptionHandler.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApiExceptionHandler.java
@@ -6,6 +6,8 @@ import org.springframework.web.bind.annotation.ExceptionHandler;
 import org.springframework.web.bind.annotation.RestControllerAdvice;
 import org.springframework.web.server.ResponseStatusException;

+import java.time.format.DateTimeParseException;
+
 /**
 * Global exception handler that ensures error responses use the typed {@link ErrorResponse} schema.
 */
@@ -18,4 +20,11 @@ public class ApiExceptionHandler {
        return ResponseEntity.status(ex.getStatusCode())
                .body(new ErrorResponse(reason != null ? reason : "Unknown error"));
    }
+
+    @ExceptionHandler({DateTimeParseException.class, IllegalArgumentException.class})
+    public ResponseEntity<ErrorResponse> handleBadRequest(Exception ex) {
+        String msg = ex.getMessage();
+        return ResponseEntity.badRequest()
+                .body(new ErrorResponse(msg != null ? msg : "Bad request"));
+    }
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppController.java
@@ -1,12 +1,24 @@
 package com.cameleer.server.app.controller;

+import com.cameleer.common.model.ApplicationConfig;
+import com.cameleer.server.app.dto.DirtyStateResponse;
+import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
+import com.cameleer.server.app.storage.PostgresDeploymentRepository;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.runtime.App;
 import com.cameleer.server.core.runtime.AppService;
 import com.cameleer.server.core.runtime.AppVersion;
+import com.cameleer.server.core.runtime.AppVersionRepository;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentConfigSnapshot;
+import com.cameleer.server.core.runtime.DirtyStateCalculator;
+import com.cameleer.server.core.runtime.DirtyStateResult;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.runtime.RuntimeType;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.MediaType;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.access.prepost.PreAuthorize;
@@ -20,70 +32,83 @@ import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;
 import org.springframework.web.multipart.MultipartFile;
+import org.springframework.web.server.ResponseStatusException;

 import java.io.IOException;
+import java.util.Comparator;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;

 /**
- * App CRUD and JAR upload endpoints.
- * All app-scoped endpoints accept the app slug (not UUID) as path variable.
- * Protected by {@code ROLE_OPERATOR} or {@code ROLE_ADMIN}.
+ * App CRUD and JAR upload. All routes env-scoped: the (env, appSlug) pair
+ * identifies a single app — the same app slug can legitimately exist in
+ * multiple environments with independent configuration and history.
 */
 @RestController
-@RequestMapping("/api/v1/apps")
-@Tag(name = "App Management", description = "Application lifecycle and JAR uploads")
+@RequestMapping("/api/v1/environments/{envSlug}/apps")
+@Tag(name = "App Management", description = "Application lifecycle and JAR uploads (env-scoped)")
 @PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")
 public class AppController {

    private final AppService appService;
+    private final AppVersionRepository appVersionRepository;
+    private final PostgresApplicationConfigRepository configRepository;
+    private final PostgresDeploymentRepository deploymentRepository;
+    private final DirtyStateCalculator dirtyCalc;

-    public AppController(AppService appService) {
+    public AppController(AppService appService,
+                         AppVersionRepository appVersionRepository,
+                         PostgresApplicationConfigRepository configRepository,
+                         PostgresDeploymentRepository deploymentRepository,
+                         DirtyStateCalculator dirtyCalc) {
        this.appService = appService;
+        this.appVersionRepository = appVersionRepository;
+        this.configRepository = configRepository;
+        this.deploymentRepository = deploymentRepository;
+        this.dirtyCalc = dirtyCalc;
    }

    @GetMapping
-    @Operation(summary = "List apps by environment")
+    @Operation(summary = "List apps in this environment")
    @ApiResponse(responseCode = "200", description = "App list returned")
-    public ResponseEntity<List<App>> listApps(@RequestParam(required = false) UUID environmentId) {
-        if (environmentId != null) {
-            return ResponseEntity.ok(appService.listByEnvironment(environmentId));
-        }
-        return ResponseEntity.ok(appService.listAll());
+    public ResponseEntity<List<App>> listApps(@EnvPath Environment env) {
+        return ResponseEntity.ok(appService.listByEnvironment(env.id()));
    }

    @GetMapping("/{appSlug}")
-    @Operation(summary = "Get app by slug")
+    @Operation(summary = "Get app by env + slug")
    @ApiResponse(responseCode = "200", description = "App found")
-    @ApiResponse(responseCode = "404", description = "App not found")
-    public ResponseEntity<App> getApp(@PathVariable String appSlug) {
+    @ApiResponse(responseCode = "404", description = "App not found in this environment")
+    public ResponseEntity<App> getApp(@EnvPath Environment env, @PathVariable String appSlug) {
        try {
-            return ResponseEntity.ok(appService.getBySlug(appSlug));
+            return ResponseEntity.ok(appService.getByEnvironmentAndSlug(env.id(), appSlug));
        } catch (IllegalArgumentException e) {
            return ResponseEntity.notFound().build();
        }
    }

    @PostMapping
-    @Operation(summary = "Create a new app")
+    @Operation(summary = "Create a new app in this environment",
+            description = "Slug must match ^[a-z0-9][a-z0-9-]{0,63}$ and be unique within the environment. "
+                    + "Slug is immutable after creation.")
    @ApiResponse(responseCode = "201", description = "App created")
-    @ApiResponse(responseCode = "400", description = "Slug already exists in environment")
-    public ResponseEntity<App> createApp(@RequestBody CreateAppRequest request) {
+    @ApiResponse(responseCode = "400", description = "Invalid slug, or slug already exists in this environment")
+    public ResponseEntity<?> createApp(@EnvPath Environment env, @RequestBody CreateAppRequest request) {
        try {
-            UUID id = appService.createApp(request.environmentId(), request.slug(), request.displayName());
+            UUID id = appService.createApp(env.id(), request.slug(), request.displayName());
            return ResponseEntity.status(201).body(appService.getById(id));
        } catch (IllegalArgumentException e) {
-            return ResponseEntity.badRequest().build();
+            return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
        }
    }

    @GetMapping("/{appSlug}/versions")
-    @Operation(summary = "List app versions")
+    @Operation(summary = "List versions for this app")
    @ApiResponse(responseCode = "200", description = "Version list returned")
-    public ResponseEntity<List<AppVersion>> listVersions(@PathVariable String appSlug) {
+    public ResponseEntity<List<AppVersion>> listVersions(@EnvPath Environment env, @PathVariable String appSlug) {
        try {
-            App app = appService.getBySlug(appSlug);
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            return ResponseEntity.ok(appService.listVersions(app.id()));
        } catch (IllegalArgumentException e) {
            return ResponseEntity.notFound().build();
@@ -91,13 +116,14 @@ public class AppController {
    }

    @PostMapping(value = "/{appSlug}/versions", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
-    @Operation(summary = "Upload a JAR for a new app version")
+    @Operation(summary = "Upload a JAR for a new version of this app")
    @ApiResponse(responseCode = "201", description = "JAR uploaded and version created")
-    @ApiResponse(responseCode = "404", description = "App not found")
-    public ResponseEntity<AppVersion> uploadJar(@PathVariable String appSlug,
+    @ApiResponse(responseCode = "404", description = "App not found in this environment")
+    public ResponseEntity<AppVersion> uploadJar(@EnvPath Environment env,
+                                                 @PathVariable String appSlug,
                                                 @RequestParam("file") MultipartFile file) throws IOException {
        try {
-            App app = appService.getBySlug(appSlug);
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            AppVersion version = appService.uploadJar(app.id(), file.getOriginalFilename(), file.getInputStream(), file.getSize());
            return ResponseEntity.status(201).body(version);
        } catch (IllegalArgumentException e) {
@@ -106,11 +132,11 @@ public class AppController {
    }

    @DeleteMapping("/{appSlug}")
-    @Operation(summary = "Delete an app")
+    @Operation(summary = "Delete this app")
    @ApiResponse(responseCode = "204", description = "App deleted")
-    public ResponseEntity<Void> deleteApp(@PathVariable String appSlug) {
+    public ResponseEntity<Void> deleteApp(@EnvPath Environment env, @PathVariable String appSlug) {
        try {
-            App app = appService.getBySlug(appSlug);
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            appService.deleteApp(app.id());
            return ResponseEntity.noContent().build();
        } catch (IllegalArgumentException e) {
@@ -118,6 +144,47 @@ public class AppController {
        }
    }

+    @GetMapping("/{appSlug}/dirty-state")
+    @Operation(summary = "Check whether the app's current config differs from the last successful deploy",
+            description = "Returns dirty=true when the desired state (current JAR + agent config + container config) "
+                    + "would produce a changed deployment. When no successful deploy exists yet, dirty=true.")
+    @ApiResponse(responseCode = "200", description = "Dirty-state computed")
+    @ApiResponse(responseCode = "404", description = "App not found in this environment")
+    public ResponseEntity<DirtyStateResponse> getDirtyState(@EnvPath Environment env,
+                                                             @PathVariable String appSlug) {
+        App app;
+        try {
+            app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
+        } catch (IllegalArgumentException e) {
+            throw new ResponseStatusException(HttpStatus.NOT_FOUND, "App not found");
+        }
+
+        // Latest JAR version: highest version number, independent of findByAppId's row order
+        List<AppVersion> versions = appVersionRepository.findByAppId(app.id());
+        UUID latestVersionId = versions.stream()
+                .max(Comparator.comparingInt(AppVersion::version))
+                .map(AppVersion::id).orElse(null);
+
+        // Desired agent config
+        ApplicationConfig agentConfig = configRepository
+                .findByApplicationAndEnvironment(appSlug, env.slug())
+                .orElse(null);
+
+        // Container config
+        Map<String, Object> containerConfig = app.containerConfig();
+
+        // Last successful deployment snapshot
+        Deployment lastSuccessful = deploymentRepository
+                .findLatestSuccessfulByAppAndEnv(app.id(), env.id())
+                .orElse(null);
+        DeploymentConfigSnapshot snapshot = lastSuccessful != null ? lastSuccessful.deployedConfigSnapshot() : null;
+
+        DirtyStateResult result = dirtyCalc.compute(latestVersionId, agentConfig, containerConfig, snapshot);
+
+        String lastId = lastSuccessful != null ? lastSuccessful.id().toString() : null;
+        return ResponseEntity.ok(new DirtyStateResponse(result.dirty(), lastId, result.differences()));
+    }
+
    private static final java.util.regex.Pattern CUSTOM_ARGS_PATTERN =
            java.util.regex.Pattern.compile("^[-a-zA-Z0-9_.=:/\\s+\"']*$");

@@ -134,24 +201,25 @@ public class AppController {
    }

    @PutMapping("/{appSlug}/container-config")
-    @Operation(summary = "Update container config for an app")
+    @Operation(summary = "Update container config for this app")
    @ApiResponse(responseCode = "200", description = "Container config updated")
    @ApiResponse(responseCode = "400", description = "Invalid configuration")
-    @ApiResponse(responseCode = "404", description = "App not found")
-    public ResponseEntity<App> updateContainerConfig(@PathVariable String appSlug,
-                                                      @RequestBody Map<String, Object> containerConfig) {
+    @ApiResponse(responseCode = "404", description = "App not found in this environment")
+    public ResponseEntity<?> updateContainerConfig(@EnvPath Environment env,
+                                                    @PathVariable String appSlug,
+                                                    @RequestBody Map<String, Object> containerConfig) {
        try {
            validateContainerConfig(containerConfig);
-            App app = appService.getBySlug(appSlug);
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            appService.updateContainerConfig(app.id(), containerConfig);
            return ResponseEntity.ok(appService.getById(app.id()));
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
                return ResponseEntity.notFound().build();
            }
-            return ResponseEntity.badRequest().build();
+            return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
        }
    }

-    public record CreateAppRequest(UUID environmentId, String slug, String displayName) {}
+    public record CreateAppRequest(String slug, String displayName) {}
 }
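
Reviewer sketch (not part of the patch): the `dirty-state` endpoint above delegates the comparison to `DirtyStateCalculator`, whose body is not shown in this diff. The following is a minimal illustration of the idea described in the commit message, assuming the desired and deployed agent configs can be viewed as flat maps. `DirtyDiffSketch`, `differences`, and `isDirty` are hypothetical names; only the four ignored keys come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; the real DirtyStateCalculator is not shown in this diff.
final class DirtyDiffSketch {

    // Live-pushed fields named in the commit message; they must not mark the app dirty.
    static final Set<String> IGNORED_KEYS =
            Set.of("taps", "tapVersion", "tracedProcessors", "routeRecording");

    private DirtyDiffSketch() {}

    /** Returns the sorted keys whose values differ, skipping the ignored keys. */
    static List<String> differences(Map<String, Object> desired, Map<String, Object> deployed) {
        Set<String> keys = new TreeSet<>(desired.keySet());
        keys.addAll(deployed.keySet());
        keys.removeAll(IGNORED_KEYS);

        List<String> changed = new ArrayList<>();
        for (String key : keys) {
            if (!Objects.equals(desired.get(key), deployed.get(key))) {
                changed.add(key);
            }
        }
        return changed;
    }

    /** Mirrors the endpoint contract: no successful deploy yet means dirty. */
    static boolean isDirty(Map<String, Object> desired, Map<String, Object> deployed) {
        if (deployed == null) {
            return true;
        }
        return !differences(desired, deployed).isEmpty();
    }
}
```

With this shape, a config that differs from the snapshot only in `taps` reports `dirty=false`, while a change to a staged field such as a sampling rate shows up in the difference list.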
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppSettingsController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AppSettingsController.java
@@ -1,11 +1,13 @@
 package com.cameleer.server.app.controller;

 import com.cameleer.server.app.dto.AppSettingsRequest;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.admin.AppSettings;
 import com.cameleer.server.core.admin.AppSettingsRepository;
 import com.cameleer.server.core.admin.AuditCategory;
 import com.cameleer.server.core.admin.AuditResult;
 import com.cameleer.server.core.admin.AuditService;
+import com.cameleer.server.core.runtime.Environment;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import jakarta.servlet.http.HttpServletRequest;
@@ -26,7 +28,7 @@ import java.util.List;
 import java.util.Map;

@RestController
-@RequestMapping("/api/v1/admin/app-settings")
+@RequestMapping("/api/v1/environments/{envSlug}")
@PreAuthorize("hasAnyRole('ADMIN', 'OPERATOR')")
@Tag(name = "App Settings", description = "Per-application dashboard settings (ADMIN/OPERATOR)")
 public class AppSettingsController {
@@ -39,22 +41,25 @@ public class AppSettingsController {
        this.auditService = auditService;
    }

-    @GetMapping
-    @Operation(summary = "List all application settings")
-    public ResponseEntity<List<AppSettings>> getAll() {
-        return ResponseEntity.ok(repository.findAll());
+    @GetMapping("/app-settings")
+    @Operation(summary = "List application settings in this environment")
+    public ResponseEntity<List<AppSettings>> getAll(@EnvPath Environment env) {
+        return ResponseEntity.ok(repository.findByEnvironment(env.slug()));
    }

-    @GetMapping("/{appId}")
-    @Operation(summary = "Get settings for a specific application (returns defaults if not configured)")
-    public ResponseEntity<AppSettings> getByAppId(@PathVariable String appId) {
-        AppSettings settings = repository.findByApplicationId(appId).orElse(AppSettings.defaults(appId));
+    @GetMapping("/apps/{appSlug}/settings")
+    @Operation(summary = "Get settings for an application in this environment (returns defaults if not configured)")
+    public ResponseEntity<AppSettings> getByAppId(@EnvPath Environment env,
+                                                   @PathVariable String appSlug) {
+        AppSettings settings = repository.findByApplicationAndEnvironment(appSlug, env.slug())
+                .orElse(AppSettings.defaults(appSlug, env.slug()));
        return ResponseEntity.ok(settings);
    }

-    @PutMapping("/{appId}")
-    @Operation(summary = "Create or update settings for an application")
-    public ResponseEntity<AppSettings> update(@PathVariable String appId,
+    @PutMapping("/apps/{appSlug}/settings")
+    @Operation(summary = "Create or update settings for an application in this environment")
+    public ResponseEntity<AppSettings> update(@EnvPath Environment env,
+                                               @PathVariable String appSlug,
                                               @Valid @RequestBody AppSettingsRequest request,
                                               HttpServletRequest httpRequest) {
        List<String> errors = request.validate();
@@ -62,18 +67,20 @@ public class AppSettingsController {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, String.join("; ", errors));
        }

-        AppSettings saved = repository.save(request.toSettings(appId));
-        auditService.log("update_app_settings", AuditCategory.CONFIG, appId,
-                Map.of("settings", saved), AuditResult.SUCCESS, httpRequest);
+        AppSettings saved = repository.save(request.toSettings(appSlug, env.slug()));
+        auditService.log("update_app_settings", AuditCategory.CONFIG, appSlug,
+                Map.of("environment", env.slug(), "settings", saved), AuditResult.SUCCESS, httpRequest);
        return ResponseEntity.ok(saved);
    }

-    @DeleteMapping("/{appId}")
-    @Operation(summary = "Delete application settings (reverts to defaults)")
-    public ResponseEntity<Void> delete(@PathVariable String appId, HttpServletRequest httpRequest) {
-        repository.delete(appId);
-        auditService.log("delete_app_settings", AuditCategory.CONFIG, appId,
-                Map.of(), AuditResult.SUCCESS, httpRequest);
+    @DeleteMapping("/apps/{appSlug}/settings")
+    @Operation(summary = "Delete application settings for this environment (reverts to defaults)")
+    public ResponseEntity<Void> delete(@EnvPath Environment env,
+                                        @PathVariable String appSlug,
+                                        HttpServletRequest httpRequest) {
+        repository.delete(appSlug, env.slug());
+        auditService.log("delete_app_settings", AuditCategory.CONFIG, appSlug,
+                Map.of("environment", env.slug()), AuditResult.SUCCESS, httpRequest);
        return ResponseEntity.noContent().build();
    }
 }
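
Reviewer sketch (not part of the patch): the settings endpoints above now address a row by the (app slug, environment slug) pair and fall back to defaults when no row exists. The in-memory stand-in below illustrates that lookup shape only. `EnvScopedSettingsSketch`, `Key`, `Settings`, and the `refreshSeconds` field are hypothetical and do not reflect the real `AppSettings` record or `AppSettingsRepository`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for an env-scoped settings repository.
final class EnvScopedSettingsSketch {

    /** Composite key: the same app slug may exist in several environments. */
    record Key(String appSlug, String envSlug) {}

    /** Illustrative settings row with a single made-up field. */
    record Settings(String appSlug, String envSlug, int refreshSeconds) {
        static Settings defaults(String appSlug, String envSlug) {
            return new Settings(appSlug, envSlug, 30);
        }
    }

    private final Map<Key, Settings> rows = new HashMap<>();

    void save(Settings settings) {
        rows.put(new Key(settings.appSlug(), settings.envSlug()), settings);
    }

    Optional<Settings> find(String appSlug, String envSlug) {
        return Optional.ofNullable(rows.get(new Key(appSlug, envSlug)));
    }

    /** Returns the stored row, or defaults scoped to the same app and environment. */
    Settings findOrDefault(String appSlug, String envSlug) {
        return find(appSlug, envSlug).orElseGet(() -> Settings.defaults(appSlug, envSlug));
    }

    /** Deleting a row makes later reads fall back to defaults. */
    void delete(String appSlug, String envSlug) {
        rows.remove(new Key(appSlug, envSlug));
    }
}
```

Keying on both slugs means that saving settings for `orders` in one environment leaves the same slug in another environment on its defaults.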
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApplicationConfigController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ApplicationConfigController.java
@@ -7,6 +7,7 @@ import com.cameleer.server.app.dto.ConfigUpdateResponse;
 import com.cameleer.server.app.dto.TestExpressionRequest;
 import com.cameleer.server.app.dto.TestExpressionResponse;
 import com.cameleer.server.app.storage.PostgresApplicationConfigRepository;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.admin.AuditCategory;
 import com.cameleer.server.core.admin.AuditResult;
 import com.cameleer.server.core.admin.AuditService;
@@ -18,10 +19,12 @@ import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.agent.AgentState;
 import com.cameleer.server.core.agent.CommandReply;
 import com.cameleer.server.core.agent.CommandType;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.storage.DiagramStore;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.Parameter;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import jakarta.servlet.http.HttpServletRequest;
@@ -31,6 +34,7 @@ import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.core.Authentication;
 import org.springframework.web.bind.annotation.*;
+import org.springframework.web.server.ResponseStatusException;

 import java.util.ArrayList;
 import java.util.List;
@@ -41,12 +45,13 @@ import java.util.concurrent.TimeUnit;
 import java.util.concurrent.TimeoutException;

 /**
- * Per-application configuration management.
- * Agents fetch config at startup; the UI modifies config which is persisted and pushed to agents via SSE.
+ * Per-application configuration for UI/admin callers. Env comes from the path,
+ * app comes from the path. Agents use {@link AgentConfigController} instead —
+ * env is derived from the JWT there, not spoofable via URL.
 */
@RestController
-@RequestMapping("/api/v1/config")
-@Tag(name = "Application Config", description = "Per-application observability configuration")
+@RequestMapping("/api/v1/environments/{envSlug}")
+@Tag(name = "Application Config", description = "Per-application observability configuration (user-facing)")
 public class ApplicationConfigController {

    private static final Logger log = LoggerFactory.getLogger(ApplicationConfigController.class);
@@ -72,24 +77,28 @@ public class ApplicationConfigController {
        this.sensitiveKeysRepository = sensitiveKeysRepository;
    }

-    @GetMapping
-    @Operation(summary = "List all application configs",
-            description = "Returns stored configurations for all applications")
+    @GetMapping("/config")
+    @Operation(summary = "List application configs in this environment")
    @ApiResponse(responseCode = "200", description = "Configs returned")
-    public ResponseEntity<List<ApplicationConfig>> listConfigs(HttpServletRequest httpRequest) {
-        auditService.log("view_app_configs", AuditCategory.CONFIG, null, null, AuditResult.SUCCESS, httpRequest);
-        return ResponseEntity.ok(configRepository.findAll());
+    public ResponseEntity<List<ApplicationConfig>> listConfigs(@EnvPath Environment env,
+                                                                 HttpServletRequest httpRequest) {
+        auditService.log("view_app_configs", AuditCategory.CONFIG, null,
+                Map.of("environment", env.slug()), AuditResult.SUCCESS, httpRequest);
+        return ResponseEntity.ok(configRepository.findByEnvironment(env.slug()));
    }

-    @GetMapping("/{application}")
-    @Operation(summary = "Get application config",
-            description = "Returns the current configuration for an application with merged sensitive keys.")
+    @GetMapping("/apps/{appSlug}/config")
+    @Operation(summary = "Get application config for this environment",
+            description = "Returns stored config merged with global sensitive keys. "
+                    + "Falls back to defaults if no row is persisted yet.")
    @ApiResponse(responseCode = "200", description = "Config returned")
-    public ResponseEntity<AppConfigResponse> getConfig(@PathVariable String application,
+    public ResponseEntity<AppConfigResponse> getConfig(@EnvPath Environment env,
+                                                        @PathVariable String appSlug,
                                                        HttpServletRequest httpRequest) {
-        auditService.log("view_app_config", AuditCategory.CONFIG, application, null, AuditResult.SUCCESS, httpRequest);
-        ApplicationConfig config = configRepository.findByApplication(application)
-                .orElse(defaultConfig(application));
+        auditService.log("view_app_config", AuditCategory.CONFIG, appSlug,
+                Map.of("environment", env.slug()), AuditResult.SUCCESS, httpRequest);
+        ApplicationConfig config = configRepository.findByApplicationAndEnvironment(appSlug, env.slug())
+                .orElse(defaultConfig(appSlug, env.slug()));

        List<String> globalKeys = sensitiveKeysRepository.find()
                .map(SensitiveKeysConfig::keys)
@@ -99,72 +108,87 @@ public class ApplicationConfigController {
        return ResponseEntity.ok(new AppConfigResponse(config, globalKeys, merged));
    }

-    @PutMapping("/{application}")
-    @Operation(summary = "Update application config",
-            description = "Saves config and pushes CONFIG_UPDATE to all LIVE agents of this application")
-    @ApiResponse(responseCode = "200", description = "Config saved and pushed")
-    public ResponseEntity<ConfigUpdateResponse> updateConfig(@PathVariable String application,
-                                                              @RequestParam(required = false) String environment,
+    @PutMapping("/apps/{appSlug}/config")
+    @Operation(summary = "Update application config for this environment",
+            description = "Saves config. When apply=live (default), also pushes CONFIG_UPDATE to LIVE agents. "
+                    + "When apply=staged, persists without a live push — the next successful deploy applies it.")
+    @ApiResponse(responseCode = "200", description = "Config saved (and pushed if apply=live)")
+    @ApiResponse(responseCode = "400", description = "Unknown apply value (must be 'staged' or 'live')")
+    public ResponseEntity<ConfigUpdateResponse> updateConfig(@EnvPath Environment env,
+                                                              @PathVariable String appSlug,
+                                                              @Parameter(name = "apply",
+                                                                      description = "When to apply: 'live' (default) saves and pushes CONFIG_UPDATE to live agents immediately; 'staged' saves without pushing — the next successful deploy applies it.")
+                                                              @RequestParam(name = "apply", defaultValue = "live") String apply,
                                                              @RequestBody ApplicationConfig config,
                                                              Authentication auth,
                                                              HttpServletRequest httpRequest) {
+        if (!"staged".equalsIgnoreCase(apply) && !"live".equalsIgnoreCase(apply)) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
+                    "Unknown apply value '" + apply + "' — must be 'staged' or 'live'");
+        }
+
        String updatedBy = auth != null ? auth.getName() : "system";

-        config.setApplication(application);
-        ApplicationConfig saved = configRepository.save(application, config, updatedBy);
+        config.setApplication(appSlug);
+        ApplicationConfig saved = configRepository.save(appSlug, env.slug(), config, updatedBy);

-        // Merge global + per-app sensitive keys for the SSE push payload
        List<String> globalKeys = sensitiveKeysRepository.find()
                .map(SensitiveKeysConfig::keys)
                .orElse(null);
        List<String> perAppKeys = extractSensitiveKeys(saved);
        List<String> mergedKeys = SensitiveKeysMerger.merge(globalKeys, perAppKeys);

-        // Push with merged sensitive keys injected into the payload
-        CommandGroupResponse pushResult = pushConfigToAgentsWithMergedKeys(application, environment, saved, mergedKeys);
-        log.info("Config v{} saved for '{}', pushed to {} agent(s), {} responded",
-                saved.getVersion(), application, pushResult.total(), pushResult.responded());
+        CommandGroupResponse pushResult;
+        if ("staged".equalsIgnoreCase(apply)) {
+            pushResult = new CommandGroupResponse(true, 0, 0, List.of(), List.of());
+            log.info("Config v{} staged for '{}' (no live push)", saved.getVersion(), appSlug);
+        } else {
+            pushResult = pushConfigToAgentsWithMergedKeys(appSlug, env.slug(), saved, mergedKeys);
+            log.info("Config v{} saved for '{}', pushed to {} agent(s), {} responded",
+                    saved.getVersion(), appSlug, pushResult.total(), pushResult.responded());
+        }

-        auditService.log("update_app_config", AuditCategory.CONFIG, application,
-                Map.of("version", saved.getVersion(), "agentsPushed", pushResult.total(),
-                        "responded", pushResult.responded(), "timedOut", pushResult.timedOut().size()),
+        auditService.log(
+                "staged".equalsIgnoreCase(apply) ? "stage_app_config" : "update_app_config",
+                AuditCategory.CONFIG, appSlug,
+                Map.of("environment", env.slug(), "version", saved.getVersion(),
+                        "apply", apply.toLowerCase(),
+                        "agentsPushed", pushResult.total(),
+                        "responded", pushResult.responded(),
+                        "timedOut", pushResult.timedOut().size()),
                AuditResult.SUCCESS, httpRequest);

        return ResponseEntity.ok(new ConfigUpdateResponse(saved, pushResult));
    }

-    @GetMapping("/{application}/processor-routes")
-    @Operation(summary = "Get processor to route mapping",
-            description = "Returns a map of processorId → routeId for all processors seen in this application")
+    @GetMapping("/apps/{appSlug}/processor-routes")
+    @Operation(summary = "Get processor to route mapping for this environment",
+            description = "Returns a map of processorId → routeId for all processors seen in this application + environment")
    @ApiResponse(responseCode = "200", description = "Mapping returned")
-    public ResponseEntity<Map<String, String>> getProcessorRouteMapping(@PathVariable String application) {
-        return ResponseEntity.ok(diagramStore.findProcessorRouteMapping(application));
+    public ResponseEntity<Map<String, String>> getProcessorRouteMapping(@EnvPath Environment env,
+                                                                          @PathVariable String appSlug) {
+        return ResponseEntity.ok(diagramStore.findProcessorRouteMapping(appSlug, env.slug()));
    }

-    @PostMapping("/{application}/test-expression")
-    @Operation(summary = "Test a tap expression against sample data via a live agent")
+    @PostMapping("/apps/{appSlug}/config/test-expression")
+    @Operation(summary = "Test a tap expression against sample data via a live agent in this environment")
    @ApiResponse(responseCode = "200", description = "Expression evaluated successfully")
-    @ApiResponse(responseCode = "404", description = "No live agent available for this application")
+    @ApiResponse(responseCode = "404", description = "No live agent available for this application in this environment")
    @ApiResponse(responseCode = "504", description = "Agent did not respond in time")
    public ResponseEntity<TestExpressionResponse> testExpression(
-            @PathVariable String application,
-            @RequestParam(required = false) String environment,
+            @EnvPath Environment env,
+            @PathVariable String appSlug,
            @RequestBody TestExpressionRequest request) {
-        // Find a LIVE agent for this application, optionally filtered by environment
-        var candidates = registryService.findAll().stream()
-                .filter(a -> application.equals(a.applicationId()))
-                .filter(a -> a.state() == AgentState.LIVE);
-        if (environment != null) {
-            candidates = candidates.filter(a -> environment.equals(a.environmentId()));
-        }
-        AgentInfo agent = candidates.findFirst().orElse(null);
+        AgentInfo agent = registryService.findByApplicationAndEnvironment(appSlug, env.slug()).stream()
+                .filter(a -> a.state() == AgentState.LIVE)
+                .findFirst()
+                .orElse(null);

        if (agent == null) {
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
-                    .body(new TestExpressionResponse(null, "No live agent available for application: " + application));
+                    .body(new TestExpressionResponse(null, "No live agent available for application: " + appSlug));
        }

-        // Build payload JSON
        String payloadJson;
        try {
            payloadJson = objectMapper.writeValueAsString(Map.of(
@@ -179,7 +203,6 @@ public class ApplicationConfigController {
                    .body(new TestExpressionResponse(null, "Failed to serialize request"));
        }

-        // Send command and await reply
        CompletableFuture<CommandReply> future = registryService.addCommandWithReply(
                agent.instanceId(), CommandType.TEST_EXPRESSION, payloadJson);

@@ -201,10 +224,6 @@ public class ApplicationConfigController {
        }
    }

-    /**
-     * Extracts sensitiveKeys from ApplicationConfig via JsonNode to avoid compile-time
-     * dependency on getSensitiveKeys() which may not be in the published cameleer-common jar yet.
-     */
    private List<String> extractSensitiveKeys(ApplicationConfig config) {
        try {
            com.fasterxml.jackson.databind.JsonNode node = objectMapper.valueToTree(config);
@@ -218,14 +237,10 @@ public class ApplicationConfigController {
        }
    }

-    /**
-     * Push config to agents with merged sensitive keys injected into the JSON payload.
-     */
    private CommandGroupResponse pushConfigToAgentsWithMergedKeys(String application, String environment,
                                                                   ApplicationConfig config, List<String> mergedKeys) {
        String payloadJson;
        try {
-            // Serialize config to a mutable map, inject merged keys
            @SuppressWarnings("unchecked")
            Map<String, Object> configMap = objectMapper.convertValue(config, Map.class);
            configMap.put("sensitiveKeys", mergedKeys);
@@ -271,9 +286,10 @@ public class ApplicationConfigController {
        return new CommandGroupResponse(allSuccess, futures.size(), responses.size(), responses, timedOut);
    }

-    private static ApplicationConfig defaultConfig(String application) {
+    static ApplicationConfig defaultConfig(String application, String environment) {
        ApplicationConfig config = new ApplicationConfig();
        config.setApplication(application);
+        config.setEnvironment(environment);
        config.setVersion(0);
        config.setMetricsEnabled(true);
        config.setSamplingRate(1.0);
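
Reviewer sketch (not part of the patch): `updateConfig` above compares the `apply` query parameter against the strings `staged` and `live` in three separate places. One possible alternative, shown here only as an illustration, is to parse the value once into an enum. `ApplyMode`, `parse`, and `pushesToAgents` are hypothetical names; the accepted values and the `live` default come from the diff.

```java
import java.util.Locale;

// Hypothetical helper; the patch itself works with the raw String.
enum ApplyMode {
    LIVE, STAGED;

    /**
     * Parses the query parameter case-insensitively.
     * A missing or empty value maps to LIVE, matching defaultValue = "live".
     */
    static ApplyMode parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            return LIVE;
        }
        switch (raw.toLowerCase(Locale.ROOT)) {
            case "live":
                return LIVE;
            case "staged":
                return STAGED;
            default:
                throw new IllegalArgumentException(
                        "Unknown apply value '" + raw + "', must be 'staged' or 'live'");
        }
    }

    /** Only LIVE triggers a CONFIG_UPDATE push; STAGED waits for the next deploy. */
    boolean pushesToAgents() {
        return this == LIVE;
    }
}
```

A controller could translate the `IllegalArgumentException` into the 400 response documented above. Lowercasing with `Locale.ROOT` keeps the result independent of the server's default locale.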
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/CatalogController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/CatalogController.java
@@ -11,6 +11,8 @@ import com.cameleer.server.core.agent.AgentState;
 import com.cameleer.server.core.agent.RouteStateRegistry;
 import com.cameleer.server.core.runtime.*;
 import com.cameleer.server.core.storage.DiagramStore;
+import com.cameleer.server.core.storage.RouteCatalogEntry;
+import com.cameleer.server.core.storage.RouteCatalogStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
@@ -47,6 +49,7 @@ public class CatalogController {
    private final EnvironmentService envService;
    private final DeploymentRepository deploymentRepo;
    private final TenantProperties tenantProperties;
+    private final RouteCatalogStore routeCatalogStore;

    @Value("${cameleer.server.catalog.discoveryttldays:7}")
    private int discoveryTtlDays;
@@ -58,7 +61,8 @@ public class CatalogController {
                             AppService appService,
                             EnvironmentService envService,
                             DeploymentRepository deploymentRepo,
-                             TenantProperties tenantProperties) {
+                             TenantProperties tenantProperties,
+                             RouteCatalogStore routeCatalogStore) {
        this.registryService = registryService;
        this.diagramStore = diagramStore;
        this.jdbc = jdbc;
@@ -67,6 +71,7 @@ public class CatalogController {
        this.envService = envService;
        this.deploymentRepo = deploymentRepo;
        this.tenantProperties = tenantProperties;
+        this.routeCatalogStore = routeCatalogStore;
    }

    @GetMapping
@@ -154,6 +159,20 @@ public class CatalogController {
            }
        }

+        // Merge routes from persistent catalog (covers routes with 0 executions
+        // and routes from previous app versions within the selected time window)
+        try {
+            List<RouteCatalogEntry> catalogEntries = (environment != null && !environment.isBlank())
+                    ? routeCatalogStore.findByEnvironment(environment, rangeFrom, rangeTo)
+                    : routeCatalogStore.findAll(rangeFrom, rangeTo);
+            for (RouteCatalogEntry entry : catalogEntries) {
+                routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
+                           .add(entry.routeId());
+            }
+        } catch (Exception e) {
+            log.warn("Failed to query route catalog: {}", e.getMessage());
+        }
+
        // 7. Build unified catalog
        Set<String> allSlugs = new LinkedHashSet<>(appsBySlug.keySet());
        allSlugs.addAll(agentsByApp.keySet());
@@ -177,7 +196,16 @@ public class CatalogController {
            }

            Set<String> routeIds = routesByApp.getOrDefault(slug, Set.of());
-            List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
+
+            // Resolve the env slug for this row early so fromUri can survive
+            // cross-env queries (env==null) against managed apps.
+            String rowEnvSlug = envSlug;
+            if (app != null && rowEnvSlug.isEmpty()) {
+                try {
+                    rowEnvSlug = envService.getById(app.environmentId()).slug();
+                } catch (Exception ignored) {}
+            }
+            final String resolvedEnvSlug = rowEnvSlug;

            // Routes
            List<RouteSummary> routeSummaries = routeIds.stream()
@@ -185,7 +213,7 @@ public class CatalogController {
                        String key = slug + "/" + routeId;
                        long count = routeExchangeCounts.getOrDefault(key, 0L);
                        Instant lastSeen = routeLastSeen.get(key);
-                        String fromUri = resolveFromEndpointUri(routeId, agentIds);
+                        String fromUri = resolveFromEndpointUri(slug, routeId, resolvedEnvSlug);
                        String state = routeStateRegistry.getState(slug, routeId).name().toLowerCase();
                        String routeState = "started".equals(state) ? null : state;
                        return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
@@ -239,15 +267,9 @@ public class CatalogController {
            String healthTooltip = buildHealthTooltip(app != null, deployStatus, agentHealth, agents.size());

            String displayName = app != null ? app.displayName() : slug;
-            String appEnvSlug = envSlug;
-            if (app != null && appEnvSlug.isEmpty()) {
-                try {
-                    appEnvSlug = envService.getById(app.environmentId()).slug();
-                } catch (Exception ignored) {}
-            }

            catalog.add(new CatalogApp(
-                    slug, displayName, app != null, appEnvSlug,
+                    slug, displayName, app != null, resolvedEnvSlug,
                    health, healthTooltip, agents.size(), routeSummaries, agentSummaries,
                    totalExchanges, deploymentSummary
            ));
@@ -256,8 +278,11 @@ public class CatalogController {
        return ResponseEntity.ok(catalog);
    }

-    private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
-        return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
+    private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
+        if (environment == null || environment.isBlank()) {
+            return null;
+        }
+        return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
                .flatMap(diagramStore::findByContentHash)
                .map(RouteGraph::getRoot)
                .map(root -> root.getEndpointUri())
@@ -331,6 +356,7 @@ public class CatalogController {

        // Delete ClickHouse data
        deleteClickHouseData(tenantId, applicationId);
+        routeCatalogStore.deleteByApplication(applicationId);

        // Delete managed app if exists (PostgreSQL)
        try {
@@ -348,7 +374,7 @@ public class CatalogController {
        String[] tablesWithAppId = {
            "executions", "processor_executions", "route_diagrams", "agent_events",
            "stats_1m_app", "stats_1m_route", "stats_1m_processor_type", "stats_1m_processor",
-            "stats_1m_processor_detail"
+            "stats_1m_processor_detail", "route_catalog"
        };
        for (String table : tablesWithAppId) {
            try {
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ClickHouseAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ClickHouseAdminController.java
@@ -4,8 +4,6 @@ import com.cameleer.server.app.dto.ClickHousePerformanceResponse;
 import com.cameleer.server.app.dto.ClickHouseQueryInfo;
 import com.cameleer.server.app.dto.ClickHouseStatusResponse;
 import com.cameleer.server.app.dto.ClickHouseTableInfo;
-import com.cameleer.server.app.dto.IndexerPipelineResponse;
-import com.cameleer.server.core.indexing.SearchIndexerStats;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import org.springframework.beans.factory.annotation.Qualifier;
@@ -31,15 +29,12 @@ import java.util.List;
 public class ClickHouseAdminController {

    private final JdbcTemplate clickHouseJdbc;
-    private final SearchIndexerStats indexerStats;
    private final String clickHouseUrl;

    public ClickHouseAdminController(
            @Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc,
-            SearchIndexerStats indexerStats,
            @Value("${cameleer.server.clickhouse.url:}") String clickHouseUrl) {
        this.clickHouseJdbc = clickHouseJdbc;
-        this.indexerStats = indexerStats;
        this.clickHouseUrl = clickHouseUrl;
    }

@@ -157,16 +152,4 @@ public class ClickHouseAdminController {
        }
    }

-    @GetMapping("/pipeline")
-    @Operation(summary = "Search indexer pipeline statistics")
-    public IndexerPipelineResponse getPipeline() {
-        return new IndexerPipelineResponse(
-                indexerStats.getQueueDepth(),
-                indexerStats.getMaxQueueSize(),
-                indexerStats.getFailedCount(),
-                indexerStats.getIndexedCount(),
-                indexerStats.getDebounceMs(),
-                indexerStats.getIndexingRate(),
-                indexerStats.getLastIndexedAt());
-    }
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DeploymentController.java
@@ -1,31 +1,48 @@
 package com.cameleer.server.app.controller;

 import com.cameleer.server.app.runtime.DeploymentExecutor;
-import com.cameleer.server.core.runtime.*;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.admin.AuditCategory;
+import com.cameleer.server.core.admin.AuditResult;
+import com.cameleer.server.core.admin.AuditService;
+import com.cameleer.server.core.runtime.App;
+import com.cameleer.server.core.runtime.AppService;
+import com.cameleer.server.core.runtime.AppVersion;
+import com.cameleer.server.core.runtime.AppVersionRepository;
+import com.cameleer.server.core.runtime.Deployment;
+import com.cameleer.server.core.runtime.DeploymentService;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentService;
+import com.cameleer.server.core.runtime.RuntimeOrchestrator;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import jakarta.servlet.http.HttpServletRequest;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.security.core.context.SecurityContextHolder;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.PathVariable;
 import org.springframework.web.bind.annotation.PostMapping;
 import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;

 import java.util.List;
+import java.util.Map;
 import java.util.UUID;
 import java.util.stream.Collectors;

 /**
- * Deployment management: deploy, stop, promote, and view logs.
- * All app-scoped endpoints accept the app slug (not UUID) as path variable.
- * Protected by {@code ROLE_OPERATOR} or {@code ROLE_ADMIN}.
+ * Deployment management. Env + app come from the URL. Promote is inherently
+ * cross-env, so the target environment stays explicit in the request body
+ * (as a slug).
 */
 @RestController
-@RequestMapping("/api/v1/apps/{appSlug}/deployments")
-@Tag(name = "Deployment Management", description = "Deploy, stop, restart, promote, and view logs")
+@RequestMapping("/api/v1/environments/{envSlug}/apps/{appSlug}/deployments")
+@Tag(name = "Deployment Management", description = "Deploy, stop, promote, and view logs")
 @PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")
 public class DeploymentController {

@@ -33,23 +50,32 @@ public class DeploymentController {
    private final DeploymentExecutor deploymentExecutor;
    private final RuntimeOrchestrator orchestrator;
    private final AppService appService;
+    private final EnvironmentService environmentService;
+    private final AuditService auditService;
+    private final AppVersionRepository appVersionRepository;

    public DeploymentController(DeploymentService deploymentService,
                                 DeploymentExecutor deploymentExecutor,
                                 RuntimeOrchestrator orchestrator,
-                                 AppService appService) {
+                                 AppService appService,
+                                 EnvironmentService environmentService,
+                                 AuditService auditService,
+                                 AppVersionRepository appVersionRepository) {
        this.deploymentService = deploymentService;
        this.deploymentExecutor = deploymentExecutor;
        this.orchestrator = orchestrator;
        this.appService = appService;
+        this.environmentService = environmentService;
+        this.auditService = auditService;
+        this.appVersionRepository = appVersionRepository;
    }

    @GetMapping
-    @Operation(summary = "List deployments for an app")
+    @Operation(summary = "List deployments for this app in this environment")
    @ApiResponse(responseCode = "200", description = "Deployment list returned")
-    public ResponseEntity<List<Deployment>> listDeployments(@PathVariable String appSlug) {
+    public ResponseEntity<List<Deployment>> listDeployments(@EnvPath Environment env, @PathVariable String appSlug) {
        try {
-            App app = appService.getBySlug(appSlug);
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
            return ResponseEntity.ok(deploymentService.listByApp(app.id()));
        } catch (IllegalArgumentException e) {
            return ResponseEntity.notFound().build();
@@ -60,7 +86,9 @@ public class DeploymentController {
    @Operation(summary = "Get deployment by ID")
    @ApiResponse(responseCode = "200", description = "Deployment found")
    @ApiResponse(responseCode = "404", description = "Deployment not found")
-    public ResponseEntity<Deployment> getDeployment(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
+    public ResponseEntity<Deployment> getDeployment(@EnvPath Environment env,
+                                                     @PathVariable String appSlug,
+                                                     @PathVariable UUID deploymentId) {
        try {
            return ResponseEntity.ok(deploymentService.getById(deploymentId));
        } catch (IllegalArgumentException e) {
@@ -69,15 +97,29 @@ public class DeploymentController {
    }

    @PostMapping
-    @Operation(summary = "Create and start a new deployment")
+    @Operation(summary = "Create and start a new deployment for this app in this environment")
    @ApiResponse(responseCode = "202", description = "Deployment accepted and starting")
-    public ResponseEntity<Deployment> deploy(@PathVariable String appSlug, @RequestBody DeployRequest request) {
+    public ResponseEntity<Deployment> deploy(@EnvPath Environment env,
+                                              @PathVariable String appSlug,
+                                              @RequestBody DeployRequest request,
+                                              HttpServletRequest httpRequest) {
        try {
-            App app = appService.getBySlug(appSlug);
-            Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), request.environmentId());
+            App app = appService.getByEnvironmentAndSlug(env.id(), appSlug);
+            AppVersion appVersion = appVersionRepository.findById(request.appVersionId())
+                    .orElseThrow(() -> new IllegalArgumentException("AppVersion not found: " + request.appVersionId()));
+            Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), env.id(), currentUserId());
            deploymentExecutor.executeAsync(deployment);
+            auditService.log("deploy_app", AuditCategory.DEPLOYMENT, deployment.id().toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(),
+                            "appVersionId", request.appVersionId().toString(),
+                            "jarFilename", appVersion.jarFilename() != null ? appVersion.jarFilename() : "",
+                            "version", appVersion.version()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.accepted().body(deployment);
        } catch (IllegalArgumentException e) {
+            auditService.log("deploy_app", AuditCategory.DEPLOYMENT, null,
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", String.valueOf(e.getMessage())),
+                    AuditResult.FAILURE, httpRequest);
            return ResponseEntity.notFound().build();
        }
    }
@@ -86,38 +128,65 @@ public class DeploymentController {
    @Operation(summary = "Stop a running deployment")
    @ApiResponse(responseCode = "200", description = "Deployment stopped")
    @ApiResponse(responseCode = "404", description = "Deployment not found")
-    public ResponseEntity<Deployment> stop(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
+    public ResponseEntity<Deployment> stop(@EnvPath Environment env,
+                                            @PathVariable String appSlug,
+                                            @PathVariable UUID deploymentId,
+                                            HttpServletRequest httpRequest) {
        try {
            Deployment deployment = deploymentService.getById(deploymentId);
            deploymentExecutor.stopDeployment(deployment);
+            auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.ok(deploymentService.getById(deploymentId));
        } catch (IllegalArgumentException e) {
+            auditService.log("stop_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("appSlug", appSlug, "envSlug", env.slug(), "error", String.valueOf(e.getMessage())),
+                    AuditResult.FAILURE, httpRequest);
            return ResponseEntity.notFound().build();
        }
    }

    @PostMapping("/{deploymentId}/promote")
-    @Operation(summary = "Promote deployment to a different environment")
+    @Operation(summary = "Promote this deployment to a different environment",
+            description = "Target environment is specified by slug in the request body. "
+                    + "The same app slug must exist in the target environment (or be created separately first).")
    @ApiResponse(responseCode = "202", description = "Promotion accepted and starting")
-    @ApiResponse(responseCode = "404", description = "Deployment not found")
-    public ResponseEntity<Deployment> promote(@PathVariable String appSlug, @PathVariable UUID deploymentId,
-                                               @RequestBody PromoteRequest request) {
+    @ApiResponse(responseCode = "404", description = "Deployment or target environment not found")
+    public ResponseEntity<?> promote(@EnvPath Environment env,
+                                      @PathVariable String appSlug,
+                                      @PathVariable UUID deploymentId,
+                                      @RequestBody PromoteRequest request,
+                                      HttpServletRequest httpRequest) {
        try {
-            App app = appService.getBySlug(appSlug);
            Deployment source = deploymentService.getById(deploymentId);
-            Deployment promoted = deploymentService.promote(app.id(), source.appVersionId(), request.targetEnvironmentId());
+            Environment targetEnv = environmentService.getBySlug(request.targetEnvironment());
+            // Target must also have the app with the same slug
+            App targetApp = appService.getByEnvironmentAndSlug(targetEnv.id(), appSlug);
+            Deployment promoted = deploymentService.promote(targetApp.id(), source.appVersionId(), targetEnv.id(), currentUserId());
            deploymentExecutor.executeAsync(promoted);
+            auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, promoted.id().toString(),
+                    Map.of("sourceEnv", env.slug(), "targetEnv", request.targetEnvironment(),
+                            "appSlug", appSlug, "appVersionId", source.appVersionId().toString()),
+                    AuditResult.SUCCESS, httpRequest);
            return ResponseEntity.accepted().body(promoted);
        } catch (IllegalArgumentException e) {
-            return ResponseEntity.notFound().build();
+            auditService.log("promote_deployment", AuditCategory.DEPLOYMENT, deploymentId.toString(),
+                    Map.of("sourceEnv", env.slug(), "targetEnv", String.valueOf(request.targetEnvironment()),
+                            "appSlug", appSlug, "error", String.valueOf(e.getMessage())),
+                    AuditResult.FAILURE, httpRequest);
+            return ResponseEntity.status(HttpStatus.NOT_FOUND)
+                    .body(Map.of("error", String.valueOf(e.getMessage())));
        }
    }

    @GetMapping("/{deploymentId}/logs")
-    @Operation(summary = "Get container logs for a deployment")
+    @Operation(summary = "Get container logs for this deployment")
    @ApiResponse(responseCode = "200", description = "Logs returned")
    @ApiResponse(responseCode = "404", description = "Deployment not found or no container")
-    public ResponseEntity<List<String>> getLogs(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
+    public ResponseEntity<List<String>> getLogs(@EnvPath Environment env,
+                                                 @PathVariable String appSlug,
+                                                 @PathVariable UUID deploymentId) {
        try {
            Deployment deployment = deploymentService.getById(deploymentId);
            if (deployment.containerId() == null) {
@@ -130,6 +199,15 @@ public class DeploymentController {
        }
    }

-    public record DeployRequest(UUID appVersionId, UUID environmentId) {}
-    public record PromoteRequest(UUID targetEnvironmentId) {}
+    private String currentUserId() {
+        var auth = SecurityContextHolder.getContext().getAuthentication();
+        if (auth == null || auth.getName() == null) {
+            throw new ResponseStatusException(HttpStatus.UNAUTHORIZED, "No authentication");
+        }
+        String name = auth.getName();
+        return name.startsWith("user:") ? name.substring(5) : name;
+    }
+
+    public record DeployRequest(UUID appVersionId) {}
+    public record PromoteRequest(String targetEnvironment) {}
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramController.java
@@ -50,11 +50,13 @@ public class DiagramController {
    @ApiResponse(responseCode = "202", description = "Data accepted for processing")
    public ResponseEntity<Void> ingestDiagrams(@RequestBody String body) throws JsonProcessingException {
        String instanceId = extractAgentId();
-        String applicationId = resolveApplicationId(instanceId);
+        AgentInfo agent = registryService.findById(instanceId);
+        String applicationId = agent != null ? agent.applicationId() : "";
+        String environment = agent != null ? agent.environmentId() : "";
        List<RouteGraph> graphs = parsePayload(body);

        for (RouteGraph graph : graphs) {
-            ingestionService.ingestDiagram(new TaggedDiagram(instanceId, applicationId, graph));
+            ingestionService.ingestDiagram(new TaggedDiagram(instanceId, applicationId, environment, graph));
        }

        return ResponseEntity.accepted().build();
@@ -65,11 +67,6 @@ public class DiagramController {
        return auth != null ? auth.getName() : "";
    }

-    private String resolveApplicationId(String instanceId) {
-        AgentInfo agent = registryService.findById(instanceId);
-        return agent != null ? agent.applicationId() : "";
-    }
-
    private List<RouteGraph> parsePayload(String body) throws JsonProcessingException {
        String trimmed = body.strip();
        if (trimmed.startsWith("[")) {
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramRenderController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/DiagramRenderController.java
@@ -1,10 +1,10 @@
 package com.cameleer.server.app.controller;

 import com.cameleer.common.graph.RouteGraph;
-import com.cameleer.server.core.agent.AgentInfo;
-import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.diagram.DiagramLayout;
 import com.cameleer.server.core.diagram.DiagramRenderer;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.storage.DiagramStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.media.Content;
@@ -16,24 +16,22 @@ import org.springframework.http.MediaType;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.PathVariable;
-import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;

-import java.util.List;
 import java.util.Optional;

 /**
- * REST endpoint for rendering route diagrams.
+ * Diagram rendering and lookup.
 * <p>
- * Supports content negotiation via Accept header:
- * <ul>
- *   <li>{@code image/svg+xml} or default: returns SVG document</li>
- *   <li>{@code application/json}: returns JSON layout with node positions</li>
- * </ul>
+ * Content-addressed rendering stays flat at /api/v1/diagrams/{contentHash}/render:
+ * the hash is globally unique, permalinks are valuable, and no env partitioning
+ * is possible or needed.
+ * <p>
+ * By-app-and-route lookup is env-scoped at
+ * /api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram.
 */
 @RestController
-@RequestMapping("/api/v1/diagrams")
 @Tag(name = "Diagrams", description = "Diagram rendering endpoints")
 public class DiagramRenderController {

@@ -41,19 +39,17 @@ public class DiagramRenderController {

    private final DiagramStore diagramStore;
    private final DiagramRenderer diagramRenderer;
-    private final AgentRegistryService registryService;

    public DiagramRenderController(DiagramStore diagramStore,
-                                    DiagramRenderer diagramRenderer,
-                                    AgentRegistryService registryService) {
+                                    DiagramRenderer diagramRenderer) {
        this.diagramStore = diagramStore;
        this.diagramRenderer = diagramRenderer;
-        this.registryService = registryService;
    }

-    @GetMapping("/{contentHash}/render")
-    @Operation(summary = "Render a route diagram",
-            description = "Returns SVG (default) or JSON layout based on Accept header")
+    @GetMapping("/api/v1/diagrams/{contentHash}/render")
+    @Operation(summary = "Render a route diagram by content hash",
+            description = "Returns SVG (default) or JSON layout based on Accept header. "
+                    + "Content hashes are globally unique, so this endpoint is intentionally flat (no env).")
    @ApiResponse(responseCode = "200", description = "Diagram rendered successfully",
            content = {
                    @Content(mediaType = "image/svg+xml", schema = @Schema(type = "string")),
@@ -73,9 +69,6 @@ public class DiagramRenderController {
        RouteGraph graph = graphOpt.get();
        String accept = request.getHeader("Accept");

-        // Return JSON only when the client explicitly requests application/json
-        // without also accepting everything (*/*). This means "application/json"
-        // must appear and wildcards must not dominate the preference.
        if (accept != null && isJsonPreferred(accept)) {
            DiagramLayout layout = diagramRenderer.layoutJson(graph, direction);
            return ResponseEntity.ok()
@@ -83,31 +76,24 @@ public class DiagramRenderController {
                    .body(layout);
        }

-        // Default to SVG for image/svg+xml, */* or no Accept header
        String svg = diagramRenderer.renderSvg(graph);
        return ResponseEntity.ok()
                .contentType(SVG_MEDIA_TYPE)
                .body(svg);
    }

-    @GetMapping
-    @Operation(summary = "Find diagram by application and route ID",
-            description = "Resolves application to agent IDs and finds the latest diagram for the route")
+    @GetMapping("/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram")
+    @Operation(summary = "Find the latest diagram for this app's route in this environment",
+            description = "Returns the most recently stored diagram for (app, env, route). Independent of the "
+                    + "agent registry, so routes removed from the current app version still resolve.")
    @ApiResponse(responseCode = "200", description = "Diagram layout returned")
-    @ApiResponse(responseCode = "404", description = "No diagram found for the given application and route")
-    public ResponseEntity<DiagramLayout> findByApplicationAndRoute(
-            @RequestParam String application,
-            @RequestParam String routeId,
+    @ApiResponse(responseCode = "404", description = "No diagram found")
+    public ResponseEntity<DiagramLayout> findByAppAndRoute(
+            @EnvPath Environment env,
+            @PathVariable String appSlug,
+            @PathVariable String routeId,
            @RequestParam(defaultValue = "LR") String direction) {
-        List<String> agentIds = registryService.findByApplication(application).stream()
-                .map(AgentInfo::instanceId)
-                .toList();
-
-        if (agentIds.isEmpty()) {
-            return ResponseEntity.notFound().build();
-        }
-
-        Optional<String> contentHash = diagramStore.findContentHashForRouteByAgents(routeId, agentIds);
+        Optional<String> contentHash = diagramStore.findLatestContentHashForAppRoute(appSlug, routeId, env.slug());
        if (contentHash.isEmpty()) {
            return ResponseEntity.notFound().build();
        }
@@ -121,14 +107,6 @@ public class DiagramRenderController {
        return ResponseEntity.ok(layout);
    }

-    /**
-     * Determine if JSON is the explicitly preferred format.
-     * <p>
-     * Returns true only when the first media type in the Accept header is
-     * "application/json". Clients sending broad Accept lists like
-     * "text/plain, application/json, *&#47;*" are treated as unspecific
-     * and receive the SVG default.
-     */
    private boolean isJsonPreferred(String accept) {
        String[] parts = accept.split(",");
        if (parts.length == 0) return false;
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EnvironmentAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EnvironmentAdminController.java
@@ -1,6 +1,7 @@
 package com.cameleer.server.app.controller;

 import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentColor;
 import com.cameleer.server.core.runtime.EnvironmentService;
 import com.cameleer.server.core.runtime.RuntimeType;
 import io.swagger.v3.oas.annotations.Operation;
@@ -12,7 +13,6 @@ import org.springframework.web.bind.annotation.*;

 import java.util.List;
 import java.util.Map;
-import java.util.UUID;

 @RestController
 @RequestMapping("/api/v1/admin/environments")
@@ -33,13 +33,13 @@ public class EnvironmentAdminController {
        return ResponseEntity.ok(environmentService.listAll());
    }

-    @GetMapping("/{id}")
-    @Operation(summary = "Get environment by ID")
+    @GetMapping("/{envSlug}")
+    @Operation(summary = "Get environment by slug")
    @ApiResponse(responseCode = "200", description = "Environment found")
    @ApiResponse(responseCode = "404", description = "Environment not found")
-    public ResponseEntity<Environment> getEnvironment(@PathVariable UUID id) {
+    public ResponseEntity<Environment> getEnvironment(@PathVariable String envSlug) {
        try {
-            return ResponseEntity.ok(environmentService.getById(id));
+            return ResponseEntity.ok(environmentService.getBySlug(envSlug));
        } catch (IllegalArgumentException e) {
            return ResponseEntity.notFound().build();
        }
@@ -48,24 +48,34 @@ public class EnvironmentAdminController {
    @PostMapping
    @Operation(summary = "Create a new environment")
    @ApiResponse(responseCode = "201", description = "Environment created")
-    @ApiResponse(responseCode = "400", description = "Slug already exists")
+    @ApiResponse(responseCode = "400", description = "Invalid slug or slug already exists")
    public ResponseEntity<?> createEnvironment(@RequestBody CreateEnvironmentRequest request) {
        try {
-            UUID id = environmentService.create(request.slug(), request.displayName(), request.production());
-            return ResponseEntity.status(201).body(environmentService.getById(id));
+            environmentService.create(request.slug(), request.displayName(), request.production());
+            return ResponseEntity.status(201).body(environmentService.getBySlug(request.slug()));
        } catch (IllegalArgumentException e) {
            return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
        }
    }

-    @PutMapping("/{id}")
-    @Operation(summary = "Update an environment")
+    @PutMapping("/{envSlug}")
+    @Operation(summary = "Update an environment's mutable fields (displayName, production, enabled, color)",
+            description = "Slug is immutable after creation and cannot be changed. "
+                    + "Any slug field in the request body is ignored. "
+                    + "If color is null or absent, the existing color is preserved.")
    @ApiResponse(responseCode = "200", description = "Environment updated")
+    @ApiResponse(responseCode = "400", description = "Unknown color value")
    @ApiResponse(responseCode = "404", description = "Environment not found")
-    public ResponseEntity<?> updateEnvironment(@PathVariable UUID id, @RequestBody UpdateEnvironmentRequest request) {
+    public ResponseEntity<?> updateEnvironment(@PathVariable String envSlug,
+                                                @RequestBody UpdateEnvironmentRequest request) {
        try {
-            environmentService.update(id, request.displayName(), request.production(), request.enabled());
-            return ResponseEntity.ok(environmentService.getById(id));
+            Environment current = environmentService.getBySlug(envSlug);
+            String nextColor = request.color() == null ? current.color() : request.color();
+            if (!EnvironmentColor.isValid(nextColor)) {
+                return ResponseEntity.badRequest().body(Map.of("error", "unknown environment color: " + request.color()));
+            }
+            environmentService.update(current.id(), request.displayName(), request.production(), request.enabled(), nextColor);
+            return ResponseEntity.ok(environmentService.getBySlug(envSlug));
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
                return ResponseEntity.notFound().build();
@@ -74,14 +84,15 @@ public class EnvironmentAdminController {
        }
    }

-    @DeleteMapping("/{id}")
+    @DeleteMapping("/{envSlug}")
    @Operation(summary = "Delete an environment")
    @ApiResponse(responseCode = "204", description = "Environment deleted")
    @ApiResponse(responseCode = "400", description = "Cannot delete default environment")
    @ApiResponse(responseCode = "404", description = "Environment not found")
-    public ResponseEntity<?> deleteEnvironment(@PathVariable UUID id) {
+    public ResponseEntity<?> deleteEnvironment(@PathVariable String envSlug) {
        try {
-            environmentService.delete(id);
+            Environment current = environmentService.getBySlug(envSlug);
+            environmentService.delete(current.id());
            return ResponseEntity.noContent().build();
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
@@ -106,17 +117,18 @@ public class EnvironmentAdminController {
        }
    }

-    @PutMapping("/{id}/default-container-config")
+    @PutMapping("/{envSlug}/default-container-config")
    @Operation(summary = "Update default container config for an environment")
    @ApiResponse(responseCode = "200", description = "Default container config updated")
    @ApiResponse(responseCode = "400", description = "Invalid configuration")
    @ApiResponse(responseCode = "404", description = "Environment not found")
-    public ResponseEntity<?> updateDefaultContainerConfig(@PathVariable UUID id,
+    public ResponseEntity<?> updateDefaultContainerConfig(@PathVariable String envSlug,
                                                           @RequestBody Map<String, Object> defaultContainerConfig) {
        try {
            validateContainerConfig(defaultContainerConfig);
-            environmentService.updateDefaultContainerConfig(id, defaultContainerConfig);
-            return ResponseEntity.ok(environmentService.getById(id));
+            Environment current = environmentService.getBySlug(envSlug);
+            environmentService.updateDefaultContainerConfig(current.id(), defaultContainerConfig);
+            return ResponseEntity.ok(environmentService.getBySlug(envSlug));
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
                return ResponseEntity.notFound().build();
@@ -125,15 +137,16 @@ public class EnvironmentAdminController {
        }
    }

-    @PutMapping("/{id}/jar-retention")
+    @PutMapping("/{envSlug}/jar-retention")
    @Operation(summary = "Update JAR retention policy for an environment")
    @ApiResponse(responseCode = "200", description = "Retention policy updated")
    @ApiResponse(responseCode = "404", description = "Environment not found")
-    public ResponseEntity<?> updateJarRetention(@PathVariable UUID id,
+    public ResponseEntity<?> updateJarRetention(@PathVariable String envSlug,
                                                 @RequestBody JarRetentionRequest request) {
        try {
-            environmentService.updateJarRetentionCount(id, request.jarRetentionCount());
-            return ResponseEntity.ok(environmentService.getById(id));
+            Environment current = environmentService.getBySlug(envSlug);
+            environmentService.updateJarRetentionCount(current.id(), request.jarRetentionCount());
+            return ResponseEntity.ok(environmentService.getBySlug(envSlug));
        } catch (IllegalArgumentException e) {
            if (e.getMessage().contains("not found")) {
                return ResponseEntity.notFound().build();
@@ -143,6 +156,6 @@ public class EnvironmentAdminController {
    }

    public record CreateEnvironmentRequest(String slug, String displayName, boolean production) {}
-    public record UpdateEnvironmentRequest(String displayName, boolean production, boolean enabled) {}
+    public record UpdateEnvironmentRequest(String displayName, boolean production, boolean enabled, String color) {}
    public record JarRetentionRequest(Integer jarRetentionCount) {}
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EventIngestionController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/EventIngestionController.java
@@ -71,9 +71,10 @@ public class EventIngestionController {

        AgentInfo agent = registryService.findById(instanceId);
        String applicationId = agent != null ? agent.applicationId() : "";
+        String environment = agent != null ? agent.environmentId() : null;

        for (AgentEvent event : events) {
-            agentEventService.recordEvent(instanceId, applicationId,
+            agentEventService.recordEvent(instanceId, applicationId, environment,
                    event.getEventType(),
                    event.getDetails() != null ? event.getDetails().toString() : null);

--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ExecutionController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ExecutionController.java
@@ -1,87 +0,0 @@
-package com.cameleer.server.app.controller;
-
-import com.cameleer.common.model.RouteExecution;
-import com.cameleer.server.core.agent.AgentInfo;
-import com.cameleer.server.core.agent.AgentRegistryService;
-import com.cameleer.server.core.ingestion.ChunkAccumulator;
-import com.cameleer.server.core.ingestion.IngestionService;
-import com.fasterxml.jackson.core.JsonProcessingException;
-import com.fasterxml.jackson.core.type.TypeReference;
-import com.fasterxml.jackson.databind.ObjectMapper;
-import io.swagger.v3.oas.annotations.Operation;
-import io.swagger.v3.oas.annotations.responses.ApiResponse;
-import io.swagger.v3.oas.annotations.tags.Tag;
-import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
-import org.springframework.http.ResponseEntity;
-import org.springframework.security.core.Authentication;
-import org.springframework.security.core.context.SecurityContextHolder;
-import org.springframework.web.bind.annotation.PostMapping;
-import org.springframework.web.bind.annotation.RequestBody;
-import org.springframework.web.bind.annotation.RequestMapping;
-import org.springframework.web.bind.annotation.RestController;
-
-import java.util.List;
-
-/**
- * Legacy ingestion endpoint for route execution data (PostgreSQL path).
- * <p>
- * Accepts both single {@link RouteExecution} and arrays. Data is written
- * synchronously to PostgreSQL via {@link IngestionService}.
- * <p>
- * Only active when ClickHouse is disabled — when ClickHouse is enabled,
- * {@link ChunkIngestionController} takes over the {@code /executions} mapping.
- */
-@RestController
-@RequestMapping("/api/v1/data")
-@ConditionalOnMissingBean(ChunkAccumulator.class)
-@Tag(name = "Ingestion", description = "Data ingestion endpoints")
-public class ExecutionController {
-
-    private final IngestionService ingestionService;
-    private final AgentRegistryService registryService;
-    private final ObjectMapper objectMapper;
-
-    public ExecutionController(IngestionService ingestionService,
-                               AgentRegistryService registryService,
-                               ObjectMapper objectMapper) {
-        this.ingestionService = ingestionService;
-        this.registryService = registryService;
-        this.objectMapper = objectMapper;
-    }
-
-    @PostMapping("/executions")
-    @Operation(summary = "Ingest route execution data",
-            description = "Accepts a single RouteExecution or an array of RouteExecutions")
-    @ApiResponse(responseCode = "202", description = "Data accepted for processing")
-    public ResponseEntity<Void> ingestExecutions(@RequestBody String body) throws JsonProcessingException {
-        String instanceId = extractAgentId();
-        String applicationId = resolveApplicationId(instanceId);
-        List<RouteExecution> executions = parsePayload(body);
-
-        for (RouteExecution execution : executions) {
-            ingestionService.ingestExecution(instanceId, applicationId, execution);
-        }
-
-        return ResponseEntity.accepted().build();
-    }
-
-    private String extractAgentId() {
-        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
-        return auth != null ? auth.getName() : "";
-    }
-
-    private String resolveApplicationId(String instanceId) {
-        AgentInfo agent = registryService.findById(instanceId);
-        return agent != null ? agent.applicationId() : "";
-    }
-
-    private List<RouteExecution> parsePayload(String body) throws JsonProcessingException {
-        String trimmed = body.strip();
-        if (trimmed.startsWith("[")) {
-            return objectMapper.readValue(trimmed, new TypeReference<>() {});
-        } else {
-            RouteExecution single = objectMapper.readValue(trimmed, RouteExecution.class);
-            return List.of(single);
-        }
-    }
-}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LogQueryController.java
@@ -2,6 +2,8 @@ package com.cameleer.server.app.controller;

 import com.cameleer.server.app.dto.LogEntryResponse;
 import com.cameleer.server.app.dto.LogSearchPageResponse;
+import com.cameleer.server.app.web.EnvPath;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.search.LogSearchRequest;
 import com.cameleer.server.core.search.LogSearchResponse;
 import com.cameleer.server.core.storage.LogIndex;
@@ -18,8 +20,8 @@ import java.util.Arrays;
 import java.util.List;

 @RestController
-@RequestMapping("/api/v1/logs")
-@Tag(name = "Application Logs", description = "Query application logs")
+@RequestMapping("/api/v1/environments/{envSlug}")
+@Tag(name = "Application Logs", description = "Query application logs (env-scoped)")
 public class LogQueryController {

    private final LogIndex logIndex;
@@ -28,11 +30,12 @@ public class LogQueryController {
        this.logIndex = logIndex;
    }

-    @GetMapping
-    @Operation(summary = "Search application log entries",
-            description = "Returns log entries with cursor-based pagination and level count aggregation. " +
-                    "Supports free-text search, multi-level filtering, and optional application scoping.")
+    @GetMapping("/logs")
+    @Operation(summary = "Search application log entries in this environment",
+            description = "Cursor-paginated log search scoped to the env in the path. "
+                    + "Supports free-text search, multi-level filtering, and optional application/agent scoping.")
    public ResponseEntity<LogSearchPageResponse> searchLogs(
+            @EnvPath Environment env,
            @RequestParam(required = false) String q,
            @RequestParam(required = false) String query,
            @RequestParam(required = false) String level,
@@ -40,8 +43,8 @@ public class LogQueryController {
            @RequestParam(name = "agentId", required = false) String instanceId,
            @RequestParam(required = false) String exchangeId,
            @RequestParam(required = false) String logger,
-            @RequestParam(required = false) String environment,
            @RequestParam(required = false) String source,
+            @RequestParam(required = false) String instanceIds,
            @RequestParam(required = false) String from,
            @RequestParam(required = false) String to,
            @RequestParam(required = false) String cursor,
@@ -51,7 +54,6 @@ public class LogQueryController {
        // q takes precedence over deprecated query param
        String searchText = q != null ? q : query;

-        // Parse CSV levels
        List<String> levels = List.of();
        if (level != null && !level.isEmpty()) {
            levels = Arrays.stream(level.split(","))
@@ -60,12 +62,29 @@ public class LogQueryController {
                    .toList();
        }

+        List<String> sources = List.of();
+        if (source != null && !source.isEmpty()) {
+            sources = Arrays.stream(source.split(","))
+                    .map(String::trim)
+                    .filter(s -> !s.isEmpty())
+                    .toList();
+        }
+
+        List<String> instanceIdList = List.of();
+        if (instanceIds != null && !instanceIds.isEmpty()) {
+            instanceIdList = Arrays.stream(instanceIds.split(","))
+                    .map(String::trim)
+                    .filter(s -> !s.isEmpty())
+                    .toList();
+        }
+
        Instant fromInstant = from != null ? Instant.parse(from) : null;
        Instant toInstant = to != null ? Instant.parse(to) : null;

        LogSearchRequest request = new LogSearchRequest(
                searchText, levels, application, instanceId, exchangeId,
-                logger, environment, source, fromInstant, toInstant, cursor, limit, sort);
+                logger, env.slug(), sources, fromInstant, toInstant, cursor, limit, sort,
+                instanceIdList);

        LogSearchResponse result = logIndex.search(request);

--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteCatalogController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteCatalogController.java
@@ -3,13 +3,16 @@ package com.cameleer.server.app.controller;
 import com.cameleer.server.app.dto.AgentSummary;
 import com.cameleer.server.app.dto.AppCatalogEntry;
 import com.cameleer.server.app.dto.RouteSummary;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.common.graph.RouteGraph;
 import com.cameleer.server.core.agent.AgentInfo;
 import com.cameleer.server.core.agent.AgentRegistryService;
 import com.cameleer.server.core.agent.AgentState;
 import com.cameleer.server.core.agent.RouteStateRegistry;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.storage.DiagramStore;
-import com.cameleer.server.core.storage.StatsStore;
+import com.cameleer.server.core.storage.RouteCatalogEntry;
+import com.cameleer.server.core.storage.RouteCatalogStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
 import io.swagger.v3.oas.annotations.tags.Tag;
@@ -32,8 +35,8 @@ import java.util.Set;
 import java.util.stream.Collectors;

 @RestController
-@RequestMapping("/api/v1/routes")
-@Tag(name = "Route Catalog", description = "Route catalog and discovery")
+@RequestMapping("/api/v1/environments/{envSlug}")
+@Tag(name = "Route Catalog", description = "Route catalog and discovery (env-scoped)")
 public class RouteCatalogController {

    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(RouteCatalogController.class);
@@ -42,39 +45,36 @@ public class RouteCatalogController {
    private final DiagramStore diagramStore;
    private final JdbcTemplate jdbc;
    private final RouteStateRegistry routeStateRegistry;
+    private final RouteCatalogStore routeCatalogStore;

    public RouteCatalogController(AgentRegistryService registryService,
                                   DiagramStore diagramStore,
                                   @org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
-                                   RouteStateRegistry routeStateRegistry) {
+                                   RouteStateRegistry routeStateRegistry,
+                                   RouteCatalogStore routeCatalogStore) {
        this.registryService = registryService;
        this.diagramStore = diagramStore;
        this.jdbc = jdbc;
        this.routeStateRegistry = routeStateRegistry;
+        this.routeCatalogStore = routeCatalogStore;
    }

-    @GetMapping("/catalog")
-    @Operation(summary = "Get route catalog",
-            description = "Returns all applications with their routes, agents, and health status")
+    @GetMapping("/routes")
+    @Operation(summary = "Get route catalog for this environment",
+            description = "Returns all applications with their routes, agents, and health status — filtered to this environment")
    @ApiResponse(responseCode = "200", description = "Catalog returned")
    public ResponseEntity<List<AppCatalogEntry>> getCatalog(
+            @EnvPath Environment env,
            @RequestParam(required = false) String from,
-            @RequestParam(required = false) String to,
-            @RequestParam(required = false) String environment) {
-        List<AgentInfo> allAgents = registryService.findAll();
+            @RequestParam(required = false) String to) {
+        String envSlug = env.slug();
+        List<AgentInfo> allAgents = registryService.findAll().stream()
+                .filter(a -> envSlug.equals(a.environmentId()))
+                .toList();

-        // Filter agents by environment if specified
-        if (environment != null && !environment.isBlank()) {
-            allAgents = allAgents.stream()
-                    .filter(a -> environment.equals(a.environmentId()))
-                    .toList();
-        }
-
-        // Group agents by application name
        Map<String, List<AgentInfo>> agentsByApp = allAgents.stream()
                .collect(Collectors.groupingBy(AgentInfo::applicationId, LinkedHashMap::new, Collectors.toList()));

-        // Collect all distinct routes per app
        Map<String, Set<String>> routesByApp = new LinkedHashMap<>();
        for (var entry : agentsByApp.entrySet()) {
            Set<String> routes = new LinkedHashSet<>();
@@ -86,21 +86,16 @@ public class RouteCatalogController {
            routesByApp.put(entry.getKey(), routes);
        }

-        // Time range for exchange counts — use provided range or default to last 24h
        Instant now = Instant.now();
        Instant rangeFrom = from != null ? Instant.parse(from) : now.minus(24, ChronoUnit.HOURS);
        Instant rangeTo = to != null ? Instant.parse(to) : now;
-        // Route exchange counts from AggregatingMergeTree (literal SQL — ClickHouse JDBC driver
-        // wraps prepared statements in sub-queries that strip AggregateFunction column types)
        Map<String, Long> routeExchangeCounts = new LinkedHashMap<>();
        Map<String, Instant> routeLastSeen = new LinkedHashMap<>();
        try {
-            String envFilter = (environment != null && !environment.isBlank())
-                    ? " AND environment = " + lit(environment) : "";
            jdbc.query(
                    "SELECT application_id, route_id, uniqMerge(total_count) AS cnt, MAX(bucket) AS last_seen " +
                            "FROM stats_1m_route WHERE bucket >= " + lit(rangeFrom) + " AND bucket < " + lit(rangeTo) +
-                            envFilter +
+                            " AND environment = " + lit(envSlug) +
                            " GROUP BY application_id, route_id",
                    rs -> {
                        String key = rs.getString("application_id") + "/" + rs.getString("route_id");
@@ -112,9 +107,6 @@ public class RouteCatalogController {
            log.warn("Failed to query route exchange counts: {}", e.getMessage());
        }

-        // Merge route IDs from ClickHouse stats into routesByApp.
-        // After server restart, auto-healed agents have empty routeIds, but
-        // ClickHouse still has execution data with the correct route IDs.
        for (var countEntry : routeExchangeCounts.entrySet()) {
            String[] parts = countEntry.getKey().split("/", 2);
            if (parts.length == 2) {
@@ -122,7 +114,16 @@ public class RouteCatalogController {
            }
        }

-        // Build catalog entries — merge apps from agent registry + ClickHouse data
+        try {
+            List<RouteCatalogEntry> catalogEntries = routeCatalogStore.findByEnvironment(envSlug, rangeFrom, rangeTo);
+            for (RouteCatalogEntry entry : catalogEntries) {
+                routesByApp.computeIfAbsent(entry.applicationId(), k -> new LinkedHashSet<>())
+                           .add(entry.routeId());
+            }
+        } catch (Exception e) {
+            log.warn("Failed to query route catalog: {}", e.getMessage());
+        }
+
        Set<String> allAppIds = new LinkedHashSet<>(agentsByApp.keySet());
        allAppIds.addAll(routesByApp.keySet());

@@ -130,31 +131,25 @@ public class RouteCatalogController {
        for (String appId : allAppIds) {
            List<AgentInfo> agents = agentsByApp.getOrDefault(appId, List.of());

-            // Routes
            Set<String> routeIds = routesByApp.getOrDefault(appId, Set.of());
-            List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
            List<RouteSummary> routeSummaries = routeIds.stream()
                    .map(routeId -> {
                        String key = appId + "/" + routeId;
                        long count = routeExchangeCounts.getOrDefault(key, 0L);
                        Instant lastSeen = routeLastSeen.get(key);
-                        String fromUri = resolveFromEndpointUri(routeId, agentIds);
+                        String fromUri = resolveFromEndpointUri(appId, routeId, envSlug);
                        String state = routeStateRegistry.getState(appId, routeId).name().toLowerCase();
-                        // Only include non-default states (stopped/suspended); null means started
                        String routeState = "started".equals(state) ? null : state;
                        return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
                    })
                    .toList();

-            // Agent summaries
            List<AgentSummary> agentSummaries = agents.stream()
                    .map(a -> new AgentSummary(a.instanceId(), a.displayName(), a.state().name().toLowerCase(), 0.0))
                    .toList();

-            // Health = worst state among agents
            String health = computeWorstHealth(agents);

-            // Total exchange count for the app
            long totalExchanges = routeSummaries.stream().mapToLong(RouteSummary::exchangeCount).sum();

            catalog.add(new AppCatalogEntry(appId, routeSummaries, agentSummaries,
@@ -164,23 +159,20 @@ public class RouteCatalogController {
        return ResponseEntity.ok(catalog);
    }

-    /** Resolve the from() endpoint URI for a route by looking up its diagram. */
-    private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
-        return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
+    private String resolveFromEndpointUri(String applicationId, String routeId, String environment) {
+        return diagramStore.findLatestContentHashForAppRoute(applicationId, routeId, environment)
                .flatMap(diagramStore::findByContentHash)
                .map(RouteGraph::getRoot)
                .map(root -> root.getEndpointUri())
                .orElse(null);
    }

-    /** Format an Instant as a ClickHouse DateTime literal in UTC. */
    private static String lit(Instant instant) {
        return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(java.time.ZoneOffset.UTC)
                .format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

-    /** Format a string as a ClickHouse SQL literal with backslash + quote escaping. */
    private static String lit(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteMetricsController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/RouteMetricsController.java
@@ -2,8 +2,10 @@ package com.cameleer.server.app.controller;

 import com.cameleer.server.app.dto.ProcessorMetrics;
 import com.cameleer.server.app.dto.RouteMetrics;
+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.admin.AppSettings;
 import com.cameleer.server.core.admin.AppSettingsRepository;
+import com.cameleer.server.core.runtime.Environment;
 import com.cameleer.server.core.storage.StatsStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.responses.ApiResponse;
@@ -15,24 +17,23 @@ import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;

-import java.sql.Timestamp;
 import java.time.Duration;
 import java.time.Instant;
 import java.time.temporal.ChronoUnit;
-import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;

 @RestController
-@RequestMapping("/api/v1/routes")
-@Tag(name = "Route Metrics", description = "Route performance metrics")
+@RequestMapping("/api/v1/environments/{envSlug}/routes")
+@Tag(name = "Route Metrics", description = "Route performance metrics (env-scoped)")
 public class RouteMetricsController {

    private final JdbcTemplate jdbc;
    private final StatsStore statsStore;
    private final AppSettingsRepository appSettingsRepository;

-    public RouteMetricsController(@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc, StatsStore statsStore,
+    public RouteMetricsController(@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
+                                   StatsStore statsStore,
                                   AppSettingsRepository appSettingsRepository) {
        this.jdbc = jdbc;
        this.statsStore = statsStore;
@@ -40,35 +41,32 @@ public class RouteMetricsController {
    }

    @GetMapping("/metrics")
-    @Operation(summary = "Get route metrics",
-            description = "Returns aggregated performance metrics per route for the given time window")
+    @Operation(summary = "Get route metrics for this environment",
+            description = "Returns aggregated performance metrics per route for the given time window. "
+                    + "Optional appId filter narrows to a single application.")
    @ApiResponse(responseCode = "200", description = "Metrics returned")
    public ResponseEntity<List<RouteMetrics>> getMetrics(
+            @EnvPath Environment env,
            @RequestParam(required = false) String from,
            @RequestParam(required = false) String to,
-            @RequestParam(required = false) String appId,
-            @RequestParam(required = false) String environment) {
+            @RequestParam(required = false) String appId) {

        Instant toInstant = to != null ? Instant.parse(to) : Instant.now();
        Instant fromInstant = from != null ? Instant.parse(from) : toInstant.minus(24, ChronoUnit.HOURS);
        long windowSeconds = Duration.between(fromInstant, toInstant).toSeconds();

-        // Literal SQL — ClickHouse JDBC driver wraps prepared statements in sub-queries
-        // that strip AggregateFunction column types, breaking -Merge combinators
        var sql = new StringBuilder(
                "SELECT application_id, route_id, " +
                "uniqMerge(total_count) AS total, " +
                "uniqIfMerge(failed_count) AS failed, " +
                "CASE WHEN uniqMerge(total_count) > 0 THEN toFloat64(sumMerge(duration_sum)) / uniqMerge(total_count) ELSE 0 END AS avg_dur, " +
                "COALESCE(quantileMerge(0.99)(p99_duration), 0) AS p99_dur " +
-                "FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant));
+                "FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
+                " AND environment = " + lit(env.slug()));

        if (appId != null) {
            sql.append(" AND application_id = " + lit(appId));
        }
-        if (environment != null) {
-            sql.append(" AND environment = " + lit(environment));
-        }
        sql.append(" GROUP BY application_id, route_id ORDER BY application_id, route_id");

        List<RouteMetrics> metrics = jdbc.query(sql.toString(), (rs, rowNum) -> {
@@ -87,7 +85,7 @@ public class RouteMetricsController {
                    avgDur, p99Dur, errorRate, tps, List.of(), -1.0);
        });

-        // Fetch sparklines (12 buckets over the time window)
+        // Sparklines
        if (!metrics.isEmpty()) {
            int sparkBuckets = 12;
            long bucketSeconds = Math.max(windowSeconds / sparkBuckets, 60);
@@ -95,15 +93,12 @@ public class RouteMetricsController {
            for (int i = 0; i < metrics.size(); i++) {
                RouteMetrics m = metrics.get(i);
                try {
-                    var sparkWhere = new StringBuilder(
-                            "FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
-                            " AND application_id = " + lit(m.appId()) + " AND route_id = " + lit(m.routeId()));
-                    if (environment != null) {
-                        sparkWhere.append(" AND environment = " + lit(environment));
-                    }
                    String sparkSql = "SELECT toStartOfInterval(bucket, toIntervalSecond(" + bucketSeconds + ")) AS period, " +
                            "COALESCE(uniqMerge(total_count), 0) AS cnt " +
-                            sparkWhere + " GROUP BY period ORDER BY period";
+                            "FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
+                            " AND environment = " + lit(env.slug()) +
+                            " AND application_id = " + lit(m.appId()) + " AND route_id = " + lit(m.routeId()) +
+                            " GROUP BY period ORDER BY period";
                    List<Double> sparkline = jdbc.query(sparkSql,
                            (rs, rowNum) -> rs.getDouble("cnt"));
                    metrics.set(i, new RouteMetrics(m.routeId(), m.appId(), m.exchangeCount(),
@@ -115,15 +110,16 @@ public class RouteMetricsController {
            }
        }

-        // Enrich with SLA compliance per route
+        // SLA compliance
        if (!metrics.isEmpty()) {
-            // Determine SLA threshold (per-app or default)
-            String effectiveAppId = appId != null ? appId : (metrics.isEmpty() ? null : metrics.get(0).appId());
-            int threshold = appSettingsRepository.findByApplicationId(effectiveAppId != null ? effectiveAppId : "")
-                    .map(AppSettings::slaThresholdMs).orElse(300);
+            String effectiveAppId = appId != null ? appId : metrics.get(0).appId();
+            int threshold = effectiveAppId != null
+                    ? appSettingsRepository.findByApplicationAndEnvironment(effectiveAppId, env.slug())
+                            .map(AppSettings::slaThresholdMs).orElse(300)
+                    : 300;

            Map<String, long[]> slaCounts = statsStore.slaCountsByRoute(fromInstant, toInstant,
-                    effectiveAppId, threshold, environment);
+                    effectiveAppId, threshold, env.slug());

            for (int i = 0; i < metrics.size(); i++) {
                RouteMetrics m = metrics.get(i);
@@ -140,24 +136,19 @@ public class RouteMetricsController {
    }

    @GetMapping("/metrics/processors")
-    @Operation(summary = "Get processor metrics",
+    @Operation(summary = "Get processor metrics for this environment",
            description = "Returns aggregated performance metrics per processor for the given route and time window")
    @ApiResponse(responseCode = "200", description = "Metrics returned")
    public ResponseEntity<List<ProcessorMetrics>> getProcessorMetrics(
+            @EnvPath Environment env,
            @RequestParam String routeId,
            @RequestParam(required = false) String appId,
            @RequestParam(required = false) Instant from,
-            @RequestParam(required = false) Instant to,
-            @RequestParam(required = false) String environment) {
+            @RequestParam(required = false) Instant to) {

        Instant toInstant = to != null ? to : Instant.now();
        Instant fromInstant = from != null ? from : toInstant.minus(24, ChronoUnit.HOURS);

-        // Literal SQL for AggregatingMergeTree -Merge combinators.
-        // Aliases (tc, fc) must NOT shadow column names (total_count, failed_count) —
-        // ClickHouse 24.12 new analyzer resolves subsequent uniqMerge(total_count)
-        // to the alias (UInt64) instead of the AggregateFunction column.
-        // total_count/failed_count use uniq(execution_id) to deduplicate repeated inserts.
        var sql = new StringBuilder(
                "SELECT processor_id, processor_type, route_id, application_id, " +
                "uniqMerge(total_count) AS tc, " +
@@ -166,14 +157,12 @@ public class RouteMetricsController {
                "quantileMerge(0.99)(p99_duration) AS p99_duration_ms " +
                "FROM stats_1m_processor_detail " +
                "WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
+                " AND environment = " + lit(env.slug()) +
                " AND route_id = " + lit(routeId));

        if (appId != null) {
            sql.append(" AND application_id = " + lit(appId));
        }
-        if (environment != null) {
-            sql.append(" AND environment = " + lit(environment));
-        }
        sql.append(" GROUP BY processor_id, processor_type, route_id, application_id");
        sql.append(" ORDER BY tc DESC");

@@ -196,14 +185,12 @@ public class RouteMetricsController {
        return ResponseEntity.ok(metrics);
    }

-    /** Format an Instant as a ClickHouse DateTime literal. */
    private static String lit(Instant instant) {
        return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(java.time.ZoneOffset.UTC)
                .format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

-    /** Format a string as a ClickHouse SQL literal with backslash + quote escaping. */
    private static String lit(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
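
Note on the two `lit(...)` helpers above: with their Javadoc removed in this change, a standalone sketch of what they emit may help reviewers. It copies the two method bodies from the diff; the class name `ClickHouseLiteralSketch` and the sample inputs are illustrative assumptions, not part of the codebase.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Illustrative copy of the controller's private helpers (class name is hypothetical).
final class ClickHouseLiteralSketch {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Formats an Instant as a quoted UTC DateTime literal, dropping sub-second precision.
    static String lit(Instant instant) {
        return "'" + FMT.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

    // Quotes a string, escaping backslashes first and then single quotes.
    static String lit(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        System.out.println(lit(Instant.parse("2026-04-23T10:00:00.750Z")));
        System.out.println(lit("o'brien"));
    }
}
```

Escaping the backslash before the quote matters: reversing the two `replace` calls would double the backslash that the quote escape had just inserted.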
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SearchController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SearchController.java
@@ -1,9 +1,10 @@
 package com.cameleer.server.app.controller;

+import com.cameleer.server.app.web.EnvPath;
 import com.cameleer.server.core.admin.AppSettings;
 import com.cameleer.server.core.admin.AppSettingsRepository;
-import com.cameleer.server.core.agent.AgentInfo;
-import com.cameleer.server.core.agent.AgentRegistryService;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.search.AttributeFilter;
 import com.cameleer.server.core.search.ExecutionStats;
 import com.cameleer.server.core.search.ExecutionSummary;
 import com.cameleer.server.core.search.SearchRequest;
@@ -14,6 +15,7 @@ import com.cameleer.server.core.search.TopError;
 import com.cameleer.server.core.storage.StatsStore;
 import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.http.HttpStatus;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.GetMapping;
 import org.springframework.web.bind.annotation.PostMapping;
@@ -21,36 +23,35 @@ import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RequestMapping;
 import org.springframework.web.bind.annotation.RequestParam;
 import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;

 import java.time.Instant;
+import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;

 /**
- * Search endpoints for querying route executions.
- * <p>
- * GET supports basic filters via query parameters. POST accepts a full
- * {@link SearchRequest} JSON body for advanced search with all filter types.
+ * Execution search and stats endpoints. The environment comes from the path
+ * and is always applied as a filter to the underlying ClickHouse queries.
 */
@RestController
-@RequestMapping("/api/v1/search")
-@Tag(name = "Search", description = "Transaction search endpoints")
+@RequestMapping("/api/v1/environments/{envSlug}")
+@Tag(name = "Search", description = "Transaction search and stats (env-scoped)")
 public class SearchController {

    private final SearchService searchService;
-    private final AgentRegistryService registryService;
    private final AppSettingsRepository appSettingsRepository;

-    public SearchController(SearchService searchService, AgentRegistryService registryService,
+    public SearchController(SearchService searchService,
                             AppSettingsRepository appSettingsRepository) {
        this.searchService = searchService;
-        this.registryService = registryService;
        this.appSettingsRepository = appSettingsRepository;
    }

    @GetMapping("/executions")
-    @Operation(summary = "Search executions with basic filters")
+    @Operation(summary = "Search executions with basic filters (env from path)")
    public ResponseEntity<SearchResult<ExecutionSummary>> searchGet(
+            @EnvPath Environment env,
            @RequestParam(required = false) String status,
            @RequestParam(required = false) Instant timeFrom,
            @RequestParam(required = false) Instant timeTo,
@@ -60,13 +61,18 @@ public class SearchController {
            @RequestParam(name = "agentId", required = false) String instanceId,
            @RequestParam(required = false) String processorType,
            @RequestParam(required = false) String application,
-            @RequestParam(required = false) String environment,
+            @RequestParam(name = "attr", required = false) List<String> attr,
            @RequestParam(defaultValue = "0") int offset,
            @RequestParam(defaultValue = "50") int limit,
            @RequestParam(required = false) String sortField,
            @RequestParam(required = false) String sortDir) {

-        List<String> agentIds = resolveApplicationToAgentIds(application);
+        List<AttributeFilter> attributeFilters;
+        try {
+            attributeFilters = parseAttrParams(attr);
+        } catch (IllegalArgumentException e) {
+            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, e.getMessage(), e);
+        }

        SearchRequest request = new SearchRequest(
                status, timeFrom, timeTo,
@@ -74,140 +80,144 @@ public class SearchController {
                correlationId,
                text, null, null, null,
                routeId, instanceId, processorType,
-                application, agentIds,
+                application, null,
                offset, limit,
                sortField, sortDir,
-                environment
+                null,
+                env.slug(),
+                attributeFilters
        );

        return ResponseEntity.ok(searchService.search(request));
    }

-    @PostMapping("/executions")
-    @Operation(summary = "Advanced search with all filters")
-    public ResponseEntity<SearchResult<ExecutionSummary>> searchPost(
-            @RequestBody SearchRequest request) {
-        // Resolve application to agentIds if application is specified but agentIds is not
-        SearchRequest resolved = request;
-        if (request.applicationId() != null && !request.applicationId().isBlank()
-                && (request.instanceIds() == null || request.instanceIds().isEmpty())) {
-            resolved = request.withInstanceIds(resolveApplicationToAgentIds(request.applicationId()));
+    /**
+     * Parses {@code attr} query params of the form {@code key} (key-only) or {@code key:value}
+     * (exact or wildcard via {@code *}). Splits on the first {@code :}; later colons are part of
+     * the value. A null or empty list yields an empty result; blank entries are skipped. Key
+     * validation is delegated to {@link AttributeFilter}'s compact constructor, which throws
+     * {@link IllegalArgumentException} on invalid keys (mapped to 400 by the caller).
+     */
+    static List<AttributeFilter> parseAttrParams(List<String> raw) {
+        if (raw == null || raw.isEmpty()) return List.of();
+        List<AttributeFilter> out = new ArrayList<>(raw.size());
+        for (String entry : raw) {
+            if (entry == null || entry.isBlank()) continue;
+            int colon = entry.indexOf(':');
+            if (colon < 0) {
+                out.add(new AttributeFilter(entry.trim(), null));
+            } else {
+                out.add(new AttributeFilter(entry.substring(0, colon).trim(),
+                        entry.substring(colon + 1)));
+            }
        }
-        return ResponseEntity.ok(searchService.search(resolved));
+        return out;
+    }
+
+    @PostMapping("/executions/search")
+    @Operation(summary = "Advanced search with all filters",
+            description = "Env from the path overrides any environment field in the body.")
+    public ResponseEntity<SearchResult<ExecutionSummary>> searchPost(
+            @EnvPath Environment env,
+            @RequestBody SearchRequest request) {
+        SearchRequest scoped = request.withEnvironment(env.slug());
+        return ResponseEntity.ok(searchService.search(scoped));
    }

    @GetMapping("/stats")
    @Operation(summary = "Aggregate execution stats (P99 latency, active count, SLA compliance)")
    public ResponseEntity<ExecutionStats> stats(
+            @EnvPath Environment env,
            @RequestParam Instant from,
            @RequestParam(required = false) Instant to,
            @RequestParam(required = false) String routeId,
-            @RequestParam(required = false) String application,
-            @RequestParam(required = false) String environment) {
+            @RequestParam(required = false) String application) {
        Instant end = to != null ? to : Instant.now();
        ExecutionStats stats;
        if (routeId == null && application == null) {
-            stats = searchService.stats(from, end, environment);
+            stats = searchService.stats(from, end, env.slug());
        } else if (routeId == null) {
-            stats = searchService.statsForApp(from, end, application, environment);
+            stats = searchService.statsForApp(from, end, application, env.slug());
        } else {
-            List<String> agentIds = resolveApplicationToAgentIds(application);
-            stats = searchService.stats(from, end, routeId, agentIds, environment);
+            stats = searchService.statsForRoute(from, end, routeId, application, env.slug());
        }

-        // Enrich with SLA compliance
-        int threshold = appSettingsRepository
-                .findByApplicationId(application != null ? application : "")
-                .map(AppSettings::slaThresholdMs).orElse(300);
-        double sla = searchService.slaCompliance(from, end, threshold, application, routeId, environment);
+        int threshold = application != null && !application.isBlank()
+                ? appSettingsRepository.findByApplicationAndEnvironment(application, env.slug())
+                        .map(AppSettings::slaThresholdMs).orElse(300)
+                : 300;
+        double sla = searchService.slaCompliance(from, end, threshold, application, routeId, env.slug());
        return ResponseEntity.ok(stats.withSlaCompliance(sla));
    }

    @GetMapping("/stats/timeseries")
    @Operation(summary = "Bucketed time-series stats over a time window")
    public ResponseEntity<StatsTimeseries> timeseries(
+            @EnvPath Environment env,
            @RequestParam Instant from,
            @RequestParam(required = false) Instant to,
            @RequestParam(defaultValue = "24") int buckets,
            @RequestParam(required = false) String routeId,
-            @RequestParam(required = false) String application,
-            @RequestParam(required = false) String environment) {
+            @RequestParam(required = false) String application) {
        Instant end = to != null ? to : Instant.now();
        if (routeId == null && application == null) {
-            return ResponseEntity.ok(searchService.timeseries(from, end, buckets, environment));
+            return ResponseEntity.ok(searchService.timeseries(from, end, buckets, env.slug()));
        }
        if (routeId == null) {
-            return ResponseEntity.ok(searchService.timeseriesForApp(from, end, buckets, application, environment));
+            return ResponseEntity.ok(searchService.timeseriesForApp(from, end, buckets, application, env.slug()));
        }
-        List<String> agentIds = resolveApplicationToAgentIds(application);
-        if (routeId == null && agentIds.isEmpty()) {
-            return ResponseEntity.ok(searchService.timeseries(from, end, buckets, environment));
-        }
-        return ResponseEntity.ok(searchService.timeseries(from, end, buckets, routeId, agentIds, environment));
+        return ResponseEntity.ok(searchService.timeseriesForRoute(from, end, buckets, routeId, application, env.slug()));
    }

    @GetMapping("/stats/timeseries/by-app")
    @Operation(summary = "Timeseries grouped by application")
    public ResponseEntity<Map<String, StatsTimeseries>> timeseriesByApp(
+            @EnvPath Environment env,
            @RequestParam Instant from,
            @RequestParam(required = false) Instant to,
-            @RequestParam(defaultValue = "24") int buckets,
-            @RequestParam(required = false) String environment) {
+            @RequestParam(defaultValue = "24") int buckets) {
        Instant end = to != null ? to : Instant.now();
-        return ResponseEntity.ok(searchService.timeseriesGroupedByApp(from, end, buckets, environment));
+        return ResponseEntity.ok(searchService.timeseriesGroupedByApp(from, end, buckets, env.slug()));
    }

    @GetMapping("/stats/timeseries/by-route")
    @Operation(summary = "Timeseries grouped by route for an application")
    public ResponseEntity<Map<String, StatsTimeseries>> timeseriesByRoute(
+            @EnvPath Environment env,
            @RequestParam Instant from,
            @RequestParam(required = false) Instant to,
            @RequestParam(defaultValue = "24") int buckets,
-            @RequestParam String application,
-            @RequestParam(required = false) String environment) {
+            @RequestParam String application) {
        Instant end = to != null ? to : Instant.now();
-        return ResponseEntity.ok(searchService.timeseriesGroupedByRoute(from, end, buckets, application, environment));
+        return ResponseEntity.ok(searchService.timeseriesGroupedByRoute(from, end, buckets, application, env.slug()));
    }

    @GetMapping("/stats/punchcard")
    @Operation(summary = "Transaction punchcard: weekday x hour grid (rolling 7 days)")
    public ResponseEntity<List<StatsStore.PunchcardCell>> punchcard(
-            @RequestParam(required = false) String application,
-            @RequestParam(required = false) String environment) {
+            @EnvPath Environment env,
+            @RequestParam(required = false) String application) {
        Instant to = Instant.now();
        Instant from = to.minus(java.time.Duration.ofDays(7));
-        return ResponseEntity.ok(searchService.punchcard(from, to, application, environment));
+        return ResponseEntity.ok(searchService.punchcard(from, to, application, env.slug()));
    }

    @GetMapping("/attributes/keys")
-    @Operation(summary = "Distinct attribute key names across all executions")
-    public ResponseEntity<List<String>> attributeKeys() {
-        return ResponseEntity.ok(searchService.distinctAttributeKeys());
+    @Operation(summary = "Distinct attribute key names for this environment")
+    public ResponseEntity<List<String>> attributeKeys(@EnvPath Environment env) {
+        return ResponseEntity.ok(searchService.distinctAttributeKeys(env.slug()));
    }

    @GetMapping("/errors/top")
    @Operation(summary = "Top N errors with velocity trend")
    public ResponseEntity<List<TopError>> topErrors(
+            @EnvPath Environment env,
            @RequestParam Instant from,
            @RequestParam(required = false) Instant to,
            @RequestParam(required = false) String application,
            @RequestParam(required = false) String routeId,
-            @RequestParam(required = false) String environment,
            @RequestParam(defaultValue = "5") int limit) {
        Instant end = to != null ? to : Instant.now();
-        return ResponseEntity.ok(searchService.topErrors(from, end, application, routeId, limit, environment));
-    }
-
-    /**
-     * Resolve an application name to agent IDs.
-     * Returns empty list if application is null/blank (no filtering).
-     */
-    private List<String> resolveApplicationToAgentIds(String application) {
-        if (application == null || application.isBlank()) {
-            return List.of();
-        }
-        return registryService.findByApplication(application).stream()
-                .map(AgentInfo::instanceId)
-                .toList();
+        return ResponseEntity.ok(searchService.topErrors(from, end, application, routeId, limit, env.slug()));
    }
 }
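
Note on `parseAttrParams` above: the split-on-first-colon rule can be tried in isolation with the sketch below. `AttrFilter` is a hypothetical stand-in for `AttributeFilter`, whose real key validation is not shown in this diff; here it is assumed only to reject blank keys.

```java
import java.util.ArrayList;
import java.util.List;

final class AttrParamSketch {

    // Hypothetical stand-in for AttributeFilter; the real validation rules are an unknown here.
    record AttrFilter(String key, String value) {
        AttrFilter {
            if (key == null || key.isBlank()) {
                throw new IllegalArgumentException("attribute key must not be blank");
            }
        }
    }

    // Same control flow as parseAttrParams in the diff: split on the FIRST colon only.
    static List<AttrFilter> parse(List<String> raw) {
        if (raw == null || raw.isEmpty()) return List.of();
        List<AttrFilter> out = new ArrayList<>(raw.size());
        for (String entry : raw) {
            if (entry == null || entry.isBlank()) continue;   // blank entries are skipped
            int colon = entry.indexOf(':');
            if (colon < 0) {
                out.add(new AttrFilter(entry.trim(), null));   // key-only filter
            } else {
                out.add(new AttrFilter(entry.substring(0, colon).trim(),
                        entry.substring(colon + 1)));          // later colons stay in the value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("orderId", "customer:acme*", "url:http://x:8080")));
    }
}
```

Under these assumptions, an entry such as `:value` produces an empty key and is rejected by the constructor, which the controller would surface as a 400.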
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SensitiveKeysAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/SensitiveKeysAdminController.java
@@ -120,31 +120,37 @@ public class SensitiveKeysAdminController {
     * not yet include that field accessor.
     */
    private CommandGroupResponse fanOutToAllAgents(List<String> globalKeys) {
-        // Collect all distinct application IDs
-        Set<String> applications = new LinkedHashSet<>();
-        configRepository.findAll().stream()
-                .map(ApplicationConfig::getApplication)
-                .filter(a -> a != null && !a.isBlank())
-                .forEach(applications::add);
+        // Collect every (application, environment) slice we know about: persisted config rows
+        // PLUS currently-registered live agents (which may have no stored config yet).
+        // Global sensitive keys are server-wide, but per-app overrides live per env, so the
+        // push is scoped per (app, env) and each slice gets its own merged keys.
+        Set<AppEnv> slices = new LinkedHashSet<>();
+        for (ApplicationConfig cfg : configRepository.findAll()) {
+            if (cfg.getApplication() != null && !cfg.getApplication().isBlank()
+                    && cfg.getEnvironment() != null && !cfg.getEnvironment().isBlank()) {
+                slices.add(new AppEnv(cfg.getApplication(), cfg.getEnvironment()));
+            }
+        }
        registryService.findAll().stream()
-                .map(a -> a.applicationId())
-                .filter(a -> a != null && !a.isBlank())
-                .forEach(applications::add);
+                .filter(a -> a.applicationId() != null && !a.applicationId().isBlank()
+                        && a.environmentId() != null && !a.environmentId().isBlank())
+                .forEach(a -> slices.add(new AppEnv(a.applicationId(), a.environmentId())));

-        if (applications.isEmpty()) {
+        if (slices.isEmpty()) {
            return new CommandGroupResponse(true, 0, 0, List.of(), List.of());
        }

-        // Shared 10-second deadline across all applications
+        // Shared 10-second deadline across all slices
        long deadline = System.currentTimeMillis() + 10_000;
        List<CommandGroupResponse.AgentResponse> allResponses = new ArrayList<>();
        List<String> allTimedOut = new ArrayList<>();
        int totalAgents = 0;

-        for (String application : applications) {
-            // Load per-app sensitive keys via JsonNode to avoid dependency on
+        for (AppEnv slice : slices) {
+            // Load per-(app,env) sensitive keys via JsonNode to avoid dependency on
            // ApplicationConfig.getSensitiveKeys() which may not be in the published jar yet.
-            List<String> perAppKeys = configRepository.findByApplication(application)
+            List<String> perAppKeys = configRepository
+                    .findByApplicationAndEnvironment(slice.application(), slice.environment())
                    .map(cfg -> extractSensitiveKeys(cfg))
                    .orElse(null);

@@ -153,19 +159,22 @@ public class SensitiveKeysAdminController {

            // Build a minimal payload map — only sensitiveKeys + application fields.
            Map<String, Object> payloadMap = new LinkedHashMap<>();
-            payloadMap.put("application", application);
+            payloadMap.put("application", slice.application());
            payloadMap.put("sensitiveKeys", mergedKeys);

            String payloadJson;
            try {
                payloadJson = objectMapper.writeValueAsString(payloadMap);
            } catch (JsonProcessingException e) {
-                log.error("Failed to serialize sensitive keys push payload for application '{}'", application, e);
+                log.error("Failed to serialize sensitive keys push payload for {}/{}",
+                        slice.application(), slice.environment(), e);
                continue;
            }

            Map<String, CompletableFuture<CommandReply>> futures =
-                    registryService.addGroupCommandWithReplies(application, null, CommandType.CONFIG_UPDATE, payloadJson);
+                    registryService.addGroupCommandWithReplies(
+                            slice.application(), slice.environment(),
+                            CommandType.CONFIG_UPDATE, payloadJson);

            totalAgents += futures.size();

@@ -213,4 +222,7 @@ public class SensitiveKeysAdminController {
            return null;
        }
    }
+
+    /** (application, environment) slice used by the fan-out loop. */
+    private record AppEnv(String application, String environment) {}
 }
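
Note on the `AppEnv` slices above: the fan-out relies on record value-equality so that a slice seen in both the config table and the live registry is pushed only once. A minimal sketch of that collection step follows; the `String[]` inputs are a simplification standing in for `ApplicationConfig` rows and registered agents, and `SliceSketch` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SliceSketch {

    // Same shape as the private record added in the diff.
    record AppEnv(String application, String environment) {}

    // Each String[] is {application, environment}; a simplified stand-in for the real sources.
    static Set<AppEnv> collect(List<String[]> configRows, List<String[]> liveAgents) {
        Set<AppEnv> slices = new LinkedHashSet<>();   // keeps first-seen order, drops duplicates
        for (String[] row : configRows) addIfComplete(slices, row[0], row[1]);
        for (String[] agent : liveAgents) addIfComplete(slices, agent[0], agent[1]);
        return slices;
    }

    // Entries with a missing or blank application or environment are ignored.
    private static void addIfComplete(Set<AppEnv> slices, String app, String env) {
        if (app != null && !app.isBlank() && env != null && !env.isBlank()) {
            slices.add(new AppEnv(app, env));
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(
                List.of(new String[]{"orders", "prod"}, new String[]{"orders", "dev"}),
                List.of(new String[]{"orders", "prod"}, new String[]{"billing", "prod"})));
    }
}
```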
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ServerMetricsAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ServerMetricsAdminController.java
@@ -0,0 +1,148 @@
+package com.cameleer.server.app.controller;
+
+import com.cameleer.server.core.storage.ServerMetricsQueryStore;
+import com.cameleer.server.core.storage.model.ServerInstanceInfo;
+import com.cameleer.server.core.storage.model.ServerMetricCatalogEntry;
+import com.cameleer.server.core.storage.model.ServerMetricQueryRequest;
+import com.cameleer.server.core.storage.model.ServerMetricQueryResponse;
+import io.swagger.v3.oas.annotations.Operation;
+import io.swagger.v3.oas.annotations.tags.Tag;
+import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
+import org.springframework.http.ResponseEntity;
+import org.springframework.security.access.prepost.PreAuthorize;
+import org.springframework.web.bind.annotation.ExceptionHandler;
+import org.springframework.web.bind.annotation.GetMapping;
+import org.springframework.web.bind.annotation.PostMapping;
+import org.springframework.web.bind.annotation.RequestBody;
+import org.springframework.web.bind.annotation.RequestMapping;
+import org.springframework.web.bind.annotation.RequestParam;
+import org.springframework.web.bind.annotation.RestController;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Generic read API over the ClickHouse {@code server_metrics} table. Lets
+ * SaaS control planes build server-health dashboards without requiring direct
+ * ClickHouse access.
+ *
+ * <p>Three endpoints cover all 17 panels in {@code docs/server-self-metrics.md}:
+ * <ul>
+ *   <li>{@code GET /catalog} — discover available metric names, types, statistics, and tags</li>
+ *   <li>{@code POST /query} — generic time-series query with aggregation, grouping, filtering, and counter-delta mode</li>
+ *   <li>{@code GET /instances} — list server instances (useful for partitioning counter math)</li>
+ * </ul>
+ *
+ * <p>Visibility matches {@code ClickHouseAdminController} / {@code DatabaseAdminController}:
+ * <ul>
+ *   <li>Conditional on {@code cameleer.server.security.infrastructureendpoints=true} (default).</li>
+ *   <li>Class-level {@code @PreAuthorize("hasRole('ADMIN')")} on top of the
+ *       {@code /api/v1/admin/**} catch-all in {@code SecurityConfig}.</li>
+ * </ul>
+ */
+@ConditionalOnProperty(
+    name = "cameleer.server.security.infrastructureendpoints",
+    havingValue = "true",
+    matchIfMissing = true
+)
+@RestController
+@RequestMapping("/api/v1/admin/server-metrics")
+@PreAuthorize("hasRole('ADMIN')")
+@Tag(name = "Server Self-Metrics",
+     description = "Read API over the server's own Micrometer registry snapshots (ADMIN only)")
+public class ServerMetricsAdminController {
+
+    /** Default lookback window for catalog/instances when from/to are omitted. */
+    private static final long DEFAULT_LOOKBACK_SECONDS = 3_600L;
+
+    private final ServerMetricsQueryStore store;
+
+    public ServerMetricsAdminController(ServerMetricsQueryStore store) {
+        this.store = store;
+    }
+
+    @GetMapping("/catalog")
+    @Operation(summary = "List metric names observed in the window",
+               description = "For each metric_name, returns metric_type, the set of statistics emitted, and the union of tag keys.")
+    public ResponseEntity<List<ServerMetricCatalogEntry>> catalog(
+            @RequestParam(required = false) String from,
+            @RequestParam(required = false) String to) {
+        Instant[] window = resolveWindow(from, to);
+        return ResponseEntity.ok(store.catalog(window[0], window[1]));
+    }
+
+    @GetMapping("/instances")
+    @Operation(summary = "List server_instance_id values observed in the window",
+               description = "Returns first/last seen timestamps — use to partition counter-delta computations.")
+    public ResponseEntity<List<ServerInstanceInfo>> instances(
+            @RequestParam(required = false) String from,
+            @RequestParam(required = false) String to) {
+        Instant[] window = resolveWindow(from, to);
+        return ResponseEntity.ok(store.listInstances(window[0], window[1]));
+    }
+
+    @PostMapping("/query")
+    @Operation(summary = "Generic time-series query",
+               description = "Returns bucketed series for a single metric_name. Supports aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter delta mode, and a derived 'mean' statistic for timers.")
+    public ResponseEntity<ServerMetricQueryResponse> query(@RequestBody QueryBody body) {
+        ServerMetricQueryRequest request = new ServerMetricQueryRequest(
+                body.metric(),
+                body.statistic(),
+                parseInstant(body.from(), "from"),
+                parseInstant(body.to(), "to"),
+                body.stepSeconds(),
+                body.groupByTags(),
+                body.filterTags(),
+                body.aggregation(),
+                body.mode(),
+                body.serverInstanceIds());
+        return ResponseEntity.ok(store.query(request));
+    }
+
+    @ExceptionHandler(IllegalArgumentException.class)
+    public ResponseEntity<Map<String, String>> handleBadRequest(IllegalArgumentException e) {
+        return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
+    }
+
+    private static Instant[] resolveWindow(String from, String to) {
+        Instant toI = to != null ? parseInstant(to, "to") : Instant.now();
+        Instant fromI = from != null
+                ? parseInstant(from, "from")
+                : toI.minusSeconds(DEFAULT_LOOKBACK_SECONDS);
+        if (!fromI.isBefore(toI)) {
+            throw new IllegalArgumentException("from must be strictly before to");
+        }
+        return new Instant[]{fromI, toI};
+    }
+
+    private static Instant parseInstant(String raw, String field) {
+        if (raw == null || raw.isBlank()) {
+            throw new IllegalArgumentException(field + " is required");
+        }
+        try {
+            return Instant.parse(raw);
+        } catch (Exception e) {
+            throw new IllegalArgumentException(
+                    field + " must be an ISO-8601 instant (e.g. 2026-04-23T10:00:00Z)", e);
+        }
+    }
+
+    /**
+     * Request body for {@link #query(QueryBody)}. Uses ISO-8601 strings on
+     * the wire so the OpenAPI schema stays language-neutral.
+     */
+    public record QueryBody(
+            String metric,
+            String statistic,
+            String from,
+            String to,
+            Integer stepSeconds,
+            List<String> groupByTags,
+            Map<String, String> filterTags,
+            String aggregation,
+            String mode,
+            List<String> serverInstanceIds
+    ) {
+    }
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/UserAdminController.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/controller/UserAdminController.java
@@ -93,7 +93,9 @@ public class UserAdminController {
            return ResponseEntity.badRequest()
                .body(Map.of("error", "Local user creation is disabled when OIDC is enabled. Users are provisioned automatically via SSO."));
        }
-        String userId = "user:" + request.username();
+        // DB key is the bare username (matches alert_rules.created_by FK shape used by
+        // the env-scoped read-path controllers, which strip "user:" from JWT subjects).
+        String userId = request.username();
        UserInfo user = new UserInfo(userId, "local",
                request.email() != null ? request.email() : "",
                request.displayName() != null ? request.displayName() : request.username(),
@@ -215,9 +217,7 @@ public class UserAdminController {
                return ResponseEntity.badRequest().build();
            }
        }
-        // Extract bare username from "user:username" format for policy check
-        String username = userId.startsWith("user:") ? userId.substring(5) : userId;
-        List<String> violations = PasswordPolicyValidator.validate(request.password(), username);
+        List<String> violations = PasswordPolicyValidator.validate(request.password(), userId);
        if (!violations.isEmpty()) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
                    "Password policy violation: " + String.join("; ", violations));
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AgentEventPageResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AgentEventPageResponse.java
@@ -0,0 +1,12 @@
+package com.cameleer.server.app.dto;
+
+import io.swagger.v3.oas.annotations.media.Schema;
+
+import java.util.List;
+
+@Schema(description = "Cursor-paginated agent event list")
+public record AgentEventPageResponse(
+        List<AgentEventResponse> data,
+        String nextCursor,
+        boolean hasMore
+) {}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AgentInstanceResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AgentInstanceResponse.java
@@ -25,7 +25,8 @@ public record AgentInstanceResponse(
        double errorRate,
        int activeRoutes,
        int totalRoutes,
-        long uptimeSeconds
+        long uptimeSeconds,
+        @Schema(description = "Recent average CPU usage (0.0–1.0), -1 if unavailable") double cpuUsage
 ) {
    public static AgentInstanceResponse from(AgentInfo info) {
        long uptime = Duration.between(info.registeredAt(), Instant.now()).toSeconds();
@@ -37,7 +38,7 @@ public record AgentInstanceResponse(
                info.version(), info.capabilities(),
                0.0, 0.0,
                0, info.routeIds() != null ? info.routeIds().size() : 0,
-                uptime
+                uptime, -1
        );
    }

@@ -46,7 +47,16 @@ public record AgentInstanceResponse(
                instanceId, displayName, applicationId, environmentId,
                status, routeIds, registeredAt, lastHeartbeat,
                version, capabilities,
-                tps, errorRate, activeRoutes, totalRoutes, uptimeSeconds
+                tps, errorRate, activeRoutes, totalRoutes, uptimeSeconds, cpuUsage
+        );
+    }
+
+    public AgentInstanceResponse withCpuUsage(double cpuUsage) {
+        return new AgentInstanceResponse(
+                instanceId, displayName, applicationId, environmentId,
+                status, routeIds, registeredAt, lastHeartbeat,
+                version, capabilities,
+                tps, errorRate, activeRoutes, totalRoutes, uptimeSeconds, cpuUsage
        );
    }
 }
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AppSettingsRequest.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/AppSettingsRequest.java
@@ -33,9 +33,9 @@ public record AppSettingsRequest(
        Double healthSlaCrit
 ) {

-    public AppSettings toSettings(String appId) {
+    public AppSettings toSettings(String appId, String environment) {
        Instant now = Instant.now();
-        return new AppSettings(appId, slaThresholdMs, healthErrorWarn, healthErrorCrit,
+        return new AppSettings(appId, environment, slaThresholdMs, healthErrorWarn, healthErrorCrit,
                healthSlaWarn, healthSlaCrit, now, now);
    }

--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/DirtyStateResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/DirtyStateResponse.java
@@ -0,0 +1,12 @@
+package com.cameleer.server.app.dto;
+
+import com.cameleer.server.core.runtime.DirtyStateResult;
+
+import java.util.List;
+
+public record DirtyStateResponse(
+        boolean dirty,
+        String lastSuccessfulDeploymentId,
+        List<DirtyStateResult.Difference> differences
+) {
+}
--- a/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/IndexerPipelineResponse.java
+++ b/cameleer-server-app/src/main/java/com/cameleer/server/app/dto/IndexerPipelineResponse.java
@@ -1,16 +0,0 @@
-package com.cameleer.server.app.dto;
-
-import io.swagger.v3.oas.annotations.media.Schema;
-
-import java.time.Instant;
-
-@Schema(description = "Search indexer pipeline statistics")
-public record IndexerPipelineResponse(
-        int queueDepth,
-        int maxQueueSize,
-        long failedCount,
-        long indexedCount,
-        long debounceMs,
-        double indexingRate,
-        Instant lastIndexedAt
-) {}