diff --git a/docs/superpowers/plans/2026-04-19-alerting-02-backend.md b/docs/superpowers/plans/2026-04-19-alerting-02-backend.md new file mode 100644 index 00000000..1a36efe2 --- /dev/null +++ b/docs/superpowers/plans/2026-04-19-alerting-02-backend.md @@ -0,0 +1,3428 @@ +# Alerting — Plan 02 — Backend Implementation + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Deliver the server-side alerting feature described in `docs/superpowers/specs/2026-04-19-alerting-design.md` — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03. + +**Architecture:** Confined to new `alerting/` packages in both `cameleer-server-core` (pure records + interfaces) and `cameleer-server-app` (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new `countLogs` / `countExecutionsForAlerting` methods, four additive projections). Claim-polling `FOR UPDATE SKIP LOCKED` makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (`rulesReferencing`) is populated in this plan — it is the gate that unlocks safe production use of Plan 01. + +**Tech Stack:** Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's `OutboundHttpClientFactory`, Testcontainers + JUnit 5 + WireMock + AssertJ for tests. + +--- + +## Base branch + +**Branch Plan 02 off `feat/alerting-01-outbound-infra`.** Plan 02 depends on Plan 01's `OutboundConnection` domain, `OutboundHttpClientFactory` bean, `SecretCipher`, `OutboundConnectionServiceImpl.rulesReferencing()` stub, the V11 migration, and the `OUTBOUND_CONNECTION_CHANGE` / `OUTBOUND_HTTP_TRUST_CHANGE` audit categories. Branching off `main` is **not** an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2. + +```bash +# Execute in a fresh worktree +git fetch origin +git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra +cd .worktrees/alerting-02 +mvn clean compile # confirm Plan 01 code compiles as baseline +``` + +--- + +## File Structure + +### Created — `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/` + +| File | Responsibility | +|---|---| +| `AlertingProperties.java` | Not here — see app module. | +| `AlertRule.java` | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. | +| `AlertCondition.java` | Sealed interface; Jackson DEDUCTION polymorphism root. | +| `RouteMetricCondition.java` | Record: scope, metric, comparator, threshold, windowSeconds. | +| `ExchangeMatchCondition.java` | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. | +| `AgentStateCondition.java` | Record: scope, state, forSeconds. | +| `DeploymentStateCondition.java` | Record: scope, states. | +| `LogPatternCondition.java` | Record: scope, level, pattern, threshold, windowSeconds. | +| `JvmMetricCondition.java` | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. | +| `AlertScope.java` | Record: appSlug?, routeId?, agentId? — nullable fields, used by all conditions. | +| `ConditionKind.java` | Enum mirror of SQL `condition_kind_enum`. | +| `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java` | Enums used in conditions. | +| `AlertSeverity.java` | Enum mirror of SQL `severity_enum`. | +| `AlertState.java` | Enum mirror of SQL `alert_state_enum`. | +| `AlertInstance.java` | Immutable record for `alert_instances` row. | +| `AlertRuleTarget.java` | Record for `alert_rule_targets` row. | +| `TargetKind.java` | Enum mirror of SQL `target_kind_enum`. | +| `AlertSilence.java` | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. | +| `SilenceMatcher.java` | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. | +| `AlertNotification.java` | Record for `alert_notifications` outbox row. | +| `NotificationStatus.java` | Enum mirror of SQL `notification_status_enum`. | +| `WebhookBinding.java` | Record embedded in `alert_rules.webhooks` JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. | +| `AlertRuleRepository.java` | CRUD + claim-polling interface. | +| `AlertInstanceRepository.java` | CRUD + query-for-inbox interface. | +| `AlertSilenceRepository.java` | CRUD interface. | +| `AlertNotificationRepository.java` | CRUD + claim-polling interface. | +| `AlertReadRepository.java` | Mark-read + count-unread interface. | + +### Created — `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/` + +| File | Responsibility | +|---|---| +| `config/AlertingProperties.java` | `@ConfigurationProperties("cameleer.server.alerting")`. | +| `config/AlertingBeanConfig.java` | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. | +| `storage/PostgresAlertRuleRepository.java` | JdbcTemplate impl of `AlertRuleRepository`. | +| `storage/PostgresAlertInstanceRepository.java` | JdbcTemplate impl. | +| `storage/PostgresAlertSilenceRepository.java` | JdbcTemplate impl. | +| `storage/PostgresAlertNotificationRepository.java` | JdbcTemplate impl. | +| `storage/PostgresAlertReadRepository.java` | JdbcTemplate impl. | +| `eval/EvalContext.java` | Per-tick context (tenantId, now, tickCache). | +| `eval/EvalResult.java` | Sealed: `Firing(value, threshold, contextMap)` / `Clear` / `Error(Throwable)`. | +| `eval/TickCache.java` | `ConcurrentHashMap` discarded per tick. | +| `eval/PerKindCircuitBreaker.java` | Failure window + cooldown per `ConditionKind`. | +| `eval/ConditionEvaluator.java` | Generic interface: `evaluate(C, AlertRule, EvalContext)`. | +| `eval/RouteMetricEvaluator.java` | Reads `StatsStore`. | +| `eval/ExchangeMatchEvaluator.java` | Reads `ClickHouseSearchIndex.countExecutionsForAlerting` + `SearchService.search` for PER_EXCHANGE cursor mode. | +| `eval/AgentStateEvaluator.java` | Reads `AgentRegistryService.findAll`. | +| `eval/DeploymentStateEvaluator.java` | Reads `DeploymentRepository.findByAppId`. | +| `eval/LogPatternEvaluator.java` | Reads new `ClickHouseLogStore.countLogs`. | +| `eval/JvmMetricEvaluator.java` | Reads `MetricsQueryStore.queryTimeSeries`. | +| `eval/AlertEvaluatorJob.java` | `@Component` implementing `SchedulingConfigurer`; claim-polling loop. | +| `eval/AlertStateTransitions.java` | Pure function: given current instance + EvalResult → new state + timestamps. | +| `notify/MustacheRenderer.java` | JMustache wrapper; resilient to bad templates. | +| `notify/NotificationContextBuilder.java` | Pure: builds context map from `AlertInstance` + rule + env. | +| `notify/SilenceMatcher.java` | Pure: evaluates a `SilenceMatcher` against an `AlertInstance`. | +| `notify/InAppInboxQuery.java` | Server-side query helper for `/alerts` and unread-count. | +| `notify/WebhookDispatcher.java` | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. | +| `notify/NotificationDispatchJob.java` | `@Component` `SchedulingConfigurer`; claim-polling on `alert_notifications`. | +| `notify/HmacSigner.java` | Pure: computes `sha256=`. | +| `retention/AlertingRetentionJob.java` | `@Scheduled(cron = "0 0 3 * * *")` — delete old `alert_instances` + `alert_notifications`. | +| `controller/AlertRuleController.java` | `/api/v1/environments/{envSlug}/alerts/rules`. | +| `controller/AlertController.java` | `/api/v1/environments/{envSlug}/alerts` + instance actions. | +| `controller/AlertSilenceController.java` | `/api/v1/environments/{envSlug}/alerts/silences`. | +| `controller/AlertNotificationController.java` | `/api/v1/environments/{envSlug}/alerts/{id}/notifications`, `/alerts/notifications/{id}/retry`. | +| `dto/AlertRuleDto.java`, `dto/AlertDto.java`, `dto/AlertSilenceDto.java`, `dto/AlertNotificationDto.java`, `dto/ConditionDto.java`, `dto/WebhookBindingDto.java`, `dto/RenderPreviewRequest.java`, `dto/RenderPreviewResponse.java`, `dto/TestEvaluateRequest.java`, `dto/TestEvaluateResponse.java`, `dto/UnreadCountResponse.java` | Request/response DTOs. | +| `metrics/AlertingMetrics.java` | Micrometer registrations for counters/gauges/histograms. | + +### Created — resources + +| File | Responsibility | +|---|---| +| `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql` | Flyway migration: 5 enums, 6 tables, indexes, cascades. | +| `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql` | 4 projections on `executions` / `logs` / `agent_metrics`, all `IF NOT EXISTS`. | + +### Modified + +| File | Change | +|---|---| +| `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` | Add `ALERT_RULE_CHANGE`, `ALERT_SILENCE_CHANGE`. | +| `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java` | Replace the `rulesReferencing(UUID)` stub with a call through `AlertRuleRepository.findRuleIdsByOutboundConnectionId`. | +| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java` | Add `long countLogs(LogSearchRequest)` — no `FINAL`. | +| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java` | Add `long countExecutionsForAlerting(AlertMatchSpec)` — no `FINAL`. | +| `cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java` | Run `alerting_projections.sql` via existing `ClickHouseSchemaInitializer`. | +| `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java` | Permit new `/api/v1/environments/{envSlug}/alerts/**` path matchers with role-based access. | +| `cameleer-server-core/pom.xml` | Add `com.samskivert:jmustache:1.16`. | +| `.claude/rules/app-classes.md`, `.claude/rules/core-classes.md` | Document new packages. | +| `cameleer-server-app/src/main/resources/application.yml` | Default `AlertingProperties` stanza + comment linking to the admin guide. | + +--- + +## Conventions + +- **TDD.** Every task starts with a failing test, implements the minimum to pass, then commits. +- **One commit per task.** Commit messages: `feat(alerting): …`, `test(alerting): …`, `fix(alerting): …`, `chore(alerting): …`, `docs(alerting): …`. +- **Tenant invariant.** Every ClickHouse query and Postgres table referencing observability data filters by `tenantId` (injected via `AlertingBeanConfig` from `cameleer.server.tenant.id`). +- **No `FINAL`** on the two new CH count methods — alerting tolerates brief duplicate counts. +- **Jackson polymorphism** via `@JsonTypeInfo(use = DEDUCTION)` with `@JsonSubTypes` on `AlertCondition`. +- **Pure `core/`, Spring-only in `app/`.** No `@Component`, `@Service`, or `@Scheduled` annotations in `cameleer-server-core`. +- **Claim polling.** `FOR UPDATE SKIP LOCKED` + `claimed_by` / `claimed_until` with 30 s TTL. +- **Instance id** for claim ownership: use `InetAddress.getLocalHost().getHostName() + ":" + processPid()`; exposed as a bean `"alertingInstanceId"` of type `String`. +- **GitNexus hygiene.** Before modifying any existing class (`OutboundConnectionServiceImpl`, `ClickHouseLogStore`, `ClickHouseSearchIndex`, `AuditCategory`, `SecurityConfig`), run `gitnexus_impact({target: "", direction: "upstream"})` and report blast radius. Run `gitnexus_detect_changes()` before each commit. + +--- + +## Phase 1 — Flyway V12 migration and audit categories + +### Task 1: `V12__alerting_tables.sql` + +**Files:** +- Create: `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java` + +- [ ] **Step 1: Write the failing integration test** + +```java +package com.cameleer.server.app.alerting.storage; + +import com.cameleer.server.app.AbstractPostgresIT; +import org.junit.jupiter.api.Test; +import static org.assertj.core.api.Assertions.assertThat; + +class V12MigrationIT extends AbstractPostgresIT { + + @Test + void allAlertingTablesAndEnumsExist() { + var tables = jdbcTemplate.queryForList( + "SELECT table_name FROM information_schema.tables WHERE table_schema='public' " + + "AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," + + "'alert_silences','alert_notifications','alert_reads')", + String.class); + assertThat(tables).containsExactlyInAnyOrder( + "alert_rules","alert_rule_targets","alert_instances", + "alert_silences","alert_notifications","alert_reads"); + + var enums = jdbcTemplate.queryForList( + "SELECT typname FROM pg_type WHERE typname IN " + + "('severity_enum','condition_kind_enum','alert_state_enum'," + + "'target_kind_enum','notification_status_enum')", + String.class); + assertThat(enums).hasSize(5); + } + + @Test + void deletingEnvironmentCascadesAlertingRows() { + var envId = java.util.UUID.randomUUID(); + jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env"); + jdbcTemplate.update( + "INSERT INTO users (user_id, username, password_hash, email, enabled) " + + "VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1"); + var ruleId = java.util.UUID.randomUUID(); + jdbcTemplate.update( + "INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " + + "notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " + + "VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')", + ruleId, envId); + var instanceId = java.util.UUID.randomUUID(); + jdbcTemplate.update( + "INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " + + "fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " + + "now(), '{}'::jsonb, 't', 'm')", + instanceId, ruleId, envId); + + jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId); + + assertThat(jdbcTemplate.queryForObject( + "SELECT count(*) FROM alert_rules WHERE environment_id = ?", + Integer.class, envId)).isZero(); + assertThat(jdbcTemplate.queryForObject( + "SELECT count(*) FROM alert_instances WHERE environment_id = ?", + Integer.class, envId)).isZero(); + } +} +``` + +- [ ] **Step 2: Run the test to verify it fails** + +Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT` +Expected: FAIL — tables do not exist. + +- [ ] **Step 3: Write the migration** + +Create `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`: + +```sql +-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11) +CREATE TYPE severity_enum AS ENUM ('CRITICAL','WARNING','INFO'); +CREATE TYPE condition_kind_enum AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC'); +CREATE TYPE alert_state_enum AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED'); +CREATE TYPE target_kind_enum AS ENUM ('USER','GROUP','ROLE'); +CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED'); + +CREATE TABLE alert_rules ( + id uuid PRIMARY KEY, + environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + name varchar(200) NOT NULL, + description text, + severity severity_enum NOT NULL, + enabled boolean NOT NULL DEFAULT true, + condition_kind condition_kind_enum NOT NULL, + condition jsonb NOT NULL, + evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5), + for_duration_seconds int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0), + re_notify_minutes int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0), + notification_title_tmpl text NOT NULL, + notification_message_tmpl text NOT NULL, + webhooks jsonb NOT NULL DEFAULT '[]', + next_evaluation_at timestamptz NOT NULL DEFAULT now(), + claimed_by varchar(64), + claimed_until timestamptz, + eval_state jsonb NOT NULL DEFAULT '{}', + created_at timestamptz NOT NULL DEFAULT now(), + created_by text NOT NULL REFERENCES users(user_id), + updated_at timestamptz NOT NULL DEFAULT now(), + updated_by text NOT NULL REFERENCES users(user_id) +); +CREATE INDEX alert_rules_env_idx ON alert_rules (environment_id); +CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true; + +CREATE TABLE alert_rule_targets ( + id uuid PRIMARY KEY, + rule_id uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE, + target_kind target_kind_enum NOT NULL, + target_id varchar(128) NOT NULL, + UNIQUE (rule_id, target_kind, target_id) +); +CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id); + +CREATE TABLE alert_instances ( + id uuid PRIMARY KEY, + rule_id uuid REFERENCES alert_rules(id) ON DELETE SET NULL, + rule_snapshot jsonb NOT NULL, + environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + state alert_state_enum NOT NULL, + severity severity_enum NOT NULL, + fired_at timestamptz NOT NULL, + acked_at timestamptz, + acked_by text REFERENCES users(user_id), + resolved_at timestamptz, + last_notified_at timestamptz, + silenced boolean NOT NULL DEFAULT false, + current_value numeric, + threshold numeric, + context jsonb NOT NULL, + title text NOT NULL, + message text NOT NULL, + target_user_ids text[] NOT NULL DEFAULT '{}', + target_group_ids uuid[] NOT NULL DEFAULT '{}', + target_role_names text[] NOT NULL DEFAULT '{}' +); +CREATE INDEX alert_instances_inbox_idx ON alert_instances (environment_id, state, fired_at DESC); +CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL; +CREATE INDEX alert_instances_resolved_idx ON alert_instances (resolved_at) WHERE state = 'RESOLVED'; +CREATE INDEX alert_instances_target_u_idx ON alert_instances USING GIN (target_user_ids); +CREATE INDEX alert_instances_target_g_idx ON alert_instances USING GIN (target_group_ids); +CREATE INDEX alert_instances_target_r_idx ON alert_instances USING GIN (target_role_names); + +CREATE TABLE alert_silences ( + id uuid PRIMARY KEY, + environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE, + matcher jsonb NOT NULL, + reason text, + starts_at timestamptz NOT NULL, + ends_at timestamptz NOT NULL CHECK (ends_at > starts_at), + created_by text NOT NULL REFERENCES users(user_id), + created_at timestamptz NOT NULL DEFAULT now() +); +CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at); + +CREATE TABLE alert_notifications ( + id uuid PRIMARY KEY, + alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE, + webhook_id uuid, + outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL, + status notification_status_enum NOT NULL DEFAULT 'PENDING', + attempts int NOT NULL DEFAULT 0, + next_attempt_at timestamptz NOT NULL DEFAULT now(), + claimed_by varchar(64), + claimed_until timestamptz, + last_response_status int, + last_response_snippet text, + payload jsonb NOT NULL, + delivered_at timestamptz, + created_at timestamptz NOT NULL DEFAULT now() +); +CREATE INDEX alert_notifications_pending_idx ON alert_notifications (next_attempt_at) WHERE status = 'PENDING'; +CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id); + +CREATE TABLE alert_reads ( + user_id text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE, + alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE, + read_at timestamptz NOT NULL DEFAULT now(), + PRIMARY KEY (user_id, alert_instance_id) +); +``` + +Notes: +- Plan 01 established `users.user_id` as TEXT. All FK-to-users columns in this migration are `text`, not `uuid`. +- `target_user_ids` is `text[]` (matches `users.user_id`). +- `outbound_connections` (Plan 01) is referenced with `ON DELETE SET NULL` — matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed. + +- [ ] **Step 4: Run the test to verify it passes** + +Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java +git commit -m "feat(alerting): V12 flyway migration for alerting tables" +``` + +### Task 2: Extend `AuditCategory` + +**Files:** +- Modify: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` +- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java` + +- [ ] **Step 1: GitNexus impact check** + +Run `gitnexus_impact({target: "AuditCategory", direction: "upstream"})` — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements). + +- [ ] **Step 2: Write the failing test** + +```java +package com.cameleer.server.core.admin; + +import org.junit.jupiter.api.Test; +import static org.assertj.core.api.Assertions.assertThat; + +class AuditCategoryTest { + @Test + void alertingCategoriesPresent() { + assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull(); + assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull(); + } +} +``` + +- [ ] **Step 3: Run the test — FAIL** + +Run: `mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest` +Expected: FAIL — `IllegalArgumentException: No enum constant`. + +- [ ] **Step 4: Add the enum values** + +Replace the whole enum body with: + +```java +package com.cameleer.server.core.admin; + +public enum AuditCategory { + INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, + OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE, + ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE +} +``` + +- [ ] **Step 5: Run the test — PASS** + +- [ ] **Step 6: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \ + cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java +git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories" +``` + +--- + +## Phase 2 — Core domain model + +Each task in this phase adds a small, focused set of pure-Java records and enums under `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`. All records use canonical constructors with explicit `@NotNull`-style defensive copying only for mutable collections (`List.copyOf`, `Map.copyOf`). Jackson polymorphism is handled by `@JsonTypeInfo(use = DEDUCTION)` on `AlertCondition`. + +### Task 3: Enums + `AlertScope` + +**Files:** +- Create: `.../alerting/AlertSeverity.java`, `AlertState.java`, `ConditionKind.java`, `TargetKind.java`, `NotificationStatus.java`, `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java`, `AlertScope.java` +- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +package com.cameleer.server.core.alerting; + +import org.junit.jupiter.api.Test; +import static org.assertj.core.api.Assertions.assertThat; + +class AlertScopeTest { + + @Test + void allFieldsNullIsEnvWide() { + var s = new AlertScope(null, null, null); + assertThat(s.isEnvWide()).isTrue(); + } + + @Test + void appScoped() { + var s = new AlertScope("orders", null, null); + assertThat(s.isEnvWide()).isFalse(); + assertThat(s.appSlug()).isEqualTo("orders"); + } + + @Test + void enumsHaveExpectedValues() { + assertThat(AlertSeverity.values()).containsExactly( + AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO); + assertThat(AlertState.values()).containsExactly( + AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED); + assertThat(ConditionKind.values()).hasSize(6); + assertThat(TargetKind.values()).containsExactly( + TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE); + assertThat(NotificationStatus.values()).containsExactly( + NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED); + } +} +``` + +- [ ] **Step 2: Run — FAIL** (`cannot find symbol`). + +Run: `mvn -pl cameleer-server-core test -Dtest=AlertScopeTest` + +- [ ] **Step 3: Create the files** + +```java +// AlertSeverity.java +package com.cameleer.server.core.alerting; +public enum AlertSeverity { CRITICAL, WARNING, INFO } + +// AlertState.java +package com.cameleer.server.core.alerting; +public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED } + +// ConditionKind.java +package com.cameleer.server.core.alerting; +public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC } + +// TargetKind.java +package com.cameleer.server.core.alerting; +public enum TargetKind { USER, GROUP, ROLE } + +// NotificationStatus.java +package com.cameleer.server.core.alerting; +public enum NotificationStatus { PENDING, DELIVERED, FAILED } + +// RouteMetric.java +package com.cameleer.server.core.alerting; +public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT } + +// Comparator.java +package com.cameleer.server.core.alerting; +public enum Comparator { GT, GTE, LT, LTE, EQ } + +// AggregationOp.java +package com.cameleer.server.core.alerting; +public enum AggregationOp { MAX, MIN, AVG, LATEST } + +// FireMode.java +package com.cameleer.server.core.alerting; +public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW } + +// AlertScope.java +package com.cameleer.server.core.alerting; +public record AlertScope(String appSlug, String routeId, String agentId) { + public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \ + cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java +git commit -m "feat(alerting): core enums + AlertScope" +``` + +### Task 4: `AlertCondition` sealed hierarchy + Jackson polymorphism + +**Files:** +- Create: `.../alerting/AlertCondition.java`, `RouteMetricCondition.java`, `ExchangeMatchCondition.java` (with nested `ExchangeFilter`), `AgentStateCondition.java`, `DeploymentStateCondition.java`, `LogPatternCondition.java`, `JvmMetricCondition.java` +- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +package com.cameleer.server.core.alerting; + +import com.fasterxml.jackson.databind.ObjectMapper; +import org.junit.jupiter.api.Test; +import java.util.List; +import java.util.Map; + +import static org.assertj.core.api.Assertions.assertThat; + +class AlertConditionJsonTest { + + private final ObjectMapper om = new ObjectMapper(); + + @Test + void roundtripRouteMetric() throws Exception { + var c = new RouteMetricCondition( + new AlertScope("orders", "route-1", null), + RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300); + String json = om.writeValueAsString((AlertCondition) c); + AlertCondition parsed = om.readValue(json, AlertCondition.class); + assertThat(parsed).isInstanceOf(RouteMetricCondition.class); + assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC); + } + + @Test + void roundtripExchangeMatchPerExchange() throws Exception { + var c = new ExchangeMatchCondition( + new AlertScope("orders", null, null), + new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")), + FireMode.PER_EXCHANGE, null, null, 300); + String json = om.writeValueAsString((AlertCondition) c); + AlertCondition parsed = om.readValue(json, AlertCondition.class); + assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class); + } + + @Test + void roundtripExchangeMatchCountInWindow() throws Exception { + var c = new ExchangeMatchCondition( + new AlertScope("orders", null, null), + new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()), + FireMode.COUNT_IN_WINDOW, 5, 900, null); + AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class); + assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5); + } + + @Test + void roundtripAgentState() throws Exception { + var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60); + AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class); + assertThat(parsed).isInstanceOf(AgentStateCondition.class); + } + + @Test + void roundtripDeploymentState() throws Exception { + var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED")); + AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class); + assertThat(parsed).isInstanceOf(DeploymentStateCondition.class); + } + + @Test + void roundtripLogPattern() throws Exception { + var c = new LogPatternCondition(new AlertScope("orders", null, null), + "ERROR", "TimeoutException", 5, 900); + AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class); + assertThat(parsed).isInstanceOf(LogPatternCondition.class); + } + + @Test + void roundtripJvmMetric() throws Exception { + var c = new JvmMetricCondition(new AlertScope("orders", null, null), + "heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300); + AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class); + assertThat(parsed).isInstanceOf(JvmMetricCondition.class); + } +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Create the sealed hierarchy** + +```java +// AlertCondition.java +package com.cameleer.server.core.alerting; + +import com.fasterxml.jackson.annotation.JsonSubTypes; +import com.fasterxml.jackson.annotation.JsonTypeInfo; + +@JsonTypeInfo(use = JsonTypeInfo.Id.DEDUCTION) +@JsonSubTypes({ + @JsonSubTypes.Type(RouteMetricCondition.class), + @JsonSubTypes.Type(ExchangeMatchCondition.class), + @JsonSubTypes.Type(AgentStateCondition.class), + @JsonSubTypes.Type(DeploymentStateCondition.class), + @JsonSubTypes.Type(LogPatternCondition.class), + @JsonSubTypes.Type(JvmMetricCondition.class) +}) +public sealed interface AlertCondition permits + RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition, + DeploymentStateCondition, LogPatternCondition, JvmMetricCondition { + + ConditionKind kind(); + AlertScope scope(); +} +``` + +```java +// RouteMetricCondition.java +package com.cameleer.server.core.alerting; + +public record RouteMetricCondition( + AlertScope scope, + RouteMetric metric, + Comparator comparator, + double threshold, + int windowSeconds) implements AlertCondition { + @Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; } +} +``` + +```java +// ExchangeMatchCondition.java +package com.cameleer.server.core.alerting; + +import java.util.Map; + +public record ExchangeMatchCondition( + AlertScope scope, + ExchangeFilter filter, + FireMode fireMode, + Integer threshold, // required when COUNT_IN_WINDOW; null for PER_EXCHANGE + Integer windowSeconds, // required when COUNT_IN_WINDOW + Integer perExchangeLingerSeconds // required when PER_EXCHANGE +) implements AlertCondition { + + public ExchangeMatchCondition { + if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null)) + throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds"); + if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null) + throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds"); + } + + @Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; } + + public record ExchangeFilter(String status, Map attributes) { + public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); } + } +} +``` + +```java +// AgentStateCondition.java +package com.cameleer.server.core.alerting; + +public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition { + @Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; } +} +``` + +```java +// DeploymentStateCondition.java +package com.cameleer.server.core.alerting; + +import java.util.List; + +public record DeploymentStateCondition(AlertScope scope, List states) implements AlertCondition { + public DeploymentStateCondition { states = List.copyOf(states); } + @Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; } +} +``` + +```java +// LogPatternCondition.java +package com.cameleer.server.core.alerting; + +public record LogPatternCondition( + AlertScope scope, + String level, + String pattern, + int threshold, + int windowSeconds) implements AlertCondition { + @Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; } +} +``` + +```java +// JvmMetricCondition.java +package com.cameleer.server.core.alerting; + +public record JvmMetricCondition( + AlertScope scope, + String metric, + AggregationOp aggregation, + Comparator comparator, + double threshold, + int windowSeconds) implements AlertCondition { + @Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \ + cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java +git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction" +``` + +### Task 5: Core data records (`AlertRule`, `AlertInstance`, `AlertSilence`, `SilenceMatcher`, `AlertRuleTarget`, `AlertNotification`, `WebhookBinding`) + +**Files:** +- Create: the seven records above under `.../alerting/` +- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +package com.cameleer.server.core.alerting; + +import org.junit.jupiter.api.Test; +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +import static org.assertj.core.api.Assertions.assertThat; + +class AlertDomainRecordsTest { + + @Test + void alertRuleDefensiveCopy() { + var webhooks = new java.util.ArrayList(); + webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null)); + var r = newRule(webhooks); + webhooks.clear(); + assertThat(r.webhooks()).hasSize(1); + } + + @Test + void silenceMatcherAllFieldsNullMatchesEverything() { + var m = new SilenceMatcher(null, null, null, null, null); + assertThat(m.isWildcard()).isTrue(); + } + + private AlertRule newRule(List wh) { + return new AlertRule( + UUID.randomUUID(), UUID.randomUUID(), "r", null, + AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE, + new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60), + 60, 0, 60, "t", "m", wh, List.of(), + Instant.now(), null, null, Map.of(), + Instant.now(), "u1", Instant.now(), "u1"); + } +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Create the records** + +```java +// AlertRule.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +public record AlertRule( + UUID id, + UUID environmentId, + String name, + String description, + AlertSeverity severity, + boolean enabled, + ConditionKind conditionKind, + AlertCondition condition, + int evaluationIntervalSeconds, + int forDurationSeconds, + int reNotifyMinutes, + String notificationTitleTmpl, + String notificationMessageTmpl, + List webhooks, + List targets, + Instant nextEvaluationAt, + String claimedBy, + Instant claimedUntil, + Map evalState, + Instant createdAt, + String createdBy, + Instant updatedAt, + String updatedBy) { + + public AlertRule { + webhooks = webhooks == null ? List.of() : List.copyOf(webhooks); + targets = targets == null ? List.of() : List.copyOf(targets); + evalState = evalState == null ? Map.of() : Map.copyOf(evalState); + } +} +``` + +```java +// AlertInstance.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +public record AlertInstance( + UUID id, + UUID ruleId, // nullable after rule deletion + Map ruleSnapshot, + UUID environmentId, + AlertState state, + AlertSeverity severity, + Instant firedAt, + Instant ackedAt, + String ackedBy, + Instant resolvedAt, + Instant lastNotifiedAt, + boolean silenced, + Double currentValue, + Double threshold, + Map context, + String title, + String message, + List targetUserIds, + List targetGroupIds, + List targetRoleNames) { + + public AlertInstance { + ruleSnapshot = ruleSnapshot == null ? Map.of() : Map.copyOf(ruleSnapshot); + context = context == null ? Map.of() : Map.copyOf(context); + targetUserIds = targetUserIds == null ? List.of() : List.copyOf(targetUserIds); + targetGroupIds = targetGroupIds == null ? List.of() : List.copyOf(targetGroupIds); + targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames); + } +} +``` + +```java +// AlertRuleTarget.java +package com.cameleer.server.core.alerting; + +import java.util.UUID; + +public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {} +``` + +```java +// WebhookBinding.java +package com.cameleer.server.core.alerting; + +import java.util.Map; +import java.util.UUID; + +public record WebhookBinding( + UUID id, + UUID outboundConnectionId, + String bodyOverride, + Map headerOverrides) { + + public WebhookBinding { + headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides); + } +} +``` + +```java +// SilenceMatcher.java +package com.cameleer.server.core.alerting; + +import java.util.UUID; + +public record SilenceMatcher( + UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) { + + public boolean isWildcard() { + return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null; + } +} +``` + +```java +// AlertSilence.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.UUID; + +public record AlertSilence( + UUID id, + UUID environmentId, + SilenceMatcher matcher, + String reason, + Instant startsAt, + Instant endsAt, + String createdBy, + Instant createdAt) {} +``` + +```java +// AlertNotification.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.Map; +import java.util.UUID; + +public record AlertNotification( + UUID id, + UUID alertInstanceId, + UUID webhookId, + UUID outboundConnectionId, + NotificationStatus status, + int attempts, + Instant nextAttemptAt, + String claimedBy, + Instant claimedUntil, + Integer lastResponseStatus, + String lastResponseSnippet, + Map payload, + Instant deliveredAt, + Instant createdAt) { + + public AlertNotification { + payload = payload == null ? Map.of() : Map.copyOf(payload); + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \ + cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java +git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)" +``` + +### Task 6: Repository interfaces + +**Files:** +- Create: `.../alerting/AlertRuleRepository.java`, `AlertInstanceRepository.java`, `AlertSilenceRepository.java`, `AlertNotificationRepository.java`, `AlertReadRepository.java` +- No test (pure interfaces — covered by the Phase 3 integration tests). + +- [ ] **Step 1: Create the interfaces** + +```java +// AlertRuleRepository.java +package com.cameleer.server.core.alerting; + +import java.util.List; +import java.util.Optional; +import java.util.UUID; + +public interface AlertRuleRepository { + AlertRule save(AlertRule rule); // upsert by id + Optional findById(UUID id); + List listByEnvironment(UUID environmentId); + List findAllByOutboundConnectionId(UUID connectionId); + List findRuleIdsByOutboundConnectionId(UUID connectionId); // used by rulesReferencing() + void delete(UUID id); + + /** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now). + * Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */ + List claimDueRules(String instanceId, int batchSize, int claimTtlSeconds); + + /** Release claim + bump next_evaluation_at. */ + void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt, + java.util.Map evalState); +} +``` + +```java +// AlertInstanceRepository.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.List; +import java.util.Optional; +import java.util.UUID; + +public interface AlertInstanceRepository { + AlertInstance save(AlertInstance instance); // upsert by id + Optional findById(UUID id); + Optional findOpenForRule(UUID ruleId); // state IN ('PENDING','FIRING','ACKNOWLEDGED') + List listForInbox(UUID environmentId, + List userGroupIdFilter, // UUIDs as String? decide impl-side + String userId, + List userRoleNames, + int limit); + long countUnreadForUser(UUID environmentId, String userId); + void ack(UUID id, String userId, Instant when); + void resolve(UUID id, Instant when); + void markSilenced(UUID id, boolean silenced); + void deleteResolvedBefore(Instant cutoff); +} +``` + +```java +// AlertSilenceRepository.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.List; +import java.util.Optional; +import java.util.UUID; + +public interface AlertSilenceRepository { + AlertSilence save(AlertSilence silence); + Optional findById(UUID id); + List listActive(UUID environmentId, Instant when); + List listByEnvironment(UUID environmentId); + void delete(UUID id); +} +``` + +```java +// AlertNotificationRepository.java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.List; +import java.util.Optional; +import java.util.UUID; + +public interface AlertNotificationRepository { + AlertNotification save(AlertNotification n); + Optional findById(UUID id); + List listForInstance(UUID alertInstanceId); + List claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds); + void markDelivered(UUID id, int status, String snippet, Instant when); + void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet); + void markFailed(UUID id, int status, String snippet); + void deleteSettledBefore(Instant cutoff); +} +``` + +```java +// AlertReadRepository.java +package com.cameleer.server.core.alerting; + +import java.util.List; +import java.util.UUID; + +public interface AlertReadRepository { + void markRead(String userId, UUID alertInstanceId); + void bulkMarkRead(String userId, List alertInstanceIds); +} +``` + +- [ ] **Step 2: Compile** + +Run: `mvn -pl cameleer-server-core compile` +Expected: SUCCESS. + +- [ ] **Step 3: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java +git commit -m "feat(alerting): core repository interfaces" +``` + +--- + +## Phase 3 — Postgres repositories + +All repositories use `JdbcTemplate` and `ObjectMapper` for JSONB columns (same pattern as `PostgresOutboundConnectionRepository`). Convert UUID[] with `ConnectionCallback` + `Array.of("uuid", ...)` and text[] with `Array.of("text", ...)`. + +### Task 7: `PostgresAlertRuleRepository` + +**Files:** +- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java` + +- [ ] **Step 1: Write the failing integration test** + +```java +package com.cameleer.server.app.alerting.storage; + +import com.cameleer.server.app.AbstractPostgresIT; +import com.cameleer.server.core.alerting.*; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.Test; + +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +import static org.assertj.core.api.Assertions.assertThat; + +class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT { + + private PostgresAlertRuleRepository repo; + private UUID envId; + + @AfterEach + void cleanup() { + jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId); + jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId); + jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'"); + } + + @org.junit.jupiter.api.BeforeEach + void setup() { + repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper()); + envId = UUID.randomUUID(); + jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID()); + jdbcTemplate.update( + "INSERT INTO users (user_id, username, password_hash, email, enabled) " + + "VALUES ('test-user', 'test-user', 'x', 'a@b', true)"); + } + + @Test + void saveAndFindByIdRoundtrip() { + var rule = newRule(List.of()); + repo.save(rule); + var found = repo.findById(rule.id()).orElseThrow(); + assertThat(found.name()).isEqualTo(rule.name()); + assertThat(found.condition()).isInstanceOf(AgentStateCondition.class); + } + + @Test + void findRuleIdsByOutboundConnectionId() { + var connId = UUID.randomUUID(); + var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of()); + var rule = newRule(List.of(wb)); + repo.save(rule); + + List ids = repo.findRuleIdsByOutboundConnectionId(connId); + assertThat(ids).containsExactly(rule.id()); + + assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty(); + } + + @Test + void claimDueRulesAtomicSkipLocked() { + var rule = newRule(List.of()); + repo.save(rule); + + List claimed = repo.claimDueRules("instance-A", 10, 30); + assertThat(claimed).hasSize(1); + + // Second claimant sees nothing until first releases or TTL expires + List second = repo.claimDueRules("instance-B", 10, 30); + assertThat(second).isEmpty(); + } + + private AlertRule newRule(List webhooks) { + return new AlertRule( + UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc", + AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE, + new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60), + 60, 0, 60, "t", "m", webhooks, List.of(), + Instant.now().minusSeconds(10), null, null, Map.of(), + Instant.now(), "test-user", Instant.now(), "test-user"); + } +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement the repository** + +```java +package com.cameleer.server.app.alerting.storage; + +import com.cameleer.server.core.alerting.*; +import com.fasterxml.jackson.core.type.TypeReference; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.postgresql.util.PGobject; +import org.springframework.jdbc.core.JdbcTemplate; +import org.springframework.jdbc.core.RowMapper; + +import java.sql.PreparedStatement; +import java.sql.SQLException; +import java.sql.Timestamp; +import java.sql.Types; +import java.time.Instant; +import java.util.*; + +public class PostgresAlertRuleRepository implements AlertRuleRepository { + + private final JdbcTemplate jdbc; + private final ObjectMapper om; + + public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) { + this.jdbc = jdbc; + this.om = om; + } + + @Override + public AlertRule save(AlertRule r) { + String sql = """ + INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled, + condition_kind, condition, evaluation_interval_seconds, for_duration_seconds, + re_notify_minutes, notification_title_tmpl, notification_message_tmpl, + webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state, + created_at, created_by, updated_at, updated_by) + VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb, + ?, ?, ?, ?::jsonb, ?, ?, ?, ?) + ON CONFLICT (id) DO UPDATE SET + name = EXCLUDED.name, description = EXCLUDED.description, + severity = EXCLUDED.severity, enabled = EXCLUDED.enabled, + condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition, + evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds, + for_duration_seconds = EXCLUDED.for_duration_seconds, + re_notify_minutes = EXCLUDED.re_notify_minutes, + notification_title_tmpl = EXCLUDED.notification_title_tmpl, + notification_message_tmpl = EXCLUDED.notification_message_tmpl, + webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state, + updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by + """; + jdbc.update(sql, + r.id(), r.environmentId(), r.name(), r.description(), + r.severity().name(), r.enabled(), r.conditionKind().name(), + writeJson(r.condition()), + r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(), + r.notificationTitleTmpl(), r.notificationMessageTmpl(), + writeJson(r.webhooks()), + Timestamp.from(r.nextEvaluationAt()), + r.claimedBy(), + r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()), + writeJson(r.evalState()), + Timestamp.from(r.createdAt()), r.createdBy(), + Timestamp.from(r.updatedAt()), r.updatedBy()); + return r; + } + + @Override + public Optional findById(UUID id) { + var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id); + return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0)); + } + + @Override + public List listByEnvironment(UUID environmentId) { + return jdbc.query( + "SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC", + rowMapper(), environmentId); + } + + @Override + public List findAllByOutboundConnectionId(UUID connectionId) { + String sql = """ + SELECT * FROM alert_rules + WHERE webhooks @> ?::jsonb + ORDER BY created_at DESC + """; + String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]"; + return jdbc.query(sql, rowMapper(), predicate); + } + + @Override + public List findRuleIdsByOutboundConnectionId(UUID connectionId) { + String sql = """ + SELECT id FROM alert_rules + WHERE webhooks @> ?::jsonb + """; + String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]"; + return jdbc.queryForList(sql, UUID.class, predicate); + } + + @Override + public void delete(UUID id) { + jdbc.update("DELETE FROM alert_rules WHERE id = ?", id); + } + + @Override + public List claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) { + String sql = """ + UPDATE alert_rules + SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval + WHERE id IN ( + SELECT id FROM alert_rules + WHERE enabled = true + AND next_evaluation_at <= now() + AND (claimed_until IS NULL OR claimed_until < now()) + ORDER BY next_evaluation_at + LIMIT ? + FOR UPDATE SKIP LOCKED + ) + RETURNING * + """; + return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize); + } + + @Override + public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map evalState) { + jdbc.update(""" + UPDATE alert_rules + SET claimed_by = NULL, claimed_until = NULL, + next_evaluation_at = ?, eval_state = ?::jsonb + WHERE id = ? + """, + Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId); + } + + private RowMapper rowMapper() { + return (rs, i) -> { + ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind")); + AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class); + List webhooks = om.readValue( + rs.getString("webhooks"), new TypeReference<>() {}); + Map evalState = om.readValue( + rs.getString("eval_state"), new TypeReference<>() {}); + + Timestamp cu = rs.getTimestamp("claimed_until"); + return new AlertRule( + (UUID) rs.getObject("id"), + (UUID) rs.getObject("environment_id"), + rs.getString("name"), + rs.getString("description"), + AlertSeverity.valueOf(rs.getString("severity")), + rs.getBoolean("enabled"), + kind, cond, + rs.getInt("evaluation_interval_seconds"), + rs.getInt("for_duration_seconds"), + rs.getInt("re_notify_minutes"), + rs.getString("notification_title_tmpl"), + rs.getString("notification_message_tmpl"), + webhooks, List.of(), + rs.getTimestamp("next_evaluation_at").toInstant(), + rs.getString("claimed_by"), + cu == null ? null : cu.toInstant(), + evalState, + rs.getTimestamp("created_at").toInstant(), + rs.getString("created_by"), + rs.getTimestamp("updated_at").toInstant(), + rs.getString("updated_by")); + }; + } + + private String writeJson(Object o) { + try { return om.writeValueAsString(o); } + catch (Exception e) { throw new IllegalStateException(e); } + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java +git commit -m "feat(alerting): Postgres repository for alert_rules" +``` + +### Task 8: Wire `OutboundConnectionServiceImpl.rulesReferencing()` (CRITICAL — Plan 01 gate) + +> **This is the Plan 01 known-incomplete item.** Plan 01 shipped `rulesReferencing()` returning `[]`. Until this task lands, outbound connections can be deleted or narrowed while rules reference them, corrupting production. **Do not skip or defer.** + +**Files:** +- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java` +- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java` + +- [ ] **Step 1: GitNexus impact check** + +Run `gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"})`. Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour. + +- [ ] **Step 2: Write the failing integration test** + +```java +package com.cameleer.server.app.outbound; + +import com.cameleer.server.app.AbstractPostgresIT; +import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository; +import com.cameleer.server.core.alerting.*; +import com.cameleer.server.core.outbound.*; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.springframework.beans.factory.annotation.Autowired; + +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT { + + @Autowired OutboundConnectionService service; + @Autowired OutboundConnectionRepository repo; + + private UUID envId; + private UUID connId; + private PostgresAlertRuleRepository ruleRepo; + + @BeforeEach + void seed() { + ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper()); + envId = UUID.randomUUID(); + jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID()); + jdbcTemplate.update( + "INSERT INTO users (user_id, username, password_hash, email, enabled) " + + "VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING"); + var c = repo.save(new OutboundConnection( + UUID.randomUUID(), "default", "conn", null, "https://example.test", + OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null, + OutboundAuth.None.INSTANCE, List.of(), + Instant.now(), "u-ref", Instant.now(), "u-ref")); + connId = c.id(); + + var rule = new AlertRule( + UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true, + ConditionKind.AGENT_STATE, + new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60), + 60, 0, 60, "t", "m", + List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())), + List.of(), Instant.now(), null, null, Map.of(), + Instant.now(), "u-ref", Instant.now(), "u-ref"); + ruleRepo.save(rule); + } + + @Test + void deleteConnectionReferencedByRuleReturns409() { + assertThat(service.rulesReferencing(connId)).hasSize(1); + assertThatThrownBy(() -> service.delete(connId, "u-ref")) + .hasMessageContaining("referenced by rules"); + } +} +``` + +- [ ] **Step 3: Run — FAIL** (stub returns empty list, so delete succeeds). + +- [ ] **Step 4: Replace the stub** + +In `OutboundConnectionServiceImpl.java`: + +```java +// existing imports + add: +import com.cameleer.server.core.alerting.AlertRuleRepository; + +public class OutboundConnectionServiceImpl implements OutboundConnectionService { + + private final OutboundConnectionRepository repo; + private final AlertRuleRepository ruleRepo; // NEW + private final String tenantId; + + public OutboundConnectionServiceImpl( + OutboundConnectionRepository repo, + AlertRuleRepository ruleRepo, + String tenantId) { + this.repo = repo; + this.ruleRepo = ruleRepo; + this.tenantId = tenantId; + } + + // … create/update/delete/get/list unchanged … + + @Override + public List rulesReferencing(UUID id) { + return ruleRepo.findRuleIdsByOutboundConnectionId(id); + } +} +``` + +Update `OutboundBeanConfig.java` to inject `AlertRuleRepository`: + +```java +@Bean +public OutboundConnectionService outboundConnectionService( + OutboundConnectionRepository repo, + AlertRuleRepository ruleRepo, + @Value("${cameleer.server.tenant.id:default}") String tenantId) { + return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId); +} +``` + +Add the `AlertRuleRepository` bean in a new `AlertingBeanConfig.java` stub (completed in Phase 7): + +```java +package com.cameleer.server.app.alerting.config; + +import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository; +import com.cameleer.server.core.alerting.AlertRuleRepository; +import com.fasterxml.jackson.databind.ObjectMapper; +import org.springframework.context.annotation.Bean; +import org.springframework.context.annotation.Configuration; +import org.springframework.jdbc.core.JdbcTemplate; + +@Configuration +public class AlertingBeanConfig { + @Bean + public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) { + return new PostgresAlertRuleRepository(jdbc, om); + } +} +``` + +- [ ] **Step 5: Run — PASS**. + +- [ ] **Step 6: GitNexus detect_changes + commit** + +```bash +# Verify scope +# gitnexus_detect_changes({scope: "staged"}) +git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \ + cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \ + cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java +git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)" +``` + +### Task 9: `PostgresAlertInstanceRepository` + +**Files:** +- Create: `.../alerting/storage/PostgresAlertInstanceRepository.java` +- Test: `.../alerting/storage/PostgresAlertInstanceRepositoryIT.java` + +- [ ] **Step 1: Write the failing test** covering: save/findById, findOpenForRule (filter `state IN ('PENDING','FIRING','ACKNOWLEDGED')`), listForInbox with user/group/role filters (seed 3 instances: one targeting user, one targeting group, one targeting role; assert listForInbox returns all three for a user in those groups/roles), countUnreadForUser (uses LEFT JOIN `alert_reads`), ack, resolve, deleteResolvedBefore. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — same RowMapper pattern as Task 7. Key queries: + +```sql +-- findOpenForRule +SELECT * FROM alert_instances + WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED') + ORDER BY fired_at DESC LIMIT 1; + +-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders) +SELECT * FROM alert_instances + WHERE environment_id = ? + AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED') + AND ( + ? = ANY(target_user_ids) + OR target_group_ids && ?::uuid[] + OR target_role_names && ?::text[] + ) + ORDER BY fired_at DESC LIMIT ?; + +-- countUnreadForUser +SELECT count(*) FROM alert_instances ai + WHERE ai.environment_id = ? + AND ai.state IN ('FIRING','ACKNOWLEDGED') + AND ( + ? = ANY(ai.target_user_ids) + OR ai.target_group_ids && ?::uuid[] + OR ai.target_role_names && ?::text[] + ) + AND NOT EXISTS ( + SELECT 1 FROM alert_reads ar + WHERE ar.alert_instance_id = ai.id AND ar.user_id = ? + ); +``` + +Array binding via `connection.createArrayOf("uuid", uuids)` / `createArrayOf("text", names)` inside a `ConnectionCallback`. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java +git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries" +``` + +### Task 10: `PostgresAlertSilenceRepository`, `PostgresAlertNotificationRepository`, `PostgresAlertReadRepository` + +**Files:** +- Create: three repositories under `.../alerting/storage/` +- Test: one IT per repository in `.../alerting/storage/` + +- [ ] **Step 1: Write all three failing ITs** (one file each). Cover: + - `Silence`: save/findById, listActive filters by `now BETWEEN starts_at AND ends_at`, delete. + - `Notification`: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + `next_attempt_at`, markDelivered + markFailed transition status, deleteSettledBefore purges `DELIVERED` + `FAILED`. + - `Read`: markRead is idempotent (uses `ON CONFLICT DO NOTHING`), bulkMarkRead handles empty list. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim: + +```sql +UPDATE alert_notifications + SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval + WHERE id IN ( + SELECT id FROM alert_notifications + WHERE status = 'PENDING' + AND next_attempt_at <= now() + AND (claimed_until IS NULL OR claimed_until < now()) + ORDER BY next_attempt_at + LIMIT ? + FOR UPDATE SKIP LOCKED + ) + RETURNING *; +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java +git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads" +``` + +### Task 11: Wire all alerting repositories in `AlertingBeanConfig` + +**Files:** +- Modify: `.../alerting/config/AlertingBeanConfig.java` + +- [ ] **Step 1: Add beans for the remaining repositories** + +```java +@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) { + return new PostgresAlertInstanceRepository(jdbc, om); +} +@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) { + return new PostgresAlertSilenceRepository(jdbc, om); +} +@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) { + return new PostgresAlertNotificationRepository(jdbc, om); +} +@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) { + return new PostgresAlertReadRepository(jdbc); +} +``` + +- [ ] **Step 2: Verify compile + existing ITs still pass** + +```bash +mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT' +``` + +- [ ] **Step 3: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java +git commit -m "feat(alerting): wire all alerting repository beans" +``` + +--- + +## Phase 4 — ClickHouse reads: new count methods and projections + +### Task 12: Add `ClickHouseLogStore.countLogs(LogSearchRequest)` + +**Files:** +- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java` + +- [ ] **Step 1: GitNexus impact check** + +Run `gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"})`. Expected callers: `LogQueryController`, `ContainerLogForwarder`, `ClickHouseConfig`. Adding a method is non-breaking — no downstream callers affected. + +- [ ] **Step 2: Write the failing test** + +```java +package com.cameleer.server.app.search; + +import com.cameleer.server.app.AbstractPostgresIT; +import com.cameleer.server.core.search.LogSearchRequest; +import org.junit.jupiter.api.Test; +import org.springframework.beans.factory.annotation.Autowired; + +import java.time.Instant; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +class ClickHouseLogStoreCountIT extends AbstractPostgresIT { + + @Autowired ClickHouseLogStore store; + + @Test + void countLogsRespectsLevelPatternAndWindow() { + // Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min + // (seed helper uses existing `indexBatch` path) + long count = store.countLogs(new LogSearchRequest( + /* environment */ "dev", + /* application */ "orders", + /* agentId */ null, + /* exchangeId */ null, + /* logger */ null, + /* sources */ List.of(), + /* levels */ List.of("ERROR"), + /* q */ "TimeoutException", + /* from */ Instant.now().minusSeconds(300), + /* to */ Instant.now(), + /* cursor */ null, + /* limit */ 100, + /* sort */ "desc" + )); + assertThat(count).isEqualTo(3); + } +} +``` + +(Adjust `LogSearchRequest` constructor to the actual record signature — check `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` for exact order.) + +- [ ] **Step 3: Run — FAIL**. + +- [ ] **Step 4: Implement the method** + +In `ClickHouseLogStore.java`, add a new public method. Reuse the WHERE-clause builder already used by `search(LogSearchRequest)`, but: +- No `FINAL`. +- Skip cursor, limit, sort. +- `SELECT count() FROM logs WHERE `. +- Include the `tenant_id = ?` predicate. + +```java +public long countLogs(LogSearchRequest request) { + StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?"); + List args = new ArrayList<>(); + args.add(tenantId); + args.add(Timestamp.from(request.from())); + args.add(Timestamp.from(request.to())); + if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); } + if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); } + // … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId … + String sql = "SELECT count() FROM logs WHERE " + where; // NO FINAL + Long n = jdbc.queryForObject(sql, Long.class, args.toArray()); + return n == null ? 0L : n; +} +``` + +(Imports: `java.sql.Timestamp`, `java.util.ArrayList`.) + +- [ ] **Step 5: Run — PASS**. + +- [ ] **Step 6: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java +git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator" +``` + +### Task 13: Add `ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)` + +**Files:** +- Create: `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java` +- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java` + +- [ ] **Step 1: GitNexus impact check** + +Run `gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"})`. Additive method — no downstream breakage. + +- [ ] **Step 2: Create `AlertMatchSpec` record** + +```java +package com.cameleer.server.core.alerting; + +import java.time.Instant; +import java.util.Map; + +/** Specification for alerting-specific execution counting. + * Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL. + * All fields except tenant/env/from/to are nullable filters. */ +public record AlertMatchSpec( + String tenantId, + String environment, + String applicationId, // nullable + String routeId, // nullable + String status, // "FAILED" / "SUCCESS" / null + Map attributes, // exact match on execution attribute key=value + Instant from, + Instant to, + Instant after // nullable; used by PER_EXCHANGE to advance cursor +) { + public AlertMatchSpec { + attributes = attributes == null ? Map.of() : Map.copyOf(attributes); + } +} +``` + +- [ ] **Step 3: Write the failing test** — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches. + +- [ ] **Step 4: Run — FAIL**. + +- [ ] **Step 5: Implement on `ClickHouseSearchIndex`** + +```java +public long countExecutionsForAlerting(AlertMatchSpec spec) { + StringBuilder where = new StringBuilder( + "tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?"); + List args = new ArrayList<>(); + args.add(spec.tenantId()); + args.add(spec.environment()); + args.add(Timestamp.from(spec.from())); + args.add(Timestamp.from(spec.to())); + if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); } + if (spec.routeId() != null) { where.append(" AND route_id = ?"); args.add(spec.routeId()); } + if (spec.status() != null) { where.append(" AND status = ?"); args.add(spec.status()); } + if (spec.after() != null) { + where.append(" AND start_time > ?"); + args.add(Timestamp.from(spec.after())); + } + // attribute filters: use Map column access — pattern matches existing search() impl + for (var e : spec.attributes().entrySet()) { + where.append(" AND attributes[?] = ?"); + args.add(e.getKey()); + args.add(e.getValue()); + } + String sql = "SELECT count() FROM executions WHERE " + where; // NO FINAL + Long n = jdbc.queryForObject(sql, Long.class, args.toArray()); + return n == null ? 0L : n; +} +``` + +- [ ] **Step 6: Run — PASS**. + +- [ ] **Step 7: Commit** + +```bash +git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \ + cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java +git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator" +``` + +### Task 14: ClickHouse projections migration + +**Files:** +- Create: `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql` +- Modify: the schema initializer invocation site (likely `ClickHouseConfig` or `ClickHouseSchemaInitializer`) to also run this file on startup. + +- [ ] **Step 1: Write the SQL file** + +```sql +-- Additive, idempotent. Safe to drop + rebuild with no data loss. +ALTER TABLE executions + ADD PROJECTION IF NOT EXISTS alerting_app_status + (SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time)); + +ALTER TABLE executions + ADD PROJECTION IF NOT EXISTS alerting_route_status + (SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time)); + +ALTER TABLE logs + ADD PROJECTION IF NOT EXISTS alerting_app_level + (SELECT * ORDER BY (tenant_id, environment, application, level, timestamp)); + +ALTER TABLE agent_metrics + ADD PROJECTION IF NOT EXISTS alerting_instance_metric + (SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at)); + +ALTER TABLE executions MATERIALIZE PROJECTION alerting_app_status; +ALTER TABLE executions MATERIALIZE PROJECTION alerting_route_status; +ALTER TABLE logs MATERIALIZE PROJECTION alerting_app_level; +ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric; +``` + +(Adjust table column names to match real `init.sql` — confirm `application` vs `application_id` on the `logs` and `agent_metrics` tables.) + +- [ ] **Step 2: Hook into `ClickHouseSchemaInitializer`** + +Find the initializer and add a second invocation: + +```java +runIdempotent("clickhouse/init.sql"); +runIdempotent("clickhouse/alerting_projections.sql"); +``` + +- [ ] **Step 3: Add a smoke IT** + +```java +@Test +void projectionsExistAfterStartup() { + var names = jdbcTemplate.queryForList( + "SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')", + String.class); + assertThat(names).contains( + "alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric"); +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \ + cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java +git commit -m "feat(alerting): ClickHouse projections for alerting read paths" +``` + +--- + +## Phase 5 — Mustache templating and silence matching + +### Task 15: Add JMustache dependency + +**Files:** +- Modify: `cameleer-server-core/pom.xml` + +- [ ] **Step 1: Add dependency** + +```xml + + com.samskivert + jmustache + 1.16 + +``` + +- [ ] **Step 2: Verify resolve** + +Run: `mvn -pl cameleer-server-core dependency:resolve` + +- [ ] **Step 3: Commit** + +```bash +git add cameleer-server-core/pom.xml +git commit -m "chore(alerting): add jmustache 1.16" +``` + +### Task 16: `MustacheRenderer` + +**Files:** +- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java` +- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +package com.cameleer.server.app.alerting.notify; + +import org.junit.jupiter.api.Test; +import java.util.Map; +import static org.assertj.core.api.Assertions.assertThat; + +class MustacheRendererTest { + + private final MustacheRenderer r = new MustacheRenderer(); + + @Test + void rendersSimpleTemplate() { + String out = r.render("Hello {{name}}", Map.of("name", "world")); + assertThat(out).isEqualTo("Hello world"); + } + + @Test + void rendersNestedPath() { + String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL"))); + assertThat(out).isEqualTo("CRITICAL"); + } + + @Test + void missingVariableRendersLiteral() { + String out = r.render("{{missing.path}}", Map.of()); + assertThat(out).isEqualTo("{{missing.path}}"); + } + + @Test + void malformedTemplateReturnsRawWithWarn() { + String out = r.render("{{unclosed", Map.of("unclosed","x")); + assertThat(out).isEqualTo("{{unclosed"); + } +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +package com.cameleer.server.app.alerting.notify; + +import com.samskivert.mustache.Mustache; +import com.samskivert.mustache.Template; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.stereotype.Component; + +import java.util.Map; + +@Component +public class MustacheRenderer { + + private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class); + + private final Mustache.Compiler compiler = Mustache.compiler() + .nullValue("") + .emptyStringIsFalse(true) + .defaultValue(null) // null triggers MissingContext -> we intercept below + .escapeHTML(false); + + public String render(String template, Map context) { + if (template == null) return ""; + try { + Template t = compiler.compile(template); + return t.execute(new LiteralFallbackContext(context)); + } catch (Exception e) { + log.warn("Mustache render failed for template='{}': {}", abbreviate(template), e.getMessage()); + return template; + } + } + + /** Returns `{{path}}` literal when a variable is missing. */ + private static class LiteralFallbackContext { + private final Map map; + LiteralFallbackContext(Map map) { this.map = map; } + // JMustache uses reflection / Map lookup, so we rely on wrapping the missing-value callback: + // easiest approach: compile with a custom `Mustache.Compiler.Loader` and intercept resolution. + // Simpler: post-process the output to detect unresolved `{{}}` sections → not possible after render. + // Alternative: pre-flight — scan template tokens against context and replace unresolved tokens + // with the literal before compilation. Use this simple approach: + } +} +``` + +Simpler implementation (ships for v1): + +```java +@Component +public class MustacheRenderer { + + private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class); + private static final java.util.regex.Pattern TOKEN = + java.util.regex.Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}"); + + private final Mustache.Compiler compiler = Mustache.compiler() + .defaultValue("") + .escapeHTML(false); + + public String render(String template, Map context) { + if (template == null) return ""; + String resolved = preResolve(template, context); + try { + return compiler.compile(resolved).execute(context); + } catch (Exception e) { + log.warn("Mustache render failed: {}", e.getMessage()); + return template; + } + } + + /** Replaces `{{missing.path}}` with the literal so Mustache sees a non-tag string. */ + private String preResolve(String template, Map context) { + var m = TOKEN.matcher(template); + var sb = new StringBuilder(); + while (m.find()) { + String path = m.group(1); + if (resolvePath(context, path) == null) { + m.appendReplacement(sb, java.util.regex.Matcher.quoteReplacement("{{" + path + "}}")); + // Replace the {{}} with {{{ literal }}} once we escape it — but jmustache will not re-process. + // Simpler: just wrap in a triple-brace or surround with a marker. For v1 we skip the double-expand: + // we return the LITERAL inside a section {{#_literal_123}}... so preResolve returns a string + // that Mustache will not modify. Concrete approach: + } + } + m.appendTail(sb); + return sb.toString(); + } + + private Object resolvePath(Map ctx, String path) { + Object cur = ctx; + for (String seg : path.split("\\.")) { + if (!(cur instanceof Map m)) return null; + cur = m.get(seg); + if (cur == null) return null; + } + return cur; + } +} +``` + +**Engineer note:** Prefer a pre-compile token substitution that replaces `{{missing.path}}` with a literal that Mustache renders as-is. One working approach: write a custom `Mustache.VariableFetcher` via `compiler.withFormatter(...)` — but JMustache's `Mustache.Compiler#withCollector()` is easier. Confirm during implementation and adjust this task; the tests in Step 1 lock the contract. If JMustache's API makes missing-variable fallback awkward, fall back to a regex-based substitutor that does `{{` → `⟦MUSTACHE_LITERAL:path⟧` for missing paths, then post-replace after render. The contract is: **unresolved `{{x}}` renders as literal `{{x}}`**. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java +git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars" +``` + +### Task 17: `NotificationContextBuilder` + +**Files:** +- Create: `.../alerting/notify/NotificationContextBuilder.java` +- Test: `.../alerting/notify/NotificationContextBuilderTest.java` + +- [ ] **Step 1: Write the failing test** covering: + - env / rule / alert subtrees always present + - conditional trees: `exchange.*` present only for EXCHANGE_MATCH, `log.*` only for LOG_PATTERN, etc. + - `alert.link` uses the configured `cameleer.server.ui-origin` prefix if present, else `/alerts/inbox/{id}`. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — pure static `Map build(AlertRule, AlertInstance, Environment, String uiOrigin)`. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java +git commit -m "feat(alerting): NotificationContextBuilder for template context maps" +``` + +### Task 18: `SilenceMatcher` evaluator + +**Files:** +- Create: `.../alerting/notify/SilenceMatcherService.java` (named to avoid clash with core record `SilenceMatcher`) +- Test: `.../alerting/notify/SilenceMatcherServiceTest.java` + +- [ ] **Step 1: Write the failing test** covering truth table: + - Wildcard matcher → matches any instance. + - Matcher with `ruleId` only → matches only instances with that rule. + - Multiple fields → AND logic. + - Active-window check at notification time (not at eval time). + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +@Component +public class SilenceMatcherService { + + public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) { + if (m.ruleId() != null && !m.ruleId().equals(instance.ruleId())) return false; + if (m.severity()!= null && m.severity() != instance.severity()) return false; + if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false; + if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false; + if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false; + return true; + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java +git commit -m "feat(alerting): silence matcher for notification-time dispatch" +``` + +--- + +## Phase 6 — Condition evaluators + +All six evaluators share this shape: + +```java +public sealed interface ConditionEvaluator + permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator, + DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator { + + ConditionKind kind(); + EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx); +} +``` + +Supporting types (create these in Task 19 before implementing individual evaluators). + +### Task 19: `EvalContext`, `EvalResult`, `TickCache`, `PerKindCircuitBreaker`, `ConditionEvaluator` interface + +**Files:** +- Create: `.../alerting/eval/EvalContext.java`, `EvalResult.java`, `TickCache.java`, `PerKindCircuitBreaker.java`, `ConditionEvaluator.java` +- Test: `.../alerting/eval/TickCacheTest.java`, `PerKindCircuitBreakerTest.java` + +- [ ] **Step 1: Write the failing tests** + +```java +// TickCacheTest.java +@Test +void getOrComputeCachesWithinTick() { + var cache = new TickCache(); + int n = cache.getOrCompute("k", () -> 42); + int m = cache.getOrCompute("k", () -> 43); + assertThat(n).isEqualTo(42); + assertThat(m).isEqualTo(42); // cached +} + +// PerKindCircuitBreakerTest.java +@Test +void opensAfterFailThreshold() { + var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...)); + for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE); + assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue(); +} + +@Test +void closesAfterCooldown() { /* advance clock beyond cooldown window */ } +``` + +- [ ] **Step 2: Implement** + +```java +// EvalContext.java +package com.cameleer.server.app.alerting.eval; +import java.time.Instant; +public record EvalContext(String tenantId, Instant now, TickCache tickCache) {} +``` + +```java +// EvalResult.java +package com.cameleer.server.app.alerting.eval; +import java.util.Map; + +public sealed interface EvalResult { + record Firing(Double currentValue, Double threshold, Map context) implements EvalResult { + public Firing { context = context == null ? Map.of() : Map.copyOf(context); } + } + record Clear() implements EvalResult { + public static final Clear INSTANCE = new Clear(); + } + record Error(Throwable cause) implements EvalResult {} +} +``` + +```java +// TickCache.java +package com.cameleer.server.app.alerting.eval; +import java.util.concurrent.ConcurrentHashMap; +import java.util.function.Supplier; + +public class TickCache { + private final ConcurrentHashMap map = new ConcurrentHashMap<>(); + @SuppressWarnings("unchecked") + public T getOrCompute(String key, Supplier supplier) { + return (T) map.computeIfAbsent(key, k -> supplier.get()); + } +} +``` + +```java +// PerKindCircuitBreaker.java +package com.cameleer.server.app.alerting.eval; + +import com.cameleer.server.core.alerting.ConditionKind; + +import java.time.Clock; +import java.time.Duration; +import java.time.Instant; +import java.util.*; +import java.util.concurrent.ConcurrentHashMap; + +public class PerKindCircuitBreaker { + + private record State(Deque failures, Instant openUntil) {} + + private final int threshold; + private final Duration window; + private final Duration cooldown; + private final Clock clock; + private final ConcurrentHashMap byKind = new ConcurrentHashMap<>(); + + public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) { + this.threshold = threshold; + this.window = Duration.ofSeconds(windowSeconds); + this.cooldown = Duration.ofSeconds(cooldownSeconds); + this.clock = clock; + } + + public void recordFailure(ConditionKind kind) { + byKind.compute(kind, (k, s) -> { + var deque = (s == null) ? new ArrayDeque() : new ArrayDeque<>(s.failures()); + Instant now = Instant.now(clock); + Instant cutoff = now.minus(window); + while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst(); + deque.addLast(now); + Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null; + return new State(deque, openUntil); + }); + } + + public boolean isOpen(ConditionKind kind) { + State s = byKind.get(kind); + return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil()); + } + + public void recordSuccess(ConditionKind kind) { + byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null)); + } +} +``` + +```java +// ConditionEvaluator.java +package com.cameleer.server.app.alerting.eval; + +import com.cameleer.server.core.alerting.*; + +public interface ConditionEvaluator { + ConditionKind kind(); + EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx); +} +``` + +(`sealed permits …` is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's `switch` over `ConditionKind`.) + +- [ ] **Step 3: Run — PASS**. + +- [ ] **Step 4: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/ +git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)" +``` + +### Task 20: `AgentStateEvaluator` + +**Files:** +- Create: `.../alerting/eval/AgentStateEvaluator.java` +- Test: `.../alerting/eval/AgentStateEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +@Test +void firesWhenAnyAgentInTargetStateForScope() { + var registry = mock(AgentRegistryService.class); + when(registry.findAll()).thenReturn(List.of( + new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(), + AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null) + )); + var eval = new AgentStateEvaluator(registry); + var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60)); + EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule, + new EvalContext("default", Instant.now(), new TickCache())); + assertThat(r).isInstanceOf(EvalResult.Firing.class); +} + +@Test +void clearWhenNoMatchingAgents() { /* ... */ } +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +@Component +public class AgentStateEvaluator implements ConditionEvaluator { + + private final AgentRegistryService registry; + + public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; } + + @Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; } + + @Override + public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) { + AgentState target = AgentState.valueOf(c.state()); + Instant cutoff = ctx.now().minusSeconds(c.forSeconds()); + List hits = registry.findAll().stream() + .filter(a -> matchesScope(a, c.scope())) + .filter(a -> a.state() == target) + .filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff)) + .toList(); + if (hits.isEmpty()) return EvalResult.Clear.INSTANCE; + AgentInfo first = hits.get(0); + return new EvalResult.Firing( + (double) hits.size(), null, + Map.of("agent", Map.of( + "id", first.instanceId(), + "name", first.displayName(), + "state", first.state().name() + ), "app", Map.of("slug", first.applicationId()))); + } + + private static boolean matchesScope(AgentInfo a, AlertScope s) { + if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false; + if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false; + return true; + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java +git commit -m "feat(alerting): AGENT_STATE evaluator" +``` + +### Task 21: `DeploymentStateEvaluator` + +**Files:** +- Create: `.../alerting/eval/DeploymentStateEvaluator.java` +- Test: `.../alerting/eval/DeploymentStateEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** — `FAILED` deployment for matching app → Firing; `RUNNING` → Clear. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — read via `DeploymentRepository.findByAppId` and `AppService.getByEnvironmentAndSlug`: + +```java +@Override +public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) { + App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null); + if (app == null) return EvalResult.Clear.INSTANCE; + List current = deploymentRepo.findByAppId(app.id()); + Set wanted = Set.copyOf(c.states()); + var hits = current.stream() + .filter(d -> wanted.contains(d.status().name())) + .toList(); + if (hits.isEmpty()) return EvalResult.Clear.INSTANCE; + Deployment d = hits.get(0); + return new EvalResult.Firing((double) hits.size(), null, + Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()), + "app", Map.of("slug", app.slug()))); +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator" +``` + +### Task 22: `RouteMetricEvaluator` + +**Files:** +- Create: `.../alerting/eval/RouteMetricEvaluator.java` +- Test: `.../alerting/eval/RouteMetricEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** — mock `StatsStore`, seed `ExecutionStats{p99Ms = 2500, ...}` for a scoped call, assert Firing with `currentValue = 2500, threshold = 2000`. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — dispatch on `RouteMetric` enum: + +```java +@Override +public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) { + Instant from = ctx.now().minusSeconds(c.windowSeconds()); + Instant to = ctx.now(); + + String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null); + ExecutionStats stats = (c.scope().routeId() != null) + ? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env) + : (c.scope().appSlug() != null) + ? statsStore.statsForApp(from, to, c.scope().appSlug(), env) + : statsStore.stats(from, to, env); + + double actual = switch (c.metric()) { + case ERROR_RATE -> errorRate(stats); + case P95_LATENCY_MS -> stats.p95DurationMs(); + case P99_LATENCY_MS -> stats.p99DurationMs(); + case THROUGHPUT -> stats.totalCount(); + case ERROR_COUNT -> stats.failedCount(); + }; + + boolean fire = switch (c.comparator()) { + case GT -> actual > c.threshold(); + case GTE -> actual >= c.threshold(); + case LT -> actual < c.threshold(); + case LTE -> actual <= c.threshold(); + case EQ -> actual == c.threshold(); + }; + + if (!fire) return EvalResult.Clear.INSTANCE; + return new EvalResult.Firing(actual, c.threshold(), + Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()), + "app", Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug()))); +} + +private double errorRate(ExecutionStats s) { + long total = s.totalCount(); + return total == 0 ? 0.0 : (double) s.failedCount() / total; +} +``` + +(Adjust method names on `ExecutionStats` to match the actual record — use `gitnexus_context({name: "ExecutionStats"})` if unsure.) + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): ROUTE_METRIC evaluator" +``` + +### Task 23: `LogPatternEvaluator` + +**Files:** +- Create: `.../alerting/eval/LogPatternEvaluator.java` +- Test: `.../alerting/eval/LogPatternEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** — mock `ClickHouseLogStore.countLogs` returning 7; threshold 5 → Firing; returning 3 → Clear. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — build a `LogSearchRequest` from the condition + window, delegate to `countLogs`. Use `TickCache` keyed on `(env, app, level, pattern, windowStart, windowEnd)` to coalesce. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): LOG_PATTERN evaluator" +``` + +### Task 24: `JvmMetricEvaluator` + +**Files:** +- Create: `.../alerting/eval/JvmMetricEvaluator.java` +- Test: `.../alerting/eval/JvmMetricEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** — mock `MetricsQueryStore.queryTimeSeries` for `("agent-1", ["heap_used_percent"], from, to, 1)` returning `{heap_used_percent: [Bucket{max=95.0}]}`; assert Firing with currentValue=95. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — aggregate across buckets per `AggregationOp` (MAX/MIN/AVG/LATEST), compare against threshold. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): JVM_METRIC evaluator" +``` + +### Task 25: `ExchangeMatchEvaluator` (PER_EXCHANGE + COUNT_IN_WINDOW) + +**Files:** +- Create: `.../alerting/eval/ExchangeMatchEvaluator.java` +- Test: `.../alerting/eval/ExchangeMatchEvaluatorTest.java` + +- [ ] **Step 1: Write the failing test** — two variants: + - `COUNT_IN_WINDOW`: mock `ClickHouseSearchIndex.countExecutionsForAlerting` → threshold check. + - `PER_EXCHANGE`: `eval_state.lastExchangeTs` cursor advancement. Seed 3 matching exchanges; first eval returns all 3 as separate Firings (emit a list? or change signature?). For v1 simplicity, the evaluator returns `EvalResult.Firing` with an internal list of exchange descriptors in the context map; the job handles one-alert-per-exchange fan-out. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement.** The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend `EvalResult` with a `Batch` variant: + +```java +record Batch(List firings) implements EvalResult { ... } +``` + +Add this to `EvalResult.java` (Task 19). The job (Task 27) detects Batch and creates one `AlertInstance` per Firing. This keeps non-batched evaluators simple. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes" +``` + +--- + +## Phase 7 — Evaluator job and state transitions + +### Task 26: `AlertingProperties` + `AlertStateTransitions` + +**Files:** +- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java` +- Create: `.../alerting/eval/AlertStateTransitions.java` +- Test: `.../alerting/eval/AlertStateTransitionsTest.java` + +- [ ] **Step 1: Write the failing test** for the pure state machine: + +```java +@Test +void clearWithNoOpenInstanceIsNoOp() { + var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now); + assertThat(next).isEmpty(); +} + +@Test +void firingWithNoOpenInstanceCreatesPendingIfForDuration() { + var rule = ruleBuilder().forDurationSeconds(60).build(); + var result = new EvalResult.Firing(2500.0, 2000.0, Map.of()); + var next = AlertStateTransitions.apply(null, result, rule, now); + assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING)); +} + +@Test +void firingWithNoForDurationGoesStraightToFiring() { + var rule = ruleBuilder().forDurationSeconds(0).build(); + var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now); + assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING)); +} + +@Test +void pendingPromotesToFiringAfterForDuration() { /* ... */ } + +@Test +void firingClearTransitionsToResolved() { /* ... */ } + +@Test +void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ } +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +// AlertStateTransitions.java +package com.cameleer.server.app.alerting.eval; + +import com.cameleer.server.core.alerting.*; +import java.time.Instant; +import java.util.*; + +public final class AlertStateTransitions { + + private AlertStateTransitions() {} + + /** Returns the new/updated AlertInstance, or empty when nothing changes. */ + public static Optional apply( + AlertInstance current, EvalResult result, AlertRule rule, Instant now) { + + return switch (result) { + case EvalResult.Clear c -> onClear(current, now); + case EvalResult.Firing f -> onFiring(current, f, rule, now); + case EvalResult.Error e -> Optional.empty(); + case EvalResult.Batch b -> Optional.empty(); // batch handled by the job, not here + }; + } + + private static Optional onFiring(AlertInstance current, EvalResult.Firing f, + AlertRule rule, Instant now) { + if (current == null) { + AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING; + return Optional.of(newInstance(rule, f, initial, now)); + } + if (current.state() == AlertState.PENDING) { + Instant firedAt = current.firedAt(); + if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) { + return Optional.of(current /* copy with state=FIRING, firedAt=now */); + } + return Optional.of(current); // stay PENDING, no mutation + } + return Optional.empty(); // already FIRING/ACK — re-notification handled by dispatcher + } + + private static Optional onClear(AlertInstance current, Instant now) { + if (current == null) return Optional.empty(); + if (current.state() == AlertState.RESOLVED) return Optional.empty(); + return Optional.of(current /* copy with state=RESOLVED, resolvedAt=now */); + } + + private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) { + // ... construct from rule snapshot + context; title/message rendered by the job + throw new UnsupportedOperationException("stub"); + } +} +``` + +Flesh out the `.withState(...)` / `.withResolvedAt(...)` helpers on `AlertInstance` (add wither-style methods returning new records) as part of this task. + +```java +// AlertingProperties.java +package com.cameleer.server.app.alerting.config; + +import org.springframework.boot.context.properties.ConfigurationProperties; + +@ConfigurationProperties("cameleer.server.alerting") +public record AlertingProperties( + Integer evaluatorTickIntervalMs, + Integer evaluatorBatchSize, + Integer claimTtlSeconds, + Integer notificationTickIntervalMs, + Integer notificationBatchSize, + Boolean inTickCacheEnabled, + Integer circuitBreakerFailThreshold, + Integer circuitBreakerWindowSeconds, + Integer circuitBreakerCooldownSeconds, + Integer eventRetentionDays, + Integer notificationRetentionDays, + Integer webhookTimeoutMs, + Integer webhookMaxAttempts) { + + public int effectiveEvaluatorTickIntervalMs() { + int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs; + return Math.max(5000, raw); // floor + } + public int effectiveEvaluatorBatchSize() { return evaluatorBatchSize == null ? 20 : evaluatorBatchSize; } + public int effectiveClaimTtlSeconds() { return claimTtlSeconds == null ? 30 : claimTtlSeconds; } + public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; } + public int effectiveNotificationBatchSize() { return notificationBatchSize == null ? 50 : notificationBatchSize; } + public int effectiveEventRetentionDays() { return eventRetentionDays == null ? 90 : eventRetentionDays; } + public int effectiveNotificationRetentionDays() { return notificationRetentionDays == null ? 30 : notificationRetentionDays; } + public int effectiveWebhookTimeoutMs() { return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs; } + public int effectiveWebhookMaxAttempts() { return webhookMaxAttempts == null ? 3 : webhookMaxAttempts; } + public int cbFailThreshold() { return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold; } + public int cbWindowSeconds() { return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds; } + public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; } +} +``` + +Register via `@ConfigurationPropertiesScan` or explicit `@EnableConfigurationProperties(AlertingProperties.class)` in `AlertingBeanConfig`. Also clamp-with-WARN if `evaluatorTickIntervalMs < 5000` at startup. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \ + cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java +git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine" +``` + +### Task 27: `AlertEvaluatorJob` + +**Files:** +- Create: `.../alerting/eval/AlertEvaluatorJob.java` +- Test: `.../alerting/eval/AlertEvaluatorJobIT.java` + +- [ ] **Step 1: Write the failing integration test** (uses real PG + mocked evaluators): + +```java +@Test +void claimDueRuleFireResolveCycle() throws Exception { + // seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance. + // flip the mock to return Firing -> one AlertInstance in FIRING state. + // flip back to Clear -> instance transitions to RESOLVED. +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +@Component +public class AlertEvaluatorJob implements SchedulingConfigurer { + + private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class); + + private final AlertingProperties props; + private final AlertRuleRepository ruleRepo; + private final AlertInstanceRepository instanceRepo; + private final AlertNotificationRepository notificationRepo; + private final Map> evaluators; + private final PerKindCircuitBreaker circuitBreaker; + private final MustacheRenderer renderer; + private final NotificationContextBuilder contextBuilder; + private final String instanceId; + private final String tenantId; + private final AlertingMetrics metrics; + private final Clock clock; + + public AlertEvaluatorJob(/* ...all above... */) { /* assign */ } + + @Override + public void configureTasks(ScheduledTaskRegistrar registrar) { + registrar.addFixedDelayTask(this::tick, props.effectiveEvaluatorTickIntervalMs()); + } + + void tick() { + List claimed = ruleRepo.claimDueRules( + instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds()); + + TickCache cache = new TickCache(); + EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache); + + for (AlertRule rule : claimed) { + if (circuitBreaker.isOpen(rule.conditionKind())) { + reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds())); + continue; + } + try { + EvalResult result = evaluateSafely(rule, ctx); + applyResult(rule, result); + circuitBreaker.recordSuccess(rule.conditionKind()); + } catch (Exception e) { + circuitBreaker.recordFailure(rule.conditionKind()); + metrics.evalError(rule.conditionKind(), rule.id()); + log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString()); + } finally { + reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds())); + } + } + } + + @SuppressWarnings({"rawtypes","unchecked"}) + private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) { + ConditionEvaluator evaluator = evaluators.get(rule.conditionKind()); + if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind()); + return evaluator.evaluate(rule.condition(), rule, ctx); + } + + private void applyResult(AlertRule rule, EvalResult result) { + if (result instanceof EvalResult.Batch b) { + for (EvalResult.Firing f : b.firings()) applyFiring(rule, f); + return; + } + AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null); + AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> { + AlertInstance persisted = instanceRepo.save( + enrichTitleMessage(rule, next, result)); + if (next.state() == AlertState.FIRING && current == null) { + enqueueNotifications(rule, persisted); + } + }); + } + + private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ } + + private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) { + Map ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null); + String title = renderer.render(rule.notificationTitleTmpl(), ctx); + String message = renderer.render(rule.notificationMessageTmpl(), ctx); + return instance /* .withTitle(title).withMessage(message) */; + } + + private void enqueueNotifications(AlertRule rule, AlertInstance instance) { + for (WebhookBinding w : rule.webhooks()) { + Map payload = /* context-builder + body override */ Map.of(); + notificationRepo.save(new AlertNotification( + UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(), + NotificationStatus.PENDING, 0, Instant.now(clock), + null, null, null, null, payload, null, Instant.now(clock))); + } + } + + private void reschedule(AlertRule rule, Instant next) { + ruleRepo.releaseClaim(rule.id(), next, rule.evalState()); + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \ + cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java +git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker" +``` + +--- + +## Phase 8 — Notification dispatch + +### Task 28: `HmacSigner` + +**Files:** +- Create: `.../alerting/notify/HmacSigner.java` +- Test: `.../alerting/notify/HmacSignerTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +@Test +void signsBodyWithSha256Hmac() { + String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8)); + // precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f... + assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f..."); // replace with real hex +} +``` + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — `javax.crypto.Mac.getInstance("HmacSHA256")`, `HexFormat.of().formatHex(...)`. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): HmacSigner for webhook signature" +``` + +### Task 29: `WebhookDispatcher` + +**Files:** +- Create: `.../alerting/notify/WebhookDispatcher.java` +- Test: `.../alerting/notify/WebhookDispatcherIT.java` (WireMock) + +- [ ] **Step 1: Write the failing IT** covering: + - 2xx → returns DELIVERED with status + snippet. + - 4xx → returns FAILED immediately. + - 5xx → returns RETRY with exponential backoff. + - Network timeout → RETRY. + - HMAC header present when `hmacSecret != null`. + - TLS trust-all config works against WireMock HTTPS. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** + +```java +@Component +public class WebhookDispatcher { + + public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {} + + private final OutboundHttpClientFactory clientFactory; + private final SecretCipher cipher; + private final HmacSigner signer; + private final MustacheRenderer renderer; + private final AlertingProperties props; + private final ObjectMapper om; + + public WebhookDispatcher(/* ... */) { /* assign */ } + + public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance, + OutboundConnection conn, Map context) { + String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn); + String body = renderer.render(bodyTmpl, context); + + var ctx = new OutboundHttpRequestContext( + conn.tlsTrustMode(), conn.tlsCaPemPaths(), + Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs())); + var client = clientFactory.clientFor(ctx); + + var request = new HttpPost(renderer.render(conn.url(), context)); + request.setEntity(new StringEntity(body, StandardCharsets.UTF_8)); + request.setHeader("Content-Type", "application/json"); + + for (var h : conn.defaultHeaders().entrySet()) { + request.setHeader(h.getKey(), renderer.render(h.getValue(), context)); + } + if (conn.hmacSecretCiphertext() != null) { + String secret = cipher.decrypt(conn.hmacSecretCiphertext()); + request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8))); + } + + try (var response = client.execute(request)) { + int code = response.getCode(); + String snippet = snippet(response); + if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null); + if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null); + return retryOutcome(code, snippet); + } catch (IOException e) { + return retryOutcome(-1, e.getMessage()); + } + } + + private Outcome retryOutcome(int code, String snippet) { + // Backoff: 30s, 120s, 300s + Duration next = Duration.ofSeconds(30); // caller multiplies by attempt + return new Outcome(null /* caller decides PENDING vs FAILED */, code, snippet, next); + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification" +``` + +### Task 30: `NotificationDispatchJob` + +**Files:** +- Create: `.../alerting/notify/NotificationDispatchJob.java` +- Test: `.../alerting/notify/NotificationDispatchJobIT.java` + +- [ ] **Step 1: Write the failing IT** — seed a `PENDING` `AlertNotification`; run one tick; WireMock returns 200; assert row transitions to `DELIVERED`. Seed another against 503 → assert `attempts=1`, `next_attempt_at` bumped, still `PENDING`. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — claim-polling loop: + +```java +void tick() { + var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl); + for (var n : claimed) { + var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null); + if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; } + + var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow(); + var rule = ruleRepo.findById(instance.ruleId()).orElse(null); + var context = contextBuilder.build(rule, instance, env, uiOrigin); + + // silence check + if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream() + .anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) { + instanceRepo.markSilenced(instance.id(), true); + notificationRepo.markFailed(n.id(), 0, "silenced"); + continue; + } + + var outcome = dispatcher.dispatch(n, rule, instance, conn, context); + if (outcome.status() == NotificationStatus.DELIVERED) { + notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now()); + } else if (outcome.status() == NotificationStatus.FAILED) { + notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet()); + } else { + int attempts = n.attempts() + 1; + if (attempts >= props.effectiveWebhookMaxAttempts()) { + notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet()); + } else { + Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts)); + notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet()); + } + } + } +} +``` + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry" +``` + +### Task 31: `InAppInboxQuery` + server-side 5s memoization + +**Files:** +- Create: `.../alerting/notify/InAppInboxQuery.java` +- Test: `.../alerting/notify/InAppInboxQueryTest.java` + +- [ ] **Step 1: Write the failing test** covering the path (resolves groups/roles from `RbacService.getEffectiveRolesForUser` + `listGroupsForUser`, delegates to `AlertInstanceRepository.listForInbox`/`countUnreadForUser`, second call within 5s returns cached count). + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — Caffeine-style `ConcurrentHashMap` with `Entry(count, expiresAt)`, 5 s TTL per `(envId, userId)`. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization" +``` + +--- + +## Phase 9 — REST controllers + +### Task 32: `AlertRuleController` + DTOs + +**Files:** +- Create: `.../alerting/controller/AlertRuleController.java` +- Create: DTOs in `.../alerting/dto/` +- Test: `.../alerting/controller/AlertRuleControllerIT.java` + +- [ ] **Step 1: Write the failing IT** — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement.** Endpoints (all under `/api/v1/environments/{envSlug}/alerts/rules`, env resolved via `@EnvPath Environment env`): + +| Method | Path | RBAC | +|---|---|---| +| GET | `` | VIEWER+ | +| POST | `` | OPERATOR+ | +| GET | `{id}` | VIEWER+ | +| PUT | `{id}` | OPERATOR+ | +| DELETE | `{id}` | OPERATOR+ | +| POST | `{id}/enable` / `{id}/disable` | OPERATOR+ | +| POST | `{id}/render-preview` | OPERATOR+ | +| POST | `{id}/test-evaluate` | OPERATOR+ | + +Key DTOs: `AlertRuleRequest` (with `@Valid AlertConditionDto`), `AlertRuleResponse`, `RenderPreviewRequest/Response`, `TestEvaluateRequest/Response`. + +On save, validate: +- Each `WebhookBindingRequest.outboundConnectionId` exists in `outbound_connections` (via `OutboundConnectionService.get(id)` → 422 if 404). +- Connection is allowed in this env (via `conn.isAllowedInEnvironment(env.id())` → 422 otherwise). +- SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory). + +Audit via `auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request)`. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs" +``` + +### Task 33: `AlertController` + +**Files:** +- Create: `.../alerting/controller/AlertController.java`, `AlertDto.java`, `UnreadCountResponse.java` +- Test: `.../alerting/controller/AlertControllerIT.java` + +- [ ] **Step 1: Write the failing IT** for `GET /alerts`, `GET /alerts/unread-count`, `POST /alerts/{id}/ack`, `POST /alerts/{id}/read`, `POST /alerts/bulk-read`. Assert env isolation (env-A alert not visible from env-B). + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — delegate to `InAppInboxQuery` and `AlertInstanceRepository`. On ack, enforce targeted-or-OPERATOR rule. + +- [ ] **Step 4: Run — PASS**. + +- [ ] **Step 5: Commit** + +```bash +git commit -m "feat(alerting): AlertController for inbox + ack + read" +``` + +### Task 34: `AlertSilenceController` + +**Files:** +- Create: `.../alerting/controller/AlertSilenceController.java`, `AlertSilenceDto.java` +- Test: `.../alerting/controller/AlertSilenceControllerIT.java` + +- [ ] **Step 1–5:** Follow the same pattern. Mutations OPERATOR+, audit `ALERT_SILENCE_CHANGE`. Validate `endsAt > startsAt` at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier). + +### Task 35: `AlertNotificationController` + +**Files:** +- Create: `.../alerting/controller/AlertNotificationController.java` +- Test: `.../alerting/controller/AlertNotificationControllerIT.java` + +- [ ] **Step 1–5:** + - `GET /alerts/{id}/notifications` → VIEWER+; returns per-instance outbox rows. + - `POST /alerts/notifications/{id}/retry` → OPERATOR+; resets `next_attempt_at = now`, `attempts = 0`, `status = PENDING`. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file). + +- [ ] **Step 6: Update `SecurityConfig` to permit the new paths** + +In `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java`: + +```java +.requestMatchers(HttpMethod.GET, "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN") +.requestMatchers(HttpMethod.POST, "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN") +``` + +(Class-level `@PreAuthorize` on each controller is authoritative; the path matchers are defence-in-depth.) + +- [ ] **Step 7: Commit** + +```bash +git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths" +``` + +### Task 36: Regenerate OpenAPI schema + +- [ ] **Step 1: Start backend on :8081** (from the alerting-02 worktree). +- [ ] **Step 2:** `cd ui && npm run generate-api:live` +- [ ] **Step 3:** Commit `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` regen. + +```bash +git add ui/src/api/schema.d.ts ui/src/api/openapi.json +git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints" +``` + +--- + +## Phase 10 — Retention, metrics, rules, verification + +### Task 37: `AlertingRetentionJob` + +**Files:** +- Create: `.../alerting/retention/AlertingRetentionJob.java` +- Test: `.../alerting/retention/AlertingRetentionJobIT.java` + +- [ ] **Step 1: Write the failing IT** — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run `cleanup()`; assert only old rows are deleted. + +- [ ] **Step 2: Run — FAIL**. + +- [ ] **Step 3: Implement** — `@Scheduled(cron = "0 0 3 * * *")`, cutoffs from `AlertingProperties`, advisory-lock-of-the-day pattern (see `JarRetentionJob.java`). + +- [ ] **Step 4–5: Run, commit** + +```bash +git commit -m "feat(alerting): AlertingRetentionJob daily cleanup" +``` + +### Task 38: `AlertingMetrics` + +**Files:** +- Create: `.../alerting/metrics/AlertingMetrics.java` + +- [ ] **Step 1: Register metrics** via `MeterRegistry`: + +```java +@Component +public class AlertingMetrics { + private final MeterRegistry registry; + public AlertingMetrics(MeterRegistry registry) { this.registry = registry; } + + public void evalError(ConditionKind kind, UUID ruleId) { + registry.counter("alerting_eval_errors_total", + "kind", kind.name(), "rule_id", ruleId.toString()).increment(); + } + public void circuitOpened(ConditionKind kind) { + registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment(); + } + public Timer evalDuration(ConditionKind kind) { + return registry.timer("alerting_eval_duration_seconds", "kind", kind.name()); + } + // + gauges via MeterBinder that query repositories +} +``` + +- [ ] **Step 2: Wire into `AlertEvaluatorJob` and `PerKindCircuitBreaker`.** + +- [ ] **Step 3: Commit** + +```bash +git commit -m "feat(alerting): observability metrics via micrometer" +``` + +### Task 39: Update `.claude/rules/app-classes.md` + `core-classes.md` + +- [ ] **Step 1: Document the new `alerting/` packages** in both rule files. Add a new subsection under `controller/` for the alerting env-scoped controllers. Document the new flat endpoint `/api/v1/alerts/notifications/{id}/retry` in the flat-allow-list with justification "notification IDs are globally unique; matches the `/api/v1/executions/{id}` precedent". + +- [ ] **Step 2: Commit** + +```bash +git add .claude/rules/app-classes.md .claude/rules/core-classes.md +git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint" +``` + +### Task 40: `application.yml` defaults + admin guide + +**Files:** +- Modify: `cameleer-server-app/src/main/resources/application.yml` +- Create: `docs/alerting.md` + +- [ ] **Step 1: Add default stanza** + +```yaml +cameleer: + server: + alerting: + evaluator-tick-interval-ms: 5000 + evaluator-batch-size: 20 + claim-ttl-seconds: 30 + notification-tick-interval-ms: 5000 + notification-batch-size: 50 + in-tick-cache-enabled: true + circuit-breaker-fail-threshold: 5 + circuit-breaker-window-seconds: 30 + circuit-breaker-cooldown-seconds: 60 + event-retention-days: 90 + notification-retention-days: 30 + webhook-timeout-ms: 5000 + webhook-max-attempts: 3 +``` + +- [ ] **Step 2: Write `docs/alerting.md`** — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention). + +- [ ] **Step 3: Commit** + +```bash +git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md +git commit -m "docs(alerting): default config + admin guide" +``` + +### Task 41: Full-lifecycle integration test + +**Files:** +- Create: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java` + +- [ ] **Step 1: Write the full-lifecycle IT** + +Steps in the single test method: +1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret. +2. POST a `LOG_PATTERN` rule pointing at `WireMock` via the outbound connection, `forDurationSeconds=0`, `threshold=1`. +3. Inject a log row into ClickHouse that matches the pattern. +4. Trigger `AlertEvaluatorJob.tick()` directly. +5. Assert one `alert_instances` row in FIRING. +6. Trigger `NotificationDispatchJob.tick()`. +7. Assert WireMock received one POST with `X-Cameleer-Signature` header + rendered body. +8. POST `/alerts/{id}/ack` → state ACKNOWLEDGED. +9. Create a silence matching this rule; fire another tick; assert `silenced=true` on new instance and WireMock received no second request. +10. Remove the matching log rows, run tick → instance RESOLVED. +11. DELETE the rule → assert `alert_instances.rule_id = NULL` but `rule_snapshot` still retains rule name. + +- [ ] **Step 2: Run — PASS** (may need a few iterations of debugging). + +- [ ] **Step 3: Commit** + +```bash +git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java +git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete" +``` + +### Task 42: Env-isolation + outbound-guard regression tests + +**Files:** +- Create: `.../alerting/AlertingEnvIsolationIT.java`, `OutboundConnectionAllowedEnvIT.java` + +- [ ] **Step 1: Env isolation** — rule in env-A, fire, assert invisible from env-B inbox. + +- [ ] **Step 2: Outbound guard** — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing `allowed_environment_ids` on the connection while a rule still references it → 409 (this exercises the freshly-wired `rulesReferencing`). + +- [ ] **Step 3: Run — PASS**. + +- [ ] **Step 4: Commit** + +```bash +git commit -m "test(alerting): env isolation + outbound allowed-env guard" +``` + +### Task 43: Final verification + GitNexus reindex + +- [ ] **Step 1: Full build** + +```bash +mvn clean verify +``` + +Expected: All tests pass. Known pre-existing test debt (wrong-JdbcTemplate + shared-context state leaks) may still fail — document any failures that existed before Plan 02 in a commit message "known-pre-existing" note. + +- [ ] **Step 2: GitNexus reindex** + +```bash +npx gitnexus analyze --embeddings +``` + +- [ ] **Step 3: Manual smoke** + +Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through: +- Create an outbound connection to `https://httpbin.org/post`. +- `curl` the alerting REST API to POST a `LOG_PATTERN` rule. +- Inject a matching log via `POST /api/v1/data/logs`. +- Wait 2 eval ticks + 1 notification tick. +- Confirm: `alert_instances` row in FIRING, `alert_notifications` row DELIVERED with HTTP 200, httpbin shows the body. +- `curl POST /alerts/{id}/ack` → state ACKNOWLEDGED. + +- [ ] **Step 4: Nothing to commit if all passes — plan complete** + +--- + +## Known-incomplete items carried into Plan 03 + +- **UI:** `NotificationBell`, `/alerts/**` pages, `` with variable auto-complete, CMD-K alert/rule sources. Open design question: completion engine choice (CodeMirror 6 vs Monaco vs textarea overlay) still open — see spec §20 #7. +- **Rule promotion across envs.** Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03. +- **OIDC retrofit** to use `OutboundHttpClientFactory`. Unchanged from Plan 01 — a separate small follow-up. +- **TLS summary enrichment** on `/test` endpoint (Plan 01 stubbed as `"TLS"`). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context. +- **Performance tests.** 500-rule, 5-replica `PerformanceIT` deferred; claim-polling concurrency is covered by Task 7's unit-level test. +- **Bulk promotion** and **mustache completion `variables` metadata endpoint** (`GET /alerts/rules/template-variables`) — deferred until usage patterns justify. +- **Rule deletion test debt.** Existing pre-Plan-02 test debt (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in `FlywayMigrationIT` / `ConfigEnvIsolationIT` / `ClickHouseStatsStoreIT`) is orthogonal and should be addressed in a dedicated test-hygiene pass. + +--- + +## Self-review + +**Spec coverage** (against `docs/superpowers/specs/2026-04-19-alerting-design.md`): + +| Spec § | Scope | Covered by | +|---|---|---| +| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 20–25 | +| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 | +| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 | +| §2 Rule promotion | **Deferred to Plan 03 (UI)** | — | +| §2 CMD-K | **Deferred to Plan 03** | — | +| §2 Configurable cadence, 5 s floor | `AlertingProperties.effective*` | Task 26 | +| §3 Key decisions | All 14 decisions honoured | — | +| §4 Module layout | `core/alerting` + `app/alerting/**` | Tasks 3–11, 15–38 | +| §4 Touchpoints | `countLogs` + `countExecutionsForAlerting` + `AuditCategory` + `SecurityConfig` | Tasks 2, 12, 13, 35 | +| §5 Data model | V12 migration | Task 1 | +| §5 Claim-polling queries | `FOR UPDATE SKIP LOCKED` in rule + notification repos | Tasks 7, 10 | +| §6 Outbound connections wiring | `rulesReferencing` gate | Task 8 (**CRITICAL**) | +| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 | +| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | Tasks 28, 29, 30, 31 | +| §9 Rule promotion | **Deferred** (UI) | — | +| §10 Cross-cutting HTTP | Reused from Plan 01 | — | +| §11 API surface | All routes implemented except rule promotion | Tasks 32–36 | +| §12 CMD-K | **Deferred to Plan 03** | — | +| §13 UI | **Deferred to Plan 03** | — | +| §14 Configuration | `AlertingProperties` + `application.yml` | Tasks 26, 40 | +| §15 Retention | Daily job | Task 37 | +| §16 Observability (metrics + audit) | Tasks 2, 38 | +| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | Tasks 32–36, 28, Plan 01 | +| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 27–31, 41, 42 | +| §19 Rollout | Dormant-by-default; matching `application.yml` + docs | Task 40 | +| §20 #1 OIDC alignment | **Deferred** (follow-up) | — | +| §20 #2 secret encryption | Reused Plan 01 `SecretCipher` | Task 29 | +| §20 #3 CH migration naming | `alerting_projections.sql` | Task 14 | +| §20 #6 env-delete cascade audit | PG IT | Task 1 | +| §20 #7 Mustache completion engine | **Deferred** (UI) | — | + +**Placeholders:** A handful of steps reference real record fields / method names with `/* … */` markers where the exact name depends on what the existing codebase exposes (`ExecutionStats` metric accessors, `AgentInfo.lastHeartbeat` method name, wither-method signatures on `AlertInstance`). Each is accompanied by a `gitnexus_context({name: ...})` hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time. + +**Type consistency check:** `AlertRule`, `AlertInstance`, `AlertNotification`, `AlertSilence` field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). `WebhookBinding.id` is used as `alert_notifications.webhook_id` — stable opaque reference. `OutboundConnection.createdBy/updatedBy` types match `users.user_id TEXT` (Plan 01 precedent). `rulesReferencing` signature matches Plan 01's stub `List rulesReferencing(UUID)`. + +**Risks flagged to executor:** + +1. **Task 16 `MustacheRenderer` missing-variable fallback** is non-trivial in JMustache's default compiler config — implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible. +2. **Task 12/13** — the SQL dialect for attribute map access on the `executions` table (`attributes[?]`) depends on the actual column type in `init.sql`. If attributes is `Map(String,String)`, the syntax works; if it's stored as JSON string, switch to `JSONExtractString(attributes, ?) = ?`. +3. **Task 27 `enrichTitleMessage`** depends on `AlertInstance` having wither methods — these are added opportunistically during Task 26 when `AlertStateTransitions` needs them. Don't forget to expose them. +4. **Claim-polling semantics under schema-per-tenant** — the `?currentSchema=tenant_{id}` JDBC URL routes writes correctly, but the `FOR UPDATE SKIP LOCKED` behaviour is per-schema so cross-tenant locks are irrelevant (correct behaviour). Make sure IT tests run with `cameleer.server.tenant.id=default`. +5. **Task 41 full-lifecycle test is the canary.** If it fails after each task, pair-program with the failing assertion — the bug is almost always in state transitions or renderer context shape.