Files
cameleer-server/docs/superpowers/plans/2026-04-19-alerting-02-backend.md

3429 lines
138 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Alerting — Plan 02 — Backend Implementation
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Deliver the server-side alerting feature described in `docs/superpowers/specs/2026-04-19-alerting-design.md` — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03.
**Architecture:** Confined to new `alerting/` packages in both `cameleer-server-core` (pure records + interfaces) and `cameleer-server-app` (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new `countLogs` / `countExecutionsForAlerting` methods, four additive projections). Claim-polling `FOR UPDATE SKIP LOCKED` makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (`rulesReferencing`) is populated in this plan — it is the gate that unlocks safe production use of Plan 01.
**Tech Stack:** Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's `OutboundHttpClientFactory`, Testcontainers + JUnit 5 + WireMock + AssertJ for tests.
---
## Base branch
**Branch Plan 02 off `feat/alerting-01-outbound-infra`.** Plan 02 depends on Plan 01's `OutboundConnection` domain, `OutboundHttpClientFactory` bean, `SecretCipher`, `OutboundConnectionServiceImpl.rulesReferencing()` stub, the V11 migration, and the `OUTBOUND_CONNECTION_CHANGE` / `OUTBOUND_HTTP_TRUST_CHANGE` audit categories. Branching off `main` is **not** an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2.
```bash
# Execute in a fresh worktree
git fetch origin
git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra
cd .worktrees/alerting-02
mvn clean compile # confirm Plan 01 code compiles as baseline
```
---
## File Structure
### Created — `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`
| File | Responsibility |
|---|---|
| `AlertingProperties.java` | Not here — see app module. |
| `AlertRule.java` | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. |
| `AlertCondition.java` | Sealed interface; Jackson DEDUCTION polymorphism root. |
| `RouteMetricCondition.java` | Record: scope, metric, comparator, threshold, windowSeconds. |
| `ExchangeMatchCondition.java` | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. |
| `AgentStateCondition.java` | Record: scope, state, forSeconds. |
| `DeploymentStateCondition.java` | Record: scope, states. |
| `LogPatternCondition.java` | Record: scope, level, pattern, threshold, windowSeconds. |
| `JvmMetricCondition.java` | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. |
| `AlertScope.java` | Record: appSlug?, routeId?, agentId? — nullable fields, used by all conditions. |
| `ConditionKind.java` | Enum mirror of SQL `condition_kind_enum`. |
| `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java` | Enums used in conditions. |
| `AlertSeverity.java` | Enum mirror of SQL `severity_enum`. |
| `AlertState.java` | Enum mirror of SQL `alert_state_enum`. |
| `AlertInstance.java` | Immutable record for `alert_instances` row. |
| `AlertRuleTarget.java` | Record for `alert_rule_targets` row. |
| `TargetKind.java` | Enum mirror of SQL `target_kind_enum`. |
| `AlertSilence.java` | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. |
| `SilenceMatcher.java` | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. |
| `AlertNotification.java` | Record for `alert_notifications` outbox row. |
| `NotificationStatus.java` | Enum mirror of SQL `notification_status_enum`. |
| `WebhookBinding.java` | Record embedded in `alert_rules.webhooks` JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. |
| `AlertRuleRepository.java` | CRUD + claim-polling interface. |
| `AlertInstanceRepository.java` | CRUD + query-for-inbox interface. |
| `AlertSilenceRepository.java` | CRUD interface. |
| `AlertNotificationRepository.java` | CRUD + claim-polling interface. |
| `AlertReadRepository.java` | Mark-read + count-unread interface. |
### Created — `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/`
| File | Responsibility |
|---|---|
| `config/AlertingProperties.java` | `@ConfigurationProperties("cameleer.server.alerting")`. |
| `config/AlertingBeanConfig.java` | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. |
| `storage/PostgresAlertRuleRepository.java` | JdbcTemplate impl of `AlertRuleRepository`. |
| `storage/PostgresAlertInstanceRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertSilenceRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertNotificationRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertReadRepository.java` | JdbcTemplate impl. |
| `eval/EvalContext.java` | Per-tick context (tenantId, now, tickCache). |
| `eval/EvalResult.java` | Sealed: `Firing(value, threshold, contextMap)` / `Clear` / `Error(Throwable)`. |
| `eval/TickCache.java` | `ConcurrentHashMap<String,Object>` discarded per tick. |
| `eval/PerKindCircuitBreaker.java` | Failure window + cooldown per `ConditionKind`. |
| `eval/ConditionEvaluator.java` | Generic interface: `evaluate(C, AlertRule, EvalContext)`. |
| `eval/RouteMetricEvaluator.java` | Reads `StatsStore`. |
| `eval/ExchangeMatchEvaluator.java` | Reads `ClickHouseSearchIndex.countExecutionsForAlerting` + `SearchService.search` for PER_EXCHANGE cursor mode. |
| `eval/AgentStateEvaluator.java` | Reads `AgentRegistryService.findAll`. |
| `eval/DeploymentStateEvaluator.java` | Reads `DeploymentRepository.findByAppId`. |
| `eval/LogPatternEvaluator.java` | Reads new `ClickHouseLogStore.countLogs`. |
| `eval/JvmMetricEvaluator.java` | Reads `MetricsQueryStore.queryTimeSeries`. |
| `eval/AlertEvaluatorJob.java` | `@Component` implementing `SchedulingConfigurer`; claim-polling loop. |
| `eval/AlertStateTransitions.java` | Pure function: given current instance + EvalResult → new state + timestamps. |
| `notify/MustacheRenderer.java` | JMustache wrapper; resilient to bad templates. |
| `notify/NotificationContextBuilder.java` | Pure: builds context map from `AlertInstance` + rule + env. |
| `notify/SilenceMatcher.java` | Pure: evaluates a `SilenceMatcher` against an `AlertInstance`. |
| `notify/InAppInboxQuery.java` | Server-side query helper for `/alerts` and unread-count. |
| `notify/WebhookDispatcher.java` | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. |
| `notify/NotificationDispatchJob.java` | `@Component` `SchedulingConfigurer`; claim-polling on `alert_notifications`. |
| `notify/HmacSigner.java` | Pure: computes `sha256=<hmac(secret, body)>`. |
| `retention/AlertingRetentionJob.java` | `@Scheduled(cron = "0 0 3 * * *")` — delete old `alert_instances` + `alert_notifications`. |
| `controller/AlertRuleController.java` | `/api/v1/environments/{envSlug}/alerts/rules`. |
| `controller/AlertController.java` | `/api/v1/environments/{envSlug}/alerts` + instance actions. |
| `controller/AlertSilenceController.java` | `/api/v1/environments/{envSlug}/alerts/silences`. |
| `controller/AlertNotificationController.java` | `/api/v1/environments/{envSlug}/alerts/{id}/notifications`, `/alerts/notifications/{id}/retry`. |
| `dto/AlertRuleDto.java`, `dto/AlertDto.java`, `dto/AlertSilenceDto.java`, `dto/AlertNotificationDto.java`, `dto/ConditionDto.java`, `dto/WebhookBindingDto.java`, `dto/RenderPreviewRequest.java`, `dto/RenderPreviewResponse.java`, `dto/TestEvaluateRequest.java`, `dto/TestEvaluateResponse.java`, `dto/UnreadCountResponse.java` | Request/response DTOs. |
| `metrics/AlertingMetrics.java` | Micrometer registrations for counters/gauges/histograms. |
### Created — resources
| File | Responsibility |
|---|---|
| `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql` | Flyway migration: 5 enums, 6 tables, indexes, cascades. |
| `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql` | 4 projections on `executions` / `logs` / `agent_metrics`, all `IF NOT EXISTS`. |
### Modified
| File | Change |
|---|---|
| `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` | Add `ALERT_RULE_CHANGE`, `ALERT_SILENCE_CHANGE`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java` | Replace the `rulesReferencing(UUID)` stub with a call through `AlertRuleRepository.findRuleIdsByOutboundConnectionId`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java` | Add `long countLogs(LogSearchRequest)` — no `FINAL`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java` | Add `long countExecutionsForAlerting(AlertMatchSpec)` — no `FINAL`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java` | Run `alerting_projections.sql` via existing `ClickHouseSchemaInitializer`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java` | Permit new `/api/v1/environments/{envSlug}/alerts/**` path matchers with role-based access. |
| `cameleer-server-core/pom.xml` | Add `com.samskivert:jmustache:1.16`. |
| `.claude/rules/app-classes.md`, `.claude/rules/core-classes.md` | Document new packages. |
| `cameleer-server-app/src/main/resources/application.yml` | Default `AlertingProperties` stanza + comment linking to the admin guide. |
---
## Conventions
- **TDD.** Every task starts with a failing test, implements the minimum to pass, then commits.
- **One commit per task.** Commit messages: `feat(alerting): …`, `test(alerting): …`, `fix(alerting): …`, `chore(alerting): …`, `docs(alerting): …`.
- **Tenant invariant.** Every ClickHouse query and Postgres table referencing observability data filters by `tenantId` (injected via `AlertingBeanConfig` from `cameleer.server.tenant.id`).
- **No `FINAL`** on the two new CH count methods — alerting tolerates brief duplicate counts.
- **Jackson polymorphism** via `@JsonTypeInfo(use = DEDUCTION)` with `@JsonSubTypes` on `AlertCondition`.
- **Pure `core/`, Spring-only in `app/`.** No `@Component`, `@Service`, or `@Scheduled` annotations in `cameleer-server-core`.
- **Claim polling.** `FOR UPDATE SKIP LOCKED` + `claimed_by` / `claimed_until` with 30 s TTL.
- **Instance id** for claim ownership: use `InetAddress.getLocalHost().getHostName() + ":" + processPid()`; exposed as a bean `"alertingInstanceId"` of type `String`.
- **GitNexus hygiene.** Before modifying any existing class (`OutboundConnectionServiceImpl`, `ClickHouseLogStore`, `ClickHouseSearchIndex`, `AuditCategory`, `SecurityConfig`), run `gitnexus_impact({target: "<className>", direction: "upstream"})` and report blast radius. Run `gitnexus_detect_changes()` before each commit.
---
## Phase 1 — Flyway V12 migration and audit categories
### Task 1: `V12__alerting_tables.sql`
**Files:**
- Create: `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java`
- [ ] **Step 1: Write the failing integration test**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class V12MigrationIT extends AbstractPostgresIT {
@Test
void allAlertingTablesAndEnumsExist() {
var tables = jdbcTemplate.queryForList(
"SELECT table_name FROM information_schema.tables WHERE table_schema='public' " +
"AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," +
"'alert_silences','alert_notifications','alert_reads')",
String.class);
assertThat(tables).containsExactlyInAnyOrder(
"alert_rules","alert_rule_targets","alert_instances",
"alert_silences","alert_notifications","alert_reads");
var enums = jdbcTemplate.queryForList(
"SELECT typname FROM pg_type WHERE typname IN " +
"('severity_enum','condition_kind_enum','alert_state_enum'," +
"'target_kind_enum','notification_status_enum')",
String.class);
assertThat(enums).hasSize(5);
}
@Test
void deletingEnvironmentCascadesAlertingRows() {
var envId = java.util.UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env");
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1");
var ruleId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
"VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')",
ruleId, envId);
var instanceId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
"fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
"now(), '{}'::jsonb, 't', 'm')",
instanceId, ruleId, envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_rules WHERE environment_id = ?",
Integer.class, envId)).isZero();
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_instances WHERE environment_id = ?",
Integer.class, envId)).isZero();
}
}
```
- [ ] **Step 2: Run the test to verify it fails**
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
Expected: FAIL — tables do not exist.
- [ ] **Step 3: Write the migration**
Create `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`:
```sql
-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11)
CREATE TYPE severity_enum AS ENUM ('CRITICAL','WARNING','INFO');
CREATE TYPE condition_kind_enum AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC');
CREATE TYPE alert_state_enum AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED');
CREATE TYPE target_kind_enum AS ENUM ('USER','GROUP','ROLE');
CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED');
CREATE TABLE alert_rules (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
name varchar(200) NOT NULL,
description text,
severity severity_enum NOT NULL,
enabled boolean NOT NULL DEFAULT true,
condition_kind condition_kind_enum NOT NULL,
condition jsonb NOT NULL,
evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5),
for_duration_seconds int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0),
re_notify_minutes int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0),
notification_title_tmpl text NOT NULL,
notification_message_tmpl text NOT NULL,
webhooks jsonb NOT NULL DEFAULT '[]',
next_evaluation_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
eval_state jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now(),
created_by text NOT NULL REFERENCES users(user_id),
updated_at timestamptz NOT NULL DEFAULT now(),
updated_by text NOT NULL REFERENCES users(user_id)
);
CREATE INDEX alert_rules_env_idx ON alert_rules (environment_id);
CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true;
CREATE TABLE alert_rule_targets (
id uuid PRIMARY KEY,
rule_id uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
target_kind target_kind_enum NOT NULL,
target_id varchar(128) NOT NULL,
UNIQUE (rule_id, target_kind, target_id)
);
CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id);
CREATE TABLE alert_instances (
id uuid PRIMARY KEY,
rule_id uuid REFERENCES alert_rules(id) ON DELETE SET NULL,
rule_snapshot jsonb NOT NULL,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
state alert_state_enum NOT NULL,
severity severity_enum NOT NULL,
fired_at timestamptz NOT NULL,
acked_at timestamptz,
acked_by text REFERENCES users(user_id),
resolved_at timestamptz,
last_notified_at timestamptz,
silenced boolean NOT NULL DEFAULT false,
current_value numeric,
threshold numeric,
context jsonb NOT NULL,
title text NOT NULL,
message text NOT NULL,
target_user_ids text[] NOT NULL DEFAULT '{}',
target_group_ids uuid[] NOT NULL DEFAULT '{}',
target_role_names text[] NOT NULL DEFAULT '{}'
);
CREATE INDEX alert_instances_inbox_idx ON alert_instances (environment_id, state, fired_at DESC);
CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL;
CREATE INDEX alert_instances_resolved_idx ON alert_instances (resolved_at) WHERE state = 'RESOLVED';
CREATE INDEX alert_instances_target_u_idx ON alert_instances USING GIN (target_user_ids);
CREATE INDEX alert_instances_target_g_idx ON alert_instances USING GIN (target_group_ids);
CREATE INDEX alert_instances_target_r_idx ON alert_instances USING GIN (target_role_names);
CREATE TABLE alert_silences (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
matcher jsonb NOT NULL,
reason text,
starts_at timestamptz NOT NULL,
ends_at timestamptz NOT NULL CHECK (ends_at > starts_at),
created_by text NOT NULL REFERENCES users(user_id),
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at);
CREATE TABLE alert_notifications (
id uuid PRIMARY KEY,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
webhook_id uuid,
outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL,
status notification_status_enum NOT NULL DEFAULT 'PENDING',
attempts int NOT NULL DEFAULT 0,
next_attempt_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
last_response_status int,
last_response_snippet text,
payload jsonb NOT NULL,
delivered_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_notifications_pending_idx ON alert_notifications (next_attempt_at) WHERE status = 'PENDING';
CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id);
CREATE TABLE alert_reads (
user_id text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
read_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (user_id, alert_instance_id)
);
```
Notes:
- Plan 01 established `users.user_id` as TEXT. All FK-to-users columns in this migration are `text`, not `uuid`.
- `target_user_ids` is `text[]` (matches `users.user_id`).
- `outbound_connections` (Plan 01) is referenced with `ON DELETE SET NULL` — matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed.
- [ ] **Step 4: Run the test to verify it passes**
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
git commit -m "feat(alerting): V12 flyway migration for alerting tables"
```
### Task 2: Extend `AuditCategory`
**Files:**
- Modify: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "AuditCategory", direction: "upstream"})` — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements).
- [ ] **Step 2: Write the failing test**
```java
package com.cameleer.server.core.admin;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AuditCategoryTest {
@Test
void alertingCategoriesPresent() {
assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
}
}
```
- [ ] **Step 3: Run the test — FAIL**
Run: `mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest`
Expected: FAIL — `IllegalArgumentException: No enum constant`.
- [ ] **Step 4: Add the enum values**
Replace the whole enum body with:
```java
package com.cameleer.server.core.admin;
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
}
```
- [ ] **Step 5: Run the test — PASS**
- [ ] **Step 6: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \
cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories"
```
---
## Phase 2 — Core domain model
Each task in this phase adds a small, focused set of pure-Java records and enums under `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`. All records use canonical constructors with explicit `@NotNull`-style defensive copying only for mutable collections (`List.copyOf`, `Map.copyOf`). Jackson polymorphism is handled by `@JsonTypeInfo(use = DEDUCTION)` on `AlertCondition`.
### Task 3: Enums + `AlertScope`
**Files:**
- Create: `.../alerting/AlertSeverity.java`, `AlertState.java`, `ConditionKind.java`, `TargetKind.java`, `NotificationStatus.java`, `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java`, `AlertScope.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AlertScopeTest {
@Test
void allFieldsNullIsEnvWide() {
var s = new AlertScope(null, null, null);
assertThat(s.isEnvWide()).isTrue();
}
@Test
void appScoped() {
var s = new AlertScope("orders", null, null);
assertThat(s.isEnvWide()).isFalse();
assertThat(s.appSlug()).isEqualTo("orders");
}
@Test
void enumsHaveExpectedValues() {
assertThat(AlertSeverity.values()).containsExactly(
AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO);
assertThat(AlertState.values()).containsExactly(
AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED);
assertThat(ConditionKind.values()).hasSize(6);
assertThat(TargetKind.values()).containsExactly(
TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE);
assertThat(NotificationStatus.values()).containsExactly(
NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED);
}
}
```
- [ ] **Step 2: Run — FAIL** (`cannot find symbol`).
Run: `mvn -pl cameleer-server-core test -Dtest=AlertScopeTest`
- [ ] **Step 3: Create the files**
```java
// AlertSeverity.java
package com.cameleer.server.core.alerting;
public enum AlertSeverity { CRITICAL, WARNING, INFO }
// AlertState.java
package com.cameleer.server.core.alerting;
public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }
// ConditionKind.java
package com.cameleer.server.core.alerting;
public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }
// TargetKind.java
package com.cameleer.server.core.alerting;
public enum TargetKind { USER, GROUP, ROLE }
// NotificationStatus.java
package com.cameleer.server.core.alerting;
public enum NotificationStatus { PENDING, DELIVERED, FAILED }
// RouteMetric.java
package com.cameleer.server.core.alerting;
public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT }
// Comparator.java
package com.cameleer.server.core.alerting;
public enum Comparator { GT, GTE, LT, LTE, EQ }
// AggregationOp.java
package com.cameleer.server.core.alerting;
public enum AggregationOp { MAX, MIN, AVG, LATEST }
// FireMode.java
package com.cameleer.server.core.alerting;
public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }
// AlertScope.java
package com.cameleer.server.core.alerting;
public record AlertScope(String appSlug, String routeId, String agentId) {
public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; }
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
git commit -m "feat(alerting): core enums + AlertScope"
```
### Task 4: `AlertCondition` sealed hierarchy + Jackson polymorphism
**Files:**
- Create: `.../alerting/AlertCondition.java`, `RouteMetricCondition.java`, `ExchangeMatchCondition.java` (with nested `ExchangeFilter`), `AgentStateCondition.java`, `DeploymentStateCondition.java`, `LogPatternCondition.java`, `JvmMetricCondition.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class AlertConditionJsonTest {
private final ObjectMapper om = new ObjectMapper();
@Test
void roundtripRouteMetric() throws Exception {
var c = new RouteMetricCondition(
new AlertScope("orders", "route-1", null),
RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(RouteMetricCondition.class);
assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC);
}
@Test
void roundtripExchangeMatchPerExchange() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")),
FireMode.PER_EXCHANGE, null, null, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class);
}
@Test
void roundtripExchangeMatchCountInWindow() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
FireMode.COUNT_IN_WINDOW, 5, 900, null);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5);
}
@Test
void roundtripAgentState() throws Exception {
var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(AgentStateCondition.class);
}
@Test
void roundtripDeploymentState() throws Exception {
var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED"));
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(DeploymentStateCondition.class);
}
@Test
void roundtripLogPattern() throws Exception {
var c = new LogPatternCondition(new AlertScope("orders", null, null),
"ERROR", "TimeoutException", 5, 900);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(LogPatternCondition.class);
}
@Test
void roundtripJvmMetric() throws Exception {
var c = new JvmMetricCondition(new AlertScope("orders", null, null),
"heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Create the sealed hierarchy**
```java
// AlertCondition.java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
@JsonTypeInfo(use = JsonTypeInfo.Id.DEDUCTION)
@JsonSubTypes({
@JsonSubTypes.Type(RouteMetricCondition.class),
@JsonSubTypes.Type(ExchangeMatchCondition.class),
@JsonSubTypes.Type(AgentStateCondition.class),
@JsonSubTypes.Type(DeploymentStateCondition.class),
@JsonSubTypes.Type(LogPatternCondition.class),
@JsonSubTypes.Type(JvmMetricCondition.class)
})
public sealed interface AlertCondition permits
RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {
ConditionKind kind();
AlertScope scope();
}
```
```java
// RouteMetricCondition.java
package com.cameleer.server.core.alerting;
public record RouteMetricCondition(
AlertScope scope,
RouteMetric metric,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
}
```
```java
// ExchangeMatchCondition.java
package com.cameleer.server.core.alerting;
import java.util.Map;
public record ExchangeMatchCondition(
AlertScope scope,
ExchangeFilter filter,
FireMode fireMode,
Integer threshold, // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
Integer windowSeconds, // required when COUNT_IN_WINDOW
Integer perExchangeLingerSeconds // required when PER_EXCHANGE
) implements AlertCondition {
public ExchangeMatchCondition {
if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
}
@Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }
public record ExchangeFilter(String status, Map<String, String> attributes) {
public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); }
}
}
```
```java
// AgentStateCondition.java
package com.cameleer.server.core.alerting;
public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
}
```
```java
// DeploymentStateCondition.java
package com.cameleer.server.core.alerting;
import java.util.List;
public record DeploymentStateCondition(AlertScope scope, List<String> states) implements AlertCondition {
public DeploymentStateCondition { states = List.copyOf(states); }
@Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
}
```
```java
// LogPatternCondition.java
package com.cameleer.server.core.alerting;
public record LogPatternCondition(
AlertScope scope,
String level,
String pattern,
int threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
}
```
```java
// JvmMetricCondition.java
package com.cameleer.server.core.alerting;
public record JvmMetricCondition(
AlertScope scope,
String metric,
AggregationOp aggregation,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction"
```
### Task 5: Core data records (`AlertRule`, `AlertInstance`, `AlertSilence`, `SilenceMatcher`, `AlertRuleTarget`, `AlertNotification`, `WebhookBinding`)
**Files:**
- Create: the seven records above under `.../alerting/`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class AlertDomainRecordsTest {
@Test
void alertRuleDefensiveCopy() {
var webhooks = new java.util.ArrayList<WebhookBinding>();
webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null));
var r = newRule(webhooks);
webhooks.clear();
assertThat(r.webhooks()).hasSize(1);
}
@Test
void silenceMatcherAllFieldsNullMatchesEverything() {
var m = new SilenceMatcher(null, null, null, null, null);
assertThat(m.isWildcard()).isTrue();
}
private AlertRule newRule(List<WebhookBinding> wh) {
return new AlertRule(
UUID.randomUUID(), UUID.randomUUID(), "r", null,
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m", wh, List.of(),
Instant.now(), null, null, Map.of(),
Instant.now(), "u1", Instant.now(), "u1");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Create the records**
```java
// AlertRule.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertRule(
UUID id,
UUID environmentId,
String name,
String description,
AlertSeverity severity,
boolean enabled,
ConditionKind conditionKind,
AlertCondition condition,
int evaluationIntervalSeconds,
int forDurationSeconds,
int reNotifyMinutes,
String notificationTitleTmpl,
String notificationMessageTmpl,
List<WebhookBinding> webhooks,
List<AlertRuleTarget> targets,
Instant nextEvaluationAt,
String claimedBy,
Instant claimedUntil,
Map<String, Object> evalState,
Instant createdAt,
String createdBy,
Instant updatedAt,
String updatedBy) {
public AlertRule {
webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
targets = targets == null ? List.of() : List.copyOf(targets);
evalState = evalState == null ? Map.of() : Map.copyOf(evalState);
}
}
```
```java
// AlertInstance.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertInstance(
UUID id,
UUID ruleId, // nullable after rule deletion
Map<String, Object> ruleSnapshot,
UUID environmentId,
AlertState state,
AlertSeverity severity,
Instant firedAt,
Instant ackedAt,
String ackedBy,
Instant resolvedAt,
Instant lastNotifiedAt,
boolean silenced,
Double currentValue,
Double threshold,
Map<String, Object> context,
String title,
String message,
List<String> targetUserIds,
List<UUID> targetGroupIds,
List<String> targetRoleNames) {
public AlertInstance {
ruleSnapshot = ruleSnapshot == null ? Map.of() : Map.copyOf(ruleSnapshot);
context = context == null ? Map.of() : Map.copyOf(context);
targetUserIds = targetUserIds == null ? List.of() : List.copyOf(targetUserIds);
targetGroupIds = targetGroupIds == null ? List.of() : List.copyOf(targetGroupIds);
targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames);
}
}
```
```java
// AlertRuleTarget.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {}
```
```java
// WebhookBinding.java
package com.cameleer.server.core.alerting;
import java.util.Map;
import java.util.UUID;
public record WebhookBinding(
UUID id,
UUID outboundConnectionId,
String bodyOverride,
Map<String, String> headerOverrides) {
public WebhookBinding {
headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
}
}
```
```java
// SilenceMatcher.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record SilenceMatcher(
UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) {
public boolean isWildcard() {
return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null;
}
}
```
```java
// AlertSilence.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.UUID;
public record AlertSilence(
UUID id,
UUID environmentId,
SilenceMatcher matcher,
String reason,
Instant startsAt,
Instant endsAt,
String createdBy,
Instant createdAt) {}
```
```java
// AlertNotification.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
public record AlertNotification(
UUID id,
UUID alertInstanceId,
UUID webhookId,
UUID outboundConnectionId,
NotificationStatus status,
int attempts,
Instant nextAttemptAt,
String claimedBy,
Instant claimedUntil,
Integer lastResponseStatus,
String lastResponseSnippet,
Map<String, Object> payload,
Instant deliveredAt,
Instant createdAt) {
public AlertNotification {
payload = payload == null ? Map.of() : Map.copyOf(payload);
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)"
```
### Task 6: Repository interfaces
**Files:**
- Create: `.../alerting/AlertRuleRepository.java`, `AlertInstanceRepository.java`, `AlertSilenceRepository.java`, `AlertNotificationRepository.java`, `AlertReadRepository.java`
- No test (pure interfaces — covered by the Phase 3 integration tests).
- [ ] **Step 1: Create the interfaces**
```java
// AlertRuleRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertRuleRepository {
AlertRule save(AlertRule rule); // upsert by id
Optional<AlertRule> findById(UUID id);
List<AlertRule> listByEnvironment(UUID environmentId);
List<AlertRule> findAllByOutboundConnectionId(UUID connectionId);
List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId); // used by rulesReferencing()
void delete(UUID id);
/** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now).
* Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */
List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds);
/** Release claim + bump next_evaluation_at. */
void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt,
java.util.Map<String, Object> evalState);
}
```
```java
// AlertInstanceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertInstanceRepository {
AlertInstance save(AlertInstance instance); // upsert by id
Optional<AlertInstance> findById(UUID id);
Optional<AlertInstance> findOpenForRule(UUID ruleId); // state IN ('PENDING','FIRING','ACKNOWLEDGED')
List<AlertInstance> listForInbox(UUID environmentId,
List<String> userGroupIdFilter, // UUIDs as String? decide impl-side
String userId,
List<String> userRoleNames,
int limit);
long countUnreadForUser(UUID environmentId, String userId);
void ack(UUID id, String userId, Instant when);
void resolve(UUID id, Instant when);
void markSilenced(UUID id, boolean silenced);
void deleteResolvedBefore(Instant cutoff);
}
```
```java
// AlertSilenceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertSilenceRepository {
AlertSilence save(AlertSilence silence);
Optional<AlertSilence> findById(UUID id);
List<AlertSilence> listActive(UUID environmentId, Instant when);
List<AlertSilence> listByEnvironment(UUID environmentId);
void delete(UUID id);
}
```
```java
// AlertNotificationRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertNotificationRepository {
AlertNotification save(AlertNotification n);
Optional<AlertNotification> findById(UUID id);
List<AlertNotification> listForInstance(UUID alertInstanceId);
List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds);
void markDelivered(UUID id, int status, String snippet, Instant when);
void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet);
void markFailed(UUID id, int status, String snippet);
void deleteSettledBefore(Instant cutoff);
}
```
```java
// AlertReadRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.UUID;
public interface AlertReadRepository {
void markRead(String userId, UUID alertInstanceId);
void bulkMarkRead(String userId, List<UUID> alertInstanceIds);
}
```
- [ ] **Step 2: Compile**
Run: `mvn -pl cameleer-server-core compile`
Expected: SUCCESS.
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java
git commit -m "feat(alerting): core repository interfaces"
```
---
## Phase 3 — Postgres repositories
All repositories use `JdbcTemplate` and `ObjectMapper` for JSONB columns (same pattern as `PostgresOutboundConnectionRepository`). Convert UUID[] with `ConnectionCallback` + `Array.of("uuid", ...)` and text[] with `Array.of("text", ...)`.
### Task 7: `PostgresAlertRuleRepository`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java`
- [ ] **Step 1: Write the failing integration test**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT {
private PostgresAlertRuleRepository repo;
private UUID envId;
@AfterEach
void cleanup() {
jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'");
}
@org.junit.jupiter.api.BeforeEach
void setup() {
repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('test-user', 'test-user', 'x', 'a@b', true)");
}
@Test
void saveAndFindByIdRoundtrip() {
var rule = newRule(List.of());
repo.save(rule);
var found = repo.findById(rule.id()).orElseThrow();
assertThat(found.name()).isEqualTo(rule.name());
assertThat(found.condition()).isInstanceOf(AgentStateCondition.class);
}
@Test
void findRuleIdsByOutboundConnectionId() {
var connId = UUID.randomUUID();
var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of());
var rule = newRule(List.of(wb));
repo.save(rule);
List<UUID> ids = repo.findRuleIdsByOutboundConnectionId(connId);
assertThat(ids).containsExactly(rule.id());
assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty();
}
@Test
void claimDueRulesAtomicSkipLocked() {
var rule = newRule(List.of());
repo.save(rule);
List<AlertRule> claimed = repo.claimDueRules("instance-A", 10, 30);
assertThat(claimed).hasSize(1);
// Second claimant sees nothing until first releases or TTL expires
List<AlertRule> second = repo.claimDueRules("instance-B", 10, 30);
assertThat(second).isEmpty();
}
private AlertRule newRule(List<WebhookBinding> webhooks) {
return new AlertRule(
UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc",
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60),
60, 0, 60, "t", "m", webhooks, List.of(),
Instant.now().minusSeconds(10), null, null, Map.of(),
Instant.now(), "test-user", Instant.now(), "test-user");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement the repository**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.postgresql.util.PGobject;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.time.Instant;
import java.util.*;
public class PostgresAlertRuleRepository implements AlertRuleRepository {
private final JdbcTemplate jdbc;
private final ObjectMapper om;
public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
this.jdbc = jdbc;
this.om = om;
}
@Override
public AlertRule save(AlertRule r) {
String sql = """
INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
created_at, created_by, updated_at, updated_by)
VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
?, ?, ?, ?::jsonb, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name, description = EXCLUDED.description,
severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
for_duration_seconds = EXCLUDED.for_duration_seconds,
re_notify_minutes = EXCLUDED.re_notify_minutes,
notification_title_tmpl = EXCLUDED.notification_title_tmpl,
notification_message_tmpl = EXCLUDED.notification_message_tmpl,
webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
""";
jdbc.update(sql,
r.id(), r.environmentId(), r.name(), r.description(),
r.severity().name(), r.enabled(), r.conditionKind().name(),
writeJson(r.condition()),
r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
r.notificationTitleTmpl(), r.notificationMessageTmpl(),
writeJson(r.webhooks()),
Timestamp.from(r.nextEvaluationAt()),
r.claimedBy(),
r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
writeJson(r.evalState()),
Timestamp.from(r.createdAt()), r.createdBy(),
Timestamp.from(r.updatedAt()), r.updatedBy());
return r;
}
@Override
public Optional<AlertRule> findById(UUID id) {
var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
}
@Override
public List<AlertRule> listByEnvironment(UUID environmentId) {
return jdbc.query(
"SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
rowMapper(), environmentId);
}
@Override
public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT * FROM alert_rules
WHERE webhooks @> ?::jsonb
ORDER BY created_at DESC
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.query(sql, rowMapper(), predicate);
}
@Override
public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT id FROM alert_rules
WHERE webhooks @> ?::jsonb
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.queryForList(sql, UUID.class, predicate);
}
@Override
public void delete(UUID id) {
jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
}
@Override
public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
String sql = """
UPDATE alert_rules
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_rules
WHERE enabled = true
AND next_evaluation_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_evaluation_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *
""";
return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
}
@Override
public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
jdbc.update("""
UPDATE alert_rules
SET claimed_by = NULL, claimed_until = NULL,
next_evaluation_at = ?, eval_state = ?::jsonb
WHERE id = ?
""",
Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
}
private RowMapper<AlertRule> rowMapper() {
return (rs, i) -> {
ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class);
List<WebhookBinding> webhooks = om.readValue(
rs.getString("webhooks"), new TypeReference<>() {});
Map<String, Object> evalState = om.readValue(
rs.getString("eval_state"), new TypeReference<>() {});
Timestamp cu = rs.getTimestamp("claimed_until");
return new AlertRule(
(UUID) rs.getObject("id"),
(UUID) rs.getObject("environment_id"),
rs.getString("name"),
rs.getString("description"),
AlertSeverity.valueOf(rs.getString("severity")),
rs.getBoolean("enabled"),
kind, cond,
rs.getInt("evaluation_interval_seconds"),
rs.getInt("for_duration_seconds"),
rs.getInt("re_notify_minutes"),
rs.getString("notification_title_tmpl"),
rs.getString("notification_message_tmpl"),
webhooks, List.of(),
rs.getTimestamp("next_evaluation_at").toInstant(),
rs.getString("claimed_by"),
cu == null ? null : cu.toInstant(),
evalState,
rs.getTimestamp("created_at").toInstant(),
rs.getString("created_by"),
rs.getTimestamp("updated_at").toInstant(),
rs.getString("updated_by"));
};
}
private String writeJson(Object o) {
try { return om.writeValueAsString(o); }
catch (Exception e) { throw new IllegalStateException(e); }
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_rules"
```
### Task 8: Wire `OutboundConnectionServiceImpl.rulesReferencing()` (CRITICAL — Plan 01 gate)
> **This is the Plan 01 known-incomplete item.** Plan 01 shipped `rulesReferencing()` returning `[]`. Until this task lands, outbound connections can be deleted or narrowed while rules reference them, corrupting production. **Do not skip or defer.**
**Files:**
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java`
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"})`. Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour.
- [ ] **Step 2: Write the failing integration test**
```java
package com.cameleer.server.app.outbound;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.*;
import com.cameleer.server.core.outbound.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT {
@Autowired OutboundConnectionService service;
@Autowired OutboundConnectionRepository repo;
private UUID envId;
private UUID connId;
private PostgresAlertRuleRepository ruleRepo;
@BeforeEach
void seed() {
ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING");
var c = repo.save(new OutboundConnection(
UUID.randomUUID(), "default", "conn", null, "https://example.test",
OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null,
OutboundAuth.None.INSTANCE, List.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref"));
connId = c.id();
var rule = new AlertRule(
UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true,
ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m",
List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())),
List.of(), Instant.now(), null, null, Map.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref");
ruleRepo.save(rule);
}
@Test
void deleteConnectionReferencedByRuleReturns409() {
assertThat(service.rulesReferencing(connId)).hasSize(1);
assertThatThrownBy(() -> service.delete(connId, "u-ref"))
.hasMessageContaining("referenced by rules");
}
}
```
- [ ] **Step 3: Run — FAIL** (stub returns empty list, so delete succeeds).
- [ ] **Step 4: Replace the stub**
In `OutboundConnectionServiceImpl.java`:
```java
// existing imports + add:
import com.cameleer.server.core.alerting.AlertRuleRepository;
public class OutboundConnectionServiceImpl implements OutboundConnectionService {
private final OutboundConnectionRepository repo;
private final AlertRuleRepository ruleRepo; // NEW
private final String tenantId;
public OutboundConnectionServiceImpl(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
String tenantId) {
this.repo = repo;
this.ruleRepo = ruleRepo;
this.tenantId = tenantId;
}
// … create/update/delete/get/list unchanged …
@Override
public List<UUID> rulesReferencing(UUID id) {
return ruleRepo.findRuleIdsByOutboundConnectionId(id);
}
}
```
Update `OutboundBeanConfig.java` to inject `AlertRuleRepository`:
```java
@Bean
public OutboundConnectionService outboundConnectionService(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
@Value("${cameleer.server.tenant.id:default}") String tenantId) {
return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
}
```
Add the `AlertRuleRepository` bean in a new `AlertingBeanConfig.java` stub (completed in Phase 7):
```java
package com.cameleer.server.app.alerting.config;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.AlertRuleRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
@Configuration
public class AlertingBeanConfig {
@Bean
public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertRuleRepository(jdbc, om);
}
}
```
- [ ] **Step 5: Run — PASS**.
- [ ] **Step 6: GitNexus detect_changes + commit**
```bash
# Verify scope
# gitnexus_detect_changes({scope: "staged"})
git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)"
```
### Task 9: `PostgresAlertInstanceRepository`
**Files:**
- Create: `.../alerting/storage/PostgresAlertInstanceRepository.java`
- Test: `.../alerting/storage/PostgresAlertInstanceRepositoryIT.java`
- [ ] **Step 1: Write the failing test** covering: save/findById, findOpenForRule (filter `state IN ('PENDING','FIRING','ACKNOWLEDGED')`), listForInbox with user/group/role filters (seed 3 instances: one targeting user, one targeting group, one targeting role; assert listForInbox returns all three for a user in those groups/roles), countUnreadForUser (uses LEFT JOIN `alert_reads`), ack, resolve, deleteResolvedBefore.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — same RowMapper pattern as Task 7. Key queries:
```sql
-- findOpenForRule
SELECT * FROM alert_instances
WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')
ORDER BY fired_at DESC LIMIT 1;
-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders)
SELECT * FROM alert_instances
WHERE environment_id = ?
AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED')
AND (
? = ANY(target_user_ids)
OR target_group_ids && ?::uuid[]
OR target_role_names && ?::text[]
)
ORDER BY fired_at DESC LIMIT ?;
-- countUnreadForUser
SELECT count(*) FROM alert_instances ai
WHERE ai.environment_id = ?
AND ai.state IN ('FIRING','ACKNOWLEDGED')
AND (
? = ANY(ai.target_user_ids)
OR ai.target_group_ids && ?::uuid[]
OR ai.target_role_names && ?::text[]
)
AND NOT EXISTS (
SELECT 1 FROM alert_reads ar
WHERE ar.alert_instance_id = ai.id AND ar.user_id = ?
);
```
Array binding via `connection.createArrayOf("uuid", uuids)` / `createArrayOf("text", names)` inside a `ConnectionCallback`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries"
```
### Task 10: `PostgresAlertSilenceRepository`, `PostgresAlertNotificationRepository`, `PostgresAlertReadRepository`
**Files:**
- Create: three repositories under `.../alerting/storage/`
- Test: one IT per repository in `.../alerting/storage/`
- [ ] **Step 1: Write all three failing ITs** (one file each). Cover:
- `Silence`: save/findById, listActive filters by `now BETWEEN starts_at AND ends_at`, delete.
- `Notification`: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + `next_attempt_at`, markDelivered + markFailed transition status, deleteSettledBefore purges `DELIVERED` + `FAILED`.
- `Read`: markRead is idempotent (uses `ON CONFLICT DO NOTHING`), bulkMarkRead handles empty list.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim:
```sql
UPDATE alert_notifications
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_notifications
WHERE status = 'PENDING'
AND next_attempt_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_attempt_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *;
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java
git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads"
```
### Task 11: Wire all alerting repositories in `AlertingBeanConfig`
**Files:**
- Modify: `.../alerting/config/AlertingBeanConfig.java`
- [ ] **Step 1: Add beans for the remaining repositories**
```java
@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertInstanceRepository(jdbc, om);
}
@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertSilenceRepository(jdbc, om);
}
@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertNotificationRepository(jdbc, om);
}
@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) {
return new PostgresAlertReadRepository(jdbc);
}
```
- [ ] **Step 2: Verify compile + existing ITs still pass**
```bash
mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT'
```
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
git commit -m "feat(alerting): wire all alerting repository beans"
```
---
## Phase 4 — ClickHouse reads: new count methods and projections
### Task 12: Add `ClickHouseLogStore.countLogs(LogSearchRequest)`
**Files:**
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"})`. Expected callers: `LogQueryController`, `ContainerLogForwarder`, `ClickHouseConfig`. Adding a method is non-breaking — no downstream callers affected.
- [ ] **Step 2: Write the failing test**
```java
package com.cameleer.server.app.search;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.search.LogSearchRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
class ClickHouseLogStoreCountIT extends AbstractPostgresIT {
@Autowired ClickHouseLogStore store;
@Test
void countLogsRespectsLevelPatternAndWindow() {
// Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min
// (seed helper uses existing `indexBatch` path)
long count = store.countLogs(new LogSearchRequest(
/* environment */ "dev",
/* application */ "orders",
/* agentId */ null,
/* exchangeId */ null,
/* logger */ null,
/* sources */ List.of(),
/* levels */ List.of("ERROR"),
/* q */ "TimeoutException",
/* from */ Instant.now().minusSeconds(300),
/* to */ Instant.now(),
/* cursor */ null,
/* limit */ 100,
/* sort */ "desc"
));
assertThat(count).isEqualTo(3);
}
}
```
(Adjust `LogSearchRequest` constructor to the actual record signature — check `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` for exact order.)
- [ ] **Step 3: Run — FAIL**.
- [ ] **Step 4: Implement the method**
In `ClickHouseLogStore.java`, add a new public method. Reuse the WHERE-clause builder already used by `search(LogSearchRequest)`, but:
- No `FINAL`.
- Skip cursor, limit, sort.
- `SELECT count() FROM logs WHERE <tenant + env + app + level IN (...) + logger + q LIKE + timestamp BETWEEN>`.
- Include the `tenant_id = ?` predicate.
```java
public long countLogs(LogSearchRequest request) {
StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(tenantId);
args.add(Timestamp.from(request.from()));
args.add(Timestamp.from(request.to()));
if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); }
if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); }
// … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId …
String sql = "SELECT count() FROM logs WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
```
(Imports: `java.sql.Timestamp`, `java.util.ArrayList`.)
- [ ] **Step 5: Run — PASS**.
- [ ] **Step 6: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator"
```
### Task 13: Add `ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)`
**Files:**
- Create: `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java`
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"})`. Additive method — no downstream breakage.
- [ ] **Step 2: Create `AlertMatchSpec` record**
```java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
/** Specification for alerting-specific execution counting.
* Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL.
* All fields except tenant/env/from/to are nullable filters. */
public record AlertMatchSpec(
String tenantId,
String environment,
String applicationId, // nullable
String routeId, // nullable
String status, // "FAILED" / "SUCCESS" / null
Map<String, String> attributes, // exact match on execution attribute key=value
Instant from,
Instant to,
Instant after // nullable; used by PER_EXCHANGE to advance cursor
) {
public AlertMatchSpec {
attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
}
}
```
- [ ] **Step 3: Write the failing test** — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches.
- [ ] **Step 4: Run — FAIL**.
- [ ] **Step 5: Implement on `ClickHouseSearchIndex`**
```java
public long countExecutionsForAlerting(AlertMatchSpec spec) {
StringBuilder where = new StringBuilder(
"tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(spec.tenantId());
args.add(spec.environment());
args.add(Timestamp.from(spec.from()));
args.add(Timestamp.from(spec.to()));
if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); }
if (spec.routeId() != null) { where.append(" AND route_id = ?"); args.add(spec.routeId()); }
if (spec.status() != null) { where.append(" AND status = ?"); args.add(spec.status()); }
if (spec.after() != null) {
where.append(" AND start_time > ?");
args.add(Timestamp.from(spec.after()));
}
// attribute filters: use Map column access — pattern matches existing search() impl
for (var e : spec.attributes().entrySet()) {
where.append(" AND attributes[?] = ?");
args.add(e.getKey());
args.add(e.getValue());
}
String sql = "SELECT count() FROM executions WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
```
- [ ] **Step 6: Run — PASS**.
- [ ] **Step 7: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator"
```
### Task 14: ClickHouse projections migration
**Files:**
- Create: `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql`
- Modify: the schema initializer invocation site (likely `ClickHouseConfig` or `ClickHouseSchemaInitializer`) to also run this file on startup.
- [ ] **Step 1: Write the SQL file**
```sql
-- Additive, idempotent. Safe to drop + rebuild with no data loss.
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_app_status
(SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time));
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_route_status
(SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time));
ALTER TABLE logs
ADD PROJECTION IF NOT EXISTS alerting_app_level
(SELECT * ORDER BY (tenant_id, environment, application, level, timestamp));
ALTER TABLE agent_metrics
ADD PROJECTION IF NOT EXISTS alerting_instance_metric
(SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at));
ALTER TABLE executions MATERIALIZE PROJECTION alerting_app_status;
ALTER TABLE executions MATERIALIZE PROJECTION alerting_route_status;
ALTER TABLE logs MATERIALIZE PROJECTION alerting_app_level;
ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric;
```
(Adjust table column names to match real `init.sql` — confirm `application` vs `application_id` on the `logs` and `agent_metrics` tables.)
- [ ] **Step 2: Hook into `ClickHouseSchemaInitializer`**
Find the initializer and add a second invocation:
```java
runIdempotent("clickhouse/init.sql");
runIdempotent("clickhouse/alerting_projections.sql");
```
- [ ] **Step 3: Add a smoke IT**
```java
@Test
void projectionsExistAfterStartup() {
var names = jdbcTemplate.queryForList(
"SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')",
String.class);
assertThat(names).contains(
"alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric");
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \
cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java
git commit -m "feat(alerting): ClickHouse projections for alerting read paths"
```
---
## Phase 5 — Mustache templating and silence matching
### Task 15: Add JMustache dependency
**Files:**
- Modify: `cameleer-server-core/pom.xml`
- [ ] **Step 1: Add dependency**
```xml
<dependency>
<groupId>com.samskivert</groupId>
<artifactId>jmustache</artifactId>
<version>1.16</version>
</dependency>
```
- [ ] **Step 2: Verify resolve**
Run: `mvn -pl cameleer-server-core dependency:resolve`
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-core/pom.xml
git commit -m "chore(alerting): add jmustache 1.16"
```
### Task 16: `MustacheRenderer`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.app.alerting.notify;
import org.junit.jupiter.api.Test;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class MustacheRendererTest {
private final MustacheRenderer r = new MustacheRenderer();
@Test
void rendersSimpleTemplate() {
String out = r.render("Hello {{name}}", Map.of("name", "world"));
assertThat(out).isEqualTo("Hello world");
}
@Test
void rendersNestedPath() {
String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL")));
assertThat(out).isEqualTo("CRITICAL");
}
@Test
void missingVariableRendersLiteral() {
String out = r.render("{{missing.path}}", Map.of());
assertThat(out).isEqualTo("{{missing.path}}");
}
@Test
void malformedTemplateReturnsRawWithWarn() {
String out = r.render("{{unclosed", Map.of("unclosed","x"));
assertThat(out).isEqualTo("{{unclosed");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
package com.cameleer.server.app.alerting.notify;
import com.samskivert.mustache.Mustache;
import com.samskivert.mustache.Template;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.util.Map;
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private final Mustache.Compiler compiler = Mustache.compiler()
.nullValue("")
.emptyStringIsFalse(true)
.defaultValue(null) // null triggers MissingContext -> we intercept below
.escapeHTML(false);
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
try {
Template t = compiler.compile(template);
return t.execute(new LiteralFallbackContext(context));
} catch (Exception e) {
log.warn("Mustache render failed for template='{}': {}", abbreviate(template), e.getMessage());
return template;
}
}
/** Returns `{{path}}` literal when a variable is missing. */
private static class LiteralFallbackContext {
private final Map<String, Object> map;
LiteralFallbackContext(Map<String, Object> map) { this.map = map; }
// JMustache uses reflection / Map lookup, so we rely on wrapping the missing-value callback:
// easiest approach: compile with a custom `Mustache.Compiler.Loader` and intercept resolution.
// Simpler: post-process the output to detect unresolved `{{}}` sections → not possible after render.
// Alternative: pre-flight — scan template tokens against context and replace unresolved tokens
// with the literal before compilation. Use this simple approach:
}
}
```
Simpler implementation (ships for v1):
```java
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private static final java.util.regex.Pattern TOKEN =
java.util.regex.Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}");
private final Mustache.Compiler compiler = Mustache.compiler()
.defaultValue("")
.escapeHTML(false);
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
String resolved = preResolve(template, context);
try {
return compiler.compile(resolved).execute(context);
} catch (Exception e) {
log.warn("Mustache render failed: {}", e.getMessage());
return template;
}
}
/** Replaces `{{missing.path}}` with the literal so Mustache sees a non-tag string. */
private String preResolve(String template, Map<String, Object> context) {
var m = TOKEN.matcher(template);
var sb = new StringBuilder();
while (m.find()) {
String path = m.group(1);
if (resolvePath(context, path) == null) {
m.appendReplacement(sb, java.util.regex.Matcher.quoteReplacement("{{" + path + "}}"));
// Replace the {{}} with {{{ literal }}} once we escape it — but jmustache will not re-process.
// Simpler: just wrap in a triple-brace or surround with a marker. For v1 we skip the double-expand:
// we return the LITERAL inside a section {{#_literal_123}}... so preResolve returns a string
// that Mustache will not modify. Concrete approach:
}
}
m.appendTail(sb);
return sb.toString();
}
private Object resolvePath(Map<String, Object> ctx, String path) {
Object cur = ctx;
for (String seg : path.split("\\.")) {
if (!(cur instanceof Map<?,?> m)) return null;
cur = m.get(seg);
if (cur == null) return null;
}
return cur;
}
}
```
**Engineer note:** Prefer a pre-compile token substitution that replaces `{{missing.path}}` with a literal that Mustache renders as-is. One working approach: write a custom `Mustache.VariableFetcher` via `compiler.withFormatter(...)` — but JMustache's `Mustache.Compiler#withCollector()` is easier. Confirm during implementation and adjust this task; the tests in Step 1 lock the contract. If JMustache's API makes missing-variable fallback awkward, fall back to a regex-based substitutor that does `{{``⟦MUSTACHE_LITERAL:path⟧` for missing paths, then post-replace after render. The contract is: **unresolved `{{x}}` renders as literal `{{x}}`**.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars"
```
### Task 17: `NotificationContextBuilder`
**Files:**
- Create: `.../alerting/notify/NotificationContextBuilder.java`
- Test: `.../alerting/notify/NotificationContextBuilderTest.java`
- [ ] **Step 1: Write the failing test** covering:
- env / rule / alert subtrees always present
- conditional trees: `exchange.*` present only for EXCHANGE_MATCH, `log.*` only for LOG_PATTERN, etc.
- `alert.link` uses the configured `cameleer.server.ui-origin` prefix if present, else `/alerts/inbox/{id}`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — pure static `Map<String,Object> build(AlertRule, AlertInstance, Environment, String uiOrigin)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
git commit -m "feat(alerting): NotificationContextBuilder for template context maps"
```
### Task 18: `SilenceMatcher` evaluator
**Files:**
- Create: `.../alerting/notify/SilenceMatcherService.java` (named to avoid clash with core record `SilenceMatcher`)
- Test: `.../alerting/notify/SilenceMatcherServiceTest.java`
- [ ] **Step 1: Write the failing test** covering truth table:
- Wildcard matcher → matches any instance.
- Matcher with `ruleId` only → matches only instances with that rule.
- Multiple fields → AND logic.
- Active-window check at notification time (not at eval time).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class SilenceMatcherService {
public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) {
if (m.ruleId() != null && !m.ruleId().equals(instance.ruleId())) return false;
if (m.severity()!= null && m.severity() != instance.severity()) return false;
if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false;
if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false;
if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false;
return true;
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java
git commit -m "feat(alerting): silence matcher for notification-time dispatch"
```
---
## Phase 6 — Condition evaluators
All six evaluators share this shape:
```java
public sealed interface ConditionEvaluator<C extends AlertCondition>
permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator,
DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
```
Supporting types (create these in Task 19 before implementing individual evaluators).
### Task 19: `EvalContext`, `EvalResult`, `TickCache`, `PerKindCircuitBreaker`, `ConditionEvaluator` interface
**Files:**
- Create: `.../alerting/eval/EvalContext.java`, `EvalResult.java`, `TickCache.java`, `PerKindCircuitBreaker.java`, `ConditionEvaluator.java`
- Test: `.../alerting/eval/TickCacheTest.java`, `PerKindCircuitBreakerTest.java`
- [ ] **Step 1: Write the failing tests**
```java
// TickCacheTest.java
@Test
void getOrComputeCachesWithinTick() {
var cache = new TickCache();
int n = cache.getOrCompute("k", () -> 42);
int m = cache.getOrCompute("k", () -> 43);
assertThat(n).isEqualTo(42);
assertThat(m).isEqualTo(42); // cached
}
// PerKindCircuitBreakerTest.java
@Test
void opensAfterFailThreshold() {
var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...));
for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE);
assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue();
}
@Test
void closesAfterCooldown() { /* advance clock beyond cooldown window */ }
```
- [ ] **Step 2: Implement**
```java
// EvalContext.java
package com.cameleer.server.app.alerting.eval;
import java.time.Instant;
public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
```
```java
// EvalResult.java
package com.cameleer.server.app.alerting.eval;
import java.util.Map;
public sealed interface EvalResult {
record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
public Firing { context = context == null ? Map.of() : Map.copyOf(context); }
}
record Clear() implements EvalResult {
public static final Clear INSTANCE = new Clear();
}
record Error(Throwable cause) implements EvalResult {}
}
```
```java
// TickCache.java
package com.cameleer.server.app.alerting.eval;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
public class TickCache {
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
@SuppressWarnings("unchecked")
public <T> T getOrCompute(String key, Supplier<T> supplier) {
return (T) map.computeIfAbsent(key, k -> supplier.get());
}
}
```
```java
// PerKindCircuitBreaker.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.ConditionKind;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
public class PerKindCircuitBreaker {
private record State(Deque<Instant> failures, Instant openUntil) {}
private final int threshold;
private final Duration window;
private final Duration cooldown;
private final Clock clock;
private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();
public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
this.threshold = threshold;
this.window = Duration.ofSeconds(windowSeconds);
this.cooldown = Duration.ofSeconds(cooldownSeconds);
this.clock = clock;
}
public void recordFailure(ConditionKind kind) {
byKind.compute(kind, (k, s) -> {
var deque = (s == null) ? new ArrayDeque<Instant>() : new ArrayDeque<>(s.failures());
Instant now = Instant.now(clock);
Instant cutoff = now.minus(window);
while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
deque.addLast(now);
Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null;
return new State(deque, openUntil);
});
}
public boolean isOpen(ConditionKind kind) {
State s = byKind.get(kind);
return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
}
public void recordSuccess(ConditionKind kind) {
byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
}
}
```
```java
// ConditionEvaluator.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
public interface ConditionEvaluator<C extends AlertCondition> {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
```
(`sealed permits …` is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's `switch` over `ConditionKind`.)
- [ ] **Step 3: Run — PASS**.
- [ ] **Step 4: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/
git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)"
```
### Task 20: `AgentStateEvaluator`
**Files:**
- Create: `.../alerting/eval/AgentStateEvaluator.java`
- Test: `.../alerting/eval/AgentStateEvaluatorTest.java`
- [ ] **Step 1: Write the failing test**
```java
@Test
void firesWhenAnyAgentInTargetStateForScope() {
var registry = mock(AgentRegistryService.class);
when(registry.findAll()).thenReturn(List.of(
new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(),
AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null)
));
var eval = new AgentStateEvaluator(registry);
var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60));
EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule,
new EvalContext("default", Instant.now(), new TickCache()));
assertThat(r).isInstanceOf(EvalResult.Firing.class);
}
@Test
void clearWhenNoMatchingAgents() { /* ... */ }
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {
private final AgentRegistryService registry;
public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; }
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
@Override
public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
AgentState target = AgentState.valueOf(c.state());
Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
List<AgentInfo> hits = registry.findAll().stream()
.filter(a -> matchesScope(a, c.scope()))
.filter(a -> a.state() == target)
.filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
AgentInfo first = hits.get(0);
return new EvalResult.Firing(
(double) hits.size(), null,
Map.of("agent", Map.of(
"id", first.instanceId(),
"name", first.displayName(),
"state", first.state().name()
), "app", Map.of("slug", first.applicationId())));
}
private static boolean matchesScope(AgentInfo a, AlertScope s) {
if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
return true;
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java
git commit -m "feat(alerting): AGENT_STATE evaluator"
```
### Task 21: `DeploymentStateEvaluator`
**Files:**
- Create: `.../alerting/eval/DeploymentStateEvaluator.java`
- Test: `.../alerting/eval/DeploymentStateEvaluatorTest.java`
- [ ] **Step 1: Write the failing test**`FAILED` deployment for matching app → Firing; `RUNNING` → Clear.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — read via `DeploymentRepository.findByAppId` and `AppService.getByEnvironmentAndSlug`:
```java
@Override
public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null);
if (app == null) return EvalResult.Clear.INSTANCE;
List<Deployment> current = deploymentRepo.findByAppId(app.id());
Set<String> wanted = Set.copyOf(c.states());
var hits = current.stream()
.filter(d -> wanted.contains(d.status().name()))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
Deployment d = hits.get(0);
return new EvalResult.Firing((double) hits.size(), null,
Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()),
"app", Map.of("slug", app.slug())));
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator"
```
### Task 22: `RouteMetricEvaluator`
**Files:**
- Create: `.../alerting/eval/RouteMetricEvaluator.java`
- Test: `.../alerting/eval/RouteMetricEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `StatsStore`, seed `ExecutionStats{p99Ms = 2500, ...}` for a scoped call, assert Firing with `currentValue = 2500, threshold = 2000`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — dispatch on `RouteMetric` enum:
```java
@Override
public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
Instant from = ctx.now().minusSeconds(c.windowSeconds());
Instant to = ctx.now();
String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null);
ExecutionStats stats = (c.scope().routeId() != null)
? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env)
: (c.scope().appSlug() != null)
? statsStore.statsForApp(from, to, c.scope().appSlug(), env)
: statsStore.stats(from, to, env);
double actual = switch (c.metric()) {
case ERROR_RATE -> errorRate(stats);
case P95_LATENCY_MS -> stats.p95DurationMs();
case P99_LATENCY_MS -> stats.p99DurationMs();
case THROUGHPUT -> stats.totalCount();
case ERROR_COUNT -> stats.failedCount();
};
boolean fire = switch (c.comparator()) {
case GT -> actual > c.threshold();
case GTE -> actual >= c.threshold();
case LT -> actual < c.threshold();
case LTE -> actual <= c.threshold();
case EQ -> actual == c.threshold();
};
if (!fire) return EvalResult.Clear.INSTANCE;
return new EvalResult.Firing(actual, c.threshold(),
Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()),
"app", Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug())));
}
private double errorRate(ExecutionStats s) {
long total = s.totalCount();
return total == 0 ? 0.0 : (double) s.failedCount() / total;
}
```
(Adjust method names on `ExecutionStats` to match the actual record — use `gitnexus_context({name: "ExecutionStats"})` if unsure.)
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): ROUTE_METRIC evaluator"
```
### Task 23: `LogPatternEvaluator`
**Files:**
- Create: `.../alerting/eval/LogPatternEvaluator.java`
- Test: `.../alerting/eval/LogPatternEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `ClickHouseLogStore.countLogs` returning 7; threshold 5 → Firing; returning 3 → Clear.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — build a `LogSearchRequest` from the condition + window, delegate to `countLogs`. Use `TickCache` keyed on `(env, app, level, pattern, windowStart, windowEnd)` to coalesce.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): LOG_PATTERN evaluator"
```
### Task 24: `JvmMetricEvaluator`
**Files:**
- Create: `.../alerting/eval/JvmMetricEvaluator.java`
- Test: `.../alerting/eval/JvmMetricEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `MetricsQueryStore.queryTimeSeries` for `("agent-1", ["heap_used_percent"], from, to, 1)` returning `{heap_used_percent: [Bucket{max=95.0}]}`; assert Firing with currentValue=95.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — aggregate across buckets per `AggregationOp` (MAX/MIN/AVG/LATEST), compare against threshold.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): JVM_METRIC evaluator"
```
### Task 25: `ExchangeMatchEvaluator` (PER_EXCHANGE + COUNT_IN_WINDOW)
**Files:**
- Create: `.../alerting/eval/ExchangeMatchEvaluator.java`
- Test: `.../alerting/eval/ExchangeMatchEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — two variants:
- `COUNT_IN_WINDOW`: mock `ClickHouseSearchIndex.countExecutionsForAlerting` → threshold check.
- `PER_EXCHANGE`: `eval_state.lastExchangeTs` cursor advancement. Seed 3 matching exchanges; first eval returns all 3 as separate Firings (emit a list? or change signature?). For v1 simplicity, the evaluator returns `EvalResult.Firing` with an internal list of exchange descriptors in the context map; the job handles one-alert-per-exchange fan-out.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement.** The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend `EvalResult` with a `Batch` variant:
```java
record Batch(List<Firing> firings) implements EvalResult { ... }
```
Add this to `EvalResult.java` (Task 19). The job (Task 27) detects Batch and creates one `AlertInstance` per Firing. This keeps non-batched evaluators simple.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes"
```
---
## Phase 7 — Evaluator job and state transitions
### Task 26: `AlertingProperties` + `AlertStateTransitions`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java`
- Create: `.../alerting/eval/AlertStateTransitions.java`
- Test: `.../alerting/eval/AlertStateTransitionsTest.java`
- [ ] **Step 1: Write the failing test** for the pure state machine:
```java
@Test
void clearWithNoOpenInstanceIsNoOp() {
var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now);
assertThat(next).isEmpty();
}
@Test
void firingWithNoOpenInstanceCreatesPendingIfForDuration() {
var rule = ruleBuilder().forDurationSeconds(60).build();
var result = new EvalResult.Firing(2500.0, 2000.0, Map.of());
var next = AlertStateTransitions.apply(null, result, rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING));
}
@Test
void firingWithNoForDurationGoesStraightToFiring() {
var rule = ruleBuilder().forDurationSeconds(0).build();
var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING));
}
@Test
void pendingPromotesToFiringAfterForDuration() { /* ... */ }
@Test
void firingClearTransitionsToResolved() { /* ... */ }
@Test
void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ }
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
// AlertStateTransitions.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
import java.time.Instant;
import java.util.*;
public final class AlertStateTransitions {
private AlertStateTransitions() {}
/** Returns the new/updated AlertInstance, or empty when nothing changes. */
public static Optional<AlertInstance> apply(
AlertInstance current, EvalResult result, AlertRule rule, Instant now) {
return switch (result) {
case EvalResult.Clear c -> onClear(current, now);
case EvalResult.Firing f -> onFiring(current, f, rule, now);
case EvalResult.Error e -> Optional.empty();
case EvalResult.Batch b -> Optional.empty(); // batch handled by the job, not here
};
}
private static Optional<AlertInstance> onFiring(AlertInstance current, EvalResult.Firing f,
AlertRule rule, Instant now) {
if (current == null) {
AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING;
return Optional.of(newInstance(rule, f, initial, now));
}
if (current.state() == AlertState.PENDING) {
Instant firedAt = current.firedAt();
if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) {
return Optional.of(current /* copy with state=FIRING, firedAt=now */);
}
return Optional.of(current); // stay PENDING, no mutation
}
return Optional.empty(); // already FIRING/ACK — re-notification handled by dispatcher
}
private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
if (current == null) return Optional.empty();
if (current.state() == AlertState.RESOLVED) return Optional.empty();
return Optional.of(current /* copy with state=RESOLVED, resolvedAt=now */);
}
private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
// ... construct from rule snapshot + context; title/message rendered by the job
throw new UnsupportedOperationException("stub");
}
}
```
Flesh out the `.withState(...)` / `.withResolvedAt(...)` helpers on `AlertInstance` (add wither-style methods returning new records) as part of this task.
```java
// AlertingProperties.java
package com.cameleer.server.app.alerting.config;
import org.springframework.boot.context.properties.ConfigurationProperties;
@ConfigurationProperties("cameleer.server.alerting")
public record AlertingProperties(
Integer evaluatorTickIntervalMs,
Integer evaluatorBatchSize,
Integer claimTtlSeconds,
Integer notificationTickIntervalMs,
Integer notificationBatchSize,
Boolean inTickCacheEnabled,
Integer circuitBreakerFailThreshold,
Integer circuitBreakerWindowSeconds,
Integer circuitBreakerCooldownSeconds,
Integer eventRetentionDays,
Integer notificationRetentionDays,
Integer webhookTimeoutMs,
Integer webhookMaxAttempts) {
public int effectiveEvaluatorTickIntervalMs() {
int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
return Math.max(5000, raw); // floor
}
public int effectiveEvaluatorBatchSize() { return evaluatorBatchSize == null ? 20 : evaluatorBatchSize; }
public int effectiveClaimTtlSeconds() { return claimTtlSeconds == null ? 30 : claimTtlSeconds; }
public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; }
public int effectiveNotificationBatchSize() { return notificationBatchSize == null ? 50 : notificationBatchSize; }
public int effectiveEventRetentionDays() { return eventRetentionDays == null ? 90 : eventRetentionDays; }
public int effectiveNotificationRetentionDays() { return notificationRetentionDays == null ? 30 : notificationRetentionDays; }
public int effectiveWebhookTimeoutMs() { return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs; }
public int effectiveWebhookMaxAttempts() { return webhookMaxAttempts == null ? 3 : webhookMaxAttempts; }
public int cbFailThreshold() { return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold; }
public int cbWindowSeconds() { return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds; }
public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; }
}
```
Register via `@ConfigurationPropertiesScan` or explicit `@EnableConfigurationProperties(AlertingProperties.class)` in `AlertingBeanConfig`. Also clamp-with-WARN if `evaluatorTickIntervalMs < 5000` at startup.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine"
```
### Task 27: `AlertEvaluatorJob`
**Files:**
- Create: `.../alerting/eval/AlertEvaluatorJob.java`
- Test: `.../alerting/eval/AlertEvaluatorJobIT.java`
- [ ] **Step 1: Write the failing integration test** (uses real PG + mocked evaluators):
```java
@Test
void claimDueRuleFireResolveCycle() throws Exception {
// seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance.
// flip the mock to return Firing -> one AlertInstance in FIRING state.
// flip back to Clear -> instance transitions to RESOLVED.
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class AlertEvaluatorJob implements SchedulingConfigurer {
private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);
private final AlertingProperties props;
private final AlertRuleRepository ruleRepo;
private final AlertInstanceRepository instanceRepo;
private final AlertNotificationRepository notificationRepo;
private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
private final PerKindCircuitBreaker circuitBreaker;
private final MustacheRenderer renderer;
private final NotificationContextBuilder contextBuilder;
private final String instanceId;
private final String tenantId;
private final AlertingMetrics metrics;
private final Clock clock;
public AlertEvaluatorJob(/* ...all above... */) { /* assign */ }
@Override
public void configureTasks(ScheduledTaskRegistrar registrar) {
registrar.addFixedDelayTask(this::tick, props.effectiveEvaluatorTickIntervalMs());
}
void tick() {
List<AlertRule> claimed = ruleRepo.claimDueRules(
instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds());
TickCache cache = new TickCache();
EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);
for (AlertRule rule : claimed) {
if (circuitBreaker.isOpen(rule.conditionKind())) {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
continue;
}
try {
EvalResult result = evaluateSafely(rule, ctx);
applyResult(rule, result);
circuitBreaker.recordSuccess(rule.conditionKind());
} catch (Exception e) {
circuitBreaker.recordFailure(rule.conditionKind());
metrics.evalError(rule.conditionKind(), rule.id());
log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
} finally {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
}
}
}
@SuppressWarnings({"rawtypes","unchecked"})
private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind());
return evaluator.evaluate(rule.condition(), rule, ctx);
}
private void applyResult(AlertRule rule, EvalResult result) {
if (result instanceof EvalResult.Batch b) {
for (EvalResult.Firing f : b.firings()) applyFiring(rule, f);
return;
}
AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> {
AlertInstance persisted = instanceRepo.save(
enrichTitleMessage(rule, next, result));
if (next.state() == AlertState.FIRING && current == null) {
enqueueNotifications(rule, persisted);
}
});
}
private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ }
private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) {
Map<String,Object> ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null);
String title = renderer.render(rule.notificationTitleTmpl(), ctx);
String message = renderer.render(rule.notificationMessageTmpl(), ctx);
return instance /* .withTitle(title).withMessage(message) */;
}
private void enqueueNotifications(AlertRule rule, AlertInstance instance) {
for (WebhookBinding w : rule.webhooks()) {
Map<String,Object> payload = /* context-builder + body override */ Map.of();
notificationRepo.save(new AlertNotification(
UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(),
NotificationStatus.PENDING, 0, Instant.now(clock),
null, null, null, null, payload, null, Instant.now(clock)));
}
}
private void reschedule(AlertRule rule, Instant next) {
ruleRepo.releaseClaim(rule.id(), next, rule.evalState());
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker"
```
---
## Phase 8 — Notification dispatch
### Task 28: `HmacSigner`
**Files:**
- Create: `.../alerting/notify/HmacSigner.java`
- Test: `.../alerting/notify/HmacSignerTest.java`
- [ ] **Step 1: Write the failing test**
```java
@Test
void signsBodyWithSha256Hmac() {
String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8));
// precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f...
assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f..."); // replace with real hex
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**`javax.crypto.Mac.getInstance("HmacSHA256")`, `HexFormat.of().formatHex(...)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): HmacSigner for webhook signature"
```
### Task 29: `WebhookDispatcher`
**Files:**
- Create: `.../alerting/notify/WebhookDispatcher.java`
- Test: `.../alerting/notify/WebhookDispatcherIT.java` (WireMock)
- [ ] **Step 1: Write the failing IT** covering:
- 2xx → returns DELIVERED with status + snippet.
- 4xx → returns FAILED immediately.
- 5xx → returns RETRY with exponential backoff.
- Network timeout → RETRY.
- HMAC header present when `hmacSecret != null`.
- TLS trust-all config works against WireMock HTTPS.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class WebhookDispatcher {
public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {}
private final OutboundHttpClientFactory clientFactory;
private final SecretCipher cipher;
private final HmacSigner signer;
private final MustacheRenderer renderer;
private final AlertingProperties props;
private final ObjectMapper om;
public WebhookDispatcher(/* ... */) { /* assign */ }
public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance,
OutboundConnection conn, Map<String,Object> context) {
String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn);
String body = renderer.render(bodyTmpl, context);
var ctx = new OutboundHttpRequestContext(
conn.tlsTrustMode(), conn.tlsCaPemPaths(),
Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs()));
var client = clientFactory.clientFor(ctx);
var request = new HttpPost(renderer.render(conn.url(), context));
request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
request.setHeader("Content-Type", "application/json");
for (var h : conn.defaultHeaders().entrySet()) {
request.setHeader(h.getKey(), renderer.render(h.getValue(), context));
}
if (conn.hmacSecretCiphertext() != null) {
String secret = cipher.decrypt(conn.hmacSecretCiphertext());
request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8)));
}
try (var response = client.execute(request)) {
int code = response.getCode();
String snippet = snippet(response);
if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null);
return retryOutcome(code, snippet);
} catch (IOException e) {
return retryOutcome(-1, e.getMessage());
}
}
private Outcome retryOutcome(int code, String snippet) {
// Backoff: 30s, 120s, 300s
Duration next = Duration.ofSeconds(30); // caller multiplies by attempt
return new Outcome(null /* caller decides PENDING vs FAILED */, code, snippet, next);
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification"
```
### Task 30: `NotificationDispatchJob`
**Files:**
- Create: `.../alerting/notify/NotificationDispatchJob.java`
- Test: `.../alerting/notify/NotificationDispatchJobIT.java`
- [ ] **Step 1: Write the failing IT** — seed a `PENDING` `AlertNotification`; run one tick; WireMock returns 200; assert row transitions to `DELIVERED`. Seed another against 503 → assert `attempts=1`, `next_attempt_at` bumped, still `PENDING`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — claim-polling loop:
```java
void tick() {
var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl);
for (var n : claimed) {
var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; }
var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow();
var rule = ruleRepo.findById(instance.ruleId()).orElse(null);
var context = contextBuilder.build(rule, instance, env, uiOrigin);
// silence check
if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream()
.anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) {
instanceRepo.markSilenced(instance.id(), true);
notificationRepo.markFailed(n.id(), 0, "silenced");
continue;
}
var outcome = dispatcher.dispatch(n, rule, instance, conn, context);
if (outcome.status() == NotificationStatus.DELIVERED) {
notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now());
} else if (outcome.status() == NotificationStatus.FAILED) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
int attempts = n.attempts() + 1;
if (attempts >= props.effectiveWebhookMaxAttempts()) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts));
notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
}
}
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry"
```
### Task 31: `InAppInboxQuery` + server-side 5s memoization
**Files:**
- Create: `.../alerting/notify/InAppInboxQuery.java`
- Test: `.../alerting/notify/InAppInboxQueryTest.java`
- [ ] **Step 1: Write the failing test** covering the path (resolves groups/roles from `RbacService.getEffectiveRolesForUser` + `listGroupsForUser`, delegates to `AlertInstanceRepository.listForInbox`/`countUnreadForUser`, second call within 5s returns cached count).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — Caffeine-style `ConcurrentHashMap<Key, Entry>` with `Entry(count, expiresAt)`, 5 s TTL per `(envId, userId)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization"
```
---
## Phase 9 — REST controllers
### Task 32: `AlertRuleController` + DTOs
**Files:**
- Create: `.../alerting/controller/AlertRuleController.java`
- Create: DTOs in `.../alerting/dto/`
- Test: `.../alerting/controller/AlertRuleControllerIT.java`
- [ ] **Step 1: Write the failing IT** — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement.** Endpoints (all under `/api/v1/environments/{envSlug}/alerts/rules`, env resolved via `@EnvPath Environment env`):
| Method | Path | RBAC |
|---|---|---|
| GET | `` | VIEWER+ |
| POST | `` | OPERATOR+ |
| GET | `{id}` | VIEWER+ |
| PUT | `{id}` | OPERATOR+ |
| DELETE | `{id}` | OPERATOR+ |
| POST | `{id}/enable` / `{id}/disable` | OPERATOR+ |
| POST | `{id}/render-preview` | OPERATOR+ |
| POST | `{id}/test-evaluate` | OPERATOR+ |
Key DTOs: `AlertRuleRequest` (with `@Valid AlertConditionDto`), `AlertRuleResponse`, `RenderPreviewRequest/Response`, `TestEvaluateRequest/Response`.
On save, validate:
- Each `WebhookBindingRequest.outboundConnectionId` exists in `outbound_connections` (via `OutboundConnectionService.get(id)` → 422 if 404).
- Connection is allowed in this env (via `conn.isAllowedInEnvironment(env.id())` → 422 otherwise).
- SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory).
Audit via `auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs"
```
### Task 33: `AlertController`
**Files:**
- Create: `.../alerting/controller/AlertController.java`, `AlertDto.java`, `UnreadCountResponse.java`
- Test: `.../alerting/controller/AlertControllerIT.java`
- [ ] **Step 1: Write the failing IT** for `GET /alerts`, `GET /alerts/unread-count`, `POST /alerts/{id}/ack`, `POST /alerts/{id}/read`, `POST /alerts/bulk-read`. Assert env isolation (env-A alert not visible from env-B).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — delegate to `InAppInboxQuery` and `AlertInstanceRepository`. On ack, enforce targeted-or-OPERATOR rule.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): AlertController for inbox + ack + read"
```
### Task 34: `AlertSilenceController`
**Files:**
- Create: `.../alerting/controller/AlertSilenceController.java`, `AlertSilenceDto.java`
- Test: `.../alerting/controller/AlertSilenceControllerIT.java`
- [ ] **Step 15:** Follow the same pattern. Mutations OPERATOR+, audit `ALERT_SILENCE_CHANGE`. Validate `endsAt > startsAt` at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier).
### Task 35: `AlertNotificationController`
**Files:**
- Create: `.../alerting/controller/AlertNotificationController.java`
- Test: `.../alerting/controller/AlertNotificationControllerIT.java`
- [ ] **Step 15:**
- `GET /alerts/{id}/notifications` → VIEWER+; returns per-instance outbox rows.
- `POST /alerts/notifications/{id}/retry` → OPERATOR+; resets `next_attempt_at = now`, `attempts = 0`, `status = PENDING`. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file).
- [ ] **Step 6: Update `SecurityConfig` to permit the new paths**
In `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java`:
```java
.requestMatchers(HttpMethod.GET, "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN")
```
(Class-level `@PreAuthorize` on each controller is authoritative; the path matchers are defence-in-depth.)
- [ ] **Step 7: Commit**
```bash
git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths"
```
### Task 36: Regenerate OpenAPI schema
- [ ] **Step 1: Start backend on :8081** (from the alerting-02 worktree).
- [ ] **Step 2:** `cd ui && npm run generate-api:live`
- [ ] **Step 3:** Commit `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` regen.
```bash
git add ui/src/api/schema.d.ts ui/src/api/openapi.json
git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints"
```
---
## Phase 10 — Retention, metrics, rules, verification
### Task 37: `AlertingRetentionJob`
**Files:**
- Create: `.../alerting/retention/AlertingRetentionJob.java`
- Test: `.../alerting/retention/AlertingRetentionJobIT.java`
- [ ] **Step 1: Write the failing IT** — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run `cleanup()`; assert only old rows are deleted.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — `@Scheduled(cron = "0 0 3 * * *")`, cutoffs from `AlertingProperties`, advisory-lock-of-the-day pattern (see `JarRetentionJob.java`).
- [ ] **Step 45: Run, commit**
```bash
git commit -m "feat(alerting): AlertingRetentionJob daily cleanup"
```
### Task 38: `AlertingMetrics`
**Files:**
- Create: `.../alerting/metrics/AlertingMetrics.java`
- [ ] **Step 1: Register metrics** via `MeterRegistry`:
```java
@Component
public class AlertingMetrics {
private final MeterRegistry registry;
public AlertingMetrics(MeterRegistry registry) { this.registry = registry; }
public void evalError(ConditionKind kind, UUID ruleId) {
registry.counter("alerting_eval_errors_total",
"kind", kind.name(), "rule_id", ruleId.toString()).increment();
}
public void circuitOpened(ConditionKind kind) {
registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment();
}
public Timer evalDuration(ConditionKind kind) {
return registry.timer("alerting_eval_duration_seconds", "kind", kind.name());
}
// + gauges via MeterBinder that query repositories
}
```
- [ ] **Step 2: Wire into `AlertEvaluatorJob` and `PerKindCircuitBreaker`.**
- [ ] **Step 3: Commit**
```bash
git commit -m "feat(alerting): observability metrics via micrometer"
```
### Task 39: Update `.claude/rules/app-classes.md` + `core-classes.md`
- [ ] **Step 1: Document the new `alerting/` packages** in both rule files. Add a new subsection under `controller/` for the alerting env-scoped controllers. Document the new flat endpoint `/api/v1/alerts/notifications/{id}/retry` in the flat-allow-list with justification "notification IDs are globally unique; matches the `/api/v1/executions/{id}` precedent".
- [ ] **Step 2: Commit**
```bash
git add .claude/rules/app-classes.md .claude/rules/core-classes.md
git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint"
```
### Task 40: `application.yml` defaults + admin guide
**Files:**
- Modify: `cameleer-server-app/src/main/resources/application.yml`
- Create: `docs/alerting.md`
- [ ] **Step 1: Add default stanza**
```yaml
cameleer:
server:
alerting:
evaluator-tick-interval-ms: 5000
evaluator-batch-size: 20
claim-ttl-seconds: 30
notification-tick-interval-ms: 5000
notification-batch-size: 50
in-tick-cache-enabled: true
circuit-breaker-fail-threshold: 5
circuit-breaker-window-seconds: 30
circuit-breaker-cooldown-seconds: 60
event-retention-days: 90
notification-retention-days: 30
webhook-timeout-ms: 5000
webhook-max-attempts: 3
```
- [ ] **Step 2: Write `docs/alerting.md`** — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention).
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md
git commit -m "docs(alerting): default config + admin guide"
```
### Task 41: Full-lifecycle integration test
**Files:**
- Create: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java`
- [ ] **Step 1: Write the full-lifecycle IT**
Steps in the single test method:
1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret.
2. POST a `LOG_PATTERN` rule pointing at `WireMock` via the outbound connection, `forDurationSeconds=0`, `threshold=1`.
3. Inject a log row into ClickHouse that matches the pattern.
4. Trigger `AlertEvaluatorJob.tick()` directly.
5. Assert one `alert_instances` row in FIRING.
6. Trigger `NotificationDispatchJob.tick()`.
7. Assert WireMock received one POST with `X-Cameleer-Signature` header + rendered body.
8. POST `/alerts/{id}/ack` → state ACKNOWLEDGED.
9. Create a silence matching this rule; fire another tick; assert `silenced=true` on new instance and WireMock received no second request.
10. Remove the matching log rows, run tick → instance RESOLVED.
11. DELETE the rule → assert `alert_instances.rule_id = NULL` but `rule_snapshot` still retains rule name.
- [ ] **Step 2: Run — PASS** (may need a few iterations of debugging).
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete"
```
### Task 42: Env-isolation + outbound-guard regression tests
**Files:**
- Create: `.../alerting/AlertingEnvIsolationIT.java`, `OutboundConnectionAllowedEnvIT.java`
- [ ] **Step 1: Env isolation** — rule in env-A, fire, assert invisible from env-B inbox.
- [ ] **Step 2: Outbound guard** — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing `allowed_environment_ids` on the connection while a rule still references it → 409 (this exercises the freshly-wired `rulesReferencing`).
- [ ] **Step 3: Run — PASS**.
- [ ] **Step 4: Commit**
```bash
git commit -m "test(alerting): env isolation + outbound allowed-env guard"
```
### Task 43: Final verification + GitNexus reindex
- [ ] **Step 1: Full build**
```bash
mvn clean verify
```
Expected: All tests pass. Known pre-existing test debt (wrong-JdbcTemplate + shared-context state leaks) may still fail — document any failures that existed before Plan 02 in a commit message "known-pre-existing" note.
- [ ] **Step 2: GitNexus reindex**
```bash
npx gitnexus analyze --embeddings
```
- [ ] **Step 3: Manual smoke**
Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through:
- Create an outbound connection to `https://httpbin.org/post`.
- `curl` the alerting REST API to POST a `LOG_PATTERN` rule.
- Inject a matching log via `POST /api/v1/data/logs`.
- Wait 2 eval ticks + 1 notification tick.
- Confirm: `alert_instances` row in FIRING, `alert_notifications` row DELIVERED with HTTP 200, httpbin shows the body.
- `curl POST /alerts/{id}/ack` → state ACKNOWLEDGED.
- [ ] **Step 4: Nothing to commit if all passes — plan complete**
---
## Known-incomplete items carried into Plan 03
- **UI:** `NotificationBell`, `/alerts/**` pages, `<MustacheEditor />` with variable auto-complete, CMD-K alert/rule sources. Open design question: completion engine choice (CodeMirror 6 vs Monaco vs textarea overlay) still open — see spec §20 #7.
- **Rule promotion across envs.** Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03.
- **OIDC retrofit** to use `OutboundHttpClientFactory`. Unchanged from Plan 01 — a separate small follow-up.
- **TLS summary enrichment** on `/test` endpoint (Plan 01 stubbed as `"TLS"`). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context.
- **Performance tests.** 500-rule, 5-replica `PerformanceIT` deferred; claim-polling concurrency is covered by Task 7's unit-level test.
- **Bulk promotion** and **mustache completion `variables` metadata endpoint** (`GET /alerts/rules/template-variables`) — deferred until usage patterns justify.
- **Rule deletion test debt.** Existing pre-Plan-02 test debt (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in `FlywayMigrationIT` / `ConfigEnvIsolationIT` / `ClickHouseStatsStoreIT`) is orthogonal and should be addressed in a dedicated test-hygiene pass.
---
## Self-review
**Spec coverage** (against `docs/superpowers/specs/2026-04-19-alerting-design.md`):
| Spec § | Scope | Covered by |
|---|---|---|
| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 2025 |
| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 |
| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 |
| §2 Rule promotion | **Deferred to Plan 03 (UI)** | — |
| §2 CMD-K | **Deferred to Plan 03** | — |
| §2 Configurable cadence, 5 s floor | `AlertingProperties.effective*` | Task 26 |
| §3 Key decisions | All 14 decisions honoured | — |
| §4 Module layout | `core/alerting` + `app/alerting/**` | Tasks 311, 1538 |
| §4 Touchpoints | `countLogs` + `countExecutionsForAlerting` + `AuditCategory` + `SecurityConfig` | Tasks 2, 12, 13, 35 |
| §5 Data model | V12 migration | Task 1 |
| §5 Claim-polling queries | `FOR UPDATE SKIP LOCKED` in rule + notification repos | Tasks 7, 10 |
| §6 Outbound connections wiring | `rulesReferencing` gate | Task 8 (**CRITICAL**) |
| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 |
| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | Tasks 28, 29, 30, 31 |
| §9 Rule promotion | **Deferred** (UI) | — |
| §10 Cross-cutting HTTP | Reused from Plan 01 | — |
| §11 API surface | All routes implemented except rule promotion | Tasks 3236 |
| §12 CMD-K | **Deferred to Plan 03** | — |
| §13 UI | **Deferred to Plan 03** | — |
| §14 Configuration | `AlertingProperties` + `application.yml` | Tasks 26, 40 |
| §15 Retention | Daily job | Task 37 |
| §16 Observability (metrics + audit) | Tasks 2, 38 |
| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | Tasks 3236, 28, Plan 01 |
| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 2731, 41, 42 |
| §19 Rollout | Dormant-by-default; matching `application.yml` + docs | Task 40 |
| §20 #1 OIDC alignment | **Deferred** (follow-up) | — |
| §20 #2 secret encryption | Reused Plan 01 `SecretCipher` | Task 29 |
| §20 #3 CH migration naming | `alerting_projections.sql` | Task 14 |
| §20 #6 env-delete cascade audit | PG IT | Task 1 |
| §20 #7 Mustache completion engine | **Deferred** (UI) | — |
**Placeholders:** A handful of steps reference real record fields / method names with `/* … */` markers where the exact name depends on what the existing codebase exposes (`ExecutionStats` metric accessors, `AgentInfo.lastHeartbeat` method name, wither-method signatures on `AlertInstance`). Each is accompanied by a `gitnexus_context({name: ...})` hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time.
**Type consistency check:** `AlertRule`, `AlertInstance`, `AlertNotification`, `AlertSilence` field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). `WebhookBinding.id` is used as `alert_notifications.webhook_id` — stable opaque reference. `OutboundConnection.createdBy/updatedBy` types match `users.user_id TEXT` (Plan 01 precedent). `rulesReferencing` signature matches Plan 01's stub `List<UUID> rulesReferencing(UUID)`.
**Risks flagged to executor:**
1. **Task 16 `MustacheRenderer` missing-variable fallback** is non-trivial in JMustache's default compiler config — implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible.
2. **Task 12/13** — the SQL dialect for attribute map access on the `executions` table (`attributes[?]`) depends on the actual column type in `init.sql`. If attributes is `Map(String,String)`, the syntax works; if it's stored as JSON string, switch to `JSONExtractString(attributes, ?) = ?`.
3. **Task 27 `enrichTitleMessage`** depends on `AlertInstance` having wither methods — these are added opportunistically during Task 26 when `AlertStateTransitions` needs them. Don't forget to expose them.
4. **Claim-polling semantics under schema-per-tenant** — the `?currentSchema=tenant_{id}` JDBC URL routes writes correctly, but the `FOR UPDATE SKIP LOCKED` behaviour is per-schema so cross-tenant locks are irrelevant (correct behaviour). Make sure IT tests run with `cameleer.server.tenant.id=default`.
5. **Task 41 full-lifecycle test is the canary.** If it fails after each task, pair-program with the failing assertion — the bug is almost always in state transitions or renderer context shape.