Files
cameleer-server/docs/superpowers/plans/2026-04-19-alerting-02-backend.md

3430 lines
138 KiB
Markdown
Raw Normal View History

# Alerting — Plan 02 — Backend Implementation
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Deliver the server-side alerting feature described in `docs/superpowers/specs/2026-04-19-alerting-design.md` — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03.
**Architecture:** Confined to new `alerting/` packages in both `cameleer-server-core` (pure records + interfaces) and `cameleer-server-app` (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new `countLogs` / `countExecutionsForAlerting` methods, four additive projections). Claim-polling `FOR UPDATE SKIP LOCKED` makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (`rulesReferencing`) is populated in this plan — it is the gate that unlocks safe production use of Plan 01.
**Tech Stack:** Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's `OutboundHttpClientFactory`, Testcontainers + JUnit 5 + WireMock + AssertJ for tests.
---
## Base branch
**Branch Plan 02 off `feat/alerting-01-outbound-infra`.** Plan 02 depends on Plan 01's `OutboundConnection` domain, `OutboundHttpClientFactory` bean, `SecretCipher`, `OutboundConnectionServiceImpl.rulesReferencing()` stub, the V11 migration, and the `OUTBOUND_CONNECTION_CHANGE` / `OUTBOUND_HTTP_TRUST_CHANGE` audit categories. Branching off `main` is **not** an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2.
```bash
# Execute in a fresh worktree
git fetch origin
git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra
cd .worktrees/alerting-02
mvn clean compile # confirm Plan 01 code compiles as baseline
```
---
## File Structure
### Created — `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`
| File | Responsibility |
|---|---|
| `AlertingProperties.java` | Not here — see app module. |
| `AlertRule.java` | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. |
| `AlertCondition.java` | Sealed interface; Jackson `kind`-based polymorphism root (Id.NAME + EXISTING_PROPERTY). |
| `RouteMetricCondition.java` | Record: scope, metric, comparator, threshold, windowSeconds. |
| `ExchangeMatchCondition.java` | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. |
| `AgentStateCondition.java` | Record: scope, state, forSeconds. |
| `DeploymentStateCondition.java` | Record: scope, states. |
| `LogPatternCondition.java` | Record: scope, level, pattern, threshold, windowSeconds. |
| `JvmMetricCondition.java` | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. |
| `AlertScope.java` | Record: appSlug?, routeId?, agentId? — nullable fields, used by all conditions. |
| `ConditionKind.java` | Enum mirror of SQL `condition_kind_enum`. |
| `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java` | Enums used in conditions. |
| `AlertSeverity.java` | Enum mirror of SQL `severity_enum`. |
| `AlertState.java` | Enum mirror of SQL `alert_state_enum`. |
| `AlertInstance.java` | Immutable record for `alert_instances` row. |
| `AlertRuleTarget.java` | Record for `alert_rule_targets` row. |
| `TargetKind.java` | Enum mirror of SQL `target_kind_enum`. |
| `AlertSilence.java` | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. |
| `SilenceMatcher.java` | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. |
| `AlertNotification.java` | Record for `alert_notifications` outbox row. |
| `NotificationStatus.java` | Enum mirror of SQL `notification_status_enum`. |
| `WebhookBinding.java` | Record embedded in `alert_rules.webhooks` JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. |
| `AlertRuleRepository.java` | CRUD + claim-polling interface. |
| `AlertInstanceRepository.java` | CRUD + query-for-inbox interface. |
| `AlertSilenceRepository.java` | CRUD interface. |
| `AlertNotificationRepository.java` | CRUD + claim-polling interface. |
| `AlertReadRepository.java` | Mark-read + count-unread interface. |
### Created — `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/`
| File | Responsibility |
|---|---|
| `config/AlertingProperties.java` | `@ConfigurationProperties("cameleer.server.alerting")`. |
| `config/AlertingBeanConfig.java` | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. |
| `storage/PostgresAlertRuleRepository.java` | JdbcTemplate impl of `AlertRuleRepository`. |
| `storage/PostgresAlertInstanceRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertSilenceRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertNotificationRepository.java` | JdbcTemplate impl. |
| `storage/PostgresAlertReadRepository.java` | JdbcTemplate impl. |
| `eval/EvalContext.java` | Per-tick context (tenantId, now, tickCache). |
| `eval/EvalResult.java` | Sealed: `Firing(value, threshold, contextMap)` / `Clear` / `Error(Throwable)`. |
| `eval/TickCache.java` | `ConcurrentHashMap<String,Object>` discarded per tick. |
| `eval/PerKindCircuitBreaker.java` | Failure window + cooldown per `ConditionKind`. |
| `eval/ConditionEvaluator.java` | Generic interface: `evaluate(C, AlertRule, EvalContext)`. |
| `eval/RouteMetricEvaluator.java` | Reads `StatsStore`. |
| `eval/ExchangeMatchEvaluator.java` | Reads `ClickHouseSearchIndex.countExecutionsForAlerting` + `SearchService.search` for PER_EXCHANGE cursor mode. |
| `eval/AgentStateEvaluator.java` | Reads `AgentRegistryService.findAll`. |
| `eval/DeploymentStateEvaluator.java` | Reads `DeploymentRepository.findByAppId`. |
| `eval/LogPatternEvaluator.java` | Reads new `ClickHouseLogStore.countLogs`. |
| `eval/JvmMetricEvaluator.java` | Reads `MetricsQueryStore.queryTimeSeries`. |
| `eval/AlertEvaluatorJob.java` | `@Component` implementing `SchedulingConfigurer`; claim-polling loop. |
| `eval/AlertStateTransitions.java` | Pure function: given current instance + EvalResult → new state + timestamps. |
| `notify/MustacheRenderer.java` | JMustache wrapper; resilient to bad templates. |
| `notify/NotificationContextBuilder.java` | Pure: builds context map from `AlertInstance` + rule + env. |
| `notify/SilenceMatcher.java` | Pure: evaluates a `SilenceMatcher` against an `AlertInstance`. |
| `notify/InAppInboxQuery.java` | Server-side query helper for `/alerts` and unread-count. |
| `notify/WebhookDispatcher.java` | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. |
| `notify/NotificationDispatchJob.java` | `@Component` `SchedulingConfigurer`; claim-polling on `alert_notifications`. |
| `notify/HmacSigner.java` | Pure: computes `sha256=<hmac(secret, body)>`. |
| `retention/AlertingRetentionJob.java` | `@Scheduled(cron = "0 0 3 * * *")` — delete old `alert_instances` + `alert_notifications`. |
| `controller/AlertRuleController.java` | `/api/v1/environments/{envSlug}/alerts/rules`. |
| `controller/AlertController.java` | `/api/v1/environments/{envSlug}/alerts` + instance actions. |
| `controller/AlertSilenceController.java` | `/api/v1/environments/{envSlug}/alerts/silences`. |
| `controller/AlertNotificationController.java` | `/api/v1/environments/{envSlug}/alerts/{id}/notifications`, `/alerts/notifications/{id}/retry`. |
| `dto/AlertRuleDto.java`, `dto/AlertDto.java`, `dto/AlertSilenceDto.java`, `dto/AlertNotificationDto.java`, `dto/ConditionDto.java`, `dto/WebhookBindingDto.java`, `dto/RenderPreviewRequest.java`, `dto/RenderPreviewResponse.java`, `dto/TestEvaluateRequest.java`, `dto/TestEvaluateResponse.java`, `dto/UnreadCountResponse.java` | Request/response DTOs. |
| `metrics/AlertingMetrics.java` | Micrometer registrations for counters/gauges/histograms. |
### Created — resources
| File | Responsibility |
|---|---|
| `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql` | Flyway migration: 5 enums, 6 tables, indexes, cascades. |
| `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql` | 4 projections on `executions` / `logs` / `agent_metrics`, all `IF NOT EXISTS`. |
### Modified
| File | Change |
|---|---|
| `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` | Add `ALERT_RULE_CHANGE`, `ALERT_SILENCE_CHANGE`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java` | Replace the `rulesReferencing(UUID)` stub with a call through `AlertRuleRepository.findRuleIdsByOutboundConnectionId`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java` | Add `long countLogs(LogSearchRequest)` — no `FINAL`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java` | Add `long countExecutionsForAlerting(AlertMatchSpec)` — no `FINAL`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java` | Run `alerting_projections.sql` via existing `ClickHouseSchemaInitializer`. |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java` | Permit new `/api/v1/environments/{envSlug}/alerts/**` path matchers with role-based access. |
| `cameleer-server-core/pom.xml` | Add `com.samskivert:jmustache:1.16`. |
| `.claude/rules/app-classes.md`, `.claude/rules/core-classes.md` | Document new packages. |
| `cameleer-server-app/src/main/resources/application.yml` | Default `AlertingProperties` stanza + comment linking to the admin guide. |
---
## Conventions
- **TDD.** Every task starts with a failing test, implements the minimum to pass, then commits.
- **One commit per task.** Commit messages: `feat(alerting): …`, `test(alerting): …`, `fix(alerting): …`, `chore(alerting): …`, `docs(alerting): …`.
- **Tenant invariant.** Every ClickHouse query and Postgres table referencing observability data filters by `tenantId` (injected via `AlertingBeanConfig` from `cameleer.server.tenant.id`).
- **No `FINAL`** on the two new CH count methods — alerting tolerates brief duplicate counts.
- **Jackson polymorphism** via `@JsonTypeInfo(use = Id.NAME, property = "kind", include = EXISTING_PROPERTY)` with `@JsonSubTypes` on `AlertCondition`.
- **Pure `core/`, Spring-only in `app/`.** No `@Component`, `@Service`, or `@Scheduled` annotations in `cameleer-server-core`.
- **Claim polling.** `FOR UPDATE SKIP LOCKED` + `claimed_by` / `claimed_until` with 30 s TTL.
- **Instance id** for claim ownership: use `InetAddress.getLocalHost().getHostName() + ":" + processPid()`; exposed as a bean `"alertingInstanceId"` of type `String`.
- **GitNexus hygiene.** Before modifying any existing class (`OutboundConnectionServiceImpl`, `ClickHouseLogStore`, `ClickHouseSearchIndex`, `AuditCategory`, `SecurityConfig`), run `gitnexus_impact({target: "<className>", direction: "upstream"})` and report blast radius. Run `gitnexus_detect_changes()` before each commit.
---
## Phase 1 — Flyway V12 migration and audit categories
### Task 1: `V12__alerting_tables.sql`
**Files:**
- Create: `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java`
- [ ] **Step 1: Write the failing integration test**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class V12MigrationIT extends AbstractPostgresIT {
@Test
void allAlertingTablesAndEnumsExist() {
var tables = jdbcTemplate.queryForList(
"SELECT table_name FROM information_schema.tables WHERE table_schema='public' " +
"AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," +
"'alert_silences','alert_notifications','alert_reads')",
String.class);
assertThat(tables).containsExactlyInAnyOrder(
"alert_rules","alert_rule_targets","alert_instances",
"alert_silences","alert_notifications","alert_reads");
var enums = jdbcTemplate.queryForList(
"SELECT typname FROM pg_type WHERE typname IN " +
"('severity_enum','condition_kind_enum','alert_state_enum'," +
"'target_kind_enum','notification_status_enum')",
String.class);
assertThat(enums).hasSize(5);
}
@Test
void deletingEnvironmentCascadesAlertingRows() {
var envId = java.util.UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env");
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1");
var ruleId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
"VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')",
ruleId, envId);
var instanceId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
"fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
"now(), '{}'::jsonb, 't', 'm')",
instanceId, ruleId, envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_rules WHERE environment_id = ?",
Integer.class, envId)).isZero();
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_instances WHERE environment_id = ?",
Integer.class, envId)).isZero();
}
}
```
- [ ] **Step 2: Run the test to verify it fails**
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
Expected: FAIL — tables do not exist.
- [ ] **Step 3: Write the migration**
Create `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`:
```sql
-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11)
CREATE TYPE severity_enum AS ENUM ('CRITICAL','WARNING','INFO');
CREATE TYPE condition_kind_enum AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC');
CREATE TYPE alert_state_enum AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED');
CREATE TYPE target_kind_enum AS ENUM ('USER','GROUP','ROLE');
CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED');
CREATE TABLE alert_rules (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
name varchar(200) NOT NULL,
description text,
severity severity_enum NOT NULL,
enabled boolean NOT NULL DEFAULT true,
condition_kind condition_kind_enum NOT NULL,
condition jsonb NOT NULL,
evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5),
for_duration_seconds int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0),
re_notify_minutes int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0),
notification_title_tmpl text NOT NULL,
notification_message_tmpl text NOT NULL,
webhooks jsonb NOT NULL DEFAULT '[]',
next_evaluation_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
eval_state jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now(),
created_by text NOT NULL REFERENCES users(user_id),
updated_at timestamptz NOT NULL DEFAULT now(),
updated_by text NOT NULL REFERENCES users(user_id)
);
CREATE INDEX alert_rules_env_idx ON alert_rules (environment_id);
CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true;
CREATE TABLE alert_rule_targets (
id uuid PRIMARY KEY,
rule_id uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
target_kind target_kind_enum NOT NULL,
target_id varchar(128) NOT NULL,
UNIQUE (rule_id, target_kind, target_id)
);
CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id);
CREATE TABLE alert_instances (
id uuid PRIMARY KEY,
rule_id uuid REFERENCES alert_rules(id) ON DELETE SET NULL,
rule_snapshot jsonb NOT NULL,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
state alert_state_enum NOT NULL,
severity severity_enum NOT NULL,
fired_at timestamptz NOT NULL,
acked_at timestamptz,
acked_by text REFERENCES users(user_id),
resolved_at timestamptz,
last_notified_at timestamptz,
silenced boolean NOT NULL DEFAULT false,
current_value numeric,
threshold numeric,
context jsonb NOT NULL,
title text NOT NULL,
message text NOT NULL,
target_user_ids text[] NOT NULL DEFAULT '{}',
target_group_ids uuid[] NOT NULL DEFAULT '{}',
target_role_names text[] NOT NULL DEFAULT '{}'
);
CREATE INDEX alert_instances_inbox_idx ON alert_instances (environment_id, state, fired_at DESC);
CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL;
CREATE INDEX alert_instances_resolved_idx ON alert_instances (resolved_at) WHERE state = 'RESOLVED';
CREATE INDEX alert_instances_target_u_idx ON alert_instances USING GIN (target_user_ids);
CREATE INDEX alert_instances_target_g_idx ON alert_instances USING GIN (target_group_ids);
CREATE INDEX alert_instances_target_r_idx ON alert_instances USING GIN (target_role_names);
CREATE TABLE alert_silences (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
matcher jsonb NOT NULL,
reason text,
starts_at timestamptz NOT NULL,
ends_at timestamptz NOT NULL CHECK (ends_at > starts_at),
created_by text NOT NULL REFERENCES users(user_id),
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at);
CREATE TABLE alert_notifications (
id uuid PRIMARY KEY,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
webhook_id uuid,
outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL,
status notification_status_enum NOT NULL DEFAULT 'PENDING',
attempts int NOT NULL DEFAULT 0,
next_attempt_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
last_response_status int,
last_response_snippet text,
payload jsonb NOT NULL,
delivered_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_notifications_pending_idx ON alert_notifications (next_attempt_at) WHERE status = 'PENDING';
CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id);
CREATE TABLE alert_reads (
user_id text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
read_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (user_id, alert_instance_id)
);
```
Notes:
- Plan 01 established `users.user_id` as TEXT. All FK-to-users columns in this migration are `text`, not `uuid`.
- `target_user_ids` is `text[]` (matches `users.user_id`).
- `outbound_connections` (Plan 01) is referenced with `ON DELETE SET NULL` — matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed.
- [ ] **Step 4: Run the test to verify it passes**
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
git commit -m "feat(alerting): V12 flyway migration for alerting tables"
```
### Task 2: Extend `AuditCategory`
**Files:**
- Modify: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "AuditCategory", direction: "upstream"})` — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements).
- [ ] **Step 2: Write the failing test**
```java
package com.cameleer.server.core.admin;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AuditCategoryTest {
@Test
void alertingCategoriesPresent() {
assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
}
}
```
- [ ] **Step 3: Run the test — FAIL**
Run: `mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest`
Expected: FAIL — `IllegalArgumentException: No enum constant`.
- [ ] **Step 4: Add the enum values**
Replace the whole enum body with:
```java
package com.cameleer.server.core.admin;
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
}
```
- [ ] **Step 5: Run the test — PASS**
- [ ] **Step 6: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \
cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories"
```
---
## Phase 2 — Core domain model
Each task in this phase adds a small, focused set of pure-Java records and enums under `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`. All records use canonical constructors with explicit `@NotNull`-style defensive copying only for mutable collections (`List.copyOf`, `Map.copyOf`). Jackson polymorphism is handled by `@JsonTypeInfo(use = Id.NAME, property = "kind", include = EXISTING_PROPERTY)` on `AlertCondition` — the subtype is read from the existing `kind` field each record exposes.
### Task 3: Enums + `AlertScope`
**Files:**
- Create: `.../alerting/AlertSeverity.java`, `AlertState.java`, `ConditionKind.java`, `TargetKind.java`, `NotificationStatus.java`, `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java`, `AlertScope.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AlertScopeTest {
@Test
void allFieldsNullIsEnvWide() {
var s = new AlertScope(null, null, null);
assertThat(s.isEnvWide()).isTrue();
}
@Test
void appScoped() {
var s = new AlertScope("orders", null, null);
assertThat(s.isEnvWide()).isFalse();
assertThat(s.appSlug()).isEqualTo("orders");
}
@Test
void enumsHaveExpectedValues() {
assertThat(AlertSeverity.values()).containsExactly(
AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO);
assertThat(AlertState.values()).containsExactly(
AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED);
assertThat(ConditionKind.values()).hasSize(6);
assertThat(TargetKind.values()).containsExactly(
TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE);
assertThat(NotificationStatus.values()).containsExactly(
NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED);
}
}
```
- [ ] **Step 2: Run — FAIL** (`cannot find symbol`).
Run: `mvn -pl cameleer-server-core test -Dtest=AlertScopeTest`
- [ ] **Step 3: Create the files**
```java
// AlertSeverity.java
package com.cameleer.server.core.alerting;
public enum AlertSeverity { CRITICAL, WARNING, INFO }
// AlertState.java
package com.cameleer.server.core.alerting;
public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }
// ConditionKind.java
package com.cameleer.server.core.alerting;
public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }
// TargetKind.java
package com.cameleer.server.core.alerting;
public enum TargetKind { USER, GROUP, ROLE }
// NotificationStatus.java
package com.cameleer.server.core.alerting;
public enum NotificationStatus { PENDING, DELIVERED, FAILED }
// RouteMetric.java
package com.cameleer.server.core.alerting;
public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT }
// Comparator.java
package com.cameleer.server.core.alerting;
public enum Comparator { GT, GTE, LT, LTE, EQ }
// AggregationOp.java
package com.cameleer.server.core.alerting;
public enum AggregationOp { MAX, MIN, AVG, LATEST }
// FireMode.java
package com.cameleer.server.core.alerting;
public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }
// AlertScope.java
package com.cameleer.server.core.alerting;
public record AlertScope(String appSlug, String routeId, String agentId) {
public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; }
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
git commit -m "feat(alerting): core enums + AlertScope"
```
### Task 4: `AlertCondition` sealed hierarchy + Jackson polymorphism
**Files:**
- Create: `.../alerting/AlertCondition.java`, `RouteMetricCondition.java`, `ExchangeMatchCondition.java` (with nested `ExchangeFilter`), `AgentStateCondition.java`, `DeploymentStateCondition.java`, `LogPatternCondition.java`, `JvmMetricCondition.java`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class AlertConditionJsonTest {
private final ObjectMapper om = new ObjectMapper();
@Test
void roundtripRouteMetric() throws Exception {
var c = new RouteMetricCondition(
new AlertScope("orders", "route-1", null),
RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(RouteMetricCondition.class);
assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC);
}
@Test
void roundtripExchangeMatchPerExchange() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")),
FireMode.PER_EXCHANGE, null, null, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class);
}
@Test
void roundtripExchangeMatchCountInWindow() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
FireMode.COUNT_IN_WINDOW, 5, 900, null);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5);
}
@Test
void roundtripAgentState() throws Exception {
var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(AgentStateCondition.class);
}
@Test
void roundtripDeploymentState() throws Exception {
var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED"));
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(DeploymentStateCondition.class);
}
@Test
void roundtripLogPattern() throws Exception {
var c = new LogPatternCondition(new AlertScope("orders", null, null),
"ERROR", "TimeoutException", 5, 900);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(LogPatternCondition.class);
}
@Test
void roundtripJvmMetric() throws Exception {
var c = new JvmMetricCondition(new AlertScope("orders", null, null),
"heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Create the sealed hierarchy**
```java
// AlertCondition.java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "kind",
include = JsonTypeInfo.As.EXISTING_PROPERTY, visible = true)
@JsonSubTypes({
@JsonSubTypes.Type(value = RouteMetricCondition.class, name = "ROUTE_METRIC"),
@JsonSubTypes.Type(value = ExchangeMatchCondition.class, name = "EXCHANGE_MATCH"),
@JsonSubTypes.Type(value = AgentStateCondition.class, name = "AGENT_STATE"),
@JsonSubTypes.Type(value = DeploymentStateCondition.class, name = "DEPLOYMENT_STATE"),
@JsonSubTypes.Type(value = LogPatternCondition.class, name = "LOG_PATTERN"),
@JsonSubTypes.Type(value = JvmMetricCondition.class, name = "JVM_METRIC")
})
public sealed interface AlertCondition permits
RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {
ConditionKind kind();
AlertScope scope();
}
```
```java
// RouteMetricCondition.java
package com.cameleer.server.core.alerting;
public record RouteMetricCondition(
AlertScope scope,
RouteMetric metric,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
}
```
```java
// ExchangeMatchCondition.java
package com.cameleer.server.core.alerting;
import java.util.Map;
public record ExchangeMatchCondition(
AlertScope scope,
ExchangeFilter filter,
FireMode fireMode,
Integer threshold, // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
Integer windowSeconds, // required when COUNT_IN_WINDOW
Integer perExchangeLingerSeconds // required when PER_EXCHANGE
) implements AlertCondition {
public ExchangeMatchCondition {
if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
}
@Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }
public record ExchangeFilter(String status, Map<String, String> attributes) {
public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); }
}
}
```
```java
// AgentStateCondition.java
package com.cameleer.server.core.alerting;
public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
}
```
```java
// DeploymentStateCondition.java
package com.cameleer.server.core.alerting;
import java.util.List;
public record DeploymentStateCondition(AlertScope scope, List<String> states) implements AlertCondition {
public DeploymentStateCondition { states = List.copyOf(states); }
@Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
}
```
```java
// LogPatternCondition.java
package com.cameleer.server.core.alerting;
public record LogPatternCondition(
AlertScope scope,
String level,
String pattern,
int threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
}
```
```java
// JvmMetricCondition.java
package com.cameleer.server.core.alerting;
public record JvmMetricCondition(
AlertScope scope,
String metric,
AggregationOp aggregation,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction"
```
### Task 5: Core data records (`AlertRule`, `AlertInstance`, `AlertSilence`, `SilenceMatcher`, `AlertRuleTarget`, `AlertNotification`, `WebhookBinding`)
**Files:**
- Create: the seven records above under `.../alerting/`
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class AlertDomainRecordsTest {
@Test
void alertRuleDefensiveCopy() {
var webhooks = new java.util.ArrayList<WebhookBinding>();
webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null));
var r = newRule(webhooks);
webhooks.clear();
assertThat(r.webhooks()).hasSize(1);
}
@Test
void silenceMatcherAllFieldsNullMatchesEverything() {
var m = new SilenceMatcher(null, null, null, null, null);
assertThat(m.isWildcard()).isTrue();
}
private AlertRule newRule(List<WebhookBinding> wh) {
return new AlertRule(
UUID.randomUUID(), UUID.randomUUID(), "r", null,
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m", wh, List.of(),
Instant.now(), null, null, Map.of(),
Instant.now(), "u1", Instant.now(), "u1");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Create the records**
```java
// AlertRule.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertRule(
UUID id,
UUID environmentId,
String name,
String description,
AlertSeverity severity,
boolean enabled,
ConditionKind conditionKind,
AlertCondition condition,
int evaluationIntervalSeconds,
int forDurationSeconds,
int reNotifyMinutes,
String notificationTitleTmpl,
String notificationMessageTmpl,
List<WebhookBinding> webhooks,
List<AlertRuleTarget> targets,
Instant nextEvaluationAt,
String claimedBy,
Instant claimedUntil,
Map<String, Object> evalState,
Instant createdAt,
String createdBy,
Instant updatedAt,
String updatedBy) {
public AlertRule {
webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
targets = targets == null ? List.of() : List.copyOf(targets);
evalState = evalState == null ? Map.of() : Map.copyOf(evalState);
}
}
```
```java
// AlertInstance.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertInstance(
UUID id,
UUID ruleId, // nullable after rule deletion
Map<String, Object> ruleSnapshot,
UUID environmentId,
AlertState state,
AlertSeverity severity,
Instant firedAt,
Instant ackedAt,
String ackedBy,
Instant resolvedAt,
Instant lastNotifiedAt,
boolean silenced,
Double currentValue,
Double threshold,
Map<String, Object> context,
String title,
String message,
List<String> targetUserIds,
List<UUID> targetGroupIds,
List<String> targetRoleNames) {
public AlertInstance {
ruleSnapshot = ruleSnapshot == null ? Map.of() : Map.copyOf(ruleSnapshot);
context = context == null ? Map.of() : Map.copyOf(context);
targetUserIds = targetUserIds == null ? List.of() : List.copyOf(targetUserIds);
targetGroupIds = targetGroupIds == null ? List.of() : List.copyOf(targetGroupIds);
targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames);
}
}
```
```java
// AlertRuleTarget.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {}
```
```java
// WebhookBinding.java
package com.cameleer.server.core.alerting;
import java.util.Map;
import java.util.UUID;
public record WebhookBinding(
UUID id,
UUID outboundConnectionId,
String bodyOverride,
Map<String, String> headerOverrides) {
public WebhookBinding {
headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
}
}
```
```java
// SilenceMatcher.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record SilenceMatcher(
UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) {
public boolean isWildcard() {
return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null;
}
}
```
```java
// AlertSilence.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.UUID;
public record AlertSilence(
UUID id,
UUID environmentId,
SilenceMatcher matcher,
String reason,
Instant startsAt,
Instant endsAt,
String createdBy,
Instant createdAt) {}
```
```java
// AlertNotification.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
public record AlertNotification(
UUID id,
UUID alertInstanceId,
UUID webhookId,
UUID outboundConnectionId,
NotificationStatus status,
int attempts,
Instant nextAttemptAt,
String claimedBy,
Instant claimedUntil,
Integer lastResponseStatus,
String lastResponseSnippet,
Map<String, Object> payload,
Instant deliveredAt,
Instant createdAt) {
public AlertNotification {
payload = payload == null ? Map.of() : Map.copyOf(payload);
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)"
```
### Task 6: Repository interfaces
**Files:**
- Create: `.../alerting/AlertRuleRepository.java`, `AlertInstanceRepository.java`, `AlertSilenceRepository.java`, `AlertNotificationRepository.java`, `AlertReadRepository.java`
- No test (pure interfaces — covered by the Phase 3 integration tests).
- [ ] **Step 1: Create the interfaces**
```java
// AlertRuleRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertRuleRepository {
AlertRule save(AlertRule rule); // upsert by id
Optional<AlertRule> findById(UUID id);
List<AlertRule> listByEnvironment(UUID environmentId);
List<AlertRule> findAllByOutboundConnectionId(UUID connectionId);
List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId); // used by rulesReferencing()
void delete(UUID id);
/** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now).
* Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */
List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds);
/** Release claim + bump next_evaluation_at. */
void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt,
java.util.Map<String, Object> evalState);
}
```
```java
// AlertInstanceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertInstanceRepository {
AlertInstance save(AlertInstance instance); // upsert by id
Optional<AlertInstance> findById(UUID id);
Optional<AlertInstance> findOpenForRule(UUID ruleId); // state IN ('PENDING','FIRING','ACKNOWLEDGED')
List<AlertInstance> listForInbox(UUID environmentId,
List<String> userGroupIdFilter, // UUIDs as String? decide impl-side
String userId,
List<String> userRoleNames,
int limit);
long countUnreadForUser(UUID environmentId, String userId);
void ack(UUID id, String userId, Instant when);
void resolve(UUID id, Instant when);
void markSilenced(UUID id, boolean silenced);
void deleteResolvedBefore(Instant cutoff);
}
```
```java
// AlertSilenceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertSilenceRepository {
AlertSilence save(AlertSilence silence);
Optional<AlertSilence> findById(UUID id);
List<AlertSilence> listActive(UUID environmentId, Instant when);
List<AlertSilence> listByEnvironment(UUID environmentId);
void delete(UUID id);
}
```
```java
// AlertNotificationRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertNotificationRepository {
AlertNotification save(AlertNotification n);
Optional<AlertNotification> findById(UUID id);
List<AlertNotification> listForInstance(UUID alertInstanceId);
List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds);
void markDelivered(UUID id, int status, String snippet, Instant when);
void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet);
void markFailed(UUID id, int status, String snippet);
void deleteSettledBefore(Instant cutoff);
}
```
```java
// AlertReadRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.UUID;
public interface AlertReadRepository {
void markRead(String userId, UUID alertInstanceId);
void bulkMarkRead(String userId, List<UUID> alertInstanceIds);
}
```
- [ ] **Step 2: Compile**
Run: `mvn -pl cameleer-server-core compile`
Expected: SUCCESS.
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java
git commit -m "feat(alerting): core repository interfaces"
```
---
## Phase 3 — Postgres repositories
All repositories use `JdbcTemplate` and `ObjectMapper` for JSONB columns (same pattern as `PostgresOutboundConnectionRepository`). Convert UUID[] with `ConnectionCallback` + `Array.of("uuid", ...)` and text[] with `Array.of("text", ...)`.
### Task 7: `PostgresAlertRuleRepository`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java`
- [ ] **Step 1: Write the failing integration test**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT {
private PostgresAlertRuleRepository repo;
private UUID envId;
@AfterEach
void cleanup() {
jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'");
}
@org.junit.jupiter.api.BeforeEach
void setup() {
repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('test-user', 'test-user', 'x', 'a@b', true)");
}
@Test
void saveAndFindByIdRoundtrip() {
var rule = newRule(List.of());
repo.save(rule);
var found = repo.findById(rule.id()).orElseThrow();
assertThat(found.name()).isEqualTo(rule.name());
assertThat(found.condition()).isInstanceOf(AgentStateCondition.class);
}
@Test
void findRuleIdsByOutboundConnectionId() {
var connId = UUID.randomUUID();
var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of());
var rule = newRule(List.of(wb));
repo.save(rule);
List<UUID> ids = repo.findRuleIdsByOutboundConnectionId(connId);
assertThat(ids).containsExactly(rule.id());
assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty();
}
@Test
void claimDueRulesAtomicSkipLocked() {
var rule = newRule(List.of());
repo.save(rule);
List<AlertRule> claimed = repo.claimDueRules("instance-A", 10, 30);
assertThat(claimed).hasSize(1);
// Second claimant sees nothing until first releases or TTL expires
List<AlertRule> second = repo.claimDueRules("instance-B", 10, 30);
assertThat(second).isEmpty();
}
private AlertRule newRule(List<WebhookBinding> webhooks) {
return new AlertRule(
UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc",
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60),
60, 0, 60, "t", "m", webhooks, List.of(),
Instant.now().minusSeconds(10), null, null, Map.of(),
Instant.now(), "test-user", Instant.now(), "test-user");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement the repository**
```java
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.postgresql.util.PGobject;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.time.Instant;
import java.util.*;
public class PostgresAlertRuleRepository implements AlertRuleRepository {
private final JdbcTemplate jdbc;
private final ObjectMapper om;
public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
this.jdbc = jdbc;
this.om = om;
}
@Override
public AlertRule save(AlertRule r) {
String sql = """
INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
created_at, created_by, updated_at, updated_by)
VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
?, ?, ?, ?::jsonb, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name, description = EXCLUDED.description,
severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
for_duration_seconds = EXCLUDED.for_duration_seconds,
re_notify_minutes = EXCLUDED.re_notify_minutes,
notification_title_tmpl = EXCLUDED.notification_title_tmpl,
notification_message_tmpl = EXCLUDED.notification_message_tmpl,
webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
""";
jdbc.update(sql,
r.id(), r.environmentId(), r.name(), r.description(),
r.severity().name(), r.enabled(), r.conditionKind().name(),
writeJson(r.condition()),
r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
r.notificationTitleTmpl(), r.notificationMessageTmpl(),
writeJson(r.webhooks()),
Timestamp.from(r.nextEvaluationAt()),
r.claimedBy(),
r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
writeJson(r.evalState()),
Timestamp.from(r.createdAt()), r.createdBy(),
Timestamp.from(r.updatedAt()), r.updatedBy());
return r;
}
@Override
public Optional<AlertRule> findById(UUID id) {
var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
}
@Override
public List<AlertRule> listByEnvironment(UUID environmentId) {
return jdbc.query(
"SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
rowMapper(), environmentId);
}
@Override
public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT * FROM alert_rules
WHERE webhooks @> ?::jsonb
ORDER BY created_at DESC
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.query(sql, rowMapper(), predicate);
}
@Override
public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT id FROM alert_rules
WHERE webhooks @> ?::jsonb
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.queryForList(sql, UUID.class, predicate);
}
@Override
public void delete(UUID id) {
jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
}
@Override
public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
String sql = """
UPDATE alert_rules
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_rules
WHERE enabled = true
AND next_evaluation_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_evaluation_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *
""";
return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
}
@Override
public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
jdbc.update("""
UPDATE alert_rules
SET claimed_by = NULL, claimed_until = NULL,
next_evaluation_at = ?, eval_state = ?::jsonb
WHERE id = ?
""",
Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
}
private RowMapper<AlertRule> rowMapper() {
return (rs, i) -> {
ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class);
List<WebhookBinding> webhooks = om.readValue(
rs.getString("webhooks"), new TypeReference<>() {});
Map<String, Object> evalState = om.readValue(
rs.getString("eval_state"), new TypeReference<>() {});
Timestamp cu = rs.getTimestamp("claimed_until");
return new AlertRule(
(UUID) rs.getObject("id"),
(UUID) rs.getObject("environment_id"),
rs.getString("name"),
rs.getString("description"),
AlertSeverity.valueOf(rs.getString("severity")),
rs.getBoolean("enabled"),
kind, cond,
rs.getInt("evaluation_interval_seconds"),
rs.getInt("for_duration_seconds"),
rs.getInt("re_notify_minutes"),
rs.getString("notification_title_tmpl"),
rs.getString("notification_message_tmpl"),
webhooks, List.of(),
rs.getTimestamp("next_evaluation_at").toInstant(),
rs.getString("claimed_by"),
cu == null ? null : cu.toInstant(),
evalState,
rs.getTimestamp("created_at").toInstant(),
rs.getString("created_by"),
rs.getTimestamp("updated_at").toInstant(),
rs.getString("updated_by"));
};
}
private String writeJson(Object o) {
try { return om.writeValueAsString(o); }
catch (Exception e) { throw new IllegalStateException(e); }
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_rules"
```
### Task 8: Wire `OutboundConnectionServiceImpl.rulesReferencing()` (CRITICAL — Plan 01 gate)
> **This is the Plan 01 known-incomplete item.** Plan 01 shipped `rulesReferencing()` returning `[]`. Until this task lands, outbound connections can be deleted or narrowed while rules reference them, corrupting production. **Do not skip or defer.**
**Files:**
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java`
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"})`. Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour.
- [ ] **Step 2: Write the failing integration test**
```java
package com.cameleer.server.app.outbound;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.*;
import com.cameleer.server.core.outbound.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT {
@Autowired OutboundConnectionService service;
@Autowired OutboundConnectionRepository repo;
private UUID envId;
private UUID connId;
private PostgresAlertRuleRepository ruleRepo;
@BeforeEach
void seed() {
ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING");
var c = repo.save(new OutboundConnection(
UUID.randomUUID(), "default", "conn", null, "https://example.test",
OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null,
OutboundAuth.None.INSTANCE, List.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref"));
connId = c.id();
var rule = new AlertRule(
UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true,
ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m",
List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())),
List.of(), Instant.now(), null, null, Map.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref");
ruleRepo.save(rule);
}
@Test
void deleteConnectionReferencedByRuleReturns409() {
assertThat(service.rulesReferencing(connId)).hasSize(1);
assertThatThrownBy(() -> service.delete(connId, "u-ref"))
.hasMessageContaining("referenced by rules");
}
}
```
- [ ] **Step 3: Run — FAIL** (stub returns empty list, so delete succeeds).
- [ ] **Step 4: Replace the stub**
In `OutboundConnectionServiceImpl.java`:
```java
// existing imports + add:
import com.cameleer.server.core.alerting.AlertRuleRepository;
public class OutboundConnectionServiceImpl implements OutboundConnectionService {
private final OutboundConnectionRepository repo;
private final AlertRuleRepository ruleRepo; // NEW
private final String tenantId;
public OutboundConnectionServiceImpl(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
String tenantId) {
this.repo = repo;
this.ruleRepo = ruleRepo;
this.tenantId = tenantId;
}
// … create/update/delete/get/list unchanged …
@Override
public List<UUID> rulesReferencing(UUID id) {
return ruleRepo.findRuleIdsByOutboundConnectionId(id);
}
}
```
Update `OutboundBeanConfig.java` to inject `AlertRuleRepository`:
```java
@Bean
public OutboundConnectionService outboundConnectionService(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
@Value("${cameleer.server.tenant.id:default}") String tenantId) {
return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
}
```
Add the `AlertRuleRepository` bean in a new `AlertingBeanConfig.java` stub (completed in Phase 7):
```java
package com.cameleer.server.app.alerting.config;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.AlertRuleRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
@Configuration
public class AlertingBeanConfig {
@Bean
public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertRuleRepository(jdbc, om);
}
}
```
- [ ] **Step 5: Run — PASS**.
- [ ] **Step 6: GitNexus detect_changes + commit**
```bash
# Verify scope
# gitnexus_detect_changes({scope: "staged"})
git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)"
```
### Task 9: `PostgresAlertInstanceRepository`
**Files:**
- Create: `.../alerting/storage/PostgresAlertInstanceRepository.java`
- Test: `.../alerting/storage/PostgresAlertInstanceRepositoryIT.java`
- [ ] **Step 1: Write the failing test** covering: save/findById, findOpenForRule (filter `state IN ('PENDING','FIRING','ACKNOWLEDGED')`), listForInbox with user/group/role filters (seed 3 instances: one targeting user, one targeting group, one targeting role; assert listForInbox returns all three for a user in those groups/roles), countUnreadForUser (uses LEFT JOIN `alert_reads`), ack, resolve, deleteResolvedBefore.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — same RowMapper pattern as Task 7. Key queries:
```sql
-- findOpenForRule
SELECT * FROM alert_instances
WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')
ORDER BY fired_at DESC LIMIT 1;
-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders)
SELECT * FROM alert_instances
WHERE environment_id = ?
AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED')
AND (
? = ANY(target_user_ids)
OR target_group_ids && ?::uuid[]
OR target_role_names && ?::text[]
)
ORDER BY fired_at DESC LIMIT ?;
-- countUnreadForUser
SELECT count(*) FROM alert_instances ai
WHERE ai.environment_id = ?
AND ai.state IN ('FIRING','ACKNOWLEDGED')
AND (
? = ANY(ai.target_user_ids)
OR ai.target_group_ids && ?::uuid[]
OR ai.target_role_names && ?::text[]
)
AND NOT EXISTS (
SELECT 1 FROM alert_reads ar
WHERE ar.alert_instance_id = ai.id AND ar.user_id = ?
);
```
Array binding via `connection.createArrayOf("uuid", uuids)` / `createArrayOf("text", names)` inside a `ConnectionCallback`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries"
```
### Task 10: `PostgresAlertSilenceRepository`, `PostgresAlertNotificationRepository`, `PostgresAlertReadRepository`
**Files:**
- Create: three repositories under `.../alerting/storage/`
- Test: one IT per repository in `.../alerting/storage/`
- [ ] **Step 1: Write all three failing ITs** (one file each). Cover:
- `Silence`: save/findById, listActive filters by `now BETWEEN starts_at AND ends_at`, delete.
- `Notification`: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + `next_attempt_at`, markDelivered + markFailed transition status, deleteSettledBefore purges `DELIVERED` + `FAILED`.
- `Read`: markRead is idempotent (uses `ON CONFLICT DO NOTHING`), bulkMarkRead handles empty list.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim:
```sql
UPDATE alert_notifications
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_notifications
WHERE status = 'PENDING'
AND next_attempt_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_attempt_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *;
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java
git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads"
```
### Task 11: Wire all alerting repositories in `AlertingBeanConfig`
**Files:**
- Modify: `.../alerting/config/AlertingBeanConfig.java`
- [ ] **Step 1: Add beans for the remaining repositories**
```java
@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertInstanceRepository(jdbc, om);
}
@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertSilenceRepository(jdbc, om);
}
@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertNotificationRepository(jdbc, om);
}
@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) {
return new PostgresAlertReadRepository(jdbc);
}
```
- [ ] **Step 2: Verify compile + existing ITs still pass**
```bash
mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT'
```
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
git commit -m "feat(alerting): wire all alerting repository beans"
```
---
## Phase 4 — ClickHouse reads: new count methods and projections
### Task 12: Add `ClickHouseLogStore.countLogs(LogSearchRequest)`
**Files:**
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"})`. Expected callers: `LogQueryController`, `ContainerLogForwarder`, `ClickHouseConfig`. Adding a method is non-breaking — no downstream callers affected.
- [ ] **Step 2: Write the failing test**
```java
package com.cameleer.server.app.search;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.search.LogSearchRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
class ClickHouseLogStoreCountIT extends AbstractPostgresIT {
@Autowired ClickHouseLogStore store;
@Test
void countLogsRespectsLevelPatternAndWindow() {
// Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min
// (seed helper uses existing `indexBatch` path)
long count = store.countLogs(new LogSearchRequest(
/* environment */ "dev",
/* application */ "orders",
/* agentId */ null,
/* exchangeId */ null,
/* logger */ null,
/* sources */ List.of(),
/* levels */ List.of("ERROR"),
/* q */ "TimeoutException",
/* from */ Instant.now().minusSeconds(300),
/* to */ Instant.now(),
/* cursor */ null,
/* limit */ 100,
/* sort */ "desc"
));
assertThat(count).isEqualTo(3);
}
}
```
(Adjust `LogSearchRequest` constructor to the actual record signature — check `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` for exact order.)
- [ ] **Step 3: Run — FAIL**.
- [ ] **Step 4: Implement the method**
In `ClickHouseLogStore.java`, add a new public method. Reuse the WHERE-clause builder already used by `search(LogSearchRequest)`, but:
- No `FINAL`.
- Skip cursor, limit, sort.
- `SELECT count() FROM logs WHERE <tenant + env + app + level IN (...) + logger + q LIKE + timestamp BETWEEN>`.
- Include the `tenant_id = ?` predicate.
```java
public long countLogs(LogSearchRequest request) {
StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(tenantId);
args.add(Timestamp.from(request.from()));
args.add(Timestamp.from(request.to()));
if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); }
if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); }
// … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId …
String sql = "SELECT count() FROM logs WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
```
(Imports: `java.sql.Timestamp`, `java.util.ArrayList`.)
- [ ] **Step 5: Run — PASS**.
- [ ] **Step 6: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator"
```
### Task 13: Add `ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)`
**Files:**
- Create: `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java`
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java`
- [ ] **Step 1: GitNexus impact check**
Run `gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"})`. Additive method — no downstream breakage.
- [ ] **Step 2: Create `AlertMatchSpec` record**
```java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
/** Specification for alerting-specific execution counting.
* Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL.
* All fields except tenant/env/from/to are nullable filters. */
public record AlertMatchSpec(
String tenantId,
String environment,
String applicationId, // nullable
String routeId, // nullable
String status, // "FAILED" / "SUCCESS" / null
Map<String, String> attributes, // exact match on execution attribute key=value
Instant from,
Instant to,
Instant after // nullable; used by PER_EXCHANGE to advance cursor
) {
public AlertMatchSpec {
attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
}
}
```
- [ ] **Step 3: Write the failing test** — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches.
- [ ] **Step 4: Run — FAIL**.
- [ ] **Step 5: Implement on `ClickHouseSearchIndex`**
```java
public long countExecutionsForAlerting(AlertMatchSpec spec) {
StringBuilder where = new StringBuilder(
"tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(spec.tenantId());
args.add(spec.environment());
args.add(Timestamp.from(spec.from()));
args.add(Timestamp.from(spec.to()));
if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); }
if (spec.routeId() != null) { where.append(" AND route_id = ?"); args.add(spec.routeId()); }
if (spec.status() != null) { where.append(" AND status = ?"); args.add(spec.status()); }
if (spec.after() != null) {
where.append(" AND start_time > ?");
args.add(Timestamp.from(spec.after()));
}
// attribute filters: use Map column access — pattern matches existing search() impl
for (var e : spec.attributes().entrySet()) {
where.append(" AND attributes[?] = ?");
args.add(e.getKey());
args.add(e.getValue());
}
String sql = "SELECT count() FROM executions WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
```
- [ ] **Step 6: Run — PASS**.
- [ ] **Step 7: Commit**
```bash
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator"
```
### Task 14: ClickHouse projections migration
**Files:**
- Create: `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql`
- Modify: the schema initializer invocation site (likely `ClickHouseConfig` or `ClickHouseSchemaInitializer`) to also run this file on startup.
- [ ] **Step 1: Write the SQL file**
```sql
-- Additive, idempotent. Safe to drop + rebuild with no data loss.
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_app_status
(SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time));
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_route_status
(SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time));
ALTER TABLE logs
ADD PROJECTION IF NOT EXISTS alerting_app_level
(SELECT * ORDER BY (tenant_id, environment, application, level, timestamp));
ALTER TABLE agent_metrics
ADD PROJECTION IF NOT EXISTS alerting_instance_metric
(SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at));
ALTER TABLE executions MATERIALIZE PROJECTION alerting_app_status;
ALTER TABLE executions MATERIALIZE PROJECTION alerting_route_status;
ALTER TABLE logs MATERIALIZE PROJECTION alerting_app_level;
ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric;
```
(Adjust table column names to match real `init.sql` — confirm `application` vs `application_id` on the `logs` and `agent_metrics` tables.)
- [ ] **Step 2: Hook into `ClickHouseSchemaInitializer`**
Find the initializer and add a second invocation:
```java
runIdempotent("clickhouse/init.sql");
runIdempotent("clickhouse/alerting_projections.sql");
```
- [ ] **Step 3: Add a smoke IT**
```java
@Test
void projectionsExistAfterStartup() {
var names = jdbcTemplate.queryForList(
"SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')",
String.class);
assertThat(names).contains(
"alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric");
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \
cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java
git commit -m "feat(alerting): ClickHouse projections for alerting read paths"
```
---
## Phase 5 — Mustache templating and silence matching
### Task 15: Add JMustache dependency
**Files:**
- Modify: `cameleer-server-core/pom.xml`
- [ ] **Step 1: Add dependency**
```xml
<dependency>
<groupId>com.samskivert</groupId>
<artifactId>jmustache</artifactId>
<version>1.16</version>
</dependency>
```
- [ ] **Step 2: Verify resolve**
Run: `mvn -pl cameleer-server-core dependency:resolve`
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-core/pom.xml
git commit -m "chore(alerting): add jmustache 1.16"
```
### Task 16: `MustacheRenderer`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java`
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java`
- [ ] **Step 1: Write the failing test**
```java
package com.cameleer.server.app.alerting.notify;
import org.junit.jupiter.api.Test;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class MustacheRendererTest {
private final MustacheRenderer r = new MustacheRenderer();
@Test
void rendersSimpleTemplate() {
String out = r.render("Hello {{name}}", Map.of("name", "world"));
assertThat(out).isEqualTo("Hello world");
}
@Test
void rendersNestedPath() {
String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL")));
assertThat(out).isEqualTo("CRITICAL");
}
@Test
void missingVariableRendersLiteral() {
String out = r.render("{{missing.path}}", Map.of());
assertThat(out).isEqualTo("{{missing.path}}");
}
@Test
void malformedTemplateReturnsRawWithWarn() {
String out = r.render("{{unclosed", Map.of("unclosed","x"));
assertThat(out).isEqualTo("{{unclosed");
}
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
package com.cameleer.server.app.alerting.notify;
import com.samskivert.mustache.Mustache;
import com.samskivert.mustache.Template;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.util.Map;
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private final Mustache.Compiler compiler = Mustache.compiler()
.nullValue("")
.emptyStringIsFalse(true)
.defaultValue(null) // null triggers MissingContext -> we intercept below
.escapeHTML(false);
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
try {
Template t = compiler.compile(template);
return t.execute(new LiteralFallbackContext(context));
} catch (Exception e) {
log.warn("Mustache render failed for template='{}': {}", abbreviate(template), e.getMessage());
return template;
}
}
/** Returns `{{path}}` literal when a variable is missing. */
private static class LiteralFallbackContext {
private final Map<String, Object> map;
LiteralFallbackContext(Map<String, Object> map) { this.map = map; }
// JMustache uses reflection / Map lookup, so we rely on wrapping the missing-value callback:
// easiest approach: compile with a custom `Mustache.Compiler.Loader` and intercept resolution.
// Simpler: post-process the output to detect unresolved `{{}}` sections → not possible after render.
// Alternative: pre-flight — scan template tokens against context and replace unresolved tokens
// with the literal before compilation. Use this simple approach:
}
}
```
Simpler implementation (ships for v1):
```java
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private static final java.util.regex.Pattern TOKEN =
java.util.regex.Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}");
private final Mustache.Compiler compiler = Mustache.compiler()
.defaultValue("")
.escapeHTML(false);
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
String resolved = preResolve(template, context);
try {
return compiler.compile(resolved).execute(context);
} catch (Exception e) {
log.warn("Mustache render failed: {}", e.getMessage());
return template;
}
}
/** Replaces `{{missing.path}}` with the literal so Mustache sees a non-tag string. */
private String preResolve(String template, Map<String, Object> context) {
var m = TOKEN.matcher(template);
var sb = new StringBuilder();
while (m.find()) {
String path = m.group(1);
if (resolvePath(context, path) == null) {
m.appendReplacement(sb, java.util.regex.Matcher.quoteReplacement("{{" + path + "}}"));
// Replace the {{}} with {{{ literal }}} once we escape it — but jmustache will not re-process.
// Simpler: just wrap in a triple-brace or surround with a marker. For v1 we skip the double-expand:
// we return the LITERAL inside a section {{#_literal_123}}... so preResolve returns a string
// that Mustache will not modify. Concrete approach:
}
}
m.appendTail(sb);
return sb.toString();
}
private Object resolvePath(Map<String, Object> ctx, String path) {
Object cur = ctx;
for (String seg : path.split("\\.")) {
if (!(cur instanceof Map<?,?> m)) return null;
cur = m.get(seg);
if (cur == null) return null;
}
return cur;
}
}
```
**Engineer note:** Prefer a pre-compile token substitution that replaces `{{missing.path}}` with a literal that Mustache renders as-is. One working approach: write a custom `Mustache.VariableFetcher` via `compiler.withFormatter(...)` — but JMustache's `Mustache.Compiler#withCollector()` is easier. Confirm during implementation and adjust this task; the tests in Step 1 lock the contract. If JMustache's API makes missing-variable fallback awkward, fall back to a regex-based substitutor that does `{{``⟦MUSTACHE_LITERAL:path⟧` for missing paths, then post-replace after render. The contract is: **unresolved `{{x}}` renders as literal `{{x}}`**.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars"
```
### Task 17: `NotificationContextBuilder`
**Files:**
- Create: `.../alerting/notify/NotificationContextBuilder.java`
- Test: `.../alerting/notify/NotificationContextBuilderTest.java`
- [ ] **Step 1: Write the failing test** covering:
- env / rule / alert subtrees always present
- conditional trees: `exchange.*` present only for EXCHANGE_MATCH, `log.*` only for LOG_PATTERN, etc.
- `alert.link` uses the configured `cameleer.server.ui-origin` prefix if present, else `/alerts/inbox/{id}`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — pure static `Map<String,Object> build(AlertRule, AlertInstance, Environment, String uiOrigin)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
git commit -m "feat(alerting): NotificationContextBuilder for template context maps"
```
### Task 18: `SilenceMatcher` evaluator
**Files:**
- Create: `.../alerting/notify/SilenceMatcherService.java` (named to avoid clash with core record `SilenceMatcher`)
- Test: `.../alerting/notify/SilenceMatcherServiceTest.java`
- [ ] **Step 1: Write the failing test** covering truth table:
- Wildcard matcher → matches any instance.
- Matcher with `ruleId` only → matches only instances with that rule.
- Multiple fields → AND logic.
- Active-window check at notification time (not at eval time).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class SilenceMatcherService {
public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) {
if (m.ruleId() != null && !m.ruleId().equals(instance.ruleId())) return false;
if (m.severity()!= null && m.severity() != instance.severity()) return false;
if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false;
if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false;
if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false;
return true;
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java
git commit -m "feat(alerting): silence matcher for notification-time dispatch"
```
---
## Phase 6 — Condition evaluators
All six evaluators share this shape:
```java
public sealed interface ConditionEvaluator<C extends AlertCondition>
permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator,
DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
```
Supporting types (create these in Task 19 before implementing individual evaluators).
### Task 19: `EvalContext`, `EvalResult`, `TickCache`, `PerKindCircuitBreaker`, `ConditionEvaluator` interface
**Files:**
- Create: `.../alerting/eval/EvalContext.java`, `EvalResult.java`, `TickCache.java`, `PerKindCircuitBreaker.java`, `ConditionEvaluator.java`
- Test: `.../alerting/eval/TickCacheTest.java`, `PerKindCircuitBreakerTest.java`
- [ ] **Step 1: Write the failing tests**
```java
// TickCacheTest.java
@Test
void getOrComputeCachesWithinTick() {
var cache = new TickCache();
int n = cache.getOrCompute("k", () -> 42);
int m = cache.getOrCompute("k", () -> 43);
assertThat(n).isEqualTo(42);
assertThat(m).isEqualTo(42); // cached
}
// PerKindCircuitBreakerTest.java
@Test
void opensAfterFailThreshold() {
var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...));
for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE);
assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue();
}
@Test
void closesAfterCooldown() { /* advance clock beyond cooldown window */ }
```
- [ ] **Step 2: Implement**
```java
// EvalContext.java
package com.cameleer.server.app.alerting.eval;
import java.time.Instant;
public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
```
```java
// EvalResult.java
package com.cameleer.server.app.alerting.eval;
import java.util.Map;
public sealed interface EvalResult {
record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
public Firing { context = context == null ? Map.of() : Map.copyOf(context); }
}
record Clear() implements EvalResult {
public static final Clear INSTANCE = new Clear();
}
record Error(Throwable cause) implements EvalResult {}
}
```
```java
// TickCache.java
package com.cameleer.server.app.alerting.eval;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
public class TickCache {
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
@SuppressWarnings("unchecked")
public <T> T getOrCompute(String key, Supplier<T> supplier) {
return (T) map.computeIfAbsent(key, k -> supplier.get());
}
}
```
```java
// PerKindCircuitBreaker.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.ConditionKind;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
public class PerKindCircuitBreaker {
private record State(Deque<Instant> failures, Instant openUntil) {}
private final int threshold;
private final Duration window;
private final Duration cooldown;
private final Clock clock;
private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();
public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
this.threshold = threshold;
this.window = Duration.ofSeconds(windowSeconds);
this.cooldown = Duration.ofSeconds(cooldownSeconds);
this.clock = clock;
}
public void recordFailure(ConditionKind kind) {
byKind.compute(kind, (k, s) -> {
var deque = (s == null) ? new ArrayDeque<Instant>() : new ArrayDeque<>(s.failures());
Instant now = Instant.now(clock);
Instant cutoff = now.minus(window);
while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
deque.addLast(now);
Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null;
return new State(deque, openUntil);
});
}
public boolean isOpen(ConditionKind kind) {
State s = byKind.get(kind);
return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
}
public void recordSuccess(ConditionKind kind) {
byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
}
}
```
```java
// ConditionEvaluator.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
public interface ConditionEvaluator<C extends AlertCondition> {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
```
(`sealed permits …` is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's `switch` over `ConditionKind`.)
- [ ] **Step 3: Run — PASS**.
- [ ] **Step 4: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/
git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)"
```
### Task 20: `AgentStateEvaluator`
**Files:**
- Create: `.../alerting/eval/AgentStateEvaluator.java`
- Test: `.../alerting/eval/AgentStateEvaluatorTest.java`
- [ ] **Step 1: Write the failing test**
```java
@Test
void firesWhenAnyAgentInTargetStateForScope() {
var registry = mock(AgentRegistryService.class);
when(registry.findAll()).thenReturn(List.of(
new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(),
AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null)
));
var eval = new AgentStateEvaluator(registry);
var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60));
EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule,
new EvalContext("default", Instant.now(), new TickCache()));
assertThat(r).isInstanceOf(EvalResult.Firing.class);
}
@Test
void clearWhenNoMatchingAgents() { /* ... */ }
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {
private final AgentRegistryService registry;
public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; }
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
@Override
public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
AgentState target = AgentState.valueOf(c.state());
Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
List<AgentInfo> hits = registry.findAll().stream()
.filter(a -> matchesScope(a, c.scope()))
.filter(a -> a.state() == target)
.filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
AgentInfo first = hits.get(0);
return new EvalResult.Firing(
(double) hits.size(), null,
Map.of("agent", Map.of(
"id", first.instanceId(),
"name", first.displayName(),
"state", first.state().name()
), "app", Map.of("slug", first.applicationId())));
}
private static boolean matchesScope(AgentInfo a, AlertScope s) {
if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
return true;
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java
git commit -m "feat(alerting): AGENT_STATE evaluator"
```
### Task 21: `DeploymentStateEvaluator`
**Files:**
- Create: `.../alerting/eval/DeploymentStateEvaluator.java`
- Test: `.../alerting/eval/DeploymentStateEvaluatorTest.java`
- [ ] **Step 1: Write the failing test**`FAILED` deployment for matching app → Firing; `RUNNING` → Clear.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — read via `DeploymentRepository.findByAppId` and `AppService.getByEnvironmentAndSlug`:
```java
@Override
public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null);
if (app == null) return EvalResult.Clear.INSTANCE;
List<Deployment> current = deploymentRepo.findByAppId(app.id());
Set<String> wanted = Set.copyOf(c.states());
var hits = current.stream()
.filter(d -> wanted.contains(d.status().name()))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
Deployment d = hits.get(0);
return new EvalResult.Firing((double) hits.size(), null,
Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()),
"app", Map.of("slug", app.slug())));
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator"
```
### Task 22: `RouteMetricEvaluator`
**Files:**
- Create: `.../alerting/eval/RouteMetricEvaluator.java`
- Test: `.../alerting/eval/RouteMetricEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `StatsStore`, seed `ExecutionStats{p99Ms = 2500, ...}` for a scoped call, assert Firing with `currentValue = 2500, threshold = 2000`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — dispatch on `RouteMetric` enum:
```java
@Override
public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
Instant from = ctx.now().minusSeconds(c.windowSeconds());
Instant to = ctx.now();
String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null);
ExecutionStats stats = (c.scope().routeId() != null)
? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env)
: (c.scope().appSlug() != null)
? statsStore.statsForApp(from, to, c.scope().appSlug(), env)
: statsStore.stats(from, to, env);
double actual = switch (c.metric()) {
case ERROR_RATE -> errorRate(stats);
case P95_LATENCY_MS -> stats.p95DurationMs();
case P99_LATENCY_MS -> stats.p99DurationMs();
case THROUGHPUT -> stats.totalCount();
case ERROR_COUNT -> stats.failedCount();
};
boolean fire = switch (c.comparator()) {
case GT -> actual > c.threshold();
case GTE -> actual >= c.threshold();
case LT -> actual < c.threshold();
case LTE -> actual <= c.threshold();
case EQ -> actual == c.threshold();
};
if (!fire) return EvalResult.Clear.INSTANCE;
return new EvalResult.Firing(actual, c.threshold(),
Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()),
"app", Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug())));
}
private double errorRate(ExecutionStats s) {
long total = s.totalCount();
return total == 0 ? 0.0 : (double) s.failedCount() / total;
}
```
(Adjust method names on `ExecutionStats` to match the actual record — use `gitnexus_context({name: "ExecutionStats"})` if unsure.)
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): ROUTE_METRIC evaluator"
```
### Task 23: `LogPatternEvaluator`
**Files:**
- Create: `.../alerting/eval/LogPatternEvaluator.java`
- Test: `.../alerting/eval/LogPatternEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `ClickHouseLogStore.countLogs` returning 7; threshold 5 → Firing; returning 3 → Clear.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — build a `LogSearchRequest` from the condition + window, delegate to `countLogs`. Use `TickCache` keyed on `(env, app, level, pattern, windowStart, windowEnd)` to coalesce.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): LOG_PATTERN evaluator"
```
### Task 24: `JvmMetricEvaluator`
**Files:**
- Create: `.../alerting/eval/JvmMetricEvaluator.java`
- Test: `.../alerting/eval/JvmMetricEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — mock `MetricsQueryStore.queryTimeSeries` for `("agent-1", ["heap_used_percent"], from, to, 1)` returning `{heap_used_percent: [Bucket{max=95.0}]}`; assert Firing with currentValue=95.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — aggregate across buckets per `AggregationOp` (MAX/MIN/AVG/LATEST), compare against threshold.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): JVM_METRIC evaluator"
```
### Task 25: `ExchangeMatchEvaluator` (PER_EXCHANGE + COUNT_IN_WINDOW)
**Files:**
- Create: `.../alerting/eval/ExchangeMatchEvaluator.java`
- Test: `.../alerting/eval/ExchangeMatchEvaluatorTest.java`
- [ ] **Step 1: Write the failing test** — two variants:
- `COUNT_IN_WINDOW`: mock `ClickHouseSearchIndex.countExecutionsForAlerting` → threshold check.
- `PER_EXCHANGE`: `eval_state.lastExchangeTs` cursor advancement. Seed 3 matching exchanges; first eval returns all 3 as separate Firings (emit a list? or change signature?). For v1 simplicity, the evaluator returns `EvalResult.Firing` with an internal list of exchange descriptors in the context map; the job handles one-alert-per-exchange fan-out.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement.** The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend `EvalResult` with a `Batch` variant:
```java
record Batch(List<Firing> firings) implements EvalResult { ... }
```
Add this to `EvalResult.java` (Task 19). The job (Task 27) detects Batch and creates one `AlertInstance` per Firing. This keeps non-batched evaluators simple.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes"
```
---
## Phase 7 — Evaluator job and state transitions
### Task 26: `AlertingProperties` + `AlertStateTransitions`
**Files:**
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java`
- Create: `.../alerting/eval/AlertStateTransitions.java`
- Test: `.../alerting/eval/AlertStateTransitionsTest.java`
- [ ] **Step 1: Write the failing test** for the pure state machine:
```java
@Test
void clearWithNoOpenInstanceIsNoOp() {
var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now);
assertThat(next).isEmpty();
}
@Test
void firingWithNoOpenInstanceCreatesPendingIfForDuration() {
var rule = ruleBuilder().forDurationSeconds(60).build();
var result = new EvalResult.Firing(2500.0, 2000.0, Map.of());
var next = AlertStateTransitions.apply(null, result, rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING));
}
@Test
void firingWithNoForDurationGoesStraightToFiring() {
var rule = ruleBuilder().forDurationSeconds(0).build();
var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING));
}
@Test
void pendingPromotesToFiringAfterForDuration() { /* ... */ }
@Test
void firingClearTransitionsToResolved() { /* ... */ }
@Test
void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ }
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
// AlertStateTransitions.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
import java.time.Instant;
import java.util.*;
public final class AlertStateTransitions {
private AlertStateTransitions() {}
/** Returns the new/updated AlertInstance, or empty when nothing changes. */
public static Optional<AlertInstance> apply(
AlertInstance current, EvalResult result, AlertRule rule, Instant now) {
return switch (result) {
case EvalResult.Clear c -> onClear(current, now);
case EvalResult.Firing f -> onFiring(current, f, rule, now);
case EvalResult.Error e -> Optional.empty();
case EvalResult.Batch b -> Optional.empty(); // batch handled by the job, not here
};
}
private static Optional<AlertInstance> onFiring(AlertInstance current, EvalResult.Firing f,
AlertRule rule, Instant now) {
if (current == null) {
AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING;
return Optional.of(newInstance(rule, f, initial, now));
}
if (current.state() == AlertState.PENDING) {
Instant firedAt = current.firedAt();
if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) {
return Optional.of(current /* copy with state=FIRING, firedAt=now */);
}
return Optional.of(current); // stay PENDING, no mutation
}
return Optional.empty(); // already FIRING/ACK — re-notification handled by dispatcher
}
private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
if (current == null) return Optional.empty();
if (current.state() == AlertState.RESOLVED) return Optional.empty();
return Optional.of(current /* copy with state=RESOLVED, resolvedAt=now */);
}
private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
// ... construct from rule snapshot + context; title/message rendered by the job
throw new UnsupportedOperationException("stub");
}
}
```
Flesh out the `.withState(...)` / `.withResolvedAt(...)` helpers on `AlertInstance` (add wither-style methods returning new records) as part of this task.
```java
// AlertingProperties.java
package com.cameleer.server.app.alerting.config;
import org.springframework.boot.context.properties.ConfigurationProperties;
@ConfigurationProperties("cameleer.server.alerting")
public record AlertingProperties(
Integer evaluatorTickIntervalMs,
Integer evaluatorBatchSize,
Integer claimTtlSeconds,
Integer notificationTickIntervalMs,
Integer notificationBatchSize,
Boolean inTickCacheEnabled,
Integer circuitBreakerFailThreshold,
Integer circuitBreakerWindowSeconds,
Integer circuitBreakerCooldownSeconds,
Integer eventRetentionDays,
Integer notificationRetentionDays,
Integer webhookTimeoutMs,
Integer webhookMaxAttempts) {
public int effectiveEvaluatorTickIntervalMs() {
int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
return Math.max(5000, raw); // floor
}
public int effectiveEvaluatorBatchSize() { return evaluatorBatchSize == null ? 20 : evaluatorBatchSize; }
public int effectiveClaimTtlSeconds() { return claimTtlSeconds == null ? 30 : claimTtlSeconds; }
public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; }
public int effectiveNotificationBatchSize() { return notificationBatchSize == null ? 50 : notificationBatchSize; }
public int effectiveEventRetentionDays() { return eventRetentionDays == null ? 90 : eventRetentionDays; }
public int effectiveNotificationRetentionDays() { return notificationRetentionDays == null ? 30 : notificationRetentionDays; }
public int effectiveWebhookTimeoutMs() { return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs; }
public int effectiveWebhookMaxAttempts() { return webhookMaxAttempts == null ? 3 : webhookMaxAttempts; }
public int cbFailThreshold() { return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold; }
public int cbWindowSeconds() { return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds; }
public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; }
}
```
Register via `@ConfigurationPropertiesScan` or explicit `@EnableConfigurationProperties(AlertingProperties.class)` in `AlertingBeanConfig`. Also clamp-with-WARN if `evaluatorTickIntervalMs < 5000` at startup.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine"
```
### Task 27: `AlertEvaluatorJob`
**Files:**
- Create: `.../alerting/eval/AlertEvaluatorJob.java`
- Test: `.../alerting/eval/AlertEvaluatorJobIT.java`
- [ ] **Step 1: Write the failing integration test** (uses real PG + mocked evaluators):
```java
@Test
void claimDueRuleFireResolveCycle() throws Exception {
// seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance.
// flip the mock to return Firing -> one AlertInstance in FIRING state.
// flip back to Clear -> instance transitions to RESOLVED.
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class AlertEvaluatorJob implements SchedulingConfigurer {
private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);
private final AlertingProperties props;
private final AlertRuleRepository ruleRepo;
private final AlertInstanceRepository instanceRepo;
private final AlertNotificationRepository notificationRepo;
private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
private final PerKindCircuitBreaker circuitBreaker;
private final MustacheRenderer renderer;
private final NotificationContextBuilder contextBuilder;
private final String instanceId;
private final String tenantId;
private final AlertingMetrics metrics;
private final Clock clock;
public AlertEvaluatorJob(/* ...all above... */) { /* assign */ }
@Override
public void configureTasks(ScheduledTaskRegistrar registrar) {
registrar.addFixedDelayTask(this::tick, props.effectiveEvaluatorTickIntervalMs());
}
void tick() {
List<AlertRule> claimed = ruleRepo.claimDueRules(
instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds());
TickCache cache = new TickCache();
EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);
for (AlertRule rule : claimed) {
if (circuitBreaker.isOpen(rule.conditionKind())) {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
continue;
}
try {
EvalResult result = evaluateSafely(rule, ctx);
applyResult(rule, result);
circuitBreaker.recordSuccess(rule.conditionKind());
} catch (Exception e) {
circuitBreaker.recordFailure(rule.conditionKind());
metrics.evalError(rule.conditionKind(), rule.id());
log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
} finally {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
}
}
}
@SuppressWarnings({"rawtypes","unchecked"})
private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind());
return evaluator.evaluate(rule.condition(), rule, ctx);
}
private void applyResult(AlertRule rule, EvalResult result) {
if (result instanceof EvalResult.Batch b) {
for (EvalResult.Firing f : b.firings()) applyFiring(rule, f);
return;
}
AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> {
AlertInstance persisted = instanceRepo.save(
enrichTitleMessage(rule, next, result));
if (next.state() == AlertState.FIRING && current == null) {
enqueueNotifications(rule, persisted);
}
});
}
private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ }
private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) {
Map<String,Object> ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null);
String title = renderer.render(rule.notificationTitleTmpl(), ctx);
String message = renderer.render(rule.notificationMessageTmpl(), ctx);
return instance /* .withTitle(title).withMessage(message) */;
}
private void enqueueNotifications(AlertRule rule, AlertInstance instance) {
for (WebhookBinding w : rule.webhooks()) {
Map<String,Object> payload = /* context-builder + body override */ Map.of();
notificationRepo.save(new AlertNotification(
UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(),
NotificationStatus.PENDING, 0, Instant.now(clock),
null, null, null, null, payload, null, Instant.now(clock)));
}
}
private void reschedule(AlertRule rule, Instant next) {
ruleRepo.releaseClaim(rule.id(), next, rule.evalState());
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker"
```
---
## Phase 8 — Notification dispatch
### Task 28: `HmacSigner`
**Files:**
- Create: `.../alerting/notify/HmacSigner.java`
- Test: `.../alerting/notify/HmacSignerTest.java`
- [ ] **Step 1: Write the failing test**
```java
@Test
void signsBodyWithSha256Hmac() {
String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8));
// precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f...
assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f..."); // replace with real hex
}
```
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**`javax.crypto.Mac.getInstance("HmacSHA256")`, `HexFormat.of().formatHex(...)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): HmacSigner for webhook signature"
```
### Task 29: `WebhookDispatcher`
**Files:**
- Create: `.../alerting/notify/WebhookDispatcher.java`
- Test: `.../alerting/notify/WebhookDispatcherIT.java` (WireMock)
- [ ] **Step 1: Write the failing IT** covering:
- 2xx → returns DELIVERED with status + snippet.
- 4xx → returns FAILED immediately.
- 5xx → returns RETRY with exponential backoff.
- Network timeout → RETRY.
- HMAC header present when `hmacSecret != null`.
- TLS trust-all config works against WireMock HTTPS.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**
```java
@Component
public class WebhookDispatcher {
public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {}
private final OutboundHttpClientFactory clientFactory;
private final SecretCipher cipher;
private final HmacSigner signer;
private final MustacheRenderer renderer;
private final AlertingProperties props;
private final ObjectMapper om;
public WebhookDispatcher(/* ... */) { /* assign */ }
public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance,
OutboundConnection conn, Map<String,Object> context) {
String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn);
String body = renderer.render(bodyTmpl, context);
var ctx = new OutboundHttpRequestContext(
conn.tlsTrustMode(), conn.tlsCaPemPaths(),
Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs()));
var client = clientFactory.clientFor(ctx);
var request = new HttpPost(renderer.render(conn.url(), context));
request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
request.setHeader("Content-Type", "application/json");
for (var h : conn.defaultHeaders().entrySet()) {
request.setHeader(h.getKey(), renderer.render(h.getValue(), context));
}
if (conn.hmacSecretCiphertext() != null) {
String secret = cipher.decrypt(conn.hmacSecretCiphertext());
request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8)));
}
try (var response = client.execute(request)) {
int code = response.getCode();
String snippet = snippet(response);
if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null);
return retryOutcome(code, snippet);
} catch (IOException e) {
return retryOutcome(-1, e.getMessage());
}
}
private Outcome retryOutcome(int code, String snippet) {
// Backoff: 30s, 120s, 300s
Duration next = Duration.ofSeconds(30); // caller multiplies by attempt
return new Outcome(null /* caller decides PENDING vs FAILED */, code, snippet, next);
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification"
```
### Task 30: `NotificationDispatchJob`
**Files:**
- Create: `.../alerting/notify/NotificationDispatchJob.java`
- Test: `.../alerting/notify/NotificationDispatchJobIT.java`
- [ ] **Step 1: Write the failing IT** — seed a `PENDING` `AlertNotification`; run one tick; WireMock returns 200; assert row transitions to `DELIVERED`. Seed another against 503 → assert `attempts=1`, `next_attempt_at` bumped, still `PENDING`.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — claim-polling loop:
```java
void tick() {
var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl);
for (var n : claimed) {
var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; }
var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow();
var rule = ruleRepo.findById(instance.ruleId()).orElse(null);
var context = contextBuilder.build(rule, instance, env, uiOrigin);
// silence check
if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream()
.anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) {
instanceRepo.markSilenced(instance.id(), true);
notificationRepo.markFailed(n.id(), 0, "silenced");
continue;
}
var outcome = dispatcher.dispatch(n, rule, instance, conn, context);
if (outcome.status() == NotificationStatus.DELIVERED) {
notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now());
} else if (outcome.status() == NotificationStatus.FAILED) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
int attempts = n.attempts() + 1;
if (attempts >= props.effectiveWebhookMaxAttempts()) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts));
notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
}
}
}
}
```
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry"
```
### Task 31: `InAppInboxQuery` + server-side 5s memoization
**Files:**
- Create: `.../alerting/notify/InAppInboxQuery.java`
- Test: `.../alerting/notify/InAppInboxQueryTest.java`
- [ ] **Step 1: Write the failing test** covering the path (resolves groups/roles from `RbacService.getEffectiveRolesForUser` + `listGroupsForUser`, delegates to `AlertInstanceRepository.listForInbox`/`countUnreadForUser`, second call within 5s returns cached count).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — Caffeine-style `ConcurrentHashMap<Key, Entry>` with `Entry(count, expiresAt)`, 5 s TTL per `(envId, userId)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization"
```
---
## Phase 9 — REST controllers
### Task 32: `AlertRuleController` + DTOs
**Files:**
- Create: `.../alerting/controller/AlertRuleController.java`
- Create: DTOs in `.../alerting/dto/`
- Test: `.../alerting/controller/AlertRuleControllerIT.java`
- [ ] **Step 1: Write the failing IT** — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement.** Endpoints (all under `/api/v1/environments/{envSlug}/alerts/rules`, env resolved via `@EnvPath Environment env`):
| Method | Path | RBAC |
|---|---|---|
| GET | `` | VIEWER+ |
| POST | `` | OPERATOR+ |
| GET | `{id}` | VIEWER+ |
| PUT | `{id}` | OPERATOR+ |
| DELETE | `{id}` | OPERATOR+ |
| POST | `{id}/enable` / `{id}/disable` | OPERATOR+ |
| POST | `{id}/render-preview` | OPERATOR+ |
| POST | `{id}/test-evaluate` | OPERATOR+ |
Key DTOs: `AlertRuleRequest` (with `@Valid AlertConditionDto`), `AlertRuleResponse`, `RenderPreviewRequest/Response`, `TestEvaluateRequest/Response`.
On save, validate:
- Each `WebhookBindingRequest.outboundConnectionId` exists in `outbound_connections` (via `OutboundConnectionService.get(id)` → 422 if 404).
- Connection is allowed in this env (via `conn.isAllowedInEnvironment(env.id())` → 422 otherwise).
- SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory).
Audit via `auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request)`.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs"
```
### Task 33: `AlertController`
**Files:**
- Create: `.../alerting/controller/AlertController.java`, `AlertDto.java`, `UnreadCountResponse.java`
- Test: `.../alerting/controller/AlertControllerIT.java`
- [ ] **Step 1: Write the failing IT** for `GET /alerts`, `GET /alerts/unread-count`, `POST /alerts/{id}/ack`, `POST /alerts/{id}/read`, `POST /alerts/bulk-read`. Assert env isolation (env-A alert not visible from env-B).
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement** — delegate to `InAppInboxQuery` and `AlertInstanceRepository`. On ack, enforce targeted-or-OPERATOR rule.
- [ ] **Step 4: Run — PASS**.
- [ ] **Step 5: Commit**
```bash
git commit -m "feat(alerting): AlertController for inbox + ack + read"
```
### Task 34: `AlertSilenceController`
**Files:**
- Create: `.../alerting/controller/AlertSilenceController.java`, `AlertSilenceDto.java`
- Test: `.../alerting/controller/AlertSilenceControllerIT.java`
- [ ] **Step 15:** Follow the same pattern. Mutations OPERATOR+, audit `ALERT_SILENCE_CHANGE`. Validate `endsAt > startsAt` at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier).
### Task 35: `AlertNotificationController`
**Files:**
- Create: `.../alerting/controller/AlertNotificationController.java`
- Test: `.../alerting/controller/AlertNotificationControllerIT.java`
- [ ] **Step 15:**
- `GET /alerts/{id}/notifications` → VIEWER+; returns per-instance outbox rows.
- `POST /alerts/notifications/{id}/retry` → OPERATOR+; resets `next_attempt_at = now`, `attempts = 0`, `status = PENDING`. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file).
- [ ] **Step 6: Update `SecurityConfig` to permit the new paths**
In `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java`:
```java
.requestMatchers(HttpMethod.GET, "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN")
```
(Class-level `@PreAuthorize` on each controller is authoritative; the path matchers are defence-in-depth.)
- [ ] **Step 7: Commit**
```bash
git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths"
```
### Task 36: Regenerate OpenAPI schema
- [ ] **Step 1: Start backend on :8081** (from the alerting-02 worktree).
- [ ] **Step 2:** `cd ui && npm run generate-api:live`
- [ ] **Step 3:** Commit `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` regen.
```bash
git add ui/src/api/schema.d.ts ui/src/api/openapi.json
git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints"
```
---
## Phase 10 — Retention, metrics, rules, verification
### Task 37: `AlertingRetentionJob`
**Files:**
- Create: `.../alerting/retention/AlertingRetentionJob.java`
- Test: `.../alerting/retention/AlertingRetentionJobIT.java`
- [ ] **Step 1: Write the failing IT** — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run `cleanup()`; assert only old rows are deleted.
- [ ] **Step 2: Run — FAIL**.
- [ ] **Step 3: Implement**`@Scheduled(cron = "0 0 3 * * *")`, cutoffs from `AlertingProperties`, advisory-lock-of-the-day pattern (see `JarRetentionJob.java`).
- [ ] **Step 45: Run, commit**
```bash
git commit -m "feat(alerting): AlertingRetentionJob daily cleanup"
```
### Task 38: `AlertingMetrics`
**Files:**
- Create: `.../alerting/metrics/AlertingMetrics.java`
- [ ] **Step 1: Register metrics** via `MeterRegistry`:
```java
@Component
public class AlertingMetrics {
private final MeterRegistry registry;
public AlertingMetrics(MeterRegistry registry) { this.registry = registry; }
public void evalError(ConditionKind kind, UUID ruleId) {
registry.counter("alerting_eval_errors_total",
"kind", kind.name(), "rule_id", ruleId.toString()).increment();
}
public void circuitOpened(ConditionKind kind) {
registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment();
}
public Timer evalDuration(ConditionKind kind) {
return registry.timer("alerting_eval_duration_seconds", "kind", kind.name());
}
// + gauges via MeterBinder that query repositories
}
```
- [ ] **Step 2: Wire into `AlertEvaluatorJob` and `PerKindCircuitBreaker`.**
- [ ] **Step 3: Commit**
```bash
git commit -m "feat(alerting): observability metrics via micrometer"
```
### Task 39: Update `.claude/rules/app-classes.md` + `core-classes.md`
- [ ] **Step 1: Document the new `alerting/` packages** in both rule files. Add a new subsection under `controller/` for the alerting env-scoped controllers. Document the new flat endpoint `/api/v1/alerts/notifications/{id}/retry` in the flat-allow-list with justification "notification IDs are globally unique; matches the `/api/v1/executions/{id}` precedent".
- [ ] **Step 2: Commit**
```bash
git add .claude/rules/app-classes.md .claude/rules/core-classes.md
git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint"
```
### Task 40: `application.yml` defaults + admin guide
**Files:**
- Modify: `cameleer-server-app/src/main/resources/application.yml`
- Create: `docs/alerting.md`
- [ ] **Step 1: Add default stanza**
```yaml
cameleer:
server:
alerting:
evaluator-tick-interval-ms: 5000
evaluator-batch-size: 20
claim-ttl-seconds: 30
notification-tick-interval-ms: 5000
notification-batch-size: 50
in-tick-cache-enabled: true
circuit-breaker-fail-threshold: 5
circuit-breaker-window-seconds: 30
circuit-breaker-cooldown-seconds: 60
event-retention-days: 90
notification-retention-days: 30
webhook-timeout-ms: 5000
webhook-max-attempts: 3
```
- [ ] **Step 2: Write `docs/alerting.md`** — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention).
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md
git commit -m "docs(alerting): default config + admin guide"
```
### Task 41: Full-lifecycle integration test
**Files:**
- Create: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java`
- [ ] **Step 1: Write the full-lifecycle IT**
Steps in the single test method:
1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret.
2. POST a `LOG_PATTERN` rule pointing at `WireMock` via the outbound connection, `forDurationSeconds=0`, `threshold=1`.
3. Inject a log row into ClickHouse that matches the pattern.
4. Trigger `AlertEvaluatorJob.tick()` directly.
5. Assert one `alert_instances` row in FIRING.
6. Trigger `NotificationDispatchJob.tick()`.
7. Assert WireMock received one POST with `X-Cameleer-Signature` header + rendered body.
8. POST `/alerts/{id}/ack` → state ACKNOWLEDGED.
9. Create a silence matching this rule; fire another tick; assert `silenced=true` on new instance and WireMock received no second request.
10. Remove the matching log rows, run tick → instance RESOLVED.
11. DELETE the rule → assert `alert_instances.rule_id = NULL` but `rule_snapshot` still retains rule name.
- [ ] **Step 2: Run — PASS** (may need a few iterations of debugging).
- [ ] **Step 3: Commit**
```bash
git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete"
```
### Task 42: Env-isolation + outbound-guard regression tests
**Files:**
- Create: `.../alerting/AlertingEnvIsolationIT.java`, `OutboundConnectionAllowedEnvIT.java`
- [ ] **Step 1: Env isolation** — rule in env-A, fire, assert invisible from env-B inbox.
- [ ] **Step 2: Outbound guard** — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing `allowed_environment_ids` on the connection while a rule still references it → 409 (this exercises the freshly-wired `rulesReferencing`).
- [ ] **Step 3: Run — PASS**.
- [ ] **Step 4: Commit**
```bash
git commit -m "test(alerting): env isolation + outbound allowed-env guard"
```
### Task 43: Final verification + GitNexus reindex
- [ ] **Step 1: Full build**
```bash
mvn clean verify
```
Expected: All tests pass. Known pre-existing test debt (wrong-JdbcTemplate + shared-context state leaks) may still fail — document any failures that existed before Plan 02 in a commit message "known-pre-existing" note.
- [ ] **Step 2: GitNexus reindex**
```bash
npx gitnexus analyze --embeddings
```
- [ ] **Step 3: Manual smoke**
Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through:
- Create an outbound connection to `https://httpbin.org/post`.
- `curl` the alerting REST API to POST a `LOG_PATTERN` rule.
- Inject a matching log via `POST /api/v1/data/logs`.
- Wait 2 eval ticks + 1 notification tick.
- Confirm: `alert_instances` row in FIRING, `alert_notifications` row DELIVERED with HTTP 200, httpbin shows the body.
- `curl POST /alerts/{id}/ack` → state ACKNOWLEDGED.
- [ ] **Step 4: Nothing to commit if all passes — plan complete**
---
## Known-incomplete items carried into Plan 03
- **UI:** `NotificationBell`, `/alerts/**` pages, `<MustacheEditor />` with variable auto-complete, CMD-K alert/rule sources. Open design question: completion engine choice (CodeMirror 6 vs Monaco vs textarea overlay) still open — see spec §20 #7.
- **Rule promotion across envs.** Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03.
- **OIDC retrofit** to use `OutboundHttpClientFactory`. Unchanged from Plan 01 — a separate small follow-up.
- **TLS summary enrichment** on `/test` endpoint (Plan 01 stubbed as `"TLS"`). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context.
- **Performance tests.** 500-rule, 5-replica `PerformanceIT` deferred; claim-polling concurrency is covered by Task 7's unit-level test.
- **Bulk promotion** and **mustache completion `variables` metadata endpoint** (`GET /alerts/rules/template-variables`) — deferred until usage patterns justify.
- **Rule deletion test debt.** Existing pre-Plan-02 test debt (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in `FlywayMigrationIT` / `ConfigEnvIsolationIT` / `ClickHouseStatsStoreIT`) is orthogonal and should be addressed in a dedicated test-hygiene pass.
---
## Self-review
**Spec coverage** (against `docs/superpowers/specs/2026-04-19-alerting-design.md`):
| Spec § | Scope | Covered by |
|---|---|---|
| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 2025 |
| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 |
| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 |
| §2 Rule promotion | **Deferred to Plan 03 (UI)** | — |
| §2 CMD-K | **Deferred to Plan 03** | — |
| §2 Configurable cadence, 5 s floor | `AlertingProperties.effective*` | Task 26 |
| §3 Key decisions | All 14 decisions honoured | — |
| §4 Module layout | `core/alerting` + `app/alerting/**` | Tasks 311, 1538 |
| §4 Touchpoints | `countLogs` + `countExecutionsForAlerting` + `AuditCategory` + `SecurityConfig` | Tasks 2, 12, 13, 35 |
| §5 Data model | V12 migration | Task 1 |
| §5 Claim-polling queries | `FOR UPDATE SKIP LOCKED` in rule + notification repos | Tasks 7, 10 |
| §6 Outbound connections wiring | `rulesReferencing` gate | Task 8 (**CRITICAL**) |
| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 |
| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | Tasks 28, 29, 30, 31 |
| §9 Rule promotion | **Deferred** (UI) | — |
| §10 Cross-cutting HTTP | Reused from Plan 01 | — |
| §11 API surface | All routes implemented except rule promotion | Tasks 3236 |
| §12 CMD-K | **Deferred to Plan 03** | — |
| §13 UI | **Deferred to Plan 03** | — |
| §14 Configuration | `AlertingProperties` + `application.yml` | Tasks 26, 40 |
| §15 Retention | Daily job | Task 37 |
| §16 Observability (metrics + audit) | Tasks 2, 38 |
| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | Tasks 3236, 28, Plan 01 |
| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 2731, 41, 42 |
| §19 Rollout | Dormant-by-default; matching `application.yml` + docs | Task 40 |
| §20 #1 OIDC alignment | **Deferred** (follow-up) | — |
| §20 #2 secret encryption | Reused Plan 01 `SecretCipher` | Task 29 |
| §20 #3 CH migration naming | `alerting_projections.sql` | Task 14 |
| §20 #6 env-delete cascade audit | PG IT | Task 1 |
| §20 #7 Mustache completion engine | **Deferred** (UI) | — |
**Placeholders:** A handful of steps reference real record fields / method names with `/* … */` markers where the exact name depends on what the existing codebase exposes (`ExecutionStats` metric accessors, `AgentInfo.lastHeartbeat` method name, wither-method signatures on `AlertInstance`). Each is accompanied by a `gitnexus_context({name: ...})` hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time.
**Type consistency check:** `AlertRule`, `AlertInstance`, `AlertNotification`, `AlertSilence` field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). `WebhookBinding.id` is used as `alert_notifications.webhook_id` — stable opaque reference. `OutboundConnection.createdBy/updatedBy` types match `users.user_id TEXT` (Plan 01 precedent). `rulesReferencing` signature matches Plan 01's stub `List<UUID> rulesReferencing(UUID)`.
**Risks flagged to executor:**
1. **Task 16 `MustacheRenderer` missing-variable fallback** is non-trivial in JMustache's default compiler config — implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible.
2. **Task 12/13** — the SQL dialect for attribute map access on the `executions` table (`attributes[?]`) depends on the actual column type in `init.sql`. If attributes is `Map(String,String)`, the syntax works; if it's stored as JSON string, switch to `JSONExtractString(attributes, ?) = ?`.
3. **Task 27 `enrichTitleMessage`** depends on `AlertInstance` having wither methods — these are added opportunistically during Task 26 when `AlertStateTransitions` needs them. Don't forget to expose them.
4. **Claim-polling semantics under schema-per-tenant** — the `?currentSchema=tenant_{id}` JDBC URL routes writes correctly, but the `FOR UPDATE SKIP LOCKED` behaviour is per-schema so cross-tenant locks are irrelevant (correct behaviour). Make sure IT tests run with `cameleer.server.tenant.id=default`.
5. **Task 41 full-lifecycle test is the canary.** If it fails after each task, pair-program with the failing assertion — the bug is almost always in state transitions or renderer context shape.