3429 lines
138 KiB
Markdown
3429 lines
138 KiB
Markdown
# Alerting — Plan 02 — Backend Implementation
|
||
|
||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||
|
||
**Goal:** Deliver the server-side alerting feature described in `docs/superpowers/specs/2026-04-19-alerting-design.md` — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03.
|
||
|
||
**Architecture:** Confined to new `alerting/` packages in both `cameleer-server-core` (pure records + interfaces) and `cameleer-server-app` (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new `countLogs` / `countExecutionsForAlerting` methods, four additive projections). Claim-polling `FOR UPDATE SKIP LOCKED` makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (`rulesReferencing`) is populated in this plan — it is the gate that unlocks safe production use of Plan 01.
|
||
|
||
**Tech Stack:** Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's `OutboundHttpClientFactory`, Testcontainers + JUnit 5 + WireMock + AssertJ for tests.
|
||
|
||
---
|
||
|
||
## Base branch
|
||
|
||
**Branch Plan 02 off `feat/alerting-01-outbound-infra`.** Plan 02 depends on Plan 01's `OutboundConnection` domain, `OutboundHttpClientFactory` bean, `SecretCipher`, `OutboundConnectionServiceImpl.rulesReferencing()` stub, the V11 migration, and the `OUTBOUND_CONNECTION_CHANGE` / `OUTBOUND_HTTP_TRUST_CHANGE` audit categories. Branching off `main` is **not** an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2.
|
||
|
||
```bash
|
||
# Execute in a fresh worktree
|
||
git fetch origin
|
||
git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra
|
||
cd .worktrees/alerting-02
|
||
mvn clean compile # confirm Plan 01 code compiles as baseline
|
||
```
|
||
|
||
---
|
||
|
||
## File Structure
|
||
|
||
### Created — `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`
|
||
|
||
| File | Responsibility |
|
||
|---|---|
|
||
| `AlertingProperties.java` | Not here — see app module. |
|
||
| `AlertRule.java` | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. |
|
||
| `AlertCondition.java` | Sealed interface; Jackson DEDUCTION polymorphism root. |
|
||
| `RouteMetricCondition.java` | Record: scope, metric, comparator, threshold, windowSeconds. |
|
||
| `ExchangeMatchCondition.java` | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. |
|
||
| `AgentStateCondition.java` | Record: scope, state, forSeconds. |
|
||
| `DeploymentStateCondition.java` | Record: scope, states. |
|
||
| `LogPatternCondition.java` | Record: scope, level, pattern, threshold, windowSeconds. |
|
||
| `JvmMetricCondition.java` | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. |
|
||
| `AlertScope.java` | Record: appSlug?, routeId?, agentId? — nullable fields, used by all conditions. |
|
||
| `ConditionKind.java` | Enum mirror of SQL `condition_kind_enum`. |
|
||
| `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java` | Enums used in conditions. |
|
||
| `AlertSeverity.java` | Enum mirror of SQL `severity_enum`. |
|
||
| `AlertState.java` | Enum mirror of SQL `alert_state_enum`. |
|
||
| `AlertInstance.java` | Immutable record for `alert_instances` row. |
|
||
| `AlertRuleTarget.java` | Record for `alert_rule_targets` row. |
|
||
| `TargetKind.java` | Enum mirror of SQL `target_kind_enum`. |
|
||
| `AlertSilence.java` | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. |
|
||
| `SilenceMatcher.java` | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. |
|
||
| `AlertNotification.java` | Record for `alert_notifications` outbox row. |
|
||
| `NotificationStatus.java` | Enum mirror of SQL `notification_status_enum`. |
|
||
| `WebhookBinding.java` | Record embedded in `alert_rules.webhooks` JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. |
|
||
| `AlertRuleRepository.java` | CRUD + claim-polling interface. |
|
||
| `AlertInstanceRepository.java` | CRUD + query-for-inbox interface. |
|
||
| `AlertSilenceRepository.java` | CRUD interface. |
|
||
| `AlertNotificationRepository.java` | CRUD + claim-polling interface. |
|
||
| `AlertReadRepository.java` | Mark-read + count-unread interface. |
|
||
|
||
### Created — `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/`
|
||
|
||
| File | Responsibility |
|
||
|---|---|
|
||
| `config/AlertingProperties.java` | `@ConfigurationProperties("cameleer.server.alerting")`. |
|
||
| `config/AlertingBeanConfig.java` | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. |
|
||
| `storage/PostgresAlertRuleRepository.java` | JdbcTemplate impl of `AlertRuleRepository`. |
|
||
| `storage/PostgresAlertInstanceRepository.java` | JdbcTemplate impl. |
|
||
| `storage/PostgresAlertSilenceRepository.java` | JdbcTemplate impl. |
|
||
| `storage/PostgresAlertNotificationRepository.java` | JdbcTemplate impl. |
|
||
| `storage/PostgresAlertReadRepository.java` | JdbcTemplate impl. |
|
||
| `eval/EvalContext.java` | Per-tick context (tenantId, now, tickCache). |
|
||
| `eval/EvalResult.java` | Sealed: `Firing(value, threshold, contextMap)` / `Clear` / `Error(Throwable)`. |
|
||
| `eval/TickCache.java` | `ConcurrentHashMap<String,Object>` discarded per tick. |
|
||
| `eval/PerKindCircuitBreaker.java` | Failure window + cooldown per `ConditionKind`. |
|
||
| `eval/ConditionEvaluator.java` | Generic interface: `evaluate(C, AlertRule, EvalContext)`. |
|
||
| `eval/RouteMetricEvaluator.java` | Reads `StatsStore`. |
|
||
| `eval/ExchangeMatchEvaluator.java` | Reads `ClickHouseSearchIndex.countExecutionsForAlerting` + `SearchService.search` for PER_EXCHANGE cursor mode. |
|
||
| `eval/AgentStateEvaluator.java` | Reads `AgentRegistryService.findAll`. |
|
||
| `eval/DeploymentStateEvaluator.java` | Reads `DeploymentRepository.findByAppId`. |
|
||
| `eval/LogPatternEvaluator.java` | Reads new `ClickHouseLogStore.countLogs`. |
|
||
| `eval/JvmMetricEvaluator.java` | Reads `MetricsQueryStore.queryTimeSeries`. |
|
||
| `eval/AlertEvaluatorJob.java` | `@Component` implementing `SchedulingConfigurer`; claim-polling loop. |
|
||
| `eval/AlertStateTransitions.java` | Pure function: given current instance + EvalResult → new state + timestamps. |
|
||
| `notify/MustacheRenderer.java` | JMustache wrapper; resilient to bad templates. |
|
||
| `notify/NotificationContextBuilder.java` | Pure: builds context map from `AlertInstance` + rule + env. |
|
||
| `notify/SilenceMatcher.java` | Pure: evaluates a `SilenceMatcher` against an `AlertInstance`. |
|
||
| `notify/InAppInboxQuery.java` | Server-side query helper for `/alerts` and unread-count. |
|
||
| `notify/WebhookDispatcher.java` | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. |
|
||
| `notify/NotificationDispatchJob.java` | `@Component` `SchedulingConfigurer`; claim-polling on `alert_notifications`. |
|
||
| `notify/HmacSigner.java` | Pure: computes `sha256=<hmac(secret, body)>`. |
|
||
| `retention/AlertingRetentionJob.java` | `@Scheduled(cron = "0 0 3 * * *")` — delete old `alert_instances` + `alert_notifications`. |
|
||
| `controller/AlertRuleController.java` | `/api/v1/environments/{envSlug}/alerts/rules`. |
|
||
| `controller/AlertController.java` | `/api/v1/environments/{envSlug}/alerts` + instance actions. |
|
||
| `controller/AlertSilenceController.java` | `/api/v1/environments/{envSlug}/alerts/silences`. |
|
||
| `controller/AlertNotificationController.java` | `/api/v1/environments/{envSlug}/alerts/{id}/notifications`, `/alerts/notifications/{id}/retry`. |
|
||
| `dto/AlertRuleDto.java`, `dto/AlertDto.java`, `dto/AlertSilenceDto.java`, `dto/AlertNotificationDto.java`, `dto/ConditionDto.java`, `dto/WebhookBindingDto.java`, `dto/RenderPreviewRequest.java`, `dto/RenderPreviewResponse.java`, `dto/TestEvaluateRequest.java`, `dto/TestEvaluateResponse.java`, `dto/UnreadCountResponse.java` | Request/response DTOs. |
|
||
| `metrics/AlertingMetrics.java` | Micrometer registrations for counters/gauges/histograms. |
|
||
|
||
### Created — resources
|
||
|
||
| File | Responsibility |
|
||
|---|---|
|
||
| `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql` | Flyway migration: 5 enums, 6 tables, indexes, cascades. |
|
||
| `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql` | 4 projections on `executions` / `logs` / `agent_metrics`, all `IF NOT EXISTS`. |
|
||
|
||
### Modified
|
||
|
||
| File | Change |
|
||
|---|---|
|
||
| `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java` | Add `ALERT_RULE_CHANGE`, `ALERT_SILENCE_CHANGE`. |
|
||
| `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java` | Replace the `rulesReferencing(UUID)` stub with a call through `AlertRuleRepository.findRuleIdsByOutboundConnectionId`. |
|
||
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java` | Add `long countLogs(LogSearchRequest)` — no `FINAL`. |
|
||
| `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java` | Add `long countExecutionsForAlerting(AlertMatchSpec)` — no `FINAL`. |
|
||
| `cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java` | Run `alerting_projections.sql` via existing `ClickHouseSchemaInitializer`. |
|
||
| `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java` | Permit new `/api/v1/environments/{envSlug}/alerts/**` path matchers with role-based access. |
|
||
| `cameleer-server-core/pom.xml` | Add `com.samskivert:jmustache:1.16`. |
|
||
| `.claude/rules/app-classes.md`, `.claude/rules/core-classes.md` | Document new packages. |
|
||
| `cameleer-server-app/src/main/resources/application.yml` | Default `AlertingProperties` stanza + comment linking to the admin guide. |
|
||
|
||
---
|
||
|
||
## Conventions
|
||
|
||
- **TDD.** Every task starts with a failing test, implements the minimum to pass, then commits.
|
||
- **One commit per task.** Commit messages: `feat(alerting): …`, `test(alerting): …`, `fix(alerting): …`, `chore(alerting): …`, `docs(alerting): …`.
|
||
- **Tenant invariant.** Every ClickHouse query and Postgres table referencing observability data filters by `tenantId` (injected via `AlertingBeanConfig` from `cameleer.server.tenant.id`).
|
||
- **No `FINAL`** on the two new CH count methods — alerting tolerates brief duplicate counts.
|
||
- **Jackson polymorphism** via `@JsonTypeInfo(use = DEDUCTION)` with `@JsonSubTypes` on `AlertCondition`.
|
||
- **Pure `core/`, Spring-only in `app/`.** No `@Component`, `@Service`, or `@Scheduled` annotations in `cameleer-server-core`.
|
||
- **Claim polling.** `FOR UPDATE SKIP LOCKED` + `claimed_by` / `claimed_until` with 30 s TTL.
|
||
- **Instance id** for claim ownership: use `InetAddress.getLocalHost().getHostName() + ":" + processPid()`; exposed as a bean `"alertingInstanceId"` of type `String`.
|
||
- **GitNexus hygiene.** Before modifying any existing class (`OutboundConnectionServiceImpl`, `ClickHouseLogStore`, `ClickHouseSearchIndex`, `AuditCategory`, `SecurityConfig`), run `gitnexus_impact({target: "<className>", direction: "upstream"})` and report blast radius. Run `gitnexus_detect_changes()` before each commit.
|
||
|
||
---
|
||
|
||
## Phase 1 — Flyway V12 migration and audit categories
|
||
|
||
### Task 1: `V12__alerting_tables.sql`
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing integration test**
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.storage;
|
||
|
||
import com.cameleer.server.app.AbstractPostgresIT;
|
||
import org.junit.jupiter.api.Test;
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class V12MigrationIT extends AbstractPostgresIT {
|
||
|
||
@Test
|
||
void allAlertingTablesAndEnumsExist() {
|
||
var tables = jdbcTemplate.queryForList(
|
||
"SELECT table_name FROM information_schema.tables WHERE table_schema='public' " +
|
||
"AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," +
|
||
"'alert_silences','alert_notifications','alert_reads')",
|
||
String.class);
|
||
assertThat(tables).containsExactlyInAnyOrder(
|
||
"alert_rules","alert_rule_targets","alert_instances",
|
||
"alert_silences","alert_notifications","alert_reads");
|
||
|
||
var enums = jdbcTemplate.queryForList(
|
||
"SELECT typname FROM pg_type WHERE typname IN " +
|
||
"('severity_enum','condition_kind_enum','alert_state_enum'," +
|
||
"'target_kind_enum','notification_status_enum')",
|
||
String.class);
|
||
assertThat(enums).hasSize(5);
|
||
}
|
||
|
||
@Test
|
||
void deletingEnvironmentCascadesAlertingRows() {
|
||
var envId = java.util.UUID.randomUUID();
|
||
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env");
|
||
jdbcTemplate.update(
|
||
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
|
||
"VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1");
|
||
var ruleId = java.util.UUID.randomUUID();
|
||
jdbcTemplate.update(
|
||
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
|
||
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
|
||
"VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')",
|
||
ruleId, envId);
|
||
var instanceId = java.util.UUID.randomUUID();
|
||
jdbcTemplate.update(
|
||
"INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
|
||
"fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
|
||
"now(), '{}'::jsonb, 't', 'm')",
|
||
instanceId, ruleId, envId);
|
||
|
||
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
|
||
|
||
assertThat(jdbcTemplate.queryForObject(
|
||
"SELECT count(*) FROM alert_rules WHERE environment_id = ?",
|
||
Integer.class, envId)).isZero();
|
||
assertThat(jdbcTemplate.queryForObject(
|
||
"SELECT count(*) FROM alert_instances WHERE environment_id = ?",
|
||
Integer.class, envId)).isZero();
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run the test to verify it fails**
|
||
|
||
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
|
||
Expected: FAIL — tables do not exist.
|
||
|
||
- [ ] **Step 3: Write the migration**
|
||
|
||
Create `cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql`:
|
||
|
||
```sql
|
||
-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11)
|
||
CREATE TYPE severity_enum AS ENUM ('CRITICAL','WARNING','INFO');
|
||
CREATE TYPE condition_kind_enum AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC');
|
||
CREATE TYPE alert_state_enum AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED');
|
||
CREATE TYPE target_kind_enum AS ENUM ('USER','GROUP','ROLE');
|
||
CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED');
|
||
|
||
CREATE TABLE alert_rules (
|
||
id uuid PRIMARY KEY,
|
||
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
|
||
name varchar(200) NOT NULL,
|
||
description text,
|
||
severity severity_enum NOT NULL,
|
||
enabled boolean NOT NULL DEFAULT true,
|
||
condition_kind condition_kind_enum NOT NULL,
|
||
condition jsonb NOT NULL,
|
||
evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5),
|
||
for_duration_seconds int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0),
|
||
re_notify_minutes int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0),
|
||
notification_title_tmpl text NOT NULL,
|
||
notification_message_tmpl text NOT NULL,
|
||
webhooks jsonb NOT NULL DEFAULT '[]',
|
||
next_evaluation_at timestamptz NOT NULL DEFAULT now(),
|
||
claimed_by varchar(64),
|
||
claimed_until timestamptz,
|
||
eval_state jsonb NOT NULL DEFAULT '{}',
|
||
created_at timestamptz NOT NULL DEFAULT now(),
|
||
created_by text NOT NULL REFERENCES users(user_id),
|
||
updated_at timestamptz NOT NULL DEFAULT now(),
|
||
updated_by text NOT NULL REFERENCES users(user_id)
|
||
);
|
||
CREATE INDEX alert_rules_env_idx ON alert_rules (environment_id);
|
||
CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true;
|
||
|
||
CREATE TABLE alert_rule_targets (
|
||
id uuid PRIMARY KEY,
|
||
rule_id uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
|
||
target_kind target_kind_enum NOT NULL,
|
||
target_id varchar(128) NOT NULL,
|
||
UNIQUE (rule_id, target_kind, target_id)
|
||
);
|
||
CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id);
|
||
|
||
CREATE TABLE alert_instances (
|
||
id uuid PRIMARY KEY,
|
||
rule_id uuid REFERENCES alert_rules(id) ON DELETE SET NULL,
|
||
rule_snapshot jsonb NOT NULL,
|
||
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
|
||
state alert_state_enum NOT NULL,
|
||
severity severity_enum NOT NULL,
|
||
fired_at timestamptz NOT NULL,
|
||
acked_at timestamptz,
|
||
acked_by text REFERENCES users(user_id),
|
||
resolved_at timestamptz,
|
||
last_notified_at timestamptz,
|
||
silenced boolean NOT NULL DEFAULT false,
|
||
current_value numeric,
|
||
threshold numeric,
|
||
context jsonb NOT NULL,
|
||
title text NOT NULL,
|
||
message text NOT NULL,
|
||
target_user_ids text[] NOT NULL DEFAULT '{}',
|
||
target_group_ids uuid[] NOT NULL DEFAULT '{}',
|
||
target_role_names text[] NOT NULL DEFAULT '{}'
|
||
);
|
||
CREATE INDEX alert_instances_inbox_idx ON alert_instances (environment_id, state, fired_at DESC);
|
||
CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL;
|
||
CREATE INDEX alert_instances_resolved_idx ON alert_instances (resolved_at) WHERE state = 'RESOLVED';
|
||
CREATE INDEX alert_instances_target_u_idx ON alert_instances USING GIN (target_user_ids);
|
||
CREATE INDEX alert_instances_target_g_idx ON alert_instances USING GIN (target_group_ids);
|
||
CREATE INDEX alert_instances_target_r_idx ON alert_instances USING GIN (target_role_names);
|
||
|
||
CREATE TABLE alert_silences (
|
||
id uuid PRIMARY KEY,
|
||
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
|
||
matcher jsonb NOT NULL,
|
||
reason text,
|
||
starts_at timestamptz NOT NULL,
|
||
ends_at timestamptz NOT NULL CHECK (ends_at > starts_at),
|
||
created_by text NOT NULL REFERENCES users(user_id),
|
||
created_at timestamptz NOT NULL DEFAULT now()
|
||
);
|
||
CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at);
|
||
|
||
CREATE TABLE alert_notifications (
|
||
id uuid PRIMARY KEY,
|
||
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
|
||
webhook_id uuid,
|
||
outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL,
|
||
status notification_status_enum NOT NULL DEFAULT 'PENDING',
|
||
attempts int NOT NULL DEFAULT 0,
|
||
next_attempt_at timestamptz NOT NULL DEFAULT now(),
|
||
claimed_by varchar(64),
|
||
claimed_until timestamptz,
|
||
last_response_status int,
|
||
last_response_snippet text,
|
||
payload jsonb NOT NULL,
|
||
delivered_at timestamptz,
|
||
created_at timestamptz NOT NULL DEFAULT now()
|
||
);
|
||
CREATE INDEX alert_notifications_pending_idx ON alert_notifications (next_attempt_at) WHERE status = 'PENDING';
|
||
CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id);
|
||
|
||
CREATE TABLE alert_reads (
|
||
user_id text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
|
||
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
|
||
read_at timestamptz NOT NULL DEFAULT now(),
|
||
PRIMARY KEY (user_id, alert_instance_id)
|
||
);
|
||
```
|
||
|
||
Notes:
|
||
- Plan 01 established `users.user_id` as TEXT. All FK-to-users columns in this migration are `text`, not `uuid`.
|
||
- `target_user_ids` is `text[]` (matches `users.user_id`).
|
||
- `outbound_connections` (Plan 01) is referenced with `ON DELETE SET NULL` — matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed.
|
||
|
||
- [ ] **Step 4: Run the test to verify it passes**
|
||
|
||
Run: `mvn -pl cameleer-server-app test -Dtest=V12MigrationIT`
|
||
Expected: PASS.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
|
||
git commit -m "feat(alerting): V12 flyway migration for alerting tables"
|
||
```
|
||
|
||
### Task 2: Extend `AuditCategory`
|
||
|
||
**Files:**
|
||
- Modify: `cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java`
|
||
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java`
|
||
|
||
- [ ] **Step 1: GitNexus impact check**
|
||
|
||
Run `gitnexus_impact({target: "AuditCategory", direction: "upstream"})` — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements).
|
||
|
||
- [ ] **Step 2: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.core.admin;
|
||
|
||
import org.junit.jupiter.api.Test;
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class AuditCategoryTest {
|
||
@Test
|
||
void alertingCategoriesPresent() {
|
||
assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
|
||
assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 3: Run the test — FAIL**
|
||
|
||
Run: `mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest`
|
||
Expected: FAIL — `IllegalArgumentException: No enum constant`.
|
||
|
||
- [ ] **Step 4: Add the enum values**
|
||
|
||
Replace the whole enum body with:
|
||
|
||
```java
|
||
package com.cameleer.server.core.admin;
|
||
|
||
public enum AuditCategory {
|
||
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
|
||
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
|
||
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 5: Run the test — PASS**
|
||
|
||
- [ ] **Step 6: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \
|
||
cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
|
||
git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 2 — Core domain model
|
||
|
||
Each task in this phase adds a small, focused set of pure-Java records and enums under `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/`. All records use canonical constructors with explicit `@NotNull`-style defensive copying only for mutable collections (`List.copyOf`, `Map.copyOf`). Jackson polymorphism is handled by `@JsonTypeInfo(use = DEDUCTION)` on `AlertCondition`.
|
||
|
||
### Task 3: Enums + `AlertScope`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/AlertSeverity.java`, `AlertState.java`, `ConditionKind.java`, `TargetKind.java`, `NotificationStatus.java`, `RouteMetric.java`, `Comparator.java`, `AggregationOp.java`, `FireMode.java`, `AlertScope.java`
|
||
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import org.junit.jupiter.api.Test;
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class AlertScopeTest {
|
||
|
||
@Test
|
||
void allFieldsNullIsEnvWide() {
|
||
var s = new AlertScope(null, null, null);
|
||
assertThat(s.isEnvWide()).isTrue();
|
||
}
|
||
|
||
@Test
|
||
void appScoped() {
|
||
var s = new AlertScope("orders", null, null);
|
||
assertThat(s.isEnvWide()).isFalse();
|
||
assertThat(s.appSlug()).isEqualTo("orders");
|
||
}
|
||
|
||
@Test
|
||
void enumsHaveExpectedValues() {
|
||
assertThat(AlertSeverity.values()).containsExactly(
|
||
AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO);
|
||
assertThat(AlertState.values()).containsExactly(
|
||
AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED);
|
||
assertThat(ConditionKind.values()).hasSize(6);
|
||
assertThat(TargetKind.values()).containsExactly(
|
||
TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE);
|
||
assertThat(NotificationStatus.values()).containsExactly(
|
||
NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL** (`cannot find symbol`).
|
||
|
||
Run: `mvn -pl cameleer-server-core test -Dtest=AlertScopeTest`
|
||
|
||
- [ ] **Step 3: Create the files**
|
||
|
||
```java
|
||
// AlertSeverity.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum AlertSeverity { CRITICAL, WARNING, INFO }
|
||
|
||
// AlertState.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }
|
||
|
||
// ConditionKind.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }
|
||
|
||
// TargetKind.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum TargetKind { USER, GROUP, ROLE }
|
||
|
||
// NotificationStatus.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum NotificationStatus { PENDING, DELIVERED, FAILED }
|
||
|
||
// RouteMetric.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT }
|
||
|
||
// Comparator.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum Comparator { GT, GTE, LT, LTE, EQ }
|
||
|
||
// AggregationOp.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum AggregationOp { MAX, MIN, AVG, LATEST }
|
||
|
||
// FireMode.java
|
||
package com.cameleer.server.core.alerting;
|
||
public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }
|
||
|
||
// AlertScope.java
|
||
package com.cameleer.server.core.alerting;
|
||
public record AlertScope(String appSlug, String routeId, String agentId) {
|
||
public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; }
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
|
||
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
|
||
git commit -m "feat(alerting): core enums + AlertScope"
|
||
```
|
||
|
||
### Task 4: `AlertCondition` sealed hierarchy + Jackson polymorphism
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/AlertCondition.java`, `RouteMetricCondition.java`, `ExchangeMatchCondition.java` (with nested `ExchangeFilter`), `AgentStateCondition.java`, `DeploymentStateCondition.java`, `LogPatternCondition.java`, `JvmMetricCondition.java`
|
||
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||
import org.junit.jupiter.api.Test;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class AlertConditionJsonTest {
|
||
|
||
private final ObjectMapper om = new ObjectMapper();
|
||
|
||
@Test
|
||
void roundtripRouteMetric() throws Exception {
|
||
var c = new RouteMetricCondition(
|
||
new AlertScope("orders", "route-1", null),
|
||
RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300);
|
||
String json = om.writeValueAsString((AlertCondition) c);
|
||
AlertCondition parsed = om.readValue(json, AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(RouteMetricCondition.class);
|
||
assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC);
|
||
}
|
||
|
||
@Test
|
||
void roundtripExchangeMatchPerExchange() throws Exception {
|
||
var c = new ExchangeMatchCondition(
|
||
new AlertScope("orders", null, null),
|
||
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")),
|
||
FireMode.PER_EXCHANGE, null, null, 300);
|
||
String json = om.writeValueAsString((AlertCondition) c);
|
||
AlertCondition parsed = om.readValue(json, AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class);
|
||
}
|
||
|
||
@Test
|
||
void roundtripExchangeMatchCountInWindow() throws Exception {
|
||
var c = new ExchangeMatchCondition(
|
||
new AlertScope("orders", null, null),
|
||
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
|
||
FireMode.COUNT_IN_WINDOW, 5, 900, null);
|
||
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
|
||
assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5);
|
||
}
|
||
|
||
@Test
|
||
void roundtripAgentState() throws Exception {
|
||
var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60);
|
||
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(AgentStateCondition.class);
|
||
}
|
||
|
||
@Test
|
||
void roundtripDeploymentState() throws Exception {
|
||
var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED"));
|
||
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(DeploymentStateCondition.class);
|
||
}
|
||
|
||
@Test
|
||
void roundtripLogPattern() throws Exception {
|
||
var c = new LogPatternCondition(new AlertScope("orders", null, null),
|
||
"ERROR", "TimeoutException", 5, 900);
|
||
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(LogPatternCondition.class);
|
||
}
|
||
|
||
@Test
|
||
void roundtripJvmMetric() throws Exception {
|
||
var c = new JvmMetricCondition(new AlertScope("orders", null, null),
|
||
"heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300);
|
||
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
|
||
assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Create the sealed hierarchy**
|
||
|
||
```java
|
||
// AlertCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import com.fasterxml.jackson.annotation.JsonSubTypes;
|
||
import com.fasterxml.jackson.annotation.JsonTypeInfo;
|
||
|
||
@JsonTypeInfo(use = JsonTypeInfo.Id.DEDUCTION)
|
||
@JsonSubTypes({
|
||
@JsonSubTypes.Type(RouteMetricCondition.class),
|
||
@JsonSubTypes.Type(ExchangeMatchCondition.class),
|
||
@JsonSubTypes.Type(AgentStateCondition.class),
|
||
@JsonSubTypes.Type(DeploymentStateCondition.class),
|
||
@JsonSubTypes.Type(LogPatternCondition.class),
|
||
@JsonSubTypes.Type(JvmMetricCondition.class)
|
||
})
|
||
public sealed interface AlertCondition permits
|
||
RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
|
||
DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {
|
||
|
||
ConditionKind kind();
|
||
AlertScope scope();
|
||
}
|
||
```
|
||
|
||
```java
|
||
// RouteMetricCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
public record RouteMetricCondition(
|
||
AlertScope scope,
|
||
RouteMetric metric,
|
||
Comparator comparator,
|
||
double threshold,
|
||
int windowSeconds) implements AlertCondition {
|
||
@Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
|
||
}
|
||
```
|
||
|
||
```java
|
||
// ExchangeMatchCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.Map;
|
||
|
||
public record ExchangeMatchCondition(
|
||
AlertScope scope,
|
||
ExchangeFilter filter,
|
||
FireMode fireMode,
|
||
Integer threshold, // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
|
||
Integer windowSeconds, // required when COUNT_IN_WINDOW
|
||
Integer perExchangeLingerSeconds // required when PER_EXCHANGE
|
||
) implements AlertCondition {
|
||
|
||
public ExchangeMatchCondition {
|
||
if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
|
||
throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
|
||
if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
|
||
throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
|
||
}
|
||
|
||
@Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }
|
||
|
||
public record ExchangeFilter(String status, Map<String, String> attributes) {
|
||
public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); }
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AgentStateCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition {
|
||
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
|
||
}
|
||
```
|
||
|
||
```java
|
||
// DeploymentStateCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.List;
|
||
|
||
public record DeploymentStateCondition(AlertScope scope, List<String> states) implements AlertCondition {
|
||
public DeploymentStateCondition { states = List.copyOf(states); }
|
||
@Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
|
||
}
|
||
```
|
||
|
||
```java
|
||
// LogPatternCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
public record LogPatternCondition(
|
||
AlertScope scope,
|
||
String level,
|
||
String pattern,
|
||
int threshold,
|
||
int windowSeconds) implements AlertCondition {
|
||
@Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
|
||
}
|
||
```
|
||
|
||
```java
|
||
// JvmMetricCondition.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
public record JvmMetricCondition(
|
||
AlertScope scope,
|
||
String metric,
|
||
AggregationOp aggregation,
|
||
Comparator comparator,
|
||
double threshold,
|
||
int windowSeconds) implements AlertCondition {
|
||
@Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
|
||
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
|
||
git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction"
|
||
```
|
||
|
||
### Task 5: Core data records (`AlertRule`, `AlertInstance`, `AlertSilence`, `SilenceMatcher`, `AlertRuleTarget`, `AlertNotification`, `WebhookBinding`)
|
||
|
||
**Files:**
|
||
- Create: the seven records above under `.../alerting/`
|
||
- Test: `cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import org.junit.jupiter.api.Test;
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class AlertDomainRecordsTest {
|
||
|
||
@Test
|
||
void alertRuleDefensiveCopy() {
|
||
var webhooks = new java.util.ArrayList<WebhookBinding>();
|
||
webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null));
|
||
var r = newRule(webhooks);
|
||
webhooks.clear();
|
||
assertThat(r.webhooks()).hasSize(1);
|
||
}
|
||
|
||
@Test
|
||
void silenceMatcherAllFieldsNullMatchesEverything() {
|
||
var m = new SilenceMatcher(null, null, null, null, null);
|
||
assertThat(m.isWildcard()).isTrue();
|
||
}
|
||
|
||
private AlertRule newRule(List<WebhookBinding> wh) {
|
||
return new AlertRule(
|
||
UUID.randomUUID(), UUID.randomUUID(), "r", null,
|
||
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
|
||
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
|
||
60, 0, 60, "t", "m", wh, List.of(),
|
||
Instant.now(), null, null, Map.of(),
|
||
Instant.now(), "u1", Instant.now(), "u1");
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Create the records**
|
||
|
||
```java
|
||
// AlertRule.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
public record AlertRule(
|
||
UUID id,
|
||
UUID environmentId,
|
||
String name,
|
||
String description,
|
||
AlertSeverity severity,
|
||
boolean enabled,
|
||
ConditionKind conditionKind,
|
||
AlertCondition condition,
|
||
int evaluationIntervalSeconds,
|
||
int forDurationSeconds,
|
||
int reNotifyMinutes,
|
||
String notificationTitleTmpl,
|
||
String notificationMessageTmpl,
|
||
List<WebhookBinding> webhooks,
|
||
List<AlertRuleTarget> targets,
|
||
Instant nextEvaluationAt,
|
||
String claimedBy,
|
||
Instant claimedUntil,
|
||
Map<String, Object> evalState,
|
||
Instant createdAt,
|
||
String createdBy,
|
||
Instant updatedAt,
|
||
String updatedBy) {
|
||
|
||
public AlertRule {
|
||
webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
|
||
targets = targets == null ? List.of() : List.copyOf(targets);
|
||
evalState = evalState == null ? Map.of() : Map.copyOf(evalState);
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertInstance.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
public record AlertInstance(
|
||
UUID id,
|
||
UUID ruleId, // nullable after rule deletion
|
||
Map<String, Object> ruleSnapshot,
|
||
UUID environmentId,
|
||
AlertState state,
|
||
AlertSeverity severity,
|
||
Instant firedAt,
|
||
Instant ackedAt,
|
||
String ackedBy,
|
||
Instant resolvedAt,
|
||
Instant lastNotifiedAt,
|
||
boolean silenced,
|
||
Double currentValue,
|
||
Double threshold,
|
||
Map<String, Object> context,
|
||
String title,
|
||
String message,
|
||
List<String> targetUserIds,
|
||
List<UUID> targetGroupIds,
|
||
List<String> targetRoleNames) {
|
||
|
||
public AlertInstance {
|
||
ruleSnapshot = ruleSnapshot == null ? Map.of() : Map.copyOf(ruleSnapshot);
|
||
context = context == null ? Map.of() : Map.copyOf(context);
|
||
targetUserIds = targetUserIds == null ? List.of() : List.copyOf(targetUserIds);
|
||
targetGroupIds = targetGroupIds == null ? List.of() : List.copyOf(targetGroupIds);
|
||
targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames);
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertRuleTarget.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.UUID;
|
||
|
||
public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {}
|
||
```
|
||
|
||
```java
|
||
// WebhookBinding.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
public record WebhookBinding(
|
||
UUID id,
|
||
UUID outboundConnectionId,
|
||
String bodyOverride,
|
||
Map<String, String> headerOverrides) {
|
||
|
||
public WebhookBinding {
|
||
headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// SilenceMatcher.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.UUID;
|
||
|
||
public record SilenceMatcher(
|
||
UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) {
|
||
|
||
public boolean isWildcard() {
|
||
return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null;
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertSilence.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.UUID;
|
||
|
||
public record AlertSilence(
|
||
UUID id,
|
||
UUID environmentId,
|
||
SilenceMatcher matcher,
|
||
String reason,
|
||
Instant startsAt,
|
||
Instant endsAt,
|
||
String createdBy,
|
||
Instant createdAt) {}
|
||
```
|
||
|
||
```java
|
||
// AlertNotification.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
public record AlertNotification(
|
||
UUID id,
|
||
UUID alertInstanceId,
|
||
UUID webhookId,
|
||
UUID outboundConnectionId,
|
||
NotificationStatus status,
|
||
int attempts,
|
||
Instant nextAttemptAt,
|
||
String claimedBy,
|
||
Instant claimedUntil,
|
||
Integer lastResponseStatus,
|
||
String lastResponseSnippet,
|
||
Map<String, Object> payload,
|
||
Instant deliveredAt,
|
||
Instant createdAt) {
|
||
|
||
public AlertNotification {
|
||
payload = payload == null ? Map.of() : Map.copyOf(payload);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
|
||
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
|
||
git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)"
|
||
```
|
||
|
||
### Task 6: Repository interfaces
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/AlertRuleRepository.java`, `AlertInstanceRepository.java`, `AlertSilenceRepository.java`, `AlertNotificationRepository.java`, `AlertReadRepository.java`
|
||
- No test (pure interfaces — covered by the Phase 3 integration tests).
|
||
|
||
- [ ] **Step 1: Create the interfaces**
|
||
|
||
```java
|
||
// AlertRuleRepository.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.List;
|
||
import java.util.Optional;
|
||
import java.util.UUID;
|
||
|
||
public interface AlertRuleRepository {
|
||
AlertRule save(AlertRule rule); // upsert by id
|
||
Optional<AlertRule> findById(UUID id);
|
||
List<AlertRule> listByEnvironment(UUID environmentId);
|
||
List<AlertRule> findAllByOutboundConnectionId(UUID connectionId);
|
||
List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId); // used by rulesReferencing()
|
||
void delete(UUID id);
|
||
|
||
/** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now).
|
||
* Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */
|
||
List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds);
|
||
|
||
/** Release claim + bump next_evaluation_at. */
|
||
void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt,
|
||
java.util.Map<String, Object> evalState);
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertInstanceRepository.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Optional;
|
||
import java.util.UUID;
|
||
|
||
public interface AlertInstanceRepository {
|
||
AlertInstance save(AlertInstance instance); // upsert by id
|
||
Optional<AlertInstance> findById(UUID id);
|
||
Optional<AlertInstance> findOpenForRule(UUID ruleId); // state IN ('PENDING','FIRING','ACKNOWLEDGED')
|
||
List<AlertInstance> listForInbox(UUID environmentId,
|
||
List<String> userGroupIdFilter, // UUIDs as String? decide impl-side
|
||
String userId,
|
||
List<String> userRoleNames,
|
||
int limit);
|
||
long countUnreadForUser(UUID environmentId, String userId);
|
||
void ack(UUID id, String userId, Instant when);
|
||
void resolve(UUID id, Instant when);
|
||
void markSilenced(UUID id, boolean silenced);
|
||
void deleteResolvedBefore(Instant cutoff);
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertSilenceRepository.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Optional;
|
||
import java.util.UUID;
|
||
|
||
public interface AlertSilenceRepository {
|
||
AlertSilence save(AlertSilence silence);
|
||
Optional<AlertSilence> findById(UUID id);
|
||
List<AlertSilence> listActive(UUID environmentId, Instant when);
|
||
List<AlertSilence> listByEnvironment(UUID environmentId);
|
||
void delete(UUID id);
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertNotificationRepository.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Optional;
|
||
import java.util.UUID;
|
||
|
||
public interface AlertNotificationRepository {
|
||
AlertNotification save(AlertNotification n);
|
||
Optional<AlertNotification> findById(UUID id);
|
||
List<AlertNotification> listForInstance(UUID alertInstanceId);
|
||
List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds);
|
||
void markDelivered(UUID id, int status, String snippet, Instant when);
|
||
void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet);
|
||
void markFailed(UUID id, int status, String snippet);
|
||
void deleteSettledBefore(Instant cutoff);
|
||
}
|
||
```
|
||
|
||
```java
|
||
// AlertReadRepository.java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.util.List;
|
||
import java.util.UUID;
|
||
|
||
public interface AlertReadRepository {
|
||
void markRead(String userId, UUID alertInstanceId);
|
||
void bulkMarkRead(String userId, List<UUID> alertInstanceIds);
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Compile**
|
||
|
||
Run: `mvn -pl cameleer-server-core compile`
|
||
Expected: SUCCESS.
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java
|
||
git commit -m "feat(alerting): core repository interfaces"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 3 — Postgres repositories
|
||
|
||
All repositories use `JdbcTemplate` and `ObjectMapper` for JSONB columns (same pattern as `PostgresOutboundConnectionRepository`). Convert UUID[] with `ConnectionCallback` + `Array.of("uuid", ...)` and text[] with `Array.of("text", ...)`.
|
||
|
||
### Task 7: `PostgresAlertRuleRepository`
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing integration test**
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.storage;
|
||
|
||
import com.cameleer.server.app.AbstractPostgresIT;
|
||
import com.cameleer.server.core.alerting.*;
|
||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||
import org.junit.jupiter.api.AfterEach;
|
||
import org.junit.jupiter.api.Test;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT {
|
||
|
||
private PostgresAlertRuleRepository repo;
|
||
private UUID envId;
|
||
|
||
@AfterEach
|
||
void cleanup() {
|
||
jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
|
||
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
|
||
jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'");
|
||
}
|
||
|
||
@org.junit.jupiter.api.BeforeEach
|
||
void setup() {
|
||
repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
|
||
envId = UUID.randomUUID();
|
||
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID());
|
||
jdbcTemplate.update(
|
||
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
|
||
"VALUES ('test-user', 'test-user', 'x', 'a@b', true)");
|
||
}
|
||
|
||
@Test
|
||
void saveAndFindByIdRoundtrip() {
|
||
var rule = newRule(List.of());
|
||
repo.save(rule);
|
||
var found = repo.findById(rule.id()).orElseThrow();
|
||
assertThat(found.name()).isEqualTo(rule.name());
|
||
assertThat(found.condition()).isInstanceOf(AgentStateCondition.class);
|
||
}
|
||
|
||
@Test
|
||
void findRuleIdsByOutboundConnectionId() {
|
||
var connId = UUID.randomUUID();
|
||
var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of());
|
||
var rule = newRule(List.of(wb));
|
||
repo.save(rule);
|
||
|
||
List<UUID> ids = repo.findRuleIdsByOutboundConnectionId(connId);
|
||
assertThat(ids).containsExactly(rule.id());
|
||
|
||
assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty();
|
||
}
|
||
|
||
@Test
|
||
void claimDueRulesAtomicSkipLocked() {
|
||
var rule = newRule(List.of());
|
||
repo.save(rule);
|
||
|
||
List<AlertRule> claimed = repo.claimDueRules("instance-A", 10, 30);
|
||
assertThat(claimed).hasSize(1);
|
||
|
||
// Second claimant sees nothing until first releases or TTL expires
|
||
List<AlertRule> second = repo.claimDueRules("instance-B", 10, 30);
|
||
assertThat(second).isEmpty();
|
||
}
|
||
|
||
private AlertRule newRule(List<WebhookBinding> webhooks) {
|
||
return new AlertRule(
|
||
UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc",
|
||
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
|
||
new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60),
|
||
60, 0, 60, "t", "m", webhooks, List.of(),
|
||
Instant.now().minusSeconds(10), null, null, Map.of(),
|
||
Instant.now(), "test-user", Instant.now(), "test-user");
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement the repository**
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.storage;
|
||
|
||
import com.cameleer.server.core.alerting.*;
|
||
import com.fasterxml.jackson.core.type.TypeReference;
|
||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||
import org.postgresql.util.PGobject;
|
||
import org.springframework.jdbc.core.JdbcTemplate;
|
||
import org.springframework.jdbc.core.RowMapper;
|
||
|
||
import java.sql.PreparedStatement;
|
||
import java.sql.SQLException;
|
||
import java.sql.Timestamp;
|
||
import java.sql.Types;
|
||
import java.time.Instant;
|
||
import java.util.*;
|
||
|
||
public class PostgresAlertRuleRepository implements AlertRuleRepository {
|
||
|
||
private final JdbcTemplate jdbc;
|
||
private final ObjectMapper om;
|
||
|
||
public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
|
||
this.jdbc = jdbc;
|
||
this.om = om;
|
||
}
|
||
|
||
@Override
|
||
public AlertRule save(AlertRule r) {
|
||
String sql = """
|
||
INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
|
||
condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
|
||
re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
|
||
webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
|
||
created_at, created_by, updated_at, updated_by)
|
||
VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
|
||
?, ?, ?, ?::jsonb, ?, ?, ?, ?)
|
||
ON CONFLICT (id) DO UPDATE SET
|
||
name = EXCLUDED.name, description = EXCLUDED.description,
|
||
severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
|
||
condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
|
||
evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
|
||
for_duration_seconds = EXCLUDED.for_duration_seconds,
|
||
re_notify_minutes = EXCLUDED.re_notify_minutes,
|
||
notification_title_tmpl = EXCLUDED.notification_title_tmpl,
|
||
notification_message_tmpl = EXCLUDED.notification_message_tmpl,
|
||
webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
|
||
updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
|
||
""";
|
||
jdbc.update(sql,
|
||
r.id(), r.environmentId(), r.name(), r.description(),
|
||
r.severity().name(), r.enabled(), r.conditionKind().name(),
|
||
writeJson(r.condition()),
|
||
r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
|
||
r.notificationTitleTmpl(), r.notificationMessageTmpl(),
|
||
writeJson(r.webhooks()),
|
||
Timestamp.from(r.nextEvaluationAt()),
|
||
r.claimedBy(),
|
||
r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
|
||
writeJson(r.evalState()),
|
||
Timestamp.from(r.createdAt()), r.createdBy(),
|
||
Timestamp.from(r.updatedAt()), r.updatedBy());
|
||
return r;
|
||
}
|
||
|
||
@Override
|
||
public Optional<AlertRule> findById(UUID id) {
|
||
var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
|
||
return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
|
||
}
|
||
|
||
@Override
|
||
public List<AlertRule> listByEnvironment(UUID environmentId) {
|
||
return jdbc.query(
|
||
"SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
|
||
rowMapper(), environmentId);
|
||
}
|
||
|
||
@Override
|
||
public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
|
||
String sql = """
|
||
SELECT * FROM alert_rules
|
||
WHERE webhooks @> ?::jsonb
|
||
ORDER BY created_at DESC
|
||
""";
|
||
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
|
||
return jdbc.query(sql, rowMapper(), predicate);
|
||
}
|
||
|
||
@Override
|
||
public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
|
||
String sql = """
|
||
SELECT id FROM alert_rules
|
||
WHERE webhooks @> ?::jsonb
|
||
""";
|
||
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
|
||
return jdbc.queryForList(sql, UUID.class, predicate);
|
||
}
|
||
|
||
@Override
|
||
public void delete(UUID id) {
|
||
jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
|
||
}
|
||
|
||
@Override
|
||
public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
|
||
String sql = """
|
||
UPDATE alert_rules
|
||
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
|
||
WHERE id IN (
|
||
SELECT id FROM alert_rules
|
||
WHERE enabled = true
|
||
AND next_evaluation_at <= now()
|
||
AND (claimed_until IS NULL OR claimed_until < now())
|
||
ORDER BY next_evaluation_at
|
||
LIMIT ?
|
||
FOR UPDATE SKIP LOCKED
|
||
)
|
||
RETURNING *
|
||
""";
|
||
return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
|
||
}
|
||
|
||
@Override
|
||
public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
|
||
jdbc.update("""
|
||
UPDATE alert_rules
|
||
SET claimed_by = NULL, claimed_until = NULL,
|
||
next_evaluation_at = ?, eval_state = ?::jsonb
|
||
WHERE id = ?
|
||
""",
|
||
Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
|
||
}
|
||
|
||
private RowMapper<AlertRule> rowMapper() {
|
||
return (rs, i) -> {
|
||
ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
|
||
AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class);
|
||
List<WebhookBinding> webhooks = om.readValue(
|
||
rs.getString("webhooks"), new TypeReference<>() {});
|
||
Map<String, Object> evalState = om.readValue(
|
||
rs.getString("eval_state"), new TypeReference<>() {});
|
||
|
||
Timestamp cu = rs.getTimestamp("claimed_until");
|
||
return new AlertRule(
|
||
(UUID) rs.getObject("id"),
|
||
(UUID) rs.getObject("environment_id"),
|
||
rs.getString("name"),
|
||
rs.getString("description"),
|
||
AlertSeverity.valueOf(rs.getString("severity")),
|
||
rs.getBoolean("enabled"),
|
||
kind, cond,
|
||
rs.getInt("evaluation_interval_seconds"),
|
||
rs.getInt("for_duration_seconds"),
|
||
rs.getInt("re_notify_minutes"),
|
||
rs.getString("notification_title_tmpl"),
|
||
rs.getString("notification_message_tmpl"),
|
||
webhooks, List.of(),
|
||
rs.getTimestamp("next_evaluation_at").toInstant(),
|
||
rs.getString("claimed_by"),
|
||
cu == null ? null : cu.toInstant(),
|
||
evalState,
|
||
rs.getTimestamp("created_at").toInstant(),
|
||
rs.getString("created_by"),
|
||
rs.getTimestamp("updated_at").toInstant(),
|
||
rs.getString("updated_by"));
|
||
};
|
||
}
|
||
|
||
private String writeJson(Object o) {
|
||
try { return om.writeValueAsString(o); }
|
||
catch (Exception e) { throw new IllegalStateException(e); }
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
|
||
git commit -m "feat(alerting): Postgres repository for alert_rules"
|
||
```
|
||
|
||
### Task 8: Wire `OutboundConnectionServiceImpl.rulesReferencing()` (CRITICAL — Plan 01 gate)
|
||
|
||
> **This is the Plan 01 known-incomplete item.** Plan 01 shipped `rulesReferencing()` returning `[]`. Until this task lands, outbound connections can be deleted or narrowed while rules reference them, corrupting production. **Do not skip or defer.**
|
||
|
||
**Files:**
|
||
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java`
|
||
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java`
|
||
|
||
- [ ] **Step 1: GitNexus impact check**
|
||
|
||
Run `gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"})`. Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour.
|
||
|
||
- [ ] **Step 2: Write the failing integration test**
|
||
|
||
```java
|
||
package com.cameleer.server.app.outbound;
|
||
|
||
import com.cameleer.server.app.AbstractPostgresIT;
|
||
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
|
||
import com.cameleer.server.core.alerting.*;
|
||
import com.cameleer.server.core.outbound.*;
|
||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||
import org.junit.jupiter.api.BeforeEach;
|
||
import org.junit.jupiter.api.Test;
|
||
import org.springframework.beans.factory.annotation.Autowired;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
import java.util.Map;
|
||
import java.util.UUID;
|
||
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
import static org.assertj.core.api.Assertions.assertThatThrownBy;
|
||
|
||
class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT {
|
||
|
||
@Autowired OutboundConnectionService service;
|
||
@Autowired OutboundConnectionRepository repo;
|
||
|
||
private UUID envId;
|
||
private UUID connId;
|
||
private PostgresAlertRuleRepository ruleRepo;
|
||
|
||
@BeforeEach
|
||
void seed() {
|
||
ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
|
||
envId = UUID.randomUUID();
|
||
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID());
|
||
jdbcTemplate.update(
|
||
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
|
||
"VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING");
|
||
var c = repo.save(new OutboundConnection(
|
||
UUID.randomUUID(), "default", "conn", null, "https://example.test",
|
||
OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null,
|
||
OutboundAuth.None.INSTANCE, List.of(),
|
||
Instant.now(), "u-ref", Instant.now(), "u-ref"));
|
||
connId = c.id();
|
||
|
||
var rule = new AlertRule(
|
||
UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true,
|
||
ConditionKind.AGENT_STATE,
|
||
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
|
||
60, 0, 60, "t", "m",
|
||
List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())),
|
||
List.of(), Instant.now(), null, null, Map.of(),
|
||
Instant.now(), "u-ref", Instant.now(), "u-ref");
|
||
ruleRepo.save(rule);
|
||
}
|
||
|
||
@Test
|
||
void deleteConnectionReferencedByRuleReturns409() {
|
||
assertThat(service.rulesReferencing(connId)).hasSize(1);
|
||
assertThatThrownBy(() -> service.delete(connId, "u-ref"))
|
||
.hasMessageContaining("referenced by rules");
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 3: Run — FAIL** (stub returns empty list, so delete succeeds).
|
||
|
||
- [ ] **Step 4: Replace the stub**
|
||
|
||
In `OutboundConnectionServiceImpl.java`:
|
||
|
||
```java
|
||
// existing imports + add:
|
||
import com.cameleer.server.core.alerting.AlertRuleRepository;
|
||
|
||
public class OutboundConnectionServiceImpl implements OutboundConnectionService {
|
||
|
||
private final OutboundConnectionRepository repo;
|
||
private final AlertRuleRepository ruleRepo; // NEW
|
||
private final String tenantId;
|
||
|
||
public OutboundConnectionServiceImpl(
|
||
OutboundConnectionRepository repo,
|
||
AlertRuleRepository ruleRepo,
|
||
String tenantId) {
|
||
this.repo = repo;
|
||
this.ruleRepo = ruleRepo;
|
||
this.tenantId = tenantId;
|
||
}
|
||
|
||
// … create/update/delete/get/list unchanged …
|
||
|
||
@Override
|
||
public List<UUID> rulesReferencing(UUID id) {
|
||
return ruleRepo.findRuleIdsByOutboundConnectionId(id);
|
||
}
|
||
}
|
||
```
|
||
|
||
Update `OutboundBeanConfig.java` to inject `AlertRuleRepository`:
|
||
|
||
```java
|
||
@Bean
|
||
public OutboundConnectionService outboundConnectionService(
|
||
OutboundConnectionRepository repo,
|
||
AlertRuleRepository ruleRepo,
|
||
@Value("${cameleer.server.tenant.id:default}") String tenantId) {
|
||
return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
|
||
}
|
||
```
|
||
|
||
Add the `AlertRuleRepository` bean in a new `AlertingBeanConfig.java` stub (completed in Phase 7):
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.config;
|
||
|
||
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
|
||
import com.cameleer.server.core.alerting.AlertRuleRepository;
|
||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||
import org.springframework.context.annotation.Bean;
|
||
import org.springframework.context.annotation.Configuration;
|
||
import org.springframework.jdbc.core.JdbcTemplate;
|
||
|
||
@Configuration
|
||
public class AlertingBeanConfig {
|
||
@Bean
|
||
public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
|
||
return new PostgresAlertRuleRepository(jdbc, om);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 5: Run — PASS**.
|
||
|
||
- [ ] **Step 6: GitNexus detect_changes + commit**
|
||
|
||
```bash
|
||
# Verify scope
|
||
# gitnexus_detect_changes({scope: "staged"})
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \
|
||
cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \
|
||
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
|
||
git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)"
|
||
```
|
||
|
||
### Task 9: `PostgresAlertInstanceRepository`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/storage/PostgresAlertInstanceRepository.java`
|
||
- Test: `.../alerting/storage/PostgresAlertInstanceRepositoryIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** covering: save/findById, findOpenForRule (filter `state IN ('PENDING','FIRING','ACKNOWLEDGED')`), listForInbox with user/group/role filters (seed 3 instances: one targeting user, one targeting group, one targeting role; assert listForInbox returns all three for a user in those groups/roles), countUnreadForUser (uses LEFT JOIN `alert_reads`), ack, resolve, deleteResolvedBefore.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — same RowMapper pattern as Task 7. Key queries:
|
||
|
||
```sql
|
||
-- findOpenForRule
|
||
SELECT * FROM alert_instances
|
||
WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')
|
||
ORDER BY fired_at DESC LIMIT 1;
|
||
|
||
-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders)
|
||
SELECT * FROM alert_instances
|
||
WHERE environment_id = ?
|
||
AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED')
|
||
AND (
|
||
? = ANY(target_user_ids)
|
||
OR target_group_ids && ?::uuid[]
|
||
OR target_role_names && ?::text[]
|
||
)
|
||
ORDER BY fired_at DESC LIMIT ?;
|
||
|
||
-- countUnreadForUser
|
||
SELECT count(*) FROM alert_instances ai
|
||
WHERE ai.environment_id = ?
|
||
AND ai.state IN ('FIRING','ACKNOWLEDGED')
|
||
AND (
|
||
? = ANY(ai.target_user_ids)
|
||
OR ai.target_group_ids && ?::uuid[]
|
||
OR ai.target_role_names && ?::text[]
|
||
)
|
||
AND NOT EXISTS (
|
||
SELECT 1 FROM alert_reads ar
|
||
WHERE ar.alert_instance_id = ai.id AND ar.user_id = ?
|
||
);
|
||
```
|
||
|
||
Array binding via `connection.createArrayOf("uuid", uuids)` / `createArrayOf("text", names)` inside a `ConnectionCallback`.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java
|
||
git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries"
|
||
```
|
||
|
||
### Task 10: `PostgresAlertSilenceRepository`, `PostgresAlertNotificationRepository`, `PostgresAlertReadRepository`
|
||
|
||
**Files:**
|
||
- Create: three repositories under `.../alerting/storage/`
|
||
- Test: one IT per repository in `.../alerting/storage/`
|
||
|
||
- [ ] **Step 1: Write all three failing ITs** (one file each). Cover:
|
||
- `Silence`: save/findById, listActive filters by `now BETWEEN starts_at AND ends_at`, delete.
|
||
- `Notification`: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + `next_attempt_at`, markDelivered + markFailed transition status, deleteSettledBefore purges `DELIVERED` + `FAILED`.
|
||
- `Read`: markRead is idempotent (uses `ON CONFLICT DO NOTHING`), bulkMarkRead handles empty list.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim:
|
||
|
||
```sql
|
||
UPDATE alert_notifications
|
||
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
|
||
WHERE id IN (
|
||
SELECT id FROM alert_notifications
|
||
WHERE status = 'PENDING'
|
||
AND next_attempt_at <= now()
|
||
AND (claimed_until IS NULL OR claimed_until < now())
|
||
ORDER BY next_attempt_at
|
||
LIMIT ?
|
||
FOR UPDATE SKIP LOCKED
|
||
)
|
||
RETURNING *;
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java
|
||
git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads"
|
||
```
|
||
|
||
### Task 11: Wire all alerting repositories in `AlertingBeanConfig`
|
||
|
||
**Files:**
|
||
- Modify: `.../alerting/config/AlertingBeanConfig.java`
|
||
|
||
- [ ] **Step 1: Add beans for the remaining repositories**
|
||
|
||
```java
|
||
@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
|
||
return new PostgresAlertInstanceRepository(jdbc, om);
|
||
}
|
||
@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
|
||
return new PostgresAlertSilenceRepository(jdbc, om);
|
||
}
|
||
@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
|
||
return new PostgresAlertNotificationRepository(jdbc, om);
|
||
}
|
||
@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) {
|
||
return new PostgresAlertReadRepository(jdbc);
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Verify compile + existing ITs still pass**
|
||
|
||
```bash
|
||
mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT'
|
||
```
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
|
||
git commit -m "feat(alerting): wire all alerting repository beans"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 4 — ClickHouse reads: new count methods and projections
|
||
|
||
### Task 12: Add `ClickHouseLogStore.countLogs(LogSearchRequest)`
|
||
|
||
**Files:**
|
||
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java`
|
||
|
||
- [ ] **Step 1: GitNexus impact check**
|
||
|
||
Run `gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"})`. Expected callers: `LogQueryController`, `ContainerLogForwarder`, `ClickHouseConfig`. Adding a method is non-breaking — no downstream callers affected.
|
||
|
||
- [ ] **Step 2: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.app.search;
|
||
|
||
import com.cameleer.server.app.AbstractPostgresIT;
|
||
import com.cameleer.server.core.search.LogSearchRequest;
|
||
import org.junit.jupiter.api.Test;
|
||
import org.springframework.beans.factory.annotation.Autowired;
|
||
|
||
import java.time.Instant;
|
||
import java.util.List;
|
||
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class ClickHouseLogStoreCountIT extends AbstractPostgresIT {
|
||
|
||
@Autowired ClickHouseLogStore store;
|
||
|
||
@Test
|
||
void countLogsRespectsLevelPatternAndWindow() {
|
||
// Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min
|
||
// (seed helper uses existing `indexBatch` path)
|
||
long count = store.countLogs(new LogSearchRequest(
|
||
/* environment */ "dev",
|
||
/* application */ "orders",
|
||
/* agentId */ null,
|
||
/* exchangeId */ null,
|
||
/* logger */ null,
|
||
/* sources */ List.of(),
|
||
/* levels */ List.of("ERROR"),
|
||
/* q */ "TimeoutException",
|
||
/* from */ Instant.now().minusSeconds(300),
|
||
/* to */ Instant.now(),
|
||
/* cursor */ null,
|
||
/* limit */ 100,
|
||
/* sort */ "desc"
|
||
));
|
||
assertThat(count).isEqualTo(3);
|
||
}
|
||
}
|
||
```
|
||
|
||
(Adjust `LogSearchRequest` constructor to the actual record signature — check `cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java` for exact order.)
|
||
|
||
- [ ] **Step 3: Run — FAIL**.
|
||
|
||
- [ ] **Step 4: Implement the method**
|
||
|
||
In `ClickHouseLogStore.java`, add a new public method. Reuse the WHERE-clause builder already used by `search(LogSearchRequest)`, but:
|
||
- No `FINAL`.
|
||
- Skip cursor, limit, sort.
|
||
- `SELECT count() FROM logs WHERE <tenant + env + app + level IN (...) + logger + q LIKE + timestamp BETWEEN>`.
|
||
- Include the `tenant_id = ?` predicate.
|
||
|
||
```java
|
||
public long countLogs(LogSearchRequest request) {
|
||
StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
|
||
List<Object> args = new ArrayList<>();
|
||
args.add(tenantId);
|
||
args.add(Timestamp.from(request.from()));
|
||
args.add(Timestamp.from(request.to()));
|
||
if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); }
|
||
if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); }
|
||
// … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId …
|
||
String sql = "SELECT count() FROM logs WHERE " + where; // NO FINAL
|
||
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
|
||
return n == null ? 0L : n;
|
||
}
|
||
```
|
||
|
||
(Imports: `java.sql.Timestamp`, `java.util.ArrayList`.)
|
||
|
||
- [ ] **Step 5: Run — PASS**.
|
||
|
||
- [ ] **Step 6: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
|
||
git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator"
|
||
```
|
||
|
||
### Task 13: Add `ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)`
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java`
|
||
- Modify: `cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java`
|
||
|
||
- [ ] **Step 1: GitNexus impact check**
|
||
|
||
Run `gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"})`. Additive method — no downstream breakage.
|
||
|
||
- [ ] **Step 2: Create `AlertMatchSpec` record**
|
||
|
||
```java
|
||
package com.cameleer.server.core.alerting;
|
||
|
||
import java.time.Instant;
|
||
import java.util.Map;
|
||
|
||
/** Specification for alerting-specific execution counting.
|
||
* Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL.
|
||
* All fields except tenant/env/from/to are nullable filters. */
|
||
public record AlertMatchSpec(
|
||
String tenantId,
|
||
String environment,
|
||
String applicationId, // nullable
|
||
String routeId, // nullable
|
||
String status, // "FAILED" / "SUCCESS" / null
|
||
Map<String, String> attributes, // exact match on execution attribute key=value
|
||
Instant from,
|
||
Instant to,
|
||
Instant after // nullable; used by PER_EXCHANGE to advance cursor
|
||
) {
|
||
public AlertMatchSpec {
|
||
attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 3: Write the failing test** — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches.
|
||
|
||
- [ ] **Step 4: Run — FAIL**.
|
||
|
||
- [ ] **Step 5: Implement on `ClickHouseSearchIndex`**
|
||
|
||
```java
|
||
public long countExecutionsForAlerting(AlertMatchSpec spec) {
|
||
StringBuilder where = new StringBuilder(
|
||
"tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?");
|
||
List<Object> args = new ArrayList<>();
|
||
args.add(spec.tenantId());
|
||
args.add(spec.environment());
|
||
args.add(Timestamp.from(spec.from()));
|
||
args.add(Timestamp.from(spec.to()));
|
||
if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); }
|
||
if (spec.routeId() != null) { where.append(" AND route_id = ?"); args.add(spec.routeId()); }
|
||
if (spec.status() != null) { where.append(" AND status = ?"); args.add(spec.status()); }
|
||
if (spec.after() != null) {
|
||
where.append(" AND start_time > ?");
|
||
args.add(Timestamp.from(spec.after()));
|
||
}
|
||
// attribute filters: use Map column access — pattern matches existing search() impl
|
||
for (var e : spec.attributes().entrySet()) {
|
||
where.append(" AND attributes[?] = ?");
|
||
args.add(e.getKey());
|
||
args.add(e.getValue());
|
||
}
|
||
String sql = "SELECT count() FROM executions WHERE " + where; // NO FINAL
|
||
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
|
||
return n == null ? 0L : n;
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 6: Run — PASS**.
|
||
|
||
- [ ] **Step 7: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \
|
||
cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
|
||
git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator"
|
||
```
|
||
|
||
### Task 14: ClickHouse projections migration
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql`
|
||
- Modify: the schema initializer invocation site (likely `ClickHouseConfig` or `ClickHouseSchemaInitializer`) to also run this file on startup.
|
||
|
||
- [ ] **Step 1: Write the SQL file**
|
||
|
||
```sql
|
||
-- Additive, idempotent. Safe to drop + rebuild with no data loss.
|
||
ALTER TABLE executions
|
||
ADD PROJECTION IF NOT EXISTS alerting_app_status
|
||
(SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time));
|
||
|
||
ALTER TABLE executions
|
||
ADD PROJECTION IF NOT EXISTS alerting_route_status
|
||
(SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time));
|
||
|
||
ALTER TABLE logs
|
||
ADD PROJECTION IF NOT EXISTS alerting_app_level
|
||
(SELECT * ORDER BY (tenant_id, environment, application, level, timestamp));
|
||
|
||
ALTER TABLE agent_metrics
|
||
ADD PROJECTION IF NOT EXISTS alerting_instance_metric
|
||
(SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at));
|
||
|
||
ALTER TABLE executions MATERIALIZE PROJECTION alerting_app_status;
|
||
ALTER TABLE executions MATERIALIZE PROJECTION alerting_route_status;
|
||
ALTER TABLE logs MATERIALIZE PROJECTION alerting_app_level;
|
||
ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric;
|
||
```
|
||
|
||
(Adjust table column names to match real `init.sql` — confirm `application` vs `application_id` on the `logs` and `agent_metrics` tables.)
|
||
|
||
- [ ] **Step 2: Hook into `ClickHouseSchemaInitializer`**
|
||
|
||
Find the initializer and add a second invocation:
|
||
|
||
```java
|
||
runIdempotent("clickhouse/init.sql");
|
||
runIdempotent("clickhouse/alerting_projections.sql");
|
||
```
|
||
|
||
- [ ] **Step 3: Add a smoke IT**
|
||
|
||
```java
|
||
@Test
|
||
void projectionsExistAfterStartup() {
|
||
var names = jdbcTemplate.queryForList(
|
||
"SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')",
|
||
String.class);
|
||
assertThat(names).contains(
|
||
"alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric");
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \
|
||
cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java
|
||
git commit -m "feat(alerting): ClickHouse projections for alerting read paths"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 5 — Mustache templating and silence matching
|
||
|
||
### Task 15: Add JMustache dependency
|
||
|
||
**Files:**
|
||
- Modify: `cameleer-server-core/pom.xml`
|
||
|
||
- [ ] **Step 1: Add dependency**
|
||
|
||
```xml
|
||
<dependency>
|
||
<groupId>com.samskivert</groupId>
|
||
<artifactId>jmustache</artifactId>
|
||
<version>1.16</version>
|
||
</dependency>
|
||
```
|
||
|
||
- [ ] **Step 2: Verify resolve**
|
||
|
||
Run: `mvn -pl cameleer-server-core dependency:resolve`
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-core/pom.xml
|
||
git commit -m "chore(alerting): add jmustache 1.16"
|
||
```
|
||
|
||
### Task 16: `MustacheRenderer`
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java`
|
||
- Test: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.notify;
|
||
|
||
import org.junit.jupiter.api.Test;
|
||
import java.util.Map;
|
||
import static org.assertj.core.api.Assertions.assertThat;
|
||
|
||
class MustacheRendererTest {
|
||
|
||
private final MustacheRenderer r = new MustacheRenderer();
|
||
|
||
@Test
|
||
void rendersSimpleTemplate() {
|
||
String out = r.render("Hello {{name}}", Map.of("name", "world"));
|
||
assertThat(out).isEqualTo("Hello world");
|
||
}
|
||
|
||
@Test
|
||
void rendersNestedPath() {
|
||
String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL")));
|
||
assertThat(out).isEqualTo("CRITICAL");
|
||
}
|
||
|
||
@Test
|
||
void missingVariableRendersLiteral() {
|
||
String out = r.render("{{missing.path}}", Map.of());
|
||
assertThat(out).isEqualTo("{{missing.path}}");
|
||
}
|
||
|
||
@Test
|
||
void malformedTemplateReturnsRawWithWarn() {
|
||
String out = r.render("{{unclosed", Map.of("unclosed","x"));
|
||
assertThat(out).isEqualTo("{{unclosed");
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
package com.cameleer.server.app.alerting.notify;
|
||
|
||
import com.samskivert.mustache.Mustache;
|
||
import com.samskivert.mustache.Template;
|
||
import org.slf4j.Logger;
|
||
import org.slf4j.LoggerFactory;
|
||
import org.springframework.stereotype.Component;
|
||
|
||
import java.util.Map;
|
||
|
||
@Component
|
||
public class MustacheRenderer {
|
||
|
||
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
|
||
|
||
private final Mustache.Compiler compiler = Mustache.compiler()
|
||
.nullValue("")
|
||
.emptyStringIsFalse(true)
|
||
.defaultValue(null) // null triggers MissingContext -> we intercept below
|
||
.escapeHTML(false);
|
||
|
||
public String render(String template, Map<String, Object> context) {
|
||
if (template == null) return "";
|
||
try {
|
||
Template t = compiler.compile(template);
|
||
return t.execute(new LiteralFallbackContext(context));
|
||
} catch (Exception e) {
|
||
log.warn("Mustache render failed for template='{}': {}", abbreviate(template), e.getMessage());
|
||
return template;
|
||
}
|
||
}
|
||
|
||
/** Returns `{{path}}` literal when a variable is missing. */
|
||
private static class LiteralFallbackContext {
|
||
private final Map<String, Object> map;
|
||
LiteralFallbackContext(Map<String, Object> map) { this.map = map; }
|
||
// JMustache uses reflection / Map lookup, so we rely on wrapping the missing-value callback:
|
||
// easiest approach: compile with a custom `Mustache.Compiler.Loader` and intercept resolution.
|
||
// Simpler: post-process the output to detect unresolved `{{}}` sections → not possible after render.
|
||
// Alternative: pre-flight — scan template tokens against context and replace unresolved tokens
|
||
// with the literal before compilation. Use this simple approach:
|
||
}
|
||
}
|
||
```
|
||
|
||
Simpler implementation (ships for v1):
|
||
|
||
```java
|
||
@Component
|
||
public class MustacheRenderer {
|
||
|
||
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
|
||
private static final java.util.regex.Pattern TOKEN =
|
||
java.util.regex.Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}");
|
||
|
||
private final Mustache.Compiler compiler = Mustache.compiler()
|
||
.defaultValue("")
|
||
.escapeHTML(false);
|
||
|
||
public String render(String template, Map<String, Object> context) {
|
||
if (template == null) return "";
|
||
String resolved = preResolve(template, context);
|
||
try {
|
||
return compiler.compile(resolved).execute(context);
|
||
} catch (Exception e) {
|
||
log.warn("Mustache render failed: {}", e.getMessage());
|
||
return template;
|
||
}
|
||
}
|
||
|
||
/** Replaces `{{missing.path}}` with the literal so Mustache sees a non-tag string. */
|
||
private String preResolve(String template, Map<String, Object> context) {
|
||
var m = TOKEN.matcher(template);
|
||
var sb = new StringBuilder();
|
||
while (m.find()) {
|
||
String path = m.group(1);
|
||
if (resolvePath(context, path) == null) {
|
||
m.appendReplacement(sb, java.util.regex.Matcher.quoteReplacement("{{" + path + "}}"));
|
||
// Replace the {{}} with {{{ literal }}} once we escape it — but jmustache will not re-process.
|
||
// Simpler: just wrap in a triple-brace or surround with a marker. For v1 we skip the double-expand:
|
||
// we return the LITERAL inside a section {{#_literal_123}}... so preResolve returns a string
|
||
// that Mustache will not modify. Concrete approach:
|
||
}
|
||
}
|
||
m.appendTail(sb);
|
||
return sb.toString();
|
||
}
|
||
|
||
private Object resolvePath(Map<String, Object> ctx, String path) {
|
||
Object cur = ctx;
|
||
for (String seg : path.split("\\.")) {
|
||
if (!(cur instanceof Map<?,?> m)) return null;
|
||
cur = m.get(seg);
|
||
if (cur == null) return null;
|
||
}
|
||
return cur;
|
||
}
|
||
}
|
||
```
|
||
|
||
**Engineer note:** Prefer a pre-compile token substitution that replaces `{{missing.path}}` with a literal that Mustache renders as-is. One working approach: write a custom `Mustache.VariableFetcher` via `compiler.withFormatter(...)` — but JMustache's `Mustache.Compiler#withCollector()` is easier. Confirm during implementation and adjust this task; the tests in Step 1 lock the contract. If JMustache's API makes missing-variable fallback awkward, fall back to a regex-based substitutor that does `{{` → `⟦MUSTACHE_LITERAL:path⟧` for missing paths, then post-replace after render. The contract is: **unresolved `{{x}}` renders as literal `{{x}}`**.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
|
||
git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars"
|
||
```
|
||
|
||
### Task 17: `NotificationContextBuilder`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/NotificationContextBuilder.java`
|
||
- Test: `.../alerting/notify/NotificationContextBuilderTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** covering:
|
||
- env / rule / alert subtrees always present
|
||
- conditional trees: `exchange.*` present only for EXCHANGE_MATCH, `log.*` only for LOG_PATTERN, etc.
|
||
- `alert.link` uses the configured `cameleer.server.ui-origin` prefix if present, else `/alerts/inbox/{id}`.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — pure static `Map<String,Object> build(AlertRule, AlertInstance, Environment, String uiOrigin)`.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
|
||
git commit -m "feat(alerting): NotificationContextBuilder for template context maps"
|
||
```
|
||
|
||
### Task 18: `SilenceMatcher` evaluator
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/SilenceMatcherService.java` (named to avoid clash with core record `SilenceMatcher`)
|
||
- Test: `.../alerting/notify/SilenceMatcherServiceTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** covering truth table:
|
||
- Wildcard matcher → matches any instance.
|
||
- Matcher with `ruleId` only → matches only instances with that rule.
|
||
- Multiple fields → AND logic.
|
||
- Active-window check at notification time (not at eval time).
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
@Component
|
||
public class SilenceMatcherService {
|
||
|
||
public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) {
|
||
if (m.ruleId() != null && !m.ruleId().equals(instance.ruleId())) return false;
|
||
if (m.severity()!= null && m.severity() != instance.severity()) return false;
|
||
if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false;
|
||
if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false;
|
||
if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false;
|
||
return true;
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java
|
||
git commit -m "feat(alerting): silence matcher for notification-time dispatch"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 6 — Condition evaluators
|
||
|
||
All six evaluators share this shape:
|
||
|
||
```java
|
||
public sealed interface ConditionEvaluator<C extends AlertCondition>
|
||
permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator,
|
||
DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator {
|
||
|
||
ConditionKind kind();
|
||
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
|
||
}
|
||
```
|
||
|
||
Supporting types (create these in Task 19 before implementing individual evaluators).
|
||
|
||
### Task 19: `EvalContext`, `EvalResult`, `TickCache`, `PerKindCircuitBreaker`, `ConditionEvaluator` interface
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/EvalContext.java`, `EvalResult.java`, `TickCache.java`, `PerKindCircuitBreaker.java`, `ConditionEvaluator.java`
|
||
- Test: `.../alerting/eval/TickCacheTest.java`, `PerKindCircuitBreakerTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing tests**
|
||
|
||
```java
|
||
// TickCacheTest.java
|
||
@Test
|
||
void getOrComputeCachesWithinTick() {
|
||
var cache = new TickCache();
|
||
int n = cache.getOrCompute("k", () -> 42);
|
||
int m = cache.getOrCompute("k", () -> 43);
|
||
assertThat(n).isEqualTo(42);
|
||
assertThat(m).isEqualTo(42); // cached
|
||
}
|
||
|
||
// PerKindCircuitBreakerTest.java
|
||
@Test
|
||
void opensAfterFailThreshold() {
|
||
var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...));
|
||
for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE);
|
||
assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue();
|
||
}
|
||
|
||
@Test
|
||
void closesAfterCooldown() { /* advance clock beyond cooldown window */ }
|
||
```
|
||
|
||
- [ ] **Step 2: Implement**
|
||
|
||
```java
|
||
// EvalContext.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
import java.time.Instant;
|
||
public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
|
||
```
|
||
|
||
```java
|
||
// EvalResult.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
import java.util.Map;
|
||
|
||
public sealed interface EvalResult {
|
||
record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
|
||
public Firing { context = context == null ? Map.of() : Map.copyOf(context); }
|
||
}
|
||
record Clear() implements EvalResult {
|
||
public static final Clear INSTANCE = new Clear();
|
||
}
|
||
record Error(Throwable cause) implements EvalResult {}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// TickCache.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
import java.util.concurrent.ConcurrentHashMap;
|
||
import java.util.function.Supplier;
|
||
|
||
public class TickCache {
|
||
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
|
||
@SuppressWarnings("unchecked")
|
||
public <T> T getOrCompute(String key, Supplier<T> supplier) {
|
||
return (T) map.computeIfAbsent(key, k -> supplier.get());
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// PerKindCircuitBreaker.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
|
||
import com.cameleer.server.core.alerting.ConditionKind;
|
||
|
||
import java.time.Clock;
|
||
import java.time.Duration;
|
||
import java.time.Instant;
|
||
import java.util.*;
|
||
import java.util.concurrent.ConcurrentHashMap;
|
||
|
||
public class PerKindCircuitBreaker {
|
||
|
||
private record State(Deque<Instant> failures, Instant openUntil) {}
|
||
|
||
private final int threshold;
|
||
private final Duration window;
|
||
private final Duration cooldown;
|
||
private final Clock clock;
|
||
private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();
|
||
|
||
public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
|
||
this.threshold = threshold;
|
||
this.window = Duration.ofSeconds(windowSeconds);
|
||
this.cooldown = Duration.ofSeconds(cooldownSeconds);
|
||
this.clock = clock;
|
||
}
|
||
|
||
public void recordFailure(ConditionKind kind) {
|
||
byKind.compute(kind, (k, s) -> {
|
||
var deque = (s == null) ? new ArrayDeque<Instant>() : new ArrayDeque<>(s.failures());
|
||
Instant now = Instant.now(clock);
|
||
Instant cutoff = now.minus(window);
|
||
while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
|
||
deque.addLast(now);
|
||
Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null;
|
||
return new State(deque, openUntil);
|
||
});
|
||
}
|
||
|
||
public boolean isOpen(ConditionKind kind) {
|
||
State s = byKind.get(kind);
|
||
return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
|
||
}
|
||
|
||
public void recordSuccess(ConditionKind kind) {
|
||
byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
|
||
}
|
||
}
|
||
```
|
||
|
||
```java
|
||
// ConditionEvaluator.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
|
||
import com.cameleer.server.core.alerting.*;
|
||
|
||
public interface ConditionEvaluator<C extends AlertCondition> {
|
||
ConditionKind kind();
|
||
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
|
||
}
|
||
```
|
||
|
||
(`sealed permits …` is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's `switch` over `ConditionKind`.)
|
||
|
||
- [ ] **Step 3: Run — PASS**.
|
||
|
||
- [ ] **Step 4: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/
|
||
git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)"
|
||
```
|
||
|
||
### Task 20: `AgentStateEvaluator`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/AgentStateEvaluator.java`
|
||
- Test: `.../alerting/eval/AgentStateEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
@Test
|
||
void firesWhenAnyAgentInTargetStateForScope() {
|
||
var registry = mock(AgentRegistryService.class);
|
||
when(registry.findAll()).thenReturn(List.of(
|
||
new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(),
|
||
AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null)
|
||
));
|
||
var eval = new AgentStateEvaluator(registry);
|
||
var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60));
|
||
EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule,
|
||
new EvalContext("default", Instant.now(), new TickCache()));
|
||
assertThat(r).isInstanceOf(EvalResult.Firing.class);
|
||
}
|
||
|
||
@Test
|
||
void clearWhenNoMatchingAgents() { /* ... */ }
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
@Component
|
||
public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {
|
||
|
||
private final AgentRegistryService registry;
|
||
|
||
public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; }
|
||
|
||
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
|
||
|
||
@Override
|
||
public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
|
||
AgentState target = AgentState.valueOf(c.state());
|
||
Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
|
||
List<AgentInfo> hits = registry.findAll().stream()
|
||
.filter(a -> matchesScope(a, c.scope()))
|
||
.filter(a -> a.state() == target)
|
||
.filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
|
||
.toList();
|
||
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
|
||
AgentInfo first = hits.get(0);
|
||
return new EvalResult.Firing(
|
||
(double) hits.size(), null,
|
||
Map.of("agent", Map.of(
|
||
"id", first.instanceId(),
|
||
"name", first.displayName(),
|
||
"state", first.state().name()
|
||
), "app", Map.of("slug", first.applicationId())));
|
||
}
|
||
|
||
private static boolean matchesScope(AgentInfo a, AlertScope s) {
|
||
if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
|
||
if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
|
||
return true;
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java
|
||
git commit -m "feat(alerting): AGENT_STATE evaluator"
|
||
```
|
||
|
||
### Task 21: `DeploymentStateEvaluator`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/DeploymentStateEvaluator.java`
|
||
- Test: `.../alerting/eval/DeploymentStateEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** — `FAILED` deployment for matching app → Firing; `RUNNING` → Clear.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — read via `DeploymentRepository.findByAppId` and `AppService.getByEnvironmentAndSlug`:
|
||
|
||
```java
|
||
@Override
|
||
public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
|
||
App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null);
|
||
if (app == null) return EvalResult.Clear.INSTANCE;
|
||
List<Deployment> current = deploymentRepo.findByAppId(app.id());
|
||
Set<String> wanted = Set.copyOf(c.states());
|
||
var hits = current.stream()
|
||
.filter(d -> wanted.contains(d.status().name()))
|
||
.toList();
|
||
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
|
||
Deployment d = hits.get(0);
|
||
return new EvalResult.Firing((double) hits.size(), null,
|
||
Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()),
|
||
"app", Map.of("slug", app.slug())));
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator"
|
||
```
|
||
|
||
### Task 22: `RouteMetricEvaluator`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/RouteMetricEvaluator.java`
|
||
- Test: `.../alerting/eval/RouteMetricEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** — mock `StatsStore`, seed `ExecutionStats{p99Ms = 2500, ...}` for a scoped call, assert Firing with `currentValue = 2500, threshold = 2000`.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — dispatch on `RouteMetric` enum:
|
||
|
||
```java
|
||
@Override
|
||
public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
|
||
Instant from = ctx.now().minusSeconds(c.windowSeconds());
|
||
Instant to = ctx.now();
|
||
|
||
String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null);
|
||
ExecutionStats stats = (c.scope().routeId() != null)
|
||
? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env)
|
||
: (c.scope().appSlug() != null)
|
||
? statsStore.statsForApp(from, to, c.scope().appSlug(), env)
|
||
: statsStore.stats(from, to, env);
|
||
|
||
double actual = switch (c.metric()) {
|
||
case ERROR_RATE -> errorRate(stats);
|
||
case P95_LATENCY_MS -> stats.p95DurationMs();
|
||
case P99_LATENCY_MS -> stats.p99DurationMs();
|
||
case THROUGHPUT -> stats.totalCount();
|
||
case ERROR_COUNT -> stats.failedCount();
|
||
};
|
||
|
||
boolean fire = switch (c.comparator()) {
|
||
case GT -> actual > c.threshold();
|
||
case GTE -> actual >= c.threshold();
|
||
case LT -> actual < c.threshold();
|
||
case LTE -> actual <= c.threshold();
|
||
case EQ -> actual == c.threshold();
|
||
};
|
||
|
||
if (!fire) return EvalResult.Clear.INSTANCE;
|
||
return new EvalResult.Firing(actual, c.threshold(),
|
||
Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()),
|
||
"app", Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug())));
|
||
}
|
||
|
||
private double errorRate(ExecutionStats s) {
|
||
long total = s.totalCount();
|
||
return total == 0 ? 0.0 : (double) s.failedCount() / total;
|
||
}
|
||
```
|
||
|
||
(Adjust method names on `ExecutionStats` to match the actual record — use `gitnexus_context({name: "ExecutionStats"})` if unsure.)
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): ROUTE_METRIC evaluator"
|
||
```
|
||
|
||
### Task 23: `LogPatternEvaluator`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/LogPatternEvaluator.java`
|
||
- Test: `.../alerting/eval/LogPatternEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** — mock `ClickHouseLogStore.countLogs` returning 7; threshold 5 → Firing; returning 3 → Clear.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — build a `LogSearchRequest` from the condition + window, delegate to `countLogs`. Use `TickCache` keyed on `(env, app, level, pattern, windowStart, windowEnd)` to coalesce.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): LOG_PATTERN evaluator"
|
||
```
|
||
|
||
### Task 24: `JvmMetricEvaluator`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/JvmMetricEvaluator.java`
|
||
- Test: `.../alerting/eval/JvmMetricEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** — mock `MetricsQueryStore.queryTimeSeries` for `("agent-1", ["heap_used_percent"], from, to, 1)` returning `{heap_used_percent: [Bucket{max=95.0}]}`; assert Firing with currentValue=95.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — aggregate across buckets per `AggregationOp` (MAX/MIN/AVG/LATEST), compare against threshold.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): JVM_METRIC evaluator"
|
||
```
|
||
|
||
### Task 25: `ExchangeMatchEvaluator` (PER_EXCHANGE + COUNT_IN_WINDOW)
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/ExchangeMatchEvaluator.java`
|
||
- Test: `.../alerting/eval/ExchangeMatchEvaluatorTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** — two variants:
|
||
- `COUNT_IN_WINDOW`: mock `ClickHouseSearchIndex.countExecutionsForAlerting` → threshold check.
|
||
- `PER_EXCHANGE`: `eval_state.lastExchangeTs` cursor advancement. Seed 3 matching exchanges; first eval returns all 3 as separate Firings (emit a list? or change signature?). For v1 simplicity, the evaluator returns `EvalResult.Firing` with an internal list of exchange descriptors in the context map; the job handles one-alert-per-exchange fan-out.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement.** The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend `EvalResult` with a `Batch` variant:
|
||
|
||
```java
|
||
record Batch(List<Firing> firings) implements EvalResult { ... }
|
||
```
|
||
|
||
Add this to `EvalResult.java` (Task 19). The job (Task 27) detects Batch and creates one `AlertInstance` per Firing. This keeps non-batched evaluators simple.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 7 — Evaluator job and state transitions
|
||
|
||
### Task 26: `AlertingProperties` + `AlertStateTransitions`
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java`
|
||
- Create: `.../alerting/eval/AlertStateTransitions.java`
|
||
- Test: `.../alerting/eval/AlertStateTransitionsTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** for the pure state machine:
|
||
|
||
```java
|
||
@Test
|
||
void clearWithNoOpenInstanceIsNoOp() {
|
||
var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now);
|
||
assertThat(next).isEmpty();
|
||
}
|
||
|
||
@Test
|
||
void firingWithNoOpenInstanceCreatesPendingIfForDuration() {
|
||
var rule = ruleBuilder().forDurationSeconds(60).build();
|
||
var result = new EvalResult.Firing(2500.0, 2000.0, Map.of());
|
||
var next = AlertStateTransitions.apply(null, result, rule, now);
|
||
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING));
|
||
}
|
||
|
||
@Test
|
||
void firingWithNoForDurationGoesStraightToFiring() {
|
||
var rule = ruleBuilder().forDurationSeconds(0).build();
|
||
var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now);
|
||
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING));
|
||
}
|
||
|
||
@Test
|
||
void pendingPromotesToFiringAfterForDuration() { /* ... */ }
|
||
|
||
@Test
|
||
void firingClearTransitionsToResolved() { /* ... */ }
|
||
|
||
@Test
|
||
void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ }
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
// AlertStateTransitions.java
|
||
package com.cameleer.server.app.alerting.eval;
|
||
|
||
import com.cameleer.server.core.alerting.*;
|
||
import java.time.Instant;
|
||
import java.util.*;
|
||
|
||
public final class AlertStateTransitions {
|
||
|
||
private AlertStateTransitions() {}
|
||
|
||
/** Returns the new/updated AlertInstance, or empty when nothing changes. */
|
||
public static Optional<AlertInstance> apply(
|
||
AlertInstance current, EvalResult result, AlertRule rule, Instant now) {
|
||
|
||
return switch (result) {
|
||
case EvalResult.Clear c -> onClear(current, now);
|
||
case EvalResult.Firing f -> onFiring(current, f, rule, now);
|
||
case EvalResult.Error e -> Optional.empty();
|
||
case EvalResult.Batch b -> Optional.empty(); // batch handled by the job, not here
|
||
};
|
||
}
|
||
|
||
private static Optional<AlertInstance> onFiring(AlertInstance current, EvalResult.Firing f,
|
||
AlertRule rule, Instant now) {
|
||
if (current == null) {
|
||
AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING;
|
||
return Optional.of(newInstance(rule, f, initial, now));
|
||
}
|
||
if (current.state() == AlertState.PENDING) {
|
||
Instant firedAt = current.firedAt();
|
||
if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) {
|
||
return Optional.of(current /* copy with state=FIRING, firedAt=now */);
|
||
}
|
||
return Optional.of(current); // stay PENDING, no mutation
|
||
}
|
||
return Optional.empty(); // already FIRING/ACK — re-notification handled by dispatcher
|
||
}
|
||
|
||
private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
|
||
if (current == null) return Optional.empty();
|
||
if (current.state() == AlertState.RESOLVED) return Optional.empty();
|
||
return Optional.of(current /* copy with state=RESOLVED, resolvedAt=now */);
|
||
}
|
||
|
||
private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
|
||
// ... construct from rule snapshot + context; title/message rendered by the job
|
||
throw new UnsupportedOperationException("stub");
|
||
}
|
||
}
|
||
```
|
||
|
||
Flesh out the `.withState(...)` / `.withResolvedAt(...)` helpers on `AlertInstance` (add wither-style methods returning new records) as part of this task.
|
||
|
||
```java
|
||
// AlertingProperties.java
|
||
package com.cameleer.server.app.alerting.config;
|
||
|
||
import org.springframework.boot.context.properties.ConfigurationProperties;
|
||
|
||
@ConfigurationProperties("cameleer.server.alerting")
|
||
public record AlertingProperties(
|
||
Integer evaluatorTickIntervalMs,
|
||
Integer evaluatorBatchSize,
|
||
Integer claimTtlSeconds,
|
||
Integer notificationTickIntervalMs,
|
||
Integer notificationBatchSize,
|
||
Boolean inTickCacheEnabled,
|
||
Integer circuitBreakerFailThreshold,
|
||
Integer circuitBreakerWindowSeconds,
|
||
Integer circuitBreakerCooldownSeconds,
|
||
Integer eventRetentionDays,
|
||
Integer notificationRetentionDays,
|
||
Integer webhookTimeoutMs,
|
||
Integer webhookMaxAttempts) {
|
||
|
||
public int effectiveEvaluatorTickIntervalMs() {
|
||
int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
|
||
return Math.max(5000, raw); // floor
|
||
}
|
||
public int effectiveEvaluatorBatchSize() { return evaluatorBatchSize == null ? 20 : evaluatorBatchSize; }
|
||
public int effectiveClaimTtlSeconds() { return claimTtlSeconds == null ? 30 : claimTtlSeconds; }
|
||
public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; }
|
||
public int effectiveNotificationBatchSize() { return notificationBatchSize == null ? 50 : notificationBatchSize; }
|
||
public int effectiveEventRetentionDays() { return eventRetentionDays == null ? 90 : eventRetentionDays; }
|
||
public int effectiveNotificationRetentionDays() { return notificationRetentionDays == null ? 30 : notificationRetentionDays; }
|
||
public int effectiveWebhookTimeoutMs() { return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs; }
|
||
public int effectiveWebhookMaxAttempts() { return webhookMaxAttempts == null ? 3 : webhookMaxAttempts; }
|
||
public int cbFailThreshold() { return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold; }
|
||
public int cbWindowSeconds() { return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds; }
|
||
public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; }
|
||
}
|
||
```
|
||
|
||
Register via `@ConfigurationPropertiesScan` or explicit `@EnableConfigurationProperties(AlertingProperties.class)` in `AlertingBeanConfig`. Also clamp-with-WARN if `evaluatorTickIntervalMs < 5000` at startup.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \
|
||
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
|
||
git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine"
|
||
```
|
||
|
||
### Task 27: `AlertEvaluatorJob`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/eval/AlertEvaluatorJob.java`
|
||
- Test: `.../alerting/eval/AlertEvaluatorJobIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing integration test** (uses real PG + mocked evaluators):
|
||
|
||
```java
|
||
@Test
|
||
void claimDueRuleFireResolveCycle() throws Exception {
|
||
// seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance.
|
||
// flip the mock to return Firing -> one AlertInstance in FIRING state.
|
||
// flip back to Clear -> instance transitions to RESOLVED.
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
@Component
|
||
public class AlertEvaluatorJob implements SchedulingConfigurer {
|
||
|
||
private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);
|
||
|
||
private final AlertingProperties props;
|
||
private final AlertRuleRepository ruleRepo;
|
||
private final AlertInstanceRepository instanceRepo;
|
||
private final AlertNotificationRepository notificationRepo;
|
||
private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
|
||
private final PerKindCircuitBreaker circuitBreaker;
|
||
private final MustacheRenderer renderer;
|
||
private final NotificationContextBuilder contextBuilder;
|
||
private final String instanceId;
|
||
private final String tenantId;
|
||
private final AlertingMetrics metrics;
|
||
private final Clock clock;
|
||
|
||
public AlertEvaluatorJob(/* ...all above... */) { /* assign */ }
|
||
|
||
@Override
|
||
public void configureTasks(ScheduledTaskRegistrar registrar) {
|
||
registrar.addFixedDelayTask(this::tick, props.effectiveEvaluatorTickIntervalMs());
|
||
}
|
||
|
||
void tick() {
|
||
List<AlertRule> claimed = ruleRepo.claimDueRules(
|
||
instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds());
|
||
|
||
TickCache cache = new TickCache();
|
||
EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);
|
||
|
||
for (AlertRule rule : claimed) {
|
||
if (circuitBreaker.isOpen(rule.conditionKind())) {
|
||
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
|
||
continue;
|
||
}
|
||
try {
|
||
EvalResult result = evaluateSafely(rule, ctx);
|
||
applyResult(rule, result);
|
||
circuitBreaker.recordSuccess(rule.conditionKind());
|
||
} catch (Exception e) {
|
||
circuitBreaker.recordFailure(rule.conditionKind());
|
||
metrics.evalError(rule.conditionKind(), rule.id());
|
||
log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
|
||
} finally {
|
||
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
|
||
}
|
||
}
|
||
}
|
||
|
||
@SuppressWarnings({"rawtypes","unchecked"})
|
||
private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
|
||
ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
|
||
if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind());
|
||
return evaluator.evaluate(rule.condition(), rule, ctx);
|
||
}
|
||
|
||
private void applyResult(AlertRule rule, EvalResult result) {
|
||
if (result instanceof EvalResult.Batch b) {
|
||
for (EvalResult.Firing f : b.firings()) applyFiring(rule, f);
|
||
return;
|
||
}
|
||
AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
|
||
AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> {
|
||
AlertInstance persisted = instanceRepo.save(
|
||
enrichTitleMessage(rule, next, result));
|
||
if (next.state() == AlertState.FIRING && current == null) {
|
||
enqueueNotifications(rule, persisted);
|
||
}
|
||
});
|
||
}
|
||
|
||
private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ }
|
||
|
||
private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) {
|
||
Map<String,Object> ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null);
|
||
String title = renderer.render(rule.notificationTitleTmpl(), ctx);
|
||
String message = renderer.render(rule.notificationMessageTmpl(), ctx);
|
||
return instance /* .withTitle(title).withMessage(message) */;
|
||
}
|
||
|
||
private void enqueueNotifications(AlertRule rule, AlertInstance instance) {
|
||
for (WebhookBinding w : rule.webhooks()) {
|
||
Map<String,Object> payload = /* context-builder + body override */ Map.of();
|
||
notificationRepo.save(new AlertNotification(
|
||
UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(),
|
||
NotificationStatus.PENDING, 0, Instant.now(clock),
|
||
null, null, null, null, payload, null, Instant.now(clock)));
|
||
}
|
||
}
|
||
|
||
private void reschedule(AlertRule rule, Instant next) {
|
||
ruleRepo.releaseClaim(rule.id(), next, rule.evalState());
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \
|
||
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
|
||
git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 8 — Notification dispatch
|
||
|
||
### Task 28: `HmacSigner`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/HmacSigner.java`
|
||
- Test: `.../alerting/notify/HmacSignerTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test**
|
||
|
||
```java
|
||
@Test
|
||
void signsBodyWithSha256Hmac() {
|
||
String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8));
|
||
// precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f...
|
||
assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f..."); // replace with real hex
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — `javax.crypto.Mac.getInstance("HmacSHA256")`, `HexFormat.of().formatHex(...)`.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): HmacSigner for webhook signature"
|
||
```
|
||
|
||
### Task 29: `WebhookDispatcher`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/WebhookDispatcher.java`
|
||
- Test: `.../alerting/notify/WebhookDispatcherIT.java` (WireMock)
|
||
|
||
- [ ] **Step 1: Write the failing IT** covering:
|
||
- 2xx → returns DELIVERED with status + snippet.
|
||
- 4xx → returns FAILED immediately.
|
||
- 5xx → returns RETRY with exponential backoff.
|
||
- Network timeout → RETRY.
|
||
- HMAC header present when `hmacSecret != null`.
|
||
- TLS trust-all config works against WireMock HTTPS.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement**
|
||
|
||
```java
|
||
@Component
|
||
public class WebhookDispatcher {
|
||
|
||
public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {}
|
||
|
||
private final OutboundHttpClientFactory clientFactory;
|
||
private final SecretCipher cipher;
|
||
private final HmacSigner signer;
|
||
private final MustacheRenderer renderer;
|
||
private final AlertingProperties props;
|
||
private final ObjectMapper om;
|
||
|
||
public WebhookDispatcher(/* ... */) { /* assign */ }
|
||
|
||
public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance,
|
||
OutboundConnection conn, Map<String,Object> context) {
|
||
String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn);
|
||
String body = renderer.render(bodyTmpl, context);
|
||
|
||
var ctx = new OutboundHttpRequestContext(
|
||
conn.tlsTrustMode(), conn.tlsCaPemPaths(),
|
||
Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs()));
|
||
var client = clientFactory.clientFor(ctx);
|
||
|
||
var request = new HttpPost(renderer.render(conn.url(), context));
|
||
request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
|
||
request.setHeader("Content-Type", "application/json");
|
||
|
||
for (var h : conn.defaultHeaders().entrySet()) {
|
||
request.setHeader(h.getKey(), renderer.render(h.getValue(), context));
|
||
}
|
||
if (conn.hmacSecretCiphertext() != null) {
|
||
String secret = cipher.decrypt(conn.hmacSecretCiphertext());
|
||
request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8)));
|
||
}
|
||
|
||
try (var response = client.execute(request)) {
|
||
int code = response.getCode();
|
||
String snippet = snippet(response);
|
||
if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
|
||
if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null);
|
||
return retryOutcome(code, snippet);
|
||
} catch (IOException e) {
|
||
return retryOutcome(-1, e.getMessage());
|
||
}
|
||
}
|
||
|
||
private Outcome retryOutcome(int code, String snippet) {
|
||
// Backoff: 30s, 120s, 300s
|
||
Duration next = Duration.ofSeconds(30); // caller multiplies by attempt
|
||
return new Outcome(null /* caller decides PENDING vs FAILED */, code, snippet, next);
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification"
|
||
```
|
||
|
||
### Task 30: `NotificationDispatchJob`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/NotificationDispatchJob.java`
|
||
- Test: `.../alerting/notify/NotificationDispatchJobIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing IT** — seed a `PENDING` `AlertNotification`; run one tick; WireMock returns 200; assert row transitions to `DELIVERED`. Seed another against 503 → assert `attempts=1`, `next_attempt_at` bumped, still `PENDING`.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — claim-polling loop:
|
||
|
||
```java
|
||
void tick() {
|
||
var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl);
|
||
for (var n : claimed) {
|
||
var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
|
||
if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; }
|
||
|
||
var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow();
|
||
var rule = ruleRepo.findById(instance.ruleId()).orElse(null);
|
||
var context = contextBuilder.build(rule, instance, env, uiOrigin);
|
||
|
||
// silence check
|
||
if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream()
|
||
.anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) {
|
||
instanceRepo.markSilenced(instance.id(), true);
|
||
notificationRepo.markFailed(n.id(), 0, "silenced");
|
||
continue;
|
||
}
|
||
|
||
var outcome = dispatcher.dispatch(n, rule, instance, conn, context);
|
||
if (outcome.status() == NotificationStatus.DELIVERED) {
|
||
notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now());
|
||
} else if (outcome.status() == NotificationStatus.FAILED) {
|
||
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
|
||
} else {
|
||
int attempts = n.attempts() + 1;
|
||
if (attempts >= props.effectiveWebhookMaxAttempts()) {
|
||
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
|
||
} else {
|
||
Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts));
|
||
notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry"
|
||
```
|
||
|
||
### Task 31: `InAppInboxQuery` + server-side 5s memoization
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/notify/InAppInboxQuery.java`
|
||
- Test: `.../alerting/notify/InAppInboxQueryTest.java`
|
||
|
||
- [ ] **Step 1: Write the failing test** covering the path (resolves groups/roles from `RbacService.getEffectiveRolesForUser` + `listGroupsForUser`, delegates to `AlertInstanceRepository.listForInbox`/`countUnreadForUser`, second call within 5s returns cached count).
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — Caffeine-style `ConcurrentHashMap<Key, Entry>` with `Entry(count, expiresAt)`, 5 s TTL per `(envId, userId)`.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 9 — REST controllers
|
||
|
||
### Task 32: `AlertRuleController` + DTOs
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/controller/AlertRuleController.java`
|
||
- Create: DTOs in `.../alerting/dto/`
|
||
- Test: `.../alerting/controller/AlertRuleControllerIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing IT** — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement.** Endpoints (all under `/api/v1/environments/{envSlug}/alerts/rules`, env resolved via `@EnvPath Environment env`):
|
||
|
||
| Method | Path | RBAC |
|
||
|---|---|---|
|
||
| GET | `` | VIEWER+ |
|
||
| POST | `` | OPERATOR+ |
|
||
| GET | `{id}` | VIEWER+ |
|
||
| PUT | `{id}` | OPERATOR+ |
|
||
| DELETE | `{id}` | OPERATOR+ |
|
||
| POST | `{id}/enable` / `{id}/disable` | OPERATOR+ |
|
||
| POST | `{id}/render-preview` | OPERATOR+ |
|
||
| POST | `{id}/test-evaluate` | OPERATOR+ |
|
||
|
||
Key DTOs: `AlertRuleRequest` (with `@Valid AlertConditionDto`), `AlertRuleResponse`, `RenderPreviewRequest/Response`, `TestEvaluateRequest/Response`.
|
||
|
||
On save, validate:
|
||
- Each `WebhookBindingRequest.outboundConnectionId` exists in `outbound_connections` (via `OutboundConnectionService.get(id)` → 422 if 404).
|
||
- Connection is allowed in this env (via `conn.isAllowedInEnvironment(env.id())` → 422 otherwise).
|
||
- SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory).
|
||
|
||
Audit via `auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request)`.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs"
|
||
```
|
||
|
||
### Task 33: `AlertController`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/controller/AlertController.java`, `AlertDto.java`, `UnreadCountResponse.java`
|
||
- Test: `.../alerting/controller/AlertControllerIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing IT** for `GET /alerts`, `GET /alerts/unread-count`, `POST /alerts/{id}/ack`, `POST /alerts/{id}/read`, `POST /alerts/bulk-read`. Assert env isolation (env-A alert not visible from env-B).
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — delegate to `InAppInboxQuery` and `AlertInstanceRepository`. On ack, enforce targeted-or-OPERATOR rule.
|
||
|
||
- [ ] **Step 4: Run — PASS**.
|
||
|
||
- [ ] **Step 5: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): AlertController for inbox + ack + read"
|
||
```
|
||
|
||
### Task 34: `AlertSilenceController`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/controller/AlertSilenceController.java`, `AlertSilenceDto.java`
|
||
- Test: `.../alerting/controller/AlertSilenceControllerIT.java`
|
||
|
||
- [ ] **Step 1–5:** Follow the same pattern. Mutations OPERATOR+, audit `ALERT_SILENCE_CHANGE`. Validate `endsAt > startsAt` at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier).
|
||
|
||
### Task 35: `AlertNotificationController`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/controller/AlertNotificationController.java`
|
||
- Test: `.../alerting/controller/AlertNotificationControllerIT.java`
|
||
|
||
- [ ] **Step 1–5:**
|
||
- `GET /alerts/{id}/notifications` → VIEWER+; returns per-instance outbox rows.
|
||
- `POST /alerts/notifications/{id}/retry` → OPERATOR+; resets `next_attempt_at = now`, `attempts = 0`, `status = PENDING`. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file).
|
||
|
||
- [ ] **Step 6: Update `SecurityConfig` to permit the new paths**
|
||
|
||
In `cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java`:
|
||
|
||
```java
|
||
.requestMatchers(HttpMethod.GET, "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
|
||
.requestMatchers(HttpMethod.POST, "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN")
|
||
```
|
||
|
||
(Class-level `@PreAuthorize` on each controller is authoritative; the path matchers are defence-in-depth.)
|
||
|
||
- [ ] **Step 7: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths"
|
||
```
|
||
|
||
### Task 36: Regenerate OpenAPI schema
|
||
|
||
- [ ] **Step 1: Start backend on :8081** (from the alerting-02 worktree).
|
||
- [ ] **Step 2:** `cd ui && npm run generate-api:live`
|
||
- [ ] **Step 3:** Commit `ui/src/api/schema.d.ts` + `ui/src/api/openapi.json` regen.
|
||
|
||
```bash
|
||
git add ui/src/api/schema.d.ts ui/src/api/openapi.json
|
||
git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints"
|
||
```
|
||
|
||
---
|
||
|
||
## Phase 10 — Retention, metrics, rules, verification
|
||
|
||
### Task 37: `AlertingRetentionJob`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/retention/AlertingRetentionJob.java`
|
||
- Test: `.../alerting/retention/AlertingRetentionJobIT.java`
|
||
|
||
- [ ] **Step 1: Write the failing IT** — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run `cleanup()`; assert only old rows are deleted.
|
||
|
||
- [ ] **Step 2: Run — FAIL**.
|
||
|
||
- [ ] **Step 3: Implement** — `@Scheduled(cron = "0 0 3 * * *")`, cutoffs from `AlertingProperties`, advisory-lock-of-the-day pattern (see `JarRetentionJob.java`).
|
||
|
||
- [ ] **Step 4–5: Run, commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): AlertingRetentionJob daily cleanup"
|
||
```
|
||
|
||
### Task 38: `AlertingMetrics`
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/metrics/AlertingMetrics.java`
|
||
|
||
- [ ] **Step 1: Register metrics** via `MeterRegistry`:
|
||
|
||
```java
|
||
@Component
|
||
public class AlertingMetrics {
|
||
private final MeterRegistry registry;
|
||
public AlertingMetrics(MeterRegistry registry) { this.registry = registry; }
|
||
|
||
public void evalError(ConditionKind kind, UUID ruleId) {
|
||
registry.counter("alerting_eval_errors_total",
|
||
"kind", kind.name(), "rule_id", ruleId.toString()).increment();
|
||
}
|
||
public void circuitOpened(ConditionKind kind) {
|
||
registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment();
|
||
}
|
||
public Timer evalDuration(ConditionKind kind) {
|
||
return registry.timer("alerting_eval_duration_seconds", "kind", kind.name());
|
||
}
|
||
// + gauges via MeterBinder that query repositories
|
||
}
|
||
```
|
||
|
||
- [ ] **Step 2: Wire into `AlertEvaluatorJob` and `PerKindCircuitBreaker`.**
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git commit -m "feat(alerting): observability metrics via micrometer"
|
||
```
|
||
|
||
### Task 39: Update `.claude/rules/app-classes.md` + `core-classes.md`
|
||
|
||
- [ ] **Step 1: Document the new `alerting/` packages** in both rule files. Add a new subsection under `controller/` for the alerting env-scoped controllers. Document the new flat endpoint `/api/v1/alerts/notifications/{id}/retry` in the flat-allow-list with justification "notification IDs are globally unique; matches the `/api/v1/executions/{id}` precedent".
|
||
|
||
- [ ] **Step 2: Commit**
|
||
|
||
```bash
|
||
git add .claude/rules/app-classes.md .claude/rules/core-classes.md
|
||
git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint"
|
||
```
|
||
|
||
### Task 40: `application.yml` defaults + admin guide
|
||
|
||
**Files:**
|
||
- Modify: `cameleer-server-app/src/main/resources/application.yml`
|
||
- Create: `docs/alerting.md`
|
||
|
||
- [ ] **Step 1: Add default stanza**
|
||
|
||
```yaml
|
||
cameleer:
|
||
server:
|
||
alerting:
|
||
evaluator-tick-interval-ms: 5000
|
||
evaluator-batch-size: 20
|
||
claim-ttl-seconds: 30
|
||
notification-tick-interval-ms: 5000
|
||
notification-batch-size: 50
|
||
in-tick-cache-enabled: true
|
||
circuit-breaker-fail-threshold: 5
|
||
circuit-breaker-window-seconds: 30
|
||
circuit-breaker-cooldown-seconds: 60
|
||
event-retention-days: 90
|
||
notification-retention-days: 30
|
||
webhook-timeout-ms: 5000
|
||
webhook-max-attempts: 3
|
||
```
|
||
|
||
- [ ] **Step 2: Write `docs/alerting.md`** — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention).
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md
|
||
git commit -m "docs(alerting): default config + admin guide"
|
||
```
|
||
|
||
### Task 41: Full-lifecycle integration test
|
||
|
||
**Files:**
|
||
- Create: `cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java`
|
||
|
||
- [ ] **Step 1: Write the full-lifecycle IT**
|
||
|
||
Steps in the single test method:
|
||
1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret.
|
||
2. POST a `LOG_PATTERN` rule pointing at `WireMock` via the outbound connection, `forDurationSeconds=0`, `threshold=1`.
|
||
3. Inject a log row into ClickHouse that matches the pattern.
|
||
4. Trigger `AlertEvaluatorJob.tick()` directly.
|
||
5. Assert one `alert_instances` row in FIRING.
|
||
6. Trigger `NotificationDispatchJob.tick()`.
|
||
7. Assert WireMock received one POST with `X-Cameleer-Signature` header + rendered body.
|
||
8. POST `/alerts/{id}/ack` → state ACKNOWLEDGED.
|
||
9. Create a silence matching this rule; fire another tick; assert `silenced=true` on new instance and WireMock received no second request.
|
||
10. Remove the matching log rows, run tick → instance RESOLVED.
|
||
11. DELETE the rule → assert `alert_instances.rule_id = NULL` but `rule_snapshot` still retains rule name.
|
||
|
||
- [ ] **Step 2: Run — PASS** (may need a few iterations of debugging).
|
||
|
||
- [ ] **Step 3: Commit**
|
||
|
||
```bash
|
||
git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
|
||
git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete"
|
||
```
|
||
|
||
### Task 42: Env-isolation + outbound-guard regression tests
|
||
|
||
**Files:**
|
||
- Create: `.../alerting/AlertingEnvIsolationIT.java`, `OutboundConnectionAllowedEnvIT.java`
|
||
|
||
- [ ] **Step 1: Env isolation** — rule in env-A, fire, assert invisible from env-B inbox.
|
||
|
||
- [ ] **Step 2: Outbound guard** — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing `allowed_environment_ids` on the connection while a rule still references it → 409 (this exercises the freshly-wired `rulesReferencing`).
|
||
|
||
- [ ] **Step 3: Run — PASS**.
|
||
|
||
- [ ] **Step 4: Commit**
|
||
|
||
```bash
|
||
git commit -m "test(alerting): env isolation + outbound allowed-env guard"
|
||
```
|
||
|
||
### Task 43: Final verification + GitNexus reindex
|
||
|
||
- [ ] **Step 1: Full build**
|
||
|
||
```bash
|
||
mvn clean verify
|
||
```
|
||
|
||
Expected: All tests pass. Known pre-existing test debt (wrong-JdbcTemplate + shared-context state leaks) may still fail — document any failures that existed before Plan 02 in a commit message "known-pre-existing" note.
|
||
|
||
- [ ] **Step 2: GitNexus reindex**
|
||
|
||
```bash
|
||
npx gitnexus analyze --embeddings
|
||
```
|
||
|
||
- [ ] **Step 3: Manual smoke**
|
||
|
||
Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through:
|
||
- Create an outbound connection to `https://httpbin.org/post`.
|
||
- `curl` the alerting REST API to POST a `LOG_PATTERN` rule.
|
||
- Inject a matching log via `POST /api/v1/data/logs`.
|
||
- Wait 2 eval ticks + 1 notification tick.
|
||
- Confirm: `alert_instances` row in FIRING, `alert_notifications` row DELIVERED with HTTP 200, httpbin shows the body.
|
||
- `curl POST /alerts/{id}/ack` → state ACKNOWLEDGED.
|
||
|
||
- [ ] **Step 4: Nothing to commit if all passes — plan complete**
|
||
|
||
---
|
||
|
||
## Known-incomplete items carried into Plan 03
|
||
|
||
- **UI:** `NotificationBell`, `/alerts/**` pages, `<MustacheEditor />` with variable auto-complete, CMD-K alert/rule sources. Open design question: completion engine choice (CodeMirror 6 vs Monaco vs textarea overlay) still open — see spec §20 #7.
|
||
- **Rule promotion across envs.** Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03.
|
||
- **OIDC retrofit** to use `OutboundHttpClientFactory`. Unchanged from Plan 01 — a separate small follow-up.
|
||
- **TLS summary enrichment** on `/test` endpoint (Plan 01 stubbed as `"TLS"`). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context.
|
||
- **Performance tests.** 500-rule, 5-replica `PerformanceIT` deferred; claim-polling concurrency is covered by Task 7's unit-level test.
|
||
- **Bulk promotion** and **mustache completion `variables` metadata endpoint** (`GET /alerts/rules/template-variables`) — deferred until usage patterns justify.
|
||
- **Rule deletion test debt.** Existing pre-Plan-02 test debt (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in `FlywayMigrationIT` / `ConfigEnvIsolationIT` / `ClickHouseStatsStoreIT`) is orthogonal and should be addressed in a dedicated test-hygiene pass.
|
||
|
||
---
|
||
|
||
## Self-review
|
||
|
||
**Spec coverage** (against `docs/superpowers/specs/2026-04-19-alerting-design.md`):
|
||
|
||
| Spec § | Scope | Covered by |
|
||
|---|---|---|
|
||
| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 20–25 |
|
||
| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 |
|
||
| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 |
|
||
| §2 Rule promotion | **Deferred to Plan 03 (UI)** | — |
|
||
| §2 CMD-K | **Deferred to Plan 03** | — |
|
||
| §2 Configurable cadence, 5 s floor | `AlertingProperties.effective*` | Task 26 |
|
||
| §3 Key decisions | All 14 decisions honoured | — |
|
||
| §4 Module layout | `core/alerting` + `app/alerting/**` | Tasks 3–11, 15–38 |
|
||
| §4 Touchpoints | `countLogs` + `countExecutionsForAlerting` + `AuditCategory` + `SecurityConfig` | Tasks 2, 12, 13, 35 |
|
||
| §5 Data model | V12 migration | Task 1 |
|
||
| §5 Claim-polling queries | `FOR UPDATE SKIP LOCKED` in rule + notification repos | Tasks 7, 10 |
|
||
| §6 Outbound connections wiring | `rulesReferencing` gate | Task 8 (**CRITICAL**) |
|
||
| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 |
|
||
| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | Tasks 28, 29, 30, 31 |
|
||
| §9 Rule promotion | **Deferred** (UI) | — |
|
||
| §10 Cross-cutting HTTP | Reused from Plan 01 | — |
|
||
| §11 API surface | All routes implemented except rule promotion | Tasks 32–36 |
|
||
| §12 CMD-K | **Deferred to Plan 03** | — |
|
||
| §13 UI | **Deferred to Plan 03** | — |
|
||
| §14 Configuration | `AlertingProperties` + `application.yml` | Tasks 26, 40 |
|
||
| §15 Retention | Daily job | Task 37 |
|
||
| §16 Observability (metrics + audit) | Tasks 2, 38 |
|
||
| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | Tasks 32–36, 28, Plan 01 |
|
||
| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 27–31, 41, 42 |
|
||
| §19 Rollout | Dormant-by-default; matching `application.yml` + docs | Task 40 |
|
||
| §20 #1 OIDC alignment | **Deferred** (follow-up) | — |
|
||
| §20 #2 secret encryption | Reused Plan 01 `SecretCipher` | Task 29 |
|
||
| §20 #3 CH migration naming | `alerting_projections.sql` | Task 14 |
|
||
| §20 #6 env-delete cascade audit | PG IT | Task 1 |
|
||
| §20 #7 Mustache completion engine | **Deferred** (UI) | — |
|
||
|
||
**Placeholders:** A handful of steps reference real record fields / method names with `/* … */` markers where the exact name depends on what the existing codebase exposes (`ExecutionStats` metric accessors, `AgentInfo.lastHeartbeat` method name, wither-method signatures on `AlertInstance`). Each is accompanied by a `gitnexus_context({name: ...})` hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time.
|
||
|
||
**Type consistency check:** `AlertRule`, `AlertInstance`, `AlertNotification`, `AlertSilence` field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). `WebhookBinding.id` is used as `alert_notifications.webhook_id` — stable opaque reference. `OutboundConnection.createdBy/updatedBy` types match `users.user_id TEXT` (Plan 01 precedent). `rulesReferencing` signature matches Plan 01's stub `List<UUID> rulesReferencing(UUID)`.
|
||
|
||
**Risks flagged to executor:**
|
||
|
||
1. **Task 16 `MustacheRenderer` missing-variable fallback** is non-trivial in JMustache's default compiler config — implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible.
|
||
2. **Task 12/13** — the SQL dialect for attribute map access on the `executions` table (`attributes[?]`) depends on the actual column type in `init.sql`. If attributes is `Map(String,String)`, the syntax works; if it's stored as JSON string, switch to `JSONExtractString(attributes, ?) = ?`.
|
||
3. **Task 27 `enrichTitleMessage`** depends on `AlertInstance` having wither methods — these are added opportunistically during Task 26 when `AlertStateTransitions` needs them. Don't forget to expose them.
|
||
4. **Claim-polling semantics under schema-per-tenant** — the `?currentSchema=tenant_{id}` JDBC URL routes writes correctly, but the `FOR UPDATE SKIP LOCKED` behaviour is per-schema so cross-tenant locks are irrelevant (correct behaviour). Make sure IT tests run with `cameleer.server.tenant.id=default`.
|
||
5. **Task 41 full-lifecycle test is the canary.** If it fails after each task, pair-program with the failing assertion — the bug is almost always in state transitions or renderer context shape.
|