Alerting — Plan 02 — Backend Implementation
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Deliver the server-side alerting feature described in docs/superpowers/specs/2026-04-19-alerting-design.md — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03.
Architecture: Confined to new alerting/ packages in both cameleer-server-core (pure records + interfaces) and cameleer-server-app (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new countLogs / countExecutionsForAlerting methods, four additive projections). Claim polling with FOR UPDATE SKIP LOCKED makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (rulesReferencing) is populated in this plan; it is the gate that unlocks safe production use of Plan 01.
Tech Stack: Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's OutboundHttpClientFactory, Testcontainers + JUnit 5 + WireMock + AssertJ for tests.
Base branch
Branch Plan 02 off feat/alerting-01-outbound-infra. Plan 02 depends on Plan 01's OutboundConnection domain, OutboundHttpClientFactory bean, SecretCipher, OutboundConnectionServiceImpl.rulesReferencing() stub, the V11 migration, and the OUTBOUND_CONNECTION_CHANGE / OUTBOUND_HTTP_TRUST_CHANGE audit categories. Branching off main is not an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2.
# Execute in a fresh worktree
git fetch origin
git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra
cd .worktrees/alerting-02
mvn clean compile # confirm Plan 01 code compiles as baseline
File Structure
Created — cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/
| File | Responsibility |
|---|---|
| AlertingProperties.java | Not here; see app module. |
| AlertRule.java | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. |
| AlertCondition.java | Sealed interface; Jackson DEDUCTION polymorphism root. |
| RouteMetricCondition.java | Record: scope, metric, comparator, threshold, windowSeconds. |
| ExchangeMatchCondition.java | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. |
| AgentStateCondition.java | Record: scope, state, forSeconds. |
| DeploymentStateCondition.java | Record: scope, states. |
| LogPatternCondition.java | Record: scope, level, pattern, threshold, windowSeconds. |
| JvmMetricCondition.java | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. |
| AlertScope.java | Record: appSlug?, routeId?, agentId? (nullable fields, used by all conditions). |
| ConditionKind.java | Enum mirror of SQL condition_kind_enum. |
| RouteMetric.java, Comparator.java, AggregationOp.java, FireMode.java | Enums used in conditions. |
| AlertSeverity.java | Enum mirror of SQL severity_enum. |
| AlertState.java | Enum mirror of SQL alert_state_enum. |
| AlertInstance.java | Immutable record for alert_instances row. |
| AlertRuleTarget.java | Record for alert_rule_targets row. |
| TargetKind.java | Enum mirror of SQL target_kind_enum. |
| AlertSilence.java | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. |
| SilenceMatcher.java | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. |
| AlertNotification.java | Record for alert_notifications outbox row. |
| NotificationStatus.java | Enum mirror of SQL notification_status_enum. |
| WebhookBinding.java | Record embedded in alert_rules.webhooks JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. |
| AlertRuleRepository.java | CRUD + claim-polling interface. |
| AlertInstanceRepository.java | CRUD + query-for-inbox interface. |
| AlertSilenceRepository.java | CRUD interface. |
| AlertNotificationRepository.java | CRUD + claim-polling interface. |
| AlertReadRepository.java | Mark-read + count-unread interface. |
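The SilenceMatcher row lists five optional fields, and a later test expects an all-null matcher to match everything. A minimal sketch of that "null means wildcard" rule follows. The record name SilenceMatcherSketch, the matches(...) signature, and the use of String for severity are assumptions for illustration; the plan's real record may differ.

```java
import java.util.UUID;

// Hypothetical stand-in for the plan's SilenceMatcher record; severity is a String here for brevity.
record SilenceMatcherSketch(UUID ruleId, String appSlug, String routeId, String agentId, String severity) {

    /** True when no field constrains the match, i.e. the silence covers the whole environment. */
    boolean isWildcard() {
        return ruleId == null && appSlug == null && routeId == null
                && agentId == null && severity == null;
    }

    /** A null matcher field matches anything; a non-null field must equal the alert's value. */
    boolean matches(UUID alertRuleId, String alertApp, String alertRoute,
                    String alertAgent, String alertSeverity) {
        return fieldMatches(ruleId, alertRuleId)
                && fieldMatches(appSlug, alertApp)
                && fieldMatches(routeId, alertRoute)
                && fieldMatches(agentId, alertAgent)
                && fieldMatches(severity, alertSeverity);
    }

    private static boolean fieldMatches(Object wanted, Object actual) {
        return wanted == null || wanted.equals(actual);
    }
}
```

With this shape, new SilenceMatcherSketch(null, "orders", null, null, "CRITICAL") silences only CRITICAL alerts for the orders app, regardless of route or agent.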
Created — cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/
| File | Responsibility |
|---|---|
| config/AlertingProperties.java | @ConfigurationProperties("cameleer.server.alerting"). |
| config/AlertingBeanConfig.java | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. |
| storage/PostgresAlertRuleRepository.java | JdbcTemplate impl of AlertRuleRepository. |
| storage/PostgresAlertInstanceRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertSilenceRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertNotificationRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertReadRepository.java | JdbcTemplate impl. |
| eval/EvalContext.java | Per-tick context (tenantId, now, tickCache). |
| eval/EvalResult.java | Sealed: Firing(value, threshold, contextMap) / Clear / Error(Throwable). |
| eval/TickCache.java | ConcurrentHashMap<String,Object> discarded per tick. |
| eval/PerKindCircuitBreaker.java | Failure window + cooldown per ConditionKind. |
| eval/ConditionEvaluator.java | Generic interface: evaluate(C, AlertRule, EvalContext). |
| eval/RouteMetricEvaluator.java | Reads StatsStore. |
| eval/ExchangeMatchEvaluator.java | Reads ClickHouseSearchIndex.countExecutionsForAlerting + SearchService.search for PER_EXCHANGE cursor mode. |
| eval/AgentStateEvaluator.java | Reads AgentRegistryService.findAll. |
| eval/DeploymentStateEvaluator.java | Reads DeploymentRepository.findByAppId. |
| eval/LogPatternEvaluator.java | Reads new ClickHouseLogStore.countLogs. |
| eval/JvmMetricEvaluator.java | Reads MetricsQueryStore.queryTimeSeries. |
| eval/AlertEvaluatorJob.java | @Component implementing SchedulingConfigurer; claim-polling loop. |
| eval/AlertStateTransitions.java | Pure function: given current instance + EvalResult → new state + timestamps. |
| notify/MustacheRenderer.java | JMustache wrapper; resilient to bad templates. |
| notify/NotificationContextBuilder.java | Pure: builds context map from AlertInstance + rule + env. |
| notify/SilenceMatcher.java | Pure: evaluates a SilenceMatcher against an AlertInstance. |
| notify/InAppInboxQuery.java | Server-side query helper for /alerts and unread-count. |
| notify/WebhookDispatcher.java | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. |
| notify/NotificationDispatchJob.java | @Component SchedulingConfigurer; claim-polling on alert_notifications. |
| notify/HmacSigner.java | Pure: computes sha256=<hmac(secret, body)>. |
| retention/AlertingRetentionJob.java | @Scheduled(cron = "0 0 3 * * *"): deletes old alert_instances + alert_notifications. |
| controller/AlertRuleController.java | /api/v1/environments/{envSlug}/alerts/rules. |
| controller/AlertController.java | /api/v1/environments/{envSlug}/alerts + instance actions. |
| controller/AlertSilenceController.java | /api/v1/environments/{envSlug}/alerts/silences. |
| controller/AlertNotificationController.java | /api/v1/environments/{envSlug}/alerts/{id}/notifications, /alerts/notifications/{id}/retry. |
| dto/AlertRuleDto.java, dto/AlertDto.java, dto/AlertSilenceDto.java, dto/AlertNotificationDto.java, dto/ConditionDto.java, dto/WebhookBindingDto.java, dto/RenderPreviewRequest.java, dto/RenderPreviewResponse.java, dto/TestEvaluateRequest.java, dto/TestEvaluateResponse.java, dto/UnreadCountResponse.java | Request/response DTOs. |
| metrics/AlertingMetrics.java | Micrometer registrations for counters/gauges/histograms. |
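The HmacSigner row describes a pure function producing sha256=<hmac(secret, body)>. A minimal sketch using only the JDK follows. The class name HmacSignerSketch, the UTF-8 encoding of the secret, and lowercase hex as the digest encoding are assumptions; the spec may prescribe a different encoding.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;

// Hypothetical sketch of notify/HmacSigner; not the plan's final implementation.
final class HmacSignerSketch {
    private HmacSignerSketch() {}

    /** Returns "sha256=" followed by the lowercase hex HMAC-SHA256 of body under secret. */
    static String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return "sha256=" + HexFormat.of().formatHex(mac.doFinal(body));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard JCA algorithm, so reaching this branch indicates a broken runtime.
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }
}
```

Signing the exact bytes that go on the wire (rather than a re-serialized copy) keeps the receiver's verification stable, which is why the sketch takes byte[] instead of a String body.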
Created — resources
| File | Responsibility |
|---|---|
| cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql | Flyway migration: 5 enums, 6 tables, indexes, cascades. |
| cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql | 4 projections on executions / logs / agent_metrics, all IF NOT EXISTS. |
Modified
| File | Change |
|---|---|
| cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java | Add ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java | Replace the rulesReferencing(UUID) stub with a call through AlertRuleRepository.findRuleIdsByOutboundConnectionId. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java | Add long countLogs(LogSearchRequest) (no FINAL). |
| cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java | Add long countExecutionsForAlerting(AlertMatchSpec) (no FINAL). |
| cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java | Run alerting_projections.sql via existing ClickHouseSchemaInitializer. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java | Permit new /api/v1/environments/{envSlug}/alerts/** path matchers with role-based access. |
| cameleer-server-core/pom.xml | Add com.samskivert:jmustache:1.16. |
| .claude/rules/app-classes.md, .claude/rules/core-classes.md | Document new packages. |
| cameleer-server-app/src/main/resources/application.yml | Default AlertingProperties stanza + comment linking to the admin guide. |
Conventions
- TDD. Every task starts with a failing test, implements the minimum to pass, then commits.
- One commit per task. Commit messages: feat(alerting): …, test(alerting): …, fix(alerting): …, chore(alerting): …, docs(alerting): ….
- Tenant invariant. Every ClickHouse query and Postgres table referencing observability data filters by tenantId (injected via AlertingBeanConfig from cameleer.server.tenant.id).
- No FINAL on the two new CH count methods; alerting tolerates brief duplicate counts.
- Jackson polymorphism via @JsonTypeInfo(use = DEDUCTION) with @JsonSubTypes on AlertCondition.
- Pure core/, Spring-only in app/. No @Component, @Service, or @Scheduled annotations in cameleer-server-core.
- Claim polling. FOR UPDATE SKIP LOCKED + claimed_by / claimed_until with 30 s TTL.
- Instance id for claim ownership: use InetAddress.getLocalHost().getHostName() + ":" + processPid(); exposed as a bean "alertingInstanceId" of type String.
- GitNexus hygiene. Before modifying any existing class (OutboundConnectionServiceImpl, ClickHouseLogStore, ClickHouseSearchIndex, AuditCategory, SecurityConfig), run gitnexus_impact({target: "<className>", direction: "upstream"}) and report blast radius. Run gitnexus_detect_changes() before each commit.
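The instance-id convention can be sketched with JDK calls only. The conventions do not define processPid(), so ProcessHandle.current().pid() is used here as an assumption. The class name, the "unknown-host" fallback, and the truncation to 64 characters (chosen because claimed_by is varchar(64) in V12) are likewise illustrative, not part of the spec.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper behind the "alertingInstanceId" bean.
final class AlertingInstanceIdSketch {
    private AlertingInstanceIdSketch() {}

    /** claimed_by is varchar(64) in the V12 migration, so the id must fit. */
    static final int MAX_LENGTH = 64;

    /** Builds "host:pid", shortening the host part if the result would exceed MAX_LENGTH. */
    static String compose(String hostName, long pid) {
        String suffix = ":" + pid;
        int room = MAX_LENGTH - suffix.length();
        String host = hostName.length() > room ? hostName.substring(0, room) : hostName;
        return host + suffix;
    }

    /** Resolves the id for the running JVM. */
    static String current() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host"; // assumption: degrade rather than fail startup
        }
        return compose(host, ProcessHandle.current().pid());
    }
}
```

Keeping the pid at the end means two JVMs on one host stay distinguishable even when a long hostname is shortened.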
Phase 1 — Flyway V12 migration and audit categories
Task 1: V12__alerting_tables.sql
Files:
- Create: cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
- Step 1: Write the failing integration test
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class V12MigrationIT extends AbstractPostgresIT {
@Test
void allAlertingTablesAndEnumsExist() {
var tables = jdbcTemplate.queryForList(
"SELECT table_name FROM information_schema.tables WHERE table_schema='public' " +
"AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," +
"'alert_silences','alert_notifications','alert_reads')",
String.class);
assertThat(tables).containsExactlyInAnyOrder(
"alert_rules","alert_rule_targets","alert_instances",
"alert_silences","alert_notifications","alert_reads");
var enums = jdbcTemplate.queryForList(
"SELECT typname FROM pg_type WHERE typname IN " +
"('severity_enum','condition_kind_enum','alert_state_enum'," +
"'target_kind_enum','notification_status_enum')",
String.class);
assertThat(enums).hasSize(5);
}
@Test
void deletingEnvironmentCascadesAlertingRows() {
var envId = java.util.UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env");
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1");
var ruleId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
"VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')",
ruleId, envId);
var instanceId = java.util.UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
"fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
"now(), '{}'::jsonb, 't', 'm')",
instanceId, ruleId, envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_rules WHERE environment_id = ?",
Integer.class, envId)).isZero();
assertThat(jdbcTemplate.queryForObject(
"SELECT count(*) FROM alert_instances WHERE environment_id = ?",
Integer.class, envId)).isZero();
}
}
- Step 2: Run the test to verify it fails
Run: mvn -pl cameleer-server-app test -Dtest=V12MigrationIT
Expected: FAIL — tables do not exist.
- Step 3: Write the migration
Create cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql:
-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11)
CREATE TYPE severity_enum AS ENUM ('CRITICAL','WARNING','INFO');
CREATE TYPE condition_kind_enum AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC');
CREATE TYPE alert_state_enum AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED');
CREATE TYPE target_kind_enum AS ENUM ('USER','GROUP','ROLE');
CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED');
CREATE TABLE alert_rules (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
name varchar(200) NOT NULL,
description text,
severity severity_enum NOT NULL,
enabled boolean NOT NULL DEFAULT true,
condition_kind condition_kind_enum NOT NULL,
condition jsonb NOT NULL,
evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5),
for_duration_seconds int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0),
re_notify_minutes int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0),
notification_title_tmpl text NOT NULL,
notification_message_tmpl text NOT NULL,
webhooks jsonb NOT NULL DEFAULT '[]',
next_evaluation_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
eval_state jsonb NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now(),
created_by text NOT NULL REFERENCES users(user_id),
updated_at timestamptz NOT NULL DEFAULT now(),
updated_by text NOT NULL REFERENCES users(user_id)
);
CREATE INDEX alert_rules_env_idx ON alert_rules (environment_id);
CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true;
CREATE TABLE alert_rule_targets (
id uuid PRIMARY KEY,
rule_id uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
target_kind target_kind_enum NOT NULL,
target_id varchar(128) NOT NULL,
UNIQUE (rule_id, target_kind, target_id)
);
CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id);
CREATE TABLE alert_instances (
id uuid PRIMARY KEY,
rule_id uuid REFERENCES alert_rules(id) ON DELETE SET NULL,
rule_snapshot jsonb NOT NULL,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
state alert_state_enum NOT NULL,
severity severity_enum NOT NULL,
fired_at timestamptz NOT NULL,
acked_at timestamptz,
acked_by text REFERENCES users(user_id),
resolved_at timestamptz,
last_notified_at timestamptz,
silenced boolean NOT NULL DEFAULT false,
current_value numeric,
threshold numeric,
context jsonb NOT NULL,
title text NOT NULL,
message text NOT NULL,
target_user_ids text[] NOT NULL DEFAULT '{}',
target_group_ids uuid[] NOT NULL DEFAULT '{}',
target_role_names text[] NOT NULL DEFAULT '{}'
);
CREATE INDEX alert_instances_inbox_idx ON alert_instances (environment_id, state, fired_at DESC);
CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL;
CREATE INDEX alert_instances_resolved_idx ON alert_instances (resolved_at) WHERE state = 'RESOLVED';
CREATE INDEX alert_instances_target_u_idx ON alert_instances USING GIN (target_user_ids);
CREATE INDEX alert_instances_target_g_idx ON alert_instances USING GIN (target_group_ids);
CREATE INDEX alert_instances_target_r_idx ON alert_instances USING GIN (target_role_names);
CREATE TABLE alert_silences (
id uuid PRIMARY KEY,
environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
matcher jsonb NOT NULL,
reason text,
starts_at timestamptz NOT NULL,
ends_at timestamptz NOT NULL CHECK (ends_at > starts_at),
created_by text NOT NULL REFERENCES users(user_id),
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at);
CREATE TABLE alert_notifications (
id uuid PRIMARY KEY,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
webhook_id uuid,
outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL,
status notification_status_enum NOT NULL DEFAULT 'PENDING',
attempts int NOT NULL DEFAULT 0,
next_attempt_at timestamptz NOT NULL DEFAULT now(),
claimed_by varchar(64),
claimed_until timestamptz,
last_response_status int,
last_response_snippet text,
payload jsonb NOT NULL,
delivered_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_notifications_pending_idx ON alert_notifications (next_attempt_at) WHERE status = 'PENDING';
CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id);
CREATE TABLE alert_reads (
user_id text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
read_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (user_id, alert_instance_id)
);
Notes:
- Plan 01 established users.user_id as TEXT. All FK-to-users columns in this migration are text, not uuid.
- target_user_ids is text[] (matches users.user_id).
- outbound_connections (Plan 01) is referenced with ON DELETE SET NULL. This matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed.
- Step 4: Run the test to verify it passes
Run: mvn -pl cameleer-server-app test -Dtest=V12MigrationIT
Expected: PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
git commit -m "feat(alerting): V12 flyway migration for alerting tables"
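The claimed_by / claimed_until columns and the alert_rules_claim_due_idx partial index created in Task 1 exist to support claim polling. The sketch below shows one way a claim statement could look, plus an in-memory mirror of its WHERE clause for unit tests. The class name, the SQL text, and the bind-parameter order are assumptions; only the column names and the 30 s TTL come from this plan.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; the real repository is PostgresAlertRuleRepository.
final class RuleClaimSketch {
    private RuleClaimSketch() {}

    /** Lease length from the plan's conventions (30 s TTL). */
    static final Duration CLAIM_TTL = Duration.ofSeconds(30);

    /**
     * Illustrative claim statement. Bind parameters, in order:
     * instance id, TTL in seconds, batch size.
     * SKIP LOCKED lets concurrent pollers pass over rows another poller is claiming.
     */
    static final String CLAIM_DUE_RULES_SQL = """
            UPDATE alert_rules
               SET claimed_by = ?,
                   claimed_until = now() + (? * interval '1 second')
             WHERE id IN (
                   SELECT id
                     FROM alert_rules
                    WHERE enabled = true
                      AND next_evaluation_at <= now()
                      AND (claimed_until IS NULL OR claimed_until < now())
                    ORDER BY next_evaluation_at
                    LIMIT ?
                      FOR UPDATE SKIP LOCKED)
            RETURNING id
            """;

    /** In-memory mirror of the WHERE clause: enabled, due, and not under a live lease. */
    static boolean isClaimable(boolean enabled, Instant nextEvaluationAt,
                               Instant claimedUntil, Instant now) {
        boolean due = !nextEvaluationAt.isAfter(now);
        boolean leaseFree = claimedUntil == null || claimedUntil.isBefore(now);
        return enabled && due && leaseFree;
    }
}
```

Because an expired lease makes a row claimable again, a crashed instance delays a rule by at most one TTL instead of blocking it.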
Task 2: Extend AuditCategory
Files:
- Modify: cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java
- Test: cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
- Step 1: GitNexus impact check
Run gitnexus_impact({target: "AuditCategory", direction: "upstream"}) — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements).
- Step 2: Write the failing test
package com.cameleer.server.core.admin;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AuditCategoryTest {
@Test
void alertingCategoriesPresent() {
assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
}
}
- Step 3: Run the test — FAIL
Run: mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest
Expected: FAIL — IllegalArgumentException: No enum constant.
- Step 4: Add the enum values
Replace the contents of AuditCategory.java with:
package com.cameleer.server.core.admin;
public enum AuditCategory {
INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
}
- Step 5: Run the test. Expected: PASS.
- Step 6: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \
cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories"
Phase 2 — Core domain model
Each task in this phase adds a small, focused set of pure-Java records and enums under cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/. Records use compact canonical constructors, and defensive copying is applied only to mutable collections (List.copyOf, Map.copyOf). Jackson polymorphism is handled by @JsonTypeInfo(use = DEDUCTION) on AlertCondition.
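The defensive-copy convention can be illustrated in isolation. RuleSketch below is a hypothetical, trimmed-down record (not AlertRule itself) that shows the compact-constructor pattern: null becomes an empty collection, anything else is copied into an unmodifiable one.

```java
import java.util.List;
import java.util.Map;

// Hypothetical miniature of the pattern used by the Phase 2 records.
record RuleSketch(String name, List<String> webhookIds, Map<String, Object> evalState) {
    RuleSketch {
        // Copy so later changes to the caller's collections cannot leak into the record.
        webhookIds = webhookIds == null ? List.of() : List.copyOf(webhookIds);
        evalState = evalState == null ? Map.of() : Map.copyOf(evalState);
    }
}
```

Note that List.copyOf and Map.copyOf reject null elements, keys, and values with a NullPointerException, so callers must not pass collections containing nulls.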
Task 3: Enums + AlertScope
Files:
- Create: .../alerting/AlertSeverity.java, AlertState.java, ConditionKind.java, TargetKind.java, NotificationStatus.java, RouteMetric.java, Comparator.java, AggregationOp.java, FireMode.java, AlertScope.java
- Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
- Step 1: Write the failing test
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;
class AlertScopeTest {
@Test
void allFieldsNullIsEnvWide() {
var s = new AlertScope(null, null, null);
assertThat(s.isEnvWide()).isTrue();
}
@Test
void appScoped() {
var s = new AlertScope("orders", null, null);
assertThat(s.isEnvWide()).isFalse();
assertThat(s.appSlug()).isEqualTo("orders");
}
@Test
void enumsHaveExpectedValues() {
assertThat(AlertSeverity.values()).containsExactly(
AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO);
assertThat(AlertState.values()).containsExactly(
AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED);
assertThat(ConditionKind.values()).hasSize(6);
assertThat(TargetKind.values()).containsExactly(
TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE);
assertThat(NotificationStatus.values()).containsExactly(
NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED);
}
}
- Step 2: Run the test. Expected: FAIL (cannot find symbol).
Run: mvn -pl cameleer-server-core test -Dtest=AlertScopeTest
- Step 3: Create the files
// AlertSeverity.java
package com.cameleer.server.core.alerting;
public enum AlertSeverity { CRITICAL, WARNING, INFO }
// AlertState.java
package com.cameleer.server.core.alerting;
public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }
// ConditionKind.java
package com.cameleer.server.core.alerting;
public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }
// TargetKind.java
package com.cameleer.server.core.alerting;
public enum TargetKind { USER, GROUP, ROLE }
// NotificationStatus.java
package com.cameleer.server.core.alerting;
public enum NotificationStatus { PENDING, DELIVERED, FAILED }
// RouteMetric.java
package com.cameleer.server.core.alerting;
public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT }
// Comparator.java
package com.cameleer.server.core.alerting;
public enum Comparator { GT, GTE, LT, LTE, EQ }
// AggregationOp.java
package com.cameleer.server.core.alerting;
public enum AggregationOp { MAX, MIN, AVG, LATEST }
// FireMode.java
package com.cameleer.server.core.alerting;
public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }
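// AlertScope.java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.annotation.JsonIgnore;
public record AlertScope(String appSlug, String routeId, String agentId) {
    // Derived flag, not a record component. Jackson would otherwise treat this is-getter as an
    // "envWide" property on serialization, which the canonical constructor cannot read back.
    @JsonIgnore
    public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; }
}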
- Step 4: Run the test. Expected: PASS.
- Step 5: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
git commit -m "feat(alerting): core enums + AlertScope"
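The Comparator enum created in Task 3 is what evaluators will apply to an observed value and a rule threshold. A minimal sketch of that check follows. The class and method names are hypothetical, the enum is redeclared locally so the snippet compiles on its own, and the exact-equality handling of EQ is an assumption.

```java
// Hypothetical helper; evaluators in the app module may structure this differently.
final class ThresholdCheckSketch {
    private ThresholdCheckSketch() {}

    // Local copy of the plan's enum so the sketch is self-contained.
    enum Comparator { GT, GTE, LT, LTE, EQ }

    /** True when the observed value breaches the threshold under the given comparator. */
    static boolean breaches(Comparator cmp, double value, double threshold) {
        return switch (cmp) {
            case GT -> value > threshold;
            case GTE -> value >= threshold;
            case LT -> value < threshold;
            case LTE -> value <= threshold;
            // Exact comparison; suitable for counts, fragile for computed ratios.
            case EQ -> Double.compare(value, threshold) == 0;
        };
    }
}
```

Since the enum shares its simple name with java.util.Comparator, classes that need both will have to fully qualify one of them.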
Task 4: AlertCondition sealed hierarchy + Jackson polymorphism
Files:
- Create: .../alerting/AlertCondition.java, RouteMetricCondition.java, ExchangeMatchCondition.java (with nested ExchangeFilter), AgentStateCondition.java, DeploymentStateCondition.java, LogPatternCondition.java, JvmMetricCondition.java
- Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
- Step 1: Write the failing test
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class AlertConditionJsonTest {
private final ObjectMapper om = new ObjectMapper();
@Test
void roundtripRouteMetric() throws Exception {
var c = new RouteMetricCondition(
new AlertScope("orders", "route-1", null),
RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(RouteMetricCondition.class);
assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC);
}
@Test
void roundtripExchangeMatchPerExchange() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")),
FireMode.PER_EXCHANGE, null, null, 300);
String json = om.writeValueAsString((AlertCondition) c);
AlertCondition parsed = om.readValue(json, AlertCondition.class);
assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class);
}
@Test
void roundtripExchangeMatchCountInWindow() throws Exception {
var c = new ExchangeMatchCondition(
new AlertScope("orders", null, null),
new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
FireMode.COUNT_IN_WINDOW, 5, 900, null);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5);
}
@Test
void roundtripAgentState() throws Exception {
var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(AgentStateCondition.class);
}
@Test
void roundtripDeploymentState() throws Exception {
var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED"));
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(DeploymentStateCondition.class);
}
@Test
void roundtripLogPattern() throws Exception {
var c = new LogPatternCondition(new AlertScope("orders", null, null),
"ERROR", "TimeoutException", 5, 900);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(LogPatternCondition.class);
}
@Test
void roundtripJvmMetric() throws Exception {
var c = new JvmMetricCondition(new AlertScope("orders", null, null),
"heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300);
AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
}
}
- Step 2: Run the test. Expected: FAIL.
- Step 3: Create the sealed hierarchy
// AlertCondition.java
package com.cameleer.server.core.alerting;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
// RouteMetricCondition's properties are a subset of JvmMetricCondition's, so field-based
// deduction alone may report two candidates for a route-metric payload; defaultImpl breaks that tie.
@JsonTypeInfo(use = JsonTypeInfo.Id.DEDUCTION, defaultImpl = RouteMetricCondition.class)
@JsonSubTypes({
    @JsonSubTypes.Type(RouteMetricCondition.class),
    @JsonSubTypes.Type(ExchangeMatchCondition.class),
    @JsonSubTypes.Type(AgentStateCondition.class),
    @JsonSubTypes.Type(DeploymentStateCondition.class),
    @JsonSubTypes.Type(LogPatternCondition.class),
    @JsonSubTypes.Type(JvmMetricCondition.class)
})
public sealed interface AlertCondition permits
        RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
        DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {
    ConditionKind kind();
    AlertScope scope();
}
// RouteMetricCondition.java
package com.cameleer.server.core.alerting;
public record RouteMetricCondition(
AlertScope scope,
RouteMetric metric,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
}
// ExchangeMatchCondition.java
package com.cameleer.server.core.alerting;
import java.util.Map;
public record ExchangeMatchCondition(
AlertScope scope,
ExchangeFilter filter,
FireMode fireMode,
Integer threshold, // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
Integer windowSeconds, // required when COUNT_IN_WINDOW
Integer perExchangeLingerSeconds // required when PER_EXCHANGE
) implements AlertCondition {
public ExchangeMatchCondition {
if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
}
@Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }
public record ExchangeFilter(String status, Map<String, String> attributes) {
public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); }
}
}
// AgentStateCondition.java
package com.cameleer.server.core.alerting;
public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
}
// DeploymentStateCondition.java
package com.cameleer.server.core.alerting;
import java.util.List;
public record DeploymentStateCondition(AlertScope scope, List<String> states) implements AlertCondition {
public DeploymentStateCondition { states = List.copyOf(states); }
@Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
}
// LogPatternCondition.java
package com.cameleer.server.core.alerting;
public record LogPatternCondition(
AlertScope scope,
String level,
String pattern,
int threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
}
// JvmMetricCondition.java
package com.cameleer.server.core.alerting;
public record JvmMetricCondition(
AlertScope scope,
String metric,
AggregationOp aggregation,
Comparator comparator,
double threshold,
int windowSeconds) implements AlertCondition {
@Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
}
- Step 4: Run the test. Expected: PASS.
- Step 5: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction"
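Deduction-based polymorphism chooses a subtype from the property names present in the payload, so it helps to check how distinct the six field sets from Task 4 are. The sketch below is a simplified model of that narrowing, not Jackson's implementation. It assumes a subtype stays a candidate as long as it declares every field seen so far. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of field-based subtype deduction over the six condition records.
final class DeductionSketch {
    private DeductionSketch() {}

    /** Property names per subtype, taken from the record components defined in Task 4. */
    static final Map<String, Set<String>> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("RouteMetricCondition",
                Set.of("scope", "metric", "comparator", "threshold", "windowSeconds"));
        FIELDS.put("ExchangeMatchCondition",
                Set.of("scope", "filter", "fireMode", "threshold", "windowSeconds", "perExchangeLingerSeconds"));
        FIELDS.put("AgentStateCondition", Set.of("scope", "state", "forSeconds"));
        FIELDS.put("DeploymentStateCondition", Set.of("scope", "states"));
        FIELDS.put("LogPatternCondition",
                Set.of("scope", "level", "pattern", "threshold", "windowSeconds"));
        FIELDS.put("JvmMetricCondition",
                Set.of("scope", "metric", "aggregation", "comparator", "threshold", "windowSeconds"));
    }

    /** Subtypes that declare every property present in the payload, in declaration order. */
    static List<String> candidates(Set<String> jsonFields) {
        return FIELDS.entrySet().stream()
                .filter(e -> e.getValue().containsAll(jsonFields))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Under this model a log-pattern payload narrows to a single candidate, while a route-metric payload still matches both RouteMetricCondition and JvmMetricCondition because the latter declares a superset of its fields. That is a reason to keep one round-trip test per condition kind, as AlertConditionJsonTest does.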
Task 5: Core data records (AlertRule, AlertInstance, AlertSilence, SilenceMatcher, AlertRuleTarget, AlertNotification, WebhookBinding)
Files:
- Create: the seven records above under .../alerting/
- Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
- Step 1: Write the failing test
package com.cameleer.server.core.alerting;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class AlertDomainRecordsTest {
@Test
void alertRuleDefensiveCopy() {
var webhooks = new java.util.ArrayList<WebhookBinding>();
webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null));
var r = newRule(webhooks);
webhooks.clear();
assertThat(r.webhooks()).hasSize(1);
}
@Test
void silenceMatcherAllFieldsNullMatchesEverything() {
var m = new SilenceMatcher(null, null, null, null, null);
assertThat(m.isWildcard()).isTrue();
}
private AlertRule newRule(List<WebhookBinding> wh) {
return new AlertRule(
UUID.randomUUID(), UUID.randomUUID(), "r", null,
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m", wh, List.of(),
Instant.now(), null, null, Map.of(),
Instant.now(), "u1", Instant.now(), "u1");
}
}
- Step 2: Run — FAIL.
- Step 3: Create the records
// AlertRule.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertRule(
UUID id,
UUID environmentId,
String name,
String description,
AlertSeverity severity,
boolean enabled,
ConditionKind conditionKind,
AlertCondition condition,
int evaluationIntervalSeconds,
int forDurationSeconds,
int reNotifyMinutes,
String notificationTitleTmpl,
String notificationMessageTmpl,
List<WebhookBinding> webhooks,
List<AlertRuleTarget> targets,
Instant nextEvaluationAt,
String claimedBy,
Instant claimedUntil,
Map<String, Object> evalState,
Instant createdAt,
String createdBy,
Instant updatedAt,
String updatedBy) {
public AlertRule {
webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
targets = targets == null ? List.of() : List.copyOf(targets);
evalState = evalState == null ? Map.of() : Map.copyOf(evalState);
}
}
// AlertInstance.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
public record AlertInstance(
UUID id,
UUID ruleId, // nullable after rule deletion
Map<String, Object> ruleSnapshot,
UUID environmentId,
AlertState state,
AlertSeverity severity,
Instant firedAt,
Instant ackedAt,
String ackedBy,
Instant resolvedAt,
Instant lastNotifiedAt,
boolean silenced,
Double currentValue,
Double threshold,
Map<String, Object> context,
String title,
String message,
List<String> targetUserIds,
List<UUID> targetGroupIds,
List<String> targetRoleNames) {
public AlertInstance {
ruleSnapshot = ruleSnapshot == null ? Map.of() : Map.copyOf(ruleSnapshot);
context = context == null ? Map.of() : Map.copyOf(context);
targetUserIds = targetUserIds == null ? List.of() : List.copyOf(targetUserIds);
targetGroupIds = targetGroupIds == null ? List.of() : List.copyOf(targetGroupIds);
targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames);
}
}
// AlertRuleTarget.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {}
// WebhookBinding.java
package com.cameleer.server.core.alerting;
import java.util.Map;
import java.util.UUID;
public record WebhookBinding(
UUID id,
UUID outboundConnectionId,
String bodyOverride,
Map<String, String> headerOverrides) {
public WebhookBinding {
headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
}
}
// SilenceMatcher.java
package com.cameleer.server.core.alerting;
import java.util.UUID;
public record SilenceMatcher(
UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) {
public boolean isWildcard() {
return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null;
}
}
// AlertSilence.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.UUID;
public record AlertSilence(
UUID id,
UUID environmentId,
SilenceMatcher matcher,
String reason,
Instant startsAt,
Instant endsAt,
String createdBy,
Instant createdAt) {}
// AlertNotification.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
public record AlertNotification(
UUID id,
UUID alertInstanceId,
UUID webhookId,
UUID outboundConnectionId,
NotificationStatus status,
int attempts,
Instant nextAttemptAt,
String claimedBy,
Instant claimedUntil,
Integer lastResponseStatus,
String lastResponseSnippet,
Map<String, Object> payload,
Instant deliveredAt,
Instant createdAt) {
public AlertNotification {
payload = payload == null ? Map.of() : Map.copyOf(payload);
}
}
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)"
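`isWildcard()` implies that a `null` matcher field means "no constraint". A minimal sketch of how a full match might then be evaluated, assuming every non-null field must equal the corresponding attribute of the alert. `MatcherSketch`, `AlertFacts`, and `matches` are hypothetical names, and severity is a plain string here to keep the block self-contained.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical flattened view of the alert attributes a silence can match on. */
record AlertFacts(UUID ruleId, String appSlug, String routeId, String agentId, String severity) {}

/** Hypothetical stand-in for SilenceMatcher; severity is a String instead of AlertSeverity. */
record MatcherSketch(UUID ruleId, String appSlug, String routeId, String agentId, String severity) {

    boolean isWildcard() {
        return ruleId == null && appSlug == null && routeId == null
                && agentId == null && severity == null;
    }

    /** Assumed semantics: a null field matches anything, a non-null field must be equal. */
    boolean matches(AlertFacts a) {
        return fieldMatches(ruleId, a.ruleId())
                && fieldMatches(appSlug, a.appSlug())
                && fieldMatches(routeId, a.routeId())
                && fieldMatches(agentId, a.agentId())
                && fieldMatches(severity, a.severity());
    }

    private static boolean fieldMatches(Object expected, Object actual) {
        return expected == null || Objects.equals(expected, actual);
    }
}
```

Under these assumed semantics a wildcard matcher silences every alert in its environment, so an API layer may want to require an explicit confirmation before accepting one.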
Task 6: Repository interfaces
Files:
- Create: .../alerting/AlertRuleRepository.java, AlertInstanceRepository.java, AlertSilenceRepository.java, AlertNotificationRepository.java, AlertReadRepository.java
- No test (pure interfaces — covered by the Phase 3 integration tests).
- Step 1: Create the interfaces
// AlertRuleRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertRuleRepository {
AlertRule save(AlertRule rule); // upsert by id
Optional<AlertRule> findById(UUID id);
List<AlertRule> listByEnvironment(UUID environmentId);
List<AlertRule> findAllByOutboundConnectionId(UUID connectionId);
List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId); // used by rulesReferencing()
void delete(UUID id);
/** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now).
* Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */
List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds);
/** Release claim + bump next_evaluation_at. */
void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt,
java.util.Map<String, Object> evalState);
}
// AlertInstanceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertInstanceRepository {
AlertInstance save(AlertInstance instance); // upsert by id
Optional<AlertInstance> findById(UUID id);
Optional<AlertInstance> findOpenForRule(UUID ruleId); // state IN ('PENDING','FIRING','ACKNOWLEDGED')
List<AlertInstance> listForInbox(UUID environmentId,
List<String> userGroupIdFilter, // UUIDs as String? decide impl-side
String userId,
List<String> userRoleNames,
int limit);
long countUnreadForUser(UUID environmentId, String userId);
void ack(UUID id, String userId, Instant when);
void resolve(UUID id, Instant when);
void markSilenced(UUID id, boolean silenced);
void deleteResolvedBefore(Instant cutoff);
}
// AlertSilenceRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertSilenceRepository {
AlertSilence save(AlertSilence silence);
Optional<AlertSilence> findById(UUID id);
List<AlertSilence> listActive(UUID environmentId, Instant when);
List<AlertSilence> listByEnvironment(UUID environmentId);
void delete(UUID id);
}
// AlertNotificationRepository.java
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
public interface AlertNotificationRepository {
AlertNotification save(AlertNotification n);
Optional<AlertNotification> findById(UUID id);
List<AlertNotification> listForInstance(UUID alertInstanceId);
List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds);
void markDelivered(UUID id, int status, String snippet, Instant when);
void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet);
void markFailed(UUID id, int status, String snippet);
void deleteSettledBefore(Instant cutoff);
}
// AlertReadRepository.java
package com.cameleer.server.core.alerting;
import java.util.List;
import java.util.UUID;
public interface AlertReadRepository {
void markRead(String userId, UUID alertInstanceId);
void bulkMarkRead(String userId, List<UUID> alertInstanceIds);
}
- Step 2: Compile
Run: mvn -pl cameleer-server-core compile
Expected: SUCCESS.
- Step 3: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java
git commit -m "feat(alerting): core repository interfaces"
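The claim contract described in the `claimDueRules` Javadoc can be modelled in memory, which may help when unit-testing evaluator logic without Postgres. This is a sketch under stated assumptions: `InMemoryClaims`, `Claimable`, and the explicit `now` parameter are hypothetical, and a `synchronized` method stands in for the row locking that a database implementation would rely on.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical mutable row holding only the claim-related columns. */
final class Claimable {
    final UUID id;
    Instant nextEvaluationAt;
    String claimedBy;
    Instant claimedUntil;

    Claimable(UUID id, Instant nextEvaluationAt) {
        this.id = id;
        this.nextEvaluationAt = nextEvaluationAt;
    }
}

final class InMemoryClaims {
    private final Map<UUID, Claimable> rows = new LinkedHashMap<>();

    synchronized void add(Claimable row) { rows.put(row.id, row); }

    /** Claims up to batchSize due rows whose previous claim is absent or expired. */
    synchronized List<UUID> claimDue(String instanceId, int batchSize, int ttlSeconds, Instant now) {
        List<Claimable> due = new ArrayList<>();
        for (Claimable r : rows.values()) {
            boolean isDue = !r.nextEvaluationAt.isAfter(now);   // next_evaluation_at <= now
            boolean free = r.claimedUntil == null || r.claimedUntil.isBefore(now);
            if (isDue && free) due.add(r);
        }
        due.sort(Comparator.comparing((Claimable r) -> r.nextEvaluationAt)); // oldest first
        List<UUID> claimed = new ArrayList<>();
        for (Claimable r : due.subList(0, Math.min(batchSize, due.size()))) {
            r.claimedBy = instanceId;
            r.claimedUntil = now.plusSeconds(ttlSeconds);
            claimed.add(r.id);
        }
        return claimed;
    }

    /** Clears the claim and schedules the next evaluation. */
    synchronized void release(UUID id, Instant nextEvaluationAt) {
        Claimable r = rows.get(id);
        if (r == null) return;
        r.claimedBy = null;
        r.claimedUntil = null;
        r.nextEvaluationAt = nextEvaluationAt;
    }
}
```

The TTL is what makes a crashed evaluator recoverable: a claim that is never released simply expires, and the next poller picks the rule up again.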
Phase 3 — Postgres repositories
All repositories use JdbcTemplate and ObjectMapper for JSONB columns (same pattern as PostgresOutboundConnectionRepository). Bind uuid[] parameters with Connection.createArrayOf("uuid", ...) inside a ConnectionCallback, and text[] parameters with Connection.createArrayOf("text", ...).
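The array conversion can be kept in one place. A minimal sketch using only `java.sql`, assuming the helper is called from inside a `ConnectionCallback` (or any other code that already holds a `Connection`); the `SqlArrays` class and its method names are hypothetical.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.UUID;

final class SqlArrays {
    private SqlArrays() {}

    /** Builds a uuid[] bind value; "uuid" is the element type name, not the array type. */
    static Array uuidArray(Connection conn, List<UUID> ids) throws SQLException {
        return conn.createArrayOf("uuid", ids.toArray(new UUID[0]));
    }

    /** Builds a text[] bind value. */
    static Array textArray(Connection conn, List<String> values) throws SQLException {
        return conn.createArrayOf("text", values.toArray(new String[0]));
    }
}
```

An empty list yields an empty array, which makes the `&&` overlap predicates used later evaluate to false rather than fail, so callers do not need a special case for users without groups or roles.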
Task 7: PostgresAlertRuleRepository
Files:
- Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
- Step 1: Write the failing integration test
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT {
private PostgresAlertRuleRepository repo;
private UUID envId;
@AfterEach
void cleanup() {
jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'");
}
@org.junit.jupiter.api.BeforeEach
void setup() {
repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('test-user', 'test-user', 'x', 'a@b', true)");
}
@Test
void saveAndFindByIdRoundtrip() {
var rule = newRule(List.of());
repo.save(rule);
var found = repo.findById(rule.id()).orElseThrow();
assertThat(found.name()).isEqualTo(rule.name());
assertThat(found.condition()).isInstanceOf(AgentStateCondition.class);
}
@Test
void findRuleIdsByOutboundConnectionId() {
var connId = UUID.randomUUID();
var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of());
var rule = newRule(List.of(wb));
repo.save(rule);
List<UUID> ids = repo.findRuleIdsByOutboundConnectionId(connId);
assertThat(ids).containsExactly(rule.id());
assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty();
}
@Test
void claimDueRulesAtomicSkipLocked() {
var rule = newRule(List.of());
repo.save(rule);
List<AlertRule> claimed = repo.claimDueRules("instance-A", 10, 30);
assertThat(claimed).hasSize(1);
// Second claimant sees nothing until first releases or TTL expires
List<AlertRule> second = repo.claimDueRules("instance-B", 10, 30);
assertThat(second).isEmpty();
}
private AlertRule newRule(List<WebhookBinding> webhooks) {
return new AlertRule(
UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc",
AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60),
60, 0, 60, "t", "m", webhooks, List.of(),
Instant.now().minusSeconds(10), null, null, Map.of(),
Instant.now(), "test-user", Instant.now(), "test-user");
}
}
- Step 2: Run — FAIL.
- Step 3: Implement the repository
package com.cameleer.server.app.alerting.storage;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.postgresql.util.PGobject;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.time.Instant;
import java.util.*;
public class PostgresAlertRuleRepository implements AlertRuleRepository {
private final JdbcTemplate jdbc;
private final ObjectMapper om;
public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
this.jdbc = jdbc;
this.om = om;
}
@Override
public AlertRule save(AlertRule r) {
String sql = """
INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
created_at, created_by, updated_at, updated_by)
VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
?, ?, ?, ?::jsonb, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name, description = EXCLUDED.description,
severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
for_duration_seconds = EXCLUDED.for_duration_seconds,
re_notify_minutes = EXCLUDED.re_notify_minutes,
notification_title_tmpl = EXCLUDED.notification_title_tmpl,
notification_message_tmpl = EXCLUDED.notification_message_tmpl,
webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
""";
jdbc.update(sql,
r.id(), r.environmentId(), r.name(), r.description(),
r.severity().name(), r.enabled(), r.conditionKind().name(),
writeJson(r.condition()),
r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
r.notificationTitleTmpl(), r.notificationMessageTmpl(),
writeJson(r.webhooks()),
Timestamp.from(r.nextEvaluationAt()),
r.claimedBy(),
r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
writeJson(r.evalState()),
Timestamp.from(r.createdAt()), r.createdBy(),
Timestamp.from(r.updatedAt()), r.updatedBy());
return r;
}
@Override
public Optional<AlertRule> findById(UUID id) {
var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
}
@Override
public List<AlertRule> listByEnvironment(UUID environmentId) {
return jdbc.query(
"SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
rowMapper(), environmentId);
}
@Override
public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT * FROM alert_rules
WHERE webhooks @> ?::jsonb
ORDER BY created_at DESC
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.query(sql, rowMapper(), predicate);
}
@Override
public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
String sql = """
SELECT id FROM alert_rules
WHERE webhooks @> ?::jsonb
""";
String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
return jdbc.queryForList(sql, UUID.class, predicate);
}
@Override
public void delete(UUID id) {
jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
}
@Override
public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
String sql = """
UPDATE alert_rules
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_rules
WHERE enabled = true
AND next_evaluation_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_evaluation_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *
""";
return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
}
@Override
public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
jdbc.update("""
UPDATE alert_rules
SET claimed_by = NULL, claimed_until = NULL,
next_evaluation_at = ?, eval_state = ?::jsonb
WHERE id = ?
""",
Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
}
private RowMapper<AlertRule> rowMapper() {
return (rs, i) -> {
try {
ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
AlertCondition cond = om.readValue(rs.getString("condition"), AlertCondition.class);
List<WebhookBinding> webhooks = om.readValue(
rs.getString("webhooks"), new TypeReference<List<WebhookBinding>>() {});
Map<String, Object> evalState = om.readValue(
rs.getString("eval_state"), new TypeReference<Map<String, Object>>() {});
Timestamp cu = rs.getTimestamp("claimed_until");
return new AlertRule(
(UUID) rs.getObject("id"),
(UUID) rs.getObject("environment_id"),
rs.getString("name"),
rs.getString("description"),
AlertSeverity.valueOf(rs.getString("severity")),
rs.getBoolean("enabled"),
kind, cond,
rs.getInt("evaluation_interval_seconds"),
rs.getInt("for_duration_seconds"),
rs.getInt("re_notify_minutes"),
rs.getString("notification_title_tmpl"),
rs.getString("notification_message_tmpl"),
webhooks, List.of(),
rs.getTimestamp("next_evaluation_at").toInstant(),
rs.getString("claimed_by"),
cu == null ? null : cu.toInstant(),
evalState,
rs.getTimestamp("created_at").toInstant(),
rs.getString("created_by"),
rs.getTimestamp("updated_at").toInstant(),
rs.getString("updated_by"));
} catch (com.fasterxml.jackson.core.JsonProcessingException e) {
// ObjectMapper.readValue throws a checked exception; RowMapper.mapRow may only throw SQLException
throw new SQLException("Malformed JSON in alert_rules row", e);
}
};
}
private String writeJson(Object o) {
try { return om.writeValueAsString(o); }
catch (Exception e) { throw new IllegalStateException(e); }
}
}
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_rules"
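Both connection lookups rely on JSONB containment: `webhooks @> '[{"outboundConnectionId":"<uuid>"}]'` is true when at least one element of the array carries that key/value pair. A small sketch of the predicate builder pulled out as a helper; the name `WebhookPredicates.byConnection` is hypothetical, since the repository above inlines the concatenation.

```java
import java.util.UUID;

final class WebhookPredicates {
    private WebhookPredicates() {}

    /** Builds the JSON document bound to the "webhooks @> ?::jsonb" parameter. */
    static String byConnection(UUID connectionId) {
        // UUID.toString() yields only hex digits and hyphens, so no JSON escaping is needed.
        return "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
    }
}
```

String concatenation is acceptable here only because the value is a `UUID`; a predicate built from free-form text should go through a JSON serializer instead.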
Task 8: Wire OutboundConnectionServiceImpl.rulesReferencing() (CRITICAL — Plan 01 gate)
This is the Plan 01 known-incomplete item. Plan 01 shipped rulesReferencing() returning []. Until this task lands, outbound connections can be deleted or narrowed while rules reference them, corrupting production. Do not skip or defer.
Files:
- Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java
- Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java
- Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
- Step 1: GitNexus impact check
Run gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"}). Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour.
- Step 2: Write the failing integration test
package com.cameleer.server.app.outbound;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.*;
import com.cameleer.server.core.outbound.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT {
@Autowired OutboundConnectionService service;
@Autowired OutboundConnectionRepository repo;
private UUID envId;
private UUID connId;
private PostgresAlertRuleRepository ruleRepo;
@BeforeEach
void seed() {
ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
envId = UUID.randomUUID();
jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID());
jdbcTemplate.update(
"INSERT INTO users (user_id, username, password_hash, email, enabled) " +
"VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING");
var c = repo.save(new OutboundConnection(
UUID.randomUUID(), "default", "conn", null, "https://example.test",
OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null,
OutboundAuth.None.INSTANCE, List.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref"));
connId = c.id();
var rule = new AlertRule(
UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true,
ConditionKind.AGENT_STATE,
new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
60, 0, 60, "t", "m",
List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())),
List.of(), Instant.now(), null, null, Map.of(),
Instant.now(), "u-ref", Instant.now(), "u-ref");
ruleRepo.save(rule);
}
@Test
void deleteConnectionReferencedByRuleReturns409() {
assertThat(service.rulesReferencing(connId)).hasSize(1);
assertThatThrownBy(() -> service.delete(connId, "u-ref"))
.hasMessageContaining("referenced by rules");
}
}
- Step 3: Run — FAIL (stub returns empty list, so delete succeeds).
- Step 4: Replace the stub
In OutboundConnectionServiceImpl.java:
// existing imports + add:
import com.cameleer.server.core.alerting.AlertRuleRepository;
public class OutboundConnectionServiceImpl implements OutboundConnectionService {
private final OutboundConnectionRepository repo;
private final AlertRuleRepository ruleRepo; // NEW
private final String tenantId;
public OutboundConnectionServiceImpl(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
String tenantId) {
this.repo = repo;
this.ruleRepo = ruleRepo;
this.tenantId = tenantId;
}
// … create/update/delete/get/list unchanged …
@Override
public List<UUID> rulesReferencing(UUID id) {
return ruleRepo.findRuleIdsByOutboundConnectionId(id);
}
}
Update OutboundBeanConfig.java to inject AlertRuleRepository:
@Bean
public OutboundConnectionService outboundConnectionService(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
@Value("${cameleer.server.tenant.id:default}") String tenantId) {
return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
}
Add the AlertRuleRepository bean in a new AlertingBeanConfig.java stub (completed in Phase 7):
package com.cameleer.server.app.alerting.config;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.AlertRuleRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
@Configuration
public class AlertingBeanConfig {
@Bean
public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertRuleRepository(jdbc, om);
}
}
- Step 5: Run — PASS.
- Step 6: GitNexus detect_changes + commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
# Verify scope once the files are staged, before committing
# gitnexus_detect_changes({scope: "staged"})
git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)"
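The integration test expects `delete` to fail with a message containing `referenced by rules` once `rulesReferencing` returns real data. A minimal sketch of such a guard, assuming it throws `IllegalStateException`. The exception type, the `DeleteGuard` class, and the wording beyond the asserted fragment are assumptions, because Plan 01's `delete` body is not shown here.

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

final class DeleteGuard {
    private DeleteGuard() {}

    /** Throws if any rule still references the connection; otherwise returns normally. */
    static void assertDeletable(UUID connectionId, List<UUID> referencingRuleIds) {
        if (referencingRuleIds.isEmpty()) {
            return;
        }
        String ids = referencingRuleIds.stream()
                .map(UUID::toString)
                .collect(Collectors.joining(", "));
        // The fragment "referenced by rules" is what the integration test asserts on.
        throw new IllegalStateException(
                "Outbound connection " + connectionId + " is referenced by rules: " + ids);
    }
}
```

Listing the rule IDs in the message gives the caller enough information to detach the webhooks first; whether the controller maps this exception to HTTP 409 is decided in Plan 01's code, not in this sketch.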
Task 9: PostgresAlertInstanceRepository
Files:
- Create: .../alerting/storage/PostgresAlertInstanceRepository.java
- Test: .../alerting/storage/PostgresAlertInstanceRepositoryIT.java
- Step 1: Write the failing test covering: save/findById, findOpenForRule (filter state IN ('PENDING','FIRING','ACKNOWLEDGED')), listForInbox with user/group/role filters (seed 3 instances: one targeting user, one targeting group, one targeting role; assert listForInbox returns all three for a user in those groups/roles), countUnreadForUser (anti-join against alert_reads; the query in Step 3 uses NOT EXISTS), ack, resolve, deleteResolvedBefore.
- Step 2: Run — FAIL.
- Step 3: Implement — same RowMapper pattern as Task 7. Key queries:
-- findOpenForRule
SELECT * FROM alert_instances
WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')
ORDER BY fired_at DESC LIMIT 1;
-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders)
SELECT * FROM alert_instances
WHERE environment_id = ?
AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED')
AND (
? = ANY(target_user_ids)
OR target_group_ids && ?::uuid[]
OR target_role_names && ?::text[]
)
ORDER BY fired_at DESC LIMIT ?;
-- countUnreadForUser
SELECT count(*) FROM alert_instances ai
WHERE ai.environment_id = ?
AND ai.state IN ('FIRING','ACKNOWLEDGED')
AND (
? = ANY(ai.target_user_ids)
OR ai.target_group_ids && ?::uuid[]
OR ai.target_role_names && ?::text[]
)
AND NOT EXISTS (
SELECT 1 FROM alert_reads ar
WHERE ar.alert_instance_id = ai.id AND ar.user_id = ?
);
Array binding via connection.createArrayOf("uuid", uuids) / createArrayOf("text", names) inside a ConnectionCallback.
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries"
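The inbox filter reads as "the alert targets this user directly, or any of their groups, or any of their roles". The same predicate in plain Java can serve as a reference when writing the IT assertions. `InboxVisibility` and `isVisibleTo` are hypothetical names; `Collections.disjoint` plays the role of the Postgres `&&` array-overlap operator.

```java
import java.util.Collections;
import java.util.List;
import java.util.UUID;

final class InboxVisibility {
    private InboxVisibility() {}

    /** True when the alert targets the user directly or via one of their groups or roles. */
    static boolean isVisibleTo(List<String> targetUserIds,
                               List<UUID> targetGroupIds,
                               List<String> targetRoleNames,
                               String userId,
                               List<UUID> userGroupIds,
                               List<String> userRoleNames) {
        return targetUserIds.contains(userId)                             // ? = ANY(target_user_ids)
                || !Collections.disjoint(targetGroupIds, userGroupIds)    // target_group_ids && ?::uuid[]
                || !Collections.disjoint(targetRoleNames, userRoleNames); // target_role_names && ?::text[]
    }
}
```

For the three-instance seed described in Step 1, each instance satisfies exactly one of the three branches, so a user holding the matching group and role should see all three.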
Task 10: PostgresAlertSilenceRepository, PostgresAlertNotificationRepository, PostgresAlertReadRepository
Files:
- Create: three repositories under .../alerting/storage/
- Test: one IT per repository in .../alerting/storage/
- Step 1: Write all three failing ITs (one file each). Cover:
  - Silence: save/findById, listActive filters by now BETWEEN starts_at AND ends_at, delete.
  - Notification: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + next_attempt_at, markDelivered + markFailed transition status, deleteSettledBefore purges DELIVERED + FAILED.
  - Read: markRead is idempotent (uses ON CONFLICT DO NOTHING), bulkMarkRead handles empty list.
- Step 2: Run — FAIL.
- Step 3: Implement following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim:
UPDATE alert_notifications
SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
WHERE id IN (
SELECT id FROM alert_notifications
WHERE status = 'PENDING'
AND next_attempt_at <= now()
AND (claimed_until IS NULL OR claimed_until < now())
ORDER BY next_attempt_at
LIMIT ?
FOR UPDATE SKIP LOCKED
)
RETURNING *;
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java
git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads"
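The silence IT should pin down the window boundaries, because SQL `BETWEEN` is inclusive at both ends. A tiny reference predicate that mirrors `now BETWEEN starts_at AND ends_at`; the `SilenceWindow` class is hypothetical and exists only to make the expected boundary behaviour explicit.

```java
import java.time.Instant;

final class SilenceWindow {
    private SilenceWindow() {}

    /** Mirrors "when BETWEEN startsAt AND endsAt": both bounds are inclusive. */
    static boolean isActive(Instant startsAt, Instant endsAt, Instant when) {
        return !when.isBefore(startsAt) && !when.isAfter(endsAt);
    }
}
```

Seeding one silence that ends exactly at the probe instant and one that ends a millisecond earlier is usually enough to catch an accidental switch to an exclusive comparison.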
Task 11: Wire all alerting repositories in AlertingBeanConfig
Files:
- Modify: .../alerting/config/AlertingBeanConfig.java
- Step 1: Add beans for the remaining repositories
@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertInstanceRepository(jdbc, om);
}
@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertSilenceRepository(jdbc, om);
}
@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
return new PostgresAlertNotificationRepository(jdbc, om);
}
@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) {
return new PostgresAlertReadRepository(jdbc);
}
- Step 2: Verify compile + existing ITs still pass
mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT'
- Step 3: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
git commit -m "feat(alerting): wire all alerting repository beans"
Phase 4 — ClickHouse reads: new count methods and projections
Task 12: Add ClickHouseLogStore.countLogs(LogSearchRequest)
Files:
- Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
- Step 1: GitNexus impact check
Run gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"}). Expected callers: LogQueryController, ContainerLogForwarder, ClickHouseConfig. Adding a method is non-breaking — no downstream callers affected.
- Step 2: Write the failing test
package com.cameleer.server.app.search;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.search.LogSearchRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import java.time.Instant;
import java.util.List;
import static org.assertj.core.api.Assertions.assertThat;
class ClickHouseLogStoreCountIT extends AbstractPostgresIT {
@Autowired ClickHouseLogStore store;
@Test
void countLogsRespectsLevelPatternAndWindow() {
// Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min
// (seed helper uses existing `indexBatch` path)
long count = store.countLogs(new LogSearchRequest(
/* environment */ "dev",
/* application */ "orders",
/* agentId */ null,
/* exchangeId */ null,
/* logger */ null,
/* sources */ List.of(),
/* levels */ List.of("ERROR"),
/* q */ "TimeoutException",
/* from */ Instant.now().minusSeconds(300),
/* to */ Instant.now(),
/* cursor */ null,
/* limit */ 100,
/* sort */ "desc"
));
assertThat(count).isEqualTo(3);
}
}
(Adjust LogSearchRequest constructor to the actual record signature — check cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java for exact order.)
- Step 3: Run — FAIL.
- Step 4: Implement the method
In ClickHouseLogStore.java, add a new public method. Reuse the WHERE-clause builder already used by search(LogSearchRequest), but:
- No FINAL.
- Skip cursor, limit, sort.
- SELECT count() FROM logs WHERE <tenant + env + app + level IN (...) + logger + q LIKE + timestamp BETWEEN>.
- Include the tenant_id = ? predicate.
public long countLogs(LogSearchRequest request) {
StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(tenantId);
args.add(Timestamp.from(request.from()));
args.add(Timestamp.from(request.to()));
if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); }
if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); }
// … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId …
String sql = "SELECT count() FROM logs WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
(Imports: java.sql.Timestamp, java.util.ArrayList.)
- Step 5: Run — PASS.
- Step 6: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator"
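The WHERE-clause pattern used by `countLogs` amounts to appending a predicate and its bind value in lockstep. A hedged sketch of that pattern for the multi-level filter, with the time window left out for brevity. `CountQuery`, `CountQueries.build`, and the `level` column name are assumptions; check them against the existing `search(LogSearchRequest)` builder before reusing anything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for a SQL string and its positional bind values. */
record CountQuery(String sql, List<Object> args) {}

final class CountQueries {
    private CountQueries() {}

    static CountQuery build(String tenantId, String environment, List<String> levels) {
        StringBuilder where = new StringBuilder("tenant_id = ?");
        List<Object> args = new ArrayList<>();
        args.add(tenantId);
        if (environment != null) {
            where.append(" AND environment = ?");
            args.add(environment);
        }
        if (levels != null && !levels.isEmpty()) {
            // One placeholder per level keeps the statement fully parameterized.
            where.append(" AND level IN (")
                 .append(String.join(", ", Collections.nCopies(levels.size(), "?")))
                 .append(")");
            args.addAll(levels);
        }
        return new CountQuery("SELECT count() FROM logs WHERE " + where, List.copyOf(args));
    }
}
```

Skipping the clause entirely for an empty level list matters: `IN ()` is not valid SQL, and an empty filter is meant to match every level.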
Task 13: Add ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)
Files:
- Create: cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java
- Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
- Step 1: GitNexus impact check
Run gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"}). Additive method — no downstream breakage.
- Step 2: Create AlertMatchSpec record
package com.cameleer.server.core.alerting;
import java.time.Instant;
import java.util.Map;
/** Specification for alerting-specific execution counting.
* Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL.
* All fields except tenant/env/from/to are nullable filters. */
public record AlertMatchSpec(
String tenantId,
String environment,
String applicationId, // nullable
String routeId, // nullable
String status, // "FAILED" / "SUCCESS" / null
Map<String, String> attributes, // exact match on execution attribute key=value
Instant from,
Instant to,
Instant after // nullable; used by PER_EXCHANGE to advance cursor
) {
public AlertMatchSpec {
attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
}
}
-
Step 3: Write the failing test — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches.
-
Step 4: Run — FAIL.
- Step 5: Implement on ClickHouseSearchIndex
public long countExecutionsForAlerting(AlertMatchSpec spec) {
StringBuilder where = new StringBuilder(
"tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?");
List<Object> args = new ArrayList<>();
args.add(spec.tenantId());
args.add(spec.environment());
args.add(Timestamp.from(spec.from()));
args.add(Timestamp.from(spec.to()));
if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); }
if (spec.routeId() != null) { where.append(" AND route_id = ?"); args.add(spec.routeId()); }
if (spec.status() != null) { where.append(" AND status = ?"); args.add(spec.status()); }
if (spec.after() != null) {
where.append(" AND start_time > ?");
args.add(Timestamp.from(spec.after()));
}
// attribute filters: use Map column access — pattern matches existing search() impl
for (var e : spec.attributes().entrySet()) {
where.append(" AND attributes[?] = ?");
args.add(e.getKey());
args.add(e.getValue());
}
String sql = "SELECT count() FROM executions WHERE " + where; // NO FINAL
Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
return n == null ? 0L : n;
}
-
Step 6: Run — PASS.
-
Step 7: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator"
Task 14: ClickHouse projections migration
Files:
- Create: cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql
- Modify: the schema initializer invocation site (likely ClickHouseConfig or ClickHouseSchemaInitializer) to also run this file on startup.
- Step 1: Write the SQL file
-- Additive, idempotent. Safe to drop + rebuild with no data loss.
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_app_status
(SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time));
ALTER TABLE executions
ADD PROJECTION IF NOT EXISTS alerting_route_status
(SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time));
ALTER TABLE logs
ADD PROJECTION IF NOT EXISTS alerting_app_level
(SELECT * ORDER BY (tenant_id, environment, application, level, timestamp));
ALTER TABLE agent_metrics
ADD PROJECTION IF NOT EXISTS alerting_instance_metric
(SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at));
ALTER TABLE executions MATERIALIZE PROJECTION alerting_app_status;
ALTER TABLE executions MATERIALIZE PROJECTION alerting_route_status;
ALTER TABLE logs MATERIALIZE PROJECTION alerting_app_level;
ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric;
(Adjust table column names to match real init.sql — confirm application vs application_id on the logs and agent_metrics tables.)
- Step 2: Hook into ClickHouseSchemaInitializer
Find the initializer and add a second invocation:
runIdempotent("clickhouse/init.sql");
runIdempotent("clickhouse/alerting_projections.sql");
- Step 3: Add a smoke IT
@Test
void projectionsExistAfterStartup() {
var names = jdbcTemplate.queryForList(
"SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')",
String.class);
assertThat(names).contains(
"alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric");
}
-
Step 4: Run — PASS.
-
Step 5: Commit
git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \
cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java
git commit -m "feat(alerting): ClickHouse projections for alerting read paths"
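The plan leaves runIdempotent to the existing initializer. If that helper has to execute a multi-statement file one statement at a time (an assumption; some JDBC drivers accept only one statement per call), a minimal splitter might look like the sketch below. SqlScriptSplitter and splitStatements are hypothetical names, and the naive split does not handle semicolons inside string literals or trailing comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an init script into individual statements. */
final class SqlScriptSplitter {
    private SqlScriptSplitter() {}

    static List<String> splitStatements(String script) {
        StringBuilder withoutComments = new StringBuilder();
        for (String line : script.split("\\R")) {
            // Drop full-line "--" comments; trailing comments are not handled in this sketch.
            if (!line.strip().startsWith("--")) {
                withoutComments.append(line).append('\n');
            }
        }
        List<String> statements = new ArrayList<>();
        for (String raw : withoutComments.toString().split(";")) {
            String stmt = raw.strip();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }
}
```

Each returned statement could then be passed to the JDBC template in order, so that a failure points at a single ALTER rather than the whole file.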
Phase 5 — Mustache templating and silence matching
Task 15: Add JMustache dependency
Files:
- Modify: cameleer-server-core/pom.xml
- Step 1: Add dependency
<dependency>
<groupId>com.samskivert</groupId>
<artifactId>jmustache</artifactId>
<version>1.16</version>
</dependency>
- Step 2: Verify resolve
Run: mvn -pl cameleer-server-core dependency:resolve
- Step 3: Commit
git add cameleer-server-core/pom.xml
git commit -m "chore(alerting): add jmustache 1.16"
Task 16: MustacheRenderer
Files:
- Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java
- Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
- Step 1: Write the failing test
package com.cameleer.server.app.alerting.notify;
import org.junit.jupiter.api.Test;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;
class MustacheRendererTest {
private final MustacheRenderer r = new MustacheRenderer();
@Test
void rendersSimpleTemplate() {
String out = r.render("Hello {{name}}", Map.of("name", "world"));
assertThat(out).isEqualTo("Hello world");
}
@Test
void rendersNestedPath() {
String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL")));
assertThat(out).isEqualTo("CRITICAL");
}
@Test
void missingVariableRendersLiteral() {
String out = r.render("{{missing.path}}", Map.of());
assertThat(out).isEqualTo("{{missing.path}}");
}
@Test
void malformedTemplateReturnsRawWithWarn() {
String out = r.render("{{unclosed", Map.of("unclosed","x"));
assertThat(out).isEqualTo("{{unclosed");
}
}
-
Step 2: Run — FAIL.
-
Step 3: Implement
package com.cameleer.server.app.alerting.notify;
import com.samskivert.mustache.Mustache;
import com.samskivert.mustache.Template;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.util.Map;
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private final Mustache.Compiler compiler = Mustache.compiler()
.nullValue("")
.emptyStringIsFalse(true)
.defaultValue(null) // null triggers MissingContext -> we intercept below
.escapeHTML(false);
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
try {
Template t = compiler.compile(template);
return t.execute(new LiteralFallbackContext(context));
} catch (Exception e) {
log.warn("Mustache render failed for template='{}': {}", abbreviate(template), e.getMessage());
return template;
}
}
/** Placeholder: would return the literal `{{path}}` when a variable is missing. */
private static class LiteralFallbackContext {
private final Map<String, Object> map;
LiteralFallbackContext(Map<String, Object> map) { this.map = map; }
// Not implemented. JMustache resolves variables by Map lookup or reflection, so a plain wrapper
// object cannot intercept a missing key, and unresolved tags cannot be detected after render.
// The workable alternative is a pre-flight pass: scan the template tokens against the context
// and protect unresolved tokens before compilation.
}
}
This first draft is incomplete (abbreviate(...) is also undefined) and will not pass the Step 1 tests. Simpler implementation (ships for v1):
@Component
public class MustacheRenderer {
private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);
private static final java.util.regex.Pattern TOKEN =
java.util.regex.Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}");
private final Mustache.Compiler compiler = Mustache.compiler()
.defaultValue("")
.escapeHTML(false);
// Sentinels that hide an unresolved tag from the Mustache parser; restored after rendering.
private static final String MISSING_OPEN = "\u0001MISSING\u0002";
private static final String MISSING_CLOSE = "\u0002MISSING\u0001";
public String render(String template, Map<String, Object> context) {
if (template == null) return "";
Map<String, Object> ctx = context == null ? Map.of() : context;
try {
String rendered = compiler.compile(preResolve(template, ctx)).execute(ctx);
// Turn the protected tokens back into literal {{path}} text.
return rendered.replace(MISSING_OPEN, "{{").replace(MISSING_CLOSE, "}}");
} catch (Exception e) {
log.warn("Mustache render failed: {}", e.getMessage());
return template;
}
}
/** Swaps each unresolved `{{missing.path}}` for a sentinel-wrapped copy so Mustache sees plain text. */
private String preResolve(String template, Map<String, Object> context) {
var m = TOKEN.matcher(template);
var sb = new StringBuilder();
while (m.find()) {
String tag = m.group();
// Resolved tags are kept as-is; unresolved ones lose their braces until after render.
String replacement = resolvePath(context, m.group(1)) != null
? tag
: MISSING_OPEN + tag.substring(2, tag.length() - 2) + MISSING_CLOSE;
m.appendReplacement(sb, java.util.regex.Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
return sb.toString();
}
private Object resolvePath(Map<String, Object> ctx, String path) {
Object cur = ctx;
for (String seg : path.split("\\.")) {
if (!(cur instanceof Map<?,?> m)) return null;
cur = m.get(seg);
if (cur == null) return null;
}
return cur;
}
}
Engineer note: Prefer a pre-compile token substitution that replaces {{missing.path}} with a literal that Mustache renders as-is. JMustache's own hooks may also work: a custom Mustache.Collector installed through Mustache.Compiler#withCollector() supplies the Mustache.VariableFetcher instances used for lookups, which is likely easier than going through a formatter. Confirm during implementation and adjust this task; the tests in Step 1 lock the contract. If JMustache's API makes missing-variable fallback awkward, fall back to a regex-based substitutor that rewrites {{path}} to ⟦MUSTACHE_LITERAL:path⟧ for missing paths, then restores the braces after render. The contract is: unresolved {{x}} renders as literal {{x}}.
-
Step 4: Run — PASS.
-
Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars"
Task 17: NotificationContextBuilder
Files:
- Create: .../alerting/notify/NotificationContextBuilder.java
- Test: .../alerting/notify/NotificationContextBuilderTest.java
- Step 1: Write the failing test covering:
- env / rule / alert subtrees always present
- conditional trees: exchange.* present only for EXCHANGE_MATCH, log.* only for LOG_PATTERN, etc.
- alert.link uses the configured cameleer.server.ui-origin prefix if present, else /alerts/inbox/{id}.
-
Step 2: Run — FAIL.
- Step 3: Implement as a pure static Map<String,Object> build(AlertRule, AlertInstance, Environment, String uiOrigin).
-
Step 4: Run — PASS.
-
Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
git commit -m "feat(alerting): NotificationContextBuilder for template context maps"
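The context shape that Task 17 tests for can be sketched as follows. This is a minimal illustration, not the real builder: the Rule and Instance records are simplified stand-ins for AlertRule and AlertInstance, the agent subtree name is a guess, and joining the UI origin with /alerts/inbox/{id} is an assumption about how the prefix is applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: the record shapes below are simplified stand-ins, not the real domain types. */
final class NotificationContextSketch {
    enum Kind { EXCHANGE_MATCH, LOG_PATTERN, AGENT_STATE }
    record Rule(String id, String name, Kind kind) {}
    record Instance(String id, String severity, Map<String, Object> evalContext) {}

    static Map<String, Object> build(Rule rule, Instance instance, String envSlug, String uiOrigin) {
        Map<String, Object> ctx = new LinkedHashMap<>();
        // Subtrees that are always present.
        ctx.put("env", Map.of("slug", envSlug));
        ctx.put("rule", Map.of("id", rule.id(), "name", rule.name()));
        String path = "/alerts/inbox/" + instance.id();
        String link = (uiOrigin == null || uiOrigin.isBlank()) ? path : stripTrailingSlash(uiOrigin) + path;
        ctx.put("alert", Map.of("id", instance.id(), "severity", instance.severity(), "link", link));
        // Conditional subtree: copied from the evaluator context only for the matching kind.
        String subtree = switch (rule.kind()) {
            case EXCHANGE_MATCH -> "exchange";
            case LOG_PATTERN -> "log";
            case AGENT_STATE -> "agent";
        };
        Object value = instance.evalContext().get(subtree);
        if (value != null) {
            ctx.put(subtree, value);
        }
        return ctx;
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Keeping the builder a pure function of its inputs means the test needs no Spring context, only plain records and maps.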
Task 18: SilenceMatcher evaluator
Files:
- Create: .../alerting/notify/SilenceMatcherService.java (named to avoid clash with core record SilenceMatcher)
- Test: .../alerting/notify/SilenceMatcherServiceTest.java
- Step 1: Write the failing test covering truth table:
- Wildcard matcher → matches any instance.
- Matcher with ruleId only → matches only instances with that rule.
- Multiple fields → AND logic.
- Active-window check at notification time (not at eval time).
-
Step 2: Run — FAIL.
-
Step 3: Implement
@Component
public class SilenceMatcherService {
public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) {
if (m.ruleId() != null && !m.ruleId().equals(instance.ruleId())) return false;
if (m.severity()!= null && m.severity() != instance.severity()) return false;
if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false;
if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false;
if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false;
return true;
}
}
-
Step 4: Run — PASS.
-
Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java
git commit -m "feat(alerting): silence matcher for notification-time dispatch"
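The Task 18 implementation covers field matching, while the active-window check from Step 1 is left to the caller. A minimal sketch of that check, assuming the silence carries a start and an end instant (the Silence record and its field names here are hypothetical stand-ins):

```java
import java.time.Instant;

/** Sketch with a simplified stand-in for the silence record; field names are assumptions. */
final class SilenceWindowSketch {
    record Silence(Instant startsAt, Instant endsAt) {}

    /** Active when startsAt <= now < endsAt; meant to be evaluated at notification time. */
    static boolean isActive(Silence s, Instant now) {
        return !now.isBefore(s.startsAt()) && now.isBefore(s.endsAt());
    }
}
```

The dispatcher would suppress a notification only when both this check and SilenceMatcherService.matches(...) hold, using the dispatch-time clock rather than the evaluation timestamp.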
Phase 6 — Condition evaluators
All six evaluators share this shape (shown sealed to document intent; the interface created in Task 19 omits the permits clause):
public sealed interface ConditionEvaluator<C extends AlertCondition>
permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator,
DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
Supporting types (create these in Task 19 before implementing individual evaluators).
Task 19: EvalContext, EvalResult, TickCache, PerKindCircuitBreaker, ConditionEvaluator interface
Files:
- Create: .../alerting/eval/EvalContext.java, EvalResult.java, TickCache.java, PerKindCircuitBreaker.java, ConditionEvaluator.java
- Test: .../alerting/eval/TickCacheTest.java, PerKindCircuitBreakerTest.java
- Step 1: Write the failing tests
// TickCacheTest.java
@Test
void getOrComputeCachesWithinTick() {
var cache = new TickCache();
int n = cache.getOrCompute("k", () -> 42);
int m = cache.getOrCompute("k", () -> 43);
assertThat(n).isEqualTo(42);
assertThat(m).isEqualTo(42); // cached
}
// PerKindCircuitBreakerTest.java
@Test
void opensAfterFailThreshold() {
var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...));
for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE);
assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue();
}
@Test
void closesAfterCooldown() { /* advance clock beyond cooldown window */ }
- Step 2: Implement
// EvalContext.java
package com.cameleer.server.app.alerting.eval;
import java.time.Instant;
public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
// EvalResult.java
package com.cameleer.server.app.alerting.eval;
import java.util.Map;
public sealed interface EvalResult {
record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
public Firing { context = context == null ? Map.of() : Map.copyOf(context); }
}
record Clear() implements EvalResult {
public static final Clear INSTANCE = new Clear();
}
record Error(Throwable cause) implements EvalResult {}
}
// TickCache.java
package com.cameleer.server.app.alerting.eval;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
public class TickCache {
private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
@SuppressWarnings("unchecked")
public <T> T getOrCompute(String key, Supplier<T> supplier) {
return (T) map.computeIfAbsent(key, k -> supplier.get());
}
}
// PerKindCircuitBreaker.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.ConditionKind;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
public class PerKindCircuitBreaker {
private record State(Deque<Instant> failures, Instant openUntil) {}
private final int threshold;
private final Duration window;
private final Duration cooldown;
private final Clock clock;
private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();
public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
this.threshold = threshold;
this.window = Duration.ofSeconds(windowSeconds);
this.cooldown = Duration.ofSeconds(cooldownSeconds);
this.clock = clock;
}
public void recordFailure(ConditionKind kind) {
byKind.compute(kind, (k, s) -> {
var deque = (s == null) ? new ArrayDeque<Instant>() : new ArrayDeque<>(s.failures());
Instant now = Instant.now(clock);
Instant cutoff = now.minus(window);
while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
deque.addLast(now);
Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : null;
return new State(deque, openUntil);
});
}
public boolean isOpen(ConditionKind kind) {
State s = byKind.get(kind);
return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
}
public void recordSuccess(ConditionKind kind) {
byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
}
}
// ConditionEvaluator.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
public interface ConditionEvaluator<C extends AlertCondition> {
ConditionKind kind();
EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}
(sealed permits … is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's switch over ConditionKind.)
-
Step 3: Run — PASS.
-
Step 4: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/
git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)"
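The closesAfterCooldown test in Task 19 needs a clock that can be moved forward, which Clock.fixed cannot do. One possible helper is sketched below; MutableClock is a hypothetical test-only class, not part of the plan's file list.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical test helper: a Clock whose time only moves when advance() is called. */
final class MutableClock extends Clock {
    private Instant now;

    MutableClock(Instant start) { this.now = start; }

    void advance(Duration d) { now = now.plus(d); }

    @Override public Instant instant() { return now; }
    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; } // zone is irrelevant for Instant math
}
```

With this in place the test could construct the breaker with the mutable clock, record five failures, assert isOpen, advance by 61 seconds, and assert that isOpen is false again.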
Task 20: AgentStateEvaluator
Files:
- Create: .../alerting/eval/AgentStateEvaluator.java
- Test: .../alerting/eval/AgentStateEvaluatorTest.java
- Step 1: Write the failing test
@Test
void firesWhenAnyAgentInTargetStateForScope() {
var registry = mock(AgentRegistryService.class);
when(registry.findAll()).thenReturn(List.of(
new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(),
AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null)
));
var eval = new AgentStateEvaluator(registry);
var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60));
EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule,
new EvalContext("default", Instant.now(), new TickCache()));
assertThat(r).isInstanceOf(EvalResult.Firing.class);
}
@Test
void clearWhenNoMatchingAgents() { /* ... */ }
-
Step 2: Run — FAIL.
-
Step 3: Implement
@Component
public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {
private final AgentRegistryService registry;
public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; }
@Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
@Override
public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
AgentState target = AgentState.valueOf(c.state());
Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
List<AgentInfo> hits = registry.findAll().stream()
.filter(a -> matchesScope(a, c.scope()))
.filter(a -> a.state() == target)
.filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
AgentInfo first = hits.get(0);
return new EvalResult.Firing(
(double) hits.size(), null,
Map.of("agent", Map.of(
"id", first.instanceId(),
"name", first.displayName(),
"state", first.state().name()
), "app", Map.of("slug", first.applicationId())));
}
private static boolean matchesScope(AgentInfo a, AlertScope s) {
if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
return true;
}
}
-
Step 4: Run — PASS.
-
Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java
git commit -m "feat(alerting): AGENT_STATE evaluator"
Task 21: DeploymentStateEvaluator
Files:
- Create: .../alerting/eval/DeploymentStateEvaluator.java
- Test: .../alerting/eval/DeploymentStateEvaluatorTest.java
- Step 1: Write the failing test. A FAILED deployment for the matching app → Firing; RUNNING → Clear.
-
Step 2: Run — FAIL.
- Step 3: Implement, reading via DeploymentRepository.findByAppId and AppService.getByEnvironmentAndSlug:
@Override
public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null);
if (app == null) return EvalResult.Clear.INSTANCE;
List<Deployment> current = deploymentRepo.findByAppId(app.id());
Set<String> wanted = Set.copyOf(c.states());
var hits = current.stream()
.filter(d -> wanted.contains(d.status().name()))
.toList();
if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
Deployment d = hits.get(0);
return new EvalResult.Firing((double) hits.size(), null,
Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()),
"app", Map.of("slug", app.slug())));
}
-
Step 4: Run — PASS.
-
Step 5: Commit
git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator"
Task 22: RouteMetricEvaluator
Files:
- Create: .../alerting/eval/RouteMetricEvaluator.java
- Test: .../alerting/eval/RouteMetricEvaluatorTest.java
- Step 1: Write the failing test. Mock StatsStore, seed ExecutionStats{p99Ms = 2500, ...} for a scoped call, assert Firing with currentValue = 2500, threshold = 2000.
-
Step 2: Run — FAIL.
- Step 3: Implement by dispatching on the RouteMetric enum:
@Override
public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
Instant from = ctx.now().minusSeconds(c.windowSeconds());
Instant to = ctx.now();
String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null);
ExecutionStats stats = (c.scope().routeId() != null)
? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env)
: (c.scope().appSlug() != null)
? statsStore.statsForApp(from, to, c.scope().appSlug(), env)
: statsStore.stats(from, to, env);
double actual = switch (c.metric()) {
case ERROR_RATE -> errorRate(stats);
case P95_LATENCY_MS -> stats.p95DurationMs();
case P99_LATENCY_MS -> stats.p99DurationMs();
case THROUGHPUT -> stats.totalCount();
case ERROR_COUNT -> stats.failedCount();
};
boolean fire = switch (c.comparator()) {
case GT -> actual > c.threshold();
case GTE -> actual >= c.threshold();
case LT -> actual < c.threshold();
case LTE -> actual <= c.threshold();
case EQ -> actual == c.threshold();
};
if (!fire) return EvalResult.Clear.INSTANCE;
return new EvalResult.Firing(actual, c.threshold(),
Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()),
"app", Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug())));
}
private double errorRate(ExecutionStats s) {
long total = s.totalCount();
return total == 0 ? 0.0 : (double) s.failedCount() / total;
}
(Adjust method names on ExecutionStats to match the actual record — use gitnexus_context({name: "ExecutionStats"}) if unsure.)
-
Step 4: Run — PASS.
-
Step 5: Commit
git commit -m "feat(alerting): ROUTE_METRIC evaluator"
Task 23: LogPatternEvaluator
Files:
- Create: .../alerting/eval/LogPatternEvaluator.java
- Test: .../alerting/eval/LogPatternEvaluatorTest.java
- Step 1: Write the failing test. Mock ClickHouseLogStore.countLogs returning 7; threshold 5 → Firing; returning 3 → Clear.
-
Step 2: Run — FAIL.
- Step 3: Implement. Build a LogSearchRequest from the condition + window, delegate to countLogs. Use TickCache keyed on (env, app, level, pattern, windowStart, windowEnd) to coalesce.
-
Step 4: Run — PASS.
-
Step 5: Commit
git commit -m "feat(alerting): LOG_PATTERN evaluator"
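The coalescing described in Task 23 Step 3 can be illustrated with a small sketch. It uses a plain map as a stand-in for TickCache and a LongSupplier in place of the countLogs call; the key format and the strict greater-than comparison are assumptions chosen to match the Step 1 test values (7 over a threshold of 5 fires, 3 does not).

```java
import java.time.Instant;
import java.util.Map;
import java.util.function.LongSupplier;

final class LogPatternSketch {
    /** Builds the per-tick cache key from the query dimensions named in Step 3. */
    static String cacheKey(String env, String app, String level, String pattern, Instant from, Instant to) {
        return String.join("|", "log", String.valueOf(env), String.valueOf(app), String.valueOf(level),
                String.valueOf(pattern), from.toString(), to.toString());
    }

    /** Stand-in for TickCache: the count query runs once per distinct key within a tick. */
    static long countOnce(Map<String, Long> tickCache, String key, LongSupplier query) {
        return tickCache.computeIfAbsent(key, k -> query.getAsLong());
    }

    /** Assumed comparison: fire when the count exceeds the threshold. */
    static boolean fires(long count, long threshold) {
        return count > threshold;
    }
}
```

Two rules that share the same scope, pattern, and window within one tick then cost a single ClickHouse query. A pattern containing the separator character could collide with another key, so a production key would likely need escaping or a structured key type.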
Task 24: JvmMetricEvaluator
Files:
- Create: .../alerting/eval/JvmMetricEvaluator.java
- Test: .../alerting/eval/JvmMetricEvaluatorTest.java
- Step 1: Write the failing test. Mock MetricsQueryStore.queryTimeSeries for ("agent-1", ["heap_used_percent"], from, to, 1) returning {heap_used_percent: [Bucket{max=95.0}]}; assert Firing with currentValue=95.
-
Step 2: Run — FAIL.
- Step 3: Implement. Aggregate across buckets per AggregationOp (MAX/MIN/AVG/LATEST), compare against threshold.
-
Step 4: Run — PASS.
-
Step 5: Commit
git commit -m "feat(alerting): JVM_METRIC evaluator"
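The bucket aggregation named in Task 24 Step 3 might reduce to something like the sketch below. It assumes the evaluator has already extracted one value per bucket, ordered oldest first; which bucket field feeds that value, and the local AggregationOp enum, are stand-ins for the real types.

```java
import java.util.List;
import java.util.OptionalDouble;

final class JvmMetricAggregationSketch {
    enum AggregationOp { MAX, MIN, AVG, LATEST }

    /** Reduces bucket values (oldest first) to one number; empty when there is no data. */
    static OptionalDouble aggregate(List<Double> values, AggregationOp op) {
        if (values.isEmpty()) {
            return OptionalDouble.empty();
        }
        return switch (op) {
            case MAX -> values.stream().mapToDouble(Double::doubleValue).max();
            case MIN -> values.stream().mapToDouble(Double::doubleValue).min();
            case AVG -> values.stream().mapToDouble(Double::doubleValue).average();
            case LATEST -> OptionalDouble.of(values.get(values.size() - 1));
        };
    }
}
```

Returning an empty result for an empty series lets the evaluator decide separately whether missing data means Clear or Error, instead of comparing a made-up zero against the threshold.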
Task 25: ExchangeMatchEvaluator (PER_EXCHANGE + COUNT_IN_WINDOW)
Files:
- Create: .../alerting/eval/ExchangeMatchEvaluator.java
- Test: .../alerting/eval/ExchangeMatchEvaluatorTest.java
- Step 1: Write the failing test in two variants:
- COUNT_IN_WINDOW: mock ClickHouseSearchIndex.countExecutionsForAlerting → threshold check.
- PER_EXCHANGE: eval_state.lastExchangeTs cursor advancement. Seed 3 matching exchanges; the first eval returns all 3 as separate Firings, and the job handles the one-alert-per-exchange fan-out (Step 3 settles how the Firings are carried).
-
Step 2: Run — FAIL.
- Step 3: Implement. The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend EvalResult with a Batch variant:
record Batch(List<Firing> firings) implements EvalResult { ... }
Add this to EvalResult.java (Task 19). The job (Task 27) detects Batch and creates one AlertInstance per Firing. This keeps non-batched evaluators simple.
-
Step 4: Run — PASS.
-
Step 5: Commit
git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes"
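The PER_EXCHANGE cursor behaviour from Task 25 can be sketched in isolation. The types below are simplified stand-ins: Exchange, Firing, and BatchWithCursor are hypothetical, and carrying the next cursor alongside the firings is an assumption about how eval_state.lastExchangeTs gets updated.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

final class PerExchangeSketch {
    record Exchange(String id, Instant startTime) {}
    record Firing(String exchangeId) {}
    /** Firings for this tick plus the cursor to persist as eval_state.lastExchangeTs. */
    record BatchWithCursor(List<Firing> firings, Instant nextCursor) {}

    static BatchWithCursor evaluate(List<Exchange> matching, Instant cursor) {
        // Keep only exchanges strictly newer than the cursor, oldest first.
        List<Exchange> fresh = matching.stream()
                .filter(e -> cursor == null || e.startTime().isAfter(cursor))
                .sorted(Comparator.comparing(Exchange::startTime))
                .toList();
        // The cursor moves to the newest exchange seen, or stays put when nothing is new.
        Instant next = fresh.isEmpty() ? cursor : fresh.get(fresh.size() - 1).startTime();
        return new BatchWithCursor(fresh.stream().map(e -> new Firing(e.id())).toList(), next);
    }
}
```

A strictly-greater cursor can skip exchanges that share a timestamp with the last one processed, so the real implementation may need a tie-breaker such as the exchange id.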
Phase 7 — Evaluator job and state transitions
Task 26: AlertingProperties + AlertStateTransitions
Files:
- Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java
- Create: .../alerting/eval/AlertStateTransitions.java
- Test: .../alerting/eval/AlertStateTransitionsTest.java
- Step 1: Write the failing test for the pure state machine:
@Test
void clearWithNoOpenInstanceIsNoOp() {
var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now);
assertThat(next).isEmpty();
}
@Test
void firingWithNoOpenInstanceCreatesPendingIfForDuration() {
var rule = ruleBuilder().forDurationSeconds(60).build();
var result = new EvalResult.Firing(2500.0, 2000.0, Map.of());
var next = AlertStateTransitions.apply(null, result, rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING));
}
@Test
void firingWithNoForDurationGoesStraightToFiring() {
var rule = ruleBuilder().forDurationSeconds(0).build();
var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now);
assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING));
}
@Test
void pendingPromotesToFiringAfterForDuration() { /* ... */ }
@Test
void firingClearTransitionsToResolved() { /* ... */ }
@Test
void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ }
-
Step 2: Run — FAIL.
-
Step 3: Implement
// AlertStateTransitions.java
package com.cameleer.server.app.alerting.eval;
import com.cameleer.server.core.alerting.*;
import java.time.Instant;
import java.util.*;
public final class AlertStateTransitions {
private AlertStateTransitions() {}
/** Returns the new/updated AlertInstance, or empty when nothing changes. */
public static Optional<AlertInstance> apply(
AlertInstance current, EvalResult result, AlertRule rule, Instant now) {
return switch (result) {
case EvalResult.Clear c -> onClear(current, now);
case EvalResult.Firing f -> onFiring(current, f, rule, now);
case EvalResult.Error e -> Optional.empty();
case EvalResult.Batch b -> Optional.empty(); // batch handled by the job, not here
};
}
private static Optional<AlertInstance> onFiring(AlertInstance current, EvalResult.Firing f,
AlertRule rule, Instant now) {
if (current == null) {
AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING;
return Optional.of(newInstance(rule, f, initial, now));
}
if (current.state() == AlertState.PENDING) {
Instant firedAt = current.firedAt();
if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) {
// forDuration elapsed: promote via the wither helpers added in this task (withFiredAt is an assumed name)
return Optional.of(current.withState(AlertState.FIRING).withFiredAt(now));
}
return Optional.of(current); // stay PENDING, no mutation
}
return Optional.empty(); // already FIRING/ACK; re-notification handled by dispatcher
}
private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
if (current == null) return Optional.empty();
if (current.state() == AlertState.RESOLVED) return Optional.empty();
return Optional.of(current.withState(AlertState.RESOLVED).withResolvedAt(now));
}
private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
// ... construct from rule snapshot + context; title/message rendered by the job
throw new UnsupportedOperationException("stub");
}
}
Flesh out the .withState(...) / .withResolvedAt(...) helpers on AlertInstance (add wither-style methods returning new records) as part of this task.
// AlertingProperties.java
package com.cameleer.server.app.alerting.config;
import org.springframework.boot.context.properties.ConfigurationProperties;
@ConfigurationProperties("cameleer.server.alerting")
public record AlertingProperties(
Integer evaluatorTickIntervalMs,
Integer evaluatorBatchSize,
Integer claimTtlSeconds,
Integer notificationTickIntervalMs,
Integer notificationBatchSize,
Boolean inTickCacheEnabled,
Integer circuitBreakerFailThreshold,
Integer circuitBreakerWindowSeconds,
Integer circuitBreakerCooldownSeconds,
Integer eventRetentionDays,
Integer notificationRetentionDays,
Integer webhookTimeoutMs,
Integer webhookMaxAttempts) {
public int effectiveEvaluatorTickIntervalMs() {
int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
return Math.max(5000, raw); // floor
}
public int effectiveEvaluatorBatchSize() { return evaluatorBatchSize == null ? 20 : evaluatorBatchSize; }
public int effectiveClaimTtlSeconds() { return claimTtlSeconds == null ? 30 : claimTtlSeconds; }
public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; }
public int effectiveNotificationBatchSize() { return notificationBatchSize == null ? 50 : notificationBatchSize; }
public int effectiveEventRetentionDays() { return eventRetentionDays == null ? 90 : eventRetentionDays; }
public int effectiveNotificationRetentionDays() { return notificationRetentionDays == null ? 30 : notificationRetentionDays; }
public int effectiveWebhookTimeoutMs() { return webhookTimeoutMs == null ? 5000 : webhookTimeoutMs; }
public int effectiveWebhookMaxAttempts() { return webhookMaxAttempts == null ? 3 : webhookMaxAttempts; }
public int cbFailThreshold() { return circuitBreakerFailThreshold == null ? 5 : circuitBreakerFailThreshold; }
public int cbWindowSeconds() { return circuitBreakerWindowSeconds == null ? 30 : circuitBreakerWindowSeconds; }
public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; }
}
Register via @ConfigurationPropertiesScan or explicit @EnableConfigurationProperties(AlertingProperties.class) in AlertingBeanConfig. Also clamp-with-WARN if evaluatorTickIntervalMs < 5000 at startup.
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \
cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine"
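The tick-interval floor and the startup WARN described in Task 26 Step 3 can be sketched as below. This is a minimal illustration under stated assumptions, not the plan's implementation: the class name TickIntervalFloor is hypothetical, and java.util.logging stands in for the SLF4J logger used elsewhere in the plan so that the example stays self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical helper (not part of the plan): applies the 5000 ms floor and warns when it does. */
final class TickIntervalFloor {
    private static final Logger LOG = Logger.getLogger(TickIntervalFloor.class.getName());
    static final int FLOOR_MS = 5000;

    /** Returns the interval to use; logs a WARN when the configured value is raised to the floor. */
    static int clamp(Integer configuredMs) {
        if (configuredMs == null) {
            return FLOOR_MS; // unset: the default equals the floor, nothing to warn about
        }
        if (configuredMs < FLOOR_MS) {
            LOG.warning(() -> "evaluator-tick-interval-ms=" + configuredMs
                    + " is below the " + FLOOR_MS + " ms floor; using " + FLOOR_MS);
            return FLOOR_MS;
        }
        return configuredMs;
    }
}
```

Calling such a helper once from the bean configuration would keep the WARN out of the per-tick path, while effectiveEvaluatorTickIntervalMs() continues to apply the floor silently.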
Task 27: AlertEvaluatorJob
Files:
- Create: .../alerting/eval/AlertEvaluatorJob.java
- Test: .../alerting/eval/AlertEvaluatorJobIT.java
- Step 1: Write the failing integration test (uses real PG + mocked evaluators):
@Test
void claimDueRuleFireResolveCycle() throws Exception {
// seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance.
// flip the mock to return Firing -> one AlertInstance in FIRING state.
// flip back to Clear -> instance transitions to RESOLVED.
}
- Step 2: Run — FAIL.
- Step 3: Implement
@Component
public class AlertEvaluatorJob implements SchedulingConfigurer {
private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);
private final AlertingProperties props;
private final AlertRuleRepository ruleRepo;
private final AlertInstanceRepository instanceRepo;
private final AlertNotificationRepository notificationRepo;
private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
private final PerKindCircuitBreaker circuitBreaker;
private final MustacheRenderer renderer;
private final NotificationContextBuilder contextBuilder;
private final String instanceId;
private final String tenantId;
private final AlertingMetrics metrics;
private final Clock clock;
public AlertEvaluatorJob(/* ...all above... */) { /* assign */ }
@Override
public void configureTasks(ScheduledTaskRegistrar registrar) {
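// Duration overload: the long-millis overload is deprecated as of Spring Framework 6.0.
registrar.addFixedDelayTask(this::tick, Duration.ofMillis(props.effectiveEvaluatorTickIntervalMs()));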
}
void tick() {
List<AlertRule> claimed = ruleRepo.claimDueRules(
instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds());
TickCache cache = new TickCache();
EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);
for (AlertRule rule : claimed) {
if (circuitBreaker.isOpen(rule.conditionKind())) {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
continue;
}
try {
EvalResult result = evaluateSafely(rule, ctx);
applyResult(rule, result);
circuitBreaker.recordSuccess(rule.conditionKind());
} catch (Exception e) {
circuitBreaker.recordFailure(rule.conditionKind());
metrics.evalError(rule.conditionKind(), rule.id());
log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
} finally {
reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
}
}
}
@SuppressWarnings({"rawtypes","unchecked"})
private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind());
return evaluator.evaluate(rule.condition(), rule, ctx);
}
private void applyResult(AlertRule rule, EvalResult result) {
if (result instanceof EvalResult.Batch b) {
for (EvalResult.Firing f : b.firings()) applyFiring(rule, f);
return;
}
AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> {
AlertInstance persisted = instanceRepo.save(
enrichTitleMessage(rule, next, result));
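// Notify on every transition into FIRING: a brand-new instance, or an open one promoted from a non-FIRING state.
if (next.state() == AlertState.FIRING && (current == null || current.state() != AlertState.FIRING)) {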
enqueueNotifications(rule, persisted);
}
});
}
private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ }
private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) {
Map<String,Object> ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null);
String title = renderer.render(rule.notificationTitleTmpl(), ctx);
String message = renderer.render(rule.notificationMessageTmpl(), ctx);
return instance /* .withTitle(title).withMessage(message) */;
}
private void enqueueNotifications(AlertRule rule, AlertInstance instance) {
for (WebhookBinding w : rule.webhooks()) {
Map<String,Object> payload = /* context-builder + body override */ Map.of();
notificationRepo.save(new AlertNotification(
UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(),
NotificationStatus.PENDING, 0, Instant.now(clock),
null, null, null, null, payload, null, Instant.now(clock)));
}
}
private void reschedule(AlertRule rule, Instant next) {
ruleRepo.releaseClaim(rule.id(), next, rule.evalState());
}
}
- Step 4: Run — PASS.
- Step 5: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \
cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker"
Phase 8 — Notification dispatch
Task 28: HmacSigner
Files:
- Create: .../alerting/notify/HmacSigner.java
- Test: .../alerting/notify/HmacSignerTest.java
- Step 1: Write the failing test
@Test
void signsBodyWithSha256Hmac() {
String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8));
// precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f...
assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f..."); // replace with real hex
}
- Step 2: Run — FAIL.
- Step 3: Implement using javax.crypto.Mac.getInstance("HmacSHA256") and HexFormat.of().formatHex(...).
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): HmacSigner for webhook signature"
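A minimal sketch of the signer described in Task 28, assuming the "sha256=" prefix expected by the Step 1 test and a UTF-8 encoded secret. The verify method is not part of the plan; it is included only to illustrate how a webhook receiver might check the X-Cameleer-Signature header with a constant-time comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class HmacSigner {
    private static final String ALGORITHM = "HmacSHA256";

    /** Returns "sha256=" followed by the lowercase hex HMAC-SHA256 of body under secret. */
    String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            return "sha256=" + HexFormat.of().formatHex(mac.doFinal(body));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a standard JCA algorithm, so this is not expected in practice.
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Receiver-side check (illustrative assumption): constant-time comparison of header values. */
    boolean verify(String secret, byte[] body, String headerValue) {
        byte[] expected = sign(secret, body).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, headerValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

One way to pin the test without hand-computing a digest is a published vector: RFC 4231 test case 2 (key "Jefe", data "what do ya want for nothing?") gives 5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843 for HMAC-SHA-256.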
Task 29: WebhookDispatcher
Files:
- Create: .../alerting/notify/WebhookDispatcher.java
- Test: .../alerting/notify/WebhookDispatcherIT.java (WireMock)
- Step 1: Write the failing IT covering:
  - 2xx → returns DELIVERED with status + snippet.
  - 4xx → returns FAILED immediately.
  - 5xx → returns a retry outcome (null status plus a retryAfter backoff delay).
  - Network timeout → retry outcome.
  - HMAC header present when hmacSecret != null.
  - TLS trust-all config works against WireMock HTTPS.
- Step 2: Run — FAIL.
- Step 3: Implement
@Component
public class WebhookDispatcher {
public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {}
private final OutboundHttpClientFactory clientFactory;
private final SecretCipher cipher;
private final HmacSigner signer;
private final MustacheRenderer renderer;
private final AlertingProperties props;
private final ObjectMapper om;
public WebhookDispatcher(/* ... */) { /* assign */ }
public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance,
OutboundConnection conn, Map<String,Object> context) {
String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn);
String body = renderer.render(bodyTmpl, context);
var ctx = new OutboundHttpRequestContext(
conn.tlsTrustMode(), conn.tlsCaPemPaths(),
Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs()));
var client = clientFactory.clientFor(ctx);
var request = new HttpPost(renderer.render(conn.url(), context));
request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
request.setHeader("Content-Type", "application/json");
for (var h : conn.defaultHeaders().entrySet()) {
request.setHeader(h.getKey(), renderer.render(h.getValue(), context));
}
if (conn.hmacSecretCiphertext() != null) {
String secret = cipher.decrypt(conn.hmacSecretCiphertext());
request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8)));
}
try (var response = client.execute(request)) {
int code = response.getCode();
String snippet = snippet(response);
if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null);
return retryOutcome(code, snippet);
} catch (IOException e) {
return retryOutcome(-1, e.getMessage());
}
}
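private Outcome retryOutcome(int code, String snippet) {
// Base delay only. NotificationDispatchJob multiplies it by the attempt count
// (30s after the first failure, 60s after the second) and decides PENDING vs FAILED.
Duration next = Duration.ofSeconds(30);
return new Outcome(null /* null status = retryable; caller decides PENDING vs FAILED */, code, snippet, next);
}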
}
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification"
Task 30: NotificationDispatchJob
Files:
- Create: .../alerting/notify/NotificationDispatchJob.java
- Test: .../alerting/notify/NotificationDispatchJobIT.java
- Step 1: Write the failing IT — seed a PENDING AlertNotification; run one tick; WireMock returns 200; assert row transitions to DELIVERED. Seed another against 503 → assert attempts=1, next_attempt_at bumped, still PENDING.
- Step 2: Run — FAIL.
- Step 3: Implement — claim-polling loop:
void tick() {
var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl);
for (var n : claimed) {
var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; }
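var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow();
// rule is null once the rule has been deleted (alert_instances.rule_id is set to NULL);
// contextBuilder and dispatcher must then work from the instance's rule_snapshot.
var rule = instance.ruleId() == null ? null : ruleRepo.findById(instance.ruleId()).orElse(null);
var context = contextBuilder.build(rule, instance, env, uiOrigin);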
// silence check
if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream()
.anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) {
instanceRepo.markSilenced(instance.id(), true);
notificationRepo.markFailed(n.id(), 0, "silenced");
continue;
}
var outcome = dispatcher.dispatch(n, rule, instance, conn, context);
if (outcome.status() == NotificationStatus.DELIVERED) {
notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now());
} else if (outcome.status() == NotificationStatus.FAILED) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
int attempts = n.attempts() + 1;
if (attempts >= props.effectiveWebhookMaxAttempts()) {
notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
} else {
Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts));
notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
}
}
}
}
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry"
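The retry branch of the Task 30 tick() loop can be isolated as a pure function, which makes the schedule easy to unit-test. The sketch below mirrors the arithmetic shown in that loop (increment the attempt count, give up at the maximum, otherwise scale the base delay by the attempt count). The class and method names are hypothetical and do not appear in the plan.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper (not part of the plan) capturing the retry decision as a pure function. */
final class RetryPolicySketch {
    /**
     * Decides what happens after a retryable failure.
     * An empty result means "give up and mark FAILED"; otherwise it is the next attempt time.
     */
    static Optional<Instant> nextAttemptAt(int previousAttempts, int maxAttempts,
                                           Duration baseDelay, Instant now) {
        int attempts = previousAttempts + 1; // count the attempt that just failed
        if (attempts >= maxAttempts) {
            return Optional.empty();
        }
        // Linear scaling, as in the tick() loop: base x 1, base x 2, ...
        return Optional.of(now.plus(baseDelay.multipliedBy(attempts)));
    }
}
```

With the defaults from AlertingProperties (3 attempts) and the 30 s base delay returned by WebhookDispatcher, the first failure schedules a retry 30 s later, the second failure 60 s later, and the third failure marks the notification FAILED.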
Task 31: InAppInboxQuery + server-side 5s memoization
Files:
- Create: .../alerting/notify/InAppInboxQuery.java
- Test: .../alerting/notify/InAppInboxQueryTest.java
- Step 1: Write the failing test covering the path (resolves groups/roles from RbacService.getEffectiveRolesForUser + listGroupsForUser, delegates to AlertInstanceRepository.listForInbox / countUnreadForUser, second call within 5 s returns cached count).
- Step 2: Run — FAIL.
- Step 3: Implement — Caffeine-style ConcurrentHashMap<Key, Entry> with Entry(count, expiresAt), 5 s TTL per (envId, userId).
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization"
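The 5 s unread-count memoization from Task 31 Step 3 might look like the sketch below. It follows the shape named in the plan (a ConcurrentHashMap keyed by (envId, userId) holding Entry(count, expiresAt)), but the class name, the UUID and String key types, and the LongSupplier loader are assumptions made for illustration. A Clock is injected so that a test can advance time without sleeping.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical cache (names and key types are assumptions): memoizes unread counts for 5 s. */
final class UnreadCountCache {
    record Key(UUID envId, String userId) {}
    record Entry(long count, Instant expiresAt) {}

    private static final Duration TTL = Duration.ofSeconds(5);
    private final ConcurrentHashMap<Key, Entry> entries = new ConcurrentHashMap<>();
    private final Clock clock;

    UnreadCountCache(Clock clock) { this.clock = clock; }

    /** Returns the cached count while it is fresh; otherwise calls loader and caches its result. */
    long get(UUID envId, String userId, LongSupplier loader) {
        Instant now = clock.instant();
        Entry entry = entries.compute(new Key(envId, userId), (key, old) ->
                old != null && now.isBefore(old.expiresAt())
                        ? old                                          // still fresh: reuse
                        : new Entry(loader.getAsLong(), now.plus(TTL))); // expired or absent: reload
        return entry.count();
    }
}
```

Using compute keeps the check-and-reload atomic per key, at the cost of running the loader while the map holds its internal lock for that key. Entries are never evicted in this sketch, so a real implementation would also need some form of cleanup.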
Phase 9 — REST controllers
Task 32: AlertRuleController + DTOs
Files:
- Create: .../alerting/controller/AlertRuleController.java
- Create: DTOs in .../alerting/dto/
- Test: .../alerting/controller/AlertRuleControllerIT.java
- Step 1: Write the failing IT — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation.
- Step 2: Run — FAIL.
- Step 3: Implement. Endpoints (all under /api/v1/environments/{envSlug}/alerts/rules, env resolved via @EnvPath Environment env):
| Method | Path | RBAC |
|---|---|---|
| GET | (base path) | VIEWER+ |
| POST | (base path) | OPERATOR+ |
| GET | {id} | VIEWER+ |
| PUT | {id} | OPERATOR+ |
| DELETE | {id} | OPERATOR+ |
| POST | {id}/enable / {id}/disable | OPERATOR+ |
| POST | {id}/render-preview | OPERATOR+ |
| POST | {id}/test-evaluate | OPERATOR+ |
Key DTOs: AlertRuleRequest (with @Valid AlertConditionDto), AlertRuleResponse, RenderPreviewRequest/Response, TestEvaluateRequest/Response.
On save, validate:
- Each WebhookBindingRequest.outboundConnectionId exists in outbound_connections (via OutboundConnectionService.get(id) → 422 if 404).
- Connection is allowed in this env (via conn.isAllowedInEnvironment(env.id()) → 422 otherwise).
- SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory).
Audit via auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request).
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs"
Task 33: AlertController
Files:
- Create: .../alerting/controller/AlertController.java, AlertDto.java, UnreadCountResponse.java
- Test: .../alerting/controller/AlertControllerIT.java
- Step 1: Write the failing IT for GET /alerts, GET /alerts/unread-count, POST /alerts/{id}/ack, POST /alerts/{id}/read, POST /alerts/bulk-read. Assert env isolation (env-A alert not visible from env-B).
- Step 2: Run — FAIL.
- Step 3: Implement — delegate to InAppInboxQuery and AlertInstanceRepository. On ack, enforce the targeted-or-OPERATOR rule: the caller must either be a target of the alert or hold OPERATOR+.
- Step 4: Run — PASS.
- Step 5: Commit
git commit -m "feat(alerting): AlertController for inbox + ack + read"
Task 34: AlertSilenceController
Files:
- Create: .../alerting/controller/AlertSilenceController.java, AlertSilenceDto.java
- Test: .../alerting/controller/AlertSilenceControllerIT.java
- Step 1–5: Follow the same pattern. Mutations OPERATOR+, audit ALERT_SILENCE_CHANGE. Validate endsAt > startsAt at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier).
Task 35: AlertNotificationController
Files:
- Create: .../alerting/controller/AlertNotificationController.java
- Test: .../alerting/controller/AlertNotificationControllerIT.java
- Step 1–5:
  - GET /alerts/{id}/notifications → VIEWER+; returns per-instance outbox rows.
  - POST /alerts/notifications/{id}/retry → OPERATOR+; resets next_attempt_at = now, attempts = 0, status = PENDING. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file).
- Step 6: Update SecurityConfig to permit the new paths
In cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java:
.requestMatchers(HttpMethod.GET, "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST, "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN")
(Class-level @PreAuthorize on each controller is authoritative; the path matchers are defence-in-depth.)
- Step 7: Commit
git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths"
Task 36: Regenerate OpenAPI schema
- Step 1: Start backend on :8081 (from the alerting-02 worktree).
- Step 2: cd ui && npm run generate-api:live
- Step 3: Commit the regenerated ui/src/api/schema.d.ts + ui/src/api/openapi.json.
git add ui/src/api/schema.d.ts ui/src/api/openapi.json
git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints"
Phase 10 — Retention, metrics, rules, verification
Task 37: AlertingRetentionJob
Files:
- Create: .../alerting/retention/AlertingRetentionJob.java
- Test: .../alerting/retention/AlertingRetentionJobIT.java
- Step 1: Write the failing IT — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run cleanup(); assert only old rows are deleted.
- Step 2: Run — FAIL.
- Step 3: Implement — @Scheduled(cron = "0 0 3 * * *"), cutoffs from AlertingProperties, advisory-lock-of-the-day pattern (see JarRetentionJob.java).
- Step 4–5: Run, commit
git commit -m "feat(alerting): AlertingRetentionJob daily cleanup"
Task 38: AlertingMetrics
Files:
- Create: .../alerting/metrics/AlertingMetrics.java
- Step 1: Register metrics via MeterRegistry:
@Component
public class AlertingMetrics {
private final MeterRegistry registry;
public AlertingMetrics(MeterRegistry registry) { this.registry = registry; }
public void evalError(ConditionKind kind, UUID ruleId) {
registry.counter("alerting_eval_errors_total",
"kind", kind.name(), "rule_id", ruleId.toString()).increment();
}
public void circuitOpened(ConditionKind kind) {
registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment();
}
public Timer evalDuration(ConditionKind kind) {
return registry.timer("alerting_eval_duration_seconds", "kind", kind.name());
}
// + gauges via MeterBinder that query repositories
}
- Step 2: Wire into AlertEvaluatorJob and PerKindCircuitBreaker.
- Step 3: Commit
git commit -m "feat(alerting): observability metrics via micrometer"
Task 39: Update .claude/rules/app-classes.md + core-classes.md
- Step 1: Document the new alerting/ packages in both rule files. Add a new subsection under controller/ for the alerting env-scoped controllers. Document the new flat endpoint /api/v1/alerts/notifications/{id}/retry in the flat-allow-list with justification "notification IDs are globally unique; matches the /api/v1/executions/{id} precedent".
- Step 2: Commit
git add .claude/rules/app-classes.md .claude/rules/core-classes.md
git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint"
Task 40: application.yml defaults + admin guide
Files:
- Modify: cameleer-server-app/src/main/resources/application.yml
- Create: docs/alerting.md
- Step 1: Add default stanza
cameleer:
  server:
    alerting:
      evaluator-tick-interval-ms: 5000
      evaluator-batch-size: 20
      claim-ttl-seconds: 30
      notification-tick-interval-ms: 5000
      notification-batch-size: 50
      in-tick-cache-enabled: true
      circuit-breaker-fail-threshold: 5
      circuit-breaker-window-seconds: 30
      circuit-breaker-cooldown-seconds: 60
      event-retention-days: 90
      notification-retention-days: 30
      webhook-timeout-ms: 5000
      webhook-max-attempts: 3
- Step 2: Write docs/alerting.md — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention).
- Step 3: Commit
git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md
git commit -m "docs(alerting): default config + admin guide"
Task 41: Full-lifecycle integration test
Files:
- Create: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
- Step 1: Write the full-lifecycle IT
Steps in the single test method:
1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret.
2. POST a LOG_PATTERN rule pointing at WireMock via the outbound connection, forDurationSeconds=0, threshold=1.
3. Inject a log row into ClickHouse that matches the pattern.
4. Trigger AlertEvaluatorJob.tick() directly.
5. Assert one alert_instances row in FIRING.
6. Trigger NotificationDispatchJob.tick().
7. Assert WireMock received one POST with X-Cameleer-Signature header + rendered body.
8. POST /alerts/{id}/ack → state ACKNOWLEDGED.
9. Create a silence matching this rule; fire another tick; assert silenced=true on new instance and WireMock received no second request.
10. Remove the matching log rows, run tick → instance RESOLVED.
11. DELETE the rule → assert alert_instances.rule_id = NULL but rule_snapshot still retains rule name.
- Step 2: Run — PASS (may need a few iterations of debugging).
- Step 3: Commit
git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete"
Task 42: Env-isolation + outbound-guard regression tests
Files:
- Create: .../alerting/AlertingEnvIsolationIT.java, OutboundConnectionAllowedEnvIT.java
- Step 1: Env isolation — rule in env-A, fire, assert invisible from env-B inbox.
- Step 2: Outbound guard — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing allowed_environment_ids on the connection while a rule still references it → 409 (this exercises the freshly-wired rulesReferencing).
- Step 3: Run — PASS.
- Step 4: Commit
git commit -m "test(alerting): env isolation + outbound allowed-env guard"
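The two outcomes exercised by the Task 42 outbound-guard test (422 on rule save, 409 on narrowing) can be modelled as pure decisions. The sketch below is illustrative only: Connection is a hypothetical stand-in for Plan 01's OutboundConnection, the method names are invented, and treating "narrowing" as "the requested set drops at least one currently allowed environment" is an assumption about the intended semantics.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model of the guard decisions; not the plan's controller or service code. */
final class OutboundGuardSketch {
    /** Stand-in for Plan 01's OutboundConnection (assumed shape). */
    record Connection(UUID id, Set<UUID> allowedEnvironmentIds) {
        boolean isAllowedInEnvironment(UUID envId) {
            return allowedEnvironmentIds.contains(envId);
        }
    }

    /** Rule save: 422 when the bound connection is not allowed in the rule's environment. */
    static OptionalInt rejectRuleSave(Connection conn, UUID envId) {
        return conn.isAllowedInEnvironment(envId) ? OptionalInt.empty() : OptionalInt.of(422);
    }

    /** Connection update: 409 when environments are removed while rules still reference it. */
    static OptionalInt rejectNarrowing(Connection current, Set<UUID> requestedEnvIds,
                                       List<UUID> rulesReferencing) {
        boolean narrowing = !requestedEnvIds.containsAll(current.allowedEnvironmentIds());
        return narrowing && !rulesReferencing.isEmpty() ? OptionalInt.of(409) : OptionalInt.empty();
    }
}
```

In the real code the rulesReferencing list would come from the service method wired in this plan; here it is passed in so that the decision stays testable without a database.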
Task 43: Final verification + GitNexus reindex
- Step 1: Full build
mvn clean verify
Expected: all tests introduced by Plan 02 pass. Tests affected by known pre-existing debt (wrong-JdbcTemplate + shared-context state leaks) may still fail; record any failure that predates Plan 02 in a "known-pre-existing" note in the commit message.
- Step 2: GitNexus reindex
npx gitnexus analyze --embeddings
- Step 3: Manual smoke
Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through:
1. Create an outbound connection to https://httpbin.org/post.
2. curl the alerting REST API to POST a LOG_PATTERN rule.
3. Inject a matching log via POST /api/v1/data/logs.
4. Wait 2 eval ticks + 1 notification tick.
5. Confirm: alert_instances row in FIRING, alert_notifications row DELIVERED with HTTP 200, httpbin shows the body.
6. curl POST /alerts/{id}/ack → state ACKNOWLEDGED.
- Step 4: Nothing to commit if all passes — plan complete
Known-incomplete items carried into Plan 03
- UI: NotificationBell, /alerts/** pages, <MustacheEditor /> with variable auto-complete, CMD-K alert/rule sources. Open design question: completion engine choice (CodeMirror 6 vs Monaco vs textarea overlay); see spec §20 #7.
- Rule promotion across envs. Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03.
- OIDC retrofit to use OutboundHttpClientFactory. Unchanged from Plan 01; a separate small follow-up.
- TLS summary enrichment on /test endpoint (Plan 01 stubbed as "TLS"). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context.
- Performance tests. 500-rule, 5-replica PerformanceIT deferred; claim-polling concurrency is covered by Task 7's unit-level test.
- Bulk promotion and the mustache completion variables metadata endpoint (GET /alerts/rules/template-variables). Deferred until usage patterns justify them.
- Pre-existing test debt. Test debt that predates Plan 02 (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in FlywayMigrationIT / ConfigEnvIsolationIT / ClickHouseStatsStoreIT) is orthogonal and should be addressed in a dedicated test-hygiene pass.
Self-review
Spec coverage (against docs/superpowers/specs/2026-04-19-alerting-design.md):
| Spec § | Scope | Covered by |
|---|---|---|
| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 20–25 |
| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 |
| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 |
| §2 Rule promotion | Deferred to Plan 03 (UI) | — |
| §2 CMD-K | Deferred to Plan 03 | — |
| §2 Configurable cadence, 5 s floor | AlertingProperties.effective* | Task 26 |
| §3 Key decisions | All 14 decisions honoured | — |
| §4 Module layout | core/alerting + app/alerting/** | Tasks 3–11, 15–38 |
| §4 Touchpoints | countLogs + countExecutionsForAlerting + AuditCategory + SecurityConfig | Tasks 2, 12, 13, 35 |
| §5 Data model | V12 migration | Task 1 |
| §5 Claim-polling queries | FOR UPDATE SKIP LOCKED in rule + notification repos | Tasks 7, 10 |
| §6 Outbound connections wiring | rulesReferencing gate | Task 8 (CRITICAL) |
| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 |
| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | | Tasks 28, 29, 30, 31 |
| §9 Rule promotion | Deferred (UI) | — |
| §10 Cross-cutting HTTP | Reused from Plan 01 | — |
| §11 API surface | All routes implemented except rule promotion | Tasks 32–36 |
| §12 CMD-K | Deferred to Plan 03 | — |
| §13 UI | Deferred to Plan 03 | — |
| §14 Configuration | AlertingProperties + application.yml | Tasks 26, 40 |
| §15 Retention | Daily job | Task 37 |
| §16 Observability (metrics + audit) | | Tasks 2, 38 |
| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | | Tasks 32–36, 28, Plan 01 |
| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 27–31, 41, 42 |
| §19 Rollout | Dormant-by-default; matching application.yml + docs | Task 40 |
| §20 #1 OIDC alignment | Deferred (follow-up) | — |
| §20 #2 secret encryption | Reused Plan 01 SecretCipher | Task 29 |
| §20 #3 CH migration naming | alerting_projections.sql | Task 14 |
| §20 #6 env-delete cascade audit | PG IT | Task 1 |
| §20 #7 Mustache completion engine | Deferred (UI) | — |
Placeholders: A handful of steps reference real record fields / method names with /* … */ markers where the exact name depends on what the existing codebase exposes (ExecutionStats metric accessors, AgentInfo.lastHeartbeat method name, wither-method signatures on AlertInstance). Each is accompanied by a gitnexus_context({name: ...}) hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time.
Type consistency check: AlertRule, AlertInstance, AlertNotification, AlertSilence field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). WebhookBinding.id is used as alert_notifications.webhook_id — stable opaque reference. OutboundConnection.createdBy/updatedBy types match users.user_id TEXT (Plan 01 precedent). rulesReferencing signature matches Plan 01's stub List<UUID> rulesReferencing(UUID).
Risks flagged to executor:
- Task 16: the MustacheRenderer missing-variable fallback is non-trivial in JMustache's default compiler config, so the implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible.
- Task 12/13: the SQL dialect for attribute map access on the executions table (attributes[?]) depends on the actual column type in init.sql. If attributes is Map(String,String), the syntax works; if it's stored as a JSON string, switch to JSONExtractString(attributes, ?) = ?.
- Task 27: enrichTitleMessage depends on AlertInstance having wither methods. These are added opportunistically during Task 26 when AlertStateTransitions needs them. Don't forget to expose them.
- Claim-polling semantics under schema-per-tenant: the ?currentSchema=tenant_{id} JDBC URL routes writes correctly, and FOR UPDATE SKIP LOCKED only locks rows in that tenant's own tables, so cross-tenant lock contention cannot occur (correct behaviour). Make sure IT tests run with cameleer.server.tenant.id=default.
- Task 41: the full-lifecycle test is the canary. If it fails after any task, start from the failing assertion; the bug is almost always in state transitions or renderer context shape.