cameleer-server/docs/superpowers/plans/2026-04-19-alerting-02-backend.md


Alerting — Plan 02 — Backend Implementation

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Deliver the server-side alerting feature described in docs/superpowers/specs/2026-04-19-alerting-design.md — domain model, storage, evaluators for all six condition kinds, notification dispatch (webhook + in-app inbox), REST API, retention, metrics, and integration tests. UI, CMD-K integration, and load tests are explicitly deferred to Plan 03.

Architecture: Confined to new alerting/ packages in both cameleer-server-core (pure records + interfaces) and cameleer-server-app (Spring-wired storage, scheduling, REST). Postgres stores rules/instances/silences/notifications; ClickHouse stores observability data read by evaluators (new countLogs / countExecutionsForAlerting methods, four additive projections). Claim-polling FOR UPDATE SKIP LOCKED makes the evaluator and dispatcher horizontally scalable. Rule→connection wiring (rulesReferencing) is populated in this plan — it is the gate that unlocks safe production use of Plan 01.

Tech Stack: Java 17, Spring Boot 3.4.3, PostgreSQL (Flyway V12), ClickHouse (idempotent init SQL), JMustache for templates, Apache HttpClient 5 via Plan 01's OutboundHttpClientFactory, Testcontainers + JUnit 5 + WireMock + AssertJ for tests.


Base branch

Branch Plan 02 off feat/alerting-01-outbound-infra. Plan 02 depends on Plan 01's OutboundConnection domain, OutboundHttpClientFactory bean, SecretCipher, OutboundConnectionServiceImpl.rulesReferencing() stub, the V11 migration, and the OUTBOUND_CONNECTION_CHANGE / OUTBOUND_HTTP_TRUST_CHANGE audit categories. Branching off main is not an option — those classes do not exist there yet. When Plan 01 merges, rebase Plan 02 onto main; until then Plan 02 is stacked PR #2.

# Execute in a fresh worktree
git fetch origin
git worktree add -b feat/alerting-02-backend .worktrees/alerting-02 feat/alerting-01-outbound-infra
cd .worktrees/alerting-02
mvn clean compile   # confirm Plan 01 code compiles as baseline

File Structure

Created — cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/

| File | Responsibility |
| --- | --- |
| AlertingProperties.java | Not here — see app module. |
| AlertRule.java | Immutable record: id, environmentId, name, description, severity, enabled, conditionKind, condition, evaluationIntervalSeconds, forDurationSeconds, reNotifyMinutes, notificationTitleTmpl, notificationMessageTmpl, webhooks, targets, nextEvaluationAt, claimedBy, claimedUntil, evalState, audit fields. |
| AlertCondition.java | Sealed interface; Jackson DEDUCTION polymorphism root. |
| RouteMetricCondition.java | Record: scope, metric, comparator, threshold, windowSeconds. |
| ExchangeMatchCondition.java | Record: scope, filter, fireMode, threshold, windowSeconds, perExchangeLingerSeconds. |
| AgentStateCondition.java | Record: scope, state, forSeconds. |
| DeploymentStateCondition.java | Record: scope, states. |
| LogPatternCondition.java | Record: scope, level, pattern, threshold, windowSeconds. |
| JvmMetricCondition.java | Record: scope, metric, aggregation, comparator, threshold, windowSeconds. |
| AlertScope.java | Record: appSlug?, routeId?, agentId? — nullable fields, used by all conditions. |
| ConditionKind.java | Enum mirror of SQL condition_kind_enum. |
| RouteMetric.java, Comparator.java, AggregationOp.java, FireMode.java | Enums used in conditions. |
| AlertSeverity.java | Enum mirror of SQL severity_enum. |
| AlertState.java | Enum mirror of SQL alert_state_enum. |
| AlertInstance.java | Immutable record for alert_instances row. |
| AlertRuleTarget.java | Record for alert_rule_targets row. |
| TargetKind.java | Enum mirror of SQL target_kind_enum. |
| AlertSilence.java | Record: id, environmentId, matcher, reason, startsAt, endsAt, createdBy, createdAt. |
| SilenceMatcher.java | Record: ruleId?, appSlug?, routeId?, agentId?, severity?. |
| AlertNotification.java | Record for alert_notifications outbox row. |
| NotificationStatus.java | Enum mirror of SQL notification_status_enum. |
| WebhookBinding.java | Record embedded in alert_rules.webhooks JSONB: id, outboundConnectionId, bodyOverride?, headerOverrides?. |
| AlertRuleRepository.java | CRUD + claim-polling interface. |
| AlertInstanceRepository.java | CRUD + query-for-inbox interface. |
| AlertSilenceRepository.java | CRUD interface. |
| AlertNotificationRepository.java | CRUD + claim-polling interface. |
| AlertReadRepository.java | Mark-read + count-unread interface. |

Created — cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/

| File | Responsibility |
| --- | --- |
| config/AlertingProperties.java | @ConfigurationProperties("cameleer.server.alerting"). |
| config/AlertingBeanConfig.java | Bean wiring for repositories, evaluators, dispatch, mustache renderer, etc. |
| storage/PostgresAlertRuleRepository.java | JdbcTemplate impl of AlertRuleRepository. |
| storage/PostgresAlertInstanceRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertSilenceRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertNotificationRepository.java | JdbcTemplate impl. |
| storage/PostgresAlertReadRepository.java | JdbcTemplate impl. |
| eval/EvalContext.java | Per-tick context (tenantId, now, tickCache). |
| eval/EvalResult.java | Sealed: Firing(value, threshold, contextMap) / Clear / Error(Throwable). |
| eval/TickCache.java | ConcurrentHashMap<String,Object> discarded per tick. |
| eval/PerKindCircuitBreaker.java | Failure window + cooldown per ConditionKind. |
| eval/ConditionEvaluator.java | Generic interface: evaluate(C, AlertRule, EvalContext). |
| eval/RouteMetricEvaluator.java | Reads StatsStore. |
| eval/ExchangeMatchEvaluator.java | Reads ClickHouseSearchIndex.countExecutionsForAlerting + SearchService.search for PER_EXCHANGE cursor mode. |
| eval/AgentStateEvaluator.java | Reads AgentRegistryService.findAll. |
| eval/DeploymentStateEvaluator.java | Reads DeploymentRepository.findByAppId. |
| eval/LogPatternEvaluator.java | Reads new ClickHouseLogStore.countLogs. |
| eval/JvmMetricEvaluator.java | Reads MetricsQueryStore.queryTimeSeries. |
| eval/AlertEvaluatorJob.java | @Component implementing SchedulingConfigurer; claim-polling loop. |
| eval/AlertStateTransitions.java | Pure function: given current instance + EvalResult → new state + timestamps. |
| notify/MustacheRenderer.java | JMustache wrapper; resilient to bad templates. |
| notify/NotificationContextBuilder.java | Pure: builds context map from AlertInstance + rule + env. |
| notify/SilenceMatcher.java | Pure: evaluates a SilenceMatcher against an AlertInstance. |
| notify/InAppInboxQuery.java | Server-side query helper for /alerts and unread-count. |
| notify/WebhookDispatcher.java | Renders + POSTs + HMAC signs; classifies 2xx/4xx/5xx → status. |
| notify/NotificationDispatchJob.java | @Component SchedulingConfigurer; claim-polling on alert_notifications. |
| notify/HmacSigner.java | Pure: computes sha256=<hmac(secret, body)>. |
| retention/AlertingRetentionJob.java | @Scheduled(cron = "0 0 3 * * *") — delete old alert_instances + alert_notifications. |
| controller/AlertRuleController.java | /api/v1/environments/{envSlug}/alerts/rules. |
| controller/AlertController.java | /api/v1/environments/{envSlug}/alerts + instance actions. |
| controller/AlertSilenceController.java | /api/v1/environments/{envSlug}/alerts/silences. |
| controller/AlertNotificationController.java | /api/v1/environments/{envSlug}/alerts/{id}/notifications, /alerts/notifications/{id}/retry. |
| dto/AlertRuleDto.java, dto/AlertDto.java, dto/AlertSilenceDto.java, dto/AlertNotificationDto.java, dto/ConditionDto.java, dto/WebhookBindingDto.java, dto/RenderPreviewRequest.java, dto/RenderPreviewResponse.java, dto/TestEvaluateRequest.java, dto/TestEvaluateResponse.java, dto/UnreadCountResponse.java | Request/response DTOs. |
| metrics/AlertingMetrics.java | Micrometer registrations for counters/gauges/histograms. |
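
The sha256=<hmac(secret, body)> format listed for notify/HmacSigner.java can be sketched with the JDK alone. This is only an illustration, not the plan's implementation: the class name HmacSignerSketch, the UTF-8 key encoding, and the lowercase-hex output are assumptions, since the table does not say how the digest is encoded.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;

// Hypothetical stand-in for notify/HmacSigner.java; names and encoding are assumptions.
final class HmacSignerSketch {
    private HmacSignerSketch() {}

    /** Returns "sha256=" followed by the lowercase hex HMAC-SHA256 of body, keyed by secret. */
    static String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            // Assumption: the shared secret is interpreted as UTF-8 bytes.
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return "sha256=" + HexFormat.of().formatHex(mac.doFinal(body));
        } catch (GeneralSecurityException e) {
            // NoSuchAlgorithmException or InvalidKeyException; neither is expected on a standard JDK.
            throw new IllegalStateException("Could not compute HmacSHA256", e);
        }
    }
}
```

A receiver would recompute the value over the raw request body with the same secret and compare it with the transmitted signature, ideally using a constant-time comparison such as MessageDigest.isEqual.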

Created — resources

| File | Responsibility |
| --- | --- |
| cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql | Flyway migration: 5 enums, 6 tables, indexes, cascades. |
| cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql | 4 projections on executions / logs / agent_metrics, all IF NOT EXISTS. |

Modified

| File | Change |
| --- | --- |
| cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java | Add ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java | Replace the rulesReferencing(UUID) stub with a call through AlertRuleRepository.findRuleIdsByOutboundConnectionId. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java | Add long countLogs(LogSearchRequest) — no FINAL. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java | Add long countExecutionsForAlerting(AlertMatchSpec) — no FINAL. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java | Run alerting_projections.sql via existing ClickHouseSchemaInitializer. |
| cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java | Permit new /api/v1/environments/{envSlug}/alerts/** path matchers with role-based access. |
| cameleer-server-core/pom.xml | Add com.samskivert:jmustache:1.16. |
| .claude/rules/app-classes.md, .claude/rules/core-classes.md | Document new packages. |
| cameleer-server-app/src/main/resources/application.yml | Default AlertingProperties stanza + comment linking to the admin guide. |

Conventions

  • TDD. Every task starts with a failing test, implements the minimum to pass, then commits.
  • One commit per task. Commit messages: feat(alerting): …, test(alerting): …, fix(alerting): …, chore(alerting): …, docs(alerting): ….
  • Tenant invariant. Every ClickHouse query, and every Postgres query that touches observability data, filters by tenantId (injected via AlertingBeanConfig from cameleer.server.tenant.id).
  • No FINAL on the two new CH count methods — alerting tolerates brief duplicate counts.
  • Jackson polymorphism via @JsonTypeInfo(use = DEDUCTION) with @JsonSubTypes on AlertCondition.
  • Pure core/, Spring-only in app/. No @Component, @Service, or @Scheduled annotations in cameleer-server-core.
  • Claim polling. FOR UPDATE SKIP LOCKED + claimed_by / claimed_until with 30 s TTL.
  • Instance id for claim ownership: use InetAddress.getLocalHost().getHostName() + ":" + processPid(); exposed as a bean "alertingInstanceId" of type String.
  • GitNexus hygiene. Before modifying any existing class (OutboundConnectionServiceImpl, ClickHouseLogStore, ClickHouseSearchIndex, AuditCategory, SecurityConfig), run gitnexus_impact({target: "<className>", direction: "upstream"}) and report blast radius. Run gitnexus_detect_changes() before each commit.
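
The claim-polling and instance-id conventions above can be sketched as follows. This is an assumption-laden illustration rather than the repository code the later tasks define: the class name ClaimPollingSketch, the exact SQL shape, and the "unknown-host" fallback are hypothetical, and ProcessHandle.current().pid() stands in for the processPid() helper named in the convention. Column names come from the V12 migration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch; the real repositories may phrase the claim differently.
final class ClaimPollingSketch {
    private ClaimPollingSketch() {}

    // Parameters: (1) claiming instance id, (2) batch size.
    // SKIP LOCKED lets concurrent pollers pass over rows another transaction is
    // already claiming, so two instances do not pick up the same rule.
    static final String CLAIM_DUE_RULES_SQL = """
        UPDATE alert_rules
           SET claimed_by = ?,
               claimed_until = now() + interval '30 seconds'
         WHERE id IN (
               SELECT id
                 FROM alert_rules
                WHERE enabled = true
                  AND next_evaluation_at <= now()
                  AND (claimed_until IS NULL OR claimed_until < now())
                ORDER BY next_evaluation_at
                LIMIT ?
                  FOR UPDATE SKIP LOCKED)
        RETURNING id
        """;

    /** Builds "<hostname>:<pid>" for the claimed_by column. */
    static String instanceId() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host"; // assumption: degrade rather than fail startup
        }
        return host + ":" + ProcessHandle.current().pid();
    }
}
```

Because claimed_until expires after 30 s, a rule claimed by an instance that crashes becomes claimable again without manual cleanup. Note that claimed_by is varchar(64) in the migration, so a very long hostname would need truncating.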

Phase 1 — Flyway V12 migration and audit categories

Task 1: V12__alerting_tables.sql

Files:

  • Create: cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java

  • Step 1: Write the failing integration test

package com.cameleer.server.app.alerting.storage;

import com.cameleer.server.app.AbstractPostgresIT;
import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;

class V12MigrationIT extends AbstractPostgresIT {

    @Test
    void allAlertingTablesAndEnumsExist() {
        var tables = jdbcTemplate.queryForList(
            "SELECT table_name FROM information_schema.tables WHERE table_schema='public' " +
            "AND table_name IN ('alert_rules','alert_rule_targets','alert_instances'," +
            "'alert_silences','alert_notifications','alert_reads')",
            String.class);
        assertThat(tables).containsExactlyInAnyOrder(
            "alert_rules","alert_rule_targets","alert_instances",
            "alert_silences","alert_notifications","alert_reads");

        var enums = jdbcTemplate.queryForList(
            "SELECT typname FROM pg_type WHERE typname IN " +
            "('severity_enum','condition_kind_enum','alert_state_enum'," +
            "'target_kind_enum','notification_status_enum')",
            String.class);
        assertThat(enums).hasSize(5);
    }

    @Test
    void deletingEnvironmentCascadesAlertingRows() {
        var envId = java.util.UUID.randomUUID();
        jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-cascade-env");
        jdbcTemplate.update(
            "INSERT INTO users (user_id, username, password_hash, email, enabled) " +
            "VALUES (?, ?, 'x', 'a@b', true)", "u1", "u1");
        var ruleId = java.util.UUID.randomUUID();
        jdbcTemplate.update(
            "INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
            "notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
            "VALUES (?, ?, 'r', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'u1', 'u1')",
            ruleId, envId);
        var instanceId = java.util.UUID.randomUUID();
        jdbcTemplate.update(
            "INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
            "fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
            "now(), '{}'::jsonb, 't', 'm')",
            instanceId, ruleId, envId);

        jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);

        assertThat(jdbcTemplate.queryForObject(
            "SELECT count(*) FROM alert_rules WHERE environment_id = ?",
            Integer.class, envId)).isZero();
        assertThat(jdbcTemplate.queryForObject(
            "SELECT count(*) FROM alert_instances WHERE environment_id = ?",
            Integer.class, envId)).isZero();
    }
}
  • Step 2: Run the test to verify it fails

Run: mvn -pl cameleer-server-app test -Dtest=V12MigrationIT Expected: FAIL — tables do not exist.

  • Step 3: Write the migration

Create cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql:

-- Enums (outbound_method_enum / outbound_auth_kind_enum / trust_mode_enum already exist from V11)
CREATE TYPE severity_enum            AS ENUM ('CRITICAL','WARNING','INFO');
CREATE TYPE condition_kind_enum      AS ENUM ('ROUTE_METRIC','EXCHANGE_MATCH','AGENT_STATE','DEPLOYMENT_STATE','LOG_PATTERN','JVM_METRIC');
CREATE TYPE alert_state_enum         AS ENUM ('PENDING','FIRING','ACKNOWLEDGED','RESOLVED');
CREATE TYPE target_kind_enum         AS ENUM ('USER','GROUP','ROLE');
CREATE TYPE notification_status_enum AS ENUM ('PENDING','DELIVERED','FAILED');

CREATE TABLE alert_rules (
  id                          uuid PRIMARY KEY,
  environment_id              uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
  name                        varchar(200) NOT NULL,
  description                 text,
  severity                    severity_enum NOT NULL,
  enabled                     boolean NOT NULL DEFAULT true,
  condition_kind              condition_kind_enum NOT NULL,
  condition                   jsonb NOT NULL,
  evaluation_interval_seconds int NOT NULL DEFAULT 60 CHECK (evaluation_interval_seconds >= 5),
  for_duration_seconds        int NOT NULL DEFAULT 0 CHECK (for_duration_seconds >= 0),
  re_notify_minutes           int NOT NULL DEFAULT 60 CHECK (re_notify_minutes >= 0),
  notification_title_tmpl     text NOT NULL,
  notification_message_tmpl   text NOT NULL,
  webhooks                    jsonb NOT NULL DEFAULT '[]',
  next_evaluation_at          timestamptz NOT NULL DEFAULT now(),
  claimed_by                  varchar(64),
  claimed_until               timestamptz,
  eval_state                  jsonb NOT NULL DEFAULT '{}',
  created_at                  timestamptz NOT NULL DEFAULT now(),
  created_by                  text NOT NULL REFERENCES users(user_id),
  updated_at                  timestamptz NOT NULL DEFAULT now(),
  updated_by                  text NOT NULL REFERENCES users(user_id)
);
CREATE INDEX alert_rules_env_idx       ON alert_rules (environment_id);
CREATE INDEX alert_rules_claim_due_idx ON alert_rules (next_evaluation_at) WHERE enabled = true;

CREATE TABLE alert_rule_targets (
  id          uuid PRIMARY KEY,
  rule_id     uuid NOT NULL REFERENCES alert_rules(id) ON DELETE CASCADE,
  target_kind target_kind_enum NOT NULL,
  target_id   varchar(128) NOT NULL,
  UNIQUE (rule_id, target_kind, target_id)
);
CREATE INDEX alert_rule_targets_lookup_idx ON alert_rule_targets (target_kind, target_id);

CREATE TABLE alert_instances (
  id                  uuid PRIMARY KEY,
  rule_id             uuid REFERENCES alert_rules(id) ON DELETE SET NULL,
  rule_snapshot       jsonb NOT NULL,
  environment_id      uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
  state               alert_state_enum NOT NULL,
  severity            severity_enum NOT NULL,
  fired_at            timestamptz NOT NULL,
  acked_at            timestamptz,
  acked_by            text REFERENCES users(user_id),
  resolved_at         timestamptz,
  last_notified_at    timestamptz,
  silenced            boolean NOT NULL DEFAULT false,
  current_value       numeric,
  threshold           numeric,
  context             jsonb NOT NULL,
  title               text NOT NULL,
  message             text NOT NULL,
  target_user_ids     text[] NOT NULL DEFAULT '{}',
  target_group_ids    uuid[] NOT NULL DEFAULT '{}',
  target_role_names   text[] NOT NULL DEFAULT '{}'
);
CREATE INDEX alert_instances_inbox_idx     ON alert_instances (environment_id, state, fired_at DESC);
CREATE INDEX alert_instances_open_rule_idx ON alert_instances (rule_id, state) WHERE rule_id IS NOT NULL;
CREATE INDEX alert_instances_resolved_idx  ON alert_instances (resolved_at) WHERE state = 'RESOLVED';
CREATE INDEX alert_instances_target_u_idx  ON alert_instances USING GIN (target_user_ids);
CREATE INDEX alert_instances_target_g_idx  ON alert_instances USING GIN (target_group_ids);
CREATE INDEX alert_instances_target_r_idx  ON alert_instances USING GIN (target_role_names);

CREATE TABLE alert_silences (
  id             uuid PRIMARY KEY,
  environment_id uuid NOT NULL REFERENCES environments(id) ON DELETE CASCADE,
  matcher        jsonb NOT NULL,
  reason         text,
  starts_at      timestamptz NOT NULL,
  ends_at        timestamptz NOT NULL CHECK (ends_at > starts_at),
  created_by     text NOT NULL REFERENCES users(user_id),
  created_at     timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_silences_active_idx ON alert_silences (environment_id, ends_at);

CREATE TABLE alert_notifications (
  id                     uuid PRIMARY KEY,
  alert_instance_id      uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
  webhook_id             uuid,
  outbound_connection_id uuid REFERENCES outbound_connections(id) ON DELETE SET NULL,
  status                 notification_status_enum NOT NULL DEFAULT 'PENDING',
  attempts               int NOT NULL DEFAULT 0,
  next_attempt_at        timestamptz NOT NULL DEFAULT now(),
  claimed_by             varchar(64),
  claimed_until          timestamptz,
  last_response_status   int,
  last_response_snippet  text,
  payload                jsonb NOT NULL,
  delivered_at           timestamptz,
  created_at             timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX alert_notifications_pending_idx  ON alert_notifications (next_attempt_at) WHERE status = 'PENDING';
CREATE INDEX alert_notifications_instance_idx ON alert_notifications (alert_instance_id);

CREATE TABLE alert_reads (
  user_id           text NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
  alert_instance_id uuid NOT NULL REFERENCES alert_instances(id) ON DELETE CASCADE,
  read_at           timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (user_id, alert_instance_id)
);

Notes:

  • Plan 01 established users.user_id as TEXT. All FK-to-users columns in this migration are text, not uuid.

  • target_user_ids is text[] (matches users.user_id).

  • outbound_connections (Plan 01) is referenced with ON DELETE SET NULL — matches the spec's "409 if referenced" semantics at the app layer while preserving referential cleanup if the admin-facing guard is bypassed.

  • Step 4: Run the test to verify it passes

Run: mvn -pl cameleer-server-app test -Dtest=V12MigrationIT Expected: PASS.

  • Step 5: Commit
git add cameleer-server-app/src/main/resources/db/migration/V12__alerting_tables.sql \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/V12MigrationIT.java
git commit -m "feat(alerting): V12 flyway migration for alerting tables"

Task 2: Extend AuditCategory

Files:

  • Modify: cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java

  • Test: cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java

  • Step 1: GitNexus impact check

Run gitnexus_impact({target: "AuditCategory", direction: "upstream"}) — report the blast radius (additive enum values are non-breaking; affected files are the admin rule file + any switch statements).

  • Step 2: Write the failing test
package com.cameleer.server.core.admin;

import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;

class AuditCategoryTest {
    @Test
    void alertingCategoriesPresent() {
        assertThat(AuditCategory.valueOf("ALERT_RULE_CHANGE")).isNotNull();
        assertThat(AuditCategory.valueOf("ALERT_SILENCE_CHANGE")).isNotNull();
    }
}
  • Step 3: Run the test — FAIL

Run: mvn -pl cameleer-server-core test -Dtest=AuditCategoryTest Expected: FAIL — IllegalArgumentException: No enum constant.

  • Step 4: Add the enum values

Replace the whole enum body with:

package com.cameleer.server.core.admin;

public enum AuditCategory {
    INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT,
    OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE,
    ALERT_RULE_CHANGE, ALERT_SILENCE_CHANGE
}
  • Step 5: Run the test — PASS

  • Step 6: Commit

git add cameleer-server-core/src/main/java/com/cameleer/server/core/admin/AuditCategory.java \
        cameleer-server-core/src/test/java/com/cameleer/server/core/admin/AuditCategoryTest.java
git commit -m "feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories"

Phase 2 — Core domain model

Each task in this phase adds a small, focused set of pure-Java records and enums under cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/. Records rely on their canonical constructors; a compact constructor is added only where a mutable collection needs a defensive copy (List.copyOf, Map.copyOf) or a cross-field invariant must be checked. Jackson polymorphism is handled by @JsonTypeInfo(use = DEDUCTION) on AlertCondition.
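
As a minimal sketch of the defensive-copy convention, the record below copies its list in a compact constructor. TargetListSketch is a hypothetical name used only for illustration, and treating null as an empty list is an assumption (the plan's DeploymentStateCondition, for instance, does not do this).

```java
import java.util.List;

// Hypothetical record illustrating the List.copyOf convention.
record TargetListSketch(String name, List<String> targets) {
    TargetListSketch {
        // List.copyOf returns an unmodifiable snapshot, so later changes to the
        // caller's list cannot leak into the record.
        targets = targets == null ? List.of() : List.copyOf(targets);
    }
}
```

With this in place, adding to the source list after construction leaves targets() unchanged, and calling targets().add(...) throws UnsupportedOperationException.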

Task 3: Enums + AlertScope

Files:

  • Create: .../alerting/AlertSeverity.java, AlertState.java, ConditionKind.java, TargetKind.java, NotificationStatus.java, RouteMetric.java, Comparator.java, AggregationOp.java, FireMode.java, AlertScope.java

  • Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java

  • Step 1: Write the failing test

package com.cameleer.server.core.alerting;

import org.junit.jupiter.api.Test;
import static org.assertj.core.api.Assertions.assertThat;

class AlertScopeTest {

    @Test
    void allFieldsNullIsEnvWide() {
        var s = new AlertScope(null, null, null);
        assertThat(s.isEnvWide()).isTrue();
    }

    @Test
    void appScoped() {
        var s = new AlertScope("orders", null, null);
        assertThat(s.isEnvWide()).isFalse();
        assertThat(s.appSlug()).isEqualTo("orders");
    }

    @Test
    void enumsHaveExpectedValues() {
        assertThat(AlertSeverity.values()).containsExactly(
            AlertSeverity.CRITICAL, AlertSeverity.WARNING, AlertSeverity.INFO);
        assertThat(AlertState.values()).containsExactly(
            AlertState.PENDING, AlertState.FIRING, AlertState.ACKNOWLEDGED, AlertState.RESOLVED);
        assertThat(ConditionKind.values()).hasSize(6);
        assertThat(TargetKind.values()).containsExactly(
            TargetKind.USER, TargetKind.GROUP, TargetKind.ROLE);
        assertThat(NotificationStatus.values()).containsExactly(
            NotificationStatus.PENDING, NotificationStatus.DELIVERED, NotificationStatus.FAILED);
    }
}
  • Step 2: Run — FAIL (cannot find symbol).

Run: mvn -pl cameleer-server-core test -Dtest=AlertScopeTest

  • Step 3: Create the files
// AlertSeverity.java
package com.cameleer.server.core.alerting;
public enum AlertSeverity { CRITICAL, WARNING, INFO }

// AlertState.java
package com.cameleer.server.core.alerting;
public enum AlertState { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }

// ConditionKind.java
package com.cameleer.server.core.alerting;
public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }

// TargetKind.java
package com.cameleer.server.core.alerting;
public enum TargetKind { USER, GROUP, ROLE }

// NotificationStatus.java
package com.cameleer.server.core.alerting;
public enum NotificationStatus { PENDING, DELIVERED, FAILED }

// RouteMetric.java
package com.cameleer.server.core.alerting;
public enum RouteMetric { ERROR_RATE, P95_LATENCY_MS, P99_LATENCY_MS, THROUGHPUT, ERROR_COUNT }

// Comparator.java
package com.cameleer.server.core.alerting;
public enum Comparator { GT, GTE, LT, LTE, EQ }

// AggregationOp.java
package com.cameleer.server.core.alerting;
public enum AggregationOp { MAX, MIN, AVG, LATEST }

// FireMode.java
package com.cameleer.server.core.alerting;
public enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }

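// AlertScope.java
package com.cameleer.server.core.alerting;

import com.fasterxml.jackson.annotation.JsonIgnore;

public record AlertScope(String appSlug, String routeId, String agentId) {
    // Derived flag, not a stored field: keep it out of the JSON so a round-trip never sees an unknown "envWide" property.
    @JsonIgnore
    public boolean isEnvWide() { return appSlug == null && routeId == null && agentId == null; }
}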
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
        cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertScopeTest.java
git commit -m "feat(alerting): core enums + AlertScope"
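
To show how the Comparator enum is meant to be consumed, here is one way an evaluator might apply it to an observed value and a rule threshold. This is a sketch under assumptions: the helper name matches and the class ComparatorSketch are hypothetical, the enum is redeclared locally so the block stands alone, and exact equality for EQ is a guess at the intended semantics.

```java
// Hypothetical helper; the evaluators in later tasks may structure this differently.
final class ComparatorSketch {
    private ComparatorSketch() {}

    // Local copy of the enum from Task 3, repeated so the sketch is self-contained.
    enum Comparator { GT, GTE, LT, LTE, EQ }

    /** Returns true when value satisfies "value <op> threshold". */
    static boolean matches(Comparator op, double value, double threshold) {
        return switch (op) {
            case GT -> value > threshold;
            case GTE -> value >= threshold;
            case LT -> value < threshold;
            case LTE -> value <= threshold;
            // Assumption: EQ means exact equality; a tolerance may suit floating-point metrics better.
            case EQ -> Double.compare(value, threshold) == 0;
        };
    }
}
```

The switch expression has no default branch, so adding a new Comparator constant produces a compile error here until the new case is handled.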

Task 4: AlertCondition sealed hierarchy + Jackson polymorphism

Files:

  • Create: .../alerting/AlertCondition.java, RouteMetricCondition.java, ExchangeMatchCondition.java (with nested ExchangeFilter), AgentStateCondition.java, DeploymentStateCondition.java, LogPatternCondition.java, JvmMetricCondition.java

  • Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java

  • Step 1: Write the failing test

package com.cameleer.server.core.alerting;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;

import static org.assertj.core.api.Assertions.assertThat;

class AlertConditionJsonTest {

    private final ObjectMapper om = new ObjectMapper();

    @Test
    void roundtripRouteMetric() throws Exception {
        var c = new RouteMetricCondition(
            new AlertScope("orders", "route-1", null),
            RouteMetric.P99_LATENCY_MS, Comparator.GT, 2000.0, 300);
        String json = om.writeValueAsString((AlertCondition) c);
        AlertCondition parsed = om.readValue(json, AlertCondition.class);
        assertThat(parsed).isInstanceOf(RouteMetricCondition.class);
        assertThat(parsed.kind()).isEqualTo(ConditionKind.ROUTE_METRIC);
    }

    @Test
    void roundtripExchangeMatchPerExchange() throws Exception {
        var c = new ExchangeMatchCondition(
            new AlertScope("orders", null, null),
            new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of("type","payment")),
            FireMode.PER_EXCHANGE, null, null, 300);
        String json = om.writeValueAsString((AlertCondition) c);
        AlertCondition parsed = om.readValue(json, AlertCondition.class);
        assertThat(parsed).isInstanceOf(ExchangeMatchCondition.class);
    }

    @Test
    void roundtripExchangeMatchCountInWindow() throws Exception {
        var c = new ExchangeMatchCondition(
            new AlertScope("orders", null, null),
            new ExchangeMatchCondition.ExchangeFilter("FAILED", Map.of()),
            FireMode.COUNT_IN_WINDOW, 5, 900, null);
        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
        assertThat(((ExchangeMatchCondition) parsed).threshold()).isEqualTo(5);
    }

    @Test
    void roundtripAgentState() throws Exception {
        var c = new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60);
        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
        assertThat(parsed).isInstanceOf(AgentStateCondition.class);
    }

    @Test
    void roundtripDeploymentState() throws Exception {
        var c = new DeploymentStateCondition(new AlertScope("orders", null, null), List.of("FAILED","DEGRADED"));
        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
        assertThat(parsed).isInstanceOf(DeploymentStateCondition.class);
    }

    @Test
    void roundtripLogPattern() throws Exception {
        var c = new LogPatternCondition(new AlertScope("orders", null, null),
            "ERROR", "TimeoutException", 5, 900);
        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
        assertThat(parsed).isInstanceOf(LogPatternCondition.class);
    }

    @Test
    void roundtripJvmMetric() throws Exception {
        var c = new JvmMetricCondition(new AlertScope("orders", null, null),
            "heap_used_percent", AggregationOp.MAX, Comparator.GT, 90.0, 300);
        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
        assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
    }
}
  • Step 2: Run — FAIL.

  • Step 3: Create the sealed hierarchy

// AlertCondition.java
package com.cameleer.server.core.alerting;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

@JsonTypeInfo(use = JsonTypeInfo.Id.DEDUCTION)
@JsonSubTypes({
    @JsonSubTypes.Type(RouteMetricCondition.class),
    @JsonSubTypes.Type(ExchangeMatchCondition.class),
    @JsonSubTypes.Type(AgentStateCondition.class),
    @JsonSubTypes.Type(DeploymentStateCondition.class),
    @JsonSubTypes.Type(LogPatternCondition.class),
    @JsonSubTypes.Type(JvmMetricCondition.class)
})
public sealed interface AlertCondition permits
    RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
    DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {

    ConditionKind kind();
    AlertScope scope();
}
// RouteMetricCondition.java
package com.cameleer.server.core.alerting;

public record RouteMetricCondition(
        AlertScope scope,
        RouteMetric metric,
        Comparator comparator,
        double threshold,
        int windowSeconds) implements AlertCondition {
    @Override public ConditionKind kind() { return ConditionKind.ROUTE_METRIC; }
}
// ExchangeMatchCondition.java
package com.cameleer.server.core.alerting;

import java.util.Map;

public record ExchangeMatchCondition(
        AlertScope scope,
        ExchangeFilter filter,
        FireMode fireMode,
        Integer threshold,                // required when COUNT_IN_WINDOW; null for PER_EXCHANGE
        Integer windowSeconds,            // required when COUNT_IN_WINDOW
        Integer perExchangeLingerSeconds  // required when PER_EXCHANGE
) implements AlertCondition {

    public ExchangeMatchCondition {
        if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
            throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
        if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
            throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
    }

    @Override public ConditionKind kind() { return ConditionKind.EXCHANGE_MATCH; }

    public record ExchangeFilter(String status, Map<String, String> attributes) {
        public ExchangeFilter { attributes = attributes == null ? Map.of() : Map.copyOf(attributes); }
    }
}
// AgentStateCondition.java
package com.cameleer.server.core.alerting;

public record AgentStateCondition(AlertScope scope, String state, int forSeconds) implements AlertCondition {
    @Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }
}
// DeploymentStateCondition.java
package com.cameleer.server.core.alerting;

import java.util.List;

public record DeploymentStateCondition(AlertScope scope, List<String> states) implements AlertCondition {
    public DeploymentStateCondition { states = List.copyOf(states); }
    @Override public ConditionKind kind() { return ConditionKind.DEPLOYMENT_STATE; }
}
// LogPatternCondition.java
package com.cameleer.server.core.alerting;

public record LogPatternCondition(
        AlertScope scope,
        String level,
        String pattern,
        int threshold,
        int windowSeconds) implements AlertCondition {
    @Override public ConditionKind kind() { return ConditionKind.LOG_PATTERN; }
}
// JvmMetricCondition.java
package com.cameleer.server.core.alerting;

public record JvmMetricCondition(
        AlertScope scope,
        String metric,
        AggregationOp aggregation,
        Comparator comparator,
        double threshold,
        int windowSeconds) implements AlertCondition {
    @Override public ConditionKind kind() { return ConditionKind.JVM_METRIC; }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
        cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertConditionJsonTest.java
git commit -m "feat(alerting): sealed AlertCondition hierarchy with Jackson deduction"
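
The fire-mode invariants enforced by ExchangeMatchCondition's compact constructor can be checked in isolation, without Jackson. The following is a minimal sketch under stated assumptions: FireModeValidationSketch, its nested ExchangeMatch record, and isRejected are illustrative names that do not exist in the plan, and scope and filter are omitted for brevity. Only the two validation rules are taken from the record above.

```java
final class FireModeValidationSketch {

    enum FireMode { PER_EXCHANGE, COUNT_IN_WINDOW }

    // Trimmed stand-in for ExchangeMatchCondition (hypothetical; scope and filter omitted).
    record ExchangeMatch(FireMode fireMode,
                         Integer threshold,
                         Integer windowSeconds,
                         Integer perExchangeLingerSeconds) {
        ExchangeMatch {
            // Same two rules as the compact constructor in ExchangeMatchCondition.
            if (fireMode == FireMode.COUNT_IN_WINDOW && (threshold == null || windowSeconds == null))
                throw new IllegalArgumentException("COUNT_IN_WINDOW requires threshold + windowSeconds");
            if (fireMode == FireMode.PER_EXCHANGE && perExchangeLingerSeconds == null)
                throw new IllegalArgumentException("PER_EXCHANGE requires perExchangeLingerSeconds");
        }
    }

    // Returns true when the constructor refuses the given combination.
    static boolean isRejected(FireMode mode, Integer threshold, Integer window, Integer linger) {
        try {
            new ExchangeMatch(mode, threshold, window, linger);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Because the check lives in the canonical constructor, it also runs when Jackson deserializes a stored condition, so an invalid combination should fail at read time rather than at evaluation time.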

Task 5: Core data records (AlertRule, AlertInstance, AlertSilence, SilenceMatcher, AlertRuleTarget, AlertNotification, WebhookBinding)

Files:

  • Create: the seven records above under .../alerting/

  • Test: cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java

  • Step 1: Write the failing test

package com.cameleer.server.core.alerting;

import org.junit.jupiter.api.Test;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import static org.assertj.core.api.Assertions.assertThat;

class AlertDomainRecordsTest {

    @Test
    void alertRuleDefensiveCopy() {
        var webhooks = new java.util.ArrayList<WebhookBinding>();
        webhooks.add(new WebhookBinding(UUID.randomUUID(), UUID.randomUUID(), null, null));
        var r = newRule(webhooks);
        webhooks.clear();
        assertThat(r.webhooks()).hasSize(1);
    }

    @Test
    void silenceMatcherAllFieldsNullMatchesEverything() {
        var m = new SilenceMatcher(null, null, null, null, null);
        assertThat(m.isWildcard()).isTrue();
    }

    private AlertRule newRule(List<WebhookBinding> wh) {
        return new AlertRule(
            UUID.randomUUID(), UUID.randomUUID(), "r", null,
            AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
            new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
            60, 0, 60, "t", "m", wh, List.of(),
            Instant.now(), null, null, Map.of(),
            Instant.now(), "u1", Instant.now(), "u1");
    }
}
  • Step 2: Run — FAIL.

  • Step 3: Create the records

// AlertRule.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public record AlertRule(
        UUID id,
        UUID environmentId,
        String name,
        String description,
        AlertSeverity severity,
        boolean enabled,
        ConditionKind conditionKind,
        AlertCondition condition,
        int evaluationIntervalSeconds,
        int forDurationSeconds,
        int reNotifyMinutes,
        String notificationTitleTmpl,
        String notificationMessageTmpl,
        List<WebhookBinding> webhooks,
        List<AlertRuleTarget> targets,
        Instant nextEvaluationAt,
        String claimedBy,
        Instant claimedUntil,
        Map<String, Object> evalState,
        Instant createdAt,
        String createdBy,
        Instant updatedAt,
        String updatedBy) {

    public AlertRule {
        webhooks  = webhooks  == null ? List.of() : List.copyOf(webhooks);
        targets   = targets   == null ? List.of() : List.copyOf(targets);
        evalState = evalState == null ? Map.of()  : Map.copyOf(evalState);
    }
}
// AlertInstance.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public record AlertInstance(
        UUID id,
        UUID ruleId,                // nullable after rule deletion
        Map<String, Object> ruleSnapshot,
        UUID environmentId,
        AlertState state,
        AlertSeverity severity,
        Instant firedAt,
        Instant ackedAt,
        String ackedBy,
        Instant resolvedAt,
        Instant lastNotifiedAt,
        boolean silenced,
        Double currentValue,
        Double threshold,
        Map<String, Object> context,
        String title,
        String message,
        List<String> targetUserIds,
        List<UUID> targetGroupIds,
        List<String> targetRoleNames) {

    public AlertInstance {
        ruleSnapshot    = ruleSnapshot    == null ? Map.of() : Map.copyOf(ruleSnapshot);
        context         = context         == null ? Map.of() : Map.copyOf(context);
        targetUserIds   = targetUserIds   == null ? List.of() : List.copyOf(targetUserIds);
        targetGroupIds  = targetGroupIds  == null ? List.of() : List.copyOf(targetGroupIds);
        targetRoleNames = targetRoleNames == null ? List.of() : List.copyOf(targetRoleNames);
    }
}
// AlertRuleTarget.java
package com.cameleer.server.core.alerting;

import java.util.UUID;

public record AlertRuleTarget(UUID id, UUID ruleId, TargetKind kind, String targetId) {}
// WebhookBinding.java
package com.cameleer.server.core.alerting;

import java.util.Map;
import java.util.UUID;

public record WebhookBinding(
        UUID id,
        UUID outboundConnectionId,
        String bodyOverride,
        Map<String, String> headerOverrides) {

    public WebhookBinding {
        headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
    }
}
// SilenceMatcher.java
package com.cameleer.server.core.alerting;

import java.util.UUID;

public record SilenceMatcher(
        UUID ruleId, String appSlug, String routeId, String agentId, AlertSeverity severity) {

    public boolean isWildcard() {
        return ruleId == null && appSlug == null && routeId == null && agentId == null && severity == null;
    }
}
// AlertSilence.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.UUID;

public record AlertSilence(
        UUID id,
        UUID environmentId,
        SilenceMatcher matcher,
        String reason,
        Instant startsAt,
        Instant endsAt,
        String createdBy,
        Instant createdAt) {}
// AlertNotification.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.Map;
import java.util.UUID;

public record AlertNotification(
        UUID id,
        UUID alertInstanceId,
        UUID webhookId,
        UUID outboundConnectionId,
        NotificationStatus status,
        int attempts,
        Instant nextAttemptAt,
        String claimedBy,
        Instant claimedUntil,
        Integer lastResponseStatus,
        String lastResponseSnippet,
        Map<String, Object> payload,
        Instant deliveredAt,
        Instant createdAt) {

    public AlertNotification {
        payload = payload == null ? Map.of() : Map.copyOf(payload);
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/ \
        cameleer-server-core/src/test/java/com/cameleer/server/core/alerting/AlertDomainRecordsTest.java
git commit -m "feat(alerting): core domain records (rule, instance, silence, notification)"
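
The records in this task all normalize their collection components in a compact constructor. As a minimal sketch of why that matters, the example below uses two simplified stand-ins (Binding and Rule are illustrative names, not the plan's WebhookBinding and AlertRule) and assumes nothing beyond the JDK's List.copyOf and Map.copyOf.

```java
import java.util.List;
import java.util.Map;

final class DefensiveCopySketch {

    // Simplified stand-in for WebhookBinding (hypothetical).
    record Binding(String connectionId, Map<String, String> headerOverrides) {
        Binding {
            // null becomes an empty map; anything else becomes an unmodifiable snapshot.
            headerOverrides = headerOverrides == null ? Map.of() : Map.copyOf(headerOverrides);
        }
    }

    // Simplified stand-in for AlertRule (hypothetical).
    record Rule(String name, List<Binding> webhooks) {
        Rule {
            webhooks = webhooks == null ? List.of() : List.copyOf(webhooks);
        }
    }

    // True when the list refuses mutation. Only call this on lists you expect to be unmodifiable.
    static boolean isReadOnly(List<?> list) {
        try {
            list.clear();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Note that List.copyOf and Map.copyOf reject null elements, keys, and values with a NullPointerException, so callers must not pass collections that contain nulls.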

Task 6: Repository interfaces

Files:

  • Create: .../alerting/AlertRuleRepository.java, AlertInstanceRepository.java, AlertSilenceRepository.java, AlertNotificationRepository.java, AlertReadRepository.java

  • No test (pure interfaces — covered by the Phase 3 integration tests).

  • Step 1: Create the interfaces

// AlertRuleRepository.java
package com.cameleer.server.core.alerting;

import java.util.List;
import java.util.Optional;
import java.util.UUID;

public interface AlertRuleRepository {
    AlertRule save(AlertRule rule);                       // upsert by id
    Optional<AlertRule> findById(UUID id);
    List<AlertRule> listByEnvironment(UUID environmentId);
    List<AlertRule> findAllByOutboundConnectionId(UUID connectionId);
    List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId);  // used by rulesReferencing()
    void delete(UUID id);

    /** Claim up to batchSize rules whose next_evaluation_at <= now AND (claimed_until IS NULL OR claimed_until < now).
     *  Atomically sets claimed_by + claimed_until = now + ttl. Returns claimed rules. */
    List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds);

    /** Release claim + bump next_evaluation_at. */
    void releaseClaim(UUID ruleId, java.time.Instant nextEvaluationAt,
                      java.util.Map<String, Object> evalState);
}
// AlertInstanceRepository.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public interface AlertInstanceRepository {
    AlertInstance save(AlertInstance instance);   // upsert by id
    Optional<AlertInstance> findById(UUID id);
    Optional<AlertInstance> findOpenForRule(UUID ruleId);  // state IN ('PENDING','FIRING','ACKNOWLEDGED')
    List<AlertInstance> listForInbox(UUID environmentId,
                                     List<String> userGroupIdFilter,   // group UUIDs as strings; the Postgres impl casts them to uuid[]
                                     String userId,
                                     List<String> userRoleNames,
                                     int limit);
    long countUnreadForUser(UUID environmentId, String userId);
    void ack(UUID id, String userId, Instant when);
    void resolve(UUID id, Instant when);
    void markSilenced(UUID id, boolean silenced);
    void deleteResolvedBefore(Instant cutoff);
}
// AlertSilenceRepository.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public interface AlertSilenceRepository {
    AlertSilence save(AlertSilence silence);
    Optional<AlertSilence> findById(UUID id);
    List<AlertSilence> listActive(UUID environmentId, Instant when);
    List<AlertSilence> listByEnvironment(UUID environmentId);
    void delete(UUID id);
}
// AlertNotificationRepository.java
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public interface AlertNotificationRepository {
    AlertNotification save(AlertNotification n);
    Optional<AlertNotification> findById(UUID id);
    List<AlertNotification> listForInstance(UUID alertInstanceId);
    List<AlertNotification> claimDueNotifications(String instanceId, int batchSize, int claimTtlSeconds);
    void markDelivered(UUID id, int status, String snippet, Instant when);
    void scheduleRetry(UUID id, Instant nextAttemptAt, int status, String snippet);
    void markFailed(UUID id, int status, String snippet);
    void deleteSettledBefore(Instant cutoff);
}
// AlertReadRepository.java
package com.cameleer.server.core.alerting;

import java.util.List;
import java.util.UUID;

public interface AlertReadRepository {
    void markRead(String userId, UUID alertInstanceId);
    void bulkMarkRead(String userId, List<UUID> alertInstanceIds);
}
  • Step 2: Compile

Run: mvn -pl cameleer-server-core compile
Expected: SUCCESS.

  • Step 3: Commit
git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/Alert*Repository.java
git commit -m "feat(alerting): core repository interfaces"
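
The claim contract documented on claimDueRules and releaseClaim can be illustrated with an in-memory model. This is a sketch under stated assumptions, not the Postgres implementation: InMemoryClaimSketch and its Row class are hypothetical names, the enabled flag and evalState are left out, and the clock is passed in explicitly so the behaviour is deterministic.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class InMemoryClaimSketch {

    // Only the columns that take part in claiming (hypothetical model of an alert_rules row).
    private static final class Row {
        final UUID id;
        Instant nextEvaluationAt;
        String claimedBy;
        Instant claimedUntil;

        Row(UUID id, Instant nextEvaluationAt) {
            this.id = id;
            this.nextEvaluationAt = nextEvaluationAt;
        }
    }

    private final Map<UUID, Row> rows = new LinkedHashMap<>();

    synchronized void add(UUID id, Instant nextEvaluationAt) {
        rows.put(id, new Row(id, nextEvaluationAt));
    }

    // Claims up to batchSize rows that are due and not covered by a live claim.
    synchronized List<UUID> claimDue(String instanceId, int batchSize, int claimTtlSeconds, Instant now) {
        List<Row> due = new ArrayList<>();
        for (Row r : rows.values()) {
            boolean isDue = !r.nextEvaluationAt.isAfter(now);
            boolean unclaimed = r.claimedUntil == null || r.claimedUntil.isBefore(now);
            if (isDue && unclaimed) due.add(r);
        }
        due.sort(Comparator.comparing((Row r) -> r.nextEvaluationAt));

        List<UUID> claimed = new ArrayList<>();
        for (Row r : due.subList(0, Math.min(batchSize, due.size()))) {
            r.claimedBy = instanceId;
            r.claimedUntil = now.plusSeconds(claimTtlSeconds);
            claimed.add(r.id);
        }
        return claimed;
    }

    // Clears the claim and schedules the next evaluation.
    synchronized void release(UUID id, Instant nextEvaluationAt) {
        Row r = rows.get(id);
        r.claimedBy = null;
        r.claimedUntil = null;
        r.nextEvaluationAt = nextEvaluationAt;
    }
}
```

The synchronized methods stand in for the row locks that the database provides. The TTL is what lets another instance pick up a rule if the first claimant dies before calling release.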

Phase 3 — Postgres repositories

All repositories use JdbcTemplate and ObjectMapper for JSONB columns (same pattern as PostgresOutboundConnectionRepository). Bind uuid[] parameters inside a ConnectionCallback with connection.createArrayOf("uuid", ...) and text[] parameters with connection.createArrayOf("text", ...).
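
A minimal sketch of the array binding, assuming plain JDBC and the PostgreSQL driver's support for uuid and text arrays. ArrayBindingSketch, toUuidArray, and findInstanceIds are hypothetical names; the query is a cut-down version of the inbox query used later in this phase.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class ArrayBindingSketch {

    // Hypothetical query: matches instances whose target arrays overlap the bound arrays.
    static final String SQL = """
        SELECT id FROM alert_instances
         WHERE target_group_ids && ?::uuid[]
            OR target_role_names && ?::text[]
        """;

    // Group ids arrive as strings and are parsed before binding.
    static UUID[] toUuidArray(List<String> ids) {
        return ids.stream().map(UUID::fromString).toArray(UUID[]::new);
    }

    static List<UUID> findInstanceIds(Connection con, List<String> groupIds, List<String> roleNames)
            throws SQLException {
        // createArrayOf needs a live Connection, which is why a ConnectionCallback is used with JdbcTemplate.
        Array groups = con.createArrayOf("uuid", toUuidArray(groupIds));
        Array roles = con.createArrayOf("text", roleNames.toArray(String[]::new));
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setArray(1, groups);
            ps.setArray(2, roles);
            List<UUID> ids = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) ids.add(rs.getObject("id", UUID.class));
            }
            return ids;
        } finally {
            // Release driver-side resources held by the array objects.
            groups.free();
            roles.free();
        }
    }
}
```

With Spring, the method would typically be called as jdbc.execute((ConnectionCallback<List<UUID>>) con -> ArrayBindingSketch.findInstanceIds(con, groupIds, roleNames)), which hands over a pooled connection and returns it afterwards.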

Task 7: PostgresAlertRuleRepository

Files:

  • Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java

  • Step 1: Write the failing integration test

package com.cameleer.server.app.alerting.storage;

import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;

import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import static org.assertj.core.api.Assertions.assertThat;

class PostgresAlertRuleRepositoryIT extends AbstractPostgresIT {

    private PostgresAlertRuleRepository repo;
    private UUID envId;

    @AfterEach
    void cleanup() {
        jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
        jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
        jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-user'");
    }

    @org.junit.jupiter.api.BeforeEach
    void setup() {
        repo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
        envId = UUID.randomUUID();
        jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "test-env-" + UUID.randomUUID());
        jdbcTemplate.update(
            "INSERT INTO users (user_id, username, password_hash, email, enabled) " +
            "VALUES ('test-user', 'test-user', 'x', 'a@b', true)");
    }

    @Test
    void saveAndFindByIdRoundtrip() {
        var rule = newRule(List.of());
        repo.save(rule);
        var found = repo.findById(rule.id()).orElseThrow();
        assertThat(found.name()).isEqualTo(rule.name());
        assertThat(found.condition()).isInstanceOf(AgentStateCondition.class);
    }

    @Test
    void findRuleIdsByOutboundConnectionId() {
        var connId = UUID.randomUUID();
        var wb = new WebhookBinding(UUID.randomUUID(), connId, null, Map.of());
        var rule = newRule(List.of(wb));
        repo.save(rule);

        List<UUID> ids = repo.findRuleIdsByOutboundConnectionId(connId);
        assertThat(ids).containsExactly(rule.id());

        assertThat(repo.findRuleIdsByOutboundConnectionId(UUID.randomUUID())).isEmpty();
    }

    @Test
    void claimDueRulesAtomicSkipLocked() {
        var rule = newRule(List.of());
        repo.save(rule);

        List<AlertRule> claimed = repo.claimDueRules("instance-A", 10, 30);
        assertThat(claimed).hasSize(1);

        // Second claimant sees nothing until first releases or TTL expires
        List<AlertRule> second = repo.claimDueRules("instance-B", 10, 30);
        assertThat(second).isEmpty();
    }

    private AlertRule newRule(List<WebhookBinding> webhooks) {
        return new AlertRule(
            UUID.randomUUID(), envId, "rule-" + UUID.randomUUID(), "desc",
            AlertSeverity.WARNING, true, ConditionKind.AGENT_STATE,
            new AgentStateCondition(new AlertScope(null, null, null), "DEAD", 60),
            60, 0, 60, "t", "m", webhooks, List.of(),
            Instant.now().minusSeconds(10), null, null, Map.of(),
            Instant.now(), "test-user", Instant.now(), "test-user");
    }
}
  • Step 2: Run — FAIL.

  • Step 3: Implement the repository

package com.cameleer.server.app.alerting.storage;

import com.cameleer.server.core.alerting.*;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

import java.sql.Timestamp;
import java.time.Instant;
import java.util.*;

public class PostgresAlertRuleRepository implements AlertRuleRepository {

    private final JdbcTemplate jdbc;
    private final ObjectMapper om;

    public PostgresAlertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
        this.jdbc = jdbc;
        this.om = om;
    }

    @Override
    public AlertRule save(AlertRule r) {
        String sql = """
            INSERT INTO alert_rules (id, environment_id, name, description, severity, enabled,
                condition_kind, condition, evaluation_interval_seconds, for_duration_seconds,
                re_notify_minutes, notification_title_tmpl, notification_message_tmpl,
                webhooks, next_evaluation_at, claimed_by, claimed_until, eval_state,
                created_at, created_by, updated_at, updated_by)
            VALUES (?, ?, ?, ?, ?::severity_enum, ?, ?::condition_kind_enum, ?::jsonb, ?, ?, ?, ?, ?, ?::jsonb,
                ?, ?, ?, ?::jsonb, ?, ?, ?, ?)
            ON CONFLICT (id) DO UPDATE SET
                name = EXCLUDED.name, description = EXCLUDED.description,
                severity = EXCLUDED.severity, enabled = EXCLUDED.enabled,
                condition_kind = EXCLUDED.condition_kind, condition = EXCLUDED.condition,
                evaluation_interval_seconds = EXCLUDED.evaluation_interval_seconds,
                for_duration_seconds = EXCLUDED.for_duration_seconds,
                re_notify_minutes = EXCLUDED.re_notify_minutes,
                notification_title_tmpl = EXCLUDED.notification_title_tmpl,
                notification_message_tmpl = EXCLUDED.notification_message_tmpl,
                webhooks = EXCLUDED.webhooks, eval_state = EXCLUDED.eval_state,
                updated_at = EXCLUDED.updated_at, updated_by = EXCLUDED.updated_by
            """;
        jdbc.update(sql,
            r.id(), r.environmentId(), r.name(), r.description(),
            r.severity().name(), r.enabled(), r.conditionKind().name(),
            writeJson(r.condition()),
            r.evaluationIntervalSeconds(), r.forDurationSeconds(), r.reNotifyMinutes(),
            r.notificationTitleTmpl(), r.notificationMessageTmpl(),
            writeJson(r.webhooks()),
            Timestamp.from(r.nextEvaluationAt()),
            r.claimedBy(),
            r.claimedUntil() == null ? null : Timestamp.from(r.claimedUntil()),
            writeJson(r.evalState()),
            Timestamp.from(r.createdAt()), r.createdBy(),
            Timestamp.from(r.updatedAt()), r.updatedBy());
        return r;
    }

    @Override
    public Optional<AlertRule> findById(UUID id) {
        var list = jdbc.query("SELECT * FROM alert_rules WHERE id = ?", rowMapper(), id);
        return list.isEmpty() ? Optional.empty() : Optional.of(list.get(0));
    }

    @Override
    public List<AlertRule> listByEnvironment(UUID environmentId) {
        return jdbc.query(
            "SELECT * FROM alert_rules WHERE environment_id = ? ORDER BY created_at DESC",
            rowMapper(), environmentId);
    }

    @Override
    public List<AlertRule> findAllByOutboundConnectionId(UUID connectionId) {
        String sql = """
            SELECT * FROM alert_rules
             WHERE webhooks @> ?::jsonb
             ORDER BY created_at DESC
            """;
        String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
        return jdbc.query(sql, rowMapper(), predicate);
    }

    @Override
    public List<UUID> findRuleIdsByOutboundConnectionId(UUID connectionId) {
        String sql = """
            SELECT id FROM alert_rules
             WHERE webhooks @> ?::jsonb
            """;
        String predicate = "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
        return jdbc.queryForList(sql, UUID.class, predicate);
    }

    @Override
    public void delete(UUID id) {
        jdbc.update("DELETE FROM alert_rules WHERE id = ?", id);
    }

    @Override
    public List<AlertRule> claimDueRules(String instanceId, int batchSize, int claimTtlSeconds) {
        String sql = """
            UPDATE alert_rules
               SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
             WHERE id IN (
                 SELECT id FROM alert_rules
                  WHERE enabled = true
                    AND next_evaluation_at <= now()
                    AND (claimed_until IS NULL OR claimed_until < now())
                  ORDER BY next_evaluation_at
                  LIMIT ?
                  FOR UPDATE SKIP LOCKED
             )
             RETURNING *
            """;
        return jdbc.query(sql, rowMapper(), instanceId, claimTtlSeconds, batchSize);
    }

    @Override
    public void releaseClaim(UUID ruleId, Instant nextEvaluationAt, Map<String, Object> evalState) {
        jdbc.update("""
            UPDATE alert_rules
               SET claimed_by = NULL, claimed_until = NULL,
                   next_evaluation_at = ?, eval_state = ?::jsonb
             WHERE id = ?
            """,
            Timestamp.from(nextEvaluationAt), writeJson(evalState), ruleId);
    }

    private RowMapper<AlertRule> rowMapper() {
        return (rs, i) -> {
            ConditionKind kind = ConditionKind.valueOf(rs.getString("condition_kind"));
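            AlertCondition cond;
            List<WebhookBinding> webhooks;
            Map<String, Object> evalState;
            try {
                // readValue throws a checked Jackson exception; RowMapper only allows SQLException.
                cond = om.readValue(rs.getString("condition"), AlertCondition.class);
                webhooks = om.readValue(rs.getString("webhooks"), new TypeReference<>() {});
                evalState = om.readValue(rs.getString("eval_state"), new TypeReference<>() {});
            } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
                throw new IllegalStateException("Unreadable JSONB column in alert_rules", e);
            }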

            Timestamp cu = rs.getTimestamp("claimed_until");
            return new AlertRule(
                (UUID) rs.getObject("id"),
                (UUID) rs.getObject("environment_id"),
                rs.getString("name"),
                rs.getString("description"),
                AlertSeverity.valueOf(rs.getString("severity")),
                rs.getBoolean("enabled"),
                kind, cond,
                rs.getInt("evaluation_interval_seconds"),
                rs.getInt("for_duration_seconds"),
                rs.getInt("re_notify_minutes"),
                rs.getString("notification_title_tmpl"),
                rs.getString("notification_message_tmpl"),
                webhooks, List.of(),
                rs.getTimestamp("next_evaluation_at").toInstant(),
                rs.getString("claimed_by"),
                cu == null ? null : cu.toInstant(),
                evalState,
                rs.getTimestamp("created_at").toInstant(),
                rs.getString("created_by"),
                rs.getTimestamp("updated_at").toInstant(),
                rs.getString("updated_by"));
        };
    }

    private String writeJson(Object o) {
        try { return om.writeValueAsString(o); }
        catch (Exception e) { throw new IllegalStateException(e); }
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepository.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertRuleRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_rules"
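
The two connection lookups above rely on JSONB containment: webhooks @> probe is true when some stored binding object contains every key/value pair of the probe object. The sketch below models that rule in memory and shows the probe string the repository builds. WebhookContainmentSketch, predicateFor, and containsBinding are hypothetical names, bindings are flattened to string maps, and the outboundConnectionId key assumes Jackson's default property naming for WebhookBinding.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class WebhookContainmentSketch {

    // The JSON probe bound to `webhooks @> ?::jsonb`: a one-element array holding a partial object.
    static String predicateFor(UUID connectionId) {
        return "[{\"outboundConnectionId\":\"" + connectionId + "\"}]";
    }

    // In-memory model of array-of-objects containment: true when at least one stored
    // binding has all of the probe's entries (extra keys on the binding are ignored).
    static boolean containsBinding(List<Map<String, String>> webhooks, Map<String, String> probe) {
        return webhooks.stream()
                .anyMatch(binding -> binding.entrySet().containsAll(probe.entrySet()));
    }
}
```

Concatenating the UUID into the probe is safe here only because UUID.toString() yields hex digits and hyphens; a free-form string would need proper JSON escaping through ObjectMapper.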

Task 8: Wire OutboundConnectionServiceImpl.rulesReferencing() (CRITICAL — Plan 01 gate)

This is the Plan 01 known-incomplete item. Plan 01 shipped rulesReferencing() as a stub that always returns an empty list. Until this task lands, an outbound connection can be deleted or narrowed while rules still reference it, which breaks those rules in production. Do not skip or defer.

Files:

  • Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java

  • Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java

  • Step 1: GitNexus impact check

Run gitnexus_impact({target: "OutboundConnectionServiceImpl", direction: "upstream"}). Report blast radius. Expected: controller + bean config + UI hooks (Plan 01). No production paths should be affected by replacing a stub with real behaviour.

  • Step 2: Write the failing integration test
package com.cameleer.server.app.outbound;

import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.*;
import com.cameleer.server.core.outbound.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;

import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

class OutboundConnectionServiceRulesReferencingIT extends AbstractPostgresIT {

    @Autowired OutboundConnectionService service;
    @Autowired OutboundConnectionRepository repo;

    private UUID envId;
    private UUID connId;
    private PostgresAlertRuleRepository ruleRepo;

    @BeforeEach
    void seed() {
        ruleRepo = new PostgresAlertRuleRepository(jdbcTemplate, new ObjectMapper());
        envId = UUID.randomUUID();
        jdbcTemplate.update("INSERT INTO environments (id, slug) VALUES (?, ?)", envId, "env-" + UUID.randomUUID());
        jdbcTemplate.update(
            "INSERT INTO users (user_id, username, password_hash, email, enabled) " +
            "VALUES ('u-ref', 'u-ref', 'x', 'a@b', true) ON CONFLICT DO NOTHING");
        var c = repo.save(new OutboundConnection(
            UUID.randomUUID(), "default", "conn", null, "https://example.test",
            OutboundMethod.POST, Map.of(), null, TrustMode.SYSTEM_DEFAULT, List.of(), null,
            OutboundAuth.None.INSTANCE, List.of(),
            Instant.now(), "u-ref", Instant.now(), "u-ref"));
        connId = c.id();

        var rule = new AlertRule(
            UUID.randomUUID(), envId, "r", null, AlertSeverity.WARNING, true,
            ConditionKind.AGENT_STATE,
            new AgentStateCondition(new AlertScope(null,null,null), "DEAD", 60),
            60, 0, 60, "t", "m",
            List.of(new WebhookBinding(UUID.randomUUID(), connId, null, Map.of())),
            List.of(), Instant.now(), null, null, Map.of(),
            Instant.now(), "u-ref", Instant.now(), "u-ref");
        ruleRepo.save(rule);
    }

    @Test
    void deleteConnectionReferencedByRuleReturns409() {
        assertThat(service.rulesReferencing(connId)).hasSize(1);
        assertThatThrownBy(() -> service.delete(connId, "u-ref"))
            .hasMessageContaining("referenced by rules");
    }
}
  • Step 3: Run — FAIL (stub returns empty list, so delete succeeds).

  • Step 4: Replace the stub

In OutboundConnectionServiceImpl.java:

// existing imports + add:
import com.cameleer.server.core.alerting.AlertRuleRepository;

public class OutboundConnectionServiceImpl implements OutboundConnectionService {

    private final OutboundConnectionRepository repo;
    private final AlertRuleRepository ruleRepo;   // NEW
    private final String tenantId;

    public OutboundConnectionServiceImpl(
            OutboundConnectionRepository repo,
            AlertRuleRepository ruleRepo,
            String tenantId) {
        this.repo = repo;
        this.ruleRepo = ruleRepo;
        this.tenantId = tenantId;
    }

    // … create/update/delete/get/list unchanged …

    @Override
    public List<UUID> rulesReferencing(UUID id) {
        return ruleRepo.findRuleIdsByOutboundConnectionId(id);
    }
}

Update OutboundBeanConfig.java to inject AlertRuleRepository:

@Bean
public OutboundConnectionService outboundConnectionService(
        OutboundConnectionRepository repo,
        AlertRuleRepository ruleRepo,
        @Value("${cameleer.server.tenant.id:default}") String tenantId) {
    return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
}

Add the AlertRuleRepository bean in a new AlertingBeanConfig.java stub (completed in Phase 7):

package com.cameleer.server.app.alerting.config;

import com.cameleer.server.app.alerting.storage.PostgresAlertRuleRepository;
import com.cameleer.server.core.alerting.AlertRuleRepository;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class AlertingBeanConfig {
    @Bean
    public AlertRuleRepository alertRuleRepository(JdbcTemplate jdbc, ObjectMapper om) {
        return new PostgresAlertRuleRepository(jdbc, om);
    }
}
  • Step 5: Run — PASS.

  • Step 6: GitNexus detect_changes + commit

# Verify scope
# gitnexus_detect_changes({scope: "staged"})
git add cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/OutboundConnectionServiceImpl.java \
        cameleer-server-app/src/main/java/com/cameleer/server/app/outbound/config/OutboundBeanConfig.java \
        cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/outbound/OutboundConnectionServiceRulesReferencingIT.java
git commit -m "fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)"
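
For orientation, the guard that this task unlocks can be sketched as follows. Plan 01's delete path is not reproduced in this plan, so everything here is an assumption: DeleteGuardSketch is a hypothetical in-memory stand-in, the lookup is injected as a plain function, and IllegalStateException is a placeholder for whatever exception Plan 01 actually throws. Only the "referenced by rules" message fragment is taken from the test above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

final class DeleteGuardSketch {

    // Stand-in for AlertRuleRepository.findRuleIdsByOutboundConnectionId.
    private final Function<UUID, List<UUID>> rulesReferencing;
    // Stand-in for the outbound connection table.
    private final Set<UUID> connections = new HashSet<>();

    DeleteGuardSketch(Function<UUID, List<UUID>> rulesReferencing) {
        this.rulesReferencing = rulesReferencing;
    }

    void add(UUID connectionId) {
        connections.add(connectionId);
    }

    boolean exists(UUID connectionId) {
        return connections.contains(connectionId);
    }

    // Refuses to delete a connection while any rule still points at it.
    void delete(UUID connectionId) {
        List<UUID> rules = rulesReferencing.apply(connectionId);
        if (!rules.isEmpty()) {
            // Placeholder exception type; the real service may map this to HTTP 409 differently.
            throw new IllegalStateException(
                "connection " + connectionId + " is referenced by rules: " + rules);
        }
        connections.remove(connectionId);
    }
}
```

With the Plan 01 stub the function always returns an empty list, so the guard never fires. That is the failure mode Step 3 demonstrates.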

Task 9: PostgresAlertInstanceRepository

Files:

  • Create: .../alerting/storage/PostgresAlertInstanceRepository.java

  • Test: .../alerting/storage/PostgresAlertInstanceRepositoryIT.java

  • Step 1: Write the failing test covering: save/findById, findOpenForRule (filter state IN ('PENDING','FIRING','ACKNOWLEDGED')), listForInbox with user/group/role filters (seed three instances: one targeting a user, one targeting a group, one targeting a role; assert listForInbox returns all three for a user who belongs to that group and holds that role), countUnreadForUser (excludes instances the user has already read, via NOT EXISTS against alert_reads), ack, resolve, deleteResolvedBefore.

  • Step 2: Run — FAIL.

  • Step 3: Implement — same RowMapper pattern as Task 7. Key queries:

-- findOpenForRule
SELECT * FROM alert_instances
 WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')
 ORDER BY fired_at DESC LIMIT 1;

-- listForInbox (bind userId, groupIds array, roleNames array as ? placeholders)
SELECT * FROM alert_instances
 WHERE environment_id = ?
   AND state IN ('FIRING','ACKNOWLEDGED','RESOLVED')
   AND (
       ? = ANY(target_user_ids)
    OR target_group_ids && ?::uuid[]
    OR target_role_names && ?::text[]
   )
 ORDER BY fired_at DESC LIMIT ?;

-- countUnreadForUser
SELECT count(*) FROM alert_instances ai
 WHERE ai.environment_id = ?
   AND ai.state IN ('FIRING','ACKNOWLEDGED')
   AND (
       ? = ANY(ai.target_user_ids)
    OR ai.target_group_ids && ?::uuid[]
    OR ai.target_role_names && ?::text[]
   )
   AND NOT EXISTS (
       SELECT 1 FROM alert_reads ar
        WHERE ar.alert_instance_id = ai.id AND ar.user_id = ?
   );

Array binding via connection.createArrayOf("uuid", uuids) / createArrayOf("text", names) inside a ConnectionCallback.

  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepository.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/PostgresAlertInstanceRepositoryIT.java
git commit -m "feat(alerting): Postgres repository for alert_instances with inbox queries"
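A minimal in-memory model of the inbox predicate can make the Step 1 assertions easier to reason about. The sketch assumes user ids are strings and group ids are UUIDs (suggested by the ?::uuid[] cast); InboxTargets and InboxPredicate are hypothetical names, not classes from this plan.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the three target columns of alert_instances. */
record InboxTargets(Set<String> userIds, Set<UUID> groupIds, Set<String> roleNames) {}

final class InboxPredicate {
    private InboxPredicate() {}

    /** Mirrors: ? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ? */
    static boolean visibleTo(InboxTargets t, String userId, Set<UUID> groups, Set<String> roles) {
        return t.userIds().contains(userId)
            || !Collections.disjoint(t.groupIds(), groups)    // && is array overlap
            || !Collections.disjoint(t.roleNames(), roles);
    }
}
```

Seeding one instance per target kind and checking visibleTo for a user who is in the group and holds the role should give three hits, which is the listForInbox expectation in Step 1.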

Task 10: PostgresAlertSilenceRepository, PostgresAlertNotificationRepository, PostgresAlertReadRepository

Files:

  • Create: three repositories under .../alerting/storage/

  • Test: one IT per repository in .../alerting/storage/

  • Step 1: Write all three failing ITs (one file each). Cover:

    • Silence: save/findById, listActive filters by now BETWEEN starts_at AND ends_at, delete.
    • Notification: save/findById, claimDueNotifications (SKIP LOCKED), scheduleRetry bumps attempts + next_attempt_at, markDelivered + markFailed transition status, deleteSettledBefore purges DELIVERED + FAILED.
    • Read: markRead is idempotent (uses ON CONFLICT DO NOTHING), bulkMarkRead handles empty list.
  • Step 2: Run — FAIL.

  • Step 3: Implement following the same JdbcTemplate pattern. Notification claim query mirrors Task 7's rule claim:

UPDATE alert_notifications
   SET claimed_by = ?, claimed_until = now() + (? || ' seconds')::interval
 WHERE id IN (
   SELECT id FROM alert_notifications
    WHERE status = 'PENDING'
      AND next_attempt_at <= now()
      AND (claimed_until IS NULL OR claimed_until < now())
    ORDER BY next_attempt_at
    LIMIT ?
    FOR UPDATE SKIP LOCKED
 )
 RETURNING *;
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/storage/ \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/storage/Postgres*IT.java
git commit -m "feat(alerting): Postgres repositories for silences, notifications, reads"
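The claim predicate can be sanity-checked without a database. The sketch below models only the inner WHERE and the lease update of the claim query; NotificationLease is a hypothetical type, and the row-locking behaviour of FOR UPDATE SKIP LOCKED is deliberately not modelled.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical projection of the columns the claim query reads. */
record NotificationLease(String status, Instant nextAttemptAt, Instant claimedUntil) {

    /** Mirrors the inner WHERE: PENDING, due, and not under a live claim. */
    boolean claimable(Instant now) {
        return "PENDING".equals(status)
            && !nextAttemptAt.isAfter(now)                             // next_attempt_at <= now()
            && (claimedUntil == null || claimedUntil.isBefore(now));   // lease absent or expired
    }

    /** Mirrors the SET clause: claimed_until = now() + ttl. */
    NotificationLease claim(Instant now, Duration ttl) {
        return new NotificationLease(status, nextAttemptAt, now.plus(ttl));
    }
}
```

A row claimed with a 30-second lease is invisible to other claimers until the lease expires, which is what lets a crashed dispatcher's work be picked up again without manual cleanup.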

Task 11: Wire all alerting repositories in AlertingBeanConfig

Files:

  • Modify: .../alerting/config/AlertingBeanConfig.java

  • Step 1: Add beans for the remaining repositories

@Bean public AlertInstanceRepository alertInstanceRepository(JdbcTemplate jdbc, ObjectMapper om) {
    return new PostgresAlertInstanceRepository(jdbc, om);
}
@Bean public AlertSilenceRepository alertSilenceRepository(JdbcTemplate jdbc, ObjectMapper om) {
    return new PostgresAlertSilenceRepository(jdbc, om);
}
@Bean public AlertNotificationRepository alertNotificationRepository(JdbcTemplate jdbc, ObjectMapper om) {
    return new PostgresAlertNotificationRepository(jdbc, om);
}
@Bean public AlertReadRepository alertReadRepository(JdbcTemplate jdbc) {
    return new PostgresAlertReadRepository(jdbc);
}
  • Step 2: Verify compile + existing ITs still pass
mvn -pl cameleer-server-app test -Dtest='PostgresAlert*IT'
  • Step 3: Commit
git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingBeanConfig.java
git commit -m "feat(alerting): wire all alerting repository beans"

Phase 4 — ClickHouse reads: new count methods and projections

Task 12: Add ClickHouseLogStore.countLogs(LogSearchRequest)

Files:

  • Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java

  • Step 1: GitNexus impact check

Run gitnexus_impact({target: "ClickHouseLogStore", direction: "upstream"}). Expected callers: LogQueryController, ContainerLogForwarder, ClickHouseConfig. Adding a method is non-breaking — no downstream callers affected.

  • Step 2: Write the failing test
package com.cameleer.server.app.search;

import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.core.search.LogSearchRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;

import java.time.Instant;
import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

class ClickHouseLogStoreCountIT extends AbstractPostgresIT {

    @Autowired ClickHouseLogStore store;

    @Test
    void countLogsRespectsLevelPatternAndWindow() {
        // Seed 3 ERROR TimeoutException + 2 INFO rows in 'orders' app for env 'dev' within last 5 min
        // (seed helper uses existing `indexBatch` path)
        long count = store.countLogs(new LogSearchRequest(
            /* environment */ "dev",
            /* application */ "orders",
            /* agentId */ null,
            /* exchangeId */ null,
            /* logger */ null,
            /* sources */ List.of(),
            /* levels */ List.of("ERROR"),
            /* q */ "TimeoutException",
            /* from */ Instant.now().minusSeconds(300),
            /* to */ Instant.now(),
            /* cursor */ null,
            /* limit */ 100,
            /* sort */ "desc"
        ));
        assertThat(count).isEqualTo(3);
    }
}

(Adjust LogSearchRequest constructor to the actual record signature — check cameleer-server-core/src/main/java/com/cameleer/server/core/search/LogSearchRequest.java for exact order.)

  • Step 3: Run — FAIL.

  • Step 4: Implement the method

In ClickHouseLogStore.java, add a new public method. Reuse the WHERE-clause builder already used by search(LogSearchRequest), but:

  • No FINAL.
  • Skip cursor, limit, sort.
  • SELECT count() FROM logs WHERE <tenant + env + app + level IN (...) + logger + q LIKE + timestamp BETWEEN>.
  • Include the tenant_id = ? predicate.
public long countLogs(LogSearchRequest request) {
    StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
    List<Object> args = new ArrayList<>();
    args.add(tenantId);
    args.add(Timestamp.from(request.from()));
    args.add(Timestamp.from(request.to()));
    if (request.environment() != null) { where.append(" AND environment = ?"); args.add(request.environment()); }
    if (request.application() != null) { where.append(" AND application = ?"); args.add(request.application()); }
    // … level multi, logger, q (positionCaseInsensitive(message, ?) > 0), exchangeId, agentId …
    String sql = "SELECT count() FROM logs WHERE " + where;  // NO FINAL
    Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
    return n == null ? 0L : n;
}

(Imports: java.sql.Timestamp, java.util.ArrayList.)

  • Step 5: Run — PASS.

  • Step 6: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseLogStore.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseLogStoreCountIT.java
git commit -m "feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator"
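One possible shape for the multi-value level filter and the text filter in countLogs, shown as a sketch and not as the store's actual code. LogCountSql is a hypothetical helper, and the level and message column names are assumptions taken from the projection DDL and the comment in Step 4; confirm them against init.sql.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: builds the count SQL and its bind arguments side by side. */
final class LogCountSql {
    final StringBuilder where = new StringBuilder("tenant_id = ? AND timestamp BETWEEN ? AND ?");
    final List<Object> args = new ArrayList<>();

    LogCountSql(String tenantId, Object from, Object to) {
        args.add(tenantId);
        args.add(from);
        args.add(to);
    }

    /** Appends "AND level IN (?, ?, ...)" only when at least one level is requested. */
    LogCountSql levels(List<String> levels) {
        if (levels != null && !levels.isEmpty()) {
            where.append(" AND level IN (")
                 .append(String.join(", ", Collections.nCopies(levels.size(), "?")))
                 .append(")");
            args.addAll(levels);
        }
        return this;
    }

    /** Case-insensitive substring match, skipped for null or blank patterns. */
    LogCountSql messageContains(String q) {
        if (q != null && !q.isBlank()) {
            where.append(" AND positionCaseInsensitive(message, ?) > 0");
            args.add(q);
        }
        return this;
    }

    String sql() {
        return "SELECT count() FROM logs WHERE " + where;
    }
}
```

Keeping the placeholder count and the argument list in one object avoids the usual off-by-one between the two when optional filters are added or removed.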

Task 13: Add ClickHouseSearchIndex.countExecutionsForAlerting(AlertMatchSpec)

Files:

  • Create: cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java

  • Modify: cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java

  • Step 1: GitNexus impact check

Run gitnexus_impact({target: "ClickHouseSearchIndex", direction: "upstream"}). Additive method — no downstream breakage.

  • Step 2: Create AlertMatchSpec record
package com.cameleer.server.core.alerting;

import java.time.Instant;
import java.util.Map;

/** Specification for alerting-specific execution counting.
 *  Distinct from SearchRequest: no text-in-body subqueries, no cursor, no FINAL.
 *  All fields except tenant/env/from/to are nullable filters. */
public record AlertMatchSpec(
        String tenantId,
        String environment,
        String applicationId,       // nullable
        String routeId,             // nullable
        String status,              // "FAILED" / "SUCCESS" / null
        Map<String, String> attributes,  // exact match on execution attribute key=value
        Instant from,
        Instant to,
        Instant after             // nullable; used by PER_EXCHANGE to advance cursor
) {
    public AlertMatchSpec {
        attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
    }
}
  • Step 3: Write the failing test — seed a mix of FAILED/SUCCESS executions with various attribute maps, assert count matches.

  • Step 4: Run — FAIL.

  • Step 5: Implement on ClickHouseSearchIndex

public long countExecutionsForAlerting(AlertMatchSpec spec) {
    StringBuilder where = new StringBuilder(
        "tenant_id = ? AND environment = ? AND start_time BETWEEN ? AND ?");
    List<Object> args = new ArrayList<>();
    args.add(spec.tenantId());
    args.add(spec.environment());
    args.add(Timestamp.from(spec.from()));
    args.add(Timestamp.from(spec.to()));
    if (spec.applicationId() != null) { where.append(" AND application_id = ?"); args.add(spec.applicationId()); }
    if (spec.routeId() != null)       { where.append(" AND route_id = ?");        args.add(spec.routeId()); }
    if (spec.status() != null)        { where.append(" AND status = ?");          args.add(spec.status()); }
    if (spec.after() != null) {
        where.append(" AND start_time > ?");
        args.add(Timestamp.from(spec.after()));
    }
    // attribute filters: use Map column access — pattern matches existing search() impl
    for (var e : spec.attributes().entrySet()) {
        where.append(" AND attributes[?] = ?");
        args.add(e.getKey());
        args.add(e.getValue());
    }
    String sql = "SELECT count() FROM executions WHERE " + where;  // NO FINAL
    Long n = jdbc.queryForObject(sql, Long.class, args.toArray());
    return n == null ? 0L : n;
}
  • Step 6: Run — PASS.

  • Step 7: Commit

git add cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertMatchSpec.java \
        cameleer-server-app/src/main/java/com/cameleer/server/app/search/ClickHouseSearchIndex.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/search/ClickHouseSearchIndexAlertingCountIT.java
git commit -m "feat(alerting): countExecutionsForAlerting for exchange-match evaluator"

Task 14: ClickHouse projections migration

Files:

  • Create: cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql

  • Modify: the schema initializer invocation site (likely ClickHouseConfig or ClickHouseSchemaInitializer) to also run this file on startup.

  • Step 1: Write the SQL file

-- Additive, idempotent. Safe to drop + rebuild with no data loss.
ALTER TABLE executions
  ADD PROJECTION IF NOT EXISTS alerting_app_status
  (SELECT * ORDER BY (tenant_id, environment, application_id, status, start_time));

ALTER TABLE executions
  ADD PROJECTION IF NOT EXISTS alerting_route_status
  (SELECT * ORDER BY (tenant_id, environment, route_id, status, start_time));

ALTER TABLE logs
  ADD PROJECTION IF NOT EXISTS alerting_app_level
  (SELECT * ORDER BY (tenant_id, environment, application, level, timestamp));

ALTER TABLE agent_metrics
  ADD PROJECTION IF NOT EXISTS alerting_instance_metric
  (SELECT * ORDER BY (tenant_id, environment, instance_id, metric_name, collected_at));

ALTER TABLE executions    MATERIALIZE PROJECTION alerting_app_status;
ALTER TABLE executions    MATERIALIZE PROJECTION alerting_route_status;
ALTER TABLE logs          MATERIALIZE PROJECTION alerting_app_level;
ALTER TABLE agent_metrics MATERIALIZE PROJECTION alerting_instance_metric;

(Adjust table column names to match real init.sql — confirm application vs application_id on the logs and agent_metrics tables.)

  • Step 2: Hook into ClickHouseSchemaInitializer

Find the initializer and add a second invocation:

runIdempotent("clickhouse/init.sql");
runIdempotent("clickhouse/alerting_projections.sql");
  • Step 3: Add a smoke IT
@Test
void projectionsExistAfterStartup() {
    var names = jdbcTemplate.queryForList(
        "SELECT name FROM system.projections WHERE table IN ('executions','logs','agent_metrics')",
        String.class);
    assertThat(names).contains(
        "alerting_app_status","alerting_route_status","alerting_app_level","alerting_instance_metric");
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/resources/clickhouse/alerting_projections.sql \
        cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseConfig.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/search/AlertingProjectionsIT.java
git commit -m "feat(alerting): ClickHouse projections for alerting read paths"
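How runIdempotent splits a script into statements is not shown in this plan. A minimal sketch, assuming full-line -- comments only and no semicolons inside string literals; SqlScriptSplitter is a hypothetical name, not the existing initializer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for simple DDL scripts such as alerting_projections.sql. */
final class SqlScriptSplitter {
    private SqlScriptSplitter() {}

    /** Drops full-line "--" comments and blank lines, then splits on ';'. */
    static List<String> statements(String script) {
        StringBuilder noComments = new StringBuilder();
        for (String line : script.split("\\R")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty() || trimmed.startsWith("--")) continue;
            noComments.append(trimmed).append(' ');
        }
        List<String> out = new ArrayList<>();
        for (String stmt : noComments.toString().split(";")) {
            if (!stmt.isBlank()) out.add(stmt.strip());
        }
        return out;
    }
}
```

Each returned statement would then be executed in order. The ADD PROJECTION IF NOT EXISTS form is what makes the ADD statements safe to re-run on every startup.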

Phase 5 — Mustache templating and silence matching

Task 15: Add JMustache dependency

Files:

  • Modify: cameleer-server-core/pom.xml

  • Step 1: Add dependency

<dependency>
    <groupId>com.samskivert</groupId>
    <artifactId>jmustache</artifactId>
    <version>1.16</version>
</dependency>
  • Step 2: Verify resolve

Run: mvn -pl cameleer-server-core dependency:resolve

  • Step 3: Commit
git add cameleer-server-core/pom.xml
git commit -m "chore(alerting): add jmustache 1.16"

Task 16: MustacheRenderer

Files:

  • Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java

  • Test: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java

  • Step 1: Write the failing test

package com.cameleer.server.app.alerting.notify;

import org.junit.jupiter.api.Test;
import java.util.Map;
import static org.assertj.core.api.Assertions.assertThat;

class MustacheRendererTest {

    private final MustacheRenderer r = new MustacheRenderer();

    @Test
    void rendersSimpleTemplate() {
        String out = r.render("Hello {{name}}", Map.of("name", "world"));
        assertThat(out).isEqualTo("Hello world");
    }

    @Test
    void rendersNestedPath() {
        String out = r.render("{{alert.severity}}", Map.of("alert", Map.of("severity","CRITICAL")));
        assertThat(out).isEqualTo("CRITICAL");
    }

    @Test
    void missingVariableRendersLiteral() {
        String out = r.render("{{missing.path}}", Map.of());
        assertThat(out).isEqualTo("{{missing.path}}");
    }

    @Test
    void malformedTemplateReturnsRawWithWarn() {
        String out = r.render("{{unclosed", Map.of("unclosed","x"));
        assertThat(out).isEqualTo("{{unclosed");
    }
}
  • Step 2: Run — FAIL.

  • Step 3: Implement

package com.cameleer.server.app.alerting.notify;

import com.samskivert.mustache.Mustache;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Component
public class MustacheRenderer {

    private static final Logger log = LoggerFactory.getLogger(MustacheRenderer.class);

    /** Plain variable tags only; sections ({{#x}}) and partials ({{>x}}) are left to JMustache. */
    private static final Pattern TOKEN = Pattern.compile("\\{\\{\\s*([a-zA-Z0-9_.]+)\\s*}}");

    // Sentinel brackets contain no Mustache delimiters, so JMustache copies them through as text.
    private static final String LIT_OPEN = "⟦MUSTACHE_LITERAL:";
    private static final String LIT_CLOSE = "⟧";
    private static final Pattern LITERAL =
        Pattern.compile(Pattern.quote(LIT_OPEN) + "([a-zA-Z0-9_.]+)" + Pattern.quote(LIT_CLOSE));

    private final Mustache.Compiler compiler = Mustache.compiler()
        .defaultValue("")
        .escapeHTML(false);

    public String render(String template, Map<String, Object> context) {
        if (template == null) return "";
        Map<String, Object> ctx = (context == null) ? Map.of() : context;
        try {
            String rendered = compiler.compile(protectMissing(template, ctx)).execute(ctx);
            return restoreLiterals(rendered);
        } catch (Exception e) {
            log.warn("Mustache render failed: {}", e.getMessage());
            return template;
        }
    }

    /** Swaps each `{{missing.path}}` for a sentinel so JMustache never sees it as a tag.
     *  Paths are resolved against the root context only, so a variable that exists only
     *  inside a section scope is treated as missing. Acceptable for v1 templates. */
    private String protectMissing(String template, Map<String, Object> context) {
        Matcher m = TOKEN.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String path = m.group(1);
            if (resolvePath(context, path) == null) {
                m.appendReplacement(sb, Matcher.quoteReplacement(LIT_OPEN + path + LIT_CLOSE));
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Turns sentinels back into the literal `{{path}}` after rendering. */
    private String restoreLiterals(String rendered) {
        return LITERAL.matcher(rendered)
            .replaceAll(r -> Matcher.quoteReplacement("{{" + r.group(1) + "}}"));
    }

    private Object resolvePath(Map<String, Object> ctx, String path) {
        Object cur = ctx;
        for (String seg : path.split("\\.")) {
            if (!(cur instanceof Map<?,?> m)) return null;
            cur = m.get(seg);
            if (cur == null) return null;
        }
        return cur;
    }
}

Engineer note: The contract is that an unresolved {{x}} renders as the literal {{x}}; the tests in Step 1 lock it. The most direct route is a pre-compile token substitution: replace each missing {{path}} with a sentinel such as ⟦MUSTACHE_LITERAL:path⟧ that Mustache passes through untouched, then restore the literal after rendering. An alternative is a custom Mustache.Collector installed via Mustache.Compiler#withCollector(), whose createFetcher returns a fallback Mustache.VariableFetcher (withFormatter only controls how resolved values are stringified, so it cannot supply missing ones). Confirm against the JMustache API during implementation and adjust this task if needed.

  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/MustacheRenderer.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/MustacheRendererTest.java
git commit -m "feat(alerting): MustacheRenderer with literal fallback on missing vars"

Task 17: NotificationContextBuilder

Files:

  • Create: .../alerting/notify/NotificationContextBuilder.java

  • Test: .../alerting/notify/NotificationContextBuilderTest.java

  • Step 1: Write the failing test covering:

    • env / rule / alert subtrees always present
    • conditional trees: exchange.* present only for EXCHANGE_MATCH, log.* only for LOG_PATTERN, etc.
    • alert.link uses the configured cameleer.server.ui-origin prefix if present, else /alerts/inbox/{id}.
  • Step 2: Run — FAIL.

  • Step 3: Implement — pure static Map<String,Object> build(AlertRule, AlertInstance, Environment, String uiOrigin).

  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilder.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/NotificationContextBuilderTest.java
git commit -m "feat(alerting): NotificationContextBuilder for template context maps"
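The alert.link rule from Step 1 could be sketched as follows. AlertLinks is a hypothetical helper, the instance id is assumed to be a UUID, and the absolute form is assumed to be the configured origin followed by the same /alerts/inbox/{id} path.

```java
import java.util.UUID;

/** Hypothetical helper for the alert.link entry of the template context. */
final class AlertLinks {
    private AlertLinks() {}

    /** Absolute link when uiOrigin is configured, otherwise a relative path. */
    static String inboxLink(String uiOrigin, UUID instanceId) {
        String path = "/alerts/inbox/" + instanceId;
        if (uiOrigin == null || uiOrigin.isBlank()) return path;
        // Trim one trailing slash so the join never produces "//alerts".
        String origin = uiOrigin.endsWith("/")
            ? uiOrigin.substring(0, uiOrigin.length() - 1)
            : uiOrigin;
        return origin + path;
    }
}
```

Treating a blank origin the same as a missing one keeps an empty cameleer.server.ui-origin property from producing a link that starts with a bare slash pair.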

Task 18: SilenceMatcher evaluator

Files:

  • Create: .../alerting/notify/SilenceMatcherService.java (named to avoid clash with core record SilenceMatcher)

  • Test: .../alerting/notify/SilenceMatcherServiceTest.java

  • Step 1: Write the failing test covering truth table:

    • Wildcard matcher → matches any instance.
    • Matcher with ruleId only → matches only instances with that rule.
    • Multiple fields → AND logic.
    • Active-window check at notification time (not at eval time).
  • Step 2: Run — FAIL.

  • Step 3: Implement

@Component
public class SilenceMatcherService {

    public boolean matches(SilenceMatcher m, AlertInstance instance, AlertRule rule) {
        if (m.ruleId()  != null && !m.ruleId().equals(instance.ruleId())) return false;
        if (m.severity()!= null && m.severity() != instance.severity()) return false;
        if (m.appSlug() != null && !m.appSlug().equals(rule.condition().scope().appSlug())) return false;
        if (m.routeId() != null && !m.routeId().equals(rule.condition().scope().routeId())) return false;
        if (m.agentId() != null && !m.agentId().equals(rule.condition().scope().agentId())) return false;
        return true;
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/notify/SilenceMatcherService.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/notify/SilenceMatcherServiceTest.java
git commit -m "feat(alerting): silence matcher for notification-time dispatch"
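The matches method above does not look at the silence window, so the notification-time check from the truth table has to live next to it. A minimal sketch, assuming inclusive bounds as in listActive's now BETWEEN starts_at AND ends_at; SilenceWindow is a hypothetical stand-in for the real silence record.

```java
import java.time.Instant;

/** Hypothetical slice of an alert silence: only the window columns. */
record SilenceWindow(Instant startsAt, Instant endsAt) {

    /** Inclusive on both ends, matching SQL "now BETWEEN starts_at AND ends_at". */
    boolean activeAt(Instant now) {
        return !now.isBefore(startsAt) && !now.isAfter(endsAt);
    }
}
```

A dispatcher would suppress a notification only when both activeAt(now) and matches(...) hold, with now taken at dispatch time and not at evaluation time.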

Phase 6 — Condition evaluators

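All six evaluators share this shape (shown with the intended sealed hierarchy; Task 19 creates the interface without the sealed/permits clause):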

public sealed interface ConditionEvaluator<C extends AlertCondition>
        permits RouteMetricEvaluator, ExchangeMatchEvaluator, AgentStateEvaluator,
                DeploymentStateEvaluator, LogPatternEvaluator, JvmMetricEvaluator {

    ConditionKind kind();
    EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}

Supporting types (create these in Task 19 before implementing individual evaluators).

Task 19: EvalContext, EvalResult, TickCache, PerKindCircuitBreaker, ConditionEvaluator interface

Files:

  • Create: .../alerting/eval/EvalContext.java, EvalResult.java, TickCache.java, PerKindCircuitBreaker.java, ConditionEvaluator.java

  • Test: .../alerting/eval/TickCacheTest.java, PerKindCircuitBreakerTest.java

  • Step 1: Write the failing tests

// TickCacheTest.java
@Test
void getOrComputeCachesWithinTick() {
    var cache = new TickCache();
    int n = cache.getOrCompute("k", () -> 42);
    int m = cache.getOrCompute("k", () -> 43);
    assertThat(n).isEqualTo(42);
    assertThat(m).isEqualTo(42);  // cached
}

// PerKindCircuitBreakerTest.java
@Test
void opensAfterFailThreshold() {
    var cb = new PerKindCircuitBreaker(5, 30, 60, java.time.Clock.fixed(...));
    for (int i = 0; i < 5; i++) cb.recordFailure(ConditionKind.AGENT_STATE);
    assertThat(cb.isOpen(ConditionKind.AGENT_STATE)).isTrue();
}

@Test
void closesAfterCooldown() { /* advance clock beyond cooldown window */ }
  • Step 2: Implement
// EvalContext.java
package com.cameleer.server.app.alerting.eval;
import java.time.Instant;
public record EvalContext(String tenantId, Instant now, TickCache tickCache) {}
// EvalResult.java
package com.cameleer.server.app.alerting.eval;
import java.util.Map;

public sealed interface EvalResult {
    record Firing(Double currentValue, Double threshold, Map<String, Object> context) implements EvalResult {
        public Firing { context = context == null ? Map.of() : Map.copyOf(context); }
    }
    record Clear() implements EvalResult {
        public static final Clear INSTANCE = new Clear();
    }
    record Error(Throwable cause) implements EvalResult {}
}
// TickCache.java
package com.cameleer.server.app.alerting.eval;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class TickCache {
    private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
    @SuppressWarnings("unchecked")
    public <T> T getOrCompute(String key, Supplier<T> supplier) {
        return (T) map.computeIfAbsent(key, k -> supplier.get());
    }
}
// PerKindCircuitBreaker.java
package com.cameleer.server.app.alerting.eval;

import com.cameleer.server.core.alerting.ConditionKind;

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class PerKindCircuitBreaker {

    private record State(Deque<Instant> failures, Instant openUntil) {}

    private final int threshold;
    private final Duration window;
    private final Duration cooldown;
    private final Clock clock;
    private final ConcurrentHashMap<ConditionKind, State> byKind = new ConcurrentHashMap<>();

    public PerKindCircuitBreaker(int threshold, int windowSeconds, int cooldownSeconds, Clock clock) {
        this.threshold = threshold;
        this.window = Duration.ofSeconds(windowSeconds);
        this.cooldown = Duration.ofSeconds(cooldownSeconds);
        this.clock = clock;
    }

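    public void recordFailure(ConditionKind kind) {
        byKind.compute(kind, (k, s) -> {
            var deque = (s == null) ? new ArrayDeque<Instant>() : new ArrayDeque<>(s.failures());
            Instant now = Instant.now(clock);
            Instant cutoff = now.minus(window);
            // Drop failures that have aged out of the sliding window.
            while (!deque.isEmpty() && deque.peekFirst().isBefore(cutoff)) deque.pollFirst();
            deque.addLast(now);
            // Keep an unexpired cooldown: a late failure below the threshold must not close the breaker early.
            Instant stillOpen = (s != null && s.openUntil() != null && now.isBefore(s.openUntil()))
                ? s.openUntil() : null;
            Instant openUntil = (deque.size() >= threshold) ? now.plus(cooldown) : stillOpen;
            return new State(deque, openUntil);
        });
    }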

    public boolean isOpen(ConditionKind kind) {
        State s = byKind.get(kind);
        return s != null && s.openUntil() != null && Instant.now(clock).isBefore(s.openUntil());
    }

    public void recordSuccess(ConditionKind kind) {
        byKind.compute(kind, (k, s) -> new State(new ArrayDeque<>(), null));
    }
}
// ConditionEvaluator.java
package com.cameleer.server.app.alerting.eval;

import com.cameleer.server.core.alerting.*;

public interface ConditionEvaluator<C extends AlertCondition> {
    ConditionKind kind();
    EvalResult evaluate(C condition, AlertRule rule, EvalContext ctx);
}

(sealed permits … is omitted on the interface to avoid a multi-file compile-order gotcha during the TDD sequence. The effective constraint is enforced by the dispatcher's switch over ConditionKind.)

  • Step 3: Run — PASS.

  • Step 4: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/ \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/
git commit -m "feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)"
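closesAfterCooldown needs a clock the test can move, which Clock.fixed cannot do. One option is a small mutable Clock; MutableClock is a hypothetical test helper, not an existing class in the repo, and it is not thread-safe.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical test helper: a Clock whose instant the test can move forward. */
final class MutableClock extends Clock {
    private Instant now;
    private final ZoneId zone;

    MutableClock(Instant start) {
        this(start, ZoneOffset.UTC);
    }

    private MutableClock(Instant start, ZoneId zone) {
        this.now = start;
        this.zone = zone;
    }

    /** Moves the clock forward by the given amount. */
    void advance(Duration d) {
        now = now.plus(d);
    }

    @Override public Instant instant() { return now; }
    @Override public ZoneId getZone() { return zone; }
    @Override public Clock withZone(ZoneId z) { return new MutableClock(now, z); }
}
```

With new PerKindCircuitBreaker(5, 30, 60, clock), recording five failures and then calling clock.advance(Duration.ofSeconds(61)) should leave isOpen(kind) false.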

Task 20: AgentStateEvaluator

Files:

  • Create: .../alerting/eval/AgentStateEvaluator.java

  • Test: .../alerting/eval/AgentStateEvaluatorTest.java

  • Step 1: Write the failing test

@Test
void firesWhenAnyAgentInTargetStateForScope() {
    var registry = mock(AgentRegistryService.class);
    when(registry.findAll()).thenReturn(List.of(
        new AgentInfo("a1","a1","orders", "env-uuid","1.0", List.of(), Map.of(),
            AgentState.DEAD, Instant.now().minusSeconds(120), Instant.now().minusSeconds(120), null)
    ));
    var eval = new AgentStateEvaluator(registry);
    var rule = ruleWith(new AgentStateCondition(new AlertScope("orders", null, null), "DEAD", 60));
    EvalResult r = eval.evaluate((AgentStateCondition) rule.condition(), rule,
        new EvalContext("default", Instant.now(), new TickCache()));
    assertThat(r).isInstanceOf(EvalResult.Firing.class);
}

@Test
void clearWhenNoMatchingAgents() { /* ... */ }
  • Step 2: Run — FAIL.

  • Step 3: Implement

@Component
public class AgentStateEvaluator implements ConditionEvaluator<AgentStateCondition> {

    private final AgentRegistryService registry;

    public AgentStateEvaluator(AgentRegistryService registry) { this.registry = registry; }

    @Override public ConditionKind kind() { return ConditionKind.AGENT_STATE; }

    @Override
    public EvalResult evaluate(AgentStateCondition c, AlertRule rule, EvalContext ctx) {
        AgentState target = AgentState.valueOf(c.state());
        Instant cutoff = ctx.now().minusSeconds(c.forSeconds());
        List<AgentInfo> hits = registry.findAll().stream()
            .filter(a -> matchesScope(a, c.scope()))
            .filter(a -> a.state() == target)
            .filter(a -> a.lastHeartbeat() != null && a.lastHeartbeat().isBefore(cutoff))
            .toList();
        if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
        AgentInfo first = hits.get(0);
        return new EvalResult.Firing(
            (double) hits.size(), null,
            Map.of("agent", Map.of(
                "id", first.instanceId(),
                "name", first.displayName(),
                "state", first.state().name()
            ), "app", Map.of("slug", first.applicationId())));
    }

    private static boolean matchesScope(AgentInfo a, AlertScope s) {
        if (s.appSlug() != null && !s.appSlug().equals(a.applicationId())) return false;
        if (s.agentId() != null && !s.agentId().equals(a.instanceId())) return false;
        return true;
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluator.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AgentStateEvaluatorTest.java
git commit -m "feat(alerting): AGENT_STATE evaluator"

Task 21: DeploymentStateEvaluator

Files:

  • Create: .../alerting/eval/DeploymentStateEvaluator.java

  • Test: .../alerting/eval/DeploymentStateEvaluatorTest.java

  • Step 1: Write the failing test: a FAILED deployment for the matching app → Firing; RUNNING → Clear.

  • Step 2: Run — FAIL.

  • Step 3: Implement — read via DeploymentRepository.findByAppId and AppService.getByEnvironmentAndSlug:

@Override
public EvalResult evaluate(DeploymentStateCondition c, AlertRule rule, EvalContext ctx) {
    App app = appService.getByEnvironmentAndSlug(rule.environmentId(), c.scope().appSlug()).orElse(null);
    if (app == null) return EvalResult.Clear.INSTANCE;
    List<Deployment> current = deploymentRepo.findByAppId(app.id());
    Set<String> wanted = Set.copyOf(c.states());
    var hits = current.stream()
        .filter(d -> wanted.contains(d.status().name()))
        .toList();
    if (hits.isEmpty()) return EvalResult.Clear.INSTANCE;
    Deployment d = hits.get(0);
    return new EvalResult.Firing((double) hits.size(), null,
        Map.of("deployment", Map.of("id", d.id().toString(), "status", d.status().name()),
               "app", Map.of("slug", app.slug())));
}
  • Step 4: Run — PASS.

  • Step 5: Commit

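git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluator.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/DeploymentStateEvaluatorTest.java
git commit -m "feat(alerting): DEPLOYMENT_STATE evaluator"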

Task 22: RouteMetricEvaluator

Files:

  • Create: .../alerting/eval/RouteMetricEvaluator.java

  • Test: .../alerting/eval/RouteMetricEvaluatorTest.java

  • Step 1: Write the failing test — mock StatsStore, seed ExecutionStats{p99Ms = 2500, ...} for a scoped call, assert Firing with currentValue = 2500, threshold = 2000.

  • Step 2: Run — FAIL.

  • Step 3: Implement — dispatch on RouteMetric enum:

@Override
public EvalResult evaluate(RouteMetricCondition c, AlertRule rule, EvalContext ctx) {
    Instant from = ctx.now().minusSeconds(c.windowSeconds());
    Instant to = ctx.now();

    String env = environmentService.findById(rule.environmentId()).map(Environment::slug).orElse(null);
    ExecutionStats stats = (c.scope().routeId() != null)
        ? statsStore.statsForRoute(from, to, c.scope().routeId(), c.scope().appSlug(), env)
        : (c.scope().appSlug() != null)
            ? statsStore.statsForApp(from, to, c.scope().appSlug(), env)
            : statsStore.stats(from, to, env);

    double actual = switch (c.metric()) {
        case ERROR_RATE    -> errorRate(stats);
        case P95_LATENCY_MS -> stats.p95DurationMs();
        case P99_LATENCY_MS -> stats.p99DurationMs();
        case THROUGHPUT    -> stats.totalCount();
        case ERROR_COUNT   -> stats.failedCount();
    };

    boolean fire = switch (c.comparator()) {
        case GT  -> actual >  c.threshold();
        case GTE -> actual >= c.threshold();
        case LT  -> actual <  c.threshold();
        case LTE -> actual <= c.threshold();
        case EQ  -> actual == c.threshold();
    };

    if (!fire) return EvalResult.Clear.INSTANCE;
    return new EvalResult.Firing(actual, c.threshold(),
        Map.of("route", Map.of("id", c.scope().routeId() == null ? "" : c.scope().routeId()),
               "app",   Map.of("slug", c.scope().appSlug() == null ? "" : c.scope().appSlug())));
}

private double errorRate(ExecutionStats s) {
    long total = s.totalCount();
    return total == 0 ? 0.0 : (double) s.failedCount() / total;
}

(Adjust method names on ExecutionStats to match the actual record — use gitnexus_context({name: "ExecutionStats"}) if unsure.)

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): ROUTE_METRIC evaluator"

Task 23: LogPatternEvaluator

Files:

  • Create: .../alerting/eval/LogPatternEvaluator.java

  • Test: .../alerting/eval/LogPatternEvaluatorTest.java

  • Step 1: Write the failing test — mock ClickHouseLogStore.countLogs returning 7; threshold 5 → Firing; returning 3 → Clear.

  • Step 2: Run — FAIL.

  • Step 3: Implement — build a LogSearchRequest from the condition + window, delegate to countLogs. Use TickCache keyed on (env, app, level, pattern, windowStart, windowEnd) to coalesce.
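The coalescing idea can be sketched as a per-tick memoization map. This is only an illustration: TickCacheSketch, LogCountKey, and getOrCompute are hypothetical names, and the real TickCache API may look different. It assumes one cache instance per tick, used from a single thread.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-tick memoization cache; the real TickCache API may differ. */
final class TickCacheSketch {

    /** Hypothetical key mirroring (env, app, level, pattern, windowStart, windowEnd). */
    record LogCountKey(String env, String app, String level, String pattern,
                       Instant windowStart, Instant windowEnd) {}

    // One instance lives for a single tick and is used by one thread, so a plain HashMap suffices.
    private final Map<Object, Object> values = new HashMap<>();

    /** Returns the cached value for the key, invoking the loader at most once per key. */
    @SuppressWarnings("unchecked")
    <T> T getOrCompute(Object key, Supplier<T> loader) {
        if (values.containsKey(key)) {
            return (T) values.get(key);
        }
        T value = loader.get();
        values.put(key, value);
        return value;
    }
}
```

Two rules that share the same environment, app, level, pattern, and window then trigger a single countLogs call within a tick, because record equality makes their keys identical.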

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): LOG_PATTERN evaluator"

Task 24: JvmMetricEvaluator

Files:

  • Create: .../alerting/eval/JvmMetricEvaluator.java

  • Test: .../alerting/eval/JvmMetricEvaluatorTest.java

  • Step 1: Write the failing test — mock MetricsQueryStore.queryTimeSeries for ("agent-1", ["heap_used_percent"], from, to, 1) returning {heap_used_percent: [Bucket{max=95.0}]}; assert Firing with currentValue=95.

  • Step 2: Run — FAIL.

  • Step 3: Implement — aggregate across buckets per AggregationOp (MAX/MIN/AVG/LATEST), compare against threshold.
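A minimal sketch of the bucket aggregation, assuming each bucket has already been reduced to one double value and that buckets arrive oldest first. BucketAggregationSketch and aggregate are hypothetical names; only the four AggregationOp values come from the plan. Returning NaN for an empty series is an assumption made here so that no comparator can fire on missing data.

```java
import java.util.List;

/** Hypothetical helper; only the AggregationOp values are taken from the plan text. */
final class BucketAggregationSketch {

    enum AggregationOp { MAX, MIN, AVG, LATEST }

    /** Aggregates bucket values (oldest first). Returns NaN for an empty series. */
    static double aggregate(List<Double> bucketValues, AggregationOp op) {
        if (bucketValues.isEmpty()) {
            // Assumption: every comparison against NaN is false, so the rule stays clear.
            return Double.NaN;
        }
        return switch (op) {
            case MAX -> bucketValues.stream().mapToDouble(Double::doubleValue).max().getAsDouble();
            case MIN -> bucketValues.stream().mapToDouble(Double::doubleValue).min().getAsDouble();
            case AVG -> bucketValues.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
            case LATEST -> bucketValues.get(bucketValues.size() - 1);
        };
    }
}
```

The aggregated value can then be fed into the same comparator switch that RouteMetricEvaluator uses.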

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): JVM_METRIC evaluator"

Task 25: ExchangeMatchEvaluator (PER_EXCHANGE + COUNT_IN_WINDOW)

Files:

  • Create: .../alerting/eval/ExchangeMatchEvaluator.java

  • Test: .../alerting/eval/ExchangeMatchEvaluatorTest.java

  • Step 1: Write the failing test — two variants:

    • COUNT_IN_WINDOW: mock ClickHouseSearchIndex.countExecutionsForAlerting → threshold check.
    • PER_EXCHANGE: eval_state.lastExchangeTs cursor advancement. Seed 3 matching exchanges; the first eval returns all 3 as separate Firings, wrapped in the EvalResult.Batch variant introduced in Step 3. The job handles the one-alert-per-exchange fan-out.
  • Step 2: Run — FAIL.

  • Step 3: Implement. The key design decision is how PER_EXCHANGE returns multiple alerts. Simplest approach: extend EvalResult with a Batch variant:

record Batch(List<Firing> firings) implements EvalResult { ... }

Add this to EvalResult.java (Task 19). The job (Task 27) detects Batch and creates one AlertInstance per Firing. This keeps non-batched evaluators simple.
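One possible shape for the extended result type, sketched with a Java 17 sealed interface. EvalResultSketch is a trimmed-down, hypothetical stand-in: the component names, the enum-based Clear, and the firingsOf helper are assumptions for illustration, not the actual EvalResult from Task 19.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, trimmed-down stand-in for EvalResult; component names are assumptions. */
sealed interface EvalResultSketch {

    record Firing(Double currentValue, Double threshold, Map<String, Object> context)
            implements EvalResultSketch {}

    /** Carries one Firing per matching exchange (PER_EXCHANGE mode). */
    record Batch(List<Firing> firings) implements EvalResultSketch {
        Batch {
            firings = List.copyOf(firings); // defensive copy keeps the record immutable
        }
    }

    enum Clear implements EvalResultSketch { INSTANCE }

    /** Illustrative fan-out: lists the individual firings contained in a result. */
    static List<Firing> firingsOf(EvalResultSketch result) {
        if (result instanceof Batch b) return b.firings();
        if (result instanceof Firing f) return List.of(f);
        return List.of();
    }
}
```

Because the interface is sealed, adding Batch forces every exhaustive dispatch over the result type to be revisited, which is why AlertStateTransitions needs an explicit Batch branch.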

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes"

Phase 7 — Evaluator job and state transitions

Task 26: AlertingProperties + AlertStateTransitions

Files:

  • Create: cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java

  • Create: .../alerting/eval/AlertStateTransitions.java

  • Test: .../alerting/eval/AlertStateTransitionsTest.java

  • Step 1: Write the failing test for the pure state machine:

@Test
void clearWithNoOpenInstanceIsNoOp() {
    var next = AlertStateTransitions.apply(null, EvalResult.Clear.INSTANCE, rule, now);
    assertThat(next).isEmpty();
}

@Test
void firingWithNoOpenInstanceCreatesPendingIfForDuration() {
    var rule = ruleBuilder().forDurationSeconds(60).build();
    var result = new EvalResult.Firing(2500.0, 2000.0, Map.of());
    var next = AlertStateTransitions.apply(null, result, rule, now);
    assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.PENDING));
}

@Test
void firingWithNoForDurationGoesStraightToFiring() {
    var rule = ruleBuilder().forDurationSeconds(0).build();
    var next = AlertStateTransitions.apply(null, new EvalResult.Firing(1.0, null, Map.of()), rule, now);
    assertThat(next).hasValueSatisfying(i -> assertThat(i.state()).isEqualTo(AlertState.FIRING));
}

@Test
void pendingPromotesToFiringAfterForDuration() { /* ... */ }

@Test
void firingClearTransitionsToResolved() { /* ... */ }

@Test
void ackedInstanceClearsToResolved() { /* preserves acked_by, sets resolved_at */ }
  • Step 2: Run — FAIL.

  • Step 3: Implement

// AlertStateTransitions.java
package com.cameleer.server.app.alerting.eval;

import com.cameleer.server.core.alerting.*;
import java.time.Instant;
import java.util.*;

public final class AlertStateTransitions {

    private AlertStateTransitions() {}

    /** Returns the new/updated AlertInstance, or empty when nothing changes. */
    public static Optional<AlertInstance> apply(
            AlertInstance current, EvalResult result, AlertRule rule, Instant now) {

        // Java 17 target: pattern matching for switch is still a preview feature there,
        // so dispatch with instanceof patterns instead.
        if (result instanceof EvalResult.Clear) return onClear(current, now);
        if (result instanceof EvalResult.Firing f) return onFiring(current, f, rule, now);
        return Optional.empty(); // Error: no state change. Batch: handled by the job, not here.
    }

    private static Optional<AlertInstance> onFiring(AlertInstance current, EvalResult.Firing f,
                                                     AlertRule rule, Instant now) {
        if (current == null) {
            AlertState initial = rule.forDurationSeconds() > 0 ? AlertState.PENDING : AlertState.FIRING;
            return Optional.of(newInstance(rule, f, initial, now));
        }
        if (current.state() == AlertState.PENDING) {
            Instant firedAt = current.firedAt();
            if (firedAt.plusSeconds(rule.forDurationSeconds()).isBefore(now)) {
                return Optional.of(current /* copy with state=FIRING, firedAt=now */);
            }
            return Optional.empty(); // still PENDING: nothing changed, so nothing to persist
        }
        return Optional.empty();  // already FIRING/ACK — re-notification handled by dispatcher
    }

    private static Optional<AlertInstance> onClear(AlertInstance current, Instant now) {
        if (current == null) return Optional.empty();
        if (current.state() == AlertState.RESOLVED) return Optional.empty();
        return Optional.of(current /* copy with state=RESOLVED, resolvedAt=now */);
    }

    private static AlertInstance newInstance(AlertRule rule, EvalResult.Firing f, AlertState state, Instant now) {
        // ... construct from rule snapshot + context; title/message rendered by the job
        throw new UnsupportedOperationException("stub");
    }
}

Flesh out the .withState(...) / .withResolvedAt(...) helpers on AlertInstance (add wither-style methods returning new records) as part of this task.
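A minimal sketch of the wither pattern on a record. AlertInstanceSketch is a hypothetical, reduced version of AlertInstance with only four components; the real record has more, and its component names and state enum may differ.

```java
import java.time.Instant;
import java.util.UUID;

/** Hypothetical, reduced AlertInstance; the real record has more components. */
record AlertInstanceSketch(UUID id, State state, Instant firedAt, Instant resolvedAt) {

    /** Stand-in for AlertState; the exact constants are an assumption. */
    enum State { PENDING, FIRING, ACKNOWLEDGED, RESOLVED }

    /** Returns a copy with a different state; the receiver is left untouched. */
    AlertInstanceSketch withState(State newState) {
        return new AlertInstanceSketch(id, newState, firedAt, resolvedAt);
    }

    /** Returns a copy with a different firedAt timestamp. */
    AlertInstanceSketch withFiredAt(Instant newFiredAt) {
        return new AlertInstanceSketch(id, state, newFiredAt, resolvedAt);
    }

    /** Returns a copy with a different resolvedAt timestamp. */
    AlertInstanceSketch withResolvedAt(Instant newResolvedAt) {
        return new AlertInstanceSketch(id, state, firedAt, newResolvedAt);
    }
}
```

With helpers like these, the placeholder in onClear could become something like current.withState(AlertState.RESOLVED).withResolvedAt(now), assuming the real record exposes equivalent methods.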

// AlertingProperties.java
package com.cameleer.server.app.alerting.config;

import org.springframework.boot.context.properties.ConfigurationProperties;

@ConfigurationProperties("cameleer.server.alerting")
public record AlertingProperties(
        Integer evaluatorTickIntervalMs,
        Integer evaluatorBatchSize,
        Integer claimTtlSeconds,
        Integer notificationTickIntervalMs,
        Integer notificationBatchSize,
        Boolean inTickCacheEnabled,
        Integer circuitBreakerFailThreshold,
        Integer circuitBreakerWindowSeconds,
        Integer circuitBreakerCooldownSeconds,
        Integer eventRetentionDays,
        Integer notificationRetentionDays,
        Integer webhookTimeoutMs,
        Integer webhookMaxAttempts) {

    public int effectiveEvaluatorTickIntervalMs() {
        int raw = evaluatorTickIntervalMs == null ? 5000 : evaluatorTickIntervalMs;
        return Math.max(5000, raw);  // floor
    }
    public int effectiveEvaluatorBatchSize()        { return evaluatorBatchSize         == null ? 20   : evaluatorBatchSize; }
    public int effectiveClaimTtlSeconds()           { return claimTtlSeconds            == null ? 30   : claimTtlSeconds; }
    public int effectiveNotificationTickIntervalMs(){ return notificationTickIntervalMs == null ? 5000 : notificationTickIntervalMs; }
    public int effectiveNotificationBatchSize()     { return notificationBatchSize      == null ? 50   : notificationBatchSize; }
    public int effectiveEventRetentionDays()        { return eventRetentionDays         == null ? 90   : eventRetentionDays; }
    public int effectiveNotificationRetentionDays() { return notificationRetentionDays  == null ? 30   : notificationRetentionDays; }
    public int effectiveWebhookTimeoutMs()          { return webhookTimeoutMs           == null ? 5000 : webhookTimeoutMs; }
    public int effectiveWebhookMaxAttempts()        { return webhookMaxAttempts         == null ? 3    : webhookMaxAttempts; }
    public int cbFailThreshold()  { return circuitBreakerFailThreshold  == null ? 5  : circuitBreakerFailThreshold; }
    public int cbWindowSeconds()  { return circuitBreakerWindowSeconds  == null ? 30 : circuitBreakerWindowSeconds; }
    public int cbCooldownSeconds(){ return circuitBreakerCooldownSeconds== null ? 60 : circuitBreakerCooldownSeconds; }
}

Register via @ConfigurationPropertiesScan or explicit @EnableConfigurationProperties(AlertingProperties.class) in AlertingBeanConfig. Also clamp-with-WARN if evaluatorTickIntervalMs < 5000 at startup.
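The clamp-with-WARN behaviour could look roughly like the sketch below. TickIntervalClampSketch, clampTickIntervalMs, and the logger name are hypothetical; System.Logger is used only to keep the example dependency-free, whereas the project code would presumably log through SLF4J.

```java
/** Hypothetical startup check; class, method, and logger names are assumptions. */
final class TickIntervalClampSketch {

    static final int MIN_TICK_INTERVAL_MS = 5000;

    private static final System.Logger LOG = System.getLogger("alerting.config");

    /** Returns the effective tick interval, warning when a configured value is raised to the floor. */
    static int clampTickIntervalMs(Integer configured) {
        if (configured == null) {
            return MIN_TICK_INTERVAL_MS; // unset: fall back to the default, no warning needed
        }
        if (configured < MIN_TICK_INTERVAL_MS) {
            LOG.log(System.Logger.Level.WARNING,
                    "evaluator-tick-interval-ms={0} is below the floor; using {1}",
                    configured, MIN_TICK_INTERVAL_MS);
            return MIN_TICK_INTERVAL_MS;
        }
        return configured;
    }
}
```

Logging once at startup, rather than inside effectiveEvaluatorTickIntervalMs(), avoids repeating the warning on every read of the property.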

  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/config/AlertingProperties.java \
        cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertStateTransitions.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertStateTransitionsTest.java
git commit -m "feat(alerting): AlertingProperties + AlertStateTransitions state machine"

Task 27: AlertEvaluatorJob

Files:

  • Create: .../alerting/eval/AlertEvaluatorJob.java

  • Test: .../alerting/eval/AlertEvaluatorJobIT.java

  • Step 1: Write the failing integration test (uses real PG + mocked evaluators):

@Test
void claimDueRuleFireResolveCycle() throws Exception {
    // seed one rule scoped to a non-existent agent state -> evaluator returns Clear -> no instance.
    // flip the mock to return Firing -> one AlertInstance in FIRING state.
    // flip back to Clear -> instance transitions to RESOLVED.
}
  • Step 2: Run — FAIL.

  • Step 3: Implement

@Component
public class AlertEvaluatorJob implements SchedulingConfigurer {

    private static final Logger log = LoggerFactory.getLogger(AlertEvaluatorJob.class);

    private final AlertingProperties props;
    private final AlertRuleRepository ruleRepo;
    private final AlertInstanceRepository instanceRepo;
    private final AlertNotificationRepository notificationRepo;
    private final Map<ConditionKind, ConditionEvaluator<?>> evaluators;
    private final PerKindCircuitBreaker circuitBreaker;
    private final MustacheRenderer renderer;
    private final NotificationContextBuilder contextBuilder;
    private final String instanceId;
    private final String tenantId;
    private final AlertingMetrics metrics;
    private final Clock clock;

    public AlertEvaluatorJob(/* ...all above... */) { /* assign */ }

    @Override
    public void configureTasks(ScheduledTaskRegistrar registrar) {
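        // The long-millis overload is deprecated in Spring Framework 6.x; pass a java.time.Duration instead.
        registrar.addFixedDelayTask(this::tick, Duration.ofMillis(props.effectiveEvaluatorTickIntervalMs()));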
    }

    void tick() {
        List<AlertRule> claimed = ruleRepo.claimDueRules(
            instanceId, props.effectiveEvaluatorBatchSize(), props.effectiveClaimTtlSeconds());

        TickCache cache = new TickCache();
        EvalContext ctx = new EvalContext(tenantId, Instant.now(clock), cache);

        for (AlertRule rule : claimed) {
            if (circuitBreaker.isOpen(rule.conditionKind())) {
                reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
                continue;
            }
            try {
                EvalResult result = evaluateSafely(rule, ctx);
                applyResult(rule, result);
                circuitBreaker.recordSuccess(rule.conditionKind());
            } catch (Exception e) {
                circuitBreaker.recordFailure(rule.conditionKind());
                metrics.evalError(rule.conditionKind(), rule.id());
                log.warn("Evaluator error for rule {} ({}): {}", rule.id(), rule.conditionKind(), e.toString());
            } finally {
                reschedule(rule, Instant.now(clock).plusSeconds(rule.evaluationIntervalSeconds()));
            }
        }
    }

    @SuppressWarnings({"rawtypes","unchecked"})
    private EvalResult evaluateSafely(AlertRule rule, EvalContext ctx) {
        ConditionEvaluator evaluator = evaluators.get(rule.conditionKind());
        if (evaluator == null) throw new IllegalStateException("No evaluator for " + rule.conditionKind());
        return evaluator.evaluate(rule.condition(), rule, ctx);
    }

    private void applyResult(AlertRule rule, EvalResult result) {
        if (result instanceof EvalResult.Batch b) {
            for (EvalResult.Firing f : b.firings()) applyFiring(rule, f);
            return;
        }
        AlertInstance current = instanceRepo.findOpenForRule(rule.id()).orElse(null);
        AlertStateTransitions.apply(current, result, rule, Instant.now(clock)).ifPresent(next -> {
            AlertInstance persisted = instanceRepo.save(
                enrichTitleMessage(rule, next, result));
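            // Notify on every transition into FIRING: a brand-new instance or a PENDING promotion.
            boolean wasAlreadyFiring = current != null && current.state() != AlertState.PENDING;
            if (next.state() == AlertState.FIRING && !wasAlreadyFiring) {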
                enqueueNotifications(rule, persisted);
            }
        });
    }

    private void applyFiring(AlertRule rule, EvalResult.Firing f) { /* always create new instance for PER_EXCHANGE mode */ }

    private AlertInstance enrichTitleMessage(AlertRule rule, AlertInstance instance, EvalResult result) {
        Map<String,Object> ctx = contextBuilder.build(rule, instance, /* env lookup */ null, /* uiOrigin */ null);
        String title = renderer.render(rule.notificationTitleTmpl(), ctx);
        String message = renderer.render(rule.notificationMessageTmpl(), ctx);
        return instance /* .withTitle(title).withMessage(message) */;
    }

    private void enqueueNotifications(AlertRule rule, AlertInstance instance) {
        for (WebhookBinding w : rule.webhooks()) {
            Map<String,Object> payload = /* context-builder + body override */ Map.of();
            notificationRepo.save(new AlertNotification(
                UUID.randomUUID(), instance.id(), w.id(), w.outboundConnectionId(),
                NotificationStatus.PENDING, 0, Instant.now(clock),
                null, null, null, null, payload, null, Instant.now(clock)));
        }
    }

    private void reschedule(AlertRule rule, Instant next) {
        ruleRepo.releaseClaim(rule.id(), next, rule.evalState());
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git add cameleer-server-app/src/main/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJob.java \
        cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/eval/AlertEvaluatorJobIT.java
git commit -m "feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker"

Phase 8 — Notification dispatch

Task 28: HmacSigner

Files:

  • Create: .../alerting/notify/HmacSigner.java

  • Test: .../alerting/notify/HmacSignerTest.java

  • Step 1: Write the failing test

@Test
void signsBodyWithSha256Hmac() {
    String sig = new HmacSigner().sign("secret", "payload".getBytes(StandardCharsets.UTF_8));
    // precomputed: HMAC-SHA256(secret, "payload") = 3c5c4f...
    assertThat(sig).startsWith("sha256=").isEqualTo("sha256=3c5c4f...");  // replace with real hex
}
  • Step 2: Run — FAIL.

  • Step 3: Implement using javax.crypto.Mac.getInstance("HmacSHA256") and HexFormat.of().formatHex(...).
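A minimal sketch of the signer built on those two JDK calls. HmacSignerSketch is a hypothetical name; the "sha256=" prefix follows the test in Step 1, while encoding the secret as UTF-8 and wrapping checked exceptions in IllegalStateException are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of HmacSigner; secret encoding and error handling are assumptions. */
final class HmacSignerSketch {

    private static final String ALGORITHM = "HmacSHA256";

    /** Returns "sha256=" followed by the lowercase hex HMAC-SHA256 of the body. */
    String sign(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            return "sha256=" + HexFormat.of().formatHex(mac.doFinal(body));
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            // HmacSHA256 is a standard JDK algorithm, so this path indicates a broken runtime.
            throw new IllegalStateException("Unable to compute HMAC-SHA256 signature", e);
        }
    }
}
```

A Mac instance is not thread-safe, so the sketch creates a fresh one per call instead of caching it in a field. A published HMAC-SHA256 test vector (for example from RFC 4231) is a convenient source for the expected hex in the unit test.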

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): HmacSigner for webhook signature"

Task 29: WebhookDispatcher

Files:

  • Create: .../alerting/notify/WebhookDispatcher.java

  • Test: .../alerting/notify/WebhookDispatcherIT.java (WireMock)

  • Step 1: Write the failing IT covering:

    • 2xx → returns DELIVERED with status + snippet.
    • 4xx → returns FAILED immediately.
    • 5xx → returns RETRY with exponential backoff.
    • Network timeout → RETRY.
    • HMAC header present when hmacSecret != null.
    • TLS trust-all config works against WireMock HTTPS.
  • Step 2: Run — FAIL.

  • Step 3: Implement

@Component
public class WebhookDispatcher {

    public record Outcome(NotificationStatus status, int httpStatus, String snippet, Duration retryAfter) {}

    private final OutboundHttpClientFactory clientFactory;
    private final SecretCipher cipher;
    private final HmacSigner signer;
    private final MustacheRenderer renderer;
    private final AlertingProperties props;
    private final ObjectMapper om;

    public WebhookDispatcher(/* ... */) { /* assign */ }

    public Outcome dispatch(AlertNotification notif, AlertRule rule, AlertInstance instance,
                             OutboundConnection conn, Map<String,Object> context) {
        String bodyTmpl = pickBodyTemplate(rule, notif.webhookId(), conn);
        String body = renderer.render(bodyTmpl, context);

        var ctx = new OutboundHttpRequestContext(
            conn.tlsTrustMode(), conn.tlsCaPemPaths(),
            Duration.ofMillis(2000), Duration.ofMillis(props.effectiveWebhookTimeoutMs()));
        var client = clientFactory.clientFor(ctx);

        var request = new HttpPost(renderer.render(conn.url(), context));
        request.setEntity(new StringEntity(body, StandardCharsets.UTF_8));
        request.setHeader("Content-Type", "application/json");

        for (var h : conn.defaultHeaders().entrySet()) {
            request.setHeader(h.getKey(), renderer.render(h.getValue(), context));
        }
        if (conn.hmacSecretCiphertext() != null) {
            String secret = cipher.decrypt(conn.hmacSecretCiphertext());
            request.setHeader("X-Cameleer-Signature", signer.sign(secret, body.getBytes(StandardCharsets.UTF_8)));
        }

        try (var response = client.execute(request)) {
            int code = response.getCode();
            String snippet = snippet(response);
            if (code >= 200 && code < 300) return new Outcome(NotificationStatus.DELIVERED, code, snippet, null);
            if (code >= 400 && code < 500) return new Outcome(NotificationStatus.FAILED, code, snippet, null);
            return retryOutcome(code, snippet);
        } catch (IOException e) {
            return retryOutcome(-1, e.getMessage());
        }
    }

    private Outcome retryOutcome(int code, String snippet) {
        // Base delay of 30s. The caller multiplies it by the attempt number,
        // which yields linear backoff (30s, 60s, ...) as currently written.
        Duration next = Duration.ofSeconds(30);
        return new Outcome(null /* caller decides PENDING vs FAILED */, code, snippet, next);
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification"

Task 30: NotificationDispatchJob

Files:

  • Create: .../alerting/notify/NotificationDispatchJob.java

  • Test: .../alerting/notify/NotificationDispatchJobIT.java

  • Step 1: Write the failing IT — seed a PENDING AlertNotification; run one tick; WireMock returns 200; assert row transitions to DELIVERED. Seed another against 503 → assert attempts=1, next_attempt_at bumped, still PENDING.

  • Step 2: Run — FAIL.

  • Step 3: Implement — claim-polling loop:

void tick() {
    var claimed = notificationRepo.claimDueNotifications(instanceId, batchSize, claimTtl);
    for (var n : claimed) {
        var conn = outboundRepo.findById(tenantId, n.outboundConnectionId()).orElse(null);
        if (conn == null) { notificationRepo.markFailed(n.id(), 0, "outbound connection deleted"); continue; }

        var instance = instanceRepo.findById(n.alertInstanceId()).orElseThrow();
        var rule = ruleRepo.findById(instance.ruleId()).orElse(null);
        var context = contextBuilder.build(rule, instance, env, uiOrigin);

        // silence check
        if (silenceRepo.listActive(instance.environmentId(), Instant.now()).stream()
                .anyMatch(s -> silenceMatcher.matches(s.matcher(), instance, rule))) {
            instanceRepo.markSilenced(instance.id(), true);
            notificationRepo.markFailed(n.id(), 0, "silenced");
            continue;
        }

        var outcome = dispatcher.dispatch(n, rule, instance, conn, context);
        if (outcome.status() == NotificationStatus.DELIVERED) {
            notificationRepo.markDelivered(n.id(), outcome.httpStatus(), outcome.snippet(), Instant.now());
        } else if (outcome.status() == NotificationStatus.FAILED) {
            notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
        } else {
            int attempts = n.attempts() + 1;
            if (attempts >= props.effectiveWebhookMaxAttempts()) {
                notificationRepo.markFailed(n.id(), outcome.httpStatus(), outcome.snippet());
            } else {
                Instant next = Instant.now().plus(outcome.retryAfter().multipliedBy(attempts));
                notificationRepo.scheduleRetry(n.id(), next, outcome.httpStatus(), outcome.snippet());
            }
        }
    }
}
  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): NotificationDispatchJob outbox loop with silence + retry"

Task 31: InAppInboxQuery + server-side 5s memoization

Files:

  • Create: .../alerting/notify/InAppInboxQuery.java

  • Test: .../alerting/notify/InAppInboxQueryTest.java

  • Step 1: Write the failing test covering the path (resolves groups/roles from RbacService.getEffectiveRolesForUser + listGroupsForUser, delegates to AlertInstanceRepository.listForInbox/countUnreadForUser, second call within 5s returns cached count).

  • Step 2: Run — FAIL.

  • Step 3: Implement — Caffeine-style ConcurrentHashMap<Key, Entry> with Entry(count, expiresAt), 5 s TTL per (envId, userId).
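The memoization could be sketched as follows, assuming the user id is a String and the count is loaded through a callback. UnreadCountMemoSketch and its nested Key and Entry records are hypothetical names; injecting a Clock is an assumption that makes the 5 s expiry testable without sleeping.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the 5 s unread-count memoization keyed on (envId, userId). */
final class UnreadCountMemoSketch {

    record Key(UUID envId, String userId) {}
    record Entry(long count, Instant expiresAt) {}

    private static final Duration TTL = Duration.ofSeconds(5);

    private final ConcurrentHashMap<Key, Entry> cache = new ConcurrentHashMap<>();
    private final Clock clock;

    UnreadCountMemoSketch(Clock clock) {
        this.clock = clock;
    }

    /** Returns the cached count while it is fresh; otherwise reloads and caches it for 5 s. */
    long unreadCount(UUID envId, String userId, LongSupplier loader) {
        Instant now = clock.instant();
        Entry entry = cache.compute(new Key(envId, userId), (key, existing) ->
                existing != null && now.isBefore(existing.expiresAt())
                        ? existing
                        : new Entry(loader.getAsLong(), now.plus(TTL)));
        return entry.count();
    }
}
```

Note that compute runs the loader while holding the map's lock for that key, and expired entries are only replaced, never evicted; both are acceptable for a sketch but worth revisiting if the number of distinct users per environment grows large.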

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): InAppInboxQuery with 5s unread-count memoization"

Phase 9 — REST controllers

Task 32: AlertRuleController + DTOs

Files:

  • Create: .../alerting/controller/AlertRuleController.java

  • Create: DTOs in .../alerting/dto/

  • Test: .../alerting/controller/AlertRuleControllerIT.java

  • Step 1: Write the failing IT — seed an env, authenticate as OPERATOR, POST a rule, GET list, PUT update, DELETE. Assert webhook references to unknown connections return 422. Assert VIEWER cannot POST but can GET. Assert audit log entry on each mutation.

  • Step 2: Run — FAIL.

  • Step 3: Implement. Endpoints (all under /api/v1/environments/{envSlug}/alerts/rules, env resolved via @EnvPath Environment env):

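| Method | Path | RBAC |
|--------|------|------|
| GET | (base path) | VIEWER+ |
| POST | (base path) | OPERATOR+ |
| GET | {id} | VIEWER+ |
| PUT | {id} | OPERATOR+ |
| DELETE | {id} | OPERATOR+ |
| POST | {id}/enable / {id}/disable | OPERATOR+ |
| POST | {id}/render-preview | OPERATOR+ |
| POST | {id}/test-evaluate | OPERATOR+ |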

Key DTOs: AlertRuleRequest (with @Valid AlertConditionDto), AlertRuleResponse, RenderPreviewRequest/Response, TestEvaluateRequest/Response.

On save, validate:

  • Each WebhookBindingRequest.outboundConnectionId exists in outbound_connections (via OutboundConnectionService.get(id) → 422 if 404).
  • Connection is allowed in this env (via conn.isAllowedInEnvironment(env.id()) → 422 otherwise).
  • SSRF check on connection URL deferred to the outbound-connection save path (Plan 01 territory).

Audit via auditService.log("ALERT_RULE_CREATE", ALERT_RULE_CHANGE, rule.id().toString(), Map.of("name", rule.name()), SUCCESS, request).

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): AlertRuleController REST + audit + DTOs"

Task 33: AlertController

Files:

  • Create: .../alerting/controller/AlertController.java, AlertDto.java, UnreadCountResponse.java

  • Test: .../alerting/controller/AlertControllerIT.java

  • Step 1: Write the failing IT for GET /alerts, GET /alerts/unread-count, POST /alerts/{id}/ack, POST /alerts/{id}/read, POST /alerts/bulk-read. Assert env isolation (env-A alert not visible from env-B).

  • Step 2: Run — FAIL.

  • Step 3: Implement — delegate to InAppInboxQuery and AlertInstanceRepository. On ack, enforce targeted-or-OPERATOR rule.

  • Step 4: Run — PASS.

  • Step 5: Commit

git commit -m "feat(alerting): AlertController for inbox + ack + read"

Task 34: AlertSilenceController

Files:

  • Create: .../alerting/controller/AlertSilenceController.java, AlertSilenceDto.java

  • Test: .../alerting/controller/AlertSilenceControllerIT.java

  • Steps 1-5: Follow the same pattern. Mutations OPERATOR+, audit ALERT_SILENCE_CHANGE. Validate endsAt > startsAt at controller layer (DB constraint catches it anyway; user-facing 422 is friendlier).

Task 35: AlertNotificationController

Files:

  • Create: .../alerting/controller/AlertNotificationController.java

  • Test: .../alerting/controller/AlertNotificationControllerIT.java

  • Steps 1-5:

    • GET /alerts/{id}/notifications → VIEWER+; returns per-instance outbox rows.
    • POST /alerts/notifications/{id}/retry → OPERATOR+; resets next_attempt_at = now, attempts = 0, status = PENDING. Flat path because notification IDs are globally unique (document this in the flat-allow-list rule file).
  • Step 6: Update SecurityConfig to permit the new paths

In cameleer-server-app/src/main/java/com/cameleer/server/app/security/SecurityConfig.java:

.requestMatchers(HttpMethod.GET,    "/api/v1/environments/*/alerts/**").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT,    "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/rules/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.PUT,    "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.DELETE, "/api/v1/environments/*/alerts/silences/**").hasAnyRole("OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/environments/*/alerts/*/ack").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/environments/*/alerts/*/read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/environments/*/alerts/bulk-read").hasAnyRole("VIEWER","OPERATOR","ADMIN")
.requestMatchers(HttpMethod.POST,   "/api/v1/alerts/notifications/*/retry").hasAnyRole("OPERATOR","ADMIN")

(Class-level @PreAuthorize on each controller is authoritative; the path matchers are defence-in-depth.)

  • Step 7: Commit
git commit -m "feat(alerting): AlertNotificationController + SecurityConfig paths"

Task 36: Regenerate OpenAPI schema

  • Step 1: Start backend on :8081 (from the alerting-02 worktree).
  • Step 2: cd ui && npm run generate-api:live
  • Step 3: Commit ui/src/api/schema.d.ts + ui/src/api/openapi.json regen.
git add ui/src/api/schema.d.ts ui/src/api/openapi.json
git commit -m "chore(alerting): regenerate openapi schema for alerting endpoints"

Phase 10 — Retention, metrics, rules, verification

Task 37: AlertingRetentionJob

Files:

  • Create: .../alerting/retention/AlertingRetentionJob.java

  • Test: .../alerting/retention/AlertingRetentionJobIT.java

  • Step 1: Write the failing IT — seed 2 resolved instances (one older than retention, one fresher) + 2 settled notifications; run cleanup(); assert only old rows are deleted.

  • Step 2: Run — FAIL.

  • Step 3: Implement with @Scheduled(cron = "0 0 3 * * *"), taking cutoffs from AlertingProperties and using the advisory-lock-of-the-day pattern (see JarRetentionJob.java).

  • Steps 4-5: Run, commit

git commit -m "feat(alerting): AlertingRetentionJob daily cleanup"

Task 38: AlertingMetrics

Files:

  • Create: .../alerting/metrics/AlertingMetrics.java

  • Step 1: Register metrics via MeterRegistry:

@Component
public class AlertingMetrics {
    private final MeterRegistry registry;
    public AlertingMetrics(MeterRegistry registry) { this.registry = registry; }

    public void evalError(ConditionKind kind, UUID ruleId) {
        registry.counter("alerting_eval_errors_total",
            "kind", kind.name(), "rule_id", ruleId.toString()).increment();
    }
    public void circuitOpened(ConditionKind kind) {
        registry.counter("alerting_circuit_open_total", "kind", kind.name()).increment();
    }
    public Timer evalDuration(ConditionKind kind) {
        return registry.timer("alerting_eval_duration_seconds", "kind", kind.name());
    }
    // + gauges via MeterBinder that query repositories
}
  • Step 2: Wire into AlertEvaluatorJob and PerKindCircuitBreaker.

  • Step 3: Commit

git commit -m "feat(alerting): observability metrics via micrometer"

Task 39: Update .claude/rules/app-classes.md + core-classes.md

  • Step 1: Document the new alerting/ packages in both rule files. Add a new subsection under controller/ for the alerting env-scoped controllers. Document the new flat endpoint /api/v1/alerts/notifications/{id}/retry in the flat-allow-list with justification "notification IDs are globally unique; matches the /api/v1/executions/{id} precedent".

  • Step 2: Commit

git add .claude/rules/app-classes.md .claude/rules/core-classes.md
git commit -m "docs(rules): document alerting/ packages + notification retry flat endpoint"

Task 40: application.yml defaults + admin guide

Files:

  • Modify: cameleer-server-app/src/main/resources/application.yml

  • Create: docs/alerting.md

  • Step 1: Add default stanza

cameleer:
  server:
    alerting:
      evaluator-tick-interval-ms:       5000
      evaluator-batch-size:             20
      claim-ttl-seconds:                30
      notification-tick-interval-ms:    5000
      notification-batch-size:          50
      in-tick-cache-enabled:            true
      circuit-breaker-fail-threshold:   5
      circuit-breaker-window-seconds:   30
      circuit-breaker-cooldown-seconds: 60
      event-retention-days:             90
      notification-retention-days:      30
      webhook-timeout-ms:               5000
      webhook-max-attempts:             3
  • Step 2: Write docs/alerting.md — 1-2 page admin guide covering: rule shapes per condition kind (with example JSON), template variables per kind, webhook destinations (Slack/PagerDuty/Teams examples), silence patterns, troubleshooting (circuit breaker, retention).

  • Step 3: Commit

git add cameleer-server-app/src/main/resources/application.yml docs/alerting.md
git commit -m "docs(alerting): default config + admin guide"

Task 41: Full-lifecycle integration test

Files:

  • Create: cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java

  • Step 1: Write the full-lifecycle IT

Steps in the single test method:

  1. Seed env, user with OPERATOR role, outbound connection (WireMock backing) with HMAC secret.
  2. POST a LOG_PATTERN rule pointing at WireMock via the outbound connection, forDurationSeconds=0, threshold=1.
  3. Inject a log row into ClickHouse that matches the pattern.
  4. Trigger AlertEvaluatorJob.tick() directly.
  5. Assert one alert_instances row in FIRING.
  6. Trigger NotificationDispatchJob.tick().
  7. Assert WireMock received one POST with X-Cameleer-Signature header + rendered body.
  8. POST /alerts/{id}/ack → state ACKNOWLEDGED.
  9. Create a silence matching this rule; fire another tick; assert silenced=true on new instance and WireMock received no second request.
  10. Remove the matching log rows, run tick → instance RESOLVED.
  11. DELETE the rule → assert alert_instances.rule_id IS NULL but rule_snapshot still retains the rule name.
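Step 7 asserts the `X-Cameleer-Signature` header on the request WireMock receives. One way the IT could recompute the expected value is sketched below. This section does not state the signature scheme, so HMAC-SHA256 over the raw body with a lowercase hex encoding is an assumption, and the helper class `WebhookSignatures` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical test helper; HMAC-SHA256 with lowercase hex output is an assumption. */
final class WebhookSignatures {
    private WebhookSignatures() {}

    /** Computes the hex-encoded HMAC-SHA256 of the body under the shared secret. */
    static String sign(String secret, String body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(body.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Compares a received header value against the expected signature in constant time. */
    static boolean matches(String secret, String body, String headerValue) {
        byte[] expected = sign(secret, body).getBytes(StandardCharsets.UTF_8);
        byte[] actual = headerValue.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

In the IT, the body and header captured by WireMock would be passed to `matches` together with the HMAC secret seeded in step 1. If the real dispatcher adds a prefix to the header value or uses another encoding, adjust the helper accordingly.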
  • Step 2: Run — PASS (may need a few iterations of debugging).

  • Step 3: Commit

git add cameleer-server-app/src/test/java/com/cameleer/server/app/alerting/AlertingFullLifecycleIT.java
git commit -m "test(alerting): full lifecycle — fire, notify, silence, ack, resolve, delete"

Task 42: Env-isolation + outbound-guard regression tests

Files:

  • Create: .../alerting/AlertingEnvIsolationIT.java, OutboundConnectionAllowedEnvIT.java

  • Step 1: Env isolation — rule in env-A, fire, assert invisible from env-B inbox.

  • Step 2: Outbound guard — rule references a connection restricted to env-A; POST rule creation in env-B → 422. Narrowing allowed_environment_ids on the connection while a rule still references it → 409 (this exercises the freshly-wired rulesReferencing).

  • Step 3: Run — PASS.

  • Step 4: Commit

git commit -m "test(alerting): env isolation + outbound allowed-env guard"
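The two outbound-guard outcomes from Step 2 can be sketched as a pair of pure checks. This is an illustration under assumptions: the class `OutboundEnvGuard`, the `Outcome` enum, and the rule that any narrowing is rejected while at least one rule references the connection are hypothetical readings of the step, and the meaning of an empty allow-list is deliberately left out.

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical guard logic; names and exact semantics are assumptions. */
final class OutboundEnvGuard {
    enum Outcome { ALLOWED, UNPROCESSABLE_422, CONFLICT_409 }

    private OutboundEnvGuard() {}

    /** Rule creation: the connection must be allowed in the environment the rule lives in. */
    static Outcome checkRuleCreation(Set<UUID> allowedEnvIds, UUID ruleEnvId) {
        return allowedEnvIds.contains(ruleEnvId) ? Outcome.ALLOWED : Outcome.UNPROCESSABLE_422;
    }

    /** Connection update: removing an allowed environment is rejected while rules still reference it. */
    static Outcome checkNarrowing(Set<UUID> oldAllowed, Set<UUID> newAllowed, List<UUID> referencingRuleIds) {
        boolean narrows = !newAllowed.containsAll(oldAllowed);
        return (narrows && !referencingRuleIds.isEmpty()) ? Outcome.CONFLICT_409 : Outcome.ALLOWED;
    }
}
```

Here `referencingRuleIds` stands for the result of `rulesReferencing(connectionId)`, which is why the 409 case only becomes testable once that method is wired.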

Task 43: Final verification + GitNexus reindex

  • Step 1: Full build
mvn clean verify

Expected: every test added by Plan 02 passes. Tests affected by known pre-existing debt (wrong-JdbcTemplate + shared-context state leaks) may still fail; record any failure that predates Plan 02 in a "known-pre-existing" note in the commit message.

  • Step 2: GitNexus reindex
npx gitnexus analyze --embeddings
  • Step 3: Manual smoke

Start backend + UI (Plan 01 UI is sufficient for outbound connections). Walk through:

  • Create an outbound connection to https://httpbin.org/post.

  • curl the alerting REST API to POST a LOG_PATTERN rule.

  • Inject a matching log via POST /api/v1/data/logs.

  • Wait 2 eval ticks + 1 notification tick.

  • Confirm: alert_instances row in FIRING, alert_notifications row DELIVERED with HTTP 200, httpbin shows the body.

  • curl POST /alerts/{id}/ack → state ACKNOWLEDGED.

  • Step 4: Nothing to commit if all passes — plan complete


Known-incomplete items carried into Plan 03

  • UI: NotificationBell, /alerts/** pages, <MustacheEditor /> with variable auto-complete, CMD-K alert/rule sources. Open design question: which completion engine to use (CodeMirror 6 vs Monaco vs textarea overlay); see spec §20 #7.
  • Rule promotion across envs. Pure UI flow (no new server endpoint); lives with the rule editor in Plan 03.
  • OIDC retrofit to use OutboundHttpClientFactory. Unchanged from Plan 01 — a separate small follow-up.
  • TLS summary enrichment on /test endpoint (Plan 01 stubbed as "TLS"). Can extract actual protocol + cipher suite + peer cert from Apache HttpClient 5's routed context.
  • Performance tests. 500-rule, 5-replica PerformanceIT deferred; claim-polling concurrency is covered by Task 7's unit-level test.
  • Bulk promotion and the Mustache template-variables metadata endpoint (GET /alerts/rules/template-variables). Both are deferred until usage patterns justify them.
  • Rule deletion test debt. Existing pre-Plan-02 test debt (wrong-JdbcTemplate bug in ~9 controller ITs + shared-context state leaks in FlywayMigrationIT / ConfigEnvIsolationIT / ClickHouseStatsStoreIT) is orthogonal and should be addressed in a dedicated test-hygiene pass.

Self-review

Spec coverage (against docs/superpowers/specs/2026-04-19-alerting-design.md):

| Spec § | Scope | Covered by |
| --- | --- | --- |
| §2 Signal sources (6) | All 6 condition kinds | Tasks 4, 20-25 |
| §2 Delivery channels | In-app + webhook | Tasks 29, 30, 31 |
| §2 Lifecycle (FIRING/ACK/RESOLVED + SILENCED) | State machine + silence | Tasks 26, 18, 30, 33 |
| §2 Rule promotion | Deferred to Plan 03 (UI) | |
| §2 CMD-K | Deferred to Plan 03 | |
| §2 Configurable cadence, 5 s floor | AlertingProperties.effective* | Task 26 |
| §3 Key decisions | All 14 decisions honoured | |
| §4 Module layout | core/alerting + app/alerting/** | Tasks 3-11, 15-38 |
| §4 Touchpoints | countLogs + countExecutionsForAlerting + AuditCategory + SecurityConfig | Tasks 2, 12, 13, 35 |
| §5 Data model | V12 migration | Task 1 |
| §5 Claim-polling queries | FOR UPDATE SKIP LOCKED in rule + notification repos | Tasks 7, 10 |
| §6 Outbound connections wiring | rulesReferencing gate | Task 8 (CRITICAL) |
| §7 Evaluator cadence, state machine, 4 projections, query coalescing, circuit breaker | Tick cache + projections + CB + SchedulingConfigurer | Tasks 14, 19, 26, 27 |
| §8 Notification dispatch, HMAC, template render, in-app inbox, 5s memoization | | Tasks 28, 29, 30, 31 |
| §9 Rule promotion | Deferred (UI) | |
| §10 Cross-cutting HTTP | Reused from Plan 01 | |
| §11 API surface | All routes implemented except rule promotion | Tasks 32-36 |
| §12 CMD-K | Deferred to Plan 03 | |
| §13 UI | Deferred to Plan 03 | |
| §14 Configuration | AlertingProperties + application.yml | Tasks 26, 40 |
| §15 Retention | Daily job | Task 37 |
| §16 Observability (metrics + audit) | | Tasks 2, 38 |
| §17 Security (tenant/env, RBAC, SSRF, HMAC, TLS, audit) | | Tasks 32-36, 28, Plan 01 |
| §18 Testing | Unit + IT + WireMock + full-lifecycle | Tasks 17, 19, 27-31, 41, 42 |
| §19 Rollout | Dormant-by-default; matching application.yml + docs | Task 40 |
| §20 #1 OIDC alignment | Deferred (follow-up) | |
| §20 #2 secret encryption | Reused Plan 01 SecretCipher | Task 29 |
| §20 #3 CH migration naming | alerting_projections.sql | Task 14 |
| §20 #6 env-delete cascade audit | PG IT | Task 1 |
| §20 #7 Mustache completion engine | Deferred (UI) | |

Placeholders: A handful of steps reference real record fields / method names with /* … */ markers where the exact name depends on what the existing codebase exposes (ExecutionStats metric accessors, AgentInfo.lastHeartbeat method name, wither-method signatures on AlertInstance). Each is accompanied by a gitnexus_context({name: ...}) hint for the implementer. These are not TBDs — they are direct instructions to resolve against the code at implementation time.

Type consistency check: AlertRule, AlertInstance, AlertNotification, AlertSilence field names in the Java records match the SQL column names (snake_case in SQL, camelCase in Java). WebhookBinding.id is used as alert_notifications.webhook_id — stable opaque reference. OutboundConnection.createdBy/updatedBy types match users.user_id TEXT (Plan 01 precedent). rulesReferencing signature matches Plan 01's stub List<UUID> rulesReferencing(UUID).
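The camelCase-to-snake_case convention described in the type consistency check could also be asserted mechanically. The sketch below derives expected column names from a record's components via reflection. `ColumnNames` and the cut-down `SilenceSample` record (including its `environmentId` field) are hypothetical and only illustrate the idea.

```java
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.UUID;

/** Hypothetical helper that maps Java record component names to SQL column names. */
final class ColumnNames {
    private ColumnNames() {}

    /** Converts camelCase to snake_case, e.g. createdBy -> created_by. */
    static String toSnakeCase(String camel) {
        return camel.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
    }

    /** Lists the column names a record would map to, in declaration order. */
    static List<String> expectedColumns(Class<? extends Record> type) {
        return Arrays.stream(type.getRecordComponents())
                .map(RecordComponent::getName)
                .map(ColumnNames::toSnakeCase)
                .toList();
    }
}

/** Hypothetical, cut-down record; the real AlertSilence has different fields. */
record SilenceSample(UUID id, UUID environmentId, String createdBy) {}
```

A repository test could compare `ColumnNames.expectedColumns(...)` for each alerting record against the columns declared in the V12 migration, so that a renamed field or column fails fast.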

Risks flagged to executor:

  1. Task 16 MustacheRenderer missing-variable fallback is non-trivial in JMustache's default compiler config — implementer may need a second iteration. Tests lock the contract; the implementation approach is flexible.
  2. Task 12/13 — the SQL dialect for attribute map access on the executions table (attributes[?]) depends on the actual column type in init.sql. If attributes is Map(String,String), the syntax works; if it's stored as JSON string, switch to JSONExtractString(attributes, ?) = ?.
  3. Task 27 enrichTitleMessage depends on AlertInstance having wither methods — these are added opportunistically during Task 26 when AlertStateTransitions needs them. Don't forget to expose them.
  4. Claim-polling semantics under schema-per-tenant — the ?currentSchema=tenant_{id} JDBC URL routes writes correctly, but the FOR UPDATE SKIP LOCKED behaviour is per-schema so cross-tenant locks are irrelevant (correct behaviour). Make sure IT tests run with cameleer.server.tenant.id=default.
  5. Task 41 full-lifecycle test is the canary. If it fails, work from the failing assertion; the bug is almost always in the state transitions or the renderer context shape.
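Risks 3 and 5 both revolve around state transitions on an immutable `AlertInstance`. A minimal sketch of wither methods plus the transitions exercised by the full-lifecycle test follows. All type names carry a `Sketch` suffix because they are hypothetical; the real record has more fields, and the choices to reject an ack on a non-FIRING instance and to treat resolve as idempotent are assumptions.

```java
import java.util.UUID;

/** Lifecycle states named in the plan. */
enum AlertStateSketch { FIRING, ACKNOWLEDGED, RESOLVED }

/** Hypothetical, cut-down stand-in for AlertInstance with wither methods. */
record AlertInstanceSketch(UUID id, AlertStateSketch state, boolean silenced) {

    /** Returns a copy with a different state; the original stays unchanged. */
    AlertInstanceSketch withState(AlertStateSketch newState) {
        return new AlertInstanceSketch(id, newState, silenced);
    }

    /** Returns a copy with a different silenced flag. */
    AlertInstanceSketch withSilenced(boolean newSilenced) {
        return new AlertInstanceSketch(id, state, newSilenced);
    }
}

/** Hypothetical transition helpers in the spirit of AlertStateTransitions. */
final class AlertTransitionsSketch {
    private AlertTransitionsSketch() {}

    /** FIRING -> ACKNOWLEDGED; any other source state is rejected (assumption). */
    static AlertInstanceSketch acknowledge(AlertInstanceSketch alert) {
        if (alert.state() != AlertStateSketch.FIRING) {
            throw new IllegalStateException("cannot ack from " + alert.state());
        }
        return alert.withState(AlertStateSketch.ACKNOWLEDGED);
    }

    /** FIRING or ACKNOWLEDGED -> RESOLVED; resolving twice is a no-op (assumption). */
    static AlertInstanceSketch resolve(AlertInstanceSketch alert) {
        if (alert.state() == AlertStateSketch.RESOLVED) {
            return alert;
        }
        return alert.withState(AlertStateSketch.RESOLVED);
    }
}
```

Keeping the transitions as pure functions over an immutable record makes each assertion in the lifecycle test traceable to a single method, which is useful when the canary test fails.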