feat(alerting): AGENT_LIFECYCLE condition kind with per-subject fire mode

Allows alert rules to fire on agent-lifecycle events (REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED) rather than only on current state. Each matching `(agent, eventType, timestamp)` becomes its own ackable AlertInstance, so outages on distinct agents are independently routable.

Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record (scope, eventTypes, withinSeconds). The compact ctor rejects empty eventTypes and withinSeconds < 1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching the server-emitted types in `AgentRegistrationController` and `AgentLifecycleMonitor`). Custom agent-emitted event types are tracked in backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)`: new read path ordered `(timestamp ASC, insert_id ASC)`, used by the evaluator. Implemented on `ClickHouseAgentEventRepository`; the tenant filter is always applied, and the env filter is applied whenever `environment` is non-null, which the evaluator guarantees.

App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds` window and returns `EvalResult.Batch` with one `Firing` per row. Every Firing carries a canonical `_subjectFingerprint` of `"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event` subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson deserialization time, matching the existing policy of keeping controller validators focused on env-scoped / SQL-injection concerns.

Schema:
- V16 migration generalises the V15 per-exchange discriminator on `alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a fallback to the legacy `exchange.id` expression. Scalar kinds still resolve to `''` and keep one-open-per-rule. The duplicate-key path in `PostgresAlertInstanceRepository.save` is unchanged: the index is the deduper.

UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for the six allowed event types + `withinSeconds` input. Wired into `ConditionStep`, `form-state` (validation + defaults: WENT_DEAD, 300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind, and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.

Tests (all passing):
- 4 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive + empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (kind, empty window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.

Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
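
The per-subject fingerprint and its dedup effect across evaluation ticks can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: `FingerprintSketch`, `Event`, and the in-memory set are hypothetical stand-ins for `AgentEventRecord` and the `alert_instances_open_rule_uq` index.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FingerprintSketch {
    // Hypothetical stand-in for AgentEventRecord (only the fields the fingerprint needs).
    record Event(String agentId, String eventType, Instant timestamp) {}

    // Same shape as the commit message describes: "<agentId>:<eventType>:<tsMillis>".
    static String fingerprint(Event e) {
        return e.agentId() + ":" + e.eventType() + ":" + e.timestamp().toEpochMilli();
    }

    // Simulates two overlapping ticks; the Set plays the role of the unique index.
    static int openAfterTwoTicks(List<Event> tick1, List<Event> tick2) {
        Set<String> open = new HashSet<>();
        for (Event e : tick1) open.add(fingerprint(e));
        for (Event e : tick2) open.add(fingerprint(e)); // re-seen events add nothing
        return open.size();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2026-04-19T09:59:30Z");
        Event a = new Event("agent-A", "WENT_DEAD", t);
        Event b = new Event("agent-B", "WENT_DEAD", t.plusSeconds(20));
        // Tick 2 sees agent-A's event again because the lookback windows overlap.
        System.out.println(openAfterTwoTicks(List.of(a), List.of(a, b))); // prints 2
    }
}
```

Because the timestamp is part of the key, a second WENT_DEAD on the same agent at a later time yields a new fingerprint and therefore a new instance.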
@@ -65,8 +65,8 @@ Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `Ale
 - `AgentEventsController` — GET `/api/v1/environments/{envSlug}/agents/events` (lifecycle events; cursor-paginated, returns `{ data, nextCursor, hasMore }`; order `(timestamp DESC, insert_id DESC)`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — `insert_id` is a stable UUID column used as a same-millisecond tiebreak).
 - `AgentMetricsController` — GET `/api/v1/environments/{envSlug}/agents/{agentId}/metrics` (JVM/Camel metrics). Rejects cross-env agents (404) as defence-in-depth.
 - `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` (env-scoped lookup). Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique).
-- `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
-- `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read`. VIEWER+ for all. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
+- `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. `AgentLifecycleCondition` is allowlist-only — the `AgentLifecycleEventType` enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) plus the record compact ctor (non-empty `eventTypes`, `withinSeconds ≥ 1`) do the validation; custom agent-emitted event types are tracked in backlog issue #145. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
+- `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`; optional multi-value `state` + `severity` query params push filtering into PostgreSQL via `listForInbox` with `state::text = ANY(?)` / `severity::text = ANY(?)`) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read`. VIEWER+ for all. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
 - `AlertSilenceController` — `/api/v1/environments/{envSlug}/alerts/silences`. GET list / POST create / DELETE `{id}`. 422 if `endsAt <= startsAt`. OPERATOR+ for mutations, VIEWER+ for list. Audit: `ALERT_SILENCE_CHANGE`.
 - `AlertNotificationController` — Dual-path (no class-level prefix). GET `/api/v1/environments/{envSlug}/alerts/{alertId}/notifications` (VIEWER+); POST `/api/v1/alerts/notifications/{id}/retry` (OPERATOR+, flat — notification IDs globally unique). Retry resets attempts to 0 and sets `nextAttemptAt = now`.

@@ -17,7 +17,7 @@ paths:
 - `CommandType` — enum for command types (config-update, deep-trace, replay, route-control, etc.)
 - `CommandStatus` — enum for command acknowledgement states
 - `CommandReply` — record: command execution result from agent
-- `AgentEventRecord`, `AgentEventRepository` — event persistence. `AgentEventRepository.queryPage(...)` is cursor-paginated (`AgentEventPage{data, nextCursor, hasMore}`); the legacy non-paginated `query(...)` path is gone.
+- `AgentEventRecord`, `AgentEventRepository` — event persistence. `AgentEventRepository.queryPage(...)` is cursor-paginated (`AgentEventPage{data, nextCursor, hasMore}`); the legacy non-paginated `query(...)` path is gone. `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` returns matching events ordered by `(timestamp ASC, insert_id ASC)` — consumed by `AgentLifecycleEvaluator`.
 - `AgentEventPage` — record: `(List<AgentEventRecord> data, String nextCursor, boolean hasMore)` returned by `AgentEventRepository.queryPage`
 - `AgentEventListener` — callback interface for agent events
 - `RouteStateRegistry` — tracks per-agent route states

@@ -43,7 +43,7 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
 - `AllAlertsPage.tsx` — env-wide list with state-chip filter.
 - `HistoryPage.tsx` — RESOLVED alerts.
 - `RulesListPage.tsx` — CRUD + enable/disable toggle + env-promotion dropdown (pure UI prefill, no new endpoint).
-- `RuleEditor/RuleEditorWizard.tsx` — 5-step wizard (Scope / Condition / Trigger / Notify / Review). `form-state.ts` is the single source of truth (`initialForm` / `toRequest` / `validateStep`). Six condition-form subcomponents under `RuleEditor/condition-forms/`.
+- `RuleEditor/RuleEditorWizard.tsx` — 5-step wizard (Scope / Condition / Trigger / Notify / Review). `form-state.ts` is the single source of truth (`initialForm` / `toRequest` / `validateStep`). Seven condition-form subcomponents under `RuleEditor/condition-forms/` — including `AgentLifecycleForm.tsx` (multi-select event-type chips for the six-entry `AgentLifecycleEventType` allowlist + lookback-window input).
 - `SilencesPage.tsx` — matcher-based create + end-early.
 - `AlertRow.tsx` shared list row; `alerts-page.module.css` shared styling.
 - **Components**:

@@ -71,6 +71,8 @@ PostgreSQL (Flyway): `cameleer-server-app/src/main/resources/db/migration/`
 - V12 — Alerting tables (alert_rules, alert_rule_targets, alert_instances, alert_notifications, alert_reads, alert_silences)
 - V13 — alert_instances open-rule unique index (alert_instances_open_rule_uq partial index on rule_id WHERE state IN PENDING/FIRING/ACKNOWLEDGED)
 - V14 — Repair EXCHANGE_MATCH alert_rules persisted with fireMode=null (sets fireMode=PER_EXCHANGE + perExchangeLingerSeconds=300); paired with stricter `ExchangeMatchCondition` ctor that now rejects null fireMode.
 - V15 — Discriminate open-instance uniqueness by `context->'exchange'->>'id'` so EXCHANGE_MATCH/PER_EXCHANGE emits one alert_instance per matching exchange; scalar kinds resolve to `''` and keep one-open-per-rule.
+- V16 — Generalise the V15 discriminator to prefer `context->>'_subjectFingerprint'` (falls back to the V15 `exchange.id` expression for legacy rows). Enables AGENT_LIFECYCLE to emit one alert_instance per `(agent, eventType, timestamp)` via a canonical fingerprint in the evaluator firing's context.

 ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)

@@ -98,7 +100,7 @@ When adding, removing, or renaming classes, controllers, endpoints, UI component
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence

-This project is indexed by GitNexus as **cameleer-server** (8527 symbols, 22174 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **cameleer-server** (8603 symbols, 22281 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.

@@ -0,0 +1,95 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.agent.AgentEventRecord;
+import com.cameleer.server.core.agent.AgentEventRepository;
+import com.cameleer.server.core.alerting.AgentLifecycleCondition;
+import com.cameleer.server.core.alerting.AgentLifecycleEventType;
+import com.cameleer.server.core.alerting.AlertRule;
+import com.cameleer.server.core.alerting.AlertScope;
+import com.cameleer.server.core.alerting.ConditionKind;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import org.springframework.stereotype.Component;
+
+import java.time.Instant;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Evaluator for {@link AgentLifecycleCondition}.
+ * <p>
+ * Each matching row in {@code agent_events} produces its own {@link EvalResult.Firing}
+ * in an {@link EvalResult.Batch}, so every {@code (agent, eventType, timestamp)}
+ * tuple gets its own {@code AlertInstance} — operationally distinct outages /
+ * restarts / shutdowns are independently ackable. Deduplication across ticks
+ * is enforced by {@code alert_instances_open_rule_uq} via the canonical
+ * {@code _subjectFingerprint} key in the instance context (see V16 migration).
+ */
+@Component
+public class AgentLifecycleEvaluator implements ConditionEvaluator<AgentLifecycleCondition> {
+
+    /** Hard cap on rows returned per tick — prevents a flood of stale events from overwhelming the job. */
+    private static final int MAX_EVENTS_PER_TICK = 500;
+
+    private final AgentEventRepository eventRepo;
+    private final EnvironmentRepository envRepo;
+
+    public AgentLifecycleEvaluator(AgentEventRepository eventRepo, EnvironmentRepository envRepo) {
+        this.eventRepo = eventRepo;
+        this.envRepo = envRepo;
+    }
+
+    @Override
+    public ConditionKind kind() { return ConditionKind.AGENT_LIFECYCLE; }
+
+    @Override
+    public EvalResult evaluate(AgentLifecycleCondition c, AlertRule rule, EvalContext ctx) {
+        String envSlug = envRepo.findById(rule.environmentId())
+                .map(e -> e.slug())
+                .orElse(null);
+        if (envSlug == null) return EvalResult.Clear.INSTANCE;
+
+        AlertScope scope = c.scope();
+        String appSlug = scope != null ? scope.appSlug() : null;
+        String agentId = scope != null ? scope.agentId() : null;
+
+        List<String> typeNames = c.eventTypes().stream()
+                .map(AgentLifecycleEventType::name)
+                .toList();
+
+        Instant from = ctx.now().minusSeconds(c.withinSeconds());
+        Instant to = ctx.now();
+
+        List<AgentEventRecord> matches = eventRepo.findInWindow(
+                envSlug, appSlug, agentId, typeNames, from, to, MAX_EVENTS_PER_TICK);
+
+        if (matches.isEmpty()) return new EvalResult.Batch(List.of());
+
+        List<EvalResult.Firing> firings = new ArrayList<>(matches.size());
+        for (AgentEventRecord ev : matches) {
+            firings.add(toFiring(ev));
+        }
+        return new EvalResult.Batch(firings);
+    }
+
+    private static EvalResult.Firing toFiring(AgentEventRecord ev) {
+        String fingerprint = (ev.instanceId() == null ? "" : ev.instanceId())
+                + ":" + (ev.eventType() == null ? "" : ev.eventType())
+                + ":" + (ev.timestamp() == null ? "0" : Long.toString(ev.timestamp().toEpochMilli()));
+
+        Map<String, Object> context = new LinkedHashMap<>();
+        context.put("agent", Map.of(
+                "id", ev.instanceId() == null ? "" : ev.instanceId(),
+                "app", ev.applicationId() == null ? "" : ev.applicationId()
+        ));
+        context.put("event", Map.of(
+                "type", ev.eventType() == null ? "" : ev.eventType(),
+                "timestamp", ev.timestamp() == null ? "" : ev.timestamp().toString(),
+                "detail", ev.detail() == null ? "" : ev.detail()
+        ));
+        context.put("_subjectFingerprint", fingerprint);
+
+        return new EvalResult.Firing(1.0, null, context);
+    }
+}
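
The lookback window used by the evaluator is half-open: `from` is inclusive and `to` is exclusive, matching the repository's `timestamp >= ? AND timestamp < ?` predicate. The sketch below isolates just that boundary behaviour; `LookbackWindowSketch` and `inWindow` are hypothetical names for illustration, not part of the commit.

```java
import java.time.Instant;

class LookbackWindowSketch {
    // Half-open window [now - withinSeconds, now), mirroring the evaluator's from/to.
    static boolean inWindow(Instant eventTs, Instant now, int withinSeconds) {
        Instant from = now.minusSeconds(withinSeconds);
        return !eventTs.isBefore(from) && eventTs.isBefore(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2026-04-19T10:00:00Z");
        System.out.println(inWindow(now.minusSeconds(300), now, 300)); // true: lower bound is inclusive
        System.out.println(inWindow(now, now, 300));                   // false: upper bound is exclusive
        System.out.println(inWindow(now.minusSeconds(301), now, 300)); // false: older than the window
    }
}
```

An event stamped exactly at `now` is therefore not lost, provided the next tick runs within `withinSeconds`: it falls inside that tick's window instead. Events seen by several overlapping ticks are collapsed by the fingerprint-based unique index.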
@@ -64,6 +64,10 @@ public class NotificationContextBuilder {
                 ctx.put("agent", subtree(instance, "agent.id", "agent.name", "agent.state"));
                 ctx.put("app", subtree(instance, "app.slug", "app.id"));
             }
+            case AGENT_LIFECYCLE -> {
+                ctx.put("agent", subtree(instance, "agent.id", "agent.app"));
+                ctx.put("event", subtree(instance, "event.type", "event.timestamp", "event.detail"));
+            }
             case DEPLOYMENT_STATE -> {
                 ctx.put("deployment", subtree(instance, "deployment.id", "deployment.status"));
                 ctx.put("app", subtree(instance, "app.slug", "app.id"));

@@ -106,4 +106,57 @@ public class ClickHouseAgentEventRepository implements AgentEventRepository {

         return new AgentEventPage(results, nextCursor, hasMore);
     }
+
+    @Override
+    public List<AgentEventRecord> findInWindow(String environment,
+                                               String applicationId,
+                                               String instanceId,
+                                               List<String> eventTypes,
+                                               Instant fromInclusive,
+                                               Instant toExclusive,
+                                               int limit) {
+        if (eventTypes == null || eventTypes.isEmpty()) {
+            throw new IllegalArgumentException("eventTypes must not be empty");
+        }
+        if (fromInclusive == null || toExclusive == null) {
+            throw new IllegalArgumentException("from/to must not be null");
+        }
+
+        // `event_type IN (?, ?, …)` — one placeholder per type.
+        String placeholders = String.join(",", java.util.Collections.nCopies(eventTypes.size(), "?"));
+        var sql = new StringBuilder(SELECT_BASE);
+        var params = new ArrayList<Object>();
+        params.add(tenantId);
+
+        if (environment != null) {
+            sql.append(" AND environment = ?");
+            params.add(environment);
+        }
+        if (applicationId != null) {
+            sql.append(" AND application_id = ?");
+            params.add(applicationId);
+        }
+        if (instanceId != null) {
+            sql.append(" AND instance_id = ?");
+            params.add(instanceId);
+        }
+        sql.append(" AND event_type IN (").append(placeholders).append(")");
+        params.addAll(eventTypes);
+        sql.append(" AND timestamp >= ? AND timestamp < ?");
+        params.add(Timestamp.from(fromInclusive));
+        params.add(Timestamp.from(toExclusive));
+        sql.append(" ORDER BY timestamp ASC, insert_id ASC LIMIT ?");
+        params.add(limit);
+
+        return jdbc.query(sql.toString(),
+                (rs, rowNum) -> new AgentEventRecord(
+                        rs.getLong("id"),
+                        rs.getString("instance_id"),
+                        rs.getString("application_id"),
+                        rs.getString("event_type"),
+                        rs.getString("detail"),
+                        rs.getTimestamp("timestamp").toInstant()
+                ),
+                params.toArray());
+    }
 }

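
The placeholder technique in `findInWindow` above (one `?` per event type, with parameters appended in the same order as the SQL fragments) can be isolated as a small sketch. `InClauseSketch` and its method names are hypothetical, and the real `SELECT_BASE` is not shown in this diff, so the parameter list here is only an assumption about its shape.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class InClauseSketch {
    // Builds "<column> IN (?,?,...)" with one JDBC placeholder per value, so the
    // values travel as bound parameters instead of being inlined into the SQL text.
    static String inClause(String column, int valueCount) {
        if (valueCount < 1) throw new IllegalArgumentException("need at least one value");
        return column + " IN (" + String.join(",", Collections.nCopies(valueCount, "?")) + ")";
    }

    // Parameter order must follow placeholder order in the final SQL string.
    // Assumed layout for illustration: tenant first, then the IN values, then the LIMIT.
    static List<Object> params(String tenantId, List<String> eventTypes, int limit) {
        List<Object> p = new ArrayList<>();
        p.add(tenantId);
        p.addAll(eventTypes);
        p.add(limit);
        return p;
    }

    public static void main(String[] args) {
        List<String> types = List.of("WENT_DEAD", "DEREGISTERED");
        System.out.println(inClause("event_type", types.size())); // event_type IN (?,?)
        System.out.println(params("default", types, 500));        // [default, WENT_DEAD, DEREGISTERED, 500]
    }
}
```

Rejecting an empty list up front matters because `IN ()` is not valid SQL, which is why both the repository and the record constructor refuse empty `eventTypes`.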
@@ -0,0 +1,27 @@
+-- V16 — Generalise open-alert_instance uniqueness via `_subjectFingerprint`.
+--
+-- V15 discriminated open instances by `context->'exchange'->>'id'` so that
+-- EXCHANGE_MATCH / PER_EXCHANGE could emit one instance per exchange. The new
+-- AGENT_LIFECYCLE / PER_AGENT condition has the same shape but a different
+-- subject key (agentId + eventType + eventTs). Rather than bolt condition-kind
+-- knowledge into the index, we introduce a canonical `_subjectFingerprint`
+-- field in `context` that every "per-subject" evaluator writes. The index
+-- prefers it over the legacy exchange.id discriminator.
+--
+-- Precedence in the COALESCE:
+--   1. context->>'_subjectFingerprint' — explicit per-subject key (new)
+--   2. context->'exchange'->>'id'      — legacy EXCHANGE_MATCH instances (pre-V16)
+--   3. ''                              — scalar condition kinds (one open per rule)
+--
+-- Existing open PER_EXCHANGE instances keep working because they never set
+-- `_subjectFingerprint` but do carry `context.exchange.id`, so the index
+-- still discriminates them correctly.
+DROP INDEX IF EXISTS alert_instances_open_rule_uq;
+
+CREATE UNIQUE INDEX alert_instances_open_rule_uq
+    ON alert_instances (rule_id, (COALESCE(
+        context->>'_subjectFingerprint',
+        context->'exchange'->>'id',
+        '')))
+    WHERE rule_id IS NOT NULL
+      AND state IN ('PENDING','FIRING','ACKNOWLEDGED');
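
The COALESCE precedence documented in the migration can be mirrored in plain Java to make the three cases concrete. This is a sketch only: `DiscriminatorSketch` is a hypothetical helper, and the map stands in for the JSONB `context` column.

```java
import java.util.Map;

class DiscriminatorSketch {
    // Java rendering of:
    //   COALESCE(context->>'_subjectFingerprint', context->'exchange'->>'id', '')
    static String discriminator(Map<String, ?> context) {
        Object fingerprint = context.get("_subjectFingerprint");
        if (fingerprint != null) return fingerprint.toString();   // 1. explicit per-subject key
        Object exchange = context.get("exchange");
        if (exchange instanceof Map<?, ?> ex && ex.get("id") != null) {
            return ex.get("id").toString();                        // 2. legacy exchange.id
        }
        return "";                                                 // 3. scalar kinds
    }

    public static void main(String[] args) {
        System.out.println(discriminator(Map.of("_subjectFingerprint", "agent-A:WENT_DEAD:1000")));
        System.out.println(discriminator(Map.of("exchange", Map.of("id", "ex-1"))));
        System.out.println(discriminator(Map.of()).isEmpty()); // true: one open instance per rule
    }
}
```

Two open rows for the same rule collide on the unique index only when this value is equal, which is what lets scalar kinds keep one-open-per-rule while per-subject kinds fan out.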
@@ -0,0 +1,130 @@
+package com.cameleer.server.app.alerting.eval;
+
+import com.cameleer.server.core.agent.AgentEventRecord;
+import com.cameleer.server.core.agent.AgentEventRepository;
+import com.cameleer.server.core.alerting.*;
+import com.cameleer.server.core.runtime.Environment;
+import com.cameleer.server.core.runtime.EnvironmentRepository;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.time.Instant;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyInt;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+class AgentLifecycleEvaluatorTest {
+
+    private AgentEventRepository events;
+    private EnvironmentRepository envRepo;
+    private AgentLifecycleEvaluator eval;
+
+    private static final UUID ENV_ID = UUID.fromString("bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb");
+    private static final UUID RULE_ID = UUID.fromString("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa");
+    private static final String ENV_SLUG = "prod";
+    private static final Instant NOW = Instant.parse("2026-04-19T10:00:00Z");
+
+    @BeforeEach
+    void setUp() {
+        events = mock(AgentEventRepository.class);
+        envRepo = mock(EnvironmentRepository.class);
+        when(envRepo.findById(ENV_ID)).thenReturn(Optional.of(
+                new Environment(ENV_ID, ENV_SLUG, "Prod", true, true, Map.of(), 5, Instant.EPOCH)));
+        eval = new AgentLifecycleEvaluator(events, envRepo);
+    }
+
+    private AlertRule ruleWith(AlertCondition condition) {
+        return new AlertRule(RULE_ID, ENV_ID, "lifecycle test", null,
+                AlertSeverity.CRITICAL, true, condition.kind(), condition,
+                60, 0, 0, null, null, List.of(), List.of(),
+                null, null, null, Map.of(), null, null, null, null);
+    }
+
+    private EvalContext ctx() { return new EvalContext("default", NOW, new TickCache()); }
+
+    @Test
+    void kindIsAgentLifecycle() {
+        assertThat(eval.kind()).isEqualTo(ConditionKind.AGENT_LIFECYCLE);
+    }
+
+    @Test
+    void emptyWindowYieldsEmptyBatch() {
+        var condition = new AgentLifecycleCondition(
+                new AlertScope(null, null, null),
+                List.of(AgentLifecycleEventType.WENT_DEAD),
+                300);
+        when(events.findInWindow(eq(ENV_SLUG), any(), any(), any(), any(), any(), anyInt()))
+                .thenReturn(List.of());
+
+        EvalResult r = eval.evaluate(condition, ruleWith(condition), ctx());
+        assertThat(r).isInstanceOf(EvalResult.Batch.class);
+        assertThat(((EvalResult.Batch) r).firings()).isEmpty();
+    }
+
+    @Test
+    void emitsOneFiringPerEventWithFingerprint() {
+        Instant ts1 = NOW.minusSeconds(30);
+        Instant ts2 = NOW.minusSeconds(10);
+        when(events.findInWindow(eq(ENV_SLUG), any(), any(), any(), any(), any(), anyInt()))
+                .thenReturn(List.of(
+                        new AgentEventRecord(0, "agent-A", "orders", "WENT_DEAD", "A went dead", ts1),
+                        new AgentEventRecord(0, "agent-B", "orders", "WENT_DEAD", "B went dead", ts2)
+                ));
+
+        var condition = new AgentLifecycleCondition(
+                new AlertScope(null, null, null),
+                List.of(AgentLifecycleEventType.WENT_DEAD), 60);
+
+        EvalResult r = eval.evaluate(condition, ruleWith(condition), ctx());
+        var batch = (EvalResult.Batch) r;
+        assertThat(batch.firings()).hasSize(2);
+
+        var f0 = batch.firings().get(0);
+        assertThat(f0.context()).containsKey("_subjectFingerprint");
+        assertThat((String) f0.context().get("_subjectFingerprint"))
+                .isEqualTo("agent-A:WENT_DEAD:" + ts1.toEpochMilli());
+        @SuppressWarnings("unchecked")
+        Map<String, Object> agent0 = (Map<String, Object>) f0.context().get("agent");
+        assertThat(agent0).containsEntry("id", "agent-A").containsEntry("app", "orders");
+        @SuppressWarnings("unchecked")
+        Map<String, Object> event0 = (Map<String, Object>) f0.context().get("event");
+        assertThat(event0).containsEntry("type", "WENT_DEAD");
+
+        var f1 = batch.firings().get(1);
+        assertThat((String) f1.context().get("_subjectFingerprint"))
+                .isEqualTo("agent-B:WENT_DEAD:" + ts2.toEpochMilli());
+    }
+
+    @Test
+    void forwardsScopeFiltersToRepo() {
+        when(events.findInWindow(eq(ENV_SLUG), eq("orders"), eq("agent-A"), any(), any(), any(), anyInt()))
+                .thenReturn(List.of());
+        var condition = new AgentLifecycleCondition(
+                new AlertScope("orders", null, "agent-A"),
+                List.of(AgentLifecycleEventType.REGISTERED), 120);
+        eval.evaluate(condition, ruleWith(condition), ctx());
+        // Explicit verify: an unmatched call on a plain mock would also return an empty list,
+        // so the stub alone cannot detect a scope mismatch.
+        org.mockito.Mockito.verify(events).findInWindow(
+                eq(ENV_SLUG), eq("orders"), eq("agent-A"), any(), any(), any(), anyInt());
+    }
+
+    @Test
+    void clearsWhenEnvIsMissing() {
+        // envRepo returns empty → should Clear, not throw.
+        EnvironmentRepository emptyEnvRepo = mock(EnvironmentRepository.class);
+        when(emptyEnvRepo.findById(ENV_ID)).thenReturn(Optional.empty());
+        AgentLifecycleEvaluator localEval = new AgentLifecycleEvaluator(events, emptyEnvRepo);
+
+        var condition = new AgentLifecycleCondition(
+                new AlertScope(null, null, null),
+                List.of(AgentLifecycleEventType.WENT_DEAD), 60);
+        EvalResult r = localEval.evaluate(condition, ruleWith(condition), ctx());
+        assertThat(r).isEqualTo(EvalResult.Clear.INSTANCE);
+    }
+}
@@ -43,6 +43,10 @@ class NotificationContextBuilderTest {
             case AGENT_STATE -> new AgentStateCondition(
                     new AlertScope(null, null, null),
                     "DEAD", 0);
+            case AGENT_LIFECYCLE -> new AgentLifecycleCondition(
+                    new AlertScope(null, null, null),
+                    List.of(AgentLifecycleEventType.WENT_DEAD),
+                    60);
             case DEPLOYMENT_STATE -> new DeploymentStateCondition(
                     new AlertScope("my-app", null, null),
                     List.of("FAILED"));

@@ -1,6 +1,7 @@
 package com.cameleer.server.core.agent;

 import java.time.Instant;
+import java.util.List;

 public interface AgentEventRepository {

@@ -13,4 +14,19 @@ public interface AgentEventRepository {
      */
     AgentEventPage queryPage(String applicationId, String instanceId, String environment,
                              Instant from, Instant to, String cursor, int limit);
+
+    /**
+     * Inclusive-exclusive window query ordered by (timestamp ASC, insert_id ASC)
+     * used by the AGENT_LIFECYCLE alert evaluator. {@code eventTypes} is required
+     * and must be non-empty; the implementation filters via {@code event_type IN (...)}.
+     * Scope filters ({@code applicationId}, {@code instanceId}) are optional. The
+     * returned list is capped at {@code limit} rows.
+     */
+    List<AgentEventRecord> findInWindow(String environment,
+                                        String applicationId,
+                                        String instanceId,
+                                        List<String> eventTypes,
+                                        Instant fromInclusive,
+                                        Instant toExclusive,
+                                        int limit);
 }

@@ -0,0 +1,34 @@
+package com.cameleer.server.core.alerting;
+
+import com.fasterxml.jackson.annotation.JsonProperty;
+
+import java.util.List;
+
+/**
+ * Fires one {@code AlertInstance} per matching {@code agent_events} row in the
+ * lookback window. Per-subject fire mode (see
+ * {@link AgentLifecycleEventType}) — each {@code (agent, eventType, timestamp)}
+ * tuple is independently ackable, driven by a canonical
+ * {@code _subjectFingerprint} in the instance context and the partial unique
+ * index on {@code alert_instances}.
+ */
+public record AgentLifecycleCondition(
+        AlertScope scope,
+        List<AgentLifecycleEventType> eventTypes,
+        int withinSeconds
+) implements AlertCondition {
+
+    public AgentLifecycleCondition {
+        if (eventTypes == null || eventTypes.isEmpty()) {
+            throw new IllegalArgumentException("eventTypes must not be empty");
+        }
+        if (withinSeconds < 1) {
+            throw new IllegalArgumentException("withinSeconds must be >= 1");
+        }
+        eventTypes = List.copyOf(eventTypes);
+    }
+
+    @Override
+    @JsonProperty(value = "kind", access = JsonProperty.Access.READ_ONLY)
+    public ConditionKind kind() { return ConditionKind.AGENT_LIFECYCLE; }
+}
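
The compact-constructor validation and the defensive `List.copyOf` in the record above can be exercised in isolation. The sketch below uses a reduced, hypothetical copy of the record (`Condition`, with `scope` omitted and a trimmed `EventType` enum) so that it compiles on its own; it is not the project's class.

```java
import java.util.ArrayList;
import java.util.List;

class CompactCtorSketch {
    enum EventType { WENT_DEAD, RECOVERED } // trimmed stand-in for AgentLifecycleEventType

    // Hypothetical reduced copy of AgentLifecycleCondition (scope omitted).
    record Condition(List<EventType> eventTypes, int withinSeconds) {
        Condition {
            if (eventTypes == null || eventTypes.isEmpty()) {
                throw new IllegalArgumentException("eventTypes must not be empty");
            }
            if (withinSeconds < 1) {
                throw new IllegalArgumentException("withinSeconds must be >= 1");
            }
            eventTypes = List.copyOf(eventTypes); // unmodifiable snapshot of the caller's list
        }
    }

    // True when the compact constructor refuses the given arguments.
    static boolean rejects(List<EventType> types, int withinSeconds) {
        try {
            new Condition(types, withinSeconds);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<EventType> source = new ArrayList<>(List.of(EventType.WENT_DEAD));
        Condition c = new Condition(source, 300);
        source.add(EventType.RECOVERED);             // later mutation of the caller's list
        System.out.println(c.eventTypes().size());   // 1: the record kept its own copy
        System.out.println(rejects(List.of(), 60));  // true
        System.out.println(rejects(source, 0));      // true
    }
}
```

Because the checks live in the canonical constructor, they run for every construction path, including Jackson deserialization, which is what lets the controller skip a dedicated validator for this kind.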
@@ -0,0 +1,20 @@
+package com.cameleer.server.core.alerting;
+
+/**
+ * Allowlist of agent-lifecycle event types that may appear in an
+ * {@link AgentLifecycleCondition}. The set matches exactly the events the
+ * server writes to {@code agent_events} — registration-controller emits
+ * REGISTERED / RE_REGISTERED / DEREGISTERED, the lifecycle monitor emits
+ * WENT_STALE / WENT_DEAD / RECOVERED.
+ * <p>
+ * Custom agent-emitted event types (via {@code POST /api/v1/data/events})
+ * are intentionally excluded — see backlog issue #145.
+ */
+public enum AgentLifecycleEventType {
+    REGISTERED,
+    RE_REGISTERED,
+    DEREGISTERED,
+    WENT_STALE,
+    WENT_DEAD,
+    RECOVERED
+}
@@ -9,13 +9,15 @@ import com.fasterxml.jackson.annotation.JsonTypeInfo;
         @JsonSubTypes.Type(value = RouteMetricCondition.class, name = "ROUTE_METRIC"),
         @JsonSubTypes.Type(value = ExchangeMatchCondition.class, name = "EXCHANGE_MATCH"),
         @JsonSubTypes.Type(value = AgentStateCondition.class, name = "AGENT_STATE"),
+        @JsonSubTypes.Type(value = AgentLifecycleCondition.class, name = "AGENT_LIFECYCLE"),
         @JsonSubTypes.Type(value = DeploymentStateCondition.class, name = "DEPLOYMENT_STATE"),
         @JsonSubTypes.Type(value = LogPatternCondition.class, name = "LOG_PATTERN"),
         @JsonSubTypes.Type(value = JvmMetricCondition.class, name = "JVM_METRIC")
 })
 public sealed interface AlertCondition permits
         RouteMetricCondition, ExchangeMatchCondition, AgentStateCondition,
-        DeploymentStateCondition, LogPatternCondition, JvmMetricCondition {
+        AgentLifecycleCondition, DeploymentStateCondition, LogPatternCondition,
+        JvmMetricCondition {

     @JsonProperty("kind")
     ConditionKind kind();

@@ -1,3 +1,11 @@
 package com.cameleer.server.core.alerting;

-public enum ConditionKind { ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC }
+public enum ConditionKind {
+    ROUTE_METRIC,
+    EXCHANGE_MATCH,
+    AGENT_STATE,
+    AGENT_LIFECYCLE,
+    DEPLOYMENT_STATE,
+    LOG_PATTERN,
+    JVM_METRIC
+}

@@ -101,4 +101,50 @@ class AlertConditionJsonTest {
         AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
         assertThat(parsed).isInstanceOf(JvmMetricCondition.class);
     }
+
+    @Test
+    void roundtripAgentLifecycle() throws Exception {
+        var c = new AgentLifecycleCondition(
+                new AlertScope("orders", null, null),
+                List.of(AgentLifecycleEventType.WENT_DEAD, AgentLifecycleEventType.DEREGISTERED),
+                300);
+        AlertCondition parsed = om.readValue(om.writeValueAsString((AlertCondition) c), AlertCondition.class);
+        assertThat(parsed).isInstanceOf(AgentLifecycleCondition.class);
+        var alc = (AgentLifecycleCondition) parsed;
+        assertThat(alc.eventTypes()).containsExactly(
+                AgentLifecycleEventType.WENT_DEAD, AgentLifecycleEventType.DEREGISTERED);
+        assertThat(alc.withinSeconds()).isEqualTo(300);
+        assertThat(alc.kind()).isEqualTo(ConditionKind.AGENT_LIFECYCLE);
+    }
+
+    @Test
+    void agentLifecycleRejectsEmptyEventTypes() {
+        assertThatThrownBy(() -> new AgentLifecycleCondition(
+                new AlertScope(null, null, null), List.of(), 60))
+                .isInstanceOf(IllegalArgumentException.class)
+                .hasMessageContaining("eventTypes");
+    }
+
+    @Test
+    void agentLifecycleRejectsZeroWindow() {
+        assertThatThrownBy(() -> new AgentLifecycleCondition(
+                new AlertScope(null, null, null),
+                List.of(AgentLifecycleEventType.WENT_DEAD), 0))
+                .isInstanceOf(IllegalArgumentException.class)
+                .hasMessageContaining("withinSeconds");
+    }
+
+    @Test
+    void agentLifecycleRejectsUnknownEventTypeOnDeserialization() {
+        String json = """
+                {
+                  "kind": "AGENT_LIFECYCLE",
+                  "scope": {},
+                  "eventTypes": ["REGISTERED", "BOGUS_EVENT"],
+                  "withinSeconds": 60
+                }
+                """;
+        assertThatThrownBy(() -> om.readValue(json, AlertCondition.class))
+                .hasMessageContaining("BOGUS_EVENT");
+    }
 }

File diff suppressed because one or more lines are too long
32
ui/src/api/schema.d.ts
vendored
@@ -2221,6 +2221,16 @@ export interface components {
             /** Format: date-time */
             createdAt?: string;
         };
+        AgentLifecycleCondition: {
+            kind: "AgentLifecycleCondition";
+        } & (Omit<components["schemas"]["AlertCondition"], "kind"> & {
+            scope?: components["schemas"]["AlertScope"];
+            eventTypes?: ("REGISTERED" | "RE_REGISTERED" | "DEREGISTERED" | "WENT_STALE" | "WENT_DEAD" | "RECOVERED")[];
+            /** Format: int32 */
+            withinSeconds?: number;
+            /** @enum {string} */
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+        });
         AgentStateCondition: {
             kind: "AgentStateCondition";
         } & (Omit<components["schemas"]["AlertCondition"], "kind"> & {
@@ -2229,11 +2239,11 @@ export interface components {
             /** Format: int32 */
             forSeconds?: number;
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         AlertCondition: {
             /** @enum {string} */
-            kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         };
         AlertRuleRequest: {
             name?: string;
@@ -2241,8 +2251,8 @@ export interface components {
             /** @enum {string} */
             severity: "CRITICAL" | "WARNING" | "INFO";
             /** @enum {string} */
-            conditionKind: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
-            condition: components["schemas"]["AgentStateCondition"] | components["schemas"]["DeploymentStateCondition"] | components["schemas"]["ExchangeMatchCondition"] | components["schemas"]["JvmMetricCondition"] | components["schemas"]["LogPatternCondition"] | components["schemas"]["RouteMetricCondition"];
+            conditionKind: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            condition: components["schemas"]["AgentLifecycleCondition"] | components["schemas"]["AgentStateCondition"] | components["schemas"]["DeploymentStateCondition"] | components["schemas"]["ExchangeMatchCondition"] | components["schemas"]["JvmMetricCondition"] | components["schemas"]["LogPatternCondition"] | components["schemas"]["RouteMetricCondition"];
             /** Format: int32 */
             evaluationIntervalSeconds?: number;
             /** Format: int32 */
@@ -2274,7 +2284,7 @@ export interface components {
             scope?: components["schemas"]["AlertScope"];
             states?: string[];
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         ExchangeFilter: {
             status?: string;
@@ -2296,7 +2306,7 @@ export interface components {
             /** Format: int32 */
             perExchangeLingerSeconds?: number;
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         JvmMetricCondition: {
             kind: "JvmMetricCondition";
@@ -2312,7 +2322,7 @@ export interface components {
             /** Format: int32 */
             windowSeconds?: number;
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         LogPatternCondition: {
             kind: "LogPatternCondition";
@@ -2325,7 +2335,7 @@ export interface components {
             /** Format: int32 */
             windowSeconds?: number;
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         RouteMetricCondition: {
             kind: "RouteMetricCondition";
@@ -2340,7 +2350,7 @@ export interface components {
             /** Format: int32 */
             windowSeconds?: number;
             /** @enum {string} */
-            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            readonly kind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
         });
         WebhookBindingRequest: {
             /** Format: uuid */
@@ -2361,8 +2371,8 @@ export interface components {
             severity?: "CRITICAL" | "WARNING" | "INFO";
             enabled?: boolean;
             /** @enum {string} */
-            conditionKind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
-            condition?: components["schemas"]["AgentStateCondition"] | components["schemas"]["DeploymentStateCondition"] | components["schemas"]["ExchangeMatchCondition"] | components["schemas"]["JvmMetricCondition"] | components["schemas"]["LogPatternCondition"] | components["schemas"]["RouteMetricCondition"];
+            conditionKind?: "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE" | "AGENT_LIFECYCLE" | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";
+            condition?: components["schemas"]["AgentLifecycleCondition"] | components["schemas"]["AgentStateCondition"] | components["schemas"]["DeploymentStateCondition"] | components["schemas"]["ExchangeMatchCondition"] | components["schemas"]["JvmMetricCondition"] | components["schemas"]["LogPatternCondition"] | components["schemas"]["RouteMetricCondition"];
             /** Format: int32 */
             evaluationIntervalSeconds?: number;
             /** Format: int32 */
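The generated schema types carry a class-name literal such as `"AgentLifecycleCondition"` on `kind`, while the JSON in the server test uses the enum value `"AGENT_LIFECYCLE"`. A minimal sketch of narrowing on the enum value follows, assuming trimmed-down hypothetical types: `Condition`, `describeCondition`, and the `state` field are illustrative names and are not taken from the generated schema.

```typescript
// Hypothetical, simplified mirror of two condition shapes; not the generated types.
type AgentLifecycleEventType =
  | 'REGISTERED' | 'RE_REGISTERED' | 'DEREGISTERED'
  | 'WENT_STALE' | 'WENT_DEAD' | 'RECOVERED';

interface AgentLifecycleCondition {
  kind: 'AGENT_LIFECYCLE';
  eventTypes?: AgentLifecycleEventType[];
  withinSeconds?: number;
}

interface AgentStateCondition {
  kind: 'AGENT_STATE';
  state?: string;      // assumed field name, for illustration only
  forSeconds?: number;
}

type Condition = AgentLifecycleCondition | AgentStateCondition;

// Switching on the discriminant lets the compiler narrow each branch.
function describeCondition(c: Condition): string {
  switch (c.kind) {
    case 'AGENT_LIFECYCLE':
      return `${(c.eventTypes ?? []).join(', ')} within ${c.withinSeconds ?? 0}s`;
    case 'AGENT_STATE':
      return `${c.state ?? '?'} for ${c.forSeconds ?? 0}s`;
  }
}
```

Keying on the wire value keeps UI code independent of how the generator names the schema classes.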
@@ -42,6 +42,16 @@ export const ALERT_VARIABLES: AlertVariable[] = [
   { path: 'app.id', type: 'uuid', description: 'App UUID', sampleValue: '33333333-...',
     availableForKinds: ['ROUTE_METRIC', 'EXCHANGE_MATCH', 'AGENT_STATE', 'DEPLOYMENT_STATE', 'LOG_PATTERN', 'JVM_METRIC'], mayBeNull: true },

+  // AGENT_LIFECYCLE — agent + event subtree (distinct from AGENT_STATE's agent.* leaves)
+  { path: 'agent.app', type: 'string', description: 'Agent app slug', sampleValue: 'orders',
+    availableForKinds: ['AGENT_LIFECYCLE'] },
+  { path: 'event.type', type: 'string', description: 'Lifecycle event type', sampleValue: 'WENT_DEAD',
+    availableForKinds: ['AGENT_LIFECYCLE'] },
+  { path: 'event.timestamp', type: 'Instant', description: 'When the event happened', sampleValue: '2026-04-20T14:33:10Z',
+    availableForKinds: ['AGENT_LIFECYCLE'] },
+  { path: 'event.detail', type: 'string', description: 'Free-text event detail', sampleValue: 'orders-0 STALE -> DEAD',
+    availableForKinds: ['AGENT_LIFECYCLE'], mayBeNull: true },
+
   // ROUTE_METRIC + EXCHANGE_MATCH share route.*
   { path: 'route.id', type: 'string', description: 'Route ID', sampleValue: 'route-1',
     availableForKinds: ['ROUTE_METRIC', 'EXCHANGE_MATCH'] },
@@ -56,7 +66,7 @@ export const ALERT_VARIABLES: AlertVariable[] = [

   // AGENT_STATE + JVM_METRIC share agent.id/name; AGENT_STATE adds agent.state
   { path: 'agent.id', type: 'string', description: 'Agent instance ID', sampleValue: 'prod-orders-0',
-    availableForKinds: ['AGENT_STATE', 'JVM_METRIC'] },
+    availableForKinds: ['AGENT_STATE', 'AGENT_LIFECYCLE', 'JVM_METRIC'] },
   { path: 'agent.name', type: 'string', description: 'Agent display name', sampleValue: 'orders-0',
     availableForKinds: ['AGENT_STATE', 'JVM_METRIC'] },
   { path: 'agent.state', type: 'string', description: 'Agent state', sampleValue: 'DEAD',
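Each variable's `availableForKinds` list decides which template paths a rule of a given kind can use. A small sketch of that lookup is shown here, assuming a reduced `AlertVariable` shape and a hypothetical helper `variablePathsFor`. The sample entries copy only the agent and event rows from the hunks above; the real registry has more fields and more rows.

```typescript
type ConditionKind =
  | 'ROUTE_METRIC' | 'EXCHANGE_MATCH' | 'AGENT_STATE' | 'AGENT_LIFECYCLE'
  | 'DEPLOYMENT_STATE' | 'LOG_PATTERN' | 'JVM_METRIC';

// Reduced shape for illustration; the real AlertVariable also has type, description, sampleValue.
interface AlertVariable {
  path: string;
  availableForKinds: ConditionKind[];
  mayBeNull?: boolean;
}

const SAMPLE_VARIABLES: AlertVariable[] = [
  { path: 'agent.app', availableForKinds: ['AGENT_LIFECYCLE'] },
  { path: 'event.type', availableForKinds: ['AGENT_LIFECYCLE'] },
  { path: 'event.timestamp', availableForKinds: ['AGENT_LIFECYCLE'] },
  { path: 'event.detail', availableForKinds: ['AGENT_LIFECYCLE'], mayBeNull: true },
  { path: 'agent.id', availableForKinds: ['AGENT_STATE', 'AGENT_LIFECYCLE', 'JVM_METRIC'] },
  { path: 'agent.name', availableForKinds: ['AGENT_STATE', 'JVM_METRIC'] },
];

// Hypothetical helper: list the template paths usable by a rule of the given kind.
function variablePathsFor(kind: ConditionKind): string[] {
  return SAMPLE_VARIABLES
    .filter((v) => v.availableForKinds.indexOf(kind) !== -1)
    .map((v) => v.path);
}
```

With these sample rows, an `AGENT_LIFECYCLE` rule sees `agent.id` but not `agent.name`, which matches the hunk that adds the kind to `agent.id` only.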
@@ -3,6 +3,7 @@ import type { FormState } from './form-state';
 import { RouteMetricForm } from './condition-forms/RouteMetricForm';
 import { ExchangeMatchForm } from './condition-forms/ExchangeMatchForm';
 import { AgentStateForm } from './condition-forms/AgentStateForm';
+import { AgentLifecycleForm } from './condition-forms/AgentLifecycleForm';
 import { DeploymentStateForm } from './condition-forms/DeploymentStateForm';
 import { LogPatternForm } from './condition-forms/LogPatternForm';
 import { JvmMetricForm } from './condition-forms/JvmMetricForm';
@@ -23,6 +24,13 @@ export function ConditionStep({ form, setForm }: { form: FormState; setForm: (f:
       base.perExchangeLingerSeconds = 300;
       base.filter = {};
     }
+    if (kind === 'AGENT_LIFECYCLE') {
+      // Sensible defaults so a rule can be saved without touching every sub-field.
+      // WENT_DEAD is the most "alert-worthy" event out of the six; a 5-minute
+      // window matches the registry's STALE→DEAD cadence + slack for tick jitter.
+      base.eventTypes = ['WENT_DEAD'];
+      base.withinSeconds = 300;
+    }
     setForm({
       ...form,
       conditionKind: kind,
@@ -42,6 +50,7 @@ export function ConditionStep({ form, setForm }: { form: FormState; setForm: (f:
       {form.conditionKind === 'ROUTE_METRIC' && <RouteMetricForm form={form} setForm={setForm} />}
       {form.conditionKind === 'EXCHANGE_MATCH' && <ExchangeMatchForm form={form} setForm={setForm} />}
       {form.conditionKind === 'AGENT_STATE' && <AgentStateForm form={form} setForm={setForm} />}
+      {form.conditionKind === 'AGENT_LIFECYCLE' && <AgentLifecycleForm form={form} setForm={setForm} />}
       {form.conditionKind === 'DEPLOYMENT_STATE' && <DeploymentStateForm form={form} setForm={setForm} />}
       {form.conditionKind === 'LOG_PATTERN' && <LogPatternForm form={form} setForm={setForm} />}
       {form.conditionKind === 'JVM_METRIC' && <JvmMetricForm form={form} setForm={setForm} />}
@@ -0,0 +1,72 @@
+import { FormField, Input } from '@cameleer/design-system';
+import type { FormState } from '../form-state';
+import {
+  AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS,
+  type AgentLifecycleEventType,
+} from '../../enums';
+
+/**
+ * Form for `AGENT_LIFECYCLE` conditions. Users pick one or more event types
+ * (allowlist only) and a lookback window in seconds. The evaluator queries
+ * `agent_events` with those filters; each matching row produces its own
+ * {@code AlertInstance}.
+ */
+export function AgentLifecycleForm({ form, setForm }: { form: FormState; setForm: (f: FormState) => void }) {
+  const c = form.condition as Record<string, unknown>;
+  const selected = new Set<AgentLifecycleEventType>(
+    Array.isArray(c.eventTypes) ? (c.eventTypes as AgentLifecycleEventType[]) : [],
+  );
+
+  const patch = (p: Record<string, unknown>) =>
+    setForm({
+      ...form,
+      condition: { ...(form.condition as Record<string, unknown>), ...p } as FormState['condition'],
+    });
+
+  const toggle = (t: AgentLifecycleEventType) => {
+    const next = new Set(selected);
+    if (next.has(t)) next.delete(t); else next.add(t);
+    patch({ eventTypes: [...next] });
+  };
+
+  return (
+    <>
+      <FormField
+        label="Event types"
+        hint="Fires one alert per matching event. Pick at least one."
+      >
+        <div style={{ display: 'flex', flexWrap: 'wrap', gap: 6 }}>
+          {AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS.map((opt) => {
+            const active = selected.has(opt.value);
+            return (
+              <button
+                key={opt.value}
+                type="button"
+                onClick={() => toggle(opt.value)}
+                style={{
+                  border: `1px solid ${active ? 'var(--amber)' : 'var(--border-subtle)'}`,
+                  background: active ? 'var(--amber-bg)' : 'transparent',
+                  color: active ? 'var(--text-primary)' : 'var(--text-secondary)',
+                  borderRadius: 999,
+                  padding: '4px 10px',
+                  fontSize: 12,
+                  cursor: 'pointer',
+                }}
+              >
+                {opt.label}
+              </button>
+            );
+          })}
+        </div>
+      </FormField>
+      <FormField label="Lookback window (seconds)" hint="How far back to search for matching events each tick.">
+        <Input
+          type="number"
+          min={1}
+          value={(c.withinSeconds as number | undefined) ?? 300}
+          onChange={(e) => patch({ withinSeconds: Number(e.target.value) })}
+        />
+      </FormField>
+    </>
+  );
+}
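The `toggle` handler in the form adds an event type when it is absent and removes it when present. The same behaviour can be sketched as a pure function that is easy to unit-test outside React. `toggleEventType` is a hypothetical name, and this version uses plain arrays instead of the component's `Set`; the two should agree as long as the input has no duplicates.

```typescript
type AgentLifecycleEventType =
  | 'REGISTERED' | 'RE_REGISTERED' | 'DEREGISTERED'
  | 'WENT_STALE' | 'WENT_DEAD' | 'RECOVERED';

// Returns a new array: `t` is removed if already selected, otherwise appended.
// The input array is never mutated, so it is safe to pass state directly.
function toggleEventType(
  selected: readonly AgentLifecycleEventType[],
  t: AgentLifecycleEventType,
): AgentLifecycleEventType[] {
  return selected.indexOf(t) !== -1
    ? selected.filter((x) => x !== t)
    : [...selected, t];
}

// Usage: start from the wizard default, add one type, then remove the default.
const afterAdd = toggleEventType(['WENT_DEAD'], 'WENT_STALE');
const afterRemove = toggleEventType(afterAdd, 'WENT_DEAD');
```

Note that removing the last selected type yields an empty array, which the wizard's step validation is expected to reject.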
@@ -160,6 +160,13 @@ export function validateStep(step: WizardStep, f: FormState): string[] {
         if (c.windowSeconds == null) errs.push('Window (seconds) is required for COUNT_IN_WINDOW.');
       }
     }
+    if (f.conditionKind === 'AGENT_LIFECYCLE') {
+      const c = f.condition as Record<string, unknown>;
+      const types = Array.isArray(c.eventTypes) ? (c.eventTypes as string[]) : [];
+      if (types.length === 0) errs.push('Pick at least one event type.');
+      const within = c.withinSeconds as number | undefined;
+      if (within == null || within < 1) errs.push('Lookback window must be \u2265 1 second.');
+    }
   }
   if (step === 'trigger') {
     if (f.evaluationIntervalSeconds < 5) errs.push('Evaluation interval must be \u2265 5 s.');
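The `within < 1` comparison above is false for `NaN` and passes fractional values such as `1.5`, while the schema declares `withinSeconds` as `int32`. A stricter variant might look like the sketch below. `lifecycleErrors` is a hypothetical standalone helper, not part of the wizard, and the second error message is reworded for illustration.

```typescript
// Hypothetical stricter check: requires a finite whole number >= 1.
function lifecycleErrors(c: { eventTypes?: unknown; withinSeconds?: unknown }): string[] {
  const errs: string[] = [];

  const types = Array.isArray(c.eventTypes) ? c.eventTypes : [];
  if (types.length === 0) errs.push('Pick at least one event type.');

  const within = c.withinSeconds;
  const isWholeSeconds =
    typeof within === 'number' &&
    isFinite(within) &&               // rejects NaN and +/-Infinity
    Math.floor(within) === within &&  // rejects 1.5
    within >= 1;
  if (!isWholeSeconds) errs.push('Lookback window must be a whole number \u2265 1 second.');

  return errs;
}
```

Whether the extra strictness is worth it depends on how the number input is wired; the server-side record constructor still rejects `withinSeconds < 1` regardless.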
@@ -7,6 +7,7 @@ import {
   JVM_AGGREGATION_OPTIONS,
   EXCHANGE_FIRE_MODE_OPTIONS,
   TARGET_KIND_OPTIONS,
+  AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS,
 } from './enums';

 /**
@@ -25,12 +26,24 @@ describe('alerts/enums option arrays', () => {
       { value: 'ROUTE_METRIC', label: 'Route metric (error rate, latency, throughput)' },
       { value: 'EXCHANGE_MATCH', label: 'Exchange match (specific failures)' },
       { value: 'AGENT_STATE', label: 'Agent state (DEAD / STALE)' },
+      { value: 'AGENT_LIFECYCLE', label: 'Agent lifecycle (register / restart / stale / dead)' },
       { value: 'DEPLOYMENT_STATE', label: 'Deployment state (FAILED / DEGRADED)' },
       { value: 'LOG_PATTERN', label: 'Log pattern (count of matching logs)' },
       { value: 'JVM_METRIC', label: 'JVM metric (heap, GC, inflight)' },
     ]);
   });

+  it('AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS', () => {
+    expect(AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS).toEqual([
+      { value: 'WENT_STALE', label: 'Went stale (heartbeat missed)' },
+      { value: 'WENT_DEAD', label: 'Went dead (extended silence)' },
+      { value: 'RECOVERED', label: 'Recovered (stale → live)' },
+      { value: 'REGISTERED', label: 'Registered (first check-in)' },
+      { value: 'RE_REGISTERED', label: 'Re-registered (app restart)' },
+      { value: 'DEREGISTERED', label: 'Deregistered (graceful shutdown)' },
+    ]);
+  });
+
   it('SEVERITY_OPTIONS', () => {
     expect(SEVERITY_OPTIONS).toEqual([
       { value: 'CRITICAL', label: 'Critical' },
@@ -44,6 +44,13 @@ export type RouteMetric        = 'ERROR_RATE' | 'AVG_DURATION_MS' | 'P99_LATENCY_M
 export type Comparator = 'GT' | 'GTE' | 'LT' | 'LTE' | 'EQ';
 export type JvmAggregation = 'MAX' | 'MIN' | 'AVG' | 'LATEST';
 export type ExchangeFireMode = 'PER_EXCHANGE' | 'COUNT_IN_WINDOW';
+export type AgentLifecycleEventType =
+  | 'REGISTERED'
+  | 'RE_REGISTERED'
+  | 'DEREGISTERED'
+  | 'WENT_STALE'
+  | 'WENT_DEAD'
+  | 'RECOVERED';

 export interface Option<T extends string> { value: T; label: string }

@@ -73,6 +80,7 @@ const CONDITION_KIND_LABELS: Record<ConditionKind, string> = {
   ROUTE_METRIC: 'Route metric (error rate, latency, throughput)',
   EXCHANGE_MATCH: 'Exchange match (specific failures)',
   AGENT_STATE: 'Agent state (DEAD / STALE)',
+  AGENT_LIFECYCLE: 'Agent lifecycle (register / restart / stale / dead)',
   DEPLOYMENT_STATE: 'Deployment state (FAILED / DEGRADED)',
   LOG_PATTERN: 'Log pattern (count of matching logs)',
   JVM_METRIC: 'JVM metric (heap, GC, inflight)',
@@ -114,6 +122,15 @@ const EXCHANGE_FIRE_MODE_LABELS: Record<ExchangeFireMode, string> = {
   COUNT_IN_WINDOW: 'Threshold: N matches in window',
 };

+const AGENT_LIFECYCLE_EVENT_TYPE_LABELS: Record<AgentLifecycleEventType, string> = {
+  WENT_STALE: 'Went stale (heartbeat missed)',
+  WENT_DEAD: 'Went dead (extended silence)',
+  RECOVERED: 'Recovered (stale → live)',
+  REGISTERED: 'Registered (first check-in)',
+  RE_REGISTERED: 'Re-registered (app restart)',
+  DEREGISTERED: 'Deregistered (graceful shutdown)',
+};
+
 const TARGET_KIND_LABELS: Record<TargetKind, string> = {
   USER: 'User',
   GROUP: 'Group',
@@ -147,3 +164,5 @@ export const COMPARATOR_OPTIONS: Option<Comparator>[]              = toOptions
 export const JVM_AGGREGATION_OPTIONS: Option<JvmAggregation>[] = toOptions(JVM_AGGREGATION_LABELS, JVM_AGGREGATION_HIDDEN);
 export const EXCHANGE_FIRE_MODE_OPTIONS: Option<ExchangeFireMode>[] = toOptions(EXCHANGE_FIRE_MODE_LABELS);
 export const TARGET_KIND_OPTIONS: Option<TargetKind>[] = toOptions(TARGET_KIND_LABELS);
+export const AGENT_LIFECYCLE_EVENT_TYPE_OPTIONS: Option<AgentLifecycleEventType>[] =
+  toOptions(AGENT_LIFECYCLE_EVENT_TYPE_LABELS);
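The option arrays are derived from the label records, which would explain why the test expects `WENT_STALE` first even though the type union starts with `REGISTERED`: the order follows the label object's key order. The body of `toOptions` is not shown in this diff, so the version below is only a sketch of how such a helper could work; the `hidden` parameter's type is an assumption.

```typescript
interface Option<T extends string> { value: T; label: string }

// Assumed shape: builds options in the label record's key order and skips
// any values listed in `hidden`. String keys are enumerated in insertion order.
function toOptions<T extends string>(
  labels: Record<T, string>,
  hidden: readonly T[] = [],
): Option<T>[] {
  return (Object.keys(labels) as T[])
    .filter((value) => hidden.indexOf(value) === -1)
    .map((value) => ({ value, label: labels[value] }));
}

// Usage with two of the lifecycle labels from the diff.
const SAMPLE_LABELS = {
  WENT_STALE: 'Went stale (heartbeat missed)',
  WENT_DEAD: 'Went dead (extended silence)',
};
const sampleOptions = toOptions(SAMPLE_LABELS);
const withoutDead = toOptions(SAMPLE_LABELS, ['WENT_DEAD']);
```

Driving the options from a `Record<AgentLifecycleEventType, string>` also means the compiler flags a missing label whenever a new event type is added to the union.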