Alerting: support custom agent event types in AGENT_LIFECYCLE condition #145

Open
opened 2026-04-21 14:13:36 +02:00 by claude · 0 comments
Owner

Context

The AGENT_LIFECYCLE alert condition (introduced in the agent-lifecycle-conditions feature) ships with a strict allowlist of event types:

  • REGISTERED
  • RE_REGISTERED
  • DEREGISTERED
  • WENT_STALE
  • WENT_DEAD
  • RECOVERED

Agents can also post events with arbitrary event types via POST /api/v1/data/events (handled by EventIngestionController), and those events land in the same agent_events ClickHouse table. Today those custom event types cannot be selected in an alert rule: the allowlist rejects anything outside the six entries above.

Why we shipped strict first

  • A rule with a typo silently never fires. A freeform text field would accept eventType = "REGISTER" (missing ED), which looks valid in the UI but matches zero rows forever.
  • The six registry-lifecycle events are stable, well-known, and emitted by the server itself, so we can guarantee the allowlist is correct.

What's needed

  • A way to register/declare additional event types so the rule editor can offer them as first-class options. The scope is still open: per-tenant, per-environment, or global.
  • UI in the wizard: either a "custom event type" chip input gated on an admin-approved list, or a dropdown populated from a discovery query (e.g. SELECT DISTINCT event_type FROM agent_events WHERE tenant_id = ? AND timestamp > now() - INTERVAL 30 DAY).
  • Validation: still reject unknown types at save time, but "unknown" now means "not in allowlist ∪ registered custom set".
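
The validation rule in the last bullet can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the server's actual code: the function names, the shape of the registered set, and the example custom type DEPLOY_STARTED are all assumptions made up for this sketch. How and where the custom set is stored (per-tenant, per-environment, or global) is left open, as above.

```python
from typing import Iterable

# The six built-in lifecycle event types listed in this issue.
BUILTIN_EVENT_TYPES = frozenset({
    "REGISTERED", "RE_REGISTERED", "DEREGISTERED",
    "WENT_STALE", "WENT_DEAD", "RECOVERED",
})


def unknown_event_types(requested: Iterable[str],
                        registered_custom: Iterable[str]) -> list[str]:
    """Return the requested types that are neither built-in nor registered.

    `registered_custom` stands in for whatever registry of custom event
    types ends up being chosen (hypothetical; not an existing API).
    """
    allowed = BUILTIN_EVENT_TYPES | frozenset(registered_custom)
    return [t for t in requested if t not in allowed]


def validate_event_types(requested: Iterable[str],
                         registered_custom: Iterable[str] = ()) -> None:
    """Reject a rule at save time if any requested type is unknown."""
    unknown = unknown_event_types(requested, registered_custom)
    if unknown:
        raise ValueError(f"unknown event types: {', '.join(unknown)}")
```

With an empty custom set this behaves like today's strict allowlist, so a typo such as "REGISTER" is still rejected. Once a custom type like the hypothetical DEPLOY_STARTED is registered, a rule that mixes it with built-in types passes validation.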

Acceptance criteria

  • Users can opt into matching a custom event type in AgentLifecycleCondition.eventTypes without losing the typo safety net.
  • Existing rules (allowlist-only) keep working unchanged.
  • .claude/rules/ updated to document the mechanism.

Priority

Backlog — the six lifecycle events cover the core "agent outage / restart" use cases. Custom events are a "nice to have" for agents emitting domain-specific signals.
