Plan prose had spec §8 idealized leaves, but the backend NotificationContext
only emits a subset:
ROUTE_METRIC / EXCHANGE_MATCH → route.id + route.uri (uri added)
LOG_PATTERN → log.pattern + log.matchCount (renamed from log.logger/level/message)
app.slug / app.id → scoped to non-env kinds (removed from 'always')
exchange.link / alert.comparator / alert.window / app.displayName → removed (backend doesn't emit)
Without this alignment the Task 11 linter would (1) flag valid route.uri as
unknown, (2) suggest log.{logger,level,message} as valid paths that render
empty, and (3) flag app.slug on env-wide rules.
ALERT_VARIABLES mirrors the spec §8 context map. availableVariables(kind)
returns the kind-specific filter (always vars + kind vars). extractReferences
+ unknownReferences drive the inline amber linter. Backend NotificationContext
adds must land here too.
NotificationBell used a usePageVisible() subscription that re-rendered on
every visibilitychange without consuming the value. TanStack Query's
refetchIntervalInBackground:false already pauses polling; the extra
subscription was speculative generality. Dropped the import + call + JSDoc
reference; usePageVisible hook + test retained as a reusable primitive.
Also: alerts.test.tsx 'returns the server payload unmodified' asserted a
pre-plan {total, bySeverity} shape, but UnreadCountResponse is actually
{count}. Fixed mock + assertion to {count: 3}.
Adds a header bell component linking to /alerts/inbox with an unread-count
badge for the selected environment. Polling pauses when the tab is hidden
via TanStack Query's refetchIntervalInBackground:false (already set on
useUnreadCount); the new usePageVisible hook gives components a
re-renders-on-visibility-change signal for future defense-in-depth.
Plan-prose deviation: the plan assumed UnreadCountResponse carries a
bySeverity map for per-severity badge coloring, but the backend DTO only
exposes a scalar `count`. The bell reads `data?.count` and renders a single
var(--error) tint; a TODO references spec §13 for future per-severity work
that would require expanding the DTO.
Tests: usePageVisible toggles on visibilitychange events; NotificationBell
renders the bell with no badge at count=0 and shows "3" at count=3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
State colors follow the convention from @cameleer/design-system (CRITICAL->error,
WARNING->warning, INFO->auto). Silenced pill stacks next to state for the spec
section 8 audit-trail surface.
Adds env-scoped hooks for the alerts inbox:
- useAlerts (30s poll, background-paused, filter-aware)
- useAlert, useUnreadCount (30s poll)
- useAckAlert, useMarkAlertRead, useBulkReadAlerts (mutations that
invalidate the alerts query key tree + unread-count)
Test file uses .tsx because the QueryClientProvider wrapper relies on
JSX; vitest picks up both .ts and .tsx via the configured include glob.
Client mock targets the actual export name (`api` in ../client) rather
than the `apiClient` alias that alertMeta re-exports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fetched from http://192.168.50.86:30090/api/v1/api-docs via
`npm run generate-api:live`. Adds TypeScript types for the new alerting
REST surface merged in #140:
- 15 alerting paths under /environments/{envSlug}/alerts/** (rules CRUD,
enable/disable, render-preview, test-evaluate, inbox, unread-count,
ack/read/bulk-read, silences CRUD, per-alert notifications)
- 1 flat notification retry path /alerts/notifications/{id}/retry
- 4 outbound-connection admin paths (from Plan 01 #139)
Verified tsc -p tsconfig.app.json --noEmit exits 0 — no existing SPA
call sites break against the fresh types. Plan 03 UI work can consume
these directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Install @codemirror/{view,state,autocomplete,commands,language,lint}
and @lezer/common — needed by Phase 3's MustacheEditor (Task 13).
CM6 picked over a raw textarea for its small incremental-rendering
bundle, full ARIA/keyboard support, and pluggable autocomplete +
linter APIs that map cleanly to Mustache token parsing.
Add ui/playwright.config.ts wiring Task 30's E2E smoke:
- testDir ./src/test/e2e, single worker, trace+screenshot on failure
- webServer launches `npm run dev:local` (backend on :8081 required)
- PLAYWRIGHT_BASE_URL env var skips the dev server for CI against a
pre-deployed UI
Add test:e2e / test:e2e:ui npm scripts and exclude Playwright's
test-results/ and playwright-report/ from git. @playwright/test
itself was already in devDependencies from an earlier task.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prepares for Plan 03 unit tests (MustacheEditor, NotificationBell, wizard step
validation). jsdom environment + jest-dom matchers + canary test verifies the
wiring.
Fetched from http://192.168.50.86:30090/api/v1/api-docs via
`npm run generate-api:live`. Adds TypeScript types for the new alerting
REST surface merged in #140:
- 15 alerting paths under /environments/{envSlug}/alerts/** (rules CRUD,
enable/disable, render-preview, test-evaluate, inbox, unread-count,
ack/read/bulk-read, silences CRUD, per-alert notifications)
- 1 flat notification retry path /alerts/notifications/{id}/retry
- 4 outbound-connection admin paths (from Plan 01 #139)
Verified tsc -p tsconfig.app.json --noEmit exits 0 — no existing SPA
call sites break against the fresh types. Plan 03 UI work can consume
these directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #141. AbstractPostgresIT centrally declared three @MockBean
fields (clickHouseSearchIndex, clickHouseLogStore, agentRegistryService),
which meant EVERY IT ran against mocks instead of the real Spring context.
That masked the production crashloop — the real bean graph was never
exercised by CI.
- Remove the three @MockBean fields from AbstractPostgresIT.
- Move @MockBean declarations onto only the specific ITs that stub
method behavior (verified by grepping for when/verify calls).
- ITs that don't stub CH behavior now inject the real beans.
- Add SpringContextSmokeIT — @SpringBootTest with no mocks, void
contextLoads(). Fails fast on declared-type / autowire-type
mismatches like the one #141 fixed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production crashlooped on startup: ExchangeMatchEvaluator autowires the
concrete ClickHouseSearchIndex (for countExecutionsForAlerting, which
lives only on the concrete class, not the SearchIndex interface), but
StorageBeanConfig declared the bean with interface return type SearchIndex.
Spring matches autowire candidates by declared bean type, not by runtime
instance class, so the concrete-typed autowire failed with:
Parameter 0 of constructor in ExchangeMatchEvaluator required a bean
of type 'ClickHouseSearchIndex' that could not be found.
ClickHouseLogStore's bean is already declared with the concrete return
type (line 171), which is why LogPatternEvaluator autowires fine.
All alerting ITs passed pre-merge because AbstractPostgresIT replaces the
clickHouseSearchIndex bean with @MockBean(name=...) whose declared type
IS the concrete ClickHouseSearchIndex. The mock masked the prod bug.
Follow-up: remove @MockBean(name="clickHouseSearchIndex") from
AbstractPostgresIT so the real bean graph is exercised by alerting ITs
(and add a SpringContextSmokeIT that loads the context with no mocks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the three Flyway migrations added by the alerting feature branch
so future sessions have an accurate migration map.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rule creation now goes through POST /alerts/rules (exercises saveTargets on the
write path). Clock is replaced with @MockBean(name="alertingClock") and re-stubbed
in @BeforeEach to survive Mockito's inter-test reset. Six ordered steps:
1. seed log → tick evaluator → assert FIRING instance with non-empty targets (B-1)
2. tick dispatcher → assert DELIVERED notification + lastNotifiedAt stamped (B-2)
3. ack via REST → assert ACKNOWLEDGED state
4. create silence → inject PENDING notification → tick dispatcher → assert silenced (FAILED)
5. delete rule → assert rule_id nullified, rule_snapshot preserved (ON DELETE SET NULL)
6. new rule with reNotifyMinutes=1 → first dispatch → advance clock 61s →
evaluator sweep → second dispatch → verify 2 WireMock POSTs (B-2 cadence)
Background scheduler races addressed by resetting claimed_by/claimed_until before
each manual tick. Simulated clock set AFTER log insert to guarantee log timestamp
falls within the evaluator window. Re-notify notifications backdated in Postgres
to work around the simulated vs real clock gap in claimDueNotifications.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newInstance() now maps rule.targets() into targetUserIds/targetGroupIds/targetRoleNames
so newly created AlertInstance rows carry the correct target arrays.
Previously these were always empty List.of(), making the inbox query return nothing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
V13 migration creates alert_instances_open_rule_uq — a partial unique index on
(rule_id) WHERE state IN ('PENDING','FIRING','ACKNOWLEDGED'), preventing
duplicate open instances per rule. PostgresAlertInstanceRepository.save() catches
DuplicateKeyException and returns the existing open instance instead of failing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlertNotificationRepository gains resetForRetry(UUID, Instant) which sets
attempts=0, status=PENDING, next_attempt_at=now, and clears claim/response
fields. AlertNotificationController calls resetForRetry instead of
scheduleRetry so a manual retry always starts from a clean slate.
AlertNotificationControllerIT adds retryResetsAttemptsToZero to verify
attempts==0 and status==PENDING after three prior markFailed calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlertInstanceRepository gains listFiringDueForReNotify(Instant) — only returns
instances where last_notified_at IS NOT NULL and cadence has elapsed (IS NULL
branch excluded: sweep only re-notifies, initial notify is the dispatcher's job).
AlertEvaluatorJob.sweepReNotify() runs at the end of each tick, enqueues fresh
notifications for eligible instances and stamps last_notified_at.
NotificationDispatchJob stamps last_notified_at on the alert_instance when a
notification is DELIVERED, providing the anchor timestamp for cadence checks.
PostgresAlertInstanceRepositoryIT adds listFiringDueForReNotify test covering
the three-rule eligibility matrix (never-notified, long-ago, recent).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
saveTargets() is called unconditionally at the end of save() — it deletes
existing targets and re-inserts from the current targets list. findById()
and listByEnvironment() already call withTargets() so reads are consistent.
PostgresAlertRuleRepositoryIT adds saveTargets_roundtrip and
saveTargets_updateReplacesExistingTargets to cover the new write path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Investigated three approaches for CH 24.12:
- Inline SETTINGS on ADD PROJECTION: rejected (UNKNOWN_SETTING — not a query-level setting).
- ALTER TABLE MODIFY SETTING deduplicate_merge_projection_mode='rebuild': works; persists in
table metadata across connection restarts; runs before ADD PROJECTION in the SQL script.
- Session-level JDBC URL param: not pursued (MODIFY SETTING is strictly better).
alerting_projections.sql now runs MODIFY SETTING before the two executions ADD PROJECTIONs.
AlertingProjectionsIT strengthened to assert all four projections (including alerting_app_status
and alerting_route_status on executions) exist after schema init.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The evaluator mapped P95_LATENCY_MS to ExecutionStats.avgDurationMs because
stats_1m_route has no p95 column. Exposing the old name implied p95 semantics
operators did not get. Rename to AVG_DURATION_MS makes the contract honest.
Updated RouteMetric enum (with javadoc), evaluator switch, and admin guide.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AbstractPostgresIT gained clickHouseSearchIndex and agentRegistryService mocks in Phase 9.
All 14 alerting IT subclasses that re-declared the same @MockBean fields now fail with
"Duplicate mock definition". Removed the redundant declarations; per-class clickHouseLogStore
mock kept where needed. 120 alerting tests now pass (0 failures).
Also adds docs/alerting-02-verification.md (Task 43).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds alerting stanza to application.yml with all AlertingProperties
fields backed by env-var overrides. Creates docs/alerting.md covering
six condition kinds (with example JSON), template variables, webhook
setup (Slack/PagerDuty examples), silence patterns, circuit-breaker
and retention troubleshooting, and Prometheus metrics reference.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nightly @Scheduled(03:00) job deletes RESOLVED alert_instances older
than eventRetentionDays and DELIVERED/FAILED alert_notifications older
than notificationRetentionDays. Uses injected Clock for testability.
IT covers: old-resolved deleted, fresh-resolved kept, FIRING kept
regardless of age, PENDING notification never deleted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GET /environments/{envSlug}/alerts/{alertId}/notifications — list notifications for instance (VIEWER+)
- POST /alerts/notifications/{id}/retry — manual retry of failed notification (OPERATOR+)
Flat path because notification IDs are globally unique (no env routing needed)
- scheduleRetry resets attempts to 0 and sets nextAttemptAt = now
- Added 11 alerting path matchers to SecurityConfig before outbound-connections block
- Fixed context loading failure in 6 pre-existing alerting storage/migration ITs by adding
@MockBean(clickHouseSearchIndex/clickHouseLogStore): ExchangeMatchEvaluator and
LogPatternEvaluator inject the concrete classes directly (not interface beans), so the
full Spring context fails without these mocks in tests that don't use the real CH container
- 5 IT tests: list, viewer-can-list, retry, viewer-cannot-retry, unknown-404
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- POST/GET/DELETE /environments/{envSlug}/alerts/silences
- 422 when endsAt <= startsAt ("endsAt must be after startsAt")
- OPERATOR+ for create/delete, VIEWER+ for list
- Audit: ALERT_SILENCE_CREATE/DELETE with AuditCategory.ALERT_SILENCE_CHANGE
- 6 IT tests: create, viewer-list, viewer-cannot-create, bad time-range, delete, audit event
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /environments/{envSlug}/alerts — inbox filtered by userId/groupIds/roleNames via InAppInboxQuery
- GET /unread-count — memoized unread count (5s TTL)
- GET /{id}, POST /{id}/ack, POST /{id}/read, POST /bulk-read
- bulkRead filters instanceIds to env before delegating to AlertReadRepository
- VIEWER+ for all endpoints; env isolation enforced by requireInstance
- 7 IT tests: list, env isolation, unread-count, ack flow, read, bulk-read, viewer access
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- POST/GET/PUT/DELETE /environments/{envSlug}/alerts/rules CRUD
- POST /{id}/enable, /{id}/disable, /{id}/render-preview, /{id}/test-evaluate
- Attribute-key validation: rejects keys not matching ^[a-zA-Z0-9._-]+$ at rule-save time
(CRITICAL: ExchangeMatchCondition attribute keys are inlined into ClickHouse SQL)
- Webhook validation: verifies outboundConnectionId exists and is allowed in env
- Null-safe notification template defaults to "" for NOT NULL DB constraint
- Fixed misleading comment in ClickHouseSearchIndex to document validation contract
- OPERATOR+ for mutations, VIEWER+ for reads
- Audit: ALERT_RULE_CREATE/UPDATE/DELETE/ENABLE/DISABLE with AuditCategory.ALERT_RULE_CHANGE
- 11 IT tests covering RBAC, SQL-injection prevention, enable/disable, audit, render-preview
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
listInbox resolves user groups+roles via RbacService.getEffectiveGroupsForUser
/ getEffectiveRolesForUser then delegates to AlertInstanceRepository.
countUnread memoized per (envId, userId) with 5s TTL via ConcurrentHashMap
using a controllable Clock. 6 unit tests covering delegation, cache hit,
TTL expiry, and isolation between users/envs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders URL/headers/body with Mustache, optionally HMAC-signs the body
(X-Cameleer-Signature), supports POST/PUT/PATCH, classifies 2xx/4xx/5xx
into DELIVERED/FAILED/retry. 8 WireMock-backed IT tests including HTTPS
TRUST_ALL against WireMock self-signed cert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HmacSHA256 signer returning sha256=<lowercase-hex>. 5 unit tests covering
known vector, prefix, hex casing, and different secrets/bodies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add withRuleSnapshot(Map) wither to AlertInstance (same pattern as other withers)
- Call snapshotRule(rule) + withRuleSnapshot in both applyResult (single-firing) and
applyBatchFiring paths so every persisted instance carries a non-empty JSONB snapshot
- Strip null values from the Jackson-serialized map before wrapping in the immutable
snapshot so Map.copyOf in the compact ctor does not throw NPE on nullable rule fields
- Add ruleSnapshotIsPersistedOnInstanceCreation IT: asserts name/severity/conditionKind
appear in the rule_snapshot column after a tick fires an instance
- Add historySurvivesRuleDelete IT: fires an instance, deletes the rule, asserts
rule_id IS NULL and rule_snapshot still contains the rule name (spec §5 guarantee)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AlertEvaluatorJob implements SchedulingConfigurer; fixed-delay tick from
AlertingProperties.effectiveEvaluatorTickIntervalMs (5 s floor)
- Claim-polling via AlertRuleRepository.claimDueRules (FOR UPDATE SKIP LOCKED)
- Per-kind circuit breaker guards each evaluator; failures recorded, open kinds
skipped and rescheduled without evaluation
- Single-Firing path delegates to AlertStateTransitions; new FIRING instances
enqueue AlertNotification rows per rule.webhooks()
- Batch (PER_EXCHANGE) path creates one FIRING AlertInstance per Firing entry
- PENDING→FIRING promotion handled in applyResult via state machine
- Title/message rendered via MustacheRenderer + NotificationContextBuilder;
environment resolved from EnvironmentRepository.findById per tick
- AlertEvaluatorJobIT (4 tests): uses named @MockBean replacements for
ClickHouseSearchIndex + ClickHouseLogStore; @MockBean AgentRegistryService
drives Clear/Firing/resolve cycle without timing sensitivity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PER_EXCHANGE returns EvalResult.Batch(List<Firing>); last Firing carries
_nextCursor (Instant) in its context map for the job to persist as
evalState.lastExchangeTs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>