UI: AlertStateChip.LABELS and .COLORS no longer include ACKNOWLEDGED
(dropped in V17). AlertStateChip.test.tsx test-cases trimmed to the
three remaining states. LayoutShell CMD-K now searches FIRING alerts
with acked=false (was state=[FIRING,ACKNOWLEDGED]).
Test: V17MigrationIT.open_rule_index_predicate_is_reworked replaced
with a structural-only assertion (index exists, indisunique). The
pg_get_indexdef pretty-printer varies across Postgres versions, so
predicate semantics are verified behaviorally in
PostgresAlertInstanceRepositoryIT (findOpenForRule_* +
save_rejectsSecondOpenInstanceForSameRuleAndExchange).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- new AlertInstanceRepository.filterInEnvLive(ids, env): single-query bulk ID validation
- AlertController.inEnvLiveIds now one SQL round-trip instead of N
- bulkMarkRead SQL: defense-in-depth AND deleted_at IS NULL
- bulkAck SQL already had deleted_at IS NULL guard — no change needed
- PostgresAlertInstanceRepositoryIT: add filterInEnvLive_excludes_other_env_and_soft_deleted
- V12MigrationIT: remove alert_reads assertion (table dropped by V17)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /alerts gains tri-state acked + read query params
- new endpoints: DELETE /{id} (soft-delete), POST /bulk-delete, POST /bulk-ack, POST /{id}/restore
- requireLiveInstance 404s on soft-deleted rows; restore() reads the row regardless
- BulkReadRequest → BulkIdsRequest (shared body for bulk read/ack/delete)
- AlertDto gains readAt; deletedAt stays off the wire
- InAppInboxQuery.listInbox threads acked/read through to the repo (7-arg, no more null placeholders)
- SecurityConfig: new matchers for bulk-ack (VIEWER+), DELETE/bulk-delete/restore (OPERATOR+)
- AlertControllerIT: persistence assertions on /read + /bulk-read; full coverage for new endpoints
- InAppInboxQueryTest: updated to 7-arg listInbox signature
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The class under test was removed in da281933; the IT became a @Disabled
placeholder. Deleting per no-backwards-compat policy. Read mutation
coverage lives in PostgresAlertInstanceRepositoryIT going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove SET state='ACKNOWLEDGED' from ack() and the ACKNOWLEDGED predicate
from findOpenForRule — both would error after V17. The final ack() + open-rule
semantics (idempotent guards, deleted_at) are owned by Task 5; this is just
the minimum to stop runtime SQL errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AlertStateTransitionsTest: add null,null for readAt/deletedAt in openInstance helper;
replace firingWhenAcknowledgedIsNoOp with firing_with_ack_stays_firing_on_next_firing_tick;
convert ackedInstanceClearsToResolved to use FIRING+withAck; update section comment.
- PostgresAlertInstanceRepository: stub null,null for readAt/deletedAt in rowMapper
to unblock compilation (Task 4 will read the actual DB columns).
- All other alerting test files: add null,null for readAt/deletedAt to AlertInstance
ctor calls so the test source tree compiles; stub ACKNOWLEDGED JSON/state assertions
with FIRING + TODO Task 4 comments.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.
Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
(scope, eventTypes, withinSeconds). Compact ctor rejects empty
eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
the server-emitted types in `AgentRegistrationController` and
`AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
ASC)` used by the evaluator. Implemented on
`ClickHouseAgentEventRepository` with tenant + env filter mandatory.
App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
window and returns `EvalResult.Batch` with one `Firing` per row.
Every Firing carries a canonical `_subjectFingerprint` of
`"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
deserialization time — matches the existing policy of keeping
controller validators focused on env-scoped / SQL-injection concerns.
Schema:
- V16 migration generalises the V15 per-exchange discriminator on
`alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
fallback to the legacy `exchange.id` expression. Scalar kinds still
resolve to `''` and keep one-open-per-rule. Duplicate-key path in
`PostgresAlertInstanceRepository.save` is unchanged — the index is
the deduper.
UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
the six allowed event types + `withinSeconds` input. Wired into
`ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.
Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.
Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
Backend: `GET /environments/{envSlug}/alerts` now accepts optional multi-value
`state=…` and `severity=…` query params. Filters are pushed down to
PostgresAlertInstanceRepository, which appends `AND state::text = ANY(?)` /
`AND severity::text = ANY(?)` to the inbox query (null/empty = no filter).
`AlertInstanceRepository.listForInbox` gained a 7-arg overload; the old 5-arg
form is preserved as a default delegate so existing callers (evaluator,
AlertingFullLifecycleIT, PostgresAlertInstanceRepositoryIT) compile unchanged.
`InAppInboxQuery.listInbox` also has a new filtered overload.
UI: InboxPage severity filter migrated from `SegmentedTabs` (single-select,
no color cues) to `ButtonGroup` (multi-select with severity-coloured dots),
matching the topnavbar status-filter pattern. `useAlerts` forwards the
filters as query params and cache-keys on the filter tuple so each combo
is independently cached.
Unit + hook tests updated to the new contract (5 UI tests + 8 Java unit
tests passing). OpenAPI types regenerated from the fresh local backend.
V13 added a partial unique index on alert_instances(rule_id) WHERE state
IN (PENDING,FIRING,ACKNOWLEDGED). Correct for scalar condition kinds
(ROUTE_METRIC / AGENT_STATE / DEPLOYMENT_STATE / LOG_PATTERN / JVM_METRIC
/ EXCHANGE_MATCH in COUNT_IN_WINDOW) but wrong for EXCHANGE_MATCH /
PER_EXCHANGE, which by design emits one alert_instance per matching
exchange. Under V13 every PER_EXCHANGE tick with >1 match logged
"Skipped duplicate open alert_instance for rule …" at evaluator cadence
and silently lost alert fidelity — only the first matching exchange per
tick got an AlertInstance + webhook dispatch.
V15 drops the rule_id-only constraint and recreates it with a
discriminator on context->'exchange'->>'id'. Scalar kinds emit
Map.of() as context, so their expression resolves to '' — "one open per
rule" preserved. ExchangeMatchEvaluator.evaluatePerExchange always
populates exchange.id, so per-exchange instances coexist cleanly.
Two new PostgresAlertInstanceRepositoryIT tests:
- multiple open instances for same rule + distinct exchanges all land
- second open for identical (rule, exchange) still dedups via the
DuplicateKeyException fallback in save() — defense-in-depth kept
Also fixes pre-existing PostgresAlertReadRepositoryIT brokenness: its
setup() inserted 3 open instances sharing one rule_id, which V13 blocked
on arrival. Migrate to one rule_id per instance (pattern already used
across other storage ITs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec §13 calls for the notification bell to colour-code by highest
unread severity (CRITICAL → error, WARNING → amber, INFO → muted).
The old { count } DTO forced the UI to pick one static colour, so
NotificationBell shipped with a TODO. Grow the contract instead:
UnreadCountResponse = { total, bySeverity: { CRITICAL, WARNING, INFO } }
Guarantees:
- every severity is always present with a >=0 value (no undefined
keys on the wire), so the UI can branch without defaults.
- total = sum of bySeverity values — kept explicit on the wire for
cheap top-line display, not recomputed client-side.
Backend
- AlertInstanceRepository: replaces countUnreadForUser(long) with
countUnreadBySeverityForUser returning Map<AlertSeverity, Long>.
One SQL round-trip per (env, user) — GROUP BY ai.severity over the
same NOT EXISTS(alert_reads) filter.
- UnreadCountResponse.from(Map) normalises and defensively copies;
missing severities default to 0.
- InAppInboxQuery.countUnread now returns the DTO, caches the full
response (still 5s TTL) so severity breakdown gets the same
hit-rate as the total did before.
- AlertController just hands the DTO back.
Breaking change — no backwards-compat shim: the `count` field is
gone. UI and tests updated in the same commit; there are no other
API consumers in the tree.
Frontend
- Regenerated openapi.json + schema.d.ts against a fresh build of
the new backend.
- NotificationBell branches badge colour on the highest unread
severity (CRITICAL > WARNING > INFO) via new CSS variants.
- Tests cover all four paths: zero, critical-present, warning-only,
info-only.
Tests: 7 unit tests + 12 ITs (incl. new grouping + empty-map)
+ 49 vitest (was 46; +3 severity-branch assertions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prometheus scrapes can fire every few seconds. The open-alerts / open-rules
gauges query Postgres on each read — caching the values for 30s amortises
that to one query per half-minute. Addresses final-review NIT from Plan 02.
- Introduces a package-private TtlCache that wraps a Supplier<Long> and
memoises the last read for a configurable Duration against a Supplier<Instant>
clock.
- Wraps each gauge supplier (alerting_rules_total{enabled|disabled},
alerting_instances_total{state}) in its own TtlCache.
- Adds a test-friendly constructor (package-private) taking explicit
Duration + Supplier<Instant> so AlertingMetricsCachingTest can advance
a fake clock without waiting wall-clock time.
- Adds AlertingMetricsCachingTest covering:
* supplier invoked once per TTL across repeated scrapes
* 29 s elapsed → still cached; 31 s elapsed → re-queried
* gauge value reflects the cached result even after delegate mutates
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rejects webhook URLs that resolve to loopback, link-local, or RFC-1918
private ranges (IPv4 + IPv6 ULA fc00::/7). Enforced on both create and
update in OutboundConnectionServiceImpl before persistence; returns 400
Bad Request with "private or loopback" in the body.
Bypass via `cameleer.server.outbound-http.allow-private-targets=true`
for dev environments where webhooks legitimately point at local
services. Production default is `false`.
Test profile sets the flag to `true` in application-test.yml so the
existing ITs that post webhooks to WireMock on https://localhost:PORT
keep working. A dedicated OutboundConnectionSsrfIT overrides the flag
back to false (via @TestPropertySource + @DirtiesContext) to exercise
the reject path end-to-end through the admin controller.
Plan 01 scope; required before SaaS exposure (spec §17).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #141. AbstractPostgresIT centrally declared three @MockBean
fields (clickHouseSearchIndex, clickHouseLogStore, agentRegistryService),
which meant EVERY IT ran against mocks instead of the real Spring context.
That masked the production crashloop — the real bean graph was never
exercised by CI.
- Remove the three @MockBean fields from AbstractPostgresIT.
- Move @MockBean declarations onto only the specific ITs that stub
method behavior (verified by grepping for when/verify calls).
- ITs that don't stub CH behavior now inject the real beans.
- Add SpringContextSmokeIT — @SpringBootTest with no mocks, void
contextLoads(). Fails fast on declared-type / autowire-type
mismatches like the one #141 fixed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rule creation now goes through POST /alerts/rules (exercises saveTargets on the
write path). Clock is replaced with @MockBean(name="alertingClock") and re-stubbed
in @BeforeEach to survive Mockito's inter-test reset. Six ordered steps:
1. seed log → tick evaluator → assert FIRING instance with non-empty targets (B-1)
2. tick dispatcher → assert DELIVERED notification + lastNotifiedAt stamped (B-2)
3. ack via REST → assert ACKNOWLEDGED state
4. create silence → inject PENDING notification → tick dispatcher → assert silenced (FAILED)
5. delete rule → assert rule_id nullified, rule_snapshot preserved (ON DELETE SET NULL)
6. new rule with reNotifyMinutes=1 → first dispatch → advance clock 61s →
evaluator sweep → second dispatch → verify 2 WireMock POSTs (B-2 cadence)
Background scheduler races addressed by resetting claimed_by/claimed_until before
each manual tick. Simulated clock set AFTER log insert to guarantee log timestamp
falls within the evaluator window. Re-notify notifications backdated in Postgres
to work around the simulated vs real clock gap in claimDueNotifications.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlertNotificationRepository gains resetForRetry(UUID, Instant) which sets
attempts=0, status=PENDING, next_attempt_at=now, and clears claim/response
fields. AlertNotificationController calls resetForRetry instead of
scheduleRetry so a manual retry always starts from a clean slate.
AlertNotificationControllerIT adds retryResetsAttemptsToZero to verify
attempts==0 and status==PENDING after three prior markFailed calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlertInstanceRepository gains listFiringDueForReNotify(Instant) — only returns
instances where last_notified_at IS NOT NULL and cadence has elapsed (IS NULL
branch excluded: sweep only re-notifies, initial notify is the dispatcher's job).
AlertEvaluatorJob.sweepReNotify() runs at the end of each tick, enqueues fresh
notifications for eligible instances and stamps last_notified_at.
NotificationDispatchJob stamps last_notified_at on the alert_instance when a
notification is DELIVERED, providing the anchor timestamp for cadence checks.
PostgresAlertInstanceRepositoryIT adds listFiringDueForReNotify test covering
the three-rule eligibility matrix (never-notified, long-ago, recent).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
saveTargets() is called unconditionally at the end of save() — it deletes
existing targets and re-inserts from the current targets list. findById()
and listByEnvironment() already call withTargets() so reads are consistent.
PostgresAlertRuleRepositoryIT adds saveTargets_roundtrip and
saveTargets_updateReplacesExistingTargets to cover the new write path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Investigated three approaches for CH 24.12:
- Inline SETTINGS on ADD PROJECTION: rejected (UNKNOWN_SETTING — not a query-level setting).
- ALTER TABLE MODIFY SETTING deduplicate_merge_projection_mode='rebuild': works; persists in
table metadata across connection restarts; runs before ADD PROJECTION in the SQL script.
- Session-level JDBC URL param: not pursued (MODIFY SETTING is strictly better).
alerting_projections.sql now runs MODIFY SETTING before the two executions ADD PROJECTIONs.
AlertingProjectionsIT strengthened to assert all four projections (including alerting_app_status
and alerting_route_status on executions) exist after schema init.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AbstractPostgresIT gained clickHouseSearchIndex and agentRegistryService mocks in Phase 9.
All 14 alerting IT subclasses that re-declared the same @MockBean fields now fail with
"Duplicate mock definition". Removed the redundant declarations; per-class clickHouseLogStore
mock kept where needed. 120 alerting tests now pass (0 failures).
Also adds docs/alerting-02-verification.md (Task 43).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nightly @Scheduled(03:00) job deletes RESOLVED alert_instances older
than eventRetentionDays and DELIVERED/FAILED alert_notifications older
than notificationRetentionDays. Uses injected Clock for testability.
IT covers: old-resolved deleted, fresh-resolved kept, FIRING kept
regardless of age, PENDING notification never deleted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GET /environments/{envSlug}/alerts/{alertId}/notifications — list notifications for instance (VIEWER+)
- POST /alerts/notifications/{id}/retry — manual retry of failed notification (OPERATOR+)
Flat path because notification IDs are globally unique (no env routing needed)
- scheduleRetry resets attempts to 0 and sets nextAttemptAt = now
- Added 11 alerting path matchers to SecurityConfig before outbound-connections block
- Fixed context loading failure in 6 pre-existing alerting storage/migration ITs by adding
@MockBean(clickHouseSearchIndex/clickHouseLogStore): ExchangeMatchEvaluator and
LogPatternEvaluator inject the concrete classes directly (not interface beans), so the
full Spring context fails without these mocks in tests that don't use the real CH container
- 5 IT tests: list, viewer-can-list, retry, viewer-cannot-retry, unknown-404
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- POST/GET/DELETE /environments/{envSlug}/alerts/silences
- 422 when endsAt <= startsAt ("endsAt must be after startsAt")
- OPERATOR+ for create/delete, VIEWER+ for list
- Audit: ALERT_SILENCE_CREATE/DELETE with AuditCategory.ALERT_SILENCE_CHANGE
- 6 IT tests: create, viewer-list, viewer-cannot-create, bad time-range, delete, audit event
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /environments/{envSlug}/alerts — inbox filtered by userId/groupIds/roleNames via InAppInboxQuery
- GET /unread-count — memoized unread count (5s TTL)
- GET /{id}, POST /{id}/ack, POST /{id}/read, POST /bulk-read
- bulkRead filters instanceIds to env before delegating to AlertReadRepository
- VIEWER+ for all endpoints; env isolation enforced by requireInstance
- 7 IT tests: list, env isolation, unread-count, ack flow, read, bulk-read, viewer access
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- POST/GET/PUT/DELETE /environments/{envSlug}/alerts/rules CRUD
- POST /{id}/enable, /{id}/disable, /{id}/render-preview, /{id}/test-evaluate
- Attribute-key validation: rejects keys not matching ^[a-zA-Z0-9._-]+$ at rule-save time
(CRITICAL: ExchangeMatchCondition attribute keys are inlined into ClickHouse SQL)
- Webhook validation: verifies outboundConnectionId exists and is allowed in env
- Null-safe notification template defaults to "" for NOT NULL DB constraint
- Fixed misleading comment in ClickHouseSearchIndex to document validation contract
- OPERATOR+ for mutations, VIEWER+ for reads
- Audit: ALERT_RULE_CREATE/UPDATE/DELETE/ENABLE/DISABLE with AuditCategory.ALERT_RULE_CHANGE
- 11 IT tests covering RBAC, SQL-injection prevention, enable/disable, audit, render-preview
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
listInbox resolves user groups+roles via RbacService.getEffectiveGroupsForUser
/ getEffectiveRolesForUser then delegates to AlertInstanceRepository.
countUnread memoized per (envId, userId) with 5s TTL via ConcurrentHashMap
using a controllable Clock. 6 unit tests covering delegation, cache hit,
TTL expiry, and isolation between users/envs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders URL/headers/body with Mustache, optionally HMAC-signs the body
(X-Cameleer-Signature), supports POST/PUT/PATCH, classifies 2xx/4xx/5xx
into DELIVERED/FAILED/retry. 8 WireMock-backed IT tests including HTTPS
TRUST_ALL against WireMock self-signed cert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HmacSHA256 signer returning sha256=<lowercase-hex>. 5 unit tests covering
known vector, prefix, hex casing, and different secrets/bodies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add withRuleSnapshot(Map) wither to AlertInstance (same pattern as other withers)
- Call snapshotRule(rule) + withRuleSnapshot in both applyResult (single-firing) and
applyBatchFiring paths so every persisted instance carries a non-empty JSONB snapshot
- Strip null values from the Jackson-serialized map before wrapping in the immutable
snapshot so Map.copyOf in the compact ctor does not throw NPE on nullable rule fields
- Add ruleSnapshotIsPersistedOnInstanceCreation IT: asserts name/severity/conditionKind
appear in the rule_snapshot column after a tick fires an instance
- Add historySurvivesRuleDelete IT: fires an instance, deletes the rule, asserts
rule_id IS NULL and rule_snapshot still contains the rule name (spec §5 guarantee)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AlertEvaluatorJob implements SchedulingConfigurer; fixed-delay tick from
AlertingProperties.effectiveEvaluatorTickIntervalMs (5 s floor)
- Claim-polling via AlertRuleRepository.claimDueRules (FOR UPDATE SKIP LOCKED)
- Per-kind circuit breaker guards each evaluator; failures recorded, open kinds
skipped and rescheduled without evaluation
- Single-Firing path delegates to AlertStateTransitions; new FIRING instances
enqueue AlertNotification rows per rule.webhooks()
- Batch (PER_EXCHANGE) path creates one FIRING AlertInstance per Firing entry
- PENDING→FIRING promotion handled in applyResult via state machine
- Title/message rendered via MustacheRenderer + NotificationContextBuilder;
environment resolved from EnvironmentRepository.findById per tick
- AlertEvaluatorJobIT (4 tests): uses named @MockBean replacements for
ClickHouseSearchIndex + ClickHouseLogStore; @MockBean AgentRegistryService
drives Clear/Firing/resolve cycle without timing sensitivity
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PER_EXCHANGE returns EvalResult.Batch(List<Firing>); last Firing carries
_nextCursor (Instant) in its context map for the job to persist as
evalState.lastExchangeTs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SilenceMatcherService.matches() evaluates AND semantics across ruleId,
severity, appSlug, routeId, agentId constraints. Null fields are wildcards.
Scope-based constraints (appSlug/routeId/agentId) return false when rule is
null (deleted rule — scope cannot be verified). 17 unit tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sentinel-substitution approach: unresolved {{x.y.z}} tokens are replaced
with a unique NUL-delimited sentinel before Mustache compilation, rendered
as opaque text, then post-replaced with the original {{x.y.z}} literal.
Malformed templates (unclosed {{) are caught and return the raw template.
Never throws. 9 unit tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds alerting_projections.sql with four projections (alerting_app_status,
alerting_route_status on executions; alerting_app_level on logs;
alerting_instance_metric on agent_metrics). ClickHouseSchemaInitializer now
runs both init.sql and alerting_projections.sql, with ADD PROJECTION and
MATERIALIZE treated as non-fatal — executions (ReplacingMergeTree) requires
deduplicate_merge_projection_mode=rebuild which is unavailable via JDBC pool.
MergeTree projections (logs, agent_metrics) always succeed and are asserted in IT.
Column names confirmed from init.sql: logs uses 'application' (not application_id),
agent_metrics uses 'collected_at' (not timestamp). All column names match the plan.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting —
no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window,
and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString
key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds countLogs(LogSearchRequest) — no FINAL, no cursor/sort/limit —
reusing the same WHERE-clause logic as search() for tenant, env, app,
level, q, logger, source, exchangeId, and time-range filters.
Also extends ClickHouseTestHelper with executeInitSqlWithProjections()
and makes the script runner non-fatal for ADD/MATERIALIZE PROJECTION.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>