feat(alerts): DS alignment + AGENT_LIFECYCLE + single-inbox redesign #146

Merged
hsiegeln merged 49 commits from feat/alerts-ds-alignment into main 2026-04-21 19:53:12 +02:00

Summary

Three layers of work, landed together because they all rework the same alerts area:

  1. Alerts UI brought onto the design system. Inbox / All / History / Rules / Silences pages rewritten to use DataTable, ButtonGroup, Toggle, Dropdown, ConfirmDialog, DateRangePicker, and shared page-header + expanded-row helpers. Severity badges and state chips standardised. Wizard banners → DS Alert; removed local CSS var leaks.

  2. New AGENT_LIFECYCLE alert condition (backend + UI). Six-entry allowlist enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) with per-subject fire-mode — one alert_instance per (agent, eventType, timestamp) via a _subjectFingerprint in the evaluator firing context. Includes AgentEventRepository.findInWindow, AgentLifecycleEvaluator, rule-editor condition form, and Mustache variable registry updates. V15 (exchange id) and V16 (generalised fingerprint) migrations back it.

  3. Alerts inbox redesign — the big one. Single filterable inbox replaces the Inbox / All / History split. Ack, read, and delete become global timestamp flags on alert_instances (no more per-user alert_reads table). Driven by a spec + 16-task plan in docs/superpowers/specs/ and docs/superpowers/plans/.

What the inbox redesign changes

Data model (V17 migration):

  • Drop ACKNOWLEDGED from alert_state_enum (ack is now orthogonal — acked_at timestamp only).
  • Add read_at + deleted_at columns to alert_instances.
  • Drop the alert_reads table.
  • Rework the V13/V15/V16 open-rule unique index predicate to state IN ('PENDING','FIRING') AND deleted_at IS NULL so ack doesn't close the slot and soft-delete frees it.

Backend endpoints (AlertController under /api/v1/environments/{envSlug}/alerts):

  • GET / gains tri-state acked + read query params; always excludes soft-deleted rows.
  • POST /{id}/ack now sets acked_at only; no state change.
  • POST /{id}/read + POST /bulk-read write to alert_instances.read_at (no more alert_reads join).
  • New: POST /bulk-ack (VIEWER+), DELETE /{id} (OPERATOR+, soft-delete), POST /bulk-delete (OPERATOR+), POST /{id}/restore (OPERATOR+, undo).
  • New AlertInstanceRepository.filterInEnvLive(ids, envId) collapses the prior N+1 findById loop to one SQL round-trip.
  • AlertDto gains readAt; deletedAt stays off the wire.

UI:

  • Single /alerts/inbox page with four filter dimensions (Severity, Status, Hide acked, Hide read — defaults: FIRING + hide-acked + hide-read). Row actions: Ack · Mark read · Silence rule… · Delete. Bulk toolbar with confirmation modal for delete.
  • SilenceRuleMenu component — DS Dropdown with 1h / 8h / 24h / Custom presets; Custom navigates to /alerts/silences?ruleId=… and prefills the form.
  • Soft-delete has a 5-second undo toast wired through useRestoreAlert.
  • AllAlertsPage + HistoryPage removed (status filter in the inbox covers those cases). Sidebar trims to Inbox · Rules · Silences. Stale /alerts/all and /alerts/history URLs return 404 per the clean-break policy.
  • AlertStateChip + CMD-K deep-link-search updated for the three-state enum.
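
As a rough illustration of how the four filter dimensions could map onto the tri-state `acked` / `read` query params, here is a minimal sketch. The `InboxFilters` shape, the `toQueryString` helper, the uppercase `CRITICAL` / `WARNING` severity values, and the literal `false` wire value are assumptions made for the example, not the shipped code.

```typescript
// Hypothetical filter-bar state; field names are illustrative only.
type Severity = "CRITICAL" | "WARNING" | "INFO";
type AlertState = "PENDING" | "FIRING" | "RESOLVED";

interface InboxFilters {
  severity: Severity[]; // empty = no severity filter
  state: AlertState[];  // empty = no status filter
  hideAcked: boolean;
  hideRead: boolean;
}

// Defaults described in the summary: FIRING + hide-acked + hide-read.
const DEFAULT_FILTERS: InboxFilters = {
  severity: [],
  state: ["FIRING"],
  hideAcked: true,
  hideRead: true,
};

// Tri-state mapping: a "hide" toggle that is on sends `false`, a toggle
// that is off omits the param so both kinds of rows are returned.
// The third state (`true`) is not reachable from these two toggles.
function toQueryString(f: InboxFilters): string {
  const parts: string[] = [];
  for (const s of f.state) parts.push(`state=${encodeURIComponent(s)}`);
  for (const s of f.severity) parts.push(`severity=${encodeURIComponent(s)}`);
  if (f.hideAcked) parts.push("acked=false");
  if (f.hideRead) parts.push("read=false");
  return parts.join("&");
}
```

With the defaults this yields `state=FIRING&acked=false&read=false`; switching "Hide acked" off simply drops the `acked` param.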

Docs: HOWTO.md rewritten to cover a brand-new-environment walkthrough via docker-compose.yml (full stack — PG + ClickHouse + server + UI). .claude/rules/app-classes.md, .claude/rules/ui.md, and CLAUDE.md updated for the new surface.

Test plan

Backend:

  • mvn clean verify — confirm BUILD SUCCESS.
  • mvn -pl cameleer-server-app -Dtest='*Alert*,V17MigrationIT,V12MigrationIT' test — expect 122/0 green. Covers V17MigrationIT (enum drop + column add + table drop + index predicate), PostgresAlertInstanceRepositoryIT (23 tests including filterInEnvLive_excludes_other_env_and_soft_deleted, bulkMarkRead_respects_deleted_at, ack_setsAckedAtAndLeavesStateFiring, findOpenForRule_skips_soft_deleted, restore_clears_deleted_at), AlertControllerIT (15 tests including read_is_global_other_users_see_readAt_set, delete_non_operator_returns_403, bulkDelete_only_affects_matching_env), AlertStateTransitionsTest, AlertingFullLifecycleIT, AlertingEnvIsolationIT, AgentLifecycleEvaluatorTest.
  • Verify V17 migration applies cleanly on a populated DB: seed an ACKNOWLEDGED row pre-migration, confirm post-migration state='FIRING' AND acked_at IS NOT NULL.

UI:

  • cd ui && npm run build — TS strict + Vite bundle green.
  • cd ui && npx vitest run — 76/76 tests green (includes new InboxPage.test.tsx: default filters, Hide-acked toggle, Ack visibility, bulk-delete dialog count, role-gated delete, undo toast).

Manual smoke (via docker compose up -d --build):

  • /alerts/inbox default view shows only firing alerts that are neither acked nor read (FIRING + hide-acked + hide-read).
  • Toggle "Hide acked" OFF — acked rows appear; Ack button is hidden on those rows.
  • Click "Silence rule… → 1 hour" on a row — success toast, new silence visible on /alerts/silences.
  • Click "Silence rule… → Custom…" — lands on /alerts/silences?ruleId=<id> with the Rule ID prefilled.
  • As OPERATOR, click Delete on a row — row disappears, Undo toast appears. Click Undo within 5s — row reappears.
  • As VIEWER, confirm Delete buttons are absent (both row and bulk).
  • Select 2+ rows → click "Delete N" → modal says "Delete N alerts? This affects all users."
  • One user marks an alert read; a second user sees the same alert with readAt set.
  • Bell badge in the top nav decreases after marking rows as read (5-second memo window).
  • Sidebar shows only Inbox · Rules · Silences. /alerts/all and /alerts/history return 404.
  • Create an AGENT_LIFECYCLE rule in the wizard; trigger a stale/dead event; one alert per agent+event fires.

🤖 Generated with Claude Code

claude added 49 commits 2026-04-21 19:48:08 +02:00
Rework all pages under /alerts to use @cameleer/design-system components
and tokens. Unified DataTable shell for Inbox/All/History with expandable
rows; DataTable + Dropdown + ConfirmDialog for Rules list; FormField grid
+ DataTable for Silences; DS Alert for wizard banners. Replaces undefined
CSS variables (--bg, --fg, --muted, --accent) with DS tokens and removes
raw <table>/<select>/confirm() usage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-by-task TDD plan implementing the design spec. Splits the work
into 14 tasks: helper utilities (TDD), shared renderer, CSS token
migration, per-page rewrites (Inbox/All/History/Rules/Silences),
wizard banner migration, AlertRow deletion, E2E adaptation for
ConfirmDialog, and full verification pass. Each task produces an
atomic commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure function mapping the 3-value AlertDto.severity enum to the 2-value
DataTable rowAccent prop. INFO maps to undefined (no tint) because the
DS DataTable rowAccent only supports error|warning.
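
A minimal sketch of such a mapping, assuming the three severity values are `CRITICAL`, `WARNING`, and `INFO` and that `CRITICAL` maps to `error`. Only the `INFO` to `undefined` case and the `error|warning` accent set are stated above; the function and type names are hypothetical.

```typescript
// Assumed severity values; only INFO is named explicitly in the commit.
type AlertSeverity = "CRITICAL" | "WARNING" | "INFO";
// The two accents the DataTable rowAccent prop is said to support.
type RowAccent = "error" | "warning";

// Pure mapping: INFO gets no tint because there is no matching accent.
function severityToAccent(severity: AlertSeverity): RowAccent | undefined {
  switch (severity) {
    case "CRITICAL":
      return "error";
    case "WARNING":
      return "warning";
    case "INFO":
      return undefined;
  }
}
```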

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Formats ISO timestamps as `Nm ago` / `Nh ago` / `Nd ago`, falling back
to an absolute locale date string for values older than 30 days. Used
by the alert DataTable Age column.
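
The formatting rule can be sketched roughly as follows. The `formatAge` name, the injectable `now` parameter, and the handling of sub-minute and unparseable values are assumptions for the example.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;
const DAY_MS = 24 * HOUR_MS;

// Formats an ISO timestamp relative to `now`.
// Assumption: values under one minute render as "0m ago", and an
// unparseable input is returned unchanged.
function formatAge(iso: string, now: Date = new Date()): string {
  const then = new Date(iso);
  if (isNaN(then.getTime())) return iso;
  const elapsed = Math.max(0, now.getTime() - then.getTime());
  // Older than 30 days: fall back to an absolute, locale-formatted date.
  if (elapsed > 30 * DAY_MS) return then.toLocaleDateString();
  if (elapsed >= DAY_MS) return `${Math.floor(elapsed / DAY_MS)}d ago`;
  if (elapsed >= HOUR_MS) return `${Math.floor(elapsed / HOUR_MS)}h ago`;
  return `${Math.floor(elapsed / MINUTE_MS)}m ago`;
}
```

Passing `now` explicitly keeps the function pure, which makes it straightforward to pin in a unit test.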

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the per-row detail block used by Inbox/All/History DataTables
so the three pages share one rendering. Consumes AlertDto fields that
are nullable in the schema; hides missing fields instead of rendering
placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the feed-row classes (.row, .rowUnread, .body, .meta, .time,
.message, .actions, .empty) — these are replaced by DS DataTable +
EmptyState in follow-up tasks. Keep layout helpers for page shell,
toolbar, filter bar, bulk-action bar, title cell, and DataTable
expanded content. All colors / spacing use DS tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces custom feed-row layout with the shared DataTable shell used
elsewhere in the app. Adds checkbox selection + bulk "Mark selected
read" toolbar alongside the existing "Mark all read". Uses DS
EmptyState for empty lists, severity-driven rowAccent for unread
tinting, and renderAlertExpanded for row detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces 4-Button filter row with DS SegmentedTabs and custom row
rendering with DataTable. Shares expandedContent renderer and
severity-driven rowAccent with Inbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces custom feed rows with DataTable. Adds a DateRangePicker
filter (client-side) defaulting to the last 7 days. Client-side
range filter is a stopgap; a server-side range param is a future
enhancement captured in the design spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces raw <table> with DataTable, raw <select> promote control with
DS Dropdown, and native confirm() delete with ConfirmDialog. Adds DS
EmptyState with CTA for the no-rules case. Uses SectionHeader's
action slot instead of ad-hoc flex wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces raw <table> with DataTable, inline-styled form with proper
FormField hints, and native confirm() end-early with ConfirmDialog
(warning variant). Adds DS EmptyState for no-silences case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The feed-row component is replaced by DataTable column renderers and
the shared renderAlertExpanded content renderer. No callers remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace undefined tokens (--muted, --fg, --accent, --border,
--amber-bg) with DS tokens (--text-muted, --text-primary, --amber,
--border-subtle, --space-sm|md). Drop .promoteBanner — replaced by
DS Alert in follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote banner and prefill warnings now render as DS Alert components
(info / warning variants). Step body wraps in sectionStyles.section
for card affordance matching other forms in the app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Rules list Delete and Silences End-early flows now use DS
ConfirmDialog instead of native confirm(). Update selectors to
target the dialog's role=dialog + confirm button instead of
listening for the native `dialog` event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two inline-style color refs in NotifyStep and TriggerStep were still
pointing at the undefined --muted token instead of the DS
--text-muted. Caught by the design-system-alignment verification
grep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Visual regressions surfaced during browser smoke:

1. Page headers — `SectionHeader` renders as 12px uppercase gray (a
   section divider, not a page title). Replace with proper h2 title
   + inline subtitle (`N firing · N total` etc.) and right-aligned
   actions, styled from `alerts-page.module.css`.

2. Undefined `--space-*` tokens — the project (and `@cameleer/design-system`)
   has never shipped `--space-sm|md|lg|xl`, even though many modules
   (SensitiveKeysPage, alerts CSS, …) reference them. The fallback
   to `initial` silently collapsed gaps/paddings to 0. Define the
   scale in `ui/src/index.css` so every consumer picks it up.

3. List scrolling — DataTable was using default pagination, but with
   no flex sizing the whole page scrolled. Add `fillHeight` and raise
   `pageSize`/list `limit` to 200 so the table gets sticky header +
   internal scroll + pinned pagination footer (Gmail-style). True
   cursor-based infinite scroll needs a backend change (filed as
   follow-up — `/alerts` only accepts `limit` today).

4. Title column clipping — `.titlePreview` used `white-space: nowrap`
   + fixed `max-width`, truncating message mid-UUID. Switch to a
   2-line `-webkit-line-clamp` so full context is visible.

5. Notification bell badge invisible — `NotificationBell.module.css`
   referenced undefined tokens (`--fg`, `--hover-bg`, `--bg`,
   `--muted`). Map to real DS tokens (`--text-primary`, `--bg-hover`,
   `#fff`, `--text-muted`). The admin user currently sees no badge
   because the backend `/alerts/unread-count` returns 0 (read
   receipts) — that's data, not UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced during second smoke:

1. Notification bell moved — was first child of TopBar (left of
   breadcrumb); now rendered inside the `environment` slot so it
   sits between the env selector and the user menu, matching user
   expectations.

2. Content tabs (Exchanges/Dashboard/Runtime/Deployments) hidden on
   `/alerts/*` — the operational tabs don't apply there.

3. Inbox / All alerts filters now actually filter. `AlertController.list`
   accepts only `limit` — `state`/`severity` query params are dropped
   server-side. Move `useAlerts` to fetch once per env (limit 200) and
   apply filters client-side via react-query `select`, with a stable
   queryKey so filter toggles are instant and don't re-request. True
   server-side filter needs a backend change (follow-up).

4. Novice-friendly labels:
   - Inbox subtitle: "99 firing · 100 total" → "99 need attention ·
     100 total in inbox"
   - All alerts filter: Open/Firing/Acked/All →
     "Currently open"/"Firing now"/"Acknowledged"/"All states"
   - All alerts subtitle: "N shown" → "N matching your filter"
   - History subtitle: "N resolved" → "N resolved alert(s) in range"
   - Rules subtitle: "N total" → "N rule(s) configured"
   - Silences subtitle: "N active" → "N active silence(s)" or
     "Nothing silenced right now"
   - Column headers: "State" → "Status", rules "Kind" → "Type",
     rules "Targets" → "Notifies"
   - Buttons: "Ack" → "Acknowledge", silence "End" → "End early"

Updated alerts.test.tsx and e2e selector to match new behavior/labels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 4 smoke feedback on /alerts:
- Bell now has consistent 12px gap from env selector and user name
  (wrap env + bell in flex container inside TopBar's environment prop)
- RuleEditorWizard constrained to max-width 840px (centered) and
  upgraded the page title from SectionHeader to h2 pattern used by
  the list pages
- Inbox: added select-all checkbox, severity SegmentedTabs filter
  (All / Critical / Warning / Info), and bulk-ack actions
  (Acknowledge selected + Acknowledge all firing) alongside the
  existing mark-read actions
Backend: `GET /environments/{envSlug}/alerts` now accepts optional multi-value
`state=…` and `severity=…` query params. Filters are pushed down to
PostgresAlertInstanceRepository, which appends `AND state::text = ANY(?)` /
`AND severity::text = ANY(?)` to the inbox query (null/empty = no filter).

`AlertInstanceRepository.listForInbox` gained a 7-arg overload; the old 5-arg
form is preserved as a default delegate so existing callers (evaluator,
AlertingFullLifecycleIT, PostgresAlertInstanceRepositoryIT) compile unchanged.
`InAppInboxQuery.listInbox` also has a new filtered overload.

UI: InboxPage severity filter migrated from `SegmentedTabs` (single-select,
no color cues) to `ButtonGroup` (multi-select with severity-coloured dots),
matching the topnavbar status-filter pattern. `useAlerts` forwards the
filters as query params and cache-keys on the filter tuple so each combo
is independently cached.

Unit + hook tests updated to the new contract (5 UI tests + 8 Java unit
tests passing). OpenAPI types regenerated from the fresh local backend.
Replace the SegmentedTabs with multi-select ButtonGroup, matching the
topnavbar Completed/Warning/Failed/Running pattern. State dots use the
same palette as AlertStateChip (FIRING=error, ACKNOWLEDGED=warning,
PENDING=muted, RESOLVED=success). Default selection is the three "open"
states — Resolved is off by default and a single click surfaces closed
alerts without navigating to /history.
Inbox: replace 4 parallel outlined buttons with 2 context-aware ones.
When nothing is selected → "Acknowledge all firing" (primary) + "Mark all
read" (secondary). When rows are selected → the same slots become
"Acknowledge N" + "Mark N read" with counts inlined. Primary variant
gives the foreground action proper visual weight; secondary is the
supporting action. No more visually-identical disabled buttons cluttering
the bar.
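
The two-slot behaviour can be expressed as a small pure function. This is a sketch only: `toolbarLabels` and `ToolbarLabels` are hypothetical names, and the real component presumably also wires handlers and variants.

```typescript
interface ToolbarLabels {
  primary: string;   // foreground action
  secondary: string; // supporting action
}

// Same two slots in both modes; only the labels change with selection.
function toolbarLabels(selectedCount: number): ToolbarLabels {
  if (selectedCount > 0) {
    return {
      primary: `Acknowledge ${selectedCount}`,
      secondary: `Mark ${selectedCount} read`,
    };
  }
  return { primary: "Acknowledge all firing", secondary: "Mark all read" };
}
```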

History: drop the local DateRangePicker. The page now reads
`timeRange` from `useGlobalFilters()` so the top-bar TimeRangeDropdown
(1h / 3h / 6h / Today / 24h / 7d / custom) is the single source of
truth, consistent with every other time-scoped page in the app.
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.

Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
  (scope, eventTypes, withinSeconds). Compact ctor rejects empty
  eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
  the server-emitted types in `AgentRegistrationController` and
  `AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
  backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
  from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
  ASC)` used by the evaluator. Implemented on
  `ClickHouseAgentEventRepository` with tenant + env filter mandatory.
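
For readers on the UI side, the validation rules of the condition can be mirrored in TypeScript as below. This is not the Java record itself, only a sketch of the same rules; `validateCondition` is a hypothetical helper and the `scope` field is omitted for brevity.

```typescript
// The six allowlisted event types named in this commit.
const AGENT_LIFECYCLE_EVENT_TYPES = [
  "REGISTERED",
  "RE_REGISTERED",
  "DEREGISTERED",
  "WENT_STALE",
  "WENT_DEAD",
  "RECOVERED",
] as const;
type AgentLifecycleEventType = (typeof AGENT_LIFECYCLE_EVENT_TYPES)[number];

interface AgentLifecycleCondition {
  eventTypes: AgentLifecycleEventType[];
  withinSeconds: number;
}

// Rejects empty eventTypes, withinSeconds < 1, and unknown event types,
// mirroring what the compact ctor + enum are described as enforcing.
function validateCondition(raw: {
  eventTypes: string[];
  withinSeconds: number;
}): AgentLifecycleCondition {
  if (raw.eventTypes.length === 0) throw new Error("eventTypes must not be empty");
  if (raw.withinSeconds < 1) throw new Error("withinSeconds must be >= 1");
  const allowed: readonly string[] = AGENT_LIFECYCLE_EVENT_TYPES;
  const eventTypes = raw.eventTypes.map((t) => {
    if (allowed.indexOf(t) === -1) throw new Error(`unknown event type: ${t}`);
    return t as AgentLifecycleEventType;
  });
  return { eventTypes, withinSeconds: raw.withinSeconds };
}
```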

App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
  window and returns `EvalResult.Batch` with one `Firing` per row.
  Every Firing carries a canonical `_subjectFingerprint` of
  `"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
  subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
  exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
  `{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
  deserialization time — matches the existing policy of keeping
  controller validators focused on env-scoped / SQL-injection concerns.
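
To make the fingerprint shape concrete, here is an illustrative sketch in TypeScript (the evaluator itself is Java). `AgentEvent`, `subjectFingerprint`, and `distinctSubjects` are hypothetical names, and the in-memory de-duplication only stands in for the role the unique index plays in the database.

```typescript
// Hypothetical event shape for the example.
interface AgentEvent {
  agentId: string;
  eventType: string;
  timestamp: Date;
}

// Canonical "<agentId>:<eventType>:<tsMillis>" form described above.
function subjectFingerprint(e: AgentEvent): string {
  return `${e.agentId}:${e.eventType}:${e.timestamp.getTime()}`;
}

// Each distinct fingerprint corresponds to one alert instance; replaying
// the same event in a later evaluation window yields the same key.
function distinctSubjects(events: AgentEvent[]): string[] {
  const seen: Record<string, true> = {};
  const out: string[] = [];
  for (const e of events) {
    const fp = subjectFingerprint(e);
    if (!seen[fp]) {
      seen[fp] = true;
      out.push(fp);
    }
  }
  return out;
}
```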

Schema:
- V16 migration generalises the V15 per-exchange discriminator on
  `alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
  fallback to the legacy `exchange.id` expression. Scalar kinds still
  resolve to `''` and keep one-open-per-rule. Duplicate-key path in
  `PostgresAlertInstanceRepository.save` is unchanged — the index is
  the deduper.

UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
  the six allowed event types + `withinSeconds` input. Wired into
  `ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
  300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
  new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
  `{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
  and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.

Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
  empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
  window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.

Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
Collapse /alerts/inbox, /alerts/all, /alerts/history into a single
filterable inbox. Drop ACKNOWLEDGED from AlertState; add read_at and
deleted_at as orthogonal timestamp flags. Retire per-user alert_reads
tracking. Add Silence-rule and Delete row/bulk actions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
16 TDD tasks covering V17 migration (drop ACKNOWLEDGED + add read_at/deleted_at +
drop alert_reads + rework open-rule index), backend repo/controller/endpoints
including /restore for undo-toast backing, OpenAPI regen, UI rebuild (single
filterable inbox, row/bulk actions, silence-rule quick menu, SilencesPage
?ruleId= prefill), concrete test bodies, and rules/CLAUDE.md updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AlertState: remove ACKNOWLEDGED case (V17 migration already dropped it from DB enum)
- AlertInstance: insert readAt + deletedAt Instant fields after lastNotifiedAt; add withReadAt/withDeletedAt withers; update all existing withers to pass both fields positionally
- AlertStateTransitions: add null,null for readAt/deletedAt in newInstance ctor call; collapse FIRING,ACKNOWLEDGED switch arm to just FIRING
- AlertScopeTest: update AlertState.values() assertion to 3 values; fix stale ConditionKind.hasSize(6) to 7 (JVM_METRIC was added earlier)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AlertStateTransitionsTest: add null,null for readAt/deletedAt in openInstance helper;
  replace firingWhenAcknowledgedIsNoOp with firing_with_ack_stays_firing_on_next_firing_tick;
  convert ackedInstanceClearsToResolved to use FIRING+withAck; update section comment.
- PostgresAlertInstanceRepository: stub null,null for readAt/deletedAt in rowMapper
  to unblock compilation (Task 4 will read the actual DB columns).
- All other alerting test files: add null,null for readAt/deletedAt to AlertInstance
  ctor calls so the test source tree compiles; stub ACKNOWLEDGED JSON/state assertions
  with FIRING + TODO Task 4 comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove SET state='ACKNOWLEDGED' from ack() and the ACKNOWLEDGED predicate
from findOpenForRule — both would error after V17. The final ack() + open-rule
semantics (idempotent guards, deleted_at) are owned by Task 5; this is just
the minimum to stop runtime SQL errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- listForInbox gains tri-state acked/read filter params
- countUnreadBySeverityForUser(envId, userId) → countUnreadBySeverity(envId, userId, groupIds, roleNames)
- new methods: markRead, bulkMarkRead, softDelete, bulkSoftDelete, bulkAck, restore
- delete AlertReadRepository — read is now global on alert_instances.read_at

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- save/rowMapper read+write read_at and deleted_at
- listForInbox: tri-state acked/read filters; always excludes deleted
- countUnreadBySeverity: rewire without alert_reads join, preserve zero-fill
- new: markRead/bulkMarkRead/softDelete/bulkSoftDelete/bulkAck/restore
- delete PostgresAlertReadRepository + its bean
- restore zero-fill Javadoc on interface
- mechanical compile-fixes in AlertController, InAppInboxQuery,
  AlertControllerIT, InAppInboxQueryTest; Task 6 owns the rewrite
- PostgresAlertReadRepositoryIT stubbed @Disabled; Task 7 owns migration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The class under test was removed in da281933; the IT became a @Disabled
placeholder. Deleting per no-backwards-compat policy. Read mutation
coverage lives in PostgresAlertInstanceRepositoryIT going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /alerts gains tri-state acked + read query params
- new endpoints: DELETE /{id} (soft-delete), POST /bulk-delete, POST /bulk-ack, POST /{id}/restore
- requireLiveInstance 404s on soft-deleted rows; restore() reads the row regardless
- BulkReadRequest → BulkIdsRequest (shared body for bulk read/ack/delete)
- AlertDto gains readAt; deletedAt stays off the wire
- InAppInboxQuery.listInbox threads acked/read through to the repo (7-arg, no more null placeholders)
- SecurityConfig: new matchers for bulk-ack (VIEWER+), DELETE/bulk-delete/restore (OPERATOR+)
- AlertControllerIT: persistence assertions on /read + /bulk-read; full coverage for new endpoints
- InAppInboxQueryTest: updated to 7-arg listInbox signature

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- new AlertInstanceRepository.filterInEnvLive(ids, env): single-query bulk ID validation
- AlertController.inEnvLiveIds now one SQL round-trip instead of N
- bulkMarkRead SQL: defense-in-depth AND deleted_at IS NULL
- bulkAck SQL already had deleted_at IS NULL guard — no change needed
- PostgresAlertInstanceRepositoryIT: add filterInEnvLive_excludes_other_env_and_soft_deleted
- V12MigrationIT: remove alert_reads assertion (table dropped by V17)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New endpoints visible to the SPA: DELETE /alerts/{id}, POST
/alerts/{id}/restore, POST /alerts/bulk-delete, POST /alerts/bulk-ack.
GET /alerts gains tri-state acked / read query params. AlertDto now
includes readAt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- useAlerts gains acked/read filter params threaded into query + queryKey
- new mutations: useBulkAckAlerts, useDeleteAlert, useBulkDeleteAlerts, useRestoreAlert
- all cache-invalidate the alerts list and unread-count on success

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar Alerts section now just: Inbox · Rules · Silences. The /alerts
redirect still lands in /alerts/inbox; /alerts/all and /alerts/history
routes are gone (no redirect — stale URLs 404 per clean-break policy).

Also updates sidebar-utils.test.ts to match the new 3-entry shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Used by InboxPage row + bulk actions to silence an alert's underlying
rule for a chosen preset window. 'Custom…' routes to
/alerts/silences?ruleId=<id> (T13 adds the prefill wire).
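
A minimal sketch of the preset handling, assuming the 1h / 8h / 24h presets listed in the PR summary. The `silenceWindow` and `customSilenceUrl` helpers and the `startsAt` / `endsAt` field names are hypothetical; the actual silence request body is not shown in this PR text.

```typescript
type SilencePreset = "1h" | "8h" | "24h";

const PRESET_HOURS: Record<SilencePreset, number> = {
  "1h": 1,
  "8h": 8,
  "24h": 24,
};

// Computes the silence window for a preset, starting at `now`.
function silenceWindow(
  preset: SilencePreset,
  now: Date
): { startsAt: Date; endsAt: Date } {
  const durationMs = PRESET_HOURS[preset] * 60 * 60 * 1000;
  return { startsAt: now, endsAt: new Date(now.getTime() + durationMs) };
}

// "Custom…" does not create a silence; it only navigates with the rule id.
function customSilenceUrl(ruleId: string): string {
  return `/alerts/silences?ruleId=${encodeURIComponent(ruleId)}`;
}
```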

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the old FIRING+ACK hardcoded inbox with the single filterable
inbox:

- Filter bar: Severity · Status (PENDING/FIRING/RESOLVED, default FIRING) ·
  Hide acked (default on) · Hide read (default on).
- Row actions: Ack, Mark read, Silence rule… (quick menu), Delete
  (OPERATOR+, soft delete with undo toast wired to useRestoreAlert).
- Bulk toolbar: Ack N · Mark N read · Silence rules · Delete N
  (ConfirmDialog; OPERATOR+).
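
The filter-bar defaults above can be sketched as a mapping from toggle state to query filters. The type and function names are assumptions; the mapping itself (a "hide" toggle sends `false`, turning it off drops the filter) follows the defaults listed here and the `acked=false` usage elsewhere in this PR.

```typescript
type AlertState = "PENDING" | "FIRING" | "RESOLVED";

// Hypothetical shape of the filter-bar state.
interface InboxFilterState {
  severity: string[];
  state: AlertState[];
  hideAcked: boolean;
  hideRead: boolean;
}

const DEFAULT_FILTERS: InboxFilterState = {
  severity: [],
  state: ["FIRING"],
  hideAcked: true,
  hideRead: true,
};

// "Hide acked" on -> acked=false; off -> no acked filter at all (same for read).
function toQueryFilters(f: InboxFilterState) {
  return {
    severity: f.severity.length > 0 ? f.severity : undefined,
    state: f.state.length > 0 ? f.state : undefined,
    acked: f.hideAcked ? false : undefined,
    read: f.hideRead ? false : undefined,
  };
}
```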

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- STATE_ITEMS gains color dots (text-muted/error/success) to match SEVERITY_ITEMS
- onDeleteOne removes the deleted id from the selection Set so a follow-up bulk
  action doesn't try to re-delete a tombstoned row
- drop stale comment block that described an alternative SilenceRulesForSelection
  implementation not matching the shipped code
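
The selection fix can be pictured with a small helper. This is a sketch under the assumption that the selection is held as a `Set` of ids in React state; `removeFromSelection` is a hypothetical name, not the shipped code.

```typescript
// Return a new Set without `id`, leaving the input untouched so that a
// state setter sees a changed reference.
function removeFromSelection(selected: ReadonlySet<string>, id: string): Set<string> {
  const next = new Set(selected);
  next.delete(id);
  return next;
}
```

After a single-row delete, a follow-up bulk action then only sees ids that are still live.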

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Used by InboxPage's 'Silence rule… → Custom…' flow to carry the alert's
ruleId into the silence creation form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers: default useAlerts call (FIRING + hide-acked + hide-read),
Hide-acked toggle removes the acked filter, Acknowledge button only
renders for unacked rows, bulk-delete confirmation dialog with count,
delete buttons hidden for non-OPERATOR users, row-delete wires to
useDeleteAlert + renders an Undo action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- .claude/rules/ui.md: rewrite Alerts section — sidebar trims to
  Inbox/Rules/Silences, InboxPage description updated (4 filters, row
  actions, bulk toolbar, soft-delete undo), SilenceRuleMenu documented,
  SilencesPage ?ruleId= prefill noted.
- CLAUDE.md: V17 migration entry describing enum/column/table/index
  changes for the inbox redesign.
- .claude/rules/app-classes.md AlertController bullet already updated
  in the T6 drive-by.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UI: AlertStateChip.LABELS and .COLORS no longer include ACKNOWLEDGED
(dropped in V17). AlertStateChip.test.tsx test-cases trimmed to the
three remaining states. LayoutShell CMD-K now searches FIRING alerts
with acked=false (was state=[FIRING,ACKNOWLEDGED]).

Test: V17MigrationIT.open_rule_index_predicate_is_reworked replaced
with a structural-only assertion (index exists, indisunique). The
pg_get_indexdef pretty-printer varies across Postgres versions, so
predicate semantics are verified behaviorally in
PostgresAlertInstanceRepositoryIT (findOpenForRule_* +
save_rejectsSecondOpenInstanceForSameRuleAndExchange).
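
For reference, the predicate semantics those behavioural tests exercise can be modelled in a few lines. This is an illustrative TypeScript model of the index predicate `state IN ('PENDING','FIRING') AND deleted_at IS NULL` quoted in the PR summary, not code from the repository; the field names are assumptions.

```typescript
// Hypothetical projection of an alert instance.
interface AlertInstanceLike {
  state: "PENDING" | "FIRING" | "RESOLVED";
  ackedAt: string | null;
  deletedAt: string | null;
}

// True when the row counts against the "one open instance per rule" slot.
// ackedAt is deliberately ignored: ack does not close the slot.
function occupiesOpenSlot(i: AlertInstanceLike): boolean {
  return (i.state === "PENDING" || i.state === "FIRING") && i.deletedAt === null;
}
```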

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chore: refresh GitNexus stats + drop stale tsbuildinfo
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
8a6744d3e9
GitNexus analyze --embeddings after the alerts-inbox-redesign branch
brought the graph to 8780 symbols / 22753 relationships (was 8527/22174
in AGENTS.md and 8603/22281 in CLAUDE.md). The stat-header drift between
AGENTS.md and CLAUDE.md is an artifact of separate reindexes; both files
are now in sync.

ui/tsconfig.app.tsbuildinfo was a stale tsc incremental-build cache
that shouldn't be tracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs(howto): brand-new local environment via docker-compose
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m58s
CI / docker (push) Successful in 1m19s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 39s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m2s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
849265a1c6
Rewrite the "Infrastructure Setup" / "Run the Server" sections to
reflect what docker-compose.yml actually provides (full stack —
PostgreSQL + ClickHouse + server + UI — not just PostgreSQL). Adds:

- Step-by-step walkthrough for a first-run clean environment.
- Port map including the UI (8080), ClickHouse (8123/9000), PG (5432),
  server (8081).
- Dev credentials baked into compose surfaced in one place.
- Lifecycle commands (stop/start/rebuild-single-service/wipe).
- Infra-only mode for backend-via-mvn / UI-via-vite iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hsiegeln merged commit 181a479037 into main 2026-04-21 19:53:12 +02:00