diff --git a/docs/superpowers/specs/2026-04-21-alerts-inbox-redesign-design.md b/docs/superpowers/specs/2026-04-21-alerts-inbox-redesign-design.md new file mode 100644 index 00000000..9457e2fc --- /dev/null +++ b/docs/superpowers/specs/2026-04-21-alerts-inbox-redesign-design.md @@ -0,0 +1,202 @@ +# Alerts Inbox Redesign — Design + +**Status:** Approved for planning +**Date:** 2026-04-21 +**Author:** Hendrik + Claude + +## Goal + +Collapse the three alert list pages (`/alerts/inbox`, `/alerts/all`, `/alerts/history`) into a single filterable inbox. Retire per-user read tracking: `ack`, `read`, and `delete` become global timestamp flags on the alert instance. Add row/bulk actions for **Silence rule…** and **Delete** directly from the list. + +## Motivation + +Today the three pages all hit the same user-scoped `InAppInboxQuery.listInbox`, so "All" is misleading (it's not env-wide) and "History" is just "Inbox with status=RESOLVED". The user asked for a single clean inbox with richer filters and list-level actions, and explicitly granted simplification of the read/ack/delete tracking model: one action is visible to everyone, no per-user state. + +## Scope + +**In:** +- Data model: drop `ACKNOWLEDGED` from `AlertState`, add `read_at` + `deleted_at` columns, drop `alert_reads` table, rework V13 open-rule index predicate. +- Backend: `AlertController` gains `acked`/`read` filter params, new `DELETE /alerts/{id}`, new `POST /alerts/bulk-delete`, new `POST /alerts/bulk-ack`. `/read` + `/bulk-read` rewire to update `alert_instances.read_at`. +- Data migration (V17): existing `ACKNOWLEDGED` rows → `state='FIRING'` (ack_time preserved). +- UI: rebuild `InboxPage` filter bar, add Silence/Delete row + bulk actions. Delete `AllAlertsPage.tsx` + `HistoryPage.tsx`. Sidebar trims to Inbox · Rules · Silences. +- Tests + rules-file updates. + +**Out:** +- No redirects from `/alerts/all` or `/alerts/history` — clean break per project's no-backwards-compat policy. Stale URLs 404. +- No per-instance silence (different from rule-silence). Silence row action always silences the rule that produced the alert. +- No "mark unread". Read is a one-way flag. +- No per-user actor tracking for `read`/`deleted`. `acked_by` stays (already exists, useful in UI), but only because it's already wired. + +## Architecture + +### Data model (`alert_instances`) + +``` +state enum: PENDING · FIRING · RESOLVED (was: + ACKNOWLEDGED) +acked_at TIMESTAMPTZ NULL (existing, semantics unchanged) +acked_by TEXT NULL → users(user_id) (existing, retained for UI) +read_at TIMESTAMPTZ NULL (NEW, global) +deleted_at TIMESTAMPTZ NULL (NEW, soft delete) +``` + +**Orthogonality:** `state` describes the alert's lifecycle (is the underlying condition still met?). `acked_at` / `read_at` / `deleted_at` describe what humans have done to the notification. A FIRING alert can be acked (= "someone's on it") while remaining FIRING until the condition clears. + +**V13 open-rule unique index predicate** (preserved as the evaluator's dedup key) changes from: +```sql +WHERE state IN ('PENDING','FIRING','ACKNOWLEDGED') +``` +to: +```sql +WHERE state IN ('PENDING','FIRING') AND deleted_at IS NULL +``` +Ack no longer "closes" the open window — a rule that's still matching stays de-duped against the open instance whether acked or not. Deleting soft-deletes the row and opens a new slot so the rule can fire again fresh if the condition re-triggers. + +**`alert_reads` table:** dropped entirely. No FK references elsewhere. + +### Postgres enum removal + +Postgres doesn't support removing a value from an enum type. Migration path: + +```sql +-- V17 +-- 1. Coerce existing ACKNOWLEDGED rows → FIRING (ack_time already set) +UPDATE alert_instances SET state = 'FIRING' WHERE state = 'ACKNOWLEDGED'; + +-- 2. Swap to a new enum type without ACKNOWLEDGED +CREATE TYPE alert_state_enum_v2 AS ENUM ('PENDING','FIRING','RESOLVED'); +ALTER TABLE alert_instances + ALTER COLUMN state TYPE alert_state_enum_v2 + USING state::text::alert_state_enum_v2; +DROP TYPE alert_state_enum; +ALTER TYPE alert_state_enum_v2 RENAME TO alert_state_enum; + +-- 3. New columns +ALTER TABLE alert_instances + ADD COLUMN read_at timestamptz NULL, + ADD COLUMN deleted_at timestamptz NULL; +CREATE INDEX alert_instances_read_idx ON alert_instances (environment_id, read_at) WHERE read_at IS NULL AND deleted_at IS NULL; +CREATE INDEX alert_instances_deleted_idx ON alert_instances (deleted_at) WHERE deleted_at IS NOT NULL; + +-- 4. Rework V13/V15/V16 open-rule unique index with the new predicate +DROP INDEX IF EXISTS alert_instances_open_rule_uq; +CREATE UNIQUE INDEX alert_instances_open_rule_uq + ON alert_instances (rule_id, (COALESCE( + context->>'_subjectFingerprint', + context->'exchange'->>'id', + ''))) + WHERE rule_id IS NOT NULL + AND state IN ('PENDING','FIRING') + AND deleted_at IS NULL; + +-- 5. Drop alert_reads +DROP TABLE alert_reads; +``` + +### Backend — `AlertController` + +| Method | Path | Body/Query | RBAC | Effect | +|---|---|---|---|---| +| GET | `/alerts` | `state, severity, acked, read, limit` | VIEWER+ | Inbox list, always `deleted_at IS NULL`. `state` no longer accepts `ACKNOWLEDGED`. `acked` / `read` are tri-state: **omitted** = no filter; `=true` = only acked/read; `=false` = only unacked/unread. UI defaults to `acked=false&read=false` via the "Hide acked" + "Hide read" toggles. | +| GET | `/alerts/unread-count` | — | VIEWER+ | Counts `read_at IS NULL AND deleted_at IS NULL` + user-visibility predicate. | +| GET | `/alerts/{id}` | — | VIEWER+ | Detail. Returns 404 if `deleted_at IS NOT NULL`. | +| POST | `/alerts/{id}/ack` | — | VIEWER+ | Sets `acked_at=now, acked_by=user`. No state change. | +| POST | `/alerts/{id}/read` | — | VIEWER+ | Sets `read_at=now` if null. Idempotent. | +| POST | `/alerts/bulk-read` | `{ instanceIds: [...] }` | VIEWER+ | UPDATE `alert_instances SET read_at=now()` for all ids in env. | +| POST | `/alerts/bulk-ack` | `{ instanceIds: [...] }` | **NEW** VIEWER+ | Parallel to bulk-read. | +| DELETE | `/alerts/{id}` | — | **NEW** OPERATOR+ | Sets `deleted_at=now`. Returns 204. | +| POST | `/alerts/bulk-delete` | `{ instanceIds: [...] }` | **NEW** OPERATOR+ | Bulk soft-delete in env. | + +Removed: +- `AlertReadRepository` bean + `alert_reads` usage — `read_at`/`bulk-read` now update `alert_instances` directly. +- `ACKNOWLEDGED` handling in all backend code paths. + +`InAppInboxQuery.countUnread` rewires to a single SQL count on `alert_instances` with `read_at IS NULL AND deleted_at IS NULL` + target-visibility predicate. + +### Backend — evaluator + notifier + +- `AlertInstanceRepository.findOpenByRule(ruleId, subjectFingerprint)` already exists; its predicate now matches the new index (`state IN ('PENDING','FIRING') AND deleted_at IS NULL`). +- All test fixtures that assert `state=ACKNOWLEDGED` → assert `acked_at IS NOT NULL`. +- Notification pipeline (`AlertNotifier`) already fires on state transitions; no change — ack no longer being a state means one fewer state-change branch to handle. + +### Silence from list — no new endpoint + +UI row-action calls existing `POST /alerts/silences` with `{ matcher: { ruleId: }, startsAt: now, endsAt: now + duration, reason: "Silenced from inbox" }`. The duration picker is a small menu: `1h / 8h / 24h / Custom…`. "Custom" routes to `/alerts/silences` (the existing SilencesPage form) with the `ruleId` pre-filled via URL search param. + +### UI — `InboxPage.tsx` + +**Filter bar (topnavbar-style, left-to-right):** + +| Filter | Values | Default | +|---|---|---| +| Severity (ButtonGroup multi) | CRITICAL · WARNING · INFO | none (= no filter) | +| Status (ButtonGroup multi) | PENDING · FIRING · RESOLVED | FIRING selected | +| Hide acked (Toggle) | on/off | **on** | +| Hide read (Toggle) | on/off | **on** | + +Default state: "Show me firing things nobody's touched." Matches the "what needs attention" mental model. + +**Row actions column** (right-aligned, shown on hover or always for the touched row): +- `Acknowledge` (when `acked_at IS NULL`) +- `Mark read` (when `read_at IS NULL`) +- `Silence rule…` (opens quick menu: `1h / 8h / 24h / Custom…`) +- `Delete` (trash icon, OPERATOR+ only). Soft-delete. Undo toast for 5s invalidates the mutation. + +**Bulk toolbar** (shown when selection > 0, above table): +- `Acknowledge N` (filters to unacked) +- `Mark N read` (filters to unread) +- `Silence rules` (silences every unique ruleId in selection — duration menu) +- `Delete N` (OPERATOR+) — opens confirmation modal: "Delete N alerts? This affects all users." + +**Deleted/dropped files:** +- `ui/src/pages/Alerts/AllAlertsPage.tsx` — removed +- `ui/src/pages/Alerts/HistoryPage.tsx` — removed +- `/alerts/all` and `/alerts/history` route definitions in `router.tsx` — removed + +**Sidebar:** +`buildAlertsTreeNodes` in `ui/src/components/sidebar-utils.ts` trims to: Inbox · Rules · Silences. "Inbox" stays as the first child; selecting `/alerts` (bare) goes to inbox. + +**CMD-K:** `buildAlertSearchData` still registers `alert` and `alertRule` categories, but the alert-category deep links all point to `/alerts/inbox/{id}` (single detail route). + +### Tests + +Backend: +- `AlertControllerTest` — new `acked` + `read` filter cases, new DELETE + bulk-delete + bulk-ack tests, 404 on soft-deleted instance. +- `PostgresAlertInstanceRepositoryTest` — `markRead`/`bulkMarkRead`/`softDelete`/`bulkSoftDelete` SQL, index predicate correctness. +- V17 migration test: seed an `ACKNOWLEDGED` row pre-migration, verify post-migration state and index. +- Every test using `AlertState.ACKNOWLEDGED` — removed or switched to `acked_at IS NOT NULL` assertion. +- `AlertNotifierTest` — confirm no regression on notification emission paths. + +UI: +- `InboxPage.test.tsx` — filter toggles, select-all, row actions, bulk actions, optimistic delete + undo. +- `enums.test.ts` snapshot — `AlertState` drops ACKNOWLEDGED, new filter option arrays added. +- Silence duration menu component test. + +### Docs / rules updates + +- `.claude/rules/app-classes.md`: + - `AlertController` endpoint list updated (new DELETE, bulk-delete, bulk-ack; `acked`/`read` filter params; `ACKNOWLEDGED` removed from allowed state). + - Drop `AlertReadRepository` from `security/` or repository listings. +- `.claude/rules/ui.md`: + - Alerts section: remove "All" and "History" pages, drop their routes. Rewrite Inbox description to new filter bar + actions. + - Note: unread-count bell now global. +- `.claude/rules/core-classes.md`: + - `AlertState` enum values reduced to three. + - Note `alert_reads` table is retired. +- `CLAUDE.md`: + - New migration entry: `V17 — Alerts: drop ACKNOWLEDGED state, add read_at/deleted_at, drop alert_reads, rework open-rule index.` + +## Risk / open concerns + +1. **Enum-type swap on a populated table.** `ALTER COLUMN TYPE … USING cast::text::enum_v2` rewrites every row. `alert_instances` is expected to remain bounded (RESOLVED rows age out via retention), but on large installs this should run during a low-traffic window. Migration is idempotent. +2. **Concurrent ack/delete races.** Both are simple column updates with `WHERE id=? AND deleted_at IS NULL`; last-write wins is acceptable per the "no individual tracking" decision. +3. **Notification context mustache variables.** No change — `alert.state` shape is unchanged; templates referencing `state=ACKNOWLEDGED` are user-authored and will start producing no matches after the migration, which is intentional. Add a release note. +4. **CMD-K deep links** to deleted alert ids return 404 now (they did before for missing, now also for soft-deleted). Acceptable. + +## Acceptance + +- Single inbox at `/alerts/inbox` with four filter dimensions wired end-to-end. +- Silence-rule menu works from row + bulk. +- Soft-delete works from row + bulk, with OPERATOR+ guard and undo toast for single. +- Unread count bell reflects global `read_at IS NULL`. +- All existing backend/UI tests green; new test coverage as listed above. +- V17 up-migrates ACKNOWLEDGED rows cleanly; reviewer can verify with a seeded pre-migration snapshot.