Collapse /alerts/inbox, /alerts/all, /alerts/history into a single
filterable inbox. Drop ACKNOWLEDGED from AlertState; add read_at and
deleted_at as orthogonal timestamp flags. Retire per-user alert_reads
tracking. Add Silence-rule and Delete row/bulk actions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.
Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
(scope, eventTypes, withinSeconds). Compact ctor rejects empty
eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
the server-emitted types in `AgentRegistrationController` and
`AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
ASC)` used by the evaluator. Implemented on
`ClickHouseAgentEventRepository` with tenant + env filter mandatory.
App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
window and returns `EvalResult.Batch` with one `Firing` per row.
Every Firing carries a canonical `_subjectFingerprint` of
`"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
deserialization time — matches the existing policy of keeping
controller validators focused on env-scoped / SQL-injection concerns.
Schema:
- V16 migration generalises the V15 per-exchange discriminator on
`alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
fallback to the legacy `exchange.id` expression. Scalar kinds still
resolve to `''` and keep one-open-per-rule. Duplicate-key path in
`PostgresAlertInstanceRepository.save` is unchanged — the index is
the deduper.
UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
the six allowed event types + `withinSeconds` input. Wired into
`ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.
Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.
Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
Inbox: replace 4 parallel outlined buttons with 2 context-aware ones.
When nothing is selected → "Acknowledge all firing" (primary) + "Mark all
read" (secondary). When rows are selected → the same slots become
"Acknowledge N" + "Mark N read" with counts inlined. Primary variant
gives the foreground action proper visual weight; secondary is the
supporting action. No more visually-identical disabled buttons cluttering
the bar.
History: drop the local DateRangePicker. The page now reads
`timeRange` from `useGlobalFilters()` so the top-bar TimeRangeDropdown
(1h / 3h / 6h / Today / 24h / 7d / custom) is the single source of
truth, consistent with every other time-scoped page in the app.
Replace the SegmentedTabs with multi-select ButtonGroup, matching the
topnavbar Completed/Warning/Failed/Running pattern. State dots use the
same palette as AlertStateChip (FIRING=error, ACKNOWLEDGED=warning,
PENDING=muted, RESOLVED=success). Default selection is the three "open"
states — Resolved is off by default and a single click surfaces closed
alerts without navigating to /history.
Backend: `GET /environments/{envSlug}/alerts` now accepts optional multi-value
`state=…` and `severity=…` query params. Filters are pushed down to
PostgresAlertInstanceRepository, which appends `AND state::text = ANY(?)` /
`AND severity::text = ANY(?)` to the inbox query (null/empty = no filter).
`AlertInstanceRepository.listForInbox` gained a 7-arg overload; the old 5-arg
form is preserved as a default delegate so existing callers (evaluator,
AlertingFullLifecycleIT, PostgresAlertInstanceRepositoryIT) compile unchanged.
`InAppInboxQuery.listInbox` also has a new filtered overload.
UI: InboxPage severity filter migrated from `SegmentedTabs` (single-select,
no color cues) to `ButtonGroup` (multi-select with severity-coloured dots),
matching the topnavbar status-filter pattern. `useAlerts` forwards the
filters as query params and cache-keys on the filter tuple so each combo
is independently cached.
Unit + hook tests updated to the new contract (5 UI tests + 8 Java unit
tests passing). OpenAPI types regenerated from the fresh local backend.
Round 4 smoke feedback on /alerts:
- Bell now has consistent 12px gap from env selector and user name
(wrap env + bell in flex container inside TopBar's environment prop)
- RuleEditorWizard constrained to max-width 840px (centered) and
upgraded the page title from SectionHeader to h2 pattern used by
the list pages
- Inbox: added select-all checkbox, severity SegmentedTabs filter
(All / Critical / Warning / Info), and bulk-ack actions
(Acknowledge selected + Acknowledge all firing) alongside the
existing mark-read actions
Surfaced during second smoke:
1. Notification bell moved — was first child of TopBar (left of
breadcrumb); now rendered inside the `environment` slot so it
sits between the env selector and the user menu, matching user
expectations.
2. Content tabs (Exchanges/Dashboard/Runtime/Deployments) hidden on
`/alerts/*` — the operational tabs don't apply there.
3. Inbox / All alerts filters now actually filter. `AlertController.list`
accepts only `limit` — `state`/`severity` query params are dropped
server-side. Move `useAlerts` to fetch once per env (limit 200) and
apply filters client-side via react-query `select`, with a stable
queryKey so filter toggles are instant and don't re-request. True
server-side filter needs a backend change (follow-up).
4. Novice-friendly labels:
- Inbox subtitle: "99 firing · 100 total" → "99 need attention ·
100 total in inbox"
- All alerts filter: Open/Firing/Acked/All →
"Currently open"/"Firing now"/"Acknowledged"/"All states"
- All alerts subtitle: "N shown" → "N matching your filter"
- History subtitle: "N resolved" → "N resolved alert(s) in range"
- Rules subtitle: "N total" → "N rule(s) configured"
- Silences subtitle: "N active" → "N active silence(s)" or
"Nothing silenced right now"
- Column headers: "State" → "Status", rules "Kind" → "Type",
rules "Targets" → "Notifies"
- Buttons: "Ack" → "Acknowledge", silence "End" → "End early"
Updated alerts.test.tsx and e2e selector to match new behavior/labels.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Visual regressions surfaced during browser smoke:
1. Page headers — `SectionHeader` renders as 12px uppercase gray (a
section divider, not a page title). Replace with proper h2 title
+ inline subtitle (`N firing · N total` etc.) and right-aligned
actions, styled from `alerts-page.module.css`.
2. Undefined `--space-*` tokens — the project (and `@cameleer/design-system`)
has never shipped `--space-sm|md|lg|xl`, even though many modules
(SensitiveKeysPage, alerts CSS, …) reference them. The fallback
to `initial` silently collapsed gaps/paddings to 0. Define the
scale in `ui/src/index.css` so every consumer picks it up.
3. List scrolling — DataTable was using default pagination, but with
no flex sizing the whole page scrolled. Add `fillHeight` and raise
`pageSize`/list `limit` to 200 so the table gets sticky header +
internal scroll + pinned pagination footer (Gmail-style). True
cursor-based infinite scroll needs a backend change (filed as
follow-up — `/alerts` only accepts `limit` today).
4. Title column clipping — `.titlePreview` used `white-space: nowrap`
+ fixed `max-width`, truncating message mid-UUID. Switch to a
2-line `-webkit-line-clamp` so full context is visible.
5. Notification bell badge invisible — `NotificationBell.module.css`
referenced undefined tokens (`--fg`, `--hover-bg`, `--bg`,
`--muted`). Map to real DS tokens (`--text-primary`, `--bg-hover`,
`#fff`, `--text-muted`). The admin user currently sees no badge
because the backend `/alerts/unread-count` returns 0 (read
receipts) — that's data, not UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two inline-style color refs in NotifyStep and TriggerStep were still
pointing at the undefined --muted token instead of the DS
--text-muted. Caught by the design-system-alignment verification
grep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Rules list Delete and Silences End-early flows now use DS
ConfirmDialog instead of native confirm(). Update selectors to
target the dialog's role=dialog + confirm button instead of
listening for the native `dialog` event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote banner and prefill warnings now render as DS Alert components
(info / warning variants). Step body wraps in sectionStyles.section
for card affordance matching other forms in the app.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The feed-row component is replaced by DataTable column renderers and
the shared renderAlertExpanded content renderer. No callers remain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces raw <table> with DataTable, inline-styled form with proper
FormField hints, and native confirm() end-early with ConfirmDialog
(warning variant). Adds DS EmptyState for no-silences case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces raw <table> with DataTable, raw <select> promote control with
DS Dropdown, and native confirm() delete with ConfirmDialog. Adds DS
EmptyState with CTA for the no-rules case. Uses SectionHeader's
action slot instead of ad-hoc flex wrapper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces custom feed rows with DataTable. Adds a DateRangePicker
filter (client-side) defaulting to the last 7 days. Client-side
range filter is a stopgap; a server-side range param is a future
enhancement captured in the design spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces 4-Button filter row with DS SegmentedTabs and custom row
rendering with DataTable. Shares expandedContent renderer and
severity-driven rowAccent with Inbox.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces custom feed-row layout with the shared DataTable shell used
elsewhere in the app. Adds checkbox selection + bulk "Mark selected
read" toolbar alongside the existing "Mark all read". Uses DS
EmptyState for empty lists, severity-driven rowAccent for unread
tinting, and renderAlertExpanded for row detail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the feed-row classes (.row, .rowUnread, .body, .meta, .time,
.message, .actions, .empty) — these are replaced by DS DataTable +
EmptyState in follow-up tasks. Keep layout helpers for page shell,
toolbar, filter bar, bulk-action bar, title cell, and DataTable
expanded content. All colors / spacing use DS tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the per-row detail block used by Inbox/All/History DataTables
so the three pages share one rendering. Consumes AlertDto fields that
are nullable in the schema; hides missing fields instead of rendering
placeholders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Formats ISO timestamps as `Nm ago` / `Nh ago` / `Nd ago`, falling back
to an absolute locale date string for values older than 30 days. Used
by the alert DataTable Age column.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure function mapping the 3-value AlertDto.severity enum to the 2-value
DataTable rowAccent prop. INFO maps to undefined (no tint) because the
DS DataTable rowAccent only supports error|warning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-by-task TDD plan implementing the design spec. Splits the work
into 14 tasks: helper utilities (TDD), shared renderer, CSS token
migration, per-page rewrites (Inbox/All/History/Rules/Silences),
wizard banner migration, AlertRow deletion, E2E adaptation for
ConfirmDialog, and full verification pass. Each task produces an
atomic commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rework all pages under /alerts to use @cameleer/design-system components
and tokens. Unified DataTable shell for Inbox/All/History with expandable
rows; DataTable + Dropdown + ConfirmDialog for Rules list; FormField grid
+ DataTable for Silences; DS Alert for wizard banners. Replaces undefined
CSS variables (--bg, --fg, --muted, --accent) with DS tokens and removes
raw <table>/<select>/confirm() usage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V13 added a partial unique index on alert_instances(rule_id) WHERE state
IN (PENDING,FIRING,ACKNOWLEDGED). Correct for scalar condition kinds
(ROUTE_METRIC / AGENT_STATE / DEPLOYMENT_STATE / LOG_PATTERN / JVM_METRIC
/ EXCHANGE_MATCH in COUNT_IN_WINDOW) but wrong for EXCHANGE_MATCH /
PER_EXCHANGE, which by design emits one alert_instance per matching
exchange. Under V13 every PER_EXCHANGE tick with >1 match logged
"Skipped duplicate open alert_instance for rule …" at evaluator cadence
and silently lost alert fidelity — only the first matching exchange per
tick got an AlertInstance + webhook dispatch.
V15 drops the rule_id-only constraint and recreates it with a
discriminator on context->'exchange'->>'id'. Scalar kinds emit
Map.of() as context, so their expression resolves to '' — "one open per
rule" preserved. ExchangeMatchEvaluator.evaluatePerExchange always
populates exchange.id, so per-exchange instances coexist cleanly.
Two new PostgresAlertInstanceRepositoryIT tests:
- multiple open instances for same rule + distinct exchanges all land
- second open for identical (rule, exchange) still dedups via the
DuplicateKeyException fallback in save() — defense-in-depth kept
Also fixes pre-existing PostgresAlertReadRepositoryIT brokenness: its
setup() inserted 3 open instances sharing one rule_id, which V13 blocked
on arrival. Migrate to one rule_id per instance (pattern already used
across other storage ITs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on three bug classes from Plan 03 triage:
context-load regressions (missing @Autowired), UI/backend drift
on template variables, and hand-maintained TS enum unions caused
by springdoc polymorphic schema quirk.
Covers 5 tasks: context-startup smoke test, template-variables
SSOT endpoint, second Playwright spec, String-to-enum migrations
on 5 condition fields, and @DiscriminatorMapping on AlertCondition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The rule editor wizard reset the condition payload on kind-change without
seeding a fireMode default; the ExchangeMatchCondition ctor allowed null to
pass through; AlertEvaluatorJob then NPE-looped every tick on a saved rule.
- core: compact ctor now rejects null fireMode (Jackson-deser path only — all
production callers already pass a value).
- V14: repair existing EXCHANGE_MATCH rows with fireMode=null to
PER_EXCHANGE + perExchangeLingerSeconds=300 (default matches the wizard).
- ui: ConditionStep.onKindChange seeds EXCHANGE_MATCH defaults so the
Select's displayed fallback ("Per exchange") is actually in form state.
- ui: validateStep('condition', ...) now enforces fireMode presence + the
mode-specific fields before the user reaches Review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to 83837ada addressing the critical-review feedback:
- Duplicate ConditionKind type consolidated: the one in
api/queries/alertRules.ts (which was nullable — wrong) is gone;
single source of truth lives in this module.
- Module moved out of api/ into pages/Alerts/ where it belongs.
api/ is the data layer; labels + hide lists are view-layer concerns.
- Hidden values formalised: Comparator.EQ and JvmAggregation.LATEST
are intentionally not surfaced in dropdowns (noisy / wrong feature
boundary, see in-file comments). They remain in the type unions so
rules that carry those values save/load correctly — we just don't
advertise them in the UI.
- JvmAggregation declaration order restored to MAX/AVG/MIN (matches
what users saw before 83837ada). LATEST declared last; hidden.
- Snapshot tests for every visible *_OPTIONS array — reviewer signal
in future PRs when a backend enum change or hide-list edit
silently reshapes the dropdown.
- `toOptions` gains a JSDoc noting that label-map declaration order
is load-bearing (ES2015 Object.keys insertion-order guarantee).
- **Honest about the springdoc schema quirk**: the generated
polymorphic condition types resolve to `never` at the TypeScript
level (two conflicting `kind` discriminators — the class-name
literal and the Jackson enum — intersect to never), which silently
defeated `Record<T, string>` exhaustiveness. The previous commit's
"schema-derived enums" claim was accurate only for the flat-field
enums (ConditionKind, Severity, TargetKind); condition-specific
enums (RouteMetric, Comparator, JvmAggregation, ExchangeFireMode)
were silently `never`. Those are now declared as hand-written
string-literal unions with a top-of-file comment spelling out the
issue and the regen-and-compare workflow. Real upstream fix is a
backend-side adjustment to how springdoc emits polymorphic
`@JsonSubTypes` — out of scope for this phase.
Verified: ui build green, 56/56 vitest pass (49 pre-existing + 7
new enum snapshots).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes item 5 on the Plan 03 cleanup triage. The option arrays
("METRICS", "COMPARATORS", KIND_OPTIONS, SEVERITY_OPTIONS, FIRE_MODES)
scattered across RouteMetricForm / JvmMetricForm / ExchangeMatchForm /
ConditionStep / ScopeStep were hand-typed string literals. They drifted
silently — P95_LATENCY_MS appeared in a dropdown without a backend
counterpart (caught at runtime in bcde6678); JvmMetric.LATEST and
Comparator.EQ existed on the backend but were missing from the UI all
along.
Fix: new `ui/src/api/alerting-enums.ts` derives every enum from
schema.d.ts and pairs each with a `Record<T, string>` label map.
TypeScript enforces exhaustiveness — adding or removing a backend
value fails the build of this file until the label map is updated.
Every consumer imports the generated `*_OPTIONS` array.
Covered (schema-derived):
- ConditionKind → CONDITION_KIND_OPTIONS
- Severity → SEVERITY_OPTIONS
- RouteMetric → ROUTE_METRIC_OPTIONS
- Comparator → COMPARATOR_OPTIONS (adds EQ that was missing)
- JvmAggregation → JVM_AGGREGATION_OPTIONS (adds LATEST that was missing)
- ExchangeMatch.fireMode → EXCHANGE_FIRE_MODE_OPTIONS
- AlertRuleTarget.kind → TARGET_KIND_OPTIONS
form-state.ts: `severity: 'CRITICAL' | 'WARNING' | 'INFO'` and
`kind: 'USER' | 'GROUP' | 'ROLE'` literal unions swapped for the
derived `Severity` / `TargetKind` aliases.
Not covered, backend types them as `String` (no `@Schema(allowableValues)`
annotation yet):
- AgentStateCondition.state
- DeploymentStateCondition.states
- LogPatternCondition.level
- ExchangeFilter.status
- JvmMetricCondition.metric
These stay hand-typed with a pointer-comment. Follow-up: add
`@Schema(allowableValues = …)` to the Java record components so the
enums land in schema.d.ts; then fold them into alerting-enums.ts.
Plus: gitnexus index-stats refresh in AGENTS.md/CLAUDE.md from the
post-deploy reindex.
Verified: ui build green, 49/49 vitest pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the UiAuth / Oidc / UserAdmin controllers were aligned to store
bare user_ids, the rules that future sessions read were still describing
the old behaviour (OutboundConnectionAdminController "strips user:
prefix" — true mechanically but the subtlety is that the strip is
the bridge between a prefixed JWT subject and an unprefixed DB key,
not a hack).
- CLAUDE.md: expand the User persistence one-liner to state the
convention authoritatively (local `<username>`, OIDC `oidc:<sub>`,
JWT `user:` namespace, env-scoped controllers strip for FK).
- .claude/rules/app-classes.md:
- Add "User ID conventions" section near the top that spells out
write-path vs read-path behaviour in one place.
- Add UiAuthController + OidcAuthController entries under
security/ with their upsert shape documented.
- Soften the OutboundConnectionAdminController line to reference
the convention instead of restating the mechanism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the UiAuthController fix: every write path that puts a row
into users/user_roles/user_groups must use the bare DB key, because
the env-scoped controllers (Alert, AlertRule, AlertSilence, Outbound)
strip "user:" before using the name as an FK. If the write path stores
prefixed, first-time alerting/outbound writes fail with
alert_rules_created_by_fkey violation.
UiAuthController shipped the model in the prior commit (bare userId
for all DB/RBAC calls, "user:"-namespaced subject for JWT signing).
Bringing the other two write paths in line:
- OidcAuthController.callback:
userId = "oidc:" + oidcUser.subject() // DB key, no "user:"
subject = "user:" + userId // JWT subject (namespaced)
All userRepository / rbacService / applyClaimMappings calls use
userId. Tokens still carry the namespaced subject so
JwtAuthenticationFilter can distinguish user vs agent tokens.
- UserAdminController.createUser: userId = request.username() (bare).
resetPassword: dropped the "user:"-strip fallback that was only
needed because create used to prefix — now dead.
No migration. Greenfield alpha product — any pre-existing prefixed
rows in a dev DB will become orphans on next login (login upserts
the unprefixed row, old prefixed row is harmless but unused).
Operators doing a clean re-index can wipe the DB.
Read-path controllers still strip — harmless for bare DB rows, and
OIDC humans (JWT sub "user:oidc:<s>") still resolve correctly to
the new DB key "oidc:<s>" after stripping.
Verified: 45/45 alerting + outbound ITs pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-commit stats: 8524 nodes / 22174 edges / 415 clusters / 300 flows
(up from 8513/22146/409/300 after the five cleanup-batch commits).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the mismatch that prompted the one-shot cameleer-seed
docker service: UiAuthController stored users.user_id as the JWT
subject "user:admin" (JWT sub format). Every env-scoped controller
(Alert, AlertSilence, AlertRule, OutboundConnectionAdmin) already
strips the "user:" prefix on the read path — so the rest of the
system expects the DB key to be the bare username. With UiAuth
storing prefixed, fresh docker stacks hit
"alert_rules_created_by_fkey violation" on the first rule create.
Fix: inside login(), compute `userId = request.username()` and use
it everywhere the DB/RBAC layer is touched (isLocked, getPasswordHash,
record/clearFailedLogins, upsert, assignRoleToUser, addUserToGroup,
getSystemRoleNames). Keep `subject = "user:" + userId` — we still
sign JWTs with the namespaced subject so JwtAuthenticationFilter can
distinguish user vs agent tokens.
refresh() and me() follow the same rule via a stripSubjectPrefix()
helper (JWT subject in, bare DB key out).
With the write path aligned, the docker bridge is no longer needed:
- Deleted deploy/docker/postgres-init.sql
- Deleted cameleer-seed service from docker-compose.yml
Scope: UiAuthController only. UserAdminController + OidcAuthController
still prefix on upsert — that's the bug class the triage identified
as "Option A or B either way OK". Not changing them now because:
a) prod admins are provisioned unprefixed through some other path,
so those two files aren't the docker-only failure observed;
b) stripping them would need a data migration for any existing
prod users stored prefixed, which is out of scope for a cleanup
phase. Follow-up worth scheduling if we ever wire OIDC or admin-
created users into alerting FKs.
Verified: 33/33 alerting+outbound controller ITs pass (9 outbound,
10 rules, 9 silences, 5 alert inbox).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec §13 calls for the notification bell to colour-code by highest
unread severity (CRITICAL → error, WARNING → amber, INFO → muted).
The old { count } DTO forced the UI to pick one static colour, so
NotificationBell shipped with a TODO. Grow the contract instead:
UnreadCountResponse = { total, bySeverity: { CRITICAL, WARNING, INFO } }
Guarantees:
- every severity is always present with a >=0 value (no undefined
keys on the wire), so the UI can branch without defaults.
- total = sum of bySeverity values — kept explicit on the wire for
cheap top-line display, not recomputed client-side.
Backend
- AlertInstanceRepository: replaces countUnreadForUser(long) with
countUnreadBySeverityForUser returning Map<AlertSeverity, Long>.
One SQL round-trip per (env, user) — GROUP BY ai.severity over the
same NOT EXISTS(alert_reads) filter.
- UnreadCountResponse.from(Map) normalises and defensively copies;
missing severities default to 0.
- InAppInboxQuery.countUnread now returns the DTO, caches the full
response (still 5s TTL) so severity breakdown gets the same
hit-rate as the total did before.
- AlertController just hands the DTO back.
Breaking change — no backwards-compat shim: the `count` field is
gone. UI and tests updated in the same commit; there are no other
API consumers in the tree.
Frontend
- Regenerated openapi.json + schema.d.ts against a fresh build of
the new backend.
- NotificationBell branches badge colour on the highest unread
severity (CRITICAL > WARNING > INFO) via new CSS variants.
- Tests cover all four paths: zero, critical-present, warning-only,
info-only.
Tests: 7 unit tests + 12 ITs (incl. new grouping + empty-map)
+ 49 vitest (was 46; +3 severity-branch assertions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Design spec and Plan 02 described AlertCondition polymorphism as
Id.DEDUCTION, but the code that shipped in PR #140 uses Id.NAME with
property="kind" and include=EXISTING_PROPERTY. The `kind` field is
real on every subtype and the DB stores it in a separate column
(condition_kind), so reading the discriminator directly is simpler
than deduction — update the docs to match. Also add `"kind"` to the
example JSON payloads so they match on-wire reality.
OutboundAuth (Plan 01) correctly still uses Id.DEDUCTION and is
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added as a reusable primitive during Plan 03 Task 9, but the intended
consumer (NotificationBell live-region refresh) was removed during
code review, leaving the hook unused. Delete it — YAGNI; reintroduce
when a real consumer shows up.
Verified upstream impact (gitnexus): 0 callers, LOW risk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fixtures.ts: auto-applied login fixture — visits /login?local to skip OIDC
auto-redirect, fills username/password via label-matcher, clicks 'Sign in',
then selects the 'default' env so alerting hooks enable (useSelectedEnv gate).
Override via E2E_ADMIN_USER + E2E_ADMIN_PASS.
alerting.spec.ts: 4 tests against the full docker-compose stack:
- sidebar Alerts accordion → /alerts/inbox
- 5-step wizard: defaults-only create + row delete (unique timestamp name
avoids strict-mode collisions with leftover rules)
- CMD-K palette via SearchTrigger click (deterministic; Ctrl+K via keyboard
is flaky when the canvas doesn't have focus)
- silence matcher-based create + end-early
DS FormField renders labels as generics (not htmlFor-wired), so inputs are
targeted by placeholder or label-proximity locators instead of getByLabel.
Does not exercise fire→ack→clear; that's covered backend-side by
AlertingFullLifecycleIT (Plan 02). UI E2E for that path would need event
injection into ClickHouse, out of scope for this smoke.
Alerting + outbound controllers resolve acting user via
authentication.name with 'user:' prefix stripped → 'admin'. But
UserRepository.upsert stores env-admin as 'user:admin' (JWT sub format).
The resulting FK mismatch manifests as 500 'alert_rules_created_by_fkey'
on any create operation in a fresh docker stack.
Workaround: run-once 'cameleer-seed' compose service runs psql against
deploy/docker/postgres-init.sql after the server is healthy (i.e. after
Flyway migrations have created tenant_default.users), inserting
user_id='admin' idempotently. The root-cause fix belongs in the backend
(either stop stripping the prefix in alerting/outbound controllers, or
normalise storage to the unprefixed form) and is out of scope for
Plan 03.
- RouteMetricForm dropped P95_LATENCY_MS — not in cameleer-server-core
RouteMetric enum (valid: ERROR_RATE, P99_LATENCY_MS, AVG_DURATION_MS,
THROUGHPUT, ERROR_COUNT).
- initialForm now returns a ready-to-save ROUTE_METRIC condition
(metric=ERROR_RATE, comparator=GT, threshold=0.05, windowSeconds=300),
so clicking through the wizard with all defaults produces a valid rule.
Prevents a 400 'missing type id property kind' + 400 on condition enum
validation if the user leaves the condition step untouched.
Task 29's refactor added a package-private test-friendly constructor
alongside the public production one. Without @Autowired Spring cannot pick
which constructor to use for the @Component, and falls back to searching
for a no-arg default — crashing startup with 'No default constructor found'.
Detected when launching the server via the new docker-compose stack; unit
tests still pass because they invoke the package-private test constructor
directly.
Mirrors the k8s manifests in deploy/ as a local dev stack:
- cameleer-postgres (matches deploy/cameleer-postgres.yaml)
- cameleer-clickhouse (matches deploy/cameleer-clickhouse.yaml, default CLICKHOUSE_DB=cameleer)
- cameleer-server (built from Dockerfile, env mirrors deploy/base/server.yaml)
- cameleer-ui (built from ui/Dockerfile, served on host :8080 to leave :5173 free for Vite dev)
Dockerfile + ui/Dockerfile: REGISTRY_TOKEN is now optional (empty → skip Maven/npm auth).
cameleer-common package is public, so anonymous pulls succeed; private packages still require the token.
Backend defaults tuned for local E2E:
- RUNTIME_ENABLED=false (no Docker-in-Docker deployments in dev stack)
- OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS=true (so webhook tests can target host.docker.internal etc.)
- UIUSER/UIPASSWORD=admin/admin (matches Playwright E2E_ADMIN_USER/PASS defaults)
- CORS includes both :5173 (Vite) and :8080 (nginx)
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prometheus scrapes can fire every few seconds. The open-alerts / open-rules
gauges query Postgres on each read — caching the values for 30s amortises
that to one query per half-minute. Addresses final-review NIT from Plan 02.
- Introduces a package-private TtlCache that wraps a Supplier<Long> and
memoises the last read for a configurable Duration against a Supplier<Instant>
clock.
- Wraps each gauge supplier (alerting_rules_total{enabled|disabled},
alerting_instances_total{state}) in its own TtlCache.
- Adds a test-friendly constructor (package-private) taking explicit
Duration + Supplier<Instant> so AlertingMetricsCachingTest can advance
a fake clock without waiting wall-clock time.
- Adds AlertingMetricsCachingTest covering:
* supplier invoked once per TTL across repeated scrapes
* 29 s elapsed → still cached; 31 s elapsed → re-queried
* gauge value reflects the cached result even after delegate mutates
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rejects webhook URLs that resolve to loopback, link-local, or RFC-1918
private ranges (IPv4 + IPv6 ULA fc00::/7). Enforced on both create and
update in OutboundConnectionServiceImpl before persistence; returns 400
Bad Request with "private or loopback" in the body.
Bypass via `cameleer.server.outbound-http.allow-private-targets=true`
for dev environments where webhooks legitimately point at local
services. Production default is `false`.
Test profile sets the flag to `true` in application-test.yml so the
existing ITs that post webhooks to WireMock on https://localhost:PORT
keep working. A dedicated OutboundConnectionSsrfIT overrides the flag
back to false (via @TestPropertySource + @DirtiesContext) to exercise
the reject path end-to-end through the admin controller.
Plan 01 scope; required before SaaS exposure (spec §17).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends operationalSearchData with open alerts (FIRING|ACKNOWLEDGED) and
all rules. Badges convey severity + state. Selecting an alert navigates to
/alerts/inbox/{id}; a rule navigates to /alerts/rules/{id}. Uses the
existing CommandPalette extension point — no new registry.
Fetches target-env apps (useCatalog) and env-allowed outbound
connections, passes them to prefillFromPromotion, and renders the
returned warnings in an amber banner above the step nav. Warnings list
the field name and the remediation message so users see crossings that
need manual adjustment before saving.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>