feat(alerting): Plan 03 — UI + backfills (SSRF guard, metrics caching, docker stack) #144

Merged
hsiegeln merged 39 commits from feat/alerting-03-ui into main 2026-04-20 16:27:49 +02:00
Owner

Summary

Plan 03 delivers the full alerting UI on top of Plan 02's backend, plus two backend backfills (SSRF guard on outbound URLs + 30s TTL cache on AlertingMetrics gauges) and a complete local docker-compose stack mirroring the k8s manifests in deploy/.

UI additions (all under ui/src/pages/Alerts/ + ui/src/components/):

  • /alerts/{inbox,all,history,rules,rules/new,rules/:id,silences} routes (lazy-loaded).
  • Sidebar accordion section (Inbox / All / Rules / Silences / History) + NotificationBell in TopBar that polls /alerts/unread-count every 30s (paused while the tab is hidden, via TanStack Query's refetchIntervalInBackground: false).
  • 5-step rule editor wizard (Scope / Condition / Trigger / Notify / Review) with 6 kind-specific condition sub-forms (ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC).
  • Env-promotion flow (pure client-side URL prefill with warnings for cross-env agents and disallowed outbound connections — no new REST endpoint).
  • Shared <MustacheEditor /> (CodeMirror 6) with variable autocomplete + inline linter. Registry (alert-variables.ts) mirrors NotificationContextBuilder leaves.
  • AlertStateChip, SeverityBadge, InboxPage (bulk-read), AllAlertsPage (state filter), HistoryPage, RulesListPage (enable/disable + delete + promote), SilencesPage (matcher-based create + end-early).
  • CMD-K integration — alert + alertRule result categories via the existing LayoutShell searchData extension point.
  • TanStack Query hooks: alerts.ts, alertRules.ts, alertSilences.ts, alertNotifications.ts, alertMeta.ts. All env-scoped via useSelectedEnv.

Backend backfills:

  • SsrfGuard — rejects outbound webhook URLs that resolve to loopback, link-local, RFC-1918 private ranges, or IPv6 ULA. Wired into OutboundConnectionServiceImpl.create/update. Bypass via cameleer.server.outbound-http.allow-private-targets=true for dev.
  • AlertingMetrics gauges now wrap their Postgres-backed suppliers in a 30s TTL cache so Prometheus scrapes don't produce per-scrape DB queries (final-review NIT from Plan 02).
  • Hotfix during E2E: @Autowired on the AlertingMetrics production constructor so Spring picks it over the package-private test-friendly one (Task 29 refactor had introduced ambiguity).

Docker stack (new docker-compose.yml mirroring deploy/):

  • cameleer-postgres (matches deploy/cameleer-postgres.yaml).
  • cameleer-clickhouse (matches deploy/cameleer-clickhouse.yaml; CLICKHOUSE_DB=cameleer).
  • cameleer-server built from the repo Dockerfile (REGISTRY_TOKEN now optional — cameleer-common is public).
  • cameleer-ui built from ui/Dockerfile and served on host :8080, leaving :5173 free for Vite dev (npm run dev:local).
  • cameleer-seed one-shot service that seeds user_id='admin' in tenant_default.users after the server is healthy, bridging a pre-existing FK mismatch between UserRepository storage (prefixed user:admin) and alerting-controller usage (stripped admin). The root-cause fix belongs in a future backend cleanup.

Docs + rules:

  • .claude/rules/ui.md — Alerts section mapping every Plan 03 UI surface.
  • docs/alerting.md — UI walkthrough (sidebar / bell / wizard / Mustache autocomplete / env promotion / CMD-K).

Plan + spec

  • Spec: docs/superpowers/specs/2026-04-19-alerting-design.md §9, §12, §13, §17.
  • Plan: docs/superpowers/plans/2026-04-20-alerting-03-ui.md.

Supersedes chore/openapi-regen-post-plan02 — delete that branch after merge.

Test plan

  • Frontend unit suites: 47/47 pass across 15 files (cd ui && npm test). Covers query hooks, CM6 completion + linter, Mustache variable registry, wizard form-state, promotion prefill, AlertStateChip, SeverityBadge, NotificationBell, usePageVisible.
  • Frontend TypeScript clean: cd ui && npx tsc -p tsconfig.app.json --noEmit → zero errors.
  • Frontend build succeeds: cd ui && npm run build (RuleEditorWizard chunk ~120 KB gzip incl. CM6).
  • Backend Plan 03 suites: 11/11 pass (mvn -pl cameleer-server-app -am test -Dtest='SsrfGuardTest,AlertingMetricsCachingTest,OutboundConnectionSsrfIT' -Dsurefire.failIfNoSpecifiedTests=false): 8 SsrfGuard + 2 AlertingMetrics caching + 1 SSRF admin-controller IT.
  • Regression: existing OutboundConnectionAdminControllerIT 9/9 pass with allow-private-targets=true in test profile.
  • Playwright E2E: 4/4 pass against the docker stack (cd ui && npx playwright test). Covers sidebar nav, rule CRUD via wizard, CMD-K open/close, silence create + end-early.
  • Pending: manual smoke on a real deployment (post-merge).
  • Pending: follow-up backend cleanup to unify UserRepository storage with the alerting/outbound controllers' prefix stripping, so the compose seeder becomes redundant.

End-to-end fire → ack → clear is covered server-side by Plan 02's AlertingFullLifecycleIT. UI E2E for that path would require event injection into ClickHouse and is out of scope.

🤖 Generated with Claude Code

claude added 39 commits 2026-04-20 16:20:07 +02:00
32 tasks across 10 phases:
 - Foundation: Vitest, CodeMirror 6, Playwright scaffolding + schema regen.
 - API: env-scoped query hooks for alerts/rules/silences/notifications.
 - Components: AlertStateChip, SeverityBadge, NotificationBell (with tab-hidden poll pause), MustacheEditor (CM6 with variable autocomplete + linter).
 - Routes: /alerts/* section with sidebar accordion; bell mounted in TopBar.
 - Pages: Inbox / All / History / Rules (with env promotion) / Silences.
 - Wizard: 5-step editor with kind-specific condition forms + test-evaluate + render-preview + prefill warnings.
 - CMD-K: alerts + rules sources via LayoutShell extension.
 - Backend backfills: SSRF guard on outbound URL + 30s AlertingMetrics gauge cache.
 - Final: Playwright smoke, .claude/rules/ui.md + admin-guide updates, full build/test/PR.

Decisions: CM6 over Monaco/textarea (90KB gzipped, ARIA-conformant); CMD-K extension via existing LayoutShell searchData (not a new registry); REST-API-driven tests per project test policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prepares for Plan 03 unit tests (MustacheEditor, NotificationBell, wizard step
validation). Adds a jsdom environment and jest-dom matchers; a canary test
verifies the wiring.
Install @codemirror/{view,state,autocomplete,commands,language,lint}
and @lezer/common — needed by Phase 3's MustacheEditor (Task 13).
CM6 was picked over a raw textarea for its small bundle with incremental
rendering, full ARIA/keyboard support, and pluggable autocomplete +
linter APIs that map cleanly to Mustache token parsing.

Add ui/playwright.config.ts wiring Task 30's E2E smoke:
- testDir ./src/test/e2e, single worker, trace+screenshot on failure
- webServer launches `npm run dev:local` (backend on :8081 required)
- PLAYWRIGHT_BASE_URL env var skips the dev server for CI against a
  pre-deployed UI

Add test:e2e / test:e2e:ui npm scripts and exclude Playwright's
test-results/ and playwright-report/ from git. @playwright/test
itself was already in devDependencies from an earlier task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fetched from http://192.168.50.86:30090/api/v1/api-docs via
`npm run generate-api:live`. Adds TypeScript types for the new alerting
REST surface merged in #140:

- 15 alerting paths under /environments/{envSlug}/alerts/** (rules CRUD,
  enable/disable, render-preview, test-evaluate, inbox, unread-count,
  ack/read/bulk-read, silences CRUD, per-alert notifications)
- 1 flat notification retry path /alerts/notifications/{id}/retry
- 4 outbound-connection admin paths (from Plan 01 #139)

Verified tsc -p tsconfig.app.json --noEmit exits 0 — no existing SPA
call sites break against the fresh types. Plan 03 UI work can consume
these directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds env-scoped hooks for the alerts inbox:
- useAlerts (30s poll, background-paused, filter-aware)
- useAlert, useUnreadCount (30s poll)
- useAckAlert, useMarkAlertRead, useBulkReadAlerts (mutations that
  invalidate the alerts query key tree + unread-count)

Test file uses .tsx because the QueryClientProvider wrapper relies on
JSX; vitest picks up both .ts and .tsx via the configured include glob.
Client mock targets the actual export name (`api` in ../client) rather
than the `apiClient` alias that alertMeta re-exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan 03 prose had 'alertInstanceIds'; backend record is 'instanceIds'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Severity colors follow the @cameleer/design-system convention (CRITICAL->error,
WARNING->warning, INFO->auto). A Silenced pill stacks next to the state chip for
the spec §8 audit-trail surface.
Adds a header bell component linking to /alerts/inbox with an unread-count
badge for the selected environment. Polling pauses when the tab is hidden
via TanStack Query's refetchIntervalInBackground:false (already set on
useUnreadCount); the new usePageVisible hook gives components a signal that
re-renders them on visibility change, for future defense-in-depth.

Plan-prose deviation: the plan assumed UnreadCountResponse carries a
bySeverity map for per-severity badge coloring, but the backend DTO only
exposes a scalar `count`. The bell reads `data?.count` and renders a single
var(--error) tint; a TODO references spec §13 for future per-severity work
that would require expanding the DTO.

Tests: usePageVisible toggles on visibilitychange events; NotificationBell
renders the bell with no badge at count=0 and shows "3" at count=3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NotificationBell used a usePageVisible() subscription that re-rendered on
every visibilitychange without consuming the value. TanStack Query's
refetchIntervalInBackground:false already pauses polling; the extra
subscription was speculative generality. Dropped the import + call + JSDoc
reference; usePageVisible hook + test retained as a reusable primitive.

Also: alerts.test.tsx 'returns the server payload unmodified' asserted a
pre-plan {total, bySeverity} shape, but UnreadCountResponse is actually
{count}. Fixed mock + assertion to {count: 3}.
ALERT_VARIABLES mirrors the spec §8 context map. availableVariables(kind)
returns the kind-specific filter (always vars + kind vars). extractReferences
+ unknownReferences drive the inline amber linter. Any variable the backend
NotificationContext adds must be added here too.
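A minimal sketch of how a kind-filtered registry and the two reference helpers could fit together, assuming a flat list of dotted paths. Only the route.* and log.* leaves come from this PR; `alert.severity` is an illustrative "always" entry, and the shapes of `AlertVariable` and `ALERT_VARIABLES` here are hypothetical, not the contents of alert-variables.ts. Mustache sections (`{{#...}}`) are not handled.

```typescript
// Hypothetical sketch of a kind-filtered Mustache variable registry.
type ConditionKind =
  | "ROUTE_METRIC" | "EXCHANGE_MATCH" | "AGENT_STATE"
  | "DEPLOYMENT_STATE" | "LOG_PATTERN" | "JVM_METRIC";

interface AlertVariable {
  path: string;
  // "always" = available for every condition kind.
  kinds: ConditionKind[] | "always";
}

// route.* and log.* come from the PR text; alert.severity is illustrative only.
const ALERT_VARIABLES: AlertVariable[] = [
  { path: "alert.severity", kinds: "always" },
  { path: "route.id", kinds: ["ROUTE_METRIC", "EXCHANGE_MATCH"] },
  { path: "route.uri", kinds: ["ROUTE_METRIC", "EXCHANGE_MATCH"] },
  { path: "log.pattern", kinds: ["LOG_PATTERN"] },
  { path: "log.matchCount", kinds: ["LOG_PATTERN"] },
];

// "always" vars plus the vars registered for this kind.
function availableVariables(kind: ConditionKind): string[] {
  return ALERT_VARIABLES
    .filter((v) => v.kinds === "always" || v.kinds.includes(kind))
    .map((v) => v.path);
}

// Collects the trimmed name inside every closed {{ ... }} tag.
function extractReferences(template: string): string[] {
  const refs: string[] = [];
  const tag = /\{\{\s*([^{}]*?)\s*\}\}/g;
  let m: RegExpExecArray | null;
  while ((m = tag.exec(template)) !== null) {
    const name = m[1];
    if (name) refs.push(name);
  }
  return refs;
}

// References the current kind cannot render; these feed the amber warnings.
function unknownReferences(template: string, kind: ConditionKind): string[] {
  const allowed = new Set(availableVariables(kind));
  return extractReferences(template).filter((ref) => !allowed.has(ref));
}
```

With this shape, `unknownReferences("{{ route.uri }} {{ log.logger }}", "ROUTE_METRIC")` flags only `log.logger`, which is exactly the renamed leaf the alignment commit removed.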
The plan prose listed spec §8's idealized leaves, but the backend
NotificationContext only emits a subset:
  ROUTE_METRIC / EXCHANGE_MATCH → route.id + route.uri (uri added)
  LOG_PATTERN → log.pattern + log.matchCount (renamed from log.logger/level/message)
  app.slug / app.id → scoped to non-env kinds (removed from 'always')
  exchange.link / alert.comparator / alert.window / app.displayName → removed (backend doesn't emit)

Without this alignment the Task 11 linter would (1) flag valid route.uri as
unknown, (2) suggest log.{logger,level,message} as valid paths that render
empty, and (3) flag app.slug on env-wide rules.
Completion fires after {{ and narrows as the user types; apply() closes the
tag automatically. The linter raises an error on an unclosed {{ and a warning on
references that aren't in the allowed-variable set for the current condition
kind. The kind-specific allowed set comes from availableVariables().
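The completion and lint rules above can be sketched without CodeMirror as two pure functions. This is a hypothetical illustration, not the PR's source: `lintMustache` and `mustacheCompletions` are invented names, the diagnostic objects only mirror the `{ from, to, severity, message }` shape that @codemirror/lint uses, and `apply` is modelled as a plain string rather than a function.

```typescript
// Hypothetical sketch; nothing here imports CodeMirror.
interface Diagnostic {
  from: number;
  to: number;
  severity: "error" | "warning";
  message: string;
}

interface Completion {
  label: string;
  apply: string; // inserted text, with the closing }} appended
}

function lintMustache(doc: string, allowed: ReadonlySet<string>): Diagnostic[] {
  const out: Diagnostic[] = [];
  let pos = 0;
  while (true) {
    const open = doc.indexOf("{{", pos);
    if (open === -1) break;
    const close = doc.indexOf("}}", open + 2);
    if (close === -1) {
      // Unclosed tag: hard error from the {{ to the end of the document.
      out.push({ from: open, to: doc.length, severity: "error", message: "Unclosed {{" });
      break;
    }
    const name = doc.slice(open + 2, close).trim();
    if (!allowed.has(name)) {
      // Valid syntax but not renderable for this condition kind: soft warning.
      out.push({ from: open, to: close + 2, severity: "warning", message: `Unknown variable: ${name}` });
    }
    pos = close + 2;
  }
  return out;
}

// Offers candidates only while the cursor sits inside an open {{ tag,
// narrowing by the prefix typed so far.
function mustacheCompletions(beforeCursor: string, allowed: readonly string[]): Completion[] | null {
  const open = beforeCursor.lastIndexOf("{{");
  if (open === -1 || beforeCursor.indexOf("}}", open) !== -1) return null;
  const typed = beforeCursor.slice(open + 2).trimStart();
  return allowed
    .filter((v) => v.startsWith(typed))
    .map((v) => ({ label: v, apply: `${v}}}` }));
}
```

Splitting severity this way keeps broken syntax (error) distinct from a reference that merely renders empty for the chosen kind (warning).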
Wires the mustache-completion source and mustache-linter into a CodeMirror 6
EditorView. Accepts kind (filters variables) and reducedContext (env-only for
connection URLs). singleLine prevents newlines for URL/header fields. The host
ref syncs when the parent replaces the value (e.g. promotion prefill).
Adds 6 lazy-loaded route entries for the alerting UI (Inbox, All, History,
Rules list, Rule editor wizard, Silences) plus an `/alerts` → `/alerts/inbox`
redirect. Page components are placeholder stubs to be replaced in Phase 5/6/7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `buildAlertsTreeNodes` to sidebar-utils and renders an Alerts section
between Applications and Starred in LayoutShell. The section uses an
accordion pattern — entering `/alerts/*` collapses apps/admin/starred and
restores their state on leave.
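The accordion save/restore behaviour can be sketched as a pure state transition. This is a hypothetical model, not LayoutShell's code: the `Sections` flags, `SidebarState`, and `nextSidebarState` are invented names, and the sketch assumes the shell calls it on every route change with the previous and next paths.

```typescript
// Hypothetical sketch of the accordion logic; not the actual LayoutShell code.
type Sections = { apps: boolean; admin: boolean; starred: boolean; alerts: boolean };

interface SidebarState {
  open: Sections;
  // Snapshot of the sections taken when the user entered /alerts/*.
  saved: Sections | null;
}

const isAlertsPath = (path: string): boolean =>
  path === "/alerts" || path.startsWith("/alerts/");

function nextSidebarState(state: SidebarState, prevPath: string, nextPath: string): SidebarState {
  const entering = !isAlertsPath(prevPath) && isAlertsPath(nextPath);
  const leaving = isAlertsPath(prevPath) && !isAlertsPath(nextPath);
  if (entering) {
    // Remember the current layout, then collapse everything except Alerts.
    return {
      saved: { ...state.open },
      open: { apps: false, admin: false, starred: false, alerts: true },
    };
  }
  if (leaving && state.saved) {
    // Restore exactly what the user had open before entering Alerts.
    return { open: { ...state.saved }, saved: null };
  }
  // Navigation within (or entirely outside) Alerts leaves the sidebar untouched.
  return state;
}
```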

gitnexus_impact(LayoutContent, upstream) = LOW (0 direct callers; rendered
only by LayoutShell's provider wrapper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renders the `<NotificationBell />` as the first child of `<TopBar>` (before
`<SearchTrigger>`). The bell links to `/alerts/inbox` and shows the unread
alert count for the currently selected environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AlertRow is reused by AllAlertsPage and HistoryPage. Marking a row as read
happens when its link is followed (the detail sub-route will be added in
phase 10 polish). FIRING rows get an amber left border.
AllAlertsPage: state filter chips (Open/Firing/Acked/All).
HistoryPage: RESOLVED filter, respects retention window.
Promotion dropdown builds a /alerts/rules/new URL with promoteFrom, ruleId,
and targetEnv query params — the wizard will read these in Task 24 and
pre-fill the form with source-env prefill + client-side warnings.
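A round-trip sketch of the promotion URL, assuming `promoteFrom` carries the source environment slug (the commit does not say what value it holds). `buildPromotionUrl` and `readPromotionParams` are hypothetical helper names; only the path and the three query-param names come from the text above.

```typescript
// Hypothetical sketch; assumes promoteFrom is the source env slug.
interface PromotionParams {
  promoteFrom: string;
  ruleId: string;
  targetEnv: string;
}

// Rules-list side: build the link the promotion dropdown navigates to.
function buildPromotionUrl(p: PromotionParams): string {
  const query = new URLSearchParams({
    promoteFrom: p.promoteFrom,
    ruleId: p.ruleId,
    targetEnv: p.targetEnv,
  });
  return `/alerts/rules/new?${query.toString()}`;
}

// Wizard side: read the params back; a plain /alerts/rules/new visit has
// none of them, so it returns null and the wizard starts from an empty form.
function readPromotionParams(search: string): PromotionParams | null {
  const q = new URLSearchParams(search);
  const promoteFrom = q.get("promoteFrom");
  const ruleId = q.get("ruleId");
  const targetEnv = q.get("targetEnv");
  return promoteFrom && ruleId && targetEnv ? { promoteFrom, ruleId, targetEnv } : null;
}
```

Keeping all state in the URL is what lets the flow stay purely client-side with no new REST endpoint.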
Wizard navigates 5 steps (scope/condition/trigger/notify/review) with
per-step validation. form-state module is the single source of truth for
the rule form; initialForm/toRequest/validateStep are unit-tested (6
tests). Step components are stubbed and will be implemented in Tasks
20-24. prefillFromPromotion is a thin wrapper in this commit; Task 24
rewrites it to compute scope-adjustment warnings.

Deviation notes:
 - FormState.targets uses {kind, targetId} to match AlertRuleTarget DTO
   field names (plan draft had targetKind).
 - toRequest casts through Record<string, unknown> so the spread over
   the Partial<AlertCondition> union typechecks.
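A sketch of per-step validation gating the wizard's Next button. The five step names come from the PR, and `targets` uses the `{kind, targetId}` shape noted above. Everything else is hypothetical: the other `FormState` fields, the target kind values, and the specific rules in `validateStep` are illustrative, not the ones in form-state.

```typescript
// Hypothetical sketch; fields other than targets {kind, targetId} are illustrative.
type WizardStep = "scope" | "condition" | "trigger" | "notify" | "review";
const STEPS: readonly WizardStep[] = ["scope", "condition", "trigger", "notify", "review"];

interface FormState {
  name: string;
  conditionKind: string | null;
  evaluationIntervalSeconds: number;
  forDurationSeconds: number;
  targets: { kind: "USER" | "GROUP" | "ROLE"; targetId: string }[];
}

// Returns human-readable errors for one step; empty means the step is valid.
function validateStep(step: WizardStep, form: FormState): string[] {
  const errors: string[] = [];
  switch (step) {
    case "scope":
      if (form.name.trim() === "") errors.push("Name is required");
      break;
    case "condition":
      if (form.conditionKind === null) errors.push("Pick a condition kind");
      break;
    case "trigger":
      if (form.evaluationIntervalSeconds <= 0) errors.push("Evaluation interval must be positive");
      if (form.forDurationSeconds < 0) errors.push("For-duration cannot be negative");
      break;
    case "notify":
      if (form.targets.some((t) => t.targetId === "")) errors.push("Every target needs an id");
      break;
    case "review":
      break;
  }
  return errors;
}

// Advances only when the current step validates; the last step stays put.
function nextStep(current: WizardStep, form: FormState): WizardStep {
  const i = STEPS.indexOf(current);
  if (validateStep(current, form).length > 0 || i === STEPS.length - 1) return current;
  return STEPS[i + 1] ?? current;
}
```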

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Name, description, severity, scope-kind radio, and cascading app/route/
agent selectors driven by catalog + agents data. Changing the app clears
the condition's routeId/agentId.

Deviation: DS Select uses native event-based onChange; plan draft had
a value-based signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each condition kind (ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE,
DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC) renders its own payload-shape
form. Changing the kind resets the condition payload to {kind, scope} so
stale fields from a previous kind don't leak into the save request.
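The two reset rules (switching kind keeps only `{kind, scope}`; switching app drops the route and agent selections) can be sketched over a discriminated union. Only two kinds are modelled, and their optional fields, the `Scope` shape, and the names `changeKind`/`changeApp` are hypothetical.

```typescript
// Hypothetical sketch: the condition is a union discriminated by `kind`.
type Scope = { appSlug?: string; routeId?: string; agentId?: string };

type Condition =
  | { kind: "ROUTE_METRIC"; scope: Scope; metric?: string; threshold?: number }
  | { kind: "LOG_PATTERN"; scope: Scope; pattern?: string };

// Keep only the scope; every field specific to the old kind is dropped,
// so nothing stale reaches the save request.
function changeKind(prev: Condition, kind: Condition["kind"]): Condition {
  return { kind, scope: prev.scope };
}

// Routes and agents belong to an app, so selections from the old app go away.
function changeApp(prev: Condition, appSlug: string): Condition {
  return { ...prev, scope: { appSlug } };
}
```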

Deviation: DS Select uses native event-based onChange. Plan draft showed
a value-based signature (onChange(v) => ...).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three numeric inputs for evaluation cadence, for-duration, and
re-notification window, plus a Test evaluate button for saved rules.
TestEvaluateRequest is empty on the wire (server uses the rule id), so
we send {} and rely on the backend to evaluate the current saved state.

Deviation: plan draft passed {condition: toRequest(form).condition} into
the request body. The generated TestEvaluateRequest type is
Record<string, never>, so we send an empty body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Title and message use MustacheEditor with kind-specific autocomplete.
Preview button posts to the render-preview endpoint and shows rendered
title/message inline. Targets combine users/groups/roles into a unified
Badge pill list. Webhook picker filters to outbound connections allowed
in the current env (spec §6, allowed_environment_ids). Header overrides
use plain Input rather than MustacheEditor for now.

Deviations:
 - RenderPreviewRequest is Record<string, never>, so we send {} instead
   of {titleTemplate, messageTemplate}; backend resolves from rule state.
 - RenderPreviewResponse has {title, message} (plan draft used
   renderedTitle/renderedMessage).
 - Button size="sm" not "small" (DS only accepts sm|md).
 - Target kind field renamed from targetKind to kind to match
   AlertRuleTarget DTO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review step dumps a human summary plus raw request JSON, and (when a
setter is supplied) offers an Enabled-on-save Toggle. Promotion prefill
now returns {form, warnings}: clears agent IDs (per-env), flags missing
apps in target env, and flags webhook connections not allowed in target
env. 4 Vitest cases cover copy-name, agent clear, app-missing, and
webhook-not-allowed paths.
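A sketch of the `{form, warnings}` contract, reduced to the three checks the commit lists: agent IDs cleared, missing apps flagged, disallowed webhooks flagged. The `RuleForm` fields and warning messages are hypothetical, and the name is copied unchanged because that is the simplest reading of the "copy-name" case.

```typescript
// Hypothetical sketch; field names and messages are illustrative.
interface RuleForm {
  name: string;
  appSlug: string | null;
  agentId: string | null;
  webhookConnectionIds: string[];
}

interface PrefillWarning {
  field: string;
  message: string;
}

function prefillFromPromotion(
  source: RuleForm,
  targetAppSlugs: ReadonlySet<string>,
  allowedConnectionIds: ReadonlySet<string>,
): { form: RuleForm; warnings: PrefillWarning[] } {
  const warnings: PrefillWarning[] = [];
  // Agent ids are per-environment, so they never carry over.
  const form: RuleForm = { ...source, agentId: null, webhookConnectionIds: [...source.webhookConnectionIds] };
  if (source.agentId !== null) {
    warnings.push({ field: "agentId", message: "Agent cleared; pick one in the target environment." });
  }
  if (source.appSlug !== null && !targetAppSlugs.has(source.appSlug)) {
    warnings.push({ field: "appSlug", message: `App '${source.appSlug}' does not exist in the target environment.` });
  }
  for (const id of source.webhookConnectionIds) {
    if (!allowedConnectionIds.has(id)) {
      warnings.push({ field: "webhooks", message: `Connection ${id} is not allowed in the target environment.` });
    }
  }
  return { form, warnings };
}
```

Returning warnings instead of throwing lets the wizard still open with everything it could copy, leaving the user to fix the flagged fields.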

The wizard now consumes {form, warnings}; Task 25 renders the warnings
banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fetches target-env apps (useCatalog) and env-allowed outbound
connections, passes them to prefillFromPromotion, and renders the
returned warnings in an amber banner above the step nav. Warnings list
the field name and the remediation message so users see cross-environment
mismatches that need manual adjustment before saving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matcher accepts ruleId and/or appSlug. Server enforces endsAt > startsAt
(V12 CHECK constraint) and matcher_matches() at dispatch time (spec §7).
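A client-side sketch of the silence rules described above. The real enforcement is the V12 CHECK constraint and `matcher_matches()` on the server, and this does not reproduce either. The AND semantics when both `ruleId` and `appSlug` are set, and all type and function names, are assumptions.

```typescript
// Hypothetical sketch; AND semantics for a two-field matcher is an assumption.
interface SilenceMatcher { ruleId?: string; appSlug?: string }
interface Silence { matcher: SilenceMatcher; startsAt: Date; endsAt: Date }
interface FiringAlert { ruleId: string; appSlug?: string }

// Same constraints the server enforces, checked before the create request.
function validateSilence(s: Silence): string[] {
  const errors: string[] = [];
  if (s.matcher.ruleId === undefined && s.matcher.appSlug === undefined) {
    errors.push("Matcher needs a ruleId and/or appSlug");
  }
  if (s.endsAt.getTime() <= s.startsAt.getTime()) errors.push("endsAt must be after startsAt");
  return errors;
}

// Every field present on the matcher must equal the alert's value.
function matcherMatches(m: SilenceMatcher, alert: FiringAlert): boolean {
  if (m.ruleId !== undefined && m.ruleId !== alert.ruleId) return false;
  if (m.appSlug !== undefined && m.appSlug !== alert.appSlug) return false;
  return true;
}

// Active window is [startsAt, endsAt); ending early just moves endsAt to now.
function isSilenced(alert: FiringAlert, silences: Silence[], now: Date): boolean {
  const t = now.getTime();
  return silences.some(
    (s) => s.startsAt.getTime() <= t && t < s.endsAt.getTime() && matcherMatches(s.matcher, alert),
  );
}
```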
Extends operationalSearchData with open alerts (FIRING|ACKNOWLEDGED) and
all rules. Badges convey severity + state. Selecting an alert navigates to
/alerts/inbox/{id}; a rule navigates to /alerts/rules/{id}. Uses the
existing CommandPalette extension point — no new registry.
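A sketch of mapping alerts and rules into palette entries. The PR text gives the open-state filter (FIRING|ACKNOWLEDGED), the two categories, and the two navigation paths. The `SearchItem` shape, the summary types, and the ENABLED/DISABLED badge for rules are hypothetical stand-ins for whatever LayoutShell's searchData expects.

```typescript
// Hypothetical shapes; the real searchData item type is not shown in the PR.
type AlertState = "FIRING" | "ACKNOWLEDGED" | "RESOLVED";

interface AlertSummary { id: string; title: string; severity: string; state: AlertState }
interface RuleSummary { id: string; name: string; severity: string; enabled: boolean }
interface SearchItem { category: "alert" | "alertRule"; label: string; badges: string[]; path: string }

const OPEN_STATES: ReadonlySet<AlertState> = new Set<AlertState>(["FIRING", "ACKNOWLEDGED"]);

function alertSearchItems(alerts: AlertSummary[], rules: RuleSummary[]): SearchItem[] {
  // Only open alerts are searchable; resolved ones live on the History page.
  const alertItems = alerts
    .filter((a) => OPEN_STATES.has(a.state))
    .map((a): SearchItem => ({
      category: "alert",
      label: a.title,
      badges: [a.severity, a.state],
      path: `/alerts/inbox/${encodeURIComponent(a.id)}`,
    }));
  // Every rule is searchable, whether enabled or not.
  const ruleItems = rules.map((r): SearchItem => ({
    category: "alertRule",
    label: r.name,
    badges: [r.severity, r.enabled ? "ENABLED" : "DISABLED"],
    path: `/alerts/rules/${encodeURIComponent(r.id)}`,
  }));
  return [...alertItems, ...ruleItems];
}
```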
Rejects webhook URLs that resolve to loopback, link-local, or RFC-1918
private ranges (IPv4 + IPv6 ULA fc00::/7). Enforced on both create and
update in OutboundConnectionServiceImpl before persistence; returns 400
Bad Request with "private or loopback" in the body.

Bypass via `cameleer.server.outbound-http.allow-private-targets=true`
for dev environments where webhooks legitimately point at local
services. Production default is `false`.
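The address classification can be sketched in TypeScript for illustration. The real SsrfGuard is Java and this is not its code. The sketch assumes the webhook host has already been resolved to IP literals (DNS lookup omitted), treats IPv4-mapped IPv6 addresses as their IPv4 form, and fails closed on anything it cannot parse. `isBlockedAddress` and `assertPublicTarget` are hypothetical names, and the boolean parameter stands in for the allow-private-targets property.

```typescript
// Hypothetical sketch of the classification only; DNS resolution is assumed done.
type Ipv4 = [number, number, number, number];

function parseIpv4(addr: string): Ipv4 | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(addr);
  if (!m) return null;
  const octets = m.slice(1).map(Number) as Ipv4;
  return octets.every((o) => o <= 255) ? octets : null;
}

function isBlockedIpv4([a, b]: Ipv4): boolean {
  return (
    a === 127 ||                         // loopback 127.0.0.0/8
    (a === 169 && b === 254) ||          // link-local 169.254.0.0/16
    a === 10 ||                          // RFC 1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 172.16.0.0/12
    (a === 192 && b === 168)             // RFC 1918 192.168.0.0/16
  );
}

// Expands an IPv6 literal (with optional "::") into eight 16-bit groups.
function expandIpv6(addr: string): number[] | null {
  const halves = addr.toLowerCase().split("::");
  if (halves.length > 2) return null;
  const [headStr, tailStr] = halves;
  const head = headStr ? headStr.split(":") : [];
  const tail = tailStr ? tailStr.split(":") : [];
  const fill = halves.length === 2 ? 8 - head.length - tail.length : 0;
  if (halves.length === 2 ? fill < 1 : head.length !== 8) return null;
  const groups = [...head, ...new Array<string>(fill).fill("0"), ...tail];
  return groups.every((g) => /^[0-9a-f]{1,4}$/.test(g)) ? groups.map((g) => parseInt(g, 16)) : null;
}

function isBlockedIpv6(g: number[]): boolean {
  const first = g[0] ?? 0;
  const loopback = g.slice(0, 7).every((x) => x === 0) && g[7] === 1; // ::1
  const linkLocal = (first & 0xffc0) === 0xfe80;                      // fe80::/10
  const uniqueLocal = (first & 0xfe00) === 0xfc00;                    // fc00::/7 (ULA)
  return loopback || linkLocal || uniqueLocal;
}

// Fails closed: anything that is not a parseable IP literal counts as blocked.
function isBlockedAddress(ip: string): boolean {
  const v4 = parseIpv4(ip);
  if (v4) return isBlockedIpv4(v4);
  const mapped = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/i.exec(ip);
  if (mapped) {
    const inner = parseIpv4(mapped[1] ?? "");
    return inner === null || isBlockedIpv4(inner);
  }
  const v6 = expandIpv6(ip);
  return v6 === null || isBlockedIpv6(v6);
}

// Create/update check: every resolved address must be public unless the
// dev-only bypass flag is set.
function assertPublicTarget(resolved: string[], allowPrivateTargets: boolean): void {
  if (allowPrivateTargets) return;
  const bad = resolved.find(isBlockedAddress);
  if (bad !== undefined) {
    throw new Error(`Webhook target resolves to a private or loopback address: ${bad}`);
  }
}
```

Checking every resolved address, not just the first, matters because a single hostname can return a mix of public and private records.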

Test profile sets the flag to `true` in application-test.yml so the
existing ITs that post webhooks to WireMock on https://localhost:PORT
keep working. A dedicated OutboundConnectionSsrfIT overrides the flag
back to false (via @TestPropertySource + @DirtiesContext) to exercise
the reject path end-to-end through the admin controller.

Plan 01 scope; required before SaaS exposure (spec §17).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prometheus scrapes can fire every few seconds. The open-alerts / open-rules
gauges query Postgres on each read — caching the values for 30s amortises
that to one query per half-minute. Addresses final-review NIT from Plan 02.

- Introduces a package-private TtlCache that wraps a Supplier<Long> and
  memoises the last read for a configurable Duration against a Supplier<Instant>
  clock.
- Wraps each gauge supplier (alerting_rules_total{enabled|disabled},
  alerting_instances_total{state}) in its own TtlCache.
- Adds a test-friendly constructor (package-private) taking explicit
  Duration + Supplier<Instant> so AlertingMetricsCachingTest can advance
  a fake clock without waiting wall-clock time.
- Adds AlertingMetricsCachingTest covering:
  * supplier invoked once per TTL across repeated scrapes
  * 29 s elapsed → still cached; 31 s elapsed → re-queried
  * gauge value reflects the cached result even after delegate mutates

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the k8s manifests in deploy/ as a local dev stack:
  - cameleer-postgres   (matches deploy/cameleer-postgres.yaml)
  - cameleer-clickhouse (matches deploy/cameleer-clickhouse.yaml, default CLICKHOUSE_DB=cameleer)
  - cameleer-server     (built from Dockerfile, env mirrors deploy/base/server.yaml)
  - cameleer-ui         (built from ui/Dockerfile, served on host :8080 to leave :5173 free for Vite dev)

Dockerfile + ui/Dockerfile: REGISTRY_TOKEN is now optional (empty → skip Maven/npm auth).
cameleer-common package is public, so anonymous pulls succeed; private packages still require the token.

Backend defaults tuned for local E2E:
  - RUNTIME_ENABLED=false (no Docker-in-Docker deployments in dev stack)
  - OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS=true (so webhook tests can target host.docker.internal etc.)
  - UIUSER/UIPASSWORD=admin/admin (matches Playwright E2E_ADMIN_USER/PASS defaults)
  - CORS includes both :5173 (Vite) and :8080 (nginx)
Task 29's refactor added a package-private test-friendly constructor
alongside the public production one. Without @Autowired Spring cannot pick
which constructor to use for the @Component, and falls back to searching
for a no-arg default — crashing startup with 'No default constructor found'.

Detected when launching the server via the new docker-compose stack; unit
tests still pass because they invoke the package-private test constructor
directly.
- RouteMetricForm dropped P95_LATENCY_MS — not in cameleer-server-core
  RouteMetric enum (valid: ERROR_RATE, P99_LATENCY_MS, AVG_DURATION_MS,
  THROUGHPUT, ERROR_COUNT).
- initialForm now returns a ready-to-save ROUTE_METRIC condition
  (metric=ERROR_RATE, comparator=GT, threshold=0.05, windowSeconds=300),
  so clicking through the wizard with all defaults produces a valid rule.
  Prevents a 400 ('missing type id property kind') and a 400 from condition
  enum validation when the user leaves the condition step untouched.
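A sketch mirroring the server-side RouteMetric values on the client, with the defaults listed above. Including `kind` in the default avoids the "missing type id property kind" 400. `ROUTE_METRICS`, `isRouteMetric`, and `initialCondition` are hypothetical names, and `comparator` is typed as a plain string because the full comparator set is not given here.

```typescript
// Hypothetical client-side mirror of the RouteMetric enum values listed above.
const ROUTE_METRICS = ["ERROR_RATE", "P99_LATENCY_MS", "AVG_DURATION_MS", "THROUGHPUT", "ERROR_COUNT"] as const;
type RouteMetric = (typeof ROUTE_METRICS)[number];

interface RouteMetricCondition {
  kind: "ROUTE_METRIC"; // type id the backend needs to pick the condition class
  metric: RouteMetric;
  comparator: string;   // e.g. "GT"; full comparator set not modelled
  threshold: number;
  windowSeconds: number;
}

// Guards values coming from a select element before they reach the request.
function isRouteMetric(value: string): value is RouteMetric {
  return (ROUTE_METRICS as readonly string[]).includes(value);
}

// Defaults from this commit: saving without touching the condition step is valid.
function initialCondition(): RouteMetricCondition {
  return { kind: "ROUTE_METRIC", metric: "ERROR_RATE", comparator: "GT", threshold: 0.05, windowSeconds: 300 };
}
```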
Alerting + outbound controllers resolve acting user via
authentication.name with 'user:' prefix stripped → 'admin'. But
UserRepository.upsert stores env-admin as 'user:admin' (JWT sub format).
The resulting FK mismatch manifests as 500 'alert_rules_created_by_fkey'
on any create operation in a fresh docker stack.

Workaround: run-once 'cameleer-seed' compose service runs psql against
deploy/docker/postgres-init.sql after the server is healthy (i.e. after
Flyway migrations have created tenant_default.users), inserting
user_id='admin' idempotently. The root-cause fix belongs in the backend
(either stop stripping the prefix in alerting/outbound controllers, or
normalise storage to the unprefixed form) and is out of scope for
Plan 03.
test(ui/alerts): Playwright E2E smoke (sidebar, rule CRUD, CMD-K, silence CRUD)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m10s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m34s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 5m11s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 40s
1ebc2fa71e
fixtures.ts: auto-applied login fixture — visits /login?local to skip OIDC
auto-redirect, fills username/password via label-matcher, clicks 'Sign in',
then selects the 'default' env so alerting hooks enable (useSelectedEnv gate).
Override via E2E_ADMIN_USER + E2E_ADMIN_PASS.

alerting.spec.ts: 4 tests against the full docker-compose stack:
 - sidebar Alerts accordion → /alerts/inbox
 - 5-step wizard: defaults-only create + row delete (unique timestamp name
   avoids strict-mode collisions with leftover rules)
 - CMD-K palette via SearchTrigger click (deterministic; Ctrl+K via keyboard
   is flaky when the canvas doesn't have focus)
 - silence matcher-based create + end-early

DS FormField renders labels as generics (not htmlFor-wired), so inputs are
targeted by placeholder or label-proximity locators instead of getByLabel.

Does not exercise fire→ack→clear; that's covered backend-side by
AlertingFullLifecycleIT (Plan 02). UI E2E for that path would need event
injection into ClickHouse, out of scope for this smoke.
hsiegeln merged commit ec460faf02 into main 2026-04-20 16:27:49 +02:00