Commit Graph

1371 Commits

Author SHA1 Message Date
hsiegeln
c87c77c1cf refactor(alerts/ui): slim alerts-page.module.css to layout-only DS tokens
Drop the feed-row classes (.row, .rowUnread, .body, .meta, .time,
.message, .actions, .empty) — these are replaced by DS DataTable +
EmptyState in follow-up tasks. Keep layout helpers for page shell,
toolbar, filter bar, bulk-action bar, title cell, and DataTable
expanded content. All colors / spacing use DS tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:03:20 +02:00
hsiegeln
b16ea8b185 feat(alerts/ui): add shared renderAlertExpanded for DataTable rows
Extracts the per-row detail block used by Inbox/All/History DataTables
so the three pages share one rendering. Consumes AlertDto fields that
are nullable in the schema; hides missing fields instead of rendering
placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:01:39 +02:00
hsiegeln
4a63149338 feat(alerts/ui): add formatRelativeTime helper
Formats ISO timestamps as `Nm ago` / `Nh ago` / `Nd ago`, falling back
to an absolute locale date string for values older than 30 days. Used
by the alert DataTable Age column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 10:00:15 +02:00
hsiegeln
a2b2ccbab7 feat(alerts/ui): add severityToAccent helper for DataTable rowAccent
Pure function mapping the 3-value AlertDto.severity enum to the 2-value
DataTable rowAccent prop. INFO maps to undefined (no tint) because the
DS DataTable rowAccent only supports error|warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:57:58 +02:00
hsiegeln
52a08a8769 docs(alerts): Implementation plan — design-system alignment for /alerts pages
Task-by-task TDD plan implementing the design spec. Splits the work
into 14 tasks: helper utilities (TDD), shared renderer, CSS token
migration, per-page rewrites (Inbox/All/History/Rules/Silences),
wizard banner migration, AlertRow deletion, E2E adaptation for
ConfirmDialog, and full verification pass. Each task produces an
atomic commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:49:47 +02:00
hsiegeln
3d0a4d289b docs(alerts): Design spec — design-system alignment for /alerts pages
Rework all pages under /alerts to use @cameleer/design-system components
and tokens. Unified DataTable shell for Inbox/All/History with expandable
rows; DataTable + Dropdown + ConfirmDialog for Rules list; FormField grid
+ DataTable for Silences; DS Alert for wizard banners. Replaces undefined
CSS variables (--bg, --fg, --muted, --accent) with DS tokens and removes
raw <table>/<select>/confirm() usage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:43:19 +02:00
hsiegeln
037a27d405 fix(alerting): allow multiple open alert_instances per rule for PER_EXCHANGE
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m51s
CI / docker (push) Successful in 1m17s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
V13 added a partial unique index on alert_instances(rule_id) WHERE state
IN (PENDING,FIRING,ACKNOWLEDGED). Correct for scalar condition kinds
(ROUTE_METRIC / AGENT_STATE / DEPLOYMENT_STATE / LOG_PATTERN / JVM_METRIC
/ EXCHANGE_MATCH in COUNT_IN_WINDOW) but wrong for EXCHANGE_MATCH /
PER_EXCHANGE, which by design emits one alert_instance per matching
exchange. Under V13 every PER_EXCHANGE tick with >1 match logged
"Skipped duplicate open alert_instance for rule …" at evaluator cadence
and silently lost alert fidelity — only the first matching exchange per
tick got an AlertInstance + webhook dispatch.

V15 drops the rule_id-only constraint and recreates it with a
discriminator on context->'exchange'->>'id'. Scalar kinds emit
Map.of() as context, so their expression resolves to '' — "one open per
rule" preserved. ExchangeMatchEvaluator.evaluatePerExchange always
populates exchange.id, so per-exchange instances coexist cleanly.

Two new PostgresAlertInstanceRepositoryIT tests:
  - multiple open instances for same rule + distinct exchanges all land
  - second open for identical (rule, exchange) still dedups via the
    DuplicateKeyException fallback in save() — defense-in-depth kept

Also fixes pre-existing PostgresAlertReadRepositoryIT brokenness: its
setup() inserted 3 open instances sharing one rule_id, which V13 blocked
on arrival. Migrate to one rule_id per instance (pattern already used
across other storage ITs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:26:19 +02:00
hsiegeln
e7ce1a73d0 docs(alerting): Plan 04 implementation plan — post-ship hardening
13 atomic commits covering 5 hardening tasks:

  Task 1-2: @Schema(discriminatorMapping) on AlertCondition, derive
            polymorphic unions in enums.ts from schema
  Task 3-7: AgentState / DeploymentStatus / LogLevel / ExecutionStatus
            enum migrations + @Schema(allowableValues) on JvmMetric
  Task 8:   ContextStartupSmokeTest (unit-tier, no Testcontainers)
  Task 9-12: AlertTemplateVariables registry + round-trip test +
             SSOT endpoint + UI consumer
  Task 13:  alerting-editor.spec.ts Playwright spec

Each task has bite-sized write-test/red/green/commit steps with
exact paths and full code. Pre-flight SQL check and post-flight
self-verification scripts included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:54:09 +02:00
hsiegeln
46867cc659 docs(alerting): Plan 04 design spec — post-ship hardening
Closes the loop on three bug classes from Plan 03 triage:
context-load regressions (missing @Autowired), UI/backend drift
on template variables, and hand-maintained TS enum unions caused
by springdoc polymorphic schema quirk.

Covers 5 tasks: context-startup smoke test, template-variables
SSOT endpoint, second Playwright spec, String-to-enum migrations
on 5 condition fields, and @DiscriminatorMapping on AlertCondition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:44:41 +02:00
hsiegeln
efa8390108 fix(alerting): reject null fireMode on ExchangeMatchCondition + repair in-flight rows
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m2s
CI / docker (push) Successful in 1m20s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 5m31s
The rule editor wizard reset the condition payload on kind-change without
seeding a fireMode default; the ExchangeMatchCondition ctor allowed null to
pass through; AlertEvaluatorJob then NPE-looped every tick on a saved rule.

- core: compact ctor now rejects null fireMode (Jackson-deser path only — all
  production callers already pass a value).
- V14: repair existing EXCHANGE_MATCH rows with fireMode=null to
  PER_EXCHANGE + perExchangeLingerSeconds=300 (default matches the wizard).
- ui: ConditionStep.onKindChange seeds EXCHANGE_MATCH defaults so the
  Select's displayed fallback ("Per exchange") is actually in form state.
- ui: validateStep('condition', ...) now enforces fireMode presence + the
  mode-specific fields before the user reaches Review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:05:55 +02:00
hsiegeln
e590682f8f refactor(ui/alerts): address code-review findings on alerting-enums
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m3s
CI / docker (push) Successful in 1m22s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Follow-up to 83837ada addressing the critical-review feedback:

- Duplicate ConditionKind type consolidated: the one in
  api/queries/alertRules.ts (which was nullable — wrong) is gone;
  single source of truth lives in this module.

- Module moved out of api/ into pages/Alerts/ where it belongs.
  api/ is the data layer; labels + hide lists are view-layer concerns.

- Hidden values formalised: Comparator.EQ and JvmAggregation.LATEST
  are intentionally not surfaced in dropdowns (noisy / wrong feature
  boundary, see in-file comments). They remain in the type unions so
  rules that carry those values save/load correctly — we just don't
  advertise them in the UI.

- JvmAggregation declaration order restored to MAX/AVG/MIN (matches
  what users saw before 83837ada). LATEST declared last; hidden.

- Snapshot tests for every visible *_OPTIONS array — reviewer signal
  in future PRs when a backend enum change or hide-list edit
  silently reshapes the dropdown.

- `toOptions` gains a JSDoc noting that label-map declaration order
  is load-bearing (ES2015 Object.keys insertion-order guarantee).

- **Honest about the springdoc schema quirk**: the generated
  polymorphic condition types resolve to `never` at the TypeScript
  level (two conflicting `kind` discriminators — the class-name
  literal and the Jackson enum — intersect to never), which silently
  defeated `Record<T, string>` exhaustiveness. The previous commit's
  "schema-derived enums" claim was accurate only for the flat-field
  enums (ConditionKind, Severity, TargetKind); condition-specific
  enums (RouteMetric, Comparator, JvmAggregation, ExchangeFireMode)
  were silently `never`. Those are now declared as hand-written
  string-literal unions with a top-of-file comment spelling out the
  issue and the regen-and-compare workflow. Real upstream fix is a
  backend-side adjustment to how springdoc emits polymorphic
  `@JsonSubTypes` — out of scope for this phase.

Verified: ui build green, 56/56 vitest pass (49 pre-existing + 7
new enum snapshots).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 19:26:16 +02:00
hsiegeln
83837ada8f refactor(ui/alerts): derive option lists + form-state types from schema.d.ts
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m8s
CI / docker (push) Successful in 1m15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Closes item 5 on the Plan 03 cleanup triage. The option arrays
("METRICS", "COMPARATORS", KIND_OPTIONS, SEVERITY_OPTIONS, FIRE_MODES)
scattered across RouteMetricForm / JvmMetricForm / ExchangeMatchForm /
ConditionStep / ScopeStep were hand-typed string literals. They drifted
silently — P95_LATENCY_MS appeared in a dropdown without a backend
counterpart (caught at runtime in bcde6678); JvmMetric.LATEST and
Comparator.EQ existed on the backend but were missing from the UI all
along.

Fix: new `ui/src/api/alerting-enums.ts` derives every enum from
schema.d.ts and pairs each with a `Record<T, string>` label map.
TypeScript enforces exhaustiveness — adding or removing a backend
value fails the build of this file until the label map is updated.
Every consumer imports the generated `*_OPTIONS` array.

Covered (schema-derived):
  - ConditionKind            → CONDITION_KIND_OPTIONS
  - Severity                 → SEVERITY_OPTIONS
  - RouteMetric              → ROUTE_METRIC_OPTIONS
  - Comparator               → COMPARATOR_OPTIONS (adds EQ that was missing)
  - JvmAggregation           → JVM_AGGREGATION_OPTIONS (adds LATEST that was missing)
  - ExchangeMatch.fireMode   → EXCHANGE_FIRE_MODE_OPTIONS
  - AlertRuleTarget.kind     → TARGET_KIND_OPTIONS

form-state.ts: `severity: 'CRITICAL' | 'WARNING' | 'INFO'` and
`kind: 'USER' | 'GROUP' | 'ROLE'` literal unions swapped for the
derived `Severity` / `TargetKind` aliases.

Not covered, backend types them as `String` (no `@Schema(allowableValues)`
annotation yet):
  - AgentStateCondition.state
  - DeploymentStateCondition.states
  - LogPatternCondition.level
  - ExchangeFilter.status
  - JvmMetricCondition.metric

These stay hand-typed with a pointer-comment. Follow-up: add
`@Schema(allowableValues = …)` to the Java record components so the
enums land in schema.d.ts; then fold them into alerting-enums.ts.

Plus: gitnexus index-stats refresh in AGENTS.md/CLAUDE.md from the
post-deploy reindex.

Verified: ui build green, 49/49 vitest pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 19:02:52 +02:00
hsiegeln
f8c1ba4988 docs(auth): document user_id convention and write-path shape
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m2s
CI / docker (push) Successful in 1m20s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m41s
After the UiAuth / Oidc / UserAdmin controllers were aligned to store
bare user_ids, the rules that future sessions read were still describing
the old behaviour (OutboundConnectionAdminController "strips user:
prefix" — true mechanically but the subtlety is that the strip is
the bridge between a prefixed JWT subject and an unprefixed DB key,
not a hack).

- CLAUDE.md: expand the User persistence one-liner to state the
  convention authoritatively (local `<username>`, OIDC `oidc:<sub>`,
  JWT `user:` namespace, env-scoped controllers strip for FK).
- .claude/rules/app-classes.md:
  - Add "User ID conventions" section near the top that spells out
    write-path vs read-path behaviour in one place.
  - Add UiAuthController + OidcAuthController entries under
    security/ with their upsert shape documented.
  - Soften the OutboundConnectionAdminController line to reference
    the convention instead of restating the mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:49:22 +02:00
hsiegeln
ae6473635d fix(auth): OidcAuthController + UserAdminController upsert unprefixed
Follow-up to the UiAuthController fix: every write path that puts a row
into users/user_roles/user_groups must use the bare DB key, because
the env-scoped controllers (Alert, AlertRule, AlertSilence, Outbound)
strip "user:" before using the name as an FK. If the write path stores
prefixed, first-time alerting/outbound writes fail with
alert_rules_created_by_fkey violation.

UiAuthController shipped the model in the prior commit (bare userId
for all DB/RBAC calls, "user:"-namespaced subject for JWT signing).
Bringing the other two write paths in line:

- OidcAuthController.callback:
    userId  = "oidc:" + oidcUser.subject()    // DB key, no "user:"
    subject = "user:" + userId                // JWT subject (namespaced)
  All userRepository / rbacService / applyClaimMappings calls use
  userId. Tokens still carry the namespaced subject so
  JwtAuthenticationFilter can distinguish user vs agent tokens.

- UserAdminController.createUser: userId = request.username() (bare).
  resetPassword: dropped the "user:"-strip fallback that was only
  needed because create used to prefix — now dead.

No migration. Greenfield alpha product — any pre-existing prefixed
rows in a dev DB will become orphans on next login (login upserts
the unprefixed row, old prefixed row is harmless but unused).
Operators doing a clean re-index can wipe the DB.

Read-path controllers still strip — harmless for bare DB rows, and
OIDC humans (JWT sub "user:oidc:<s>") still resolve correctly to
the new DB key "oidc:<s>" after stripping.

Verified: 45/45 alerting + outbound ITs pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:44:17 +02:00
hsiegeln
6b5aefd4c2 docs(gitnexus): re-run analyze after cleanup-batch commits
Post-commit stats: 8524 nodes / 22174 edges / 415 clusters / 300 flows
(up from 8513/22146/409/300 after the five cleanup-batch commits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:27:27 +02:00
hsiegeln
1ea0258393 fix(auth): upsert UI login user_id unprefixed (drop docker seeder workaround)
Root cause of the mismatch that prompted the one-shot cameleer-seed
docker service: UiAuthController stored users.user_id as the JWT
subject "user:admin" (JWT sub format). Every env-scoped controller
(Alert, AlertSilence, AlertRule, OutboundConnectionAdmin) already
strips the "user:" prefix on the read path — so the rest of the
system expects the DB key to be the bare username. With UiAuth
storing prefixed, fresh docker stacks hit
"alert_rules_created_by_fkey violation" on the first rule create.

Fix: inside login(), compute `userId = request.username()` and use
it everywhere the DB/RBAC layer is touched (isLocked, getPasswordHash,
record/clearFailedLogins, upsert, assignRoleToUser, addUserToGroup,
getSystemRoleNames). Keep `subject = "user:" + userId` — we still
sign JWTs with the namespaced subject so JwtAuthenticationFilter can
distinguish user vs agent tokens.

refresh() and me() follow the same rule via a stripSubjectPrefix()
helper (JWT subject in, bare DB key out).

With the write path aligned, the docker bridge is no longer needed:
- Deleted deploy/docker/postgres-init.sql
- Deleted cameleer-seed service from docker-compose.yml

Scope: UiAuthController only. UserAdminController + OidcAuthController
still prefix on upsert — that's the bug class the triage identified
as "Option A or B either way OK". Not changing them now because:
  a) prod admins are provisioned unprefixed through some other path,
     so those two files aren't the docker-only failure observed;
  b) stripping them would need a data migration for any existing
     prod users stored prefixed, which is out of scope for a cleanup
     phase. Follow-up worth scheduling if we ever wire OIDC or admin-
     created users into alerting FKs.

Verified: 33/33 alerting+outbound controller ITs pass (9 outbound,
10 rules, 9 silences, 5 alert inbox).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:26:03 +02:00
hsiegeln
09b49f096c feat(alerting): per-severity breakdown on unread-count DTO
Spec §13 calls for the notification bell to colour-code by highest
unread severity (CRITICAL → error, WARNING → amber, INFO → muted).
The old { count } DTO forced the UI to pick one static colour, so
NotificationBell shipped with a TODO. Grow the contract instead:

  UnreadCountResponse = { total, bySeverity: { CRITICAL, WARNING, INFO } }

Guarantees:
- every severity is always present with a >=0 value (no undefined
  keys on the wire), so the UI can branch without defaults.
- total = sum of bySeverity values — kept explicit on the wire for
  cheap top-line display, not recomputed client-side.

Backend
- AlertInstanceRepository: replaces countUnreadForUser(long) with
  countUnreadBySeverityForUser returning Map<AlertSeverity, Long>.
  One SQL round-trip per (env, user) — GROUP BY ai.severity over the
  same NOT EXISTS(alert_reads) filter.
- UnreadCountResponse.from(Map) normalises and defensively copies;
  missing severities default to 0.
- InAppInboxQuery.countUnread now returns the DTO, caches the full
  response (still 5s TTL) so severity breakdown gets the same
  hit-rate as the total did before.
- AlertController just hands the DTO back.

Breaking change — no backwards-compat shim: the `count` field is
gone. UI and tests updated in the same commit; there are no other
API consumers in the tree.

Frontend
- Regenerated openapi.json + schema.d.ts against a fresh build of
  the new backend.
- NotificationBell branches badge colour on the highest unread
  severity (CRITICAL > WARNING > INFO) via new CSS variants.
- Tests cover all four paths: zero, critical-present, warning-only,
  info-only.

Tests: 7 unit tests + 12 ITs (incl. new grouping + empty-map)
       + 49 vitest (was 46; +3 severity-branch assertions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:15:56 +02:00
hsiegeln
18cacb33ee docs(alerting): align @JsonTypeInfo spec with shipped code
Design spec and Plan 02 described AlertCondition polymorphism as
Id.DEDUCTION, but the code that shipped in PR #140 uses Id.NAME with
property="kind" and include=EXISTING_PROPERTY. The `kind` field is
real on every subtype and the DB stores it in a separate column
(condition_kind), so reading the discriminator directly is simpler
than deduction — update the docs to match. Also add `"kind"` to the
example JSON payloads so they match on-wire reality.

OutboundAuth (Plan 01) correctly still uses Id.DEDUCTION and is
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:04:17 +02:00
hsiegeln
d850d00bab docs(gitnexus): refresh index stats + repo name (alerting-02 → cameleer-server)
Re-ran `npx gitnexus analyze --embeddings` after PR #144 merge.
8513 symbols / 22146 relationships / 300 execution flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:02:18 +02:00
hsiegeln
579b5f1a04 chore(ui): delete unused usePageVisible hook
Added as a reusable primitive during Plan 03 Task 9, but the intended
consumer (NotificationBell live-region refresh) was removed during
code review, leaving the hook unused. Delete it — YAGNI; reintroduce
when a real consumer shows up.

Verified upstream impact (gitnexus): 0 callers, LOW risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:02:04 +02:00
ec460faf02 Merge pull request 'feat(alerting): Plan 03 — UI + backfills (SSRF guard, metrics caching, docker stack)' (#144) from feat/alerting-03-ui into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m1s
CI / docker (push) Successful in 1m16s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Reviewed-on: #144
2026-04-20 16:27:49 +02:00
hsiegeln
1ebc2fa71e test(ui/alerts): Playwright E2E smoke (sidebar, rule CRUD, CMD-K, silence CRUD)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m10s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m34s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 5m11s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 40s
fixtures.ts: auto-applied login fixture — visits /login?local to skip OIDC
auto-redirect, fills username/password via label-matcher, clicks 'Sign in',
then selects the 'default' env so alerting hooks enable (useSelectedEnv gate).
Override via E2E_ADMIN_USER + E2E_ADMIN_PASS.

alerting.spec.ts: 4 tests against the full docker-compose stack:
 - sidebar Alerts accordion → /alerts/inbox
 - 5-step wizard: defaults-only create + row delete (unique timestamp name
   avoids strict-mode collisions with leftover rules)
 - CMD-K palette via SearchTrigger click (deterministic; Ctrl+K via keyboard
   is flaky when the canvas doesn't have focus)
 - silence matcher-based create + end-early

DS FormField renders labels as generics (not htmlFor-wired), so inputs are
targeted by placeholder or label-proximity locators instead of getByLabel.

Does not exercise fire→ack→clear; that's covered backend-side by
AlertingFullLifecycleIT (Plan 02). UI E2E for that path would need event
injection into ClickHouse, out of scope for this smoke.
2026-04-20 16:18:17 +02:00
hsiegeln
d88bede097 chore(docker): seeder service pre-creates unprefixed 'admin' user row
Alerting + outbound controllers resolve acting user via
authentication.name with 'user:' prefix stripped → 'admin'. But
UserRepository.upsert stores env-admin as 'user:admin' (JWT sub format).
The resulting FK mismatch manifests as 500 'alert_rules_created_by_fkey'
on any create operation in a fresh docker stack.

Workaround: run-once 'cameleer-seed' compose service runs psql against
deploy/docker/postgres-init.sql after the server is healthy (i.e. after
Flyway migrations have created tenant_default.users), inserting
user_id='admin' idempotently. The root-cause fix belongs in the backend
(either stop stripping the prefix in alerting/outbound controllers, or
normalise storage to the unprefixed form) and is out of scope for
Plan 03.
2026-04-20 16:18:07 +02:00
hsiegeln
bcde6678b8 fix(ui/alerts): align RouteMetric metric enum with backend; pre-populate ROUTE_METRIC defaults
- RouteMetricForm dropped P95_LATENCY_MS — not in cameleer-server-core
  RouteMetric enum (valid: ERROR_RATE, P99_LATENCY_MS, AVG_DURATION_MS,
  THROUGHPUT, ERROR_COUNT).
- initialForm now returns a ready-to-save ROUTE_METRIC condition
  (metric=ERROR_RATE, comparator=GT, threshold=0.05, windowSeconds=300),
  so clicking through the wizard with all defaults produces a valid rule.
  Prevents a 400 'missing type id property kind' + 400 on condition enum
  validation if the user leaves the condition step untouched.
2026-04-20 16:17:59 +02:00
hsiegeln
5edf7eb23a fix(alerting): @Autowired on AlertingMetrics production constructor
Task 29's refactor added a package-private test-friendly constructor
alongside the public production one. Without @Autowired Spring cannot pick
which constructor to use for the @Component, and falls back to searching
for a no-arg default — crashing startup with 'No default constructor found'.

Detected when launching the server via the new docker-compose stack; unit
tests still pass because they invoke the package-private test constructor
directly.
2026-04-20 16:02:48 +02:00
hsiegeln
1ed2d3a611 chore(docker): full-stack docker-compose mirroring deploy/ k8s manifests
Mirrors the k8s manifests in deploy/ as a local dev stack:
  - cameleer-postgres   (matches deploy/cameleer-postgres.yaml)
  - cameleer-clickhouse (matches deploy/cameleer-clickhouse.yaml, default CLICKHOUSE_DB=cameleer)
  - cameleer-server     (built from Dockerfile, env mirrors deploy/base/server.yaml)
  - cameleer-ui         (built from ui/Dockerfile, served on host :8080 to leave :5173 free for Vite dev)

Dockerfile + ui/Dockerfile: REGISTRY_TOKEN is now optional (empty → skip Maven/npm auth).
cameleer-common package is public, so anonymous pulls succeed; private packages still require the token.

Backend defaults tuned for local E2E:
  - RUNTIME_ENABLED=false (no Docker-in-Docker deployments in dev stack)
  - OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS=true (so webhook tests can target host.docker.internal etc.)
  - UIUSER/UIPASSWORD=admin/admin (matches Playwright E2E_ADMIN_USER/PASS defaults)
  - CORS includes both :5173 (Vite) and :8080 (nginx)
2026-04-20 15:52:24 +02:00
hsiegeln
f75ee9f352 docs(alerting): UI map + admin-guide walkthrough for Plan 03
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:55:36 +02:00
hsiegeln
9f109b20fd perf(alerting): 30s TTL cache on AlertingMetrics gauge suppliers
Prometheus scrapes can fire every few seconds. The open-alerts / open-rules
gauges query Postgres on each read — caching the values for 30s amortises
that to one query per half-minute. Addresses final-review NIT from Plan 02.

- Introduces a package-private TtlCache that wraps a Supplier<Long> and
  memoises the last read for a configurable Duration against a Supplier<Instant>
  clock.
- Wraps each gauge supplier (alerting_rules_total{enabled|disabled},
  alerting_instances_total{state}) in its own TtlCache.
- Adds a test-friendly constructor (package-private) taking explicit
  Duration + Supplier<Instant> so AlertingMetricsCachingTest can advance
  a fake clock without waiting wall-clock time.
- Adds AlertingMetricsCachingTest covering:
  * supplier invoked once per TTL across repeated scrapes
  * 29 s elapsed → still cached; 31 s elapsed → re-queried
  * gauge value reflects the cached result even after delegate mutates

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:22:54 +02:00
hsiegeln
5ebc729b82 feat(alerting): SSRF guard on outbound connection URL
Rejects webhook URLs that resolve to loopback, link-local, or RFC-1918
private ranges (IPv4 + IPv6 ULA fc00::/7). Enforced on both create and
update in OutboundConnectionServiceImpl before persistence; returns 400
Bad Request with "private or loopback" in the body.

Bypass via `cameleer.server.outbound-http.allow-private-targets=true`
for dev environments where webhooks legitimately point at local
services. Production default is `false`.

Test profile sets the flag to `true` in application-test.yml so the
existing ITs that post webhooks to WireMock on https://localhost:PORT
keep working. A dedicated OutboundConnectionSsrfIT overrides the flag
back to false (via @TestPropertySource + @DirtiesContext) to exercise
the reject path end-to-end through the admin controller.

Plan 01 scope; required before SaaS exposure (spec §17).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:17:44 +02:00
hsiegeln
f4c2cb120b feat(ui/alerts): CMD-K sources for alerts + alert rules
Extends operationalSearchData with open alerts (FIRING|ACKNOWLEDGED) and
all rules. Badges convey severity + state. Selecting an alert navigates to
/alerts/inbox/{id}; a rule navigates to /alerts/rules/{id}. Uses the
existing CommandPalette extension point — no new registry.
2026-04-20 14:09:39 +02:00
hsiegeln
8689643e11 feat(ui/alerts): SilencesPage with matcher-based create + end-early action
Matcher accepts ruleId and/or appSlug. Server enforces endsAt > startsAt
(V12 CHECK constraint) and matcher_matches() at dispatch time (spec §7).
2026-04-20 14:08:27 +02:00
hsiegeln
0191ca4b13 feat(ui/alerts): render promotion warnings in wizard banner
Fetches target-env apps (useCatalog) and env-allowed outbound
connections, passes them to prefillFromPromotion, and renders the
returned warnings in an amber banner above the step nav. Warnings list
the field name and the remediation message so users see crossings that
need manual adjustment before saving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:05:08 +02:00
hsiegeln
3963ea5591 feat(ui/alerts): ReviewStep + promotion prefill warnings
Review step dumps a human summary plus raw request JSON, and (when a
setter is supplied) offers an Enabled-on-save Toggle. Promotion prefill
now returns {form, warnings}: clears agent IDs (per-env), flags missing
apps in target env, and flags webhook connections not allowed in target
env. 4 Vitest cases cover copy-name, agent clear, app-missing, and
webhook-not-allowed paths.

The wizard now consumes {form, warnings}; Task 25 renders the warnings
banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:04:04 +02:00
hsiegeln
816096f4d1 feat(ui/alerts): NotifyStep (MustacheEditor for title/message/body, targets, webhook bindings)
Title and message use MustacheEditor with kind-specific autocomplete.
Preview button posts to the render-preview endpoint and shows rendered
title/message inline. Targets combine users/groups/roles into a unified
Badge pill list. Webhook picker filters to outbound connections allowed
in the current env (spec 6, allowed_environment_ids). Header overrides
use plain Input rather than MustacheEditor for now.

Deviations:
 - RenderPreviewRequest is Record<string, never>, so we send {} instead
   of {titleTemplate, messageTemplate}; backend resolves from rule state.
 - RenderPreviewResponse has {title, message} (plan draft used
   renderedTitle/renderedMessage).
 - Button size="sm" not "small" (DS only accepts sm|md).
 - Target kind field renamed from targetKind to kind to match
   AlertRuleTarget DTO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:02:26 +02:00
hsiegeln
d42a6ca6a8 feat(ui/alerts): TriggerStep (evaluation interval, for-duration, re-notify, test-evaluate)
Three numeric inputs for evaluation cadence, for-duration, and
re-notification window, plus a Test evaluate button for saved rules.
TestEvaluateRequest is empty on the wire (server uses the rule id), so
we send {} and rely on the backend to evaluate the current saved state.

Deviation: plan draft passed {condition: toRequest(form).condition} into
the request body. The generated TestEvaluateRequest type is
Record<string, never>, so we send an empty body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:00:33 +02:00
hsiegeln
ef8c60c2b5 feat(ui/alerts): ConditionStep with 6 kind-specific forms
Each condition kind (ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE,
DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC) renders its own payload-shape
form. Changing the kind resets the condition payload to {kind, scope} so
stale fields from a previous kind don't leak into the save request.

Deviation: DS Select uses native event-based onChange. Plan draft showed
a value-based signature (onChange(v) => ...).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:59:55 +02:00
hsiegeln
f48fc750f2 feat(ui/alerts): ScopeStep (name, severity, env/app/route/agent selectors)
Name, description, severity, scope-kind radio, and cascading app/route/
agent selectors driven by catalog + agents data. Adjusts condition
routing by clearing routeId/agentId when the app changes.

Deviation: DS Select uses native event-based onChange; plan draft had
a value-based signature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:58:21 +02:00
hsiegeln
334e815c25 feat(ui/alerts): rule editor wizard shell + form-state module
Wizard navigates 5 steps (scope/condition/trigger/notify/review) with
per-step validation. form-state module is the single source of truth for
the rule form; initialForm/toRequest/validateStep are unit-tested (6
tests). Step components are stubbed and will be implemented in Tasks
20-24. prefillFromPromotion is a thin wrapper in this commit; Task 24
rewrites it to compute scope-adjustment warnings.

Deviation notes:
 - FormState.targets uses {kind, targetId} to match AlertRuleTarget DTO
   field names (plan draft had targetKind).
 - toRequest casts through Record<string, unknown> so the spread over
   the Partial<AlertCondition> union typechecks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:57:30 +02:00
hsiegeln
7e91459cd6 feat(ui/alerts): RulesListPage with enable/disable, delete, env promotion
Promotion dropdown builds a /alerts/rules/new URL with promoteFrom, ruleId,
and targetEnv query params — the wizard will read these in Task 24 and
pre-fill the form with source-env prefill + client-side warnings.
2026-04-20 13:52:14 +02:00
hsiegeln
269a63af1f feat(ui/alerts): AllAlertsPage + HistoryPage
AllAlertsPage: state filter chips (Open/Firing/Acked/All).
HistoryPage: RESOLVED filter, respects retention window.
2026-04-20 13:49:52 +02:00
hsiegeln
8d8bae4e18 feat(ui/alerts): InboxPage with ack + bulk-read actions
AlertRow is reused by AllAlertsPage and HistoryPage. Marking a row as read
happens when its link is followed (the detail sub-route will be added in
phase 10 polish). FIRING rows get an amber left border.
2026-04-20 13:49:23 +02:00
hsiegeln
891dcaef32 feat(ui/alerts): mount NotificationBell in TopBar
Renders the `<NotificationBell />` as the first child of `<TopBar>` (before
`<SearchTrigger>`). The bell links to `/alerts/inbox` and shows the unread
alert count for the currently selected environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:47:16 +02:00
hsiegeln
54e4217e21 feat(ui/alerts): Alerts sidebar section with Inbox/All/Rules/Silences/History
Adds `buildAlertsTreeNodes` to sidebar-utils and renders an Alerts section
between Applications and Starred in LayoutShell. The section uses an
accordion pattern — entering `/alerts/*` collapses apps/admin/starred and
restores their state on leave.

gitnexus_impact(LayoutContent, upstream) = LOW (0 direct callers; rendered
only by LayoutShell's provider wrapper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:46:49 +02:00
hsiegeln
167d0ebd42 feat(ui/alerts): register /alerts/* routes with placeholder pages
Adds 6 lazy-loaded route entries for the alerting UI (Inbox, All, History,
Rules list, Rule editor wizard, Silences) plus an `/alerts` → `/alerts/inbox`
redirect. Page components are placeholder stubs to be replaced in Phase 5/6/7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:44:44 +02:00
hsiegeln
019e79a362 feat(ui/alerts): MustacheEditor component (CM6 shell with completion + linter)
Wires the mustache-completion source and mustache-linter into a CodeMirror 6
EditorView. Accepts kind (filters variables) and reducedContext (env-only for
connection URLs). singleLine prevents newlines for URL/header fields. Host
ref syncs when the parent replaces value (promotion prefill).
2026-04-20 13:41:46 +02:00
hsiegeln
ac2a943feb feat(ui/alerts): CM6 completion + linter for Mustache templates
completion fires after {{ and narrows as the user types; apply() closes the
tag automatically. Linter raises an error on unclosed {{, a warning on
references that aren't in the allowed-variable set for the current condition
kind. Kind-specific allowed set comes from availableVariables().
2026-04-20 13:39:10 +02:00
hsiegeln
18e6dde67a fix(ui/alerts): align ALERT_VARIABLES registry with NotificationContextBuilder
Plan prose had spec §8 idealized leaves, but the backend NotificationContext
only emits a subset:
  ROUTE_METRIC / EXCHANGE_MATCH → route.id + route.uri (uri added)
  LOG_PATTERN → log.pattern + log.matchCount (renamed from log.logger/level/message)
  app.slug / app.id → scoped to non-env kinds (removed from 'always')
  exchange.link / alert.comparator / alert.window / app.displayName → removed (backend doesn't emit)

Without this alignment the Task 11 linter would (1) flag valid route.uri as
unknown, (2) suggest log.{logger,level,message} as valid paths that render
empty, and (3) flag app.slug on env-wide rules.
2026-04-20 13:35:42 +02:00
hsiegeln
5ddd89f883 feat(ui/alerts): Mustache variable metadata registry for autocomplete
ALERT_VARIABLES mirrors the spec §8 context map. availableVariables(kind)
returns the kind-specific filter (always vars + kind vars). extractReferences
+ unknownReferences drive the inline amber linter. Backend NotificationContext
adds must land here too.
2026-04-20 13:31:55 +02:00
hsiegeln
38083d7c3f fix(ui/alerts): remove dead usePageVisible subscription; align alerts test mock with DTO
NotificationBell used a usePageVisible() subscription that re-rendered on
every visibilitychange without consuming the value. TanStack Query's
refetchIntervalInBackground:false already pauses polling; the extra
subscription was speculative generality. Dropped the import + call + JSDoc
reference; usePageVisible hook + test retained as a reusable primitive.

Also: alerts.test.tsx 'returns the server payload unmodified' asserted a
pre-plan {total, bySeverity} shape, but UnreadCountResponse is actually
{count}. Fixed mock + assertion to {count: 3}.
2026-04-20 13:30:07 +02:00
hsiegeln
197c60126c feat(ui/alerts): NotificationBell with Page Visibility poll pause
Adds a header bell component linking to /alerts/inbox with an unread-count
badge for the selected environment. Polling pauses when the tab is hidden
via TanStack Query's refetchIntervalInBackground:false (already set on
useUnreadCount); the new usePageVisible hook gives components a
re-renders-on-visibility-change signal for future defense-in-depth.

Plan-prose deviation: the plan assumed UnreadCountResponse carries a
bySeverity map for per-severity badge coloring, but the backend DTO only
exposes a scalar `count`. The bell reads `data?.count` and renders a single
var(--error) tint; a TODO references spec §13 for future per-severity work
that would require expanding the DTO.

Tests: usePageVisible toggles on visibilitychange events; NotificationBell
renders the bell with no badge at count=0 and shows "3" at count=3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:26:58 +02:00