Compare commits

3 Commits

Author SHA1 Message Date
hsiegeln
037a27d405 fix(alerting): allow multiple open alert_instances per rule for PER_EXCHANGE
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m51s
CI / docker (push) Successful in 1m17s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
V13 added a partial unique index on alert_instances(rule_id) WHERE state
IN (PENDING,FIRING,ACKNOWLEDGED). Correct for scalar condition kinds
(ROUTE_METRIC / AGENT_STATE / DEPLOYMENT_STATE / LOG_PATTERN / JVM_METRIC
/ EXCHANGE_MATCH in COUNT_IN_WINDOW) but wrong for EXCHANGE_MATCH /
PER_EXCHANGE, which by design emits one alert_instance per matching
exchange. Under V13 every PER_EXCHANGE tick with >1 match logged
"Skipped duplicate open alert_instance for rule …" at evaluator cadence
and silently lost alert fidelity — only the first matching exchange per
tick got an AlertInstance + webhook dispatch.

V15 drops the rule_id-only constraint and recreates it with a
discriminator on context->'exchange'->>'id'. Scalar kinds emit
Map.of() as context, so their expression resolves to '' — "one open per
rule" preserved. ExchangeMatchEvaluator.evaluatePerExchange always
populates exchange.id, so per-exchange instances coexist cleanly.

Two new PostgresAlertInstanceRepositoryIT tests:
  - multiple open instances for same rule + distinct exchanges all land
  - second open for identical (rule, exchange) still dedups via the
    DuplicateKeyException fallback in save() — defense-in-depth kept

Also fixes pre-existing PostgresAlertReadRepositoryIT brokenness: its
setup() inserted 3 open instances sharing one rule_id, which V13 blocked
on arrival. Migrate to one rule_id per instance (pattern already used
across other storage ITs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:26:19 +02:00
hsiegeln
e7ce1a73d0 docs(alerting): Plan 04 implementation plan — post-ship hardening
13 atomic commits covering 5 hardening tasks:

  Task 1-2: @Schema(discriminatorMapping) on AlertCondition, derive
            polymorphic unions in enums.ts from schema
  Task 3-7: AgentState / DeploymentStatus / LogLevel / ExecutionStatus
            enum migrations + @Schema(allowableValues) on JvmMetric
  Task 8:   ContextStartupSmokeTest (unit-tier, no Testcontainers)
  Task 9-12: AlertTemplateVariables registry + round-trip test +
             SSOT endpoint + UI consumer
  Task 13:  alerting-editor.spec.ts Playwright spec

Each task has bite-sized write-test/red/green/commit steps with
exact paths and full code. Pre-flight SQL check and post-flight
self-verification scripts included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:54:09 +02:00
hsiegeln
46867cc659 docs(alerting): Plan 04 design spec — post-ship hardening
Closes the loop on three bug classes from Plan 03 triage:
context-load regressions (missing @Autowired), UI/backend drift
on template variables, and hand-maintained TS enum unions caused
by springdoc polymorphic schema quirk.

Covers 5 tasks: context-startup smoke test, template-variables
SSOT endpoint, second Playwright spec, String-to-enum migrations
on 5 condition fields, and @DiscriminatorMapping on AlertCondition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:44:41 +02:00
5 changed files with 3201 additions and 8 deletions

View File

@@ -0,0 +1,24 @@
-- V15 — Discriminate open-alert_instance uniqueness by exchange id for PER_EXCHANGE rules.
--
-- V13 enforced "one open alert_instance per rule" via a partial unique on
-- (rule_id). That is correct for every scalar condition kind (ROUTE_METRIC,
-- AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC, and EXCHANGE_MATCH
-- in COUNT_IN_WINDOW mode) but wrong for EXCHANGE_MATCH / PER_EXCHANGE, which
-- by design emits one alert_instance per matching exchange. Under V13 every
-- PER_EXCHANGE tick with >1 match logged "Skipped duplicate open alert_instance
-- for rule …" at evaluator cadence and silently lost alert fidelity — only the
-- first matching exchange per tick got an AlertInstance + webhook dispatch.
--
-- New discriminator: (rule_id, COALESCE(context->'exchange'->>'id', '')).
-- Scalar evaluators emit Map.of() (no exchange subtree), the expression
-- resolves to '' for all of them, so they continue to get "one open per rule".
-- ExchangeMatchEvaluator.evaluatePerExchange emits {exchange.id = <execId>}
-- per firing, so PER_EXCHANGE instances for distinct exchanges coexist.
-- Two attempts to open an instance for the same (rule, exchange) still collapse
-- to one — the repo's DuplicateKeyException fallback preserves defense-in-depth.
DROP INDEX IF EXISTS alert_instances_open_rule_uq;
CREATE UNIQUE INDEX alert_instances_open_rule_uq
ON alert_instances (rule_id, (COALESCE(context->'exchange'->>'id', '')))
WHERE rule_id IS NOT NULL
AND state IN ('PENDING','FIRING','ACKNOWLEDGED');

View File

@@ -243,6 +243,61 @@ class PostgresAlertInstanceRepositoryIT extends AbstractPostgresIT {
// Extra rules are cleaned up by @AfterEach via env-scoped DELETE
}
@Test
void save_allowsMultipleOpenInstancesForSameRuleWithDifferentExchangeIds() {
// PER_EXCHANGE mode (EXCHANGE_MATCH rules) emits one alert_instance per
// matching exchange — the design requires many open instances per rule,
// each carrying a distinct exchange.id in its context. The open-rule
// partial unique must therefore discriminate on (rule_id, exchange.id),
// not on rule_id alone. Regression: V13's original (rule_id)-only
// constraint masked this by catching DuplicateKeyException and silently
// dropping every per-exchange firing after the first — logs filled with
// "Skipped duplicate open alert_instance for rule …" at evaluator cadence
// and PER_EXCHANGE notifications were never dispatched.
var exchangeA = newInstanceWithExchange(ruleId, "exchange-aaa");
var exchangeB = newInstanceWithExchange(ruleId, "exchange-bbb");
var exchangeC = newInstanceWithExchange(ruleId, "exchange-ccc");
repo.save(exchangeA);
repo.save(exchangeB);
repo.save(exchangeC);
assertThat(repo.findById(exchangeA.id())).isPresent();
assertThat(repo.findById(exchangeB.id())).isPresent();
assertThat(repo.findById(exchangeC.id())).isPresent();
Long count = jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM alert_instances " +
" WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')",
Long.class, ruleId);
assertThat(count).isEqualTo(3L);
}
@Test
void save_rejectsSecondOpenInstanceForSameRuleAndExchange() {
// The partial unique still prevents genuine duplicates: two open instances
// for the same (rule, exchange) pair must collapse to one. The repo's
// DuplicateKeyException fallback returns the already-persisted instance.
var first = newInstanceWithExchange(ruleId, "exchange-dup");
var second = newInstanceWithExchange(ruleId, "exchange-dup");
repo.save(first);
var returned = repo.save(second);
// Second insert was suppressed; the repo returned the existing instance.
assertThat(returned.id()).isEqualTo(first.id());
assertThat(repo.findById(first.id())).isPresent();
assertThat(repo.findById(second.id())).isEmpty();
Long count = jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM alert_instances " +
" WHERE rule_id = ? AND state IN ('PENDING','FIRING','ACKNOWLEDGED')",
Long.class, ruleId);
assertThat(count).isEqualTo(1L);
}
@Test
void markSilenced_togglesToTrue() {
var inst = newInstance(ruleId, List.of(userId), List.of(), List.of());
@@ -276,6 +331,23 @@ class PostgresAlertInstanceRepositoryIT extends AbstractPostgresIT {
userIds, groupIds, roleNames);
}
/**
* Creates an open FIRING instance carrying a PER_EXCHANGE-shaped context:
* {@code context.exchange.id = <exchangeId>}. Mirrors what
* {@code ExchangeMatchEvaluator.evaluatePerExchange} emits for each matching
* exchange in a Batch result.
*/
private AlertInstance newInstanceWithExchange(UUID ruleId, String exchangeId) {
return new AlertInstance(
UUID.randomUUID(), ruleId, Map.of(), envId,
AlertState.FIRING, AlertSeverity.WARNING,
Instant.now(), null, null, null, null,
false, null, null,
Map.of("exchange", Map.of("id", exchangeId)),
"title", "message",
List.of(userId), List.of(), List.of());
}
/** Inserts a minimal alert_rule with the given severity. */
private UUID seedRuleWithSeverity(String name, AlertSeverity severity) {
UUID id = UUID.randomUUID();

View File

@@ -31,7 +31,6 @@ class PostgresAlertReadRepositoryIT extends AbstractPostgresIT {
instanceId1 = UUID.randomUUID();
instanceId2 = UUID.randomUUID();
instanceId3 = UUID.randomUUID();
UUID ruleId = UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO environments (id, slug, display_name) VALUES (?, ?, ?)",
@@ -41,18 +40,23 @@ class PostgresAlertReadRepositoryIT extends AbstractPostgresIT {
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, email) VALUES (?, 'local', ?) ON CONFLICT (user_id) DO NOTHING",
userId, userId + "@example.com");
jdbcTemplate.update(
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
"VALUES (?, ?, 'rule', 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'sys-user', 'sys-user')",
ruleId, envId);
for (UUID id : List.of(instanceId1, instanceId2, instanceId3)) {
// Each open alert_instance needs its own rule_id — the alert_instances_open_rule_uq
// partial unique forbids multiple open instances sharing the same rule_id + exchange
// discriminator (V13/V15). Three separate rules let all three instances coexist
// in FIRING state so alert_reads tests can target each one independently.
for (UUID instanceId : List.of(instanceId1, instanceId2, instanceId3)) {
UUID ruleId = UUID.randomUUID();
jdbcTemplate.update(
"INSERT INTO alert_rules (id, environment_id, name, severity, condition_kind, condition, " +
"notification_title_tmpl, notification_message_tmpl, created_by, updated_by) " +
"VALUES (?, ?, ?, 'WARNING', 'AGENT_STATE', '{}'::jsonb, 't', 'm', 'sys-user', 'sys-user')",
ruleId, envId, "rule-" + instanceId);
jdbcTemplate.update(
"INSERT INTO alert_instances (id, rule_id, rule_snapshot, environment_id, state, severity, " +
"fired_at, context, title, message) VALUES (?, ?, '{}'::jsonb, ?, 'FIRING', 'WARNING', " +
"now(), '{}'::jsonb, 'title', 'msg')",
id, ruleId, envId);
instanceId, ruleId, envId);
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,403 @@
# Alerting — Plan 04 Design Spec: Post-Ship Hardening
**Date:** 2026-04-20
**Status:** Draft — awaiting user review
**Surfaces:** server (core + app), UI, tests
**Related:** [Plan 03](../plans/2026-04-20-alerting-03-ui.md) (shipped as PR #144); builds on [Alerting design](2026-04-19-alerting-design.md)
---
## 1. Summary
Close the loop on three bug classes that surfaced in Plan 03's post-merge triage:
1. **Spring wiring regressions not caught by unit tests**`AlertingMetrics`'s production constructor was missing `@Autowired` and only broke at Docker runtime (commit `5edf7eb2`). Unit tests exercised the package-private ctor, so green locally.
2. **UI↔backend drift on Mustache template variables**`ALERT_VARIABLES` in the UI and `NotificationContextBuilder` on the server were two hand-maintained lists of the same thing; one drift was fixed in commit `18e6dde6`.
3. **Hand-maintained TypeScript enum unions**`ui/src/pages/Alerts/enums.ts` declares `RouteMetric`, `Comparator`, `JvmAggregation`, `ExchangeFireMode` manually because springdoc emits a broken polymorphic schema (condition types resolve to `never` in TS due to discriminator-literal conflict).
Plan 04 is **pure hardening**: no new product features. Five tasks that together eliminate these bug classes at their source rather than adding another test that detects them.
### Guiding principles
- **Prefer SSOT over drift-detect.** When two hand-maintained lists need to match, make one derive from the other rather than adding a test that fails when they don't.
- **Type correctness at the contract layer.** Backend records are the contract; if a field has a closed vocabulary, it is an enum. String-typed-with-comment does not survive schema regeneration.
- **Close the loop on Plan 03, don't start new product work.** Features (#8 native integrations, #9 CA-bundle UI), performance (#7 load tests), and deferred cleanup (#6 OIDC post-alpha, #10 router lint sweep) stay out of scope — each is a separate future plan.
---
## 2. Scope
### In scope
1. **Context-startup smoke test** — a `@SpringBootTest` that loads the full production application context and asserts the alerting beans are present. Catches `@Autowired` regressions at `mvn verify` time.
2. **Template-variables SSOT endpoint** — new `GET /api/v1/alerts/template-variables` returning the declarative registry. UI deletes its duplicate; autocomplete + linter consume the server registry via TanStack Query.
3. **Second Playwright spec** — rule EDIT path, MustacheEditor autocomplete + linter in-browser, env-promotion warnings banner, inbox ack, bulk-read.
4. **Typed vocabulary for 5 String condition fields**`AgentStateCondition.state`, `DeploymentStateCondition.states`, `LogPatternCondition.level`, `ExchangeFilter.status` become proper Java enums. `JvmMetricCondition.metric` stays `String` + `@Schema(allowableValues={...})` because agents report raw Micrometer metric names that the evaluator passes through unchanged. OpenAPI emits closed unions for all five; UI types derive.
5. **Springdoc discriminator mapping** — add `@Schema(discriminatorProperty, discriminatorMapping)` to the `AlertCondition` sealed interface. TS stops generating `kind: "<ClassName>"` literals. Hand-declared unions in `ui/src/pages/Alerts/enums.ts` migrate to derived types.
### Out of scope (explicit)
To keep Plan 04 focused, none of these are touched in this plan. Each has a separate issue or a future plan slot:
- Native Slack / Teams / PagerDuty integrations (Plan 03 spec BL-002 — future plan).
- Admin CA-bundle UI ([gitea #137](https://gitea.siegeln.net/cameleer/cameleer-server/issues/137) / BL-001).
- Router.tsx lint sweep (207 fast-refresh warnings, pre-existed Plan 03 — standalone lint PR).
- Performance ITs for 500 rules × 5 replicas (Plan 03 spec deferral — separate plan, different shape).
- OIDC / UserAdmin re-audit (commit `ae647363` was minimal; re-audit post-alpha).
- Orphan `.worktrees/` cleanup (user holds authorization).
---
## 3. Task 1 — Context-startup smoke test
### Problem
Plan 03 shipped a regression where `AlertingMetrics`'s production constructor was missing `@Autowired`. Unit tests instantiated it via a package-private ctor, so local builds and CI were green. Spring context-load failed only when the full app started under Docker. The entire class of regression (missing annotation on any alerting-subsystem bean) is invisible to the current test pyramid.
### Design
**Location:** `cameleer-server-app/src/test/java/com/cameleer/server/app/ContextStartupSmokeTest.java`
**Shape:**
```java
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE)
@ActiveProfiles("test-context-smoke")
class ContextStartupSmokeTest {
@Autowired ApplicationContext ctx;
@Test
void productionContextLoadsWithAllAlertingBeans() {
assertNotNull(ctx.getBean(AlertingMetrics.class));
assertNotNull(ctx.getBean(AlertEvaluatorJob.class));
assertNotNull(ctx.getBean(NotificationDispatchJob.class));
assertNotNull(ctx.getBean(NotificationContextBuilder.class));
assertNotNull(ctx.getBean(WebhookDispatcher.class));
assertNotNull(ctx.getBean(MustacheRenderer.class));
assertNotNull(ctx.getBean(AlertingRetentionJob.class));
assertNotNull(ctx.getBean(PerKindCircuitBreaker.class));
// + one assertion per ConditionEvaluator impl (6 evaluators)
}
}
```
**Profile stubs:** `test-context-smoke` profile replaces `DataSource`, `ClickHouseJdbcTemplate`, and `RuntimeOrchestrator` with Mockito mocks so the test doesn't need Testcontainers. Real beans for everything alerting.
### Rationale
- **Why not extend an existing `@SpringBootTest`?** Existing ITs use Testcontainers for Postgres + ClickHouse. They catch query-path regressions but are slow and duplicate work we don't need. The bug class here — missing `@Autowired` on a ctor — surfaces during bean creation regardless of whether dependencies are real or stubbed.
- **Why assert individual beans, not just "context loads"?** A context loads if *any* bean breaks is still a passing test if the broken bean is conditional. Naming each bean makes intent explicit and failure messages readable.
### Validation
Intentionally break one bean (remove `@Autowired` or `@Component`) and confirm the test fails with a clear message. Revert before committing.
---
## 4. Task 2 — Template-variables SSOT endpoint
### Problem
`ui/src/components/MustacheEditor/alert-variables.ts` is a hand-maintained list of template variable paths (`alert.rule.name`, `exchange.status`, etc.) mirroring what `NotificationContextBuilder` actually produces. On 2026-04-20, commit `18e6dde6` fixed one drift between them. Until drift is structurally prevented, it will recur.
### Design
#### Backend
1. **Declarative registry in core.** Extract `NotificationContextBuilder`'s leaf paths into a static constant:
```java
public record TemplateVariableDescriptor(
String path, // "alert.rule.name"
String type, // "string" | "number" | "boolean" | "datetime"
String description,
String example
) {}
public final class AlertTemplateVariables {
public static final List<TemplateVariableDescriptor> ALL = List.of(
new TemplateVariableDescriptor("alert.rule.name", "string", "Rule display name", "High error rate"),
new TemplateVariableDescriptor("alert.firing_at", "datetime", "ISO-8601 firing timestamp", "2026-04-20T12:34:56Z"),
// ... exhaustive list
);
}
```
2. **Refactor `NotificationContextBuilder`** to build its context from this registry, so divergence is structural impossibility rather than convention.
3. **Round-trip unit test.** `AlertTemplateVariablesTest`: for every descriptor in `ALL`, confirm the path resolves to a non-null value when `NotificationContextBuilder.build(...)` runs against a canonical sample firing instance. Fails if a variable is added to `ALL` but the builder doesn't produce it (or vice versa).
4. **Controller.** `AlertTemplateVariablesController` at `GET /api/v1/alerts/template-variables`:
- **Flat endpoint** (added to allow-list in `.claude/rules/app-classes.md`, reason: registry is tenant- and env-agnostic global metadata).
- VIEWER+ (operators authoring webhook bodies are typically VIEWER+).
- Returns `List<TemplateVariableDto>` — identical shape to `TemplateVariableDescriptor`.
- Response is deterministic and cacheable (`Cache-Control: public, max-age=3600` — the registry changes only across server versions).
#### Frontend
1. **New query hook:** `ui/src/api/queries/alertTemplateVariables.ts` exposing `useTemplateVariables()`. TanStack Query config: `staleTime: Infinity`, `gcTime: Infinity` — the list never changes at runtime.
2. **Delete** `ui/src/components/MustacheEditor/alert-variables.ts` entirely.
3. **Prop-drill the variable list:**
- `MustacheEditor.tsx` takes `variables: TemplateVariable[]` as a prop.
- `mustache-completion.ts` and `mustache-linter.ts` become functions taking the list as an argument instead of importing a module.
- `NotifyStep.tsx` calls `useTemplateVariables()` once and passes the result.
- Webhook body/header editors do the same.
4. **Loading state:** while the variables are still being fetched, the MustacheEditor shows the textarea without autocomplete rather than blocking. The query runs on mount of `NotifyStep` (or whichever component opens first) and the result is cached forever.
### Rationale
- **Why a runtime endpoint, not a build-time JSON artifact?** The registry is also useful to admins outside the wizard — e.g. generated docs, tooling that inspects the schema. A REST endpoint is idiomatic API surface; a generated JSON file is build plumbing that couples UI to server at build time.
- **Why `staleTime: Infinity`?** The registry only changes when the server version changes, at which point the SPA is also redeployed (fresh fetch on load). No point polling.
---
## 5. Task 3 — Second Playwright spec
### Problem
Existing `ui/src/test/e2e/alerting.spec.ts` covers CREATE + DELETE of rules and the sidebar structure. It doesn't exercise the EDIT path, MustacheEditor autocomplete in the browser, the env-promotion warnings banner, or inbox interactions.
### Design
**File:** `ui/src/test/e2e/alerting-editor.spec.ts` (new; existing smoke stays unchanged)
**Cases:**
1. **Rule EDIT.** Seed a rule via backend API, navigate to `/alerts/rules/:id`, change `threshold` field, save, reload page, assert threshold persisted.
2. **MustacheEditor autocomplete.** On NotifyStep, focus the notification title field, type `{{`, assert CodeMirror completion popover appears listing `alert.rule.name` (and friends), arrow-down + Enter, assert insertion.
3. **MustacheEditor linter.** Type `{{alert.not_a_real_var}}`, assert a linter diagnostic appears within 500 ms.
4. **Env-promotion warnings.** Seed a rule in env A with `scope.routeId` referencing a route that doesn't exist in env B. Open RulesListPage, promote to env B, assert the wizard opens with a warning banner listing the missing route.
5. **Inbox ack.** Seed a FIRING alert via API, navigate to `/alerts/inbox`, click Ack button on the row, assert state chip flips to ACKNOWLEDGED and row's Ack button disables.
6. **Bulk-read.** Seed 3 FIRING alerts, select-all via header checkbox, click bulk-read button, assert notification-bell unread-count becomes 0.
### Stability
- Use `page.waitForFunction` on CodeMirror's `EditorView.completionStatus` rather than `waitForTimeout` for autocomplete assertions.
- Gate CI on 3 consecutive green runs of the new spec before merging (reviewer checks the CI history, not a scripted gate).
- Seed fixtures via API, not via page navigation — each test independent, no ordering dependency.
### Rationale
Playwright tests are the only layer that exercises CodeMirror's live behaviour. Unit tests mock it; integration tests skip it. The cost of flakiness mitigation (three green runs, `waitForFunction` over `waitForTimeout`) buys the coverage that bit us in Plan 03 triage.
---
## 6. Task 4 — Typed vocabulary for condition String fields
### Problem
Five condition fields are typed as `String` in the Java records, so springdoc emits `"type": "string"` without `enum`. UI has no way to derive a closed union and either (a) cannot render a dropdown or (b) hard-codes the values. See the top-of-file comment in `ui/src/pages/Alerts/enums.ts` for context.
### Design
Four field migrations to proper Java enums + one schema-only hint:
| Field | Target type | Existing or new? |
|---|---|---|
| `AgentStateCondition.state` | `com.cameleer.server.core.agent.AgentState` (enum) | existing (`LIVE, STALE, DEAD, SHUTDOWN`) |
| `DeploymentStateCondition.states` | `List<com.cameleer.server.core.runtime.DeploymentStatus>` (enum) | existing (`STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED`) |
| `LogPatternCondition.level` | new `com.cameleer.server.core.alerting.LogLevel` (enum) | new — `TRACE, DEBUG, INFO, WARN, ERROR` |
| `ExchangeFilter.status` | `com.cameleer.common.model.ExecutionStatus` (enum) | existing in cameleer-common (`RUNNING, COMPLETED, FAILED, ABANDONED`) |
| `JvmMetricCondition.metric` | stays `String` + `@Schema(allowableValues={...})` | schema-only hint — wire stays as agent-reported Micrometer metric names (`jvm.memory.used.value`, etc.) |
**Why JvmMetric is different.** `JvmMetricEvaluator` passes `c.metric()` directly to `MetricsQueryStore.queryTimeSeries(agentId, List.of(c.metric()), ...)`. The field is a raw Micrometer metric name emitted by the agent — `jvm.memory.used.value`, `process.cpu.usage.value`, `jvm.gc.pause.total_time`, etc. Making this an enum either (a) forces an un-idiomatic mapping between SCREAMING_SNAKE constants and dotted strings, or (b) locks out any metric not in the hard-coded list. Option (a)'s `@Schema(allowableValues=…)` gives springdoc enough to emit a narrowed TS union for UI autocomplete while leaving the backend permissive. The curated list comes from `.claude/rules/metrics.md`.
### Persistence
`alert_rules.condition` is stored as JSONB.
- For the 4 enum migrations: Jackson serializes enum values by `.name()`; on-the-wire payloads already use uppercase tokens (`"COMPLETED"`, not `"Completed"`). Legacy rows round-trip unchanged.
- For `JvmMetricCondition.metric`: no change — it's already a String with dotted-name values. `@Schema(allowableValues=…)` affects OpenAPI emission only, not deserialization or validation.
**Pre-flight check:** before the first enum-flip commit, run against the local dev DB:
```sql
SELECT condition_kind, condition->>'level' AS val
FROM alert_rules WHERE condition_kind='LOG_PATTERN'
UNION ALL
SELECT condition_kind, condition->>'state'
FROM alert_rules WHERE condition_kind='AGENT_STATE'
-- etc, one union per migrating field
```
If any row has a value that doesn't match the target enum (e.g. lowercase `"info"` instead of `"INFO"`), Plan 04 adds a one-shot Flyway `V15__alert_condition_enum_repair.sql` that normalizes them. If all values match, skip the migration.
### Validation
Controller-level `@Valid` on `AlertRuleRequest` automatically enforces enum membership — unknown values produce `400 Bad Request`. Replaces any ad-hoc string checks in evaluators.
Integration test per migrated field: `POST /rules` with known-bad value → 400; known-good value → 201.
### UI cleanup
After each enum migration ships, the corresponding manual artifact in `ui/src/pages/Alerts/enums.ts` (if any) is either deleted (derived type replaces it) or narrowed to label maps only. The top-of-file comment noting "Fields whose backend type is String … aren't in here at all" becomes obsolete and gets deleted.
---
## 7. Task 5 — Springdoc discriminator mapping
### Problem
The generated OpenAPI has a discriminator for `AlertCondition` but no `mapping`:
```json
"AlertCondition": {
"discriminator": { "propertyName": "kind" },
"properties": { "kind": { "enum": ["ROUTE_METRIC", ...] } }
}
```
When `openapi-typescript` encounters a discriminator without a `mapping`, it synthesizes the literal from the schema name. Result:
```ts
AgentStateCondition: {
kind: "AgentStateCondition"; // synthesized — class simple name
} & (Omit<AlertCondition, "kind"> & {
readonly kind?: "ROUTE_METRIC" | ... | "JVM_METRIC";
});
```
Indexed access like `AgentStateCondition['kind']` is `"AgentStateCondition" & ("ROUTE_METRIC" | ...)` = `never`. `Record<T, string>` exhaustiveness checks silently collapse.
### Design
One edit to `cameleer-server-core/src/main/java/com/cameleer/server/core/alerting/AlertCondition.java`:
```java
import io.swagger.v3.oas.annotations.media.DiscriminatorMapping;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(
discriminatorProperty = "kind",
discriminatorMapping = {
@DiscriminatorMapping(value = "ROUTE_METRIC", schema = RouteMetricCondition.class),
@DiscriminatorMapping(value = "EXCHANGE_MATCH", schema = ExchangeMatchCondition.class),
@DiscriminatorMapping(value = "AGENT_STATE", schema = AgentStateCondition.class),
@DiscriminatorMapping(value = "DEPLOYMENT_STATE", schema = DeploymentStateCondition.class),
@DiscriminatorMapping(value = "LOG_PATTERN", schema = LogPatternCondition.class),
@DiscriminatorMapping(value = "JVM_METRIC", schema = JvmMetricCondition.class)
}
)
@JsonTypeInfo(...) // unchanged
@JsonSubTypes(...) // unchanged
public sealed interface AlertCondition ...
```
After `mvn compile && cd ui && npm run generate-api:live`, the schema gains:
```json
"discriminator": {
"propertyName": "kind",
"mapping": {
"ROUTE_METRIC": "#/components/schemas/RouteMetricCondition",
...
}
}
```
And `openapi-typescript` emits `kind: "ROUTE_METRIC"` instead of `kind: "RouteMetricCondition"`.
### Fallback
If springdoc 2.8.6 does not honor `@DiscriminatorMapping` on a sealed interface (sealed types are newer than the library's primary target surface), add a `SpringDocConfig` bean:
```java
@Bean OpenApiCustomizer alertConditionDiscriminatorFix() {
return openApi -> {
Schema<?> alertCondition = openApi.getComponents().getSchemas().get("AlertCondition");
alertCondition.getDiscriminator().mapping(Map.of(
"ROUTE_METRIC", "#/components/schemas/RouteMetricCondition",
// ...
));
};
}
```
~20 lines, no behavioural change for the server — emits the exact same OpenAPI. Plan 04 ships either way.
### UI cleanup
Once the schema is correct, `ui/src/pages/Alerts/enums.ts` is simplified:
- Delete hand-declared `RouteMetric`, `Comparator`, `JvmAggregation`, `ExchangeFireMode` unions.
- Replace with derived indexed access:
```ts
type RouteMetric = NonNullable<components['schemas']['RouteMetricCondition']['metric']>;
type Comparator = NonNullable<components['schemas']['RouteMetricCondition']['comparator']>;
// etc.
```
- Delete the 30-line comment block at the top of the file explaining the workaround.
- Keep `HIDDEN` arrays and label-map `Record<T, string>` exhaustiveness enforcement — those remain valuable.
---
## 8. Execution Order
1. **Task 5** (springdoc discriminator mapping) — purely additive annotation.
2. **Task 4** (String → enum migrations, one commit per field).
3. **Task 1** (context-startup smoke — independent, slots anywhere but early catches 4/5 fallout).
4. **Task 2** (template-variables SSOT).
5. **Task 3** (Playwright spec — last, exercises all of the above).
No hard blocking dependencies between tasks, but this order minimizes rebase pain and keeps each commit small.
---
## 9. Risks
| Risk | Mitigation |
|---|---|
| Springdoc `@DiscriminatorMapping` on a sealed interface not honored in 2.8.6 | `OpenApiCustomizer` bean fallback (§7). |
| Legacy `alert_rules.condition` JSONB rows have values that don't match new enum names | Pre-flight SQL inspection; `V15` Flyway repair migration only if needed. |
| `NotificationContextBuilder` refactor silently changes rendered notifications | Golden-file integration test — one sample notification per (severity × kind); run before and after refactor. |
| Playwright spec flakiness on CodeMirror autocomplete timing | `waitForFunction` on completion state; 3 consecutive green CI runs before merge. |
| `@Valid` enum enforcement breaks existing rule creation flows in the UI | Task-4 commits bundle `generate-api:live` + typecheck-fixing; any UI call-site change ships with the backend change. |
---
## 10. Acceptance Criteria
Plan 04 is done when:
1. Reverting commit `5edf7eb2` (re-introducing the `AlertingMetrics` ctor bug) causes `mvn clean verify` to fail. Revert the revert; test stays.
2. `grep -r "ALERT_VARIABLES" ui/src` shows no hand-maintained registry — only the backend-fetched hook and its consumers.
3. `ui/src/pages/Alerts/enums.ts` has no hand-declared unions for `RouteMetric`, `Comparator`, `JvmAggregation`, `ExchangeFireMode` — all four are derived from `schema.d.ts`. The top-of-file explanatory comment is gone.
4. All 5 condition String fields are enum-typed on the wire (inspect `openapi.json`) and in `schema.d.ts`.
5. `alerting-editor.spec.ts` green in CI for 3 consecutive runs.
6. `.claude/rules/app-classes.md` updated: new `AlertTemplateVariablesController` listed under the flat-endpoint allow-list with the reason ("registry is tenant- and env-agnostic global metadata").
7. `openapi.json` and `schema.d.ts` regenerated and committed alongside the backend changes they reflect.
---
## 11. Estimated Commit Cadence
Approximate, for sizing:
1. `feat(alerting): discriminator mapping on AlertCondition` — Task 5 backend.
2. `refactor(ui/alerts): derive polymorphic unions from schema` — Task 5 UI cleanup.
3. `refactor(alerting): AgentStateCondition.state → AgentState enum`.
4. `refactor(alerting): DeploymentStateCondition.states → DeploymentStatus enum`.
5. `refactor(alerting): LogPatternCondition.level → LogLevel enum` (adds new enum).
6. `refactor(alerting): JvmMetricCondition.metric — @Schema(allowableValues) hint` (no type change; OpenAPI hint only).
7. `refactor(alerting): ExchangeFilter.status → ExecutionStatus enum`.
8. `test(alerting): Spring context-startup smoke`.
9. `feat(alerting): template-variables SSOT endpoint` — Task 2 backend.
10. `refactor(ui/alerts): consume template-variables via API` — Task 2 frontend.
11. `test(ui/alerts): Playwright editor spec`.
~11 commits, bundleable to ~8 if enum flips are trivial. Each atomic and reversible.
---
## 12. CRITICAL process rules (per CLAUDE.md)
- `gitnexus_impact` on `AlertCondition`, `NotificationContextBuilder`, `AlertingMetrics`, `ExchangeMatchCondition`, each touched subtype before editing.
- `gitnexus_detect_changes` before every commit.
- `cd ui && npm run generate-api:live` after every controller or DTO change.
- `.claude/rules/app-classes.md` updated in the same commit that adds `AlertTemplateVariablesController`.
- `.claude/rules/core-classes.md` updated when the new `LogLevel` enum lands.
- Maven: `mvn clean verify` must stay green through every commit.
- UI: `cd ui && npm run typecheck && npm run test && npm run test:e2e` must stay green through every commit.