Three distinct root causes, all reproducible when the classes run
solo — not order-dependent as the triage report suggested. Full
diagnosis in .planning/sse-flakiness-diagnosis.md.
1. AgentSseController.events auto-heal was over-permissive: any valid
JWT allowed registering an arbitrary path-id, a spoofing vector.
Surface symptom was the parked sseConnect_unknownAgent_returns404
test hanging on a 200-with-empty-stream instead of getting 404.
Fix: auto-heal requires JWT subject == path id.
2. SseConnectionManager.pingAll read ${agent-registry.ping-interval-ms}
(unprefixed). AgentRegistryConfig binds cameleer.server.agentregistry.*
— same family of bug as the MetricsFlushScheduler fix in a6944911.
Fix: corrected placeholder prefix.
3. Spring's SseEmitter doesn't flush response headers until the first
emitter.send(); clients on BodyHandlers.ofInputStream blocked on
the first body byte, making awaitConnection(5s) unreliable under a
15s ping cadence. Fix: send an initial ": connected" comment on
connect() so headers hit the wire immediately.
Verified: 9/9 SSE tests green across AgentSseControllerIT + SseSigningIT.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No callers after the legacy PG ingestion path was retired in 0f635576.
core-classes.md updated to drop the leftover note.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the ExecutionController removal (0f635576), SearchIndexer
subscribed to ExecutionUpdatedEvent but nothing publishes that event.
Every SearchIndexerStats metric returned always-zero, and the admin
/api/v1/admin/clickhouse/pipeline endpoint that surfaced those stats
carried no signal.
Backend removed:
- core: SearchIndexer, SearchIndexerStats, ExecutionUpdatedEvent
- app: IndexerPipelineResponse DTO, /pipeline endpoint on
ClickHouseAdminController (field + ctor param)
- StorageBeanConfig.searchIndexer bean
UI removed:
- IndexerPipeline type + useIndexerPipeline hook in
api/queries/admin/clickhouse.ts
- Indexer Pipeline card in ClickHouseAdminPage.tsx (plus ProgressBar
import and pipeline* CSS classes)
OpenAPI schema.d.ts + openapi.json regenerated (stale /pipeline path
and IndexerPipelineResponse schema removed).
SearchIndex interface + ClickHouseSearchIndex impl kept — those are
live and used by SearchService + ExchangeMatchEvaluator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The @Scheduled placeholder read ${ingestion.flush-interval-ms:1000}
(unprefixed) but IngestionConfig binds cameleer.server.ingestion.* —
YAML tuning of the metrics flush interval was silently ignored and the
scheduler fell back to the 1s default in every environment.
Corrected to ${cameleer.server.ingestion.flush-interval-ms:1000}.
(The initial attempt to bind via SpEL #{@ingestionConfig.flushIntervalMs}
failed because beans registered via @EnableConfigurationProperties use a
compound bean name "<prefix>-<FQN>", not the simple camelCase form. The
property-placeholder path is sufficient — IngestionConfig still owns
the Java-side default.)
BackpressureIT: drops the obsolete workaround property
`ingestion.flush-interval-ms=60000`; the single prefixed override now
controls both buffer config and flush cadence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-layer fix for the TZ drift that caused stats reads to miss every row
when the JVM default TZ and CH session TZ disagreed:
- Insert side: ClickHouse JDBC 0.9.7 formats java.sql.Timestamp via
Timestamp.toString(), which uses JVM default TZ. A CEST JVM shipping
to a UTC CH server stored Unix timestamps off by the TZ offset (the
triage report's original symptom). Pinned JVM default to UTC in
CameleerServerApplication.main() — standard practice for observability
servers that push to time-series stores.
- Read side: stats_1m_* tables now declare bucket as DateTime('UTC'),
MV SELECTs wrap toStartOfMinute(start_time) in toDateTime(..., 'UTC')
so projections match column type, and ClickHouseStatsStore.lit(Instant)
emits toDateTime('...', 'UTC') rather than a bare literal — defence
in depth against future refactors.
Test class pins its own JVM TZ (the store IT builds its own
HikariDataSource, bypassing the main() path). Debug scaffolding from
the triage investigation removed.
Greenfield CH — no migration needed.
Verified: 14/14 ClickHouseStatsStoreIT green, plus 84/84 across all
ClickHouse IT classes (no regression from the JVM TZ default change).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-by-task plan for the 2026-04-21-it-triage-followups-design spec.
Autonomous execution variant — SSE diagnose-then-fix branches to either
apply-fix or park-with-@Disabled based on diagnosis confidence, since
this runs unattended overnight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Design for closing the 12 parked IT failures (ClickHouseStatsStoreIT
timezone, SSE flakiness in AgentSseControllerIT/SseSigningIT) plus two
production-code side notes the ExecutionController removal surfaced:
- ClickHouseStatsStore timezone fix — column-level DateTime('UTC') on
bucket, greenfield CH
- SSE flakiness — diagnose-then-fix with user checkpoint between phases
- MetricsFlushScheduler property-key fix — bind via SpEL, single source
of truth in IngestionConfig
- Dead-code cleanup — SearchIndexer.onExecutionUpdated listener +
unused TaggedExecution record
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class),
and ChunkAccumulator is registered unconditionally — the legacy controller
never bound in any profile. Even if it had, IngestionService.ingestExecution
called executionStore.upsert(), and the only ExecutionStore impl
(ClickHouseExecutionStore) threw UnsupportedOperationException from upsert
and upsertProcessors. The entire RouteExecution → upsert path was dead code
carrying four transitive dependencies (RouteExecution import, eventPublisher
wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook).
Removed:
- cameleer-server-app/.../controller/ExecutionController.java (whole file)
- ExecutionStore.upsert + upsertProcessors (interface methods)
- ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides)
- IngestionService.ingestExecution + toExecutionRecord + flattenProcessors
+ hasAnyTraceData + truncateBody + toJson/toJsonObject helpers
- IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>);
dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit
- StorageBeanConfig.ingestionService(...) simplified accordingly
Untouched because still in use:
- ExecutionRecord / ProcessorRecord records (findById / findProcessors /
SearchIndexer / DetailController)
- SearchIndexer (its onExecutionUpdated never fires now since no-one
publishes ExecutionUpdatedEvent, but SearchIndexerStats is still
referenced by ClickHouseAdminController — separate cleanup)
- TaggedExecution record has no remaining callers after this change —
flagged in core-classes.md as a leftover; separate cleanup.
Rule docs updated:
- .claude/rules/app-classes.md: retired ExecutionController bullet, fixed
stale URL for ChunkIngestionController (it owns /api/v1/data/executions,
not /api/v1/ingestion/chunk/executions).
- .claude/rules/core-classes.md: IngestionService surface + note the dead
TaggedExecution.
Full IT suite post-removal: 560 tests run, 11 F + 1 E — same 12 failures
in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT
SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 commits landed on local main; the three remaining parked clusters
each need a specific intent call before the next pass can proceed:
- ClickHouseStatsStoreIT (8 failures) — timezone bug in
ClickHouseStatsStore.lit(Instant); needs a store-side fix, not a
test-side one.
- AgentSseControllerIT + SseSigningIT (4 failures) — SSE connection
timing; looks order-dependent, not spec drift.
Also flagged two side issues worth a follow-up PR:
- ExecutionController legacy path is dead code.
- MetricsFlushScheduler.@Scheduled reads the wrong property key and
silently ignores the configured flush interval in production.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pushToAgents fan-out iterates every distinct (app, env) slice in
the shared agent registry. In isolated runs that's 0, but with Spring
context reuse across IT classes we always see non-zero here. Assert
the response has a pushResult.total field (shape) rather than exact 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ForwardCompatIT: send a valid ExecutionChunk envelope with extra
unknown fields instead of a bare {futureField}. Was being parsed into
an empty/degenerate chunk and rejected with 400.
- ProtocolVersionIT.requestWithCorrectProtocolVersionPassesInterceptor:
same shape fix — minimal valid chunk so the controller's 400 is not
an ambiguous signal for interceptor-passthrough.
- BackpressureIT:
* TestPropertySource keys were "ingestion.*" but IngestionConfig is
bound under "cameleer.server.ingestion.*" — overrides were ignored
and the buffer stayed at its default 50_000, so the 503 overflow
branch was unreachable. Corrected the keys.
* MetricsFlushScheduler's @Scheduled uses a *different* key again
("ingestion.flush-interval-ms"), so we override that separately to
stop the default 1s flush from draining the buffer mid-test.
* executionIngestion_isSynchronous_returnsAccepted now uses the
chunked envelope format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ClickHouseChunkPipelineIT.setUp was loading /clickhouse/V2__executions.sql
and /clickhouse/V3__processor_executions.sql — resource paths that no
longer exist after 90083f88 collapsed the V1..V18 ClickHouse schema into
init.sql. Swapped for ClickHouseTestHelper.executeInitSql(jdbc).
ClickHouseExecutionReadIT.detailService_buildTree_withIterations was
asserting getLoopIndex() on children of a split, but DetailService's
seq-based buildTree path (buildTreeBySeq) maps FlatProcessorRecord.iteration
into ProcessorNode.iteration — not loopIndex. The loopIndex path is only
populated by buildTreeByProcessorId (the legacy ID-only fallback). Switched
the assertion to getIteration() to match the seq-driven reconstruction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both tests extend AbstractPostgresIT and inherit the Postgres jdbcTemplate,
which they were using to query ClickHouse-resident tables (executions,
processor_executions, route_diagrams). Now:
- DiagramLinkingIT reads diagramContentHash off the execution-detail REST
response (and tolerates JSON null by normalising to empty string, which
matches how the ingestion service stamps un-linked executions).
- IngestionSchemaIT asserts the reconstructed processor tree through the
execution-detail endpoint (covers both flattening on write and
buildTree on read) and reads processor bodies via the processor-snapshot
endpoint rather than raw processor_executions rows.
Both tests now use the ExecutionChunk envelope on POST /data/executions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Largest Cluster B test: seeded 10 executions via the legacy RouteExecution
shape which ChunkIngestionController silently degenerates to empty chunks,
then verified via a Postgres SELECT against a ClickHouse table. Both
failure modes addressed:
- All 10 seed payloads are now ExecutionChunk envelopes (chunkSeq=0,
final=true, flat processors[]).
- Pipeline visibility probe is the env-scoped search REST endpoint
(polling for the last corr-page-10 row).
- searchGet() helper was using the AGENT token; env-scoped read
endpoints require VIEWER+, so it now uses viewerJwt (matches what
searchPost already did).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DiagramControllerIT.postDiagram_dataAppearsAfterFlush now verifies via
GET /api/v1/environments/{env}/apps/{app}/routes/{route}/diagram instead
of a PG SELECT against the ClickHouse route_diagrams table.
DiagramRenderControllerIT seeds both a diagram and an execution on the
same route, then reads the stamped diagramContentHash off the execution-
detail REST response to drive the flat /api/v1/diagrams/{hash}/render
tests. The env-scoped endpoint only serves JSON, so SVG tests still hit
the content-hash endpoint — but the hash comes from REST now, not SQL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same pattern as DetailControllerIT:
- ExecutionControllerIT: all four tests now post ExecutionChunk envelopes
(chunkSeq=0, final=true) carrying instanceId/applicationId. Flush
visibility check pivoted from PG SELECT → env-scoped search REST.
- MetricsControllerIT: postMetrics_dataAppearsAfterFlush now stamps
collectedAt at now() and verifies through GET /environments/{env}/
agents/{id}/metrics with the default 1h lookback, looking for a
non-zero bucket on the metric name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /api/v1/data/executions is owned by ChunkIngestionController (the
legacy ExecutionController path is @ConditionalOnMissingBean(ChunkAccumulator)
and never binds). The old RouteExecution-shaped seed was silently parsed
as an empty ExecutionChunk and nothing landed in ClickHouse.
Rewrote the seed as a single final ExecutionChunk with chunkSeq=0 /
final=true and a flat processors[] carrying seq + parentSeq to preserve
the 3-level tree (DetailService.buildTree reconstructs the nested shape
for the API response). Execution-id lookup now goes through the search
REST API filtered by correlationId, per the no-raw-SQL preference.
Template for the other Cluster B ITs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the 5 commits landed this session (65 → 44 failures), the 3
accepted remaining clusters (Cluster B ingestion-payload drift, SSE
timing, small Cluster E tail), and the open questions that require
spec intent before the next pass can proceed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both Testcontainers Postgres ITs were asserting exact counts on rows that
other classes in the shared context had already written.
- FlywayMigrationIT: treat the non-seed tables (users, server_config,
audit_log, application_config, app_settings) as "must exist; COUNT must
return a non-negative integer" rather than expecting exactly 0. The
seeded tables (roles=4, groups=1) still assert exact V1 baseline.
- ConfigEnvIsolationIT.findByEnvironment_excludesOtherEnvs: use unique
prefixed app slugs and switch containsExactlyInAnyOrder to contains +
doesNotContain, so the cross-env filter is still verified without
coupling to other tests' inserts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent list moved from /api/v1/agents to /api/v1/environments/{envSlug}/agents;
the 'valid JWT returns 200' test was hitting the retired flat path and
getting 404. The other 'without JWT' cases still pass because Spring
Security rejects them at the filter chain before URL routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Registration now requires environmentId in the body (400 if missing), so
the stale register bodies were failing every downstream test that relied
on a registered agent. Affected helpers in:
- BootstrapTokenIT (static constant + inline body)
- JwtRefreshIT (registerAndGetTokens)
- RegistrationSecurityIT (registerAgent)
- SseSigningIT (registerAgentWithAuth)
- AgentSseControllerIT (registerAgent helper)
Also in JwtRefreshIT / RegistrationSecurityIT, the "access token can reach
a protected endpoint" tests were hitting env-scoped read endpoints that
now require VIEWER+. Redirected both to the AGENT-role heartbeat endpoint
— it proves the token is accepted by the security filter without being
coupled to RBAC rules for reader endpoints.
JwtRefreshIT.refreshWithValidToken also dropped an isNotEqualTo assertion
that assumed sub-second iat uniqueness — HMAC JWTs with second-precision
claims are byte-identical when minted for the same subject within the
same second, so the old assertion was flaky by design.
SseSigningIT / AgentSseControllerIT still have SSE-connection timing
failures unrelated to registration — parked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two drifts corrected:
- registerAgent helper missing required environmentId (spec: 400 if absent).
- sendGroupCommand is now synchronous request-reply: returns 200 with an
aggregated CommandGroupResponse {success,total,responded,responses,timedOut}
— no longer 202 with {targetCount,commandIds}. Updated assertions and name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four drifts against the current server contract, all now corrected:
- Registration body missing required environmentId (spec: 400 if absent).
- Agent list moved to env-scoped /api/v1/environments/{envSlug}/agents;
flat /api/v1/agents no longer exists.
- heartbeatUnknownAgent now auto-heals via JWT env claim (fb54f9cb);
the 404 branch is only reachable without a JWT, which the security
filter rejects before the controller sees the request.
- sseEndpoint is an absolute URL (ServletUriComponentsBuilder.fromCurrentContextPath),
so assert endsWith the path rather than equals-to-relative.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproduction: pause a container long enough to cross both the stale
and dead thresholds, then unpause. The agent resumes sending heartbeats
but the server keeps it shown as DEAD. Only a full container restart
(which re-registers) fixes it.
Root cause: AgentRegistryService.heartbeat() only revived STALE → LIVE.
A DEAD agent's heartbeat updated lastHeartbeat but left state unchanged.
checkLifecycle() never downgrades DEAD either (no-op in that branch),
so the agent was permanently stuck in DEAD until a register() call.
Fix: extend the revival branch to also cover DEAD. Same process; a
heartbeat is proof of liveness regardless of the previous state.
Also: AgentLifecycleMonitor.mapTransitionEvent() now emits RECOVERED
for DEAD → LIVE, mirroring its behavior for STALE → LIVE, so the
lifecycle timeline captures the transition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The project is still greenfield (no production deployment) so this is
the last safe moment to flatten the migration archaeology before the
checksum history starts mattering for real.
Schema changes
- 18 migration files (531 lines) → one V1__init.sql (~380 lines)
declaring the final end-state: RBAC + claim mappings + runtime
management + config + audit + outbound + alerting, plus seed data
(system roles, Admins group, default environment).
- Drops the data-repair statements from V14 (firemode backfill),
V16 (subjectFingerprint migration), V17 (ACKNOWLEDGED → FIRING
coercion) — they were no-ops on any DB that starts at V1.
- Declares condition_kind_enum with AGENT_LIFECYCLE from the start
(was added retroactively by V18).
- Declares alert_state_enum with three values only (was five, then
swapped in V17) and alert_instances with read_at / deleted_at
columns from day one (was added by V17).
- alert_reads table never created (V12 created, V17 dropped).
- alert_instances_open_rule_uq built with the V17 predicate from
the start.
Test changes
- Replace V12MigrationIT / V17MigrationIT / V18MigrationIT with one
SchemaBootstrapIT that asserts the combined invariants: tables
present, alert_reads absent, enum value sets, alert_instances has
read_at + deleted_at, open_rule_uq exists and is unique, env-delete
cascade fires.
Verification
- pg_dump of the new V1 matches the pg_dump of V1..V18 applied in
sequence (bytewise modulo column order and Postgres-auto FK names).
- Full alerting IT suite (53 tests across 6 classes) green against
the new schema.
- The 47 pre-existing test failures on main (AgentRegistrationIT,
SearchControllerIT, ClickHouseStatsStoreIT, …) are unrelated and
fail identically without this change.
Developer impact
- Existing local DBs will fail checksum validation on boot. Wipe:
docker compose down -v (or drop the tenant_default schema).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the previous describeApiError rollout to the rest of the UI.
Two symptom classes covered:
- Bare e.message / err.message in toast descriptions would render
"undefined" on Spring error bodies (plain objects without a proper
Error prototype). Affected: OidcConfigPage (save/test/delete),
ClaimMappingRulesModal (save + test), AgentHealth (dismiss),
RouteControlBar (route action + replay).
- Inline {String(error)} on load-failure banners would render
"[object Object]". Affected: InboxPage, RulesListPage, SilencesPage,
OutboundConnectionsPage.
Not touched: auth-store, AppsTab, UsersTab — they already guard with
`e instanceof Error` and fall back to a static string; replacing the
fallback with describeApiError would be a behavioral change best
evaluated separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend
- V18 migration adds AGENT_LIFECYCLE to condition_kind_enum. Java
ConditionKind enum shipped with this value but no Postgres migration
extended the type, so any AGENT_LIFECYCLE rule insert failed with
"invalid input value for enum condition_kind_enum".
- ALTER TYPE ... ADD VALUE lives alone in its migration per Postgres
constraint that the new value cannot be referenced in the same tx.
- V18MigrationIT asserts the enum now contains all 7 kinds.
Frontend
- Add describeApiError(e) helper to unwrap openapi-fetch error bodies
(Spring error JSON) into readable strings. String(e) on a plain
object rendered "[object Object]" in toasts — the actual failure
reason was hidden from the user.
- Replace String(e) in all 13 toast descriptions across the alerting
and outbound-connection mutation paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite the "Infrastructure Setup" / "Run the Server" sections to
reflect what docker-compose.yml actually provides (full stack —
PostgreSQL + ClickHouse + server + UI — not just PostgreSQL). Adds:
- Step-by-step walkthrough for a first-run clean environment.
- Port map including the UI (8080), ClickHouse (8123/9000), PG (5432),
server (8081).
- Dev credentials baked into compose surfaced in one place.
- Lifecycle commands (stop/start/rebuild-single-service/wipe).
- Infra-only mode for backend-via-mvn / UI-via-vite iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitNexus analyze --embeddings after the alerts-inbox-redesign branch
brought the graph to 8780 symbols / 22753 relationships (was 8527/22174
in AGENTS.md and 8603/22281 in CLAUDE.md). The stat-header drift between
AGENTS.md and CLAUDE.md is an artifact of separate reindexes — both now
in sync.
ui/tsconfig.app.tsbuildinfo was a stale tsc incremental-build cache
that shouldn't be tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UI: AlertStateChip.LABELS and .COLORS no longer include ACKNOWLEDGED
(dropped in V17). AlertStateChip.test.tsx test-cases trimmed to the
three remaining states. LayoutShell CMD-K now searches FIRING alerts
with acked=false (was state=[FIRING,ACKNOWLEDGED]).
Test: V17MigrationIT.open_rule_index_predicate_is_reworked replaced
with a structural-only assertion (index exists, indisunique). The
pg_get_indexdef pretty-printer varies across Postgres versions, so
predicate semantics are verified behaviorally in
PostgresAlertInstanceRepositoryIT (findOpenForRule_* +
save_rejectsSecondOpenInstanceForSameRuleAndExchange).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Used by InboxPage's 'Silence rule… → Custom…' flow to carry the alert's
ruleId into the silence creation form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- STATE_ITEMS gains color dots (text-muted/error/success) to match SEVERITY_ITEMS
- onDeleteOne removes the deleted id from the selection Set so a follow-up bulk
action doesn't try to re-delete a tombstoned row
- drop stale comment block that described an alternative SilenceRulesForSelection
implementation not matching the shipped code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the old FIRING+ACK hardcoded inbox with the single filterable
inbox:
- Filter bar: Severity · Status (PENDING/FIRING/RESOLVED, default FIRING) ·
Hide acked (default on) · Hide read (default on).
- Row actions: Ack, Mark read, Silence rule… (quick menu), Delete
(OPERATOR+, soft delete with undo toast wired to useRestoreAlert).
- Bulk toolbar: Ack N · Mark N read · Silence rules · Delete N
(ConfirmDialog; OPERATOR+).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Used by InboxPage row + bulk actions to silence an alert's underlying
rule for a chosen preset window. 'Custom…' routes to
/alerts/silences?ruleId=<id> (T13 adds the prefill wire).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar Alerts section now just: Inbox · Rules · Silences. The /alerts
redirect still lands in /alerts/inbox; /alerts/all and /alerts/history
routes are gone (no redirect — stale URLs 404 per clean-break policy).
Also updates sidebar-utils.test.ts to match the new 3-entry shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- useAlerts gains acked/read filter params threaded into query + queryKey
- new mutations: useBulkAckAlerts, useDeleteAlert, useBulkDeleteAlerts, useRestoreAlert
- all cache-invalidate the alerts list and unread-count on success
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New endpoints visible to the SPA: DELETE /alerts/{id}, POST
/alerts/{id}/restore, POST /alerts/bulk-delete, POST /alerts/bulk-ack.
GET /alerts gains tri-state acked / read query params. AlertDto now
includes readAt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- new AlertInstanceRepository.filterInEnvLive(ids, env): single-query bulk ID validation
- AlertController.inEnvLiveIds now one SQL round-trip instead of N
- bulkMarkRead SQL: defense-in-depth AND deleted_at IS NULL
- bulkAck SQL already had deleted_at IS NULL guard — no change needed
- PostgresAlertInstanceRepositoryIT: add filterInEnvLive_excludes_other_env_and_soft_deleted
- V12MigrationIT: remove alert_reads assertion (table dropped by V17)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /alerts gains tri-state acked + read query params
- new endpoints: DELETE /{id} (soft-delete), POST /bulk-delete, POST /bulk-ack, POST /{id}/restore
- requireLiveInstance 404s on soft-deleted rows; restore() reads the row regardless
- BulkReadRequest → BulkIdsRequest (shared body for bulk read/ack/delete)
- AlertDto gains readAt; deletedAt stays off the wire
- InAppInboxQuery.listInbox threads acked/read through to the repo (7-arg, no more null placeholders)
- SecurityConfig: new matchers for bulk-ack (VIEWER+), DELETE/bulk-delete/restore (OPERATOR+)
- AlertControllerIT: persistence assertions on /read + /bulk-read; full coverage for new endpoints
- InAppInboxQueryTest: updated to 7-arg listInbox signature
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The class under test was removed in da281933; the IT became a @Disabled
placeholder. Deleting per no-backwards-compat policy. Read mutation
coverage lives in PostgresAlertInstanceRepositoryIT going forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove SET state='ACKNOWLEDGED' from ack() and the ACKNOWLEDGED predicate
from findOpenForRule — both would error after V17. The final ack() + open-rule
semantics (idempotent guards, deleted_at) are owned by Task 5; this is just
the minimum to stop runtime SQL errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>