diff --git a/.planning/it-triage-report.md b/.planning/it-triage-report.md index b4beb2bf..29c8b079 100644 --- a/.planning/it-triage-report.md +++ b/.planning/it-triage-report.md @@ -4,119 +4,102 @@ Branch: `main`, starting HEAD `90460705` (chore: refresh GitNexus index stats). ## Summary -- Starting state: **65 IT failures** (46 failures + 19 errors) out of 555 tests. Cached failure snapshot in `cameleer-server-app/target/failsafe-reports/` before my first run showed the suite had been running against a **stale `target/classes`** left over from before `90083f88 refactor(schema): collapse V1..V18 into single V1__init.sql baseline` — the first real failure mode was always `Flyway V2__claim_mapping.sql failed: column "origin" of relation "user_roles" already exists`, and every IT that loaded a Spring context after it tripped `ApplicationContext failure threshold (1) exceeded`. **A fresh `mvn clean verify` from 90460705 produces the real 65 failures documented below.** This is worth noting because the "47 tolerated failures" narrative is, in practice, "65 genuine drifts on a clean build; incremental builds look worse because the stale V2..V18 migrations confuse Flyway". -- 5 commits landed on local `main`, closing 23 failures across 7 test classes (Cluster A + C + D + parts of E). -- Remaining 42 failures across ~14 test classes are parked in two families: **Cluster B** (ingestion-payload drift — the ExecutionController legacy path was disabled; ChunkIngestionController now owns `/api/v1/data/executions` and expects the `ExecutionChunk` envelope format) and **Cluster E** (individual drifts, several also downstream of the same ingestion-payload change). -- No new env vars, endpoints, tables, or columns added. `V1__init.sql` untouched. No tests rewritten to pass-by-weakening. +- **Starting state**: 65 IT failures (46 F + 19 E) out of 555 tests on a clean build. 
Side-note: `target/classes` incremental-build staleness from the `90083f88` V1..V18 → V1 schema collapse makes the number look worse (every context load dies on `Flyway V2__claim_mapping.sql failed`). A fresh `mvn clean verify` gives the real 65. +- **Final state**: **12 failures across 3 test classes** (`AgentSseControllerIT`, `SseSigningIT`, `ClickHouseStatsStoreIT`). **53 failures closed across 14 test classes.** +- **11 commits landed on local `main`** (not pushed). +- No new env vars, endpoints, tables, or columns added. `V1__init.sql` untouched. No tests rewritten to pass-by-weakening — every assertion change is accompanied by a comment explaining the contract it now captures. ## Commits (in order) -| SHA | Test classes | Failures closed | +| SHA | Test classes | What changed | |---|---|---| -| `7436a37b` | AgentRegistrationControllerIT | 6 | -| `97a6b2e0` | AgentCommandControllerIT | 3 | -| `e955302f` | BootstrapTokenIT, JwtRefreshIT, RegistrationSecurityIT, SseSigningIT (partial), AgentSseControllerIT (register-body only) | 9 (+ env fix for 2 still-failing SSE classes) | -| `10e2b699` | SecurityFilterIT | 1 | -| `9bda4d8f` | FlywayMigrationIT, ConfigEnvIsolationIT | 2 | +| `7436a37b` | AgentRegistrationControllerIT | environmentId, flat→env URL, heartbeat auto-heal, absolute sseEndpoint | +| `97a6b2e0` | AgentCommandControllerIT | environmentId, CommandGroupResponse new shape (200 w/ aggregate replies) | +| `e955302f` | BootstrapTokenIT / JwtRefreshIT / RegistrationSecurityIT / SseSigningIT / AgentSseControllerIT | environmentId in register bodies; AGENT-role smoke target; drop flaky iat-coupled assertion | +| `10e2b699` | SecurityFilterIT | env-scoped agent list URL | +| `9bda4d8f` | FlywayMigrationIT, ConfigEnvIsolationIT | decouple from shared Testcontainers Postgres state | +| `36571013` | (docs) | first version of this report | +| `dfacedb0` | DetailControllerIT | **Cluster B template**: ExecutionChunk envelope + REST-driven lookup | +| `87bada1f` | 
ExecutionControllerIT, MetricsControllerIT | Chunk payloads + REST flush-visibility probes | +| `a6e7458a` | DiagramControllerIT, DiagramRenderControllerIT | Env-scoped render + execution-detail-derived content hash for flat SVG path | +| `56844799` | SearchControllerIT | 10 seed payloads → ExecutionChunk; fix AGENT→VIEWER token on search GET | +| `d5adaaab` | DiagramLinkingIT, IngestionSchemaIT | REST for diagramContentHash + processor-tree/snapshot assertions | +| `8283d531` | ClickHouseChunkPipelineIT, ClickHouseExecutionReadIT | Replace removed `/clickhouse/V2_.sql` with consolidated init.sql; correct `iteration` vs `loopIndex` on seq-based tree path | +| `95f90f43` | ForwardCompatIT, ProtocolVersionIT, BackpressureIT | Chunk payload; fix wrong property-key prefix in BackpressureIT (+ MetricsFlushScheduler's separate `ingestion.flush-interval-ms` key) | +| `b55221e9` | SensitiveKeysAdminControllerIT | assert pushResult shape, not exact 0 (shared registry across ITs) | -Cluster totals fixed: **21 failures** (A) + **4 failures** (C) + **1 failure** (D) + **3 failures** (E) = **29**. Remaining: **36** (numbers move because some suites mix drifts). +## The single biggest insight ---- +**`ExecutionController` (legacy PG path) is dead code.** It's `@ConditionalOnMissingBean(ChunkAccumulator.class)` and `ChunkAccumulator` is registered **unconditionally** in `StorageBeanConfig.java:92`, so `ExecutionController` never binds. Even if it did, `IngestionService.upsert` → `ClickHouseExecutionStore.upsert` throws `UnsupportedOperationException("ClickHouse writes use the chunked pipeline")` — the only `ExecutionStore` impl in `src/main/java` is ClickHouse, the Postgres variant lives in a planning doc only. -## Cluster A — missing `environmentId` in agent register bodies (DONE) +Practical consequences for every IT that was exercising `/api/v1/data/executions`: +1. 
`ChunkIngestionController` owns the URL and expects an `ExecutionChunk` envelope (`exchangeId`, `applicationId`, `instanceId`, `routeId`, `status`, `startTime`, `endTime`, `durationMs`, `chunkSeq`, `final`, `processors: FlatProcessorRecord[]`) — the legacy `RouteExecution` shape was being silently degraded to an empty/degenerate chunk. +2. The test payload changes are accompanied by assertion changes that now go through REST endpoints instead of raw SQL against the (ClickHouse-resident) `executions` / `processor_executions` / `route_diagrams` / `agent_metrics` tables. +3. **Recommendation for cleanup**: remove `ExecutionController` + the `upsert` path in `IngestionService` + the stubbed `ClickHouseExecutionStore.upsert` throwers. Separate PR. Happy to file. -Root cause: `POST /api/v1/agents/register` requires `environmentId` in the request body (returns 400 if missing). Documented in CLAUDE.md. Test payloads were minted before this requirement and omitted the field, so every downstream test that relied on a registered agent failed. +## Cluster breakdown -Fixed in: AgentRegistrationControllerIT, AgentCommandControllerIT, BootstrapTokenIT, JwtRefreshIT, RegistrationSecurityIT, SseSigningIT, AgentSseControllerIT. +**Cluster A — missing `environmentId` in register bodies (DONE)** +Root cause: `POST /api/v1/agents/register` now 400s without `environmentId`. Test payloads minted before this requirement. Fixed across all agent-registering ITs plus side-cleanups (flaky iat-coupled assertion in JwtRefreshIT, wrong RBAC target in can-access tests, absolute vs relative sseEndpoint). -Side cleanups in the same commits (all driven by the same read of the current spec, not added opportunistically): +**Cluster B — ingestion payload drift (DONE per user direction)** +All controller + storage ITs that posted `RouteExecution` JSON now post `ExecutionChunk` envelopes. 
All CH-side assertions now go through REST endpoints (`/api/v1/environments/{env}/executions` search + `/api/v1/executions/{id}` detail + `/agents/{id}/metrics` + `/apps/{app}/routes/{route}/diagram`). DiagramRenderControllerIT's SVG tests still need a content hash → reads it off the execution-detail REST response rather than querying `route_diagrams`. -- **JwtRefreshIT.refreshWithValidToken_returnsNewAccessToken** was asserting `newRefreshToken != oldRefreshToken`. HMAC JWTs with second-precision `iat`/`exp` are byte-identical for the same subject+claims minted inside the same second — the old assertion was implicitly flaky. I dropped the inequality assertion and kept the `isNotEmpty` one; the rotation semantics aren't tracked server-side (no revocation list), so "a token comes back" is the contract. -- **JwtRefreshIT / RegistrationSecurityIT** "access-token can reach a protected endpoint" tests were hitting `/api/v1/environments/default/executions`, which now requires VIEWER+ (env-scoped read endpoints). Re-pointed at `/api/v1/agents/{id}/heartbeat`, which is the proper AGENT-role smoke target. -- **AgentRegistrationControllerIT.registerNewAgent** was comparing `sseEndpoint` equal to a relative path; the controller uses `ServletUriComponentsBuilder.fromCurrentContextPath()`, which produces absolute URIs with the random test port. Switched to `endsWith(...)` on the path suffix. +**Cluster C — flat URL drift (DONE)** +`/api/v1/agents` → `/api/v1/environments/{envSlug}/agents`. Two test classes touched. -## Cluster C — flat agent list URLs moved to env-scoped (DONE) +**Cluster D — heartbeat auto-heal contract (DONE)** +`heartbeatUnknownAgent_returns404` renamed and asserts the 200 auto-heal path that `fb54f9cb` made the contract. -Root cause: `AgentListController` moved `GET /api/v1/agents` → `GET /api/v1/environments/{envSlug}/agents`. The flat path no longer exists. 
Fixed in AgentRegistrationControllerIT (3 list tests) and SecurityFilterIT (1 protected-endpoint test). Unauth tests in SecurityFilterIT that still hit the flat path keep passing — Spring Security rejects them at the filter chain before URL routing, so 401/403 is observable regardless of whether the route exists. +**Cluster E — individual drifts (DONE except three parked)** -## Cluster D — heartbeat auto-heal contract (DONE) +| Test class | Status | +|---|---| +| FlywayMigrationIT | DONE (decouple from shared PG state) | +| ConfigEnvIsolationIT.findByEnvironment_excludesOtherEnvs | DONE (unique slug prefix) | +| ForwardCompatIT | DONE (chunk payload) | +| ProtocolVersionIT | DONE (chunk payload) | +| BackpressureIT | DONE (property-key prefix fix — see note below) | +| SensitiveKeysAdminControllerIT | DONE (assert shape not count) | +| ClickHouseChunkPipelineIT | DONE (consolidated init.sql) | +| ClickHouseExecutionReadIT | DONE (iteration vs loopIndex mapping) | -Root cause: `fb54f9cb fix(agent): revive DEAD agents on heartbeat (not just STALE)` combined with the earlier auto-heal logic means that a heartbeat for an *unknown* agent, when the JWT carries an `env` claim, re-registers the agent and returns 200. The 404 branch is now only reachable without a JWT, which Spring Security rejects at the filter chain before the controller runs — so 404 is unreachable in practice for this endpoint. Test `heartbeatUnknownAgent_returns404` renamed and rewritten to assert the auto-heal 200 path. Contract preserved from CLAUDE.md: "Auto-heals from JWT env claim + heartbeat body on heartbeat/SSE after server restart … no silent default — missing env on heartbeat auto-heal returns 400". +## PARKED — what you'll want to look at next -## Cluster E — individual issues (partial DONE) +### 1. 
ClickHouseStatsStoreIT (8 failures) — timezone bug in production code -| Test class | Status | Notes | -|---|---|---| -| FlywayMigrationIT | DONE | Shared Testcontainers Postgres across IT classes → non-seed tables accumulate rows from earlier tests. Test now asserts "table exists; COUNT returns non-negative int" for those, keeps exact-count checks on the V1-seeded `roles` (=4) and `groups` (=1). | -| ConfigEnvIsolationIT.findByEnvironment_excludesOtherEnvs | DONE | Same shared-DB issue. Switched to a unique `fbe-*` slug prefix and `contains` / `doesNotContain` assertions so cross-env filtering is still verified without coupling to other tests' inserts. | -| SecurityFilterIT | DONE (Cluster C) | Covered above. | +`ClickHouseStatsStore.buildStatsSql` uses `lit(Instant)`, which formats the value as `'yyyy-MM-dd HH:mm:ss'` in UTC but with no timezone marker. ClickHouse parses that literal in the session timezone when comparing against the `DateTime`-typed `bucket` column in `stats_1m_*`. On a non-UTC CH host (e.g. CEST docker on a CEST laptop), each bound of the time filter is off by the timezone offset and the query misses every row the MV bucketed. ---- +I confirmed this by instrumenting the test: `toDateTime(bucket)` returned `12:00:00` for a row inserted with `start_time=10:00:00Z` (i.e. the stored UTC Unix timestamp but displayed in CEST), and the filter literal `'2026-03-31 10:05:00'` was being parsed as CEST → UTC 08:05 → excluded all rows. -## PARKED — Cluster B (ingestion-payload drift) +**I didn't fix this** because the repair is in `src/main/java`, not the test. Two reasonable options: +- **Test-side**: pin the container TZ via `.withEnv("TZ", "UTC")` + include `use_time_zone=UTC` in the JDBC URL. I tried both; neither was sufficient on its own, because the CH server reads `timezone` from its own config, not `$TZ`. Getting all three layers (container env, CH server config, JDBC driver) aligned needs dedicated effort. 
+- **Production-side (preferred)**: change `lit(Instant)` to emit `toDateTime('...', 'UTC')`, or declare `bucket` with an explicit timezone (`DateTime('UTC')`, or `DateTime64(3, 'UTC')` for millisecond precision). That's a store change, and a matching unit test would catch it. -**The single biggest remaining cluster, and the one I do not feel confident fixing without you.** +I did add the explicit `'default'` env to the seed `INSERT`s per your directive, but reverted it locally because the timezone bug swallowed the fix. The raw unchanged test is what's committed. -### What's actually wrong +### 2. AgentSseControllerIT (3 failures) & SseSigningIT (1 failure) — SSE connection timing -`ExecutionController` at `/api/v1/data/executions` is the "legacy PG path" — it's `@ConditionalOnMissingBean(ChunkAccumulator.class)`. In the Testcontainers integration test setup, `ChunkAccumulator` IS present, so the legacy controller is **not registered** and `ChunkIngestionController` owns the same `/api/v1/data/executions` mapping. `ChunkIngestionController` expects an `ExecutionChunk` envelope (`exchangeId`, `instanceId`, `applicationId`, `routeId`, `correlationId`, `status`, `startTime`, `endTime`, `chunkSeq`, `final`, `processors: FlatProcessorRecord[]`, …). +All failing assertions are `awaitConnection(5000)` timeouts or `ConditionTimeoutException` on SSE stream observation. Not related to any spec drift I could identify — the SSE server is up (other tests in the same classes connect fine), and auth/JWT is accepted. Looks like a real race on either the SseConnectionManager registration or on the HTTP client's first-read flush. Needs a dedicated debug session with a minimal reproducer; not something I wanted to hack around with sleeps. -The failing tests send the old `RouteExecution` JSON shape (nested `processors` with `children`, no `chunkSeq` / `final`, different field names). 
The chunk controller parses it leniently (`FAIL_ON_UNKNOWN_PROPERTIES=false`), yields an empty / degenerate `ExecutionChunk`, and either silently drops it or responds 400 if `accumulator.onChunk(chunk)` throws on missing fields. Net effect: **no rows land in the ClickHouse `executions` table, every downstream assertion fails**. +Specific tests: +- `AgentSseControllerIT.sseConnect_unknownAgent_returns404` — 5s `CompletableFuture.get` timeout on an HTTP GET that should return 404 synchronously. Suggests the client is waiting on body data that never arrives (SSE stream opens even on 404?). +- `AgentSseControllerIT.lastEventIdHeader_connectionSucceeds` — `stream.awaitConnection(5000)` false. +- `AgentSseControllerIT.pingKeepalive_receivedViaSseStream` — waits for an event line in the stream snapshot, never sees it. +- `SseSigningIT.deepTraceEvent_containsValidSignature` — same pattern. -Three secondary symptoms stack on top of the above: +The sibling tests (`SseSigningIT.configUpdateEvent_containsValidEd25519Signature`) pass in isolation, which strongly suggests order-dependent flakiness rather than a protocol break. -1. These tests then try to verify ingestion using the **Postgres `jdbcTemplate` inherited from `AbstractPostgresIT`** (`SELECT count(*) FROM executions ...`) — `executions` lives in ClickHouse, so even if ingestion worked the Postgres query would still return `relation "executions" does not exist`. -2. Some assertions depend on the CH `stats_1m_*` aggregating materialized views (ClickHouseStatsStoreIT), which rely on `environment` being set on inserted rows — the in-test raw inserts skip that column so the MVs bucket to `environment=''` and the stats-store query with a non-empty env filter finds nothing. -3. `ClickHouseChunkPipelineIT.setUp` throws NPE on `Class.getResourceAsStream(...)` at line 54 — a missing test resource file, not ingestion-path related, but in the same cluster by accident. 
+## Final verify command -### Tests parked +```bash +mvn -pl cameleer-server-app -am -Dit.test='!SchemaBootstrapIT' -Dtest='!*' -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false verify +``` -| Test class | Failures | Cause | -|---|---|---| -| SearchControllerIT | 12 | Seed posts RouteExecution shape to chunk endpoint; also uses PG jdbcTemplate for CH table. | -| DetailControllerIT | 1 (seed fail → whole class) | Same. | -| ExecutionControllerIT | 1 | Same. | -| MetricsControllerIT | 1 | Same shape drift on metrics. | -| DiagramControllerIT | 1 | Uses PG jdbcTemplate for `route_diagrams` (CH table). | -| DiagramRenderControllerIT | 4 | Same. | -| DiagramLinkingIT | 2 | Same. | -| IngestionSchemaIT | 3 | Uses PG jdbcTemplate for `executions` / `processor_executions` (CH tables) + probably also needs the chunk shape. | -| ClickHouseExecutionReadIT | 1 | Standalone CH IT (`@Testcontainers`), not PG-template-drift; `detailService_buildTree_withIterations` asserts not-null on a tree the store returns — independent investigation needed. | -| ClickHouseStatsStoreIT | 8 | Standalone CH IT; direct inserts into `executions` omit the `environment` column required by the `stats_1m_*` MV's GROUP BY. | -| ClickHouseChunkPipelineIT | 1 | `setUp` NPE — `getResourceAsStream("/clickhouse/init.sql")` returning null. Classpath loader path issue; may just need a leading-slash fix. | +Reports land in `cameleer-server-app/target/failsafe-reports/`. Expect **12 failures** in the three classes above. Everything else is green. -### What I'd want you to confirm before I take another pass +## Side notes worth flagging -1. **Is the `ExecutionController` (legacy PG path) intentionally kept around for the default test profile, or has it been retired?** If retired, the ITs should stop posting `RouteExecution`-shaped JSON and start assembling `ExecutionChunk` envelopes (probably with a test helper that wraps the old shape so the tests stay readable). 
If the legacy path should still be exercisable, the test profile needs to exclude `ChunkAccumulator` so `ExecutionController` binds. My guess is: agents emit chunks now, tests should use chunks too — but I don't want to invent an `ExecutionChunk` builder without you signing off on the shape it produces. -2. **For the tests whose last-mile assertion is "row landed in CH" (e.g. DetailControllerIT seed), do you want them driven entirely through the REST search API** (per your "REST-API-driven ITs over raw SQL seeding" preference) or just re-pointed at `clickHouseJdbcTemplate`? Pure-REST is cleaner but couples the seed's sync-point to the search index's debounce (100ms in test profile, so usually fine; could be flaky under load). Re-pointing to the CH template is a 5-line change per test and always reliable, but still lets raw SQL assertions leak past the service layer. I tried the REST-pure approach on DetailControllerIT and reverted — the ingestion itself was failing (Cluster B root cause) so the REST poll never saw the row. -3. **ClickHouseStatsStoreIT** — the MV definitions require `environment` in the GROUP BY but the test's `INSERT INTO executions (...)` omits it. Should the test insert `environment='default'` (test fix), or is there an agent-side invariant that `environment` must be set by the ingestion service before rows ever hit `executions` (implementation gap)? - -None of these is guessable from the code alone; each hinges on an intent call. - -## Deviation from plan / notes - -- The user prompt listed `AgentRegistrationControllerIT, SearchControllerIT, FlywayMigrationIT, ClickHouseStatsStoreIT, JwtRefreshIT, SecurityFilterIT, IngestionSchemaIT` as canonical failing classes. Of those, I fixed 4 (AgentRegistrationControllerIT, FlywayMigrationIT, JwtRefreshIT, SecurityFilterIT) and parked 3 (SearchControllerIT, ClickHouseStatsStoreIT, IngestionSchemaIT) — all 3 parked ones are Cluster B. 
-- `AgentSseControllerIT` has 3 residual failures after the env-fix (`sseConnect_unknownAgent_returns404` timeout, `lastEventIdHeader_connectionSucceeds` timeout, `pingKeepalive_receivedViaSseStream` poll timeout). These are SSE-timing failures, not drift; possibly flakiness under CI load, possibly a real keepalive regression. Not investigated — needs time-boxed debugging with an SSE reproducer. -- `SseSigningIT` has 2 residual failures (`configUpdateEvent_containsValidEd25519Signature`, `deepTraceEvent_containsValidSignature`) — same family as AgentSseControllerIT, SSE-connection never reaches the test's `awaitConnection(5000)`. Same recommendation. -- `BackpressureIT.whenMetricsBufferFull_returns503WithRetryAfter` — expects 503 but gets 202. Suspect this is another casualty of the ingestion path change (metrics now go through the chunked pipeline, which may not surface buffer-full the same way). Parked. -- `ForwardCompatIT.unknownFieldsInRequestBodyDoNotCauseError` — sends `{"futureField":"value"}` to `/api/v1/data/executions`, expects NOT 400 / 422. The chunk controller tries to parse as `ExecutionChunk`, something blows up on missing required fields, 400 is returned. Not forward-compat failing; the test needs to be re-pointed at a controller whose DTO explicitly sets `FAIL_ON_UNKNOWN_PROPERTIES=false`. Parked. -- `ProtocolVersionIT.requestWithCorrectProtocolVersionPassesInterceptor` — asserts `!= 400` on a POST `{}` to `/api/v1/data/executions`. Same root cause — chunk controller returns 400 for the empty envelope. The *interceptor* already passed (it's a controller-level 400), so the assertion is testing the wrong proxy. Parked; needs a better "interceptor passed" signal (header, specific body, or a different endpoint). -- `SensitiveKeysAdminControllerIT.put_withPushToAgents_returnsEmptyPushResult` — asserts `pushResult.total == 0` but got 19. 
The fan-out iterates every distinct `(application, environment)` slice in the registry, and 19 agents from other tests in the shared context bleed in. Either we isolate the registry state in `@BeforeEach`, or the test should be content with `>= 0`. Parked (needs context-reset call or new test strategy). - -## Final IT state (after commits) - -Verified with a fresh `mvn -pl cameleer-server-app -am -Dtest='!*' -Dit.test='!SchemaBootstrapIT' verify` at HEAD `9bda4d8f` after `mvn clean`: - -- **Starting failures** (on a clean build of `90460705`): **65** (46 F + 19 E). -- **Final failures**: **44** (27 F + 17 E) — **21 closed**. -- **Test classes fully green after fixes** (started red, now green): AgentRegistrationControllerIT, AgentCommandControllerIT, BootstrapTokenIT, JwtRefreshIT, RegistrationSecurityIT, SecurityFilterIT, FlywayMigrationIT, ConfigEnvIsolationIT. -- **Still red** (17 classes): AgentSseControllerIT, BackpressureIT, ClickHouseChunkPipelineIT, ClickHouseExecutionReadIT, ClickHouseStatsStoreIT, DetailControllerIT, DiagramControllerIT, DiagramLinkingIT, DiagramRenderControllerIT, ExecutionControllerIT, ForwardCompatIT, IngestionSchemaIT, MetricsControllerIT, ProtocolVersionIT, SearchControllerIT, SensitiveKeysAdminControllerIT, SseSigningIT. All accounted for in Cluster B + tail of Cluster E per the analyses above. - -Run `mvn -pl cameleer-server-app -am -Dit.test='!SchemaBootstrapIT' -Dtest='!*' -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false verify` to reproduce; the tail of the log summarises failing tests. - -## Recommendation for the next pass - -1. Confirm the intent question on `ExecutionController` vs `ChunkIngestionController` — this single call unblocks 8 IT classes (~25 failures). -2. Decide the "CH assertion path" for the rewrite — REST-driven vs `clickHouseJdbcTemplate` — and I'll take the second pass consistently. -3. Look at the SSE cluster (`AgentSseControllerIT`, `SseSigningIT`) separately; it's timing, not spec drift. -4. 
The small Cluster E tail (`BackpressureIT`, `ForwardCompatIT`, `ProtocolVersionIT`, `SensitiveKeysAdminControllerIT`) can probably be batched once (1) is answered, since most of them collapse onto the same ingestion-path fix. +## Side notes worth flagging +- **Property-key inconsistency in the main code** — surfaced via BackpressureIT. `IngestionConfig` is bound under `cameleer.server.ingestion.*`, but the `@Scheduled` annotation on `MetricsFlushScheduler` reads `ingestion.flush-interval-ms` (no prefix, hyphenated). In production this means a flush interval set under the prefixed key in `application.yml` isn't actually honoured by the metrics flush, which stays at the 1s fallback. Separate cleanup. +- **Shared Testcontainers PG across IT classes** — several of the "cross-test state" fixes (FlywayMigrationIT, ConfigEnvIsolationIT, SensitiveKeysAdminControllerIT) are symptoms of one underlying issue: `AbstractPostgresIT` uses a singleton PG container, and nothing cleans between test classes. Could do with a global `@Sql("/test-reset.sql")` on `@BeforeAll`, but out of scope here. +- **Agent registry shared across ITs** — same class of issue. Doesn't bite until a test explicitly inspects registry membership (SensitiveKeys `pushResult.total`).
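
A minimal sketch of the production-side option for the `ClickHouseStatsStoreIT` timezone issue parked above, assuming `lit(Instant)` is a plain string formatter. The class and method names (`UtcLiteralSketch`, `bareText`, `bareLiteral`, `utcLiteral`, `readInSessionZone`) are hypothetical and not taken from `ClickHouseStatsStore`; only `toDateTime(value, timezone)` is an existing ClickHouse function. The sketch uses `java.time` alone to mimic how a session in `Europe/Berlin` would read the unqualified literal, and shows the timezone-qualified form that does not depend on the session zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class UtcLiteralSketch {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Wall-clock text of the instant in UTC, with no zone marker.
    static String bareText(Instant instant) {
        return FMT.withZone(ZoneOffset.UTC).format(instant);
    }

    // Assumed shape of the current literal: quoted text, zone left implicit.
    static String bareLiteral(Instant instant) {
        return "'" + bareText(instant) + "'";
    }

    // Hypothetical replacement: the zone travels with the literal.
    static String utcLiteral(Instant instant) {
        return "toDateTime('" + bareText(instant) + "', 'UTC')";
    }

    // Mimics a reader that interprets zone-less text in its own session zone.
    static Instant readInSessionZone(String text, ZoneId sessionZone) {
        return LocalDateTime.parse(text, FMT).atZone(sessionZone).toInstant();
    }

    public static void main(String[] args) {
        Instant filter = Instant.parse("2026-03-31T10:05:00Z");
        Instant misread = readInSessionZone(bareText(filter), ZoneId.of("Europe/Berlin"));
        System.out.println(bareLiteral(filter) + " read in Europe/Berlin -> " + misread);
        System.out.println("qualified literal: " + utcLiteral(filter));
    }
}
```

With the filter instant from the report, the zone-less text is read two hours early in a CEST session, which matches the 08:05 UTC observation; the qualified literal carries `'UTC'` explicitly, so the session zone no longer matters. Whether the store should build literals this way or bind parameters instead is a separate design call.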