From d99650015bc64b6049ecfc3ae6ccef1879c2473b Mon Sep 17 00:00:00 2001
From: hsiegeln <37154749+hsiegeln@users.noreply.github.com>
Date: Wed, 11 Mar 2026 18:10:15 +0100
Subject: [PATCH] docs(03): capture phase context

Co-Authored-By: Claude Opus 4.6
---
 .../03-agent-registry-sse-push/03-CONTEXT.md | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 .planning/phases/03-agent-registry-sse-push/03-CONTEXT.md

diff --git a/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md b/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md
new file mode 100644
index 00000000..a80e6e94
--- /dev/null
+++ b/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md
@@ -0,0 +1,95 @@
+# Phase 3: Agent Registry + SSE Push - Context
+
+**Gathered:** 2026-03-11
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+Server tracks connected agents through their full lifecycle (LIVE/STALE/DEAD) and can push configuration updates, deep-trace commands, and replay commands to specific agents (or groups/all) in real time via SSE. JWT auth enforcement and Ed25519 signing are Phase 4 — this phase builds the registration flow, heartbeat lifecycle, SSE streams, and command push infrastructure.
+
+
+
+
+## Implementation Decisions
+
+### Agent lifecycle timing
+- Heartbeat interval: 30 seconds
+- STALE threshold: 90 seconds (3 missed heartbeats)
+- DEAD threshold: 5 minutes after going STALE
+- DEAD agents kept indefinitely (no auto-purge)
+- Agent list endpoint returns all agents (LIVE, STALE, DEAD) with `?status=` filter parameter
+
+### SSE command model
+- Generic command endpoint: `POST /api/v1/agents/{id}/commands` with `{"type": "config-update|deep-trace|replay", "payload": {...}}`
+- Three targeting levels: single agent (`/agents/{id}/commands`), group (`/agents/groups/{group}/commands`), all live agents (`/agents/commands`)
+- Agent self-declares group name at registration (e.g., "order-service-prod")
+- Command delivery tracking: server tracks each command as PENDING until agent acknowledges (via dedicated ack mechanism)
+- Pending commands expire after 60 seconds if undelivered
+
+### Registration handshake
+- Agent provides its own persistent ID at registration (from agent config)
+- Rich registration payload: agent ID, name, group, version, list of route IDs, capabilities
+- Re-registration with same ID resumes existing identity (agent restart scenario)
+- Heartbeat is just a ping — no metadata update (agent re-registers if routes/version change)
+- Registration response includes: SSE endpoint URL, current server config (heartbeat interval, etc.), server public key placeholder (Phase 4)
+
+### SSE reconnection behavior
+- Last-Event-ID supported but does NOT replay missed events — only future events delivered on reconnect
+- Pending commands are NOT auto-pushed on reconnect — caller must re-send if needed
+- SSE ping/keepalive interval: 15 seconds
+
+### Claude's Discretion
+- In-memory vs persistent storage for agent registry (in-memory is fine for v1, ClickHouse later if needed)
+- Command acknowledgement mechanism details (heartbeat piggyback vs dedicated endpoint)
+- SSE implementation approach (Spring SseEmitter, WebFlux, or other)
+- Thread scheduling for lifecycle state transitions (scheduled executor, Spring @Scheduled)
+
+
+
+
+## Specific Ideas
+
+- HA/LB group targeting enables fleet-wide operations like config rollouts across all instances of a service
+- Agent-provided persistent IDs mean the agent controls its identity — useful for containerized deployments where hostname changes but agent config persists
+- 60-second command expiry is aggressive — commands are time-sensitive operations (deep-trace, config-update) that lose relevance quickly
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- `ProtocolVersionInterceptor` already registered for `/api/v1/agents/**` paths — interceptor infrastructure ready
+- `WebConfig` already has the path pattern for agent endpoints
+- `IngestionService` pattern (core module plain class, wired as bean by config in app module) — reuse for AgentRegistryService
+- `WriteBuffer` pattern — potential reuse for command queuing if needed
+- `ObjectMapper` with `JavaTimeModule` already configured for Instant serialization
+
+### Established Patterns
+- Core module: interfaces + domain logic; App module: Spring Boot + implementations
+- Controllers accept raw String body; services handle deserialization
+- Spring `@Scheduled` used by `ClickHouseFlushScheduler` — pattern for heartbeat monitor scheduling
+- `application.yml` for configurable intervals — add agent registry config section
+
+### Integration Points
+- New endpoints under `/api/v1/agents/` path (already in interceptor registry)
+- Agent ID from registration becomes the `agentId` field used in existing ingestion endpoints
+- SSE stream is a new connection type — first use of server-push in the codebase
+
+
+
+
+## Deferred Ideas
+
+- Server-side agent tags/labels for more flexible grouping — future enhancement
+- Auto-push pending commands on reconnect — evaluate after v1 usage patterns emerge
+- Last-Event-ID replay of missed events — complexity vs value tradeoff, defer to v2
+- Agent capability negotiation (feature flags for what commands an agent supports) — future phase
+
+
+
+---
+
+*Phase: 03-agent-registry-sse-push*
+*Context gathered: 2026-03-11*
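
Review note on the "Agent lifecycle timing" decisions: the three states can be derived purely from the time elapsed since the last heartbeat, which may make the monitor easier to test regardless of which scheduling approach is chosen. The sketch below is a minimal illustration under stated assumptions, not the planned implementation. `AgentLifecycle`, `State`, and `stateAt` are hypothetical names that do not appear in the codebase, and treating an agent as STALE at exactly 90 seconds of silence (rather than strictly after) is an assumption the context file does not settle.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; class and method names are illustrative only. */
final class AgentLifecycle {

    enum State { LIVE, STALE, DEAD }

    // Thresholds copied from the "Agent lifecycle timing" decisions.
    static final Duration STALE_AFTER = Duration.ofSeconds(90);      // 3 missed 30-second heartbeats
    static final Duration DEAD_AFTER_STALE = Duration.ofMinutes(5);  // measured from the STALE transition

    /**
     * Derives the state from the silence since the last heartbeat.
     * Assumption: each threshold is inclusive (silence >= threshold transitions).
     */
    static State stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(STALE_AFTER) < 0) {
            return State.LIVE;
        }
        // DEAD is reached 5 minutes after going STALE, i.e. 90 s + 300 s of silence.
        if (silence.compareTo(STALE_AFTER.plus(DEAD_AFTER_STALE)) < 0) {
            return State.STALE;
        }
        return State.DEAD;
    }

    public static void main(String[] args) {
        Instant last = Instant.parse("2026-03-11T12:00:00Z");
        System.out.println(stateAt(last, last.plusSeconds(30)));   // LIVE
        System.out.println(stateAt(last, last.plusSeconds(120)));  // STALE
        System.out.println(stateAt(last, last.plusSeconds(390)));  // DEAD
    }
}
```

A periodic monitor (for example one driven by Spring `@Scheduled`, as `ClickHouseFlushScheduler` already is) could call a function like this for each registered agent and store the result. Because DEAD agents are kept indefinitely, nothing in this sketch removes entries.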