docs(03): capture phase context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:10:15 +01:00
parent 1fb93c3b6e
commit d99650015b
1 changed file with 95 additions and 0 deletions
--- a/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md
+++ b/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md
@@ -0,0 +1,95 @@
+# Phase 3: Agent Registry + SSE Push - Context
+
+**Gathered:** 2026-03-11
+**Status:** Ready for planning
+
+<domain>
+## Phase Boundary
+
+Server tracks connected agents through their full lifecycle (LIVE/STALE/DEAD) and can push configuration updates, deep-trace commands, and replay commands to specific agents (or groups/all) in real time via SSE. JWT auth enforcement and Ed25519 signing are Phase 4 — this phase builds the registration flow, heartbeat lifecycle, SSE streams, and command push infrastructure.
+
+</domain>
+
+<decisions>
+## Implementation Decisions
+
+### Agent lifecycle timing
+- Heartbeat interval: 30 seconds
+- STALE threshold: 90 seconds without a heartbeat (3 missed heartbeats)
+- DEAD threshold: 5 minutes after going STALE (6.5 minutes after the last heartbeat)
+- DEAD agents kept indefinitely (no auto-purge)
+- Agent list endpoint returns all agents (LIVE, STALE, DEAD) with `?status=` filter parameter
+
+### SSE command model
+- Generic command endpoint: `POST /api/v1/agents/{id}/commands` with body `{"type": "<command-type>", "payload": {...}}`, where `type` is one of `config-update`, `deep-trace`, or `replay`
+- Three targeting levels: single agent (`/agents/{id}/commands`), group (`/agents/groups/{group}/commands`), all live agents (`/agents/commands`)
+- Agent self-declares group name at registration (e.g., "order-service-prod")
+- Command delivery tracking: server tracks each command as PENDING until the agent acknowledges it (the ack mechanism itself is left open under Claude's Discretion)
+- Pending commands expire after 60 seconds if undelivered
+
+### Registration handshake
+- Agent provides its own persistent ID at registration (from agent config)
+- Rich registration payload: agent ID, name, group, version, list of route IDs, capabilities
+- Re-registration with same ID resumes existing identity (agent restart scenario)
+- Heartbeat is just a ping — no metadata update (agent re-registers if routes/version change)
+- Registration response includes: SSE endpoint URL, current server config (heartbeat interval, etc.), server public key placeholder (Phase 4)
+
+### SSE reconnection behavior
+- The `Last-Event-ID` header is accepted but does NOT trigger a replay of missed events; only future events are delivered after a reconnect
+- Pending commands are NOT auto-pushed on reconnect — caller must re-send if needed
+- SSE ping/keepalive interval: 15 seconds
+
+### Claude's Discretion
+- In-memory vs persistent storage for agent registry (in-memory is fine for v1, ClickHouse later if needed)
+- Command acknowledgement mechanism details (heartbeat piggyback vs dedicated endpoint)
+- SSE implementation approach (Spring SseEmitter, WebFlux, or other)
+- Thread scheduling for lifecycle state transitions (scheduled executor, Spring @Scheduled)
+
+</decisions>
+
+<specifics>
+## Specific Ideas
+
+- HA/LB group targeting enables fleet-wide operations like config rollouts across all instances of a service
+- Agent-provided persistent IDs mean the agent controls its identity — useful for containerized deployments where hostname changes but agent config persists
+- 60-second command expiry is deliberately aggressive: commands are time-sensitive operations (deep-trace, config-update) that lose relevance quickly
+
+</specifics>
+
+<code_context>
+## Existing Code Insights
+
+### Reusable Assets
+- `ProtocolVersionInterceptor` already registered for `/api/v1/agents/**` paths — interceptor infrastructure ready
+- `WebConfig` already has the path pattern for agent endpoints
+- `IngestionService` pattern (core module plain class, wired as bean by config in app module) — reuse for AgentRegistryService
+- `WriteBuffer<T>` pattern — potential reuse for command queuing if needed
+- `ObjectMapper` with `JavaTimeModule` already configured for Instant serialization
+
+### Established Patterns
+- Core module: interfaces + domain logic; App module: Spring Boot + implementations
+- Controllers accept raw String body; services handle deserialization
+- Spring `@Scheduled` used by `ClickHouseFlushScheduler` — pattern for heartbeat monitor scheduling
+- `application.yml` for configurable intervals — add agent registry config section
+
+### Integration Points
+- New endpoints under `/api/v1/agents/` path (already in interceptor registry)
+- Agent ID from registration becomes the `agentId` field used in existing ingestion endpoints
+- SSE stream is a new connection type — first use of server-push in the codebase
+
+</code_context>
+
+<deferred>
+## Deferred Ideas
+
+- Server-side agent tags/labels for more flexible grouping — future enhancement
+- Auto-push pending commands on reconnect — evaluate after v1 usage patterns emerge
+- Last-Event-ID replay of missed events — complexity vs value tradeoff, defer to v2
+- Agent capability negotiation (feature flags for what commands an agent supports) — future phase
+
+</deferred>
+
+---
+
+*Phase: 03-agent-registry-sse-push*
+*Context gathered: 2026-03-11*
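
A minimal sketch of the lifecycle timing decided in the context file above, assuming the state is derived purely from the time elapsed since the last heartbeat. `AgentLifecycle`, `stateAt`, and the constant names are hypothetical and do not come from the codebase. How the exact boundary instants are treated (90 s of silence counts as STALE, 390 s counts as DEAD) is also an assumption, since the document does not specify it.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; all names here are illustrative, not taken from the codebase. */
final class AgentLifecycle {

    enum State { LIVE, STALE, DEAD }

    // Thresholds from the "Agent lifecycle timing" decisions.
    static final Duration STALE_AFTER = Duration.ofSeconds(90);      // 3 missed 30 s heartbeats
    static final Duration DEAD_AFTER_STALE = Duration.ofMinutes(5);  // counted from the STALE transition

    private AgentLifecycle() {}

    /** Derives the lifecycle state from the silence since the last heartbeat. */
    static State stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(STALE_AFTER) < 0) {
            return State.LIVE;
        }
        // Assumption: DEAD is reached 90 s + 5 min = 390 s after the last heartbeat.
        if (silence.compareTo(STALE_AFTER.plus(DEAD_AFTER_STALE)) < 0) {
            return State.STALE;
        }
        return State.DEAD;
    }
}
```

A scheduled monitor of the kind the context file mentions could call such a function on each tick and record any change in state; a fresh heartbeat or re-registration would simply reset `lastHeartbeat`.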