
hsiegeln d99650015b docs(03): capture phase context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-11 18:10:15 +01:00

4.6 KiB


# Phase 3: Agent Registry + SSE Push - Context

Gathered: 2026-03-11
Status: Ready for planning

## Phase Boundary

Server tracks connected agents through their full lifecycle (LIVE/STALE/DEAD) and can push configuration updates, deep-trace commands, and replay commands to specific agents (or groups/all) in real time via SSE. JWT auth enforcement and Ed25519 signing are Phase 4 — this phase builds the registration flow, heartbeat lifecycle, SSE streams, and command push infrastructure.

## Implementation Decisions

### Agent lifecycle timing

Heartbeat interval: 30 seconds
STALE threshold: 90 seconds (3 missed heartbeats)
DEAD threshold: 5 minutes after going STALE
DEAD agents kept indefinitely (no auto-purge)
Agent list endpoint returns all agents (LIVE, STALE, DEAD) with ?status= filter parameter
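
The thresholds above can be read as a pure function of the time since the last heartbeat. The following is a minimal sketch under two assumptions that the decisions do not spell out: an agent becomes STALE at exactly 90 seconds of silence, and the 5-minute DEAD window starts at that same instant (so DEAD begins at 390 seconds). `AgentState`, `AgentLifecycle`, and `stateAt` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical names; only the durations come from the decisions above.
enum AgentState { LIVE, STALE, DEAD }

final class AgentLifecycle {
    static final Duration STALE_AFTER = Duration.ofSeconds(90);     // 3 missed 30 s heartbeats
    static final Duration DEAD_AFTER_STALE = Duration.ofMinutes(5); // counted from going STALE

    /** Derives the lifecycle state from the silence since the last heartbeat. */
    static AgentState stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(STALE_AFTER) < 0) {
            return AgentState.LIVE;
        }
        // Assumption: the DEAD window is measured from the STALE boundary,
        // so the total silence needed is 90 s + 5 min.
        if (silence.compareTo(STALE_AFTER.plus(DEAD_AFTER_STALE)) < 0) {
            return AgentState.STALE;
        }
        return AgentState.DEAD;
    }
}
```

Deriving the state on read keeps the `?status=` filter consistent without a background job; a scheduled sweep is still needed if transitions should trigger side effects.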

### SSE command model

Generic command endpoint: POST /api/v1/agents/{id}/commands with {"type": "config-update|deep-trace|replay", "payload": {...}}
Three targeting levels: single agent (/agents/{id}/commands), group (/agents/groups/{group}/commands), all live agents (/agents/commands)
Agent self-declares group name at registration (e.g., "order-service-prod")
Command delivery tracking: the server tracks each command as PENDING until the agent acknowledges it (the acknowledgement mechanism itself is left open under Claude's Discretion)
Pending commands expire after 60 seconds if undelivered
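
One way to picture the PENDING tracking and the 60-second expiry is a small tracker keyed by command ID. This is a sketch, not the planned implementation: `CommandTracker`, `CommandStatus`, and the `ACKNOWLEDGED`/`EXPIRED` states are hypothetical (the decisions only name PENDING), and it treats "undelivered" as "not acknowledged within 60 seconds".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical status set; only PENDING is named in the decisions above.
enum CommandStatus { PENDING, ACKNOWLEDGED, EXPIRED }

final class CommandTracker {
    static final Duration TTL = Duration.ofSeconds(60);

    private static final class Entry {
        final Instant createdAt;
        boolean acknowledged;
        Entry(Instant createdAt) { this.createdAt = createdAt; }
    }

    private final Map<String, Entry> commands = new HashMap<>();

    /** Starts tracking a command that has just been pushed. */
    synchronized void track(String commandId, Instant now) {
        commands.put(commandId, new Entry(now));
    }

    /** Accepts an ack only for a known command that is still pending. */
    synchronized boolean acknowledge(String commandId, Instant now) {
        Entry entry = commands.get(commandId);
        if (entry == null || entry.acknowledged || isExpired(entry, now)) {
            return false;
        }
        entry.acknowledged = true;
        return true;
    }

    /** Reports the status as of the given instant. */
    synchronized CommandStatus status(String commandId, Instant now) {
        Entry entry = commands.get(commandId);
        if (entry == null) {
            throw new IllegalArgumentException("unknown command: " + commandId);
        }
        if (entry.acknowledged) {
            return CommandStatus.ACKNOWLEDGED;
        }
        return isExpired(entry, now) ? CommandStatus.EXPIRED : CommandStatus.PENDING;
    }

    // Assumption: a command expires exactly TTL after it was tracked.
    private static boolean isExpired(Entry entry, Instant now) {
        return !now.isBefore(entry.createdAt.plus(TTL));
    }
}
```

Passing `now` explicitly keeps the expiry logic testable without sleeping; a production version would likely also evict old entries.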

### Registration handshake

Agent provides its own persistent ID at registration (from agent config)
Rich registration payload: agent ID, name, group, version, list of route IDs, capabilities
Re-registration with same ID resumes existing identity (agent restart scenario)
Heartbeat is just a ping — no metadata update (agent re-registers if routes/version change)
Registration response includes: SSE endpoint URL, current server config (heartbeat interval, etc.), server public key placeholder (Phase 4)
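
The register and heartbeat semantics above could look roughly like this in an in-memory registry. It is a sketch under assumptions: `AgentRecord`, `InMemoryAgentRegistry`, and the `firstRegisteredAt`/`lastSeen` fields are hypothetical, and "resumes existing identity" is interpreted as keeping the original registration time while replacing the metadata.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical record: registration payload fields plus server-side timestamps.
record AgentRecord(String id, String name, String group, String version,
                   List<String> routeIds, List<String> capabilities,
                   Instant firstRegisteredAt, Instant lastSeen) {

    /** Copy with only the last-seen timestamp changed. */
    AgentRecord seenAt(Instant now) {
        return new AgentRecord(id, name, group, version, routeIds, capabilities,
                firstRegisteredAt, now);
    }
}

final class InMemoryAgentRegistry {
    private final Map<String, AgentRecord> agents = new ConcurrentHashMap<>();

    /** First registration creates the entry; re-registration keeps the identity and refreshes metadata. */
    AgentRecord register(String id, String name, String group, String version,
                         List<String> routeIds, List<String> capabilities, Instant now) {
        AgentRecord fresh = new AgentRecord(id, name, group, version,
                List.copyOf(routeIds), List.copyOf(capabilities), now, now);
        return agents.merge(id, fresh, (existing, incoming) ->
                new AgentRecord(id, incoming.name(), incoming.group(), incoming.version(),
                        incoming.routeIds(), incoming.capabilities(),
                        existing.firstRegisteredAt(), now));
    }

    /** Heartbeat is a plain ping: it only touches lastSeen and never changes metadata. */
    boolean heartbeat(String id, Instant now) {
        return agents.computeIfPresent(id, (key, agent) -> agent.seenAt(now)) != null;
    }

    AgentRecord find(String id) {
        return agents.get(id);
    }
}
```

A heartbeat for an unknown ID returns `false` here, which a controller could map to a response telling the agent to register again; that mapping is not decided in this document.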

### SSE reconnection behavior

Last-Event-ID supported but does NOT replay missed events — only future events delivered on reconnect
Pending commands are NOT auto-pushed on reconnect — caller must re-send if needed
SSE ping/keepalive interval: 15 seconds
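
For reference, this is what the bytes on the stream would look like. The sketch formats a command event and a keepalive using the standard `text/event-stream` framing; `SseFrames` and its method names are hypothetical, and the choice of `: ping` as the comment text is an assumption. Sending an `id:` field is what makes clients send `Last-Event-ID` on reconnect, even though the server only resumes with future events.

```java
// Hypothetical helper that renders text/event-stream frames as strings.
final class SseFrames {

    /** Formats one event; multi-line data becomes one "data:" line per line. */
    static String event(String id, String eventType, String data) {
        StringBuilder frame = new StringBuilder();
        frame.append("id: ").append(id).append('\n');
        frame.append("event: ").append(eventType).append('\n');
        for (String line : data.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the event.
        return frame.append('\n').toString();
    }

    /** Comment frame: lines starting with ':' are ignored by SSE clients. */
    static String keepAlive() {
        return ": ping\n\n";
    }
}
```

With the 15-second interval above, the server would write `SseFrames.keepAlive()` to every open stream that has been idle for that long.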

### Claude's Discretion

In-memory vs persistent storage for agent registry (in-memory is fine for v1, ClickHouse later if needed)
Command acknowledgement mechanism details (heartbeat piggyback vs dedicated endpoint)
SSE implementation approach (Spring SseEmitter, WebFlux, or other)
Thread scheduling for lifecycle state transitions (scheduled executor, Spring @Scheduled)
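
To make the scheduling option concrete, here is a sketch of the plain scheduled-executor variant; the Spring `@Scheduled` variant would call the same `sweep` method. Everything here is hypothetical (`HeartbeatMonitor`, the `"agentId:STATE"` transition strings), the 90 s and 390 s boundaries repeat the lifecycle timing decisions, and letting a heartbeat revive a STALE or DEAD agent is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class HeartbeatMonitor {
    enum State { LIVE, STALE, DEAD }

    private final Map<String, Instant> lastHeartbeat = new ConcurrentHashMap<>();
    private final Map<String, State> states = new ConcurrentHashMap<>();

    /** Assumption: any heartbeat puts the agent back to LIVE. */
    void recordHeartbeat(String agentId, Instant now) {
        lastHeartbeat.put(agentId, now);
        states.put(agentId, State.LIVE);
    }

    /** One pass over all agents; returns the transitions as "agentId:NEW_STATE". */
    List<String> sweep(Instant now) {
        List<String> transitions = new ArrayList<>();
        lastHeartbeat.forEach((id, seen) -> {
            long silentSeconds = Duration.between(seen, now).getSeconds();
            State next = silentSeconds >= 390 ? State.DEAD
                    : silentSeconds >= 90 ? State.STALE
                    : State.LIVE;
            // A heartbeat racing with this write is corrected by the next sweep.
            State previous = states.put(id, next);
            if (previous != next) {
                transitions.add(id + ":" + next);
            }
        });
        return transitions;
    }

    /** Runs sweep() periodically on a single background thread. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> sweep(Instant.now()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The caller owns the returned scheduler and should shut it down on application stop; with `@Scheduled`, Spring would manage that lifecycle instead.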

## Specific Ideas

HA/LB group targeting enables fleet-wide operations like config rollouts across all instances of a service
Agent-provided persistent IDs mean the agent controls its identity — useful for containerized deployments where hostname changes but agent config persists
60-second command expiry is deliberately aggressive: commands such as deep-trace and config-update are time-sensitive and lose relevance quickly
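
Group and fleet-wide targeting both reduce to a filter over the registry. The sketch below uses hypothetical names (`CommandTargeting`, `Agent`, `inGroup`, `allLive`) and assumes that group targeting, like the all-agents endpoint, only addresses LIVE agents; the decisions state this explicitly only for `/agents/commands`.

```java
import java.util.List;

final class CommandTargeting {
    enum State { LIVE, STALE, DEAD }

    // Hypothetical minimal view of a registered agent.
    record Agent(String id, String group, State state) {}

    /** IDs of LIVE agents that declared the given group at registration. */
    static List<String> inGroup(List<Agent> agents, String group) {
        return agents.stream()
                .filter(agent -> agent.state() == State.LIVE)
                .filter(agent -> agent.group().equals(group))
                .map(Agent::id)
                .toList();
    }

    /** IDs of every LIVE agent, regardless of group. */
    static List<String> allLive(List<Agent> agents) {
        return agents.stream()
                .filter(agent -> agent.state() == State.LIVE)
                .map(Agent::id)
                .toList();
    }
}
```

Resolving targets to a list of agent IDs first would let all three endpoints share the single-agent push and tracking path.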

<code_context>

## Existing Code Insights

### Reusable Assets

ProtocolVersionInterceptor already registered for /api/v1/agents/** paths — interceptor infrastructure ready
WebConfig already has the path pattern for agent endpoints
IngestionService pattern (core module plain class, wired as bean by config in app module) — reuse for AgentRegistryService
WriteBuffer<T> pattern — potential reuse for command queuing if needed
ObjectMapper with JavaTimeModule already configured for Instant serialization

### Established Patterns

Core module: interfaces + domain logic; App module: Spring Boot + implementations
Controllers accept raw String body; services handle deserialization
Spring @Scheduled used by ClickHouseFlushScheduler — pattern for heartbeat monitor scheduling
application.yml for configurable intervals — add agent registry config section

### Integration Points

New endpoints under /api/v1/agents/ path (already in interceptor registry)
Agent ID from registration becomes the agentId field used in existing ingestion endpoints
SSE stream is a new connection type — first use of server-push in the codebase

</code_context>

## Deferred Ideas

Server-side agent tags/labels for more flexible grouping — future enhancement
Auto-push pending commands on reconnect — evaluate after v1 usage patterns emerge
Last-Event-ID replay of missed events — complexity vs value tradeoff, defer to v2
Agent capability negotiation (feature flags for what commands an agent supports) — future phase

Phase: 03-agent-registry-sse-push
Context gathered: 2026-03-11
