cameleer-server/.planning/phases/03-agent-registry-sse-push/03-CONTEXT.md
hsiegeln d99650015b docs(03): capture phase context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:10:15 +01:00

# Phase 3: Agent Registry + SSE Push - Context

Gathered: 2026-03-11
Status: Ready for planning

## Phase Boundary

Server tracks connected agents through their full lifecycle (LIVE/STALE/DEAD) and can push configuration updates, deep-trace commands, and replay commands to specific agents (or groups/all) in real time via SSE. JWT auth enforcement and Ed25519 signing are Phase 4 — this phase builds the registration flow, heartbeat lifecycle, SSE streams, and command push infrastructure.

## Implementation Decisions

### Agent lifecycle timing

  • Heartbeat interval: 30 seconds
  • STALE threshold: 90 seconds (3 missed heartbeats)
  • DEAD threshold: 5 minutes after going STALE
  • DEAD agents kept indefinitely (no auto-purge)
  • Agent list endpoint returns all agents (LIVE, STALE, DEAD) with ?status= filter parameter
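
The thresholds above can be read as a pure function of the time elapsed since the last heartbeat. The following is a minimal Java sketch, assuming inclusive boundaries (an agent becomes STALE at exactly 90 seconds of silence and DEAD 5 minutes after that, at 390 seconds). The names `AgentState` and `AgentLifecycle` are hypothetical and not taken from the codebase.

```java
import java.time.Duration;
import java.time.Instant;

/** Lifecycle states named in this phase. */
enum AgentState { LIVE, STALE, DEAD }

/** Hypothetical helper that derives an agent's state from its last heartbeat. */
final class AgentLifecycle {
    // 90 seconds = 3 missed heartbeats at the 30-second interval.
    static final Duration STALE_AFTER = Duration.ofSeconds(90);
    // DEAD is measured from the moment the agent went STALE.
    static final Duration DEAD_AFTER_STALE = Duration.ofMinutes(5);

    private AgentLifecycle() {}

    static AgentState stateOf(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(STALE_AFTER) < 0) {
            return AgentState.LIVE;
        }
        // Assumption: boundaries are inclusive, so exactly 90 s is already STALE.
        if (silence.compareTo(STALE_AFTER.plus(DEAD_AFTER_STALE)) < 0) {
            return AgentState.STALE;
        }
        return AgentState.DEAD;
    }
}
```

In a running server the observed transition time would also depend on how often the lifecycle check runs, which this document leaves open.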

### SSE command model

  • Generic command endpoint: POST /api/v1/agents/{id}/commands with {"type": "config-update|deep-trace|replay", "payload": {...}}
  • Three targeting levels: single agent (/agents/{id}/commands), group (/agents/groups/{group}/commands), all live agents (/agents/commands)
  • Agent self-declares group name at registration (e.g., "order-service-prod")
  • Command delivery tracking: the server tracks each command as PENDING until the agent acknowledges it (via a dedicated ack mechanism)
  • Pending commands expire after 60 seconds if undelivered
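
The PENDING tracking and 60-second expiry can be sketched as a small in-memory tracker. This is an illustration under stated assumptions, not the planned implementation: the class name `PendingCommands`, the status names `ACKNOWLEDGED` and `EXPIRED`, and the choice to treat "unacknowledged after 60 seconds" as "undelivered" are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** PENDING comes from the decisions above; the other two names are assumptions. */
enum CommandStatus { PENDING, ACKNOWLEDGED, EXPIRED }

/** Hypothetical in-memory tracker for command delivery state. */
final class PendingCommands {
    static final Duration EXPIRY = Duration.ofSeconds(60);

    private static final class Entry {
        final Instant createdAt;
        CommandStatus status = CommandStatus.PENDING; // guarded by synchronized (entry)
        Entry(Instant createdAt) { this.createdAt = createdAt; }
    }

    private final Map<String, Entry> byId = new ConcurrentHashMap<>();

    /** Starts tracking a freshly pushed command as PENDING. */
    void track(String commandId, Instant now) {
        byId.put(commandId, new Entry(now));
    }

    /** Returns true only if the command was still PENDING and is now acknowledged. */
    boolean acknowledge(String commandId, Instant now) {
        Entry e = byId.get(commandId);
        if (e == null) return false;
        synchronized (e) {
            expireIfDue(e, now);
            if (e.status != CommandStatus.PENDING) return false;
            e.status = CommandStatus.ACKNOWLEDGED;
            return true;
        }
    }

    /** Returns the current status, or null for an unknown command ID. */
    CommandStatus statusOf(String commandId, Instant now) {
        Entry e = byId.get(commandId);
        if (e == null) return null;
        synchronized (e) {
            expireIfDue(e, now);
            return e.status;
        }
    }

    // Expiry is evaluated lazily on access; a scheduled sweep would work as well.
    private static void expireIfDue(Entry e, Instant now) {
        if (e.status == CommandStatus.PENDING
                && Duration.between(e.createdAt, now).compareTo(EXPIRY) >= 0) {
            e.status = CommandStatus.EXPIRED;
        }
    }
}
```

Passing `now` explicitly keeps the expiry logic testable without sleeping; a real service might inject a `java.time.Clock` instead.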

### Registration handshake

  • Agent provides its own persistent ID at registration (from agent config)
  • Rich registration payload: agent ID, name, group, version, list of route IDs, capabilities
  • Re-registration with same ID resumes existing identity (agent restart scenario)
  • Heartbeat is just a ping — no metadata update (agent re-registers if routes/version change)
  • Registration response includes: SSE endpoint URL, current server config (heartbeat interval, etc.), server public key placeholder (Phase 4)
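
The "re-registration resumes identity" and "heartbeat is just a ping" rules can be illustrated with a small in-memory map keyed by the agent-provided ID. This is a sketch only: `AgentEntry` and `AgentRegistrySketch` are hypothetical names, capabilities are omitted for brevity, and keeping the first-registration timestamp across re-registrations is an assumption about what "resuming" means.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry entry; capabilities are left out to keep the sketch short. */
final class AgentEntry {
    final String id, name, group, version;
    final List<String> routeIds;
    final Instant firstRegisteredAt;
    final Instant lastSeen;

    AgentEntry(String id, String name, String group, String version,
               List<String> routeIds, Instant firstRegisteredAt, Instant lastSeen) {
        this.id = id;
        this.name = name;
        this.group = group;
        this.version = version;
        this.routeIds = List.copyOf(routeIds);
        this.firstRegisteredAt = firstRegisteredAt;
        this.lastSeen = lastSeen;
    }
}

/** Hypothetical in-memory registry keyed by the agent-provided persistent ID. */
final class AgentRegistrySketch {
    private final Map<String, AgentEntry> agents = new ConcurrentHashMap<>();

    /** Registers a new agent or, for a known ID, replaces its metadata in place. */
    AgentEntry register(String id, String name, String group, String version,
                        List<String> routeIds, Instant now) {
        return agents.compute(id, (key, existing) -> new AgentEntry(
                id, name, group, version, routeIds,
                // Assumption: the original registration time survives a restart.
                existing == null ? now : existing.firstRegisteredAt,
                now));
    }

    /** Heartbeat only refreshes lastSeen; metadata is untouched. False if unknown. */
    boolean heartbeat(String id, Instant now) {
        return agents.computeIfPresent(id, (key, a) -> new AgentEntry(
                a.id, a.name, a.group, a.version, a.routeIds,
                a.firstRegisteredAt, now)) != null;
    }

    AgentEntry find(String id) { return agents.get(id); }

    int size() { return agents.size(); }
}
```

Because metadata changes only through `register`, an agent whose routes or version change has to re-register, which matches the heartbeat decision above.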

### SSE reconnection behavior

  • Last-Event-ID is accepted, but the server does NOT replay missed events; only future events are delivered on reconnect
  • Pending commands are NOT auto-pushed on reconnect — caller must re-send if needed
  • SSE ping/keepalive interval: 15 seconds
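
For orientation, this is roughly what the stream looks like on the wire. In the `text/event-stream` format an event is a set of `id:`, `event:`, and `data:` lines terminated by a blank line, and a line starting with a colon is a comment that clients ignore, which makes it suitable as a keepalive. The sketch below is framework-independent; `SseFrames` is a hypothetical helper, and using the command type as the SSE event name is an assumption.

```java
/** Hypothetical helper that formats text/event-stream frames by hand. */
final class SseFrames {
    private SseFrames() {}

    /** Formats one event; multi-line data is split into one data: line per line. */
    static String event(String id, String type, String data) {
        StringBuilder sb = new StringBuilder();
        // The client echoes this ID back as Last-Event-ID when it reconnects.
        sb.append("id: ").append(id).append('\n');
        sb.append("event: ").append(type).append('\n');
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString(); // blank line dispatches the event
    }

    /** Comment frame, intended to be sent every 15 seconds to keep the stream open. */
    static String keepAlive() {
        return ": ping\n\n";
    }
}
```

Emitting `id:` on every event keeps the door open for the deferred replay feature, even though this phase ignores the Last-Event-ID value on reconnect.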

### Claude's Discretion

  • In-memory vs persistent storage for agent registry (in-memory is fine for v1, ClickHouse later if needed)
  • Command acknowledgement mechanism details (heartbeat piggyback vs dedicated endpoint)
  • SSE implementation approach (Spring SseEmitter, WebFlux, or other)
  • Thread scheduling for lifecycle state transitions (scheduled executor, Spring @Scheduled)

## Specific Ideas

  • HA/LB group targeting enables fleet-wide operations like config rollouts across all instances of a service
  • Agent-provided persistent IDs mean the agent controls its identity — useful for containerized deployments where hostname changes but agent config persists
  • The 60-second command expiry is deliberately aggressive: commands are time-sensitive operations (deep-trace, config-update) that lose relevance quickly
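
Group targeting for fleet-wide operations amounts to resolving a target (one ID, one group, or everyone) into a list of recipient agent IDs. A minimal sketch follows; `TargetAgent` and `CommandTargets` are hypothetical names, and restricting group targeting to live agents is an assumption, since the decisions above only state "live" for the all-agents case.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical minimal view of a registered agent for targeting purposes. */
final class TargetAgent {
    final String id;
    final String group;
    final boolean live;

    TargetAgent(String id, String group, boolean live) {
        this.id = id;
        this.group = group;
        this.live = live;
    }
}

/** Hypothetical resolver for the three targeting levels. */
final class CommandTargets {
    private CommandTargets() {}

    /** /agents/{id}/commands: the agent with this ID, if it is registered. */
    static List<String> single(Collection<TargetAgent> agents, String id) {
        return agents.stream()
                .filter(a -> a.id.equals(id))
                .map(a -> a.id)
                .collect(Collectors.toList());
    }

    /** /agents/groups/{group}/commands: assumption, only live members of the group. */
    static List<String> group(Collection<TargetAgent> agents, String group) {
        return agents.stream()
                .filter(a -> a.live && a.group.equals(group))
                .map(a -> a.id)
                .collect(Collectors.toList());
    }

    /** /agents/commands: every live agent. */
    static List<String> allLive(Collection<TargetAgent> agents) {
        return agents.stream()
                .filter(a -> a.live)
                .map(a -> a.id)
                .collect(Collectors.toList());
    }
}
```

With agents self-declaring a group such as "order-service-prod", a config rollout to every instance of that service would go through the group resolver, with one tracked command per returned ID.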

<code_context>

## Existing Code Insights

### Reusable Assets

  • ProtocolVersionInterceptor already registered for /api/v1/agents/** paths — interceptor infrastructure ready
  • WebConfig already has the path pattern for agent endpoints
  • IngestionService pattern (core module plain class, wired as bean by config in app module) — reuse for AgentRegistryService
  • WriteBuffer<T> pattern — potential reuse for command queuing if needed
  • ObjectMapper with JavaTimeModule already configured for Instant serialization

### Established Patterns

  • Core module: interfaces + domain logic; App module: Spring Boot + implementations
  • Controllers accept raw String body; services handle deserialization
  • Spring @Scheduled used by ClickHouseFlushScheduler — pattern for heartbeat monitor scheduling
  • application.yml for configurable intervals — add agent registry config section

### Integration Points

  • New endpoints under /api/v1/agents/ path (already in interceptor registry)
  • Agent ID from registration becomes the agentId field used in existing ingestion endpoints
  • SSE stream is a new connection type — first use of server-push in the codebase

</code_context>

## Deferred Ideas

  • Server-side agent tags/labels for more flexible grouping — future enhancement
  • Auto-push pending commands on reconnect — evaluate after v1 usage patterns emerge
  • Last-Event-ID replay of missed events — complexity vs value tradeoff, defer to v2
  • Agent capability negotiation (feature flags for what commands an agent supports) — future phase

Phase: 03-agent-registry-sse-push
Context gathered: 2026-03-11