Build an observability server that ingests millions of Camel route transactions per day into ClickHouse, provides structured and full-text search, manages agent lifecycles via SSE, and secures all communication with JWT and Ed25519 signing. The roadmap moves from data-in (ingestion) to data-out (search) to agent management to security, each phase delivering a complete, verifiable capability.
## Phases
**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
Decimal phases appear between their surrounding integers in numeric order.
- [ ]**Phase 1: Ingestion Pipeline + API Foundation** - ClickHouse schema, batch write buffer, ingestion endpoints, API scaffolding
**Goal**: Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection
1. An HTTP client can POST a RouteExecution payload to `/api/v1/data/executions` and receive 202 Accepted, and the data appears in ClickHouse within the flush interval
2. An HTTP client can POST RouteGraph and metrics payloads to their respective endpoints and receive 202 Accepted
3. When the write buffer is full, the server returns 503 and does not lose already-buffered data
4. Data older than the configured TTL (default 30 days) is automatically removed by ClickHouse
5. The health endpoint responds at `/api/v1/health`, OpenAPI docs are available, protocol version header is validated, and unknown JSON fields are accepted
**Goal**: Users can find any transaction by status, time, duration, correlation ID, or content, view execution detail trees, and see versioned route diagrams linked to transactions
1. User can query transactions filtered by any combination of status, date range, duration range, and correlationId, and receive matching results via REST
2. User can full-text search across message bodies, headers, error messages, and stack traces and find matching transactions
3. User can retrieve a transaction's detail view showing the nested processor execution tree
4. Route diagrams are stored with content-addressable versioning (identical definitions stored once), each transaction links to its active diagram version, and diagrams can be rendered from stored definitions
**Goal**: Server tracks connected agents through their full lifecycle and can push configuration updates, deep-trace commands, and replay commands to specific agents in real time
1. An agent can register via POST with a bootstrap token and receive a JWT (security enforcement deferred to Phase 4, but the registration flow and token issuance work end-to-end)
2. Server correctly transitions agents through LIVE/STALE/DEAD states based on heartbeat timing, and the agent list endpoint reflects current states
3. Server pushes config-update, deep-trace, and replay events to a specific agent's SSE stream, with ping keepalive and Last-Event-ID reconnection support
1. All API endpoints except health and register reject requests without a valid JWT Bearer token
2. Agents can refresh expired JWTs via the refresh endpoint without re-registering
3. Server generates an Ed25519 keypair at startup, delivers the public key during registration, and all config-update and replay SSE payloads carry a valid Ed25519 signature
4. Bootstrap token from CAMELEER_AUTH_TOKEN environment variable is required for initial agent registration