# Roadmap: Cameleer3 Server

## Overview

Build an observability server that ingests millions of Camel route transactions per day into ClickHouse, provides structured and full-text search, manages agent lifecycles via SSE, and secures all communication with JWT and Ed25519 signing. The roadmap moves from data-in (ingestion) to data-out (search) to agent management to security, each phase delivering a complete, verifiable capability.

## Phases

**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

- [x] **Phase 1: Ingestion Pipeline + API Foundation** - ClickHouse schema, batch write buffer, ingestion endpoints, API scaffolding (completed 2026-03-11)
- [ ] **Phase 2: Transaction Search + Diagrams** - Structured search, full-text search, diagram versioning and rendering
- [x] **Phase 3: Agent Registry + SSE Push** - Agent lifecycle management, heartbeat monitoring, SSE config/command push (completed 2026-03-11)
- [ ] **Phase 4: Security** - JWT authentication, Ed25519 signing, bootstrap token registration, endpoint protection

## Phase Details

### Phase 1: Ingestion Pipeline + API Foundation
**Goal**: Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection
**Depends on**: Nothing (first phase)
**Requirements**: INGST-01 (#1), INGST-02 (#2), INGST-03 (#3), INGST-04 (#4), INGST-05 (#5), INGST-06 (#6), API-01 (#28), API-02 (#29), API-03 (#30), API-04 (#31), API-05 (#32)
**Success Criteria** (what must be TRUE):
  1. An HTTP client can POST a RouteExecution payload to `/api/v1/data/executions` and receive 202 Accepted, and the data appears in ClickHouse within the flush interval
  2. An HTTP client can POST RouteGraph and metrics payloads to their respective endpoints and receive 202 Accepted
  3. When the write buffer is full, the server returns 503 and does not lose already-buffered data
  4. Data older than the configured TTL (default 30 days) is automatically removed by ClickHouse
  5. The health endpoint responds at `/api/v1/health`, OpenAPI docs are available, the protocol version header is validated, and unknown JSON fields are accepted rather than rejected
**Plans:** 3/3 plans complete

Plans:
- [ ] 01-01-PLAN.md -- ClickHouse infrastructure, schema, WriteBuffer, repository interfaces, test infrastructure
- [ ] 01-02-PLAN.md -- Ingestion REST endpoints, ClickHouse repositories, flush scheduler, integration tests
- [ ] 01-03-PLAN.md -- API foundation (health, OpenAPI, protocol header, forward compat, TTL verification)
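
The backpressure behaviour in success criterion 3 can be illustrated with a bounded queue: a non-blocking enqueue either accepts the item (202) or reports that the buffer is full (503), and items already buffered stay untouched. This is a minimal sketch, not the project's actual WriteBuffer; the class name `BoundedWriteBuffer`, its methods, and the `statusFor` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bounded buffer; illustrates the 202/503 contract only. */
final class BoundedWriteBuffer<T> {
    private final BlockingQueue<T> queue;

    BoundedWriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue: returns false when the buffer is full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /** Removes up to maxBatch items so the caller can write them in one batch insert. */
    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    int size() {
        return queue.size();
    }

    /** Maps the enqueue result to the HTTP status described in the criteria. */
    static int statusFor(boolean accepted) {
        return accepted ? 202 : 503;
    }
}
```

A rejected `offer` leaves the queue unchanged, so a periodic flush (assumed here to call `drain`) still sees every item that was accepted before the buffer filled up.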

### Phase 2: Transaction Search + Diagrams
**Goal**: Users can find any transaction by status, time, duration, correlation ID, or content, view execution detail trees, and see versioned route diagrams linked to transactions
**Depends on**: Phase 1
**Requirements**: SRCH-01 (#7), SRCH-02 (#8), SRCH-03 (#9), SRCH-04 (#10), SRCH-05 (#11), SRCH-06 (#12), DIAG-01 (#20), DIAG-02 (#21), DIAG-03 (#22)
**Success Criteria** (what must be TRUE):
  1. User can query transactions filtered by any combination of status, date range, duration range, and correlationId, and receive matching results via REST
  2. User can full-text search across message bodies, headers, error messages, and stack traces and find matching transactions
  3. User can retrieve a transaction's detail view showing the nested processor execution tree
  4. Route diagrams are stored with content-addressable versioning (identical definitions stored once), each transaction links to its active diagram version, and diagrams can be rendered from stored definitions
**Plans:** 4 plans (3 executed, 1 gap closure)

Plans:
- [ ] 02-01-PLAN.md -- Schema extension, core domain types, ingestion updates for search/detail columns
- [ ] 02-02-PLAN.md -- Diagram rendering with ELK layout and JFreeSVG (SVG + JSON via content negotiation)
- [ ] 02-03-PLAN.md -- Search endpoints (GET + POST), transaction detail with tree reconstruction, integration tests
- [ ] 02-04-PLAN.md -- Gap closure: populate diagram_content_hash during ingestion, fix Surefire classloader isolation
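
Content-addressable versioning (success criterion 4) means the storage key is derived from the diagram definition itself, so identical definitions collapse into one stored entry. The sketch below assumes SHA-256 as the hash and an in-memory map as the store; the roadmap names neither, and `DiagramStore` is a hypothetical class used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store keyed by the hash of the diagram definition. */
final class DiagramStore {
    private final Map<String, String> definitionsByHash = new HashMap<>();

    /** Stores the definition once per distinct content and returns its hash. */
    String put(String definition) {
        String hash = sha256Hex(definition);
        definitionsByHash.putIfAbsent(hash, definition);
        return hash;
    }

    /** Returns the stored definition, or null if the hash is unknown. */
    String get(String hash) {
        return definitionsByHash.get(hash);
    }

    int size() {
        return definitionsByHash.size();
    }

    /** SHA-256 is an assumption; any stable content hash would serve. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A transaction would then carry the returned hash (compare the `diagram_content_hash` column mentioned in 02-04) to point at the diagram version that was active when it ran.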

### Phase 3: Agent Registry + SSE Push
**Goal**: Server tracks connected agents through their full lifecycle and can push configuration updates, deep-trace commands, and replay commands to specific agents in real time
**Depends on**: Phase 1
**Requirements**: AGNT-01 (#13), AGNT-02 (#14), AGNT-03 (#15), AGNT-04 (#16), AGNT-05 (#17), AGNT-06 (#18), AGNT-07 (#19)
**Success Criteria** (what must be TRUE):
  1. An agent can register via POST with a bootstrap token and receive a JWT (security enforcement deferred to Phase 4, but the registration flow and token issuance work end-to-end)
  2. Server correctly transitions agents through LIVE/STALE/DEAD states based on heartbeat timing, and the agent list endpoint reflects current states
  3. Server pushes config-update, deep-trace, and replay events to a specific agent's SSE stream, with ping keepalive and Last-Event-ID reconnection support
**Plans:** 2/2 plans complete

Plans:
- [ ] 03-01-PLAN.md -- Agent domain types, registry service, registration/heartbeat/list endpoints, lifecycle monitor
- [ ] 03-02-PLAN.md -- SSE connection management, command push (config-update, deep-trace, replay), ping keepalive, acknowledgement, integration tests
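
The LIVE/STALE/DEAD transitions in success criterion 2 reduce to comparing the time since the last heartbeat against two thresholds. The following is a rough sketch under that assumption; the `AgentLifecycle` class and the threshold values used with it are hypothetical, since the roadmap does not state the actual timings.

```java
import java.time.Duration;
import java.time.Instant;

enum AgentState { LIVE, STALE, DEAD }

/** Hypothetical classifier: derives an agent's state from heartbeat silence. */
final class AgentLifecycle {
    private final Duration staleAfter;
    private final Duration deadAfter;

    AgentLifecycle(Duration staleAfter, Duration deadAfter) {
        if (deadAfter.compareTo(staleAfter) < 0) {
            throw new IllegalArgumentException("deadAfter must not be shorter than staleAfter");
        }
        this.staleAfter = staleAfter;
        this.deadAfter = deadAfter;
    }

    /** Classifies an agent by how long it has been silent at the given instant. */
    AgentState stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(deadAfter) >= 0) {
            return AgentState.DEAD;
        }
        if (silence.compareTo(staleAfter) >= 0) {
            return AgentState.STALE;
        }
        return AgentState.LIVE;
    }
}
```

With illustrative thresholds of 30 and 90 seconds, an agent that last reported 45 seconds ago would be classified as STALE, and a fresh heartbeat would bring it back to LIVE on the next evaluation.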

### Phase 4: Security
**Goal**: All server communication is authenticated and integrity-protected, with JWT for API access and Ed25519 signatures for pushed configuration
**Depends on**: Phase 1, Phase 3
**Requirements**: SECU-01 (#23), SECU-02 (#24), SECU-03 (#25), SECU-04 (#26), SECU-05 (#27)
**Success Criteria** (what must be TRUE):
  1. All API endpoints except health and register reject requests without a valid JWT Bearer token
  2. Agents can refresh expired JWTs via the refresh endpoint without re-registering
  3. Server generates an Ed25519 keypair at startup, delivers the public key during registration, and all config-update and replay SSE payloads carry a valid Ed25519 signature
  4. The bootstrap token from the `CAMELEER_AUTH_TOKEN` environment variable is required for initial agent registration
**Plans:** 3 plans

Plans:
- [x] 04-01-PLAN.md -- Security service foundation: JwtService, Ed25519SigningService, BootstrapTokenValidator, Maven deps, config
- [ ] 04-02-PLAN.md -- Spring Security filter chain, JWT auth filter, registration/refresh integration, existing test adaptation
- [ ] 04-03-PLAN.md -- Ed25519 signing of SSE command payloads (config-update, deep-trace, replay)
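
Success criterion 3 (keypair generated at startup, public key handed out at registration, signed SSE payloads) can be sketched with the Ed25519 support built into the JDK since Java 15. This is an illustration under assumptions, not the project's Ed25519SigningService: the class name `PayloadSigner`, the Base64 encoding of the signature, and the use of the JDK provider instead of a third-party library are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical signer: holds a keypair created at construction time. */
final class PayloadSigner {
    private final KeyPair keyPair;

    PayloadSigner() {
        try {
            // Requires Java 15 or newer for the built-in Ed25519 algorithm.
            this.keyPair = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }

    /** The key an agent would receive during registration. */
    PublicKey publicKey() {
        return keyPair.getPublic();
    }

    /** Signs the UTF-8 bytes of the payload and returns a Base64 signature. */
    String sign(String payload) {
        try {
            Signature signature = Signature.getInstance("Ed25519");
            signature.initSign(keyPair.getPrivate());
            signature.update(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signature.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    /** Agent-side check: true only if the signature matches payload and key. */
    static boolean verify(PublicKey key, String payload, String signatureBase64) {
        try {
            Signature signature = Signature.getInstance("Ed25519");
            signature.initVerify(key);
            signature.update(payload.getBytes(StandardCharsets.UTF_8));
            return signature.verify(Base64.getDecoder().decode(signatureBase64));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("verification failed", e);
        }
    }
}
```

Because the keypair is created per process in this sketch, a server restart would produce a new public key; how agents learn about a rotated key is outside what the roadmap specifies.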

## Progress

**Execution Order:**
Phases execute in numeric order: 1 -> 2 -> 3 -> 4
Note: Phases 2 and 3 both depend only on Phase 1 and could execute in parallel.

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Ingestion Pipeline + API Foundation | 3/3 | Complete | 2026-03-11 |
| 2. Transaction Search + Diagrams | 3/4 | Gap Closure | - |
| 3. Agent Registry + SSE Push | 2/2 | Complete | 2026-03-11 |
| 4. Security | 1/3 | In Progress | - |