diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 87227168..84592956 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -93,13 +93,44 @@ Which phases cover which requirements. Updated during roadmap creation.
 
 | Requirement | Phase | Status |
 |-------------|-------|--------|
-| (populated by roadmapper) | | |
+| INGST-01 (#1) | Phase 1 | Pending |
+| INGST-02 (#2) | Phase 1 | Pending |
+| INGST-03 (#3) | Phase 1 | Pending |
+| INGST-04 (#4) | Phase 1 | Pending |
+| INGST-05 (#5) | Phase 1 | Pending |
+| INGST-06 (#6) | Phase 1 | Pending |
+| SRCH-01 (#7) | Phase 2 | Pending |
+| SRCH-02 (#8) | Phase 2 | Pending |
+| SRCH-03 (#9) | Phase 2 | Pending |
+| SRCH-04 (#10) | Phase 2 | Pending |
+| SRCH-05 (#11) | Phase 2 | Pending |
+| SRCH-06 (#12) | Phase 2 | Pending |
+| AGNT-01 (#13) | Phase 3 | Pending |
+| AGNT-02 (#14) | Phase 3 | Pending |
+| AGNT-03 (#15) | Phase 3 | Pending |
+| AGNT-04 (#16) | Phase 3 | Pending |
+| AGNT-05 (#17) | Phase 3 | Pending |
+| AGNT-06 (#18) | Phase 3 | Pending |
+| AGNT-07 (#19) | Phase 3 | Pending |
+| DIAG-01 (#20) | Phase 2 | Pending |
+| DIAG-02 (#21) | Phase 2 | Pending |
+| DIAG-03 (#22) | Phase 2 | Pending |
+| SECU-01 (#23) | Phase 4 | Pending |
+| SECU-02 (#24) | Phase 4 | Pending |
+| SECU-03 (#25) | Phase 4 | Pending |
+| SECU-04 (#26) | Phase 4 | Pending |
+| SECU-05 (#27) | Phase 4 | Pending |
+| API-01 (#28) | Phase 1 | Pending |
+| API-02 (#29) | Phase 1 | Pending |
+| API-03 (#30) | Phase 1 | Pending |
+| API-04 (#31) | Phase 1 | Pending |
+| API-05 (#32) | Phase 1 | Pending |
 
 **Coverage:**
 - v1 requirements: 32 total
--- Mapped to phases: 0
--- Unmapped: 32
+- Mapped to phases: 32
+- Unmapped: 0
 
 ---
 *Requirements defined: 2026-03-11*
-*Last updated: 2026-03-11 after initial definition*
+*Last updated: 2026-03-11 after roadmap creation*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
new file mode 100644
index 00000000..6c9b5af8
--- /dev/null
+++ b/.planning/ROADMAP.md
@@ -0,0 +1,92 @@
+# Roadmap: Cameleer3 Server
+
+## Overview
+
+Build an observability server that ingests millions of Camel route transactions per day into ClickHouse, provides structured and full-text search, manages agent lifecycles via SSE, and secures all communication with JWT and Ed25519 signing. The roadmap moves from data-in (ingestion) to data-out (search) to agent management to security, each phase delivering a complete, verifiable capability.
+
+## Phases
+
+**Phase Numbering:**
+- Integer phases (1, 2, 3): Planned milestone work
+- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
+
+Decimal phases appear between their surrounding integers in numeric order.
+
+- [ ] **Phase 1: Ingestion Pipeline + API Foundation** - ClickHouse schema, batch write buffer, ingestion endpoints, API scaffolding
+- [ ] **Phase 2: Transaction Search + Diagrams** - Structured search, full-text search, diagram versioning and rendering
+- [ ] **Phase 3: Agent Registry + SSE Push** - Agent lifecycle management, heartbeat monitoring, SSE config/command push
+- [ ] **Phase 4: Security** - JWT authentication, Ed25519 signing, bootstrap token registration, endpoint protection
+
+## Phase Details
+
+### Phase 1: Ingestion Pipeline + API Foundation
+**Goal**: Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection
+**Depends on**: Nothing (first phase)
+**Requirements**: INGST-01 (#1), INGST-02 (#2), INGST-03 (#3), INGST-04 (#4), INGST-05 (#5), INGST-06 (#6), API-01 (#28), API-02 (#29), API-03 (#30), API-04 (#31), API-05 (#32)
+**Success Criteria** (what must be TRUE):
+ 1. An HTTP client can POST a RouteExecution payload to `/api/v1/data/executions` and receive 202 Accepted, and the data appears in ClickHouse within the flush interval
+ 2. An HTTP client can POST RouteGraph and metrics payloads to their respective endpoints and receive 202 Accepted
+ 3. When the write buffer is full, the server returns 503 and does not lose already-buffered data
+ 4. Data older than the configured TTL (default 30 days) is automatically removed by ClickHouse
+ 5. The health endpoint responds at `/api/v1/health`, OpenAPI docs are available, protocol version header is validated, and unknown JSON fields are accepted
+**Plans**: TBD
+
+Plans:
+- [ ] 01-01: ClickHouse schema design + Docker setup + batch write buffer
+- [ ] 01-02: Ingestion REST endpoints + API foundation (health, OpenAPI, protocol header, forward compat)
+
+### Phase 2: Transaction Search + Diagrams
+**Goal**: Users can find any transaction by status, time, duration, correlation ID, or content, view execution detail trees, and see versioned route diagrams linked to transactions
+**Depends on**: Phase 1
+**Requirements**: SRCH-01 (#7), SRCH-02 (#8), SRCH-03 (#9), SRCH-04 (#10), SRCH-05 (#11), SRCH-06 (#12), DIAG-01 (#20), DIAG-02 (#21), DIAG-03 (#22)
+**Success Criteria** (what must be TRUE):
+ 1. User can query transactions filtered by any combination of status, date range, duration range, and correlationId, and receive matching results via REST
+ 2. User can full-text search across message bodies, headers, error messages, and stack traces and find matching transactions
+ 3. User can retrieve a transaction's detail view showing the nested processor execution tree
+ 4. Route diagrams are stored with content-addressable versioning (identical definitions stored once), each transaction links to its active diagram version, and diagrams can be rendered from stored definitions
+**Plans**: TBD
+
+Plans:
+- [ ] 02-01: Transaction query engine (structured filters + full-text via ClickHouse skip indexes)
+- [ ] 02-02: Transaction detail + diagram versioning, linking, and rendering
+
+### Phase 3: Agent Registry + SSE Push
+**Goal**: Server tracks connected agents through their full lifecycle and can push configuration updates, deep-trace commands, and replay commands to specific agents in real time
+**Depends on**: Phase 1
+**Requirements**: AGNT-01 (#13), AGNT-02 (#14), AGNT-03 (#15), AGNT-04 (#16), AGNT-05 (#17), AGNT-06 (#18), AGNT-07 (#19)
+**Success Criteria** (what must be TRUE):
+ 1. An agent can register via POST with a bootstrap token and receive a JWT (security enforcement deferred to Phase 4, but the registration flow and token issuance work end-to-end)
+ 2. Server correctly transitions agents through LIVE/STALE/DEAD states based on heartbeat timing, and the agent list endpoint reflects current states
+ 3. Server pushes config-update, deep-trace, and replay events to a specific agent's SSE stream, with ping keepalive and Last-Event-ID reconnection support
+**Plans**: TBD
+
+Plans:
+- [ ] 03-01: Agent registration, heartbeat lifecycle, and registry endpoints
+- [ ] 03-02: SSE connection management and command push (config-update, deep-trace, replay, ping, reconnection)
+
+### Phase 4: Security
+**Goal**: All server communication is authenticated and integrity-protected, with JWT for API access and Ed25519 signatures for pushed configuration
+**Depends on**: Phase 1, Phase 3
+**Requirements**: SECU-01 (#23), SECU-02 (#24), SECU-03 (#25), SECU-04 (#26), SECU-05 (#27)
+**Success Criteria** (what must be TRUE):
+ 1. All API endpoints except health and register reject requests without a valid JWT Bearer token
+ 2. Agents can refresh expired JWTs via the refresh endpoint without re-registering
+ 3. Server generates an Ed25519 keypair at startup, delivers the public key during registration, and all config-update and replay SSE payloads carry a valid Ed25519 signature
+ 4. Bootstrap token from CAMELEER_AUTH_TOKEN environment variable is required for initial agent registration
+**Plans**: TBD
+
+Plans:
+- [ ] 04-01: JWT authentication filter, refresh flow, Ed25519 keypair generation and config signing, bootstrap token validation
+
+## Progress
+
+**Execution Order:**
+Phases execute in numeric order: 1 -> 2 -> 3 -> 4
+Note: Phases 2 and 3 both depend only on Phase 1 and could execute in parallel.
+
+| Phase | Plans Complete | Status | Completed |
+|-------|----------------|--------|-----------|
+| 1. Ingestion Pipeline + API Foundation | 0/2 | Not started | - |
+| 2. Transaction Search + Diagrams | 0/2 | Not started | - |
+| 3. Agent Registry + SSE Push | 0/2 | Not started | - |
+| 4. Security | 0/1 | Not started | - |
diff --git a/.planning/STATE.md b/.planning/STATE.md
new file mode 100644
index 00000000..78c4550e
--- /dev/null
+++ b/.planning/STATE.md
@@ -0,0 +1,65 @@
+# Project State
+
+## Project Reference
+
+See: .planning/PROJECT.md (updated 2026-03-11)
+
+**Core value:** Users can reliably search and find any transaction across all connected Camel instances -- by any combination of state, time, duration, or content -- even at millions of transactions per day with 30-day retention.
+**Current focus:** Phase 1: Ingestion Pipeline + API Foundation
+
+## Current Position
+
+Phase: 1 of 4 (Ingestion Pipeline + API Foundation)
+Plan: 0 of 2 in current phase
+Status: Ready to plan
+Last activity: 2026-03-11 -- Roadmap created
+
+Progress: [..........] 0%
+
+## Performance Metrics
+
+**Velocity:**
+- Total plans completed: 0
+- Average duration: -
+- Total execution time: 0 hours
+
+**By Phase:**
+
+| Phase | Plans | Total | Avg/Plan |
+|-------|-------|-------|----------|
+| - | - | - | - |
+
+**Recent Trend:**
+- Last 5 plans: -
+- Trend: -
+
+*Updated after each plan completion*
+
+## Accumulated Context
+
+### Decisions
+
+Decisions are logged in PROJECT.md Key Decisions table.
+Recent decisions affecting current work:
+
+- [Roadmap]: ClickHouse chosen as primary store (research recommendation, HIGH confidence)
+- [Roadmap]: Full-text search starts with ClickHouse skip indexes (tokenbf_v1), OpenSearch deferred
+- [Roadmap]: Phases 2 and 3 can execute in parallel (both depend only on Phase 1)
+- [Roadmap]: Web UI deferred to v2
+
+### Pending Todos
+
+None yet.
+
+### Blockers/Concerns
+
+- [Phase 1]: ClickHouse Java client API needs phase-specific research (library has undergone changes)
+- [Phase 1]: Must read cameleer3-common PROTOCOL.md before designing ClickHouse schema
+- [Phase 2]: Diagram rendering library selection is an open question (Batik, jsvg, JGraphX, or client-side)
+- [Phase 2]: ClickHouse skip indexes may not suffice for full-text; decision point during Phase 2
+
+## Session Continuity
+
+Last session: 2026-03-11
+Stopped at: Roadmap created, ready for Phase 1 planning
+Resume file: None
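
Review note on the Phase 1 backpressure criterion in ROADMAP.md ("When the write buffer is full, the server returns 503 and does not lose already-buffered data"): the intended behavior could look roughly like the sketch below. This is only an illustration under assumptions. The `WriteBuffer` class, its `capacity` and `max_batch` parameters, and the method names are hypothetical and do not come from the roadmap or from any existing Cameleer3 code, and Python is used here purely for brevity.

```python
from collections import deque


class WriteBuffer:
    """Bounded in-memory buffer (hypothetical; capacity is an assumed setting)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: deque = deque()

    def offer(self, payload: dict) -> int:
        """Return the HTTP status the ingestion endpoint would respond with."""
        if len(self._items) >= self._capacity:
            # Buffer full: reject the new payload, keep everything already queued.
            return 503
        self._items.append(payload)
        # Accepted for a later batch write.
        return 202

    def drain(self, max_batch: int) -> list:
        """Remove and return up to max_batch payloads, oldest first."""
        batch = []
        while self._items and len(batch) < max_batch:
            batch.append(self._items.popleft())
        return batch


buffer = WriteBuffer(capacity=2)
statuses = [buffer.offer({"id": i}) for i in range(3)]  # third offer hits the limit
flushed = buffer.drain(max_batch=10)                     # stands in for a batch insert
after_flush = buffer.offer({"id": 3})                    # space is available again
```

The point of the sketch is that rejection happens before the append, so a 503 never displaces queued data; a real implementation would also need thread safety and a timed flush, which the roadmap leaves to plan 01-01.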