Compare commits: 223e1fd279...aaac6e45cb (22 commits)

| SHA1 |
|------|
| aaac6e45cb |
| 7f0ceca8b1 |
| 8fe65f083c |
| d55ebc1f57 |
| 17a18cf6da |
| ff0af0ef2f |
| 2d3fde3766 |
| b8a4739f72 |
| b2501f2937 |
| cc1c082adb |
| f37009e380 |
| 96c52b88d1 |
| 5436c0a490 |
| da09e5fda7 |
| e808b567cd |
| 9ba1b839df |
| ef934743fb |
| 7294f4be68 |
| e11496242f |
| 6f39e29707 |
| 06a0fddc73 |
| e1a5d994a6 |
.planning/PROJECT.md (new file, 67 lines)
@@ -0,0 +1,67 @@
# Cameleer3 Server

## What This Is

An observability server that receives, stores, and serves Apache Camel route execution data from distributed Cameleer3 agents. Think njams Server (by Integration Matters) — but built incrementally, API-first, with a modern stack. Users can search millions of recorded transactions by state, time, duration, and full text, and correlate executions across multiple Camel instances. The server also pushes configuration, tracing controls, and ad-hoc commands to agents via SSE.

## Core Value

Users can reliably search and find any transaction across all connected Camel instances — by any combination of state, time, duration, or content — even at millions of transactions per day with 30-day retention.

## Requirements

### Validated

(None yet — ship to validate)

### Active

- [ ] Receive and ingest transaction/activity data from Cameleer3 agents via HTTP POST
- [ ] Store transactions in a high-volume, horizontally scalable data store with 30-day retention
- [ ] Search transactions by state, execution date/time, duration, and full-text content
- [ ] Correlate activities across multiple routes and Camel instances within a single transaction
- [ ] Store and version route diagrams so each transaction links to the diagram active at that time
- [ ] Render route diagrams server-side from stored definitions
- [ ] Maintain agent instance registry with lifecycle states (LIVE → STALE → DEAD)
- [ ] Push configuration updates, tracing controls, and ad-hoc commands to agents via SSE
- [ ] Expose a clean REST API that serves as the sole interface for both the UI and external consumers
- [ ] Build a web UI that consumes only the REST API
- [ ] Secure agent-server communication with JWT auth, Ed25519 config signing, and bootstrap token registration
- [ ] Support containerized deployment (Docker)

### Out of Scope

- Mobile app — web UI is sufficient for ops/dev users
- Log aggregation — this is transaction-level observability, not a log collector
- APM/metrics dashboards — focused on Camel route transactions, not general application metrics
- Multi-tenancy — single-tenant deployment per environment for now

## Context

- **Agent side**: cameleer3 agent (`https://gitea.siegeln.net/cameleer/cameleer3`) is under active development; already supports creating diagrams and capturing executions
- **Shared library**: `com.cameleer3:cameleer3-common` contains shared models and the graph API; protocol defined in `cameleer3-common/PROTOCOL.md`
- **Data model**: Hierarchical — a **transaction** represents a message's full journey, containing **activities** per route execution. Transactions can span multiple Camel instances (e.g., route A calls route B on another instance via an endpoint)
- **Scale target**: Millions of transactions per day, 50+ connected agents, 30-day data retention
- **Query pattern**: Incident-driven — mostly recent data queries, with deep historical dives during incidents
- **Inspiration**: njams Server by Integration Matters — similar domain, building our own step by step
- **Existing scaffolding**: Maven multi-module project with Spring Boot 3.4.3, two modules (core + app), Gitea CI pipeline

## Constraints

- **Tech stack**: Java 17+, Spring Boot 3.4.3, Maven multi-module — already established
- **Dependency**: Must consume `com.cameleer3:cameleer3-common` from the Gitea Maven registry
- **Protocol**: Agent protocol is still evolving — server must adapt as it stabilizes
- **Incremental delivery**: Build step by step; storage and search first, then layer features on top

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| API-first architecture | REST API is the sole interface; UI and external consumers use the same API | Pending |
| Storage engine selection | Must handle millions of tx/day, full-text search, time-series queries, 30-day TTL, horizontal scaling | Pending |
| Versioned diagram storage | Each transaction references the route diagram definition active at execution time | Pending |
| SSE for agent push | Server pushes config, tracing control, and commands to agents via Server-Sent Events | Pending |
| JWT + Ed25519 auth model | Secure bidirectional communication; bootstrap token for initial agent registration | Pending |

---

*Last updated: 2026-03-11 after initialization*
.planning/REQUIREMENTS.md (new file, 136 lines)
@@ -0,0 +1,136 @@
# Requirements: Cameleer3 Server

**Defined:** 2026-03-11
**Core Value:** Users can reliably search and find any transaction across all connected Camel instances — by any combination of state, time, duration, or content — even at millions of transactions per day with 30-day retention.

## v1 Requirements

Requirements for the initial release. Each maps to roadmap phases. Tracked as Gitea issues.

### Data Ingestion

- [x] **INGST-01**: Server accepts `RouteExecution` (single or array) via `POST /api/v1/data/executions` and returns `202 Accepted` (#1)
- [x] **INGST-02**: Server accepts `RouteGraph` (single or array) via `POST /api/v1/data/diagrams` and returns `202 Accepted` (#2)
- [x] **INGST-03**: Server accepts metrics snapshots via `POST /api/v1/data/metrics` and returns `202 Accepted` (#3)
- [x] **INGST-04**: Ingestion uses an in-memory batch buffer with configurable flush interval/size for ClickHouse writes (#4)
- [x] **INGST-05**: Server returns `503 Service Unavailable` when the write buffer is full (backpressure) (#5)
- [x] **INGST-06**: ClickHouse TTL automatically expires data after 30 days (configurable) (#6)
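INGST-04 and INGST-05 together describe a bounded buffer sitting between the HTTP endpoints and the ClickHouse flush. A minimal sketch of that accept-or-reject path (the class name, constructor, and `String` payloads are illustrative assumptions, not the project's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical ingest path: buffer accepted payloads, signal backpressure
// with HTTP 503 when the bounded buffer is full.
public class IngestSketch {
    private final BlockingQueue<String> buffer;

    public IngestSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns 202 when the payload is buffered, 503 when the buffer is full. */
    public int ingest(String payload) {
        return buffer.offer(payload) ? 202 : 503;
    }

    /** Called by a scheduled flusher: take up to batchSize rows for one ClickHouse insert. */
    public List<String> nextBatch(int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        buffer.drainTo(batch, batchSize);
        return batch;
    }
}
```

The point of the `offer`/`drainTo` pair is that ingestion never blocks a request thread: the caller gets an immediate 202 or 503, and a separate scheduler empties the buffer in batches.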
### Transaction Search

- [ ] **SRCH-01**: User can search transactions by execution status (COMPLETED, FAILED, RUNNING) (#7)
- [ ] **SRCH-02**: User can search transactions by date/time range (startTime, endTime) (#8)
- [ ] **SRCH-03**: User can search transactions by duration range (min/max milliseconds) (#9)
- [ ] **SRCH-04**: User can search transactions by correlationId to find all related executions across instances (#10)
- [ ] **SRCH-05**: User can full-text search across message bodies, headers, error messages, and stack traces (#11)
- [ ] **SRCH-06**: User can view transaction detail with nested processor execution tree (#12)

### Agent Management

- [ ] **AGNT-01**: Agent registers via `POST /api/v1/agents/register` with bootstrap token, receives JWT + server public key (#13)
- [ ] **AGNT-02**: Server maintains agent registry with LIVE/STALE/DEAD lifecycle based on heartbeat timing (#14)
- [ ] **AGNT-03**: Agent sends heartbeat via `POST /api/v1/agents/{id}/heartbeat` every 30s (#15)
- [ ] **AGNT-04**: Server pushes `config-update` events to agents via SSE with Ed25519 signature (#16)
- [ ] **AGNT-05**: Server pushes `deep-trace` commands to agents via SSE for specific correlationIds (#17)
- [ ] **AGNT-06**: Server pushes `replay` commands to agents via SSE with signed replay tokens (#18)
- [ ] **AGNT-07**: SSE connection includes `ping` keepalive and supports `Last-Event-ID` reconnection (#19)
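AGNT-04 through AGNT-07 all ride on one SSE stream per agent. On the wire, an SSE event is plain text: an `id:` field (which the client echoes back as `Last-Event-ID` when it reconnects), an `event:` name such as `config-update` or `ping`, and one or more `data:` lines, terminated by a blank line. A hypothetical formatter for that frame layout (not the server's actual code — a real Spring implementation would use `SseEmitter` instead):

```java
// Formats one Server-Sent Event frame. The id enables Last-Event-ID
// resumption; "ping" events with a trivial payload serve as keepalives.
public final class SseFrame {
    public static String format(long id, String event, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("id: ").append(id).append('\n');
        sb.append("event: ").append(event).append('\n');
        // A multi-line payload needs one "data:" field per line.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n'); // blank line terminates the event
        return sb.toString();
    }
}
```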
### Route Diagrams

- [ ] **DIAG-01**: Server stores `RouteGraph` definitions with content-addressable versioning (hash-based dedup) (#20)
- [ ] **DIAG-02**: Each transaction links to the `RouteGraph` version that was active at execution time (#21)
- [ ] **DIAG-03**: Server renders route diagrams from stored `RouteGraph` definitions (nodes, edges, EIP patterns) (#22)
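DIAG-01's content-addressable versioning amounts to hashing a canonical serialization of the `RouteGraph` and keying storage by the digest, so re-posting an identical definition dedupes to the existing version. A sketch under the assumption that SHA-256 over canonical JSON is used (the actual hash and serialization are open implementation choices):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Derives a stable content hash for a serialized RouteGraph definition.
// Identical canonical JSON yields the identical hash, so duplicates
// collapse onto one stored version.
public final class ContentHash {
    public static String of(String canonicalJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(canonicalJson.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available in the JDK
        }
    }
}
```

Note that dedup only works if the serialization is canonical (stable key order, no insignificant whitespace); otherwise two equal graphs can hash differently.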
### Security

- [ ] **SECU-01**: All API endpoints (except health and register) require valid JWT Bearer token (#23)
- [ ] **SECU-02**: JWT refresh flow via `POST /api/v1/agents/{id}/refresh` (#24)
- [ ] **SECU-03**: Server generates Ed25519 keypair; public key delivered at registration (#25)
- [ ] **SECU-04**: All config-update and replay SSE payloads are signed with server's Ed25519 private key (#26)
- [ ] **SECU-05**: Bootstrap token from `CAMELEER_AUTH_TOKEN` env var validates initial agent registration (#27)
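SECU-03/04 rely on Ed25519, which the JDK supports natively since Java 15, so no extra crypto dependency is needed for payload signing. A sketch of the sign/verify round trip (helper names are illustrative; the server would sign with its private key and agents would verify with the public key received at registration):

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Ed25519 sign/verify using the JDK's built-in provider (Java 15+).
public final class Ed25519Sketch {
    public static KeyPair generate() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static byte[] sign(KeyPair kp, byte[] payload) {
        try {
            Signature sig = Signature.getInstance("Ed25519");
            sig.initSign(kp.getPrivate());
            sig.update(payload);
            return sig.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static boolean verify(KeyPair kp, byte[] payload, byte[] signature) {
        try {
            Signature sig = Signature.getInstance("Ed25519");
            sig.initVerify(kp.getPublic());
            sig.update(payload);
            return sig.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```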
### REST API

- [x] **API-01**: All endpoints follow the protocol v1 path structure (`/api/v1/...`) (#28)
- [x] **API-02**: API documented via OpenAPI/Swagger (springdoc-openapi) (#29)
- [x] **API-03**: Server includes `GET /api/v1/health` endpoint (#30)
- [x] **API-04**: All requests validated for `X-Cameleer-Protocol-Version: 1` header (#31)
- [x] **API-05**: Server accepts unknown JSON fields for forward compatibility (#32)

## v2 Requirements

Deferred to a future release. Tracked but not in the current roadmap.

### Web UI

- **UI-01**: Transaction search form and result list view
- **UI-02**: Transaction detail view with activity drill-down
- **UI-03**: Route diagram visualization with execution overlay
- **UI-04**: Agent status overview dashboard
- **UI-05**: Dashboard with volume/error trend charts

### Advanced Search

- **ASRCH-01**: Cursor-based pagination for large result sets
- **ASRCH-02**: Saved search queries

## Out of Scope

| Feature | Reason |
|---------|--------|
| Mobile app | Web UI sufficient for ops/dev users |
| Log aggregation | Transaction-level observability, not a log collector |
| APM/metrics dashboards | Focused on Camel route transactions, not general application metrics |
| Multi-tenancy | Single-tenant deployment per environment |
| Kafka transport | HTTP POST ingestion is the primary path; Kafka is an agent-side concern |
| Custom dashboards | Fixed dashboard views; no user-configurable widgets |
| Real-time firehose | Not a streaming platform; query-based access |
| AI root cause analysis | Out of scope for v1; focus on search and visualization |

## Traceability

Which phases cover which requirements. Updated during roadmap creation.

| Requirement | Phase | Status |
|-------------|-------|--------|
| INGST-01 (#1) | Phase 1 | Pending |
| INGST-02 (#2) | Phase 1 | Pending |
| INGST-03 (#3) | Phase 1 | Pending |
| INGST-04 (#4) | Phase 1 | Pending |
| INGST-05 (#5) | Phase 1 | Pending |
| INGST-06 (#6) | Phase 1 | Pending |
| SRCH-01 (#7) | Phase 2 | Pending |
| SRCH-02 (#8) | Phase 2 | Pending |
| SRCH-03 (#9) | Phase 2 | Pending |
| SRCH-04 (#10) | Phase 2 | Pending |
| SRCH-05 (#11) | Phase 2 | Pending |
| SRCH-06 (#12) | Phase 2 | Pending |
| AGNT-01 (#13) | Phase 3 | Pending |
| AGNT-02 (#14) | Phase 3 | Pending |
| AGNT-03 (#15) | Phase 3 | Pending |
| AGNT-04 (#16) | Phase 3 | Pending |
| AGNT-05 (#17) | Phase 3 | Pending |
| AGNT-06 (#18) | Phase 3 | Pending |
| AGNT-07 (#19) | Phase 3 | Pending |
| DIAG-01 (#20) | Phase 2 | Pending |
| DIAG-02 (#21) | Phase 2 | Pending |
| DIAG-03 (#22) | Phase 2 | Pending |
| SECU-01 (#23) | Phase 4 | Pending |
| SECU-02 (#24) | Phase 4 | Pending |
| SECU-03 (#25) | Phase 4 | Pending |
| SECU-04 (#26) | Phase 4 | Pending |
| SECU-05 (#27) | Phase 4 | Pending |
| API-01 (#28) | Phase 1 | Pending |
| API-02 (#29) | Phase 1 | Pending |
| API-03 (#30) | Phase 1 | Pending |
| API-04 (#31) | Phase 1 | Pending |
| API-05 (#32) | Phase 1 | Pending |

**Coverage:**
- v1 requirements: 32 total
- Mapped to phases: 32
- Unmapped: 0

---

*Requirements defined: 2026-03-11*
*Last updated: 2026-03-11 after roadmap creation*
.planning/ROADMAP.md (new file, 93 lines)
@@ -0,0 +1,93 @@
# Roadmap: Cameleer3 Server

## Overview

Build an observability server that ingests millions of Camel route transactions per day into ClickHouse, provides structured and full-text search, manages agent lifecycles via SSE, and secures all communication with JWT and Ed25519 signing. The roadmap moves from data-in (ingestion) to data-out (search) to agent management to security, with each phase delivering a complete, verifiable capability.

## Phases

**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

- [ ] **Phase 1: Ingestion Pipeline + API Foundation** - ClickHouse schema, batch write buffer, ingestion endpoints, API scaffolding
- [ ] **Phase 2: Transaction Search + Diagrams** - Structured search, full-text search, diagram versioning and rendering
- [ ] **Phase 3: Agent Registry + SSE Push** - Agent lifecycle management, heartbeat monitoring, SSE config/command push
- [ ] **Phase 4: Security** - JWT authentication, Ed25519 signing, bootstrap token registration, endpoint protection

## Phase Details

### Phase 1: Ingestion Pipeline + API Foundation
**Goal**: Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection
**Depends on**: Nothing (first phase)
**Requirements**: INGST-01 (#1), INGST-02 (#2), INGST-03 (#3), INGST-04 (#4), INGST-05 (#5), INGST-06 (#6), API-01 (#28), API-02 (#29), API-03 (#30), API-04 (#31), API-05 (#32)
**Success Criteria** (what must be TRUE):
1. An HTTP client can POST a RouteExecution payload to `/api/v1/data/executions` and receive 202 Accepted, and the data appears in ClickHouse within the flush interval
2. An HTTP client can POST RouteGraph and metrics payloads to their respective endpoints and receive 202 Accepted
3. When the write buffer is full, the server returns 503 and does not lose already-buffered data
4. Data older than the configured TTL (default 30 days) is automatically removed by ClickHouse
5. The health endpoint responds at `/api/v1/health`, OpenAPI docs are available, the protocol version header is validated, and unknown JSON fields are accepted
**Plans:** 2/3 executed

Plans:
- [ ] 01-01-PLAN.md -- ClickHouse infrastructure, schema, WriteBuffer, repository interfaces, test infrastructure
- [ ] 01-02-PLAN.md -- Ingestion REST endpoints, ClickHouse repositories, flush scheduler, integration tests
- [ ] 01-03-PLAN.md -- API foundation (health, OpenAPI, protocol header, forward compat, TTL verification)

### Phase 2: Transaction Search + Diagrams
**Goal**: Users can find any transaction by status, time, duration, correlation ID, or content, view execution detail trees, and see versioned route diagrams linked to transactions
**Depends on**: Phase 1
**Requirements**: SRCH-01 (#7), SRCH-02 (#8), SRCH-03 (#9), SRCH-04 (#10), SRCH-05 (#11), SRCH-06 (#12), DIAG-01 (#20), DIAG-02 (#21), DIAG-03 (#22)
**Success Criteria** (what must be TRUE):
1. User can query transactions filtered by any combination of status, date range, duration range, and correlationId, and receive matching results via REST
2. User can full-text search across message bodies, headers, error messages, and stack traces and find matching transactions
3. User can retrieve a transaction's detail view showing the nested processor execution tree
4. Route diagrams are stored with content-addressable versioning (identical definitions stored once), each transaction links to its active diagram version, and diagrams can be rendered from stored definitions
**Plans**: TBD

Plans:
- [ ] 02-01: Transaction query engine (structured filters + full-text via ClickHouse skip indexes)
- [ ] 02-02: Transaction detail + diagram versioning, linking, and rendering

### Phase 3: Agent Registry + SSE Push
**Goal**: Server tracks connected agents through their full lifecycle and can push configuration updates, deep-trace commands, and replay commands to specific agents in real time
**Depends on**: Phase 1
**Requirements**: AGNT-01 (#13), AGNT-02 (#14), AGNT-03 (#15), AGNT-04 (#16), AGNT-05 (#17), AGNT-06 (#18), AGNT-07 (#19)
**Success Criteria** (what must be TRUE):
1. An agent can register via POST with a bootstrap token and receive a JWT (security enforcement deferred to Phase 4, but the registration flow and token issuance work end-to-end)
2. Server correctly transitions agents through LIVE/STALE/DEAD states based on heartbeat timing, and the agent list endpoint reflects current states
3. Server pushes config-update, deep-trace, and replay events to a specific agent's SSE stream, with ping keepalive and Last-Event-ID reconnection support
**Plans**: TBD

Plans:
- [ ] 03-01: Agent registration, heartbeat lifecycle, and registry endpoints
- [ ] 03-02: SSE connection management and command push (config-update, deep-trace, replay, ping, reconnection)

### Phase 4: Security
**Goal**: All server communication is authenticated and integrity-protected, with JWT for API access and Ed25519 signatures for pushed configuration
**Depends on**: Phase 1, Phase 3
**Requirements**: SECU-01 (#23), SECU-02 (#24), SECU-03 (#25), SECU-04 (#26), SECU-05 (#27)
**Success Criteria** (what must be TRUE):
1. All API endpoints except health and register reject requests without a valid JWT Bearer token
2. Agents can refresh expired JWTs via the refresh endpoint without re-registering
3. Server generates an Ed25519 keypair at startup, delivers the public key during registration, and all config-update and replay SSE payloads carry a valid Ed25519 signature
4. Bootstrap token from the CAMELEER_AUTH_TOKEN environment variable is required for initial agent registration
**Plans**: TBD

Plans:
- [ ] 04-01: JWT authentication filter, refresh flow, Ed25519 keypair generation and config signing, bootstrap token validation

## Progress

**Execution Order:**
Phases execute in numeric order: 1 -> 2 -> 3 -> 4
Note: Phases 2 and 3 both depend only on Phase 1 and could execute in parallel.

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Ingestion Pipeline + API Foundation | 2/3 | In Progress | - |
| 2. Transaction Search + Diagrams | 0/2 | Not started | - |
| 3. Agent Registry + SSE Push | 0/2 | Not started | - |
| 4. Security | 0/1 | Not started | - |
.planning/STATE.md (new file, 92 lines)
@@ -0,0 +1,92 @@
---
gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: completed
stopped_at: Completed 01-02-PLAN.md (Phase 1 fully complete)
last_updated: "2026-03-11T11:20:09.673Z"
last_activity: 2026-03-11 -- Completed 01-02 (Ingestion endpoints, ClickHouse repositories, flush scheduler, 11 ITs)
progress:
  total_phases: 4
  completed_phases: 1
  total_plans: 3
  completed_plans: 3
  percent: 100
---

# Project State

## Project Reference

See: .planning/PROJECT.md (updated 2026-03-11)

**Core value:** Users can reliably search and find any transaction across all connected Camel instances -- by any combination of state, time, duration, or content -- even at millions of transactions per day with 30-day retention.
**Current focus:** Phase 1: Ingestion Pipeline + API Foundation

## Current Position

Phase: 1 of 4 (Ingestion Pipeline + API Foundation) -- COMPLETE
Plan: 3 of 3 in current phase
Status: Phase 1 Complete
Last activity: 2026-03-11 -- Completed 01-02 (Ingestion endpoints, ClickHouse repositories, flush scheduler, 11 ITs)

Progress: [██████████] 100%

## Performance Metrics

**Velocity:**
- Total plans completed: 3
- Average duration: ~7min
- Total execution time: 20min

**By Phase:**

| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| 01 | 3 | 20min | ~7min |

**Recent Trend:**
- Last 5 plans: 3min, 7min, 10min
- Trend: -

*Updated after each plan completion*

| Plan | Duration | Tasks | Files |
|------|----------|-------|-------|
| Phase 01 P01 | 3min | 2 tasks | 13 files |
| Phase 01 P02 | 7min | 2 tasks | 14 files |
| Phase 01 P03 | 10min | 2 tasks | 12 files |

## Accumulated Context

### Decisions

Decisions are logged in the PROJECT.md Key Decisions table.
Recent decisions affecting current work:

- [Roadmap]: ClickHouse chosen as primary store (research recommendation, HIGH confidence)
- [Roadmap]: Full-text search starts with ClickHouse skip indexes (tokenbf_v1), OpenSearch deferred
- [Roadmap]: Phases 2 and 3 can execute in parallel (both depend only on Phase 1)
- [Roadmap]: Web UI deferred to v2
- [Phase 01]: Used spring-boot-starter-jdbc for JdbcTemplate + HikariCP auto-config
- [Phase 01]: Created MetricsSnapshot record in core module (cameleer3-common has no metrics model)
- [Phase 01]: Upgraded testcontainers to 2.0.3 for Docker Desktop 29.x compatibility
- [Phase 01]: Changed error_message/error_stacktrace to non-nullable String for tokenbf_v1 index compat
- [Phase 01]: TTL expressions require toDateTime() cast for DateTime64 columns in ClickHouse 25.3
- [Phase 01]: Controllers accept raw String body to support both single and array JSON payloads
- [Phase 01]: IngestionService is a plain class in core module, wired as bean by IngestionBeanConfig in app
- [Phase 01]: Removed @Configuration from IngestionConfig to fix duplicate bean with @EnableConfigurationProperties

### Pending Todos

None yet.

### Blockers/Concerns

- [Phase 1]: ClickHouse Java client API needs phase-specific research (the library has undergone changes)
- [Phase 1]: Must read cameleer3-common PROTOCOL.md before designing the ClickHouse schema
- [Phase 2]: Diagram rendering library selection is an open question (Batik, jsvg, JGraphX, or client-side)
- [Phase 2]: ClickHouse skip indexes may not suffice for full-text search; decision point during Phase 2

## Session Continuity

Last session: 2026-03-11T11:14:00.000Z
Stopped at: Completed 01-02-PLAN.md (Phase 1 fully complete)
Resume file: None
.planning/config.json (new file, 13 lines)
@@ -0,0 +1,13 @@
{
  "mode": "yolo",
  "granularity": "coarse",
  "parallelization": true,
  "commit_docs": true,
  "model_profile": "quality",
  "workflow": {
    "research": true,
    "plan_check": true,
    "verifier": true,
    "nyquist_validation": true
  }
}
@@ -0,0 +1,216 @@
|
||||
---
|
||||
phase: 01-ingestion-pipeline-api-foundation
|
||||
plan: 01
|
||||
type: execute
|
||||
wave: 1
|
||||
depends_on: []
|
||||
files_modified:
|
||||
- pom.xml
|
||||
- cameleer3-server-core/pom.xml
|
||||
- cameleer3-server-app/pom.xml
|
||||
- docker-compose.yml
|
||||
- clickhouse/init/01-schema.sql
|
||||
- cameleer3-server-app/src/main/resources/application.yml
|
||||
- cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/ClickHouseConfig.java
|
||||
- cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java
|
||||
- cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/WriteBuffer.java
|
||||
- cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/ExecutionRepository.java
|
||||
- cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/DiagramRepository.java
|
||||
- cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/MetricsRepository.java
|
||||
- cameleer3-server-core/src/test/java/com/cameleer3/server/core/ingestion/WriteBufferTest.java
|
||||
autonomous: true
|
||||
requirements:
|
||||
- INGST-04
|
||||
- INGST-05
|
||||
- INGST-06
|
||||
|
||||
must_haves:
|
||||
truths:
|
||||
- "WriteBuffer accepts items and returns false when full (backpressure signal)"
|
||||
- "WriteBuffer drains items in batches for scheduled flush"
|
||||
- "ClickHouse schema creates route_executions, route_diagrams, and agent_metrics tables with correct column types"
|
||||
- "TTL clause on tables removes data older than configured days"
|
||||
- "Docker Compose starts ClickHouse and initializes the schema"
|
||||
artifacts:
|
||||
- path: "cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/WriteBuffer.java"
|
||||
provides: "Generic bounded write buffer with offer/drain/isFull"
|
||||
min_lines: 30
|
||||
- path: "clickhouse/init/01-schema.sql"
|
||||
provides: "ClickHouse DDL for all three tables"
|
||||
contains: "CREATE TABLE route_executions"
|
||||
- path: "docker-compose.yml"
|
||||
provides: "Local ClickHouse service"
|
||||
contains: "clickhouse-server"
|
||||
- path: "cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/ExecutionRepository.java"
|
||||
provides: "Repository interface for execution batch inserts"
|
||||
exports: ["insertBatch"]
|
||||
key_links:
|
||||
- from: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/ClickHouseConfig.java"
|
||||
to: "application.yml"
|
||||
via: "spring.datasource properties"
|
||||
pattern: "spring\\.datasource"
|
||||
- from: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java"
|
||||
to: "application.yml"
|
||||
via: "ingestion.* properties"
|
||||
pattern: "ingestion\\."
|
||||
---
|
||||
|
||||
<objective>
|
||||
Set up ClickHouse infrastructure, schema, WriteBuffer with backpressure, and repository interfaces.
|
||||
|
||||
Purpose: Establishes the storage foundation that all ingestion endpoints and future search queries depend on. The WriteBuffer is the central throughput mechanism -- all data flows through it before reaching ClickHouse.
|
||||
Output: Working ClickHouse via Docker Compose, DDL with TTL, WriteBuffer with unit tests, repository interfaces.
|
||||
</objective>
|
||||
|
||||
<execution_context>
|
||||
@C:/Users/Hendrik/.claude/get-shit-done/workflows/execute-plan.md
|
||||
@C:/Users/Hendrik/.claude/get-shit-done/templates/summary.md
|
||||
</execution_context>
|
||||
|
||||
<context>
|
||||
@.planning/PROJECT.md
|
||||
@.planning/ROADMAP.md
|
||||
@.planning/STATE.md
|
||||
@.planning/phases/01-ingestion-pipeline-api-foundation/01-RESEARCH.md
|
||||
|
||||
@pom.xml
|
||||
@cameleer3-server-core/pom.xml
|
||||
@cameleer3-server-app/pom.xml
|
||||
</context>
|
||||
|
||||
<tasks>
|
||||
|
||||
<task type="auto" tdd="true">
|
||||
<name>Task 1: Dependencies, Docker Compose, ClickHouse schema, and application config</name>
|
||||
<files>
|
||||
pom.xml,
|
||||
cameleer3-server-core/pom.xml,
|
||||
cameleer3-server-app/pom.xml,
|
||||
docker-compose.yml,
|
||||
clickhouse/init/01-schema.sql,
|
||||
cameleer3-server-app/src/main/resources/application.yml
|
||||
</files>
|
||||
<behavior>
|
||||
- docker compose up -d starts ClickHouse on ports 8123/9000
|
||||
- Connecting to ClickHouse and running SELECT 1 succeeds
|
||||
- Tables route_executions, route_diagrams, agent_metrics exist after init
|
||||
- route_executions has TTL clause with configurable interval
|
||||
- route_executions has PARTITION BY toYYYYMMDD(start_time) and ORDER BY (agent_id, status, start_time, execution_id)
|
||||
- route_diagrams uses ReplacingMergeTree with ORDER BY (content_hash)
|
||||
- agent_metrics has TTL and daily partitioning
|
||||
- Maven compile succeeds with new dependencies
|
||||
</behavior>
|
||||
<action>
|
||||
1. Add dependencies to cameleer3-server-app/pom.xml per research:
|
||||
- clickhouse-jdbc 0.9.7 (classifier: all)
|
||||
- spring-boot-starter-actuator
|
||||
- springdoc-openapi-starter-webmvc-ui 2.8.6
|
||||
- testcontainers-clickhouse 2.0.2 (test scope)
|
||||
- junit-jupiter from testcontainers 2.0.2 (test scope)
|
||||
- awaitility (test scope)
|
||||
|
||||
2. Add slf4j-api dependency to cameleer3-server-core/pom.xml.
|
||||
|
||||
3. Create docker-compose.yml at project root with ClickHouse service:
|
||||
- Image: clickhouse/clickhouse-server:25.3
|
||||
- Ports: 8123:8123, 9000:9000
|
||||
- Volume mount ./clickhouse/init to /docker-entrypoint-initdb.d
|
||||
- Environment: CLICKHOUSE_USER=cameleer, CLICKHOUSE_PASSWORD=cameleer_dev, CLICKHOUSE_DB=cameleer3
|
||||
- ulimits nofile 262144
|
||||
|
||||
4. Create clickhouse/init/01-schema.sql with the three tables from research:
|
||||
- route_executions: MergeTree, daily partitioning on start_time, ORDER BY (agent_id, status, start_time, execution_id), TTL start_time + INTERVAL 30 DAY, SETTINGS ttl_only_drop_parts=1. Include Array columns for processor executions (processor_ids, processor_types, processor_starts, processor_ends, processor_durations, processor_statuses). Include skip indexes for correlation_id (bloom_filter) and error_message (tokenbf_v1).
|
||||
- route_diagrams: ReplacingMergeTree(created_at), ORDER BY (content_hash). No TTL.
|
||||
- agent_metrics: MergeTree, daily partitioning on collected_at, ORDER BY (agent_id, metric_name, collected_at), TTL collected_at + INTERVAL 30 DAY, SETTINGS ttl_only_drop_parts=1.
|
||||
- All DateTime fields use DateTime64(3, 'UTC').
|
||||
|
||||
5. Create cameleer3-server-app/src/main/resources/application.yml with config from research:
|
||||
- server.port: 8081
|
||||
- spring.datasource: url=jdbc:ch://localhost:8123/cameleer3, username/password, driver-class-name
|
||||
- spring.jackson: write-dates-as-timestamps=false, fail-on-unknown-properties=false
|
||||
- ingestion: buffer-capacity=50000, batch-size=5000, flush-interval-ms=1000
|
||||
- clickhouse.ttl-days: 30
|
||||
- springdoc paths under /api/v1/
|
||||
- management endpoints (health under /api/v1/, show-details=always)
|
||||
|
||||
6. Ensure .gitattributes exists with `* text=auto eol=lf`.
|
||||
</action>
<verify>
<automated>mvn clean compile -q 2>&1 | tail -5</automated>
</verify>
<done>Maven compiles successfully with all new dependencies. Docker Compose file and ClickHouse DDL exist. application.yml configures datasource, ingestion buffer, and springdoc.</done>
</task>
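The config bullets in step 5 can be sketched as a draft application.yml. The datasource, jackson, ingestion, and clickhouse keys follow the plan directly; the exact springdoc and management property paths shown here are assumptions, not verified values from the research.

```yaml
server:
  port: 8081

spring:
  datasource:
    url: jdbc:ch://localhost:8123/cameleer3
    username: cameleer
    password: cameleer_dev
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver
  jackson:
    serialization:
      write-dates-as-timestamps: false
    deserialization:
      fail-on-unknown-properties: false

ingestion:
  buffer-capacity: 50000
  batch-size: 5000
  flush-interval-ms: 1000

clickhouse:
  ttl-days: 30

# Assumed springdoc/actuator paths -- adjust to the researched values.
springdoc:
  api-docs:
    path: /api/v1/api-docs
  swagger-ui:
    path: /api/v1/swagger-ui

management:
  endpoints:
    web:
      base-path: /api/v1/actuator
  endpoint:
    health:
      show-details: always
```

fail-on-unknown-properties=false is what later makes the "unknown JSON fields are accepted" behavior in Plan 02 work without custom deserialization.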

<task type="auto" tdd="true">
<name>Task 2: WriteBuffer, repository interfaces, IngestionConfig, and ClickHouseConfig</name>
<files>
cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/WriteBuffer.java,
cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/ExecutionRepository.java,
cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/DiagramRepository.java,
cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/MetricsRepository.java,
cameleer3-server-core/src/test/java/com/cameleer3/server/core/ingestion/WriteBufferTest.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/ClickHouseConfig.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java
</files>
<behavior>
- WriteBuffer(capacity=10): offer() returns true for the first 10 items, false on the 11th
- WriteBuffer.drain(5) returns up to 5 items and removes them from the queue
- WriteBuffer.isFull() returns true when at capacity
- WriteBuffer.offerBatch(list) returns false without partial insert if the buffer would overflow
- WriteBuffer.size() tracks current queue depth
- ExecutionRepository interface declares insertBatch(List of RouteExecution)
- DiagramRepository interface declares store(RouteGraph) and findByContentHash(String)
- MetricsRepository interface declares insertBatch(List of metric data)
</behavior>
<action>
1. Create WriteBuffer<T> in core module (no Spring dependency):
- Constructor takes int capacity, creates ArrayBlockingQueue(capacity)
- offer(T item): returns queue.offer(item) -- false when full
- offerBatch(List<T> items): check remainingCapacity() >= items.size() first, then offer each. If there is insufficient capacity, return false immediately without adding any items.
- drain(int maxBatch): drainTo into ArrayList, return list
- size(), capacity(), isFull(), remainingCapacity() accessors

2. Create WriteBufferTest (JUnit 5, no Spring):
- Test offer succeeds until capacity
- Test offer returns false when full
- Test offerBatch all-or-nothing semantics
- Test drain returns items and removes them from the queue
- Test drain with empty queue returns empty list
- Test isFull/size/remainingCapacity

3. Create repository interfaces in core module:
- ExecutionRepository: void insertBatch(List<RouteExecution> executions)
- DiagramRepository: void store(RouteGraph graph), Optional<RouteGraph> findByContentHash(String hash), Optional<String> findContentHashForRoute(String routeId, String agentId)
- MetricsRepository: void insertBatch(List<MetricsSnapshot> metrics) -- use a generic type or the cameleer3-common metrics model if available; if not, create a simple MetricsData record in core module

4. Create IngestionConfig as @ConfigurationProperties("ingestion"):
- bufferCapacity (int, default 50000)
- batchSize (int, default 5000)
- flushIntervalMs (long, default 1000)

5. Create ClickHouseConfig as @Configuration:
- Exposes JdbcTemplate bean (Spring Boot auto-configures DataSource from spring.datasource)
- No custom bean needed if relying on auto-config; only create it if explicit JdbcTemplate customization is required
</action>
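The buffer described in step 1 can be sketched as a plain Java class. The class name and method signatures come from the plan; the body is a minimal illustration, not the shipped implementation -- in particular, the remainingCapacity() pre-check in offerBatch is only safe under the single-writer usage the flush pipeline assumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Bounded buffer whose offer() result doubles as the backpressure signal:
// false means "buffer full", which the controllers translate into HTTP 503.
class WriteBuffer<T> {
    private final ArrayBlockingQueue<T> queue;

    WriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    boolean offer(T item) {
        return queue.offer(item); // non-blocking; false when at capacity
    }

    // All-or-nothing: reject the whole batch if it would overflow.
    // The capacity check and the offers are not atomic, so this is a
    // best-effort guarantee that holds with a single producer thread.
    boolean offerBatch(List<T> items) {
        if (queue.remainingCapacity() < items.size()) {
            return false;
        }
        for (T item : items) {
            queue.offer(item);
        }
        return true;
    }

    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch); // removes drained items from the queue
        return batch;
    }

    int size() { return queue.size(); }
    int capacity() { return queue.size() + queue.remainingCapacity(); }
    boolean isFull() { return queue.remainingCapacity() == 0; }
    int remainingCapacity() { return queue.remainingCapacity(); }
}
```

ArrayBlockingQueue keeps the class free of hand-rolled locking: offer, drainTo, and the accessors are all thread-safe on their own.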
<verify>
<automated>mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q 2>&1 | tail -10</automated>
</verify>
<done>WriteBuffer passes all unit tests. Repository interfaces exist with correct method signatures. IngestionConfig reads from application.yml.</done>
</task>

</tasks>

<verification>
- `mvn test -pl cameleer3-server-core -q` -- all WriteBuffer unit tests pass
- `mvn clean compile -q` -- full project compiles with new dependencies
- `docker compose config` -- validates Docker Compose file
- clickhouse/init/01-schema.sql contains CREATE TABLE for all three tables with correct ENGINE, ORDER BY, PARTITION BY, and TTL
</verification>

<success_criteria>
WriteBuffer unit tests green. Project compiles. ClickHouse DDL defines all three tables with TTL and correct partitioning. Repository interfaces define batch insert contracts.
</success_criteria>

<output>
After completion, create `.planning/phases/01-ingestion-pipeline-api-foundation/01-01-SUMMARY.md`
</output>
@@ -0,0 +1,125 @@
---
phase: 01-ingestion-pipeline-api-foundation
plan: 01
subsystem: database
tags: [clickhouse, jdbc, docker-compose, write-buffer, backpressure]

requires:
  - phase: none
    provides: greenfield project skeleton

provides:
  - ClickHouse Docker Compose for local development
  - ClickHouse DDL with route_executions, route_diagrams, agent_metrics tables
  - WriteBuffer<T> generic bounded buffer with backpressure signal
  - ExecutionRepository, DiagramRepository, MetricsRepository interfaces
  - IngestionConfig and ClickHouseConfig Spring configuration
  - application.yml with datasource, ingestion, springdoc, actuator config

affects: [01-02, 01-03, 02-search, 03-agent-registry]

tech-stack:
  added: [clickhouse-jdbc 0.9.7, springdoc-openapi 2.8.6, spring-boot-starter-actuator, spring-boot-starter-jdbc, testcontainers-clickhouse 2.0.2, awaitility, slf4j-api]
  patterns: [ArrayBlockingQueue write buffer with offer/drain, all-or-nothing offerBatch, content-hash dedup for diagrams, daily partitioning with TTL + ttl_only_drop_parts]

key-files:
  created:
    - docker-compose.yml
    - clickhouse/init/01-schema.sql
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/WriteBuffer.java
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/ExecutionRepository.java
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/DiagramRepository.java
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/MetricsRepository.java
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/model/MetricsSnapshot.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/ClickHouseConfig.java
    - cameleer3-server-core/src/test/java/com/cameleer3/server/core/ingestion/WriteBufferTest.java
  modified:
    - cameleer3-server-core/pom.xml
    - cameleer3-server-app/pom.xml
    - cameleer3-server-app/src/main/resources/application.yml

key-decisions:
  - "Used spring-boot-starter-jdbc for JdbcTemplate + HikariCP auto-config rather than a manual DataSource"
  - "Created MetricsSnapshot record in core module since cameleer3-common has no metrics model"
  - "ClickHouseConfig exposes JdbcTemplate bean; relies on Spring Boot DataSource auto-config"

patterns-established:
  - "WriteBuffer pattern: ArrayBlockingQueue with offer()/offerBatch()/drain() for decoupling HTTP from ClickHouse"
  - "Repository interfaces in core module, implementations will go in app module"
  - "ClickHouse schema: DateTime64(3, 'UTC') for all timestamps, daily partitioning, TTL with ttl_only_drop_parts"

requirements-completed: [INGST-04, INGST-05, INGST-06]

duration: 3min
completed: 2026-03-11
---

# Phase 1 Plan 01: ClickHouse Infrastructure and WriteBuffer Summary

**ClickHouse schema with three tables (daily partitioned, TTL), Docker Compose, WriteBuffer with backpressure, and repository interfaces**

## Performance

- **Duration:** 3 min
- **Started:** 2026-03-11T10:45:57Z
- **Completed:** 2026-03-11T10:49:47Z
- **Tasks:** 2
- **Files modified:** 13

## Accomplishments
- ClickHouse DDL with route_executions (MergeTree, bloom_filter + tokenbf_v1 skip indexes), route_diagrams (ReplacingMergeTree), agent_metrics (MergeTree with TTL)
- Generic WriteBuffer<T> with all-or-nothing batch semantics and 10 passing unit tests
- Repository interfaces defining batch insert contracts for executions, diagrams, and metrics
- Full application.yml with datasource, ingestion buffer config, springdoc, and actuator health endpoint

## Task Commits

Each task was committed atomically:

1. **Task 1: Dependencies, Docker Compose, ClickHouse schema, and application config** - `96c52b8` (feat)
2. **Task 2 RED: WriteBuffer failing tests** - `f37009e` (test)
3. **Task 2 GREEN: WriteBuffer, repository interfaces, config classes** - `cc1c082` (feat)

## Files Created/Modified
- `docker-compose.yml` - ClickHouse service with ports 8123/9000, init volume mount
- `clickhouse/init/01-schema.sql` - DDL for route_executions, route_diagrams, agent_metrics
- `cameleer3-server-core/src/main/java/.../ingestion/WriteBuffer.java` - Bounded queue with offer/offerBatch/drain
- `cameleer3-server-core/src/main/java/.../storage/ExecutionRepository.java` - Batch insert interface for RouteExecution
- `cameleer3-server-core/src/main/java/.../storage/DiagramRepository.java` - Store/find interface for RouteGraph
- `cameleer3-server-core/src/main/java/.../storage/MetricsRepository.java` - Batch insert interface for MetricsSnapshot
- `cameleer3-server-core/src/main/java/.../storage/model/MetricsSnapshot.java` - Metrics data record
- `cameleer3-server-app/src/main/java/.../config/IngestionConfig.java` - Buffer capacity, batch size, flush interval
- `cameleer3-server-app/src/main/java/.../config/ClickHouseConfig.java` - JdbcTemplate bean
- `cameleer3-server-core/src/test/java/.../ingestion/WriteBufferTest.java` - 10 unit tests for WriteBuffer
- `cameleer3-server-core/pom.xml` - Added slf4j-api
- `cameleer3-server-app/pom.xml` - Added clickhouse-jdbc, springdoc, actuator, testcontainers, awaitility
- `cameleer3-server-app/src/main/resources/application.yml` - Full config with datasource, ingestion, springdoc, actuator

## Decisions Made
- Used spring-boot-starter-jdbc to get JdbcTemplate and HikariCP auto-configuration rather than manually wiring a DataSource
- Created MetricsSnapshot record in core module since cameleer3-common does not include a metrics model
- ClickHouseConfig is minimal -- relies on Spring Boot auto-configuring DataSource from spring.datasource properties

## Deviations from Plan

None - plan executed exactly as written.

## Issues Encountered
None

## User Setup Required
None - no external service configuration required.

## Next Phase Readiness
- ClickHouse infrastructure ready for Plan 02 (REST controllers + flush scheduler)
- WriteBuffer and repository interfaces ready for implementation wiring
- Docker Compose available for local development: `docker compose up -d`

## Self-Check: PASSED

All 10 created files verified present. All 3 task commits verified in git log.

---
*Phase: 01-ingestion-pipeline-api-foundation*
*Completed: 2026-03-11*
@@ -0,0 +1,269 @@
---
phase: 01-ingestion-pipeline-api-foundation
plan: 02
type: execute
wave: 3
depends_on: ["01-01", "01-03"]
files_modified:
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/DiagramController.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/MetricsController.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseDiagramRepository.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseMetricsRepository.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java
  - cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ExecutionControllerIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/DiagramControllerIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/MetricsControllerIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/BackpressureIT.java
autonomous: true
requirements:
  - INGST-01
  - INGST-02
  - INGST-03
  - INGST-05

must_haves:
  truths:
    - "POST /api/v1/data/executions with valid RouteExecution payload returns 202 Accepted"
    - "POST /api/v1/data/diagrams with valid RouteGraph payload returns 202 Accepted"
    - "POST /api/v1/data/metrics with valid metrics payload returns 202 Accepted"
    - "Data posted to endpoints appears in ClickHouse after flush interval"
    - "When buffer is full, endpoints return 503 with Retry-After header"
  artifacts:
    - path: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java"
      provides: "POST /api/v1/data/executions endpoint"
      min_lines: 20
    - path: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java"
      provides: "Batch insert to route_executions table via JdbcTemplate"
      min_lines: 30
    - path: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java"
      provides: "Scheduled drain of WriteBuffer into ClickHouse"
      min_lines: 20
    - path: "cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java"
      provides: "Routes data to appropriate WriteBuffer instances"
      min_lines: 20
  key_links:
    - from: "ExecutionController.java"
      to: "IngestionService.java"
      via: "constructor injection"
      pattern: "IngestionService"
    - from: "IngestionService.java"
      to: "WriteBuffer.java"
      via: "offer/offerBatch calls"
      pattern: "writeBuffer\\.offer"
    - from: "ClickHouseFlushScheduler.java"
      to: "WriteBuffer.java"
      via: "drain call on scheduled interval"
      pattern: "writeBuffer\\.drain"
    - from: "ClickHouseFlushScheduler.java"
      to: "ClickHouseExecutionRepository.java"
      via: "insertBatch call"
      pattern: "executionRepository\\.insertBatch"
    - from: "ClickHouseFlushScheduler.java"
      to: "ClickHouseDiagramRepository.java"
      via: "store call after drain"
      pattern: "diagramRepository\\.store"
    - from: "ClickHouseFlushScheduler.java"
      to: "ClickHouseMetricsRepository.java"
      via: "insertBatch call after drain"
      pattern: "metricsRepository\\.insertBatch"
---

<objective>
Implement the three ingestion REST endpoints, ClickHouse repository implementations, the flush scheduler, and the IngestionService that wires controllers to the WriteBuffer.

Purpose: This is the core data pipeline -- agents POST data to endpoints, IngestionService buffers it, ClickHouseFlushScheduler drains and batch-inserts to ClickHouse. Backpressure returns 503 when the buffer is full.
Output: Working ingestion flow verified by integration tests against Testcontainers ClickHouse.
</objective>

<execution_context>
@C:/Users/Hendrik/.claude/get-shit-done/workflows/execute-plan.md
@C:/Users/Hendrik/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-ingestion-pipeline-api-foundation/01-RESEARCH.md
@.planning/phases/01-ingestion-pipeline-api-foundation/01-01-SUMMARY.md

<!-- Interfaces from Plan 01 that this plan depends on -->
<interfaces>
From cameleer3-server-core WriteBuffer.java:
```java
public class WriteBuffer<T> {
    public WriteBuffer(int capacity);
    public boolean offer(T item);
    public boolean offerBatch(List<T> items);
    public List<T> drain(int maxBatch);
    public int size();
    public int capacity();
    public boolean isFull();
    public int remainingCapacity();
}
```

From cameleer3-server-core repository interfaces:
```java
public interface ExecutionRepository {
    void insertBatch(List<RouteExecution> executions);
}
public interface DiagramRepository {
    void store(RouteGraph graph);
    Optional<RouteGraph> findByContentHash(String hash);
    Optional<String> findContentHashForRoute(String routeId, String agentId);
}
public interface MetricsRepository {
    void insertBatch(List<MetricsSnapshot> metrics);
}
```

From IngestionConfig:
```java
@ConfigurationProperties("ingestion")
public class IngestionConfig {
    int bufferCapacity;   // default 50000
    int batchSize;        // default 5000
    long flushIntervalMs; // default 1000
}
```
</interfaces>
</context>

<tasks>

<task type="auto" tdd="false">
<name>Task 1: IngestionService, ClickHouse repositories, and flush scheduler</name>
<files>
cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseDiagramRepository.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseMetricsRepository.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java
</files>
<action>
1. Create IngestionService in core module (no Spring annotations -- it's a plain class):
- Constructor takes three WriteBuffer instances (executions, diagrams, metrics)
- acceptExecutions(List<RouteExecution>): calls executionBuffer.offerBatch(), returns boolean
- acceptExecution(RouteExecution): calls executionBuffer.offer(), returns boolean
- acceptDiagram(RouteGraph): calls diagramBuffer.offer(), returns boolean
- acceptDiagrams(List<RouteGraph>): calls diagramBuffer.offerBatch(), returns boolean
- acceptMetrics(List<MetricsSnapshot>): calls metricsBuffer.offerBatch(), returns boolean
- getExecutionBufferDepth(), getDiagramBufferDepth(), getMetricsBufferDepth() for monitoring

2. Create ClickHouseExecutionRepository implements ExecutionRepository:
- @Repository, inject JdbcTemplate
- insertBatch: INSERT INTO route_executions with all columns. Map RouteExecution fields to ClickHouse columns.
  For processor execution arrays: extract from RouteExecution.getProcessorExecutions() into parallel arrays (processor_ids, processor_types, etc.)
  Use JdbcTemplate.batchUpdate with BatchPreparedStatementSetter.
  For Array columns, use java.sql.Array via connection.createArrayOf() or pass as comma-separated values and cast.
  Note: ClickHouse JDBC V2 handles Array types -- pass Java arrays directly via ps.setObject().

3. Create ClickHouseDiagramRepository implements DiagramRepository:
- @Repository, inject JdbcTemplate
- store(RouteGraph): serialize graph to JSON (Jackson ObjectMapper), compute SHA-256 hex hash of the JSON bytes, INSERT INTO route_diagrams (content_hash, route_id, agent_id, definition)
- findByContentHash: SELECT by content_hash, deserialize definition JSON back to RouteGraph
- findContentHashForRoute: SELECT content_hash WHERE route_id=? AND agent_id=? ORDER BY created_at DESC LIMIT 1

4. Create ClickHouseMetricsRepository implements MetricsRepository:
- @Repository, inject JdbcTemplate
- insertBatch: INSERT INTO agent_metrics with batchUpdate

5. Create ClickHouseFlushScheduler:
- @Component, @EnableScheduling on the app config or main class
- Inject three WriteBuffer instances and three repository implementations
- Inject IngestionConfig for batchSize
- @Scheduled(fixedDelayString="${ingestion.flush-interval-ms:1000}") flushAll(): drains each buffer up to batchSize, calls insertBatch if non-empty. Wrap each in try-catch to log errors without stopping the scheduler.
- Implement SmartLifecycle: on stop(), flush all remaining data (loop drain until empty) before returning.
</action>
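The content-hash computation in step 3 can be sketched in plain Java: hash the serialized JSON with SHA-256 and store the hex digest as route_diagrams.content_hash. The helper class name is hypothetical; only the SHA-256-over-JSON-bytes approach comes from the plan.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Computes the dedup key for a serialized diagram. Identical JSON yields
// an identical hash, so ReplacingMergeTree collapses duplicate inserts.
final class ContentHash {
    private ContentHash() {}

    static String sha256Hex(String json) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(json.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash); // HexFormat requires Java 17+
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every compliant JVM, so this is unreachable.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

One caveat worth noting: Jackson does not guarantee stable key ordering by default, so semantically equal graphs serialized differently would hash differently; hashing the exact stored JSON sidesteps that for the round-trip case.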
<verify>
<automated>mvn clean compile -q 2>&1 | tail -5</automated>
</verify>
<done>IngestionService routes data to WriteBuffers. ClickHouse repositories implement batch inserts via JdbcTemplate. FlushScheduler drains buffers on interval and flushes remaining on shutdown.</done>
</task>
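The loop-drain shutdown flush from step 5 of Task 1 can be sketched in plain Java. The Queue and Consumer here are stand-ins for WriteBuffer and the repository's insertBatch; the class and method names are illustrative, not from the codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Drains the buffer in batchSize chunks until it is empty, so that no
// data already accepted with a 202 is lost when the server stops.
final class ShutdownFlush {
    private ShutdownFlush() {}

    static <T> int flushAll(Queue<T> buffer, int batchSize, Consumer<List<T>> sink) {
        int flushed = 0;
        while (!buffer.isEmpty()) {
            List<T> batch = new ArrayList<>(batchSize);
            for (int i = 0; i < batchSize && !buffer.isEmpty(); i++) {
                batch.add(buffer.poll());
            }
            sink.accept(batch); // insertBatch(...) in the real scheduler
            flushed += batch.size();
        }
        return flushed;
    }
}
```

Hooking this into SmartLifecycle.stop() means Spring will not complete shutdown until the final batches have been handed to ClickHouse.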

<task type="auto" tdd="true">
<name>Task 2: Ingestion REST controllers and integration tests</name>
<files>
cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/DiagramController.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/MetricsController.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ExecutionControllerIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/DiagramControllerIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/MetricsControllerIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/BackpressureIT.java
</files>
<behavior>
- POST /api/v1/data/executions with a single RouteExecution JSON returns 202
- POST /api/v1/data/executions with an array of RouteExecutions returns 202
- POST /api/v1/data/diagrams with a single RouteGraph returns 202
- POST /api/v1/data/diagrams with an array of RouteGraphs returns 202
- POST /api/v1/data/metrics with a metrics payload returns 202
- After the flush interval, posted data is queryable in ClickHouse
- When the buffer is full, POST returns 503 with a Retry-After header
- Unknown JSON fields in the request body are accepted (not rejected)
</behavior>
<action>
1. Create ExecutionController:
- @RestController @RequestMapping("/api/v1/data")
- POST /executions: accepts a body that is either a single RouteExecution or a List<RouteExecution>. The protocol requires both forms, so either use a custom deserializer or accept the raw body and detect its shape before parsing.
- Calls ingestionService.acceptExecutions(). If it returns false -> 503 with a Retry-After: 5 header. If true -> 202 Accepted.
- Add @Operation annotations for OpenAPI documentation.

2. Create DiagramController:
- @RestController @RequestMapping("/api/v1/data")
- POST /diagrams: same pattern, accepts a single RouteGraph or an array, delegates to ingestionService.

3. Create MetricsController:
- @RestController @RequestMapping("/api/v1/data")
- POST /metrics: same pattern.

4. Create ExecutionControllerIT (extends AbstractClickHouseIT):
- Use TestRestTemplate or MockMvc with @AutoConfigureMockMvc
- Test: POST valid RouteExecution JSON with X-Cameleer-Protocol-Version:1 header -> 202
- Test: POST array of executions -> 202
- Test: After posting, wait for the flush (use Awaitility), then query ClickHouse directly via JdbcTemplate to verify the data arrived
- Test: POST with unknown JSON fields -> 202 (forward compat, API-05)

5. Create DiagramControllerIT (extends AbstractClickHouseIT):
- Test: POST RouteGraph -> 202
- Test: After flush, diagram stored in ClickHouse with content_hash

6. Create MetricsControllerIT (extends AbstractClickHouseIT):
- Test: POST metrics -> 202
- Test: After flush, metrics in ClickHouse

7. Create BackpressureIT (extends AbstractClickHouseIT):
- Configure the test with a tiny buffer (ingestion.buffer-capacity=5)
- Fill the buffer by posting enough data
- Next POST returns 503 with Retry-After header
- Verify previously buffered data is NOT lost (it still flushes to ClickHouse)

Note: All integration tests must include the X-Cameleer-Protocol-Version:1 header (API-04 will be enforced by Plan 03's interceptor, but include the header now for forward compatibility).
</action>
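The single-vs-array detection the controllers need in step 1 can be sketched as a tiny helper: peek at the first non-whitespace character of the raw body to decide whether the agent sent one object or a JSON array, then parse accordingly. The class name is hypothetical; the starts-with-`[` check is the approach the key decisions below describe.

```java
// Decides how to parse the raw request body before handing it to Jackson:
// '[' means a batch, anything else is treated as a single object.
final class PayloadShape {
    private PayloadShape() {}

    static boolean isArray(String rawJson) {
        String trimmed = rawJson.strip();
        return !trimmed.isEmpty() && trimmed.charAt(0) == '[';
    }
}
```

In the controller this picks between ObjectMapper.readValue(body, RouteExecution.class) wrapped in a singleton list and a readValue into List<RouteExecution> via a TypeReference, so both protocol forms reach IngestionService as a list.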
<verify>
<automated>mvn test -pl cameleer3-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q 2>&1 | tail -15</automated>
</verify>
<done>All three ingestion endpoints return 202 on valid data. Data arrives in ClickHouse after flush. Buffer-full returns 503 with Retry-After. Unknown JSON fields accepted. Integration tests green.</done>
</task>
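The backpressure mapping shared by all three controllers can be reduced to one rule: an accepted batch becomes 202, a rejected one becomes 503 with Retry-After: 5. A minimal sketch, with a hypothetical record standing in for Spring's ResponseEntity:

```java
// Maps IngestionService's boolean result onto the HTTP contract from
// the behavior section: 202 Accepted, or 503 + Retry-After on overflow.
record IngestResponse(int status, String retryAfter) {
    static IngestResponse from(boolean accepted) {
        return accepted
                ? new IngestResponse(202, null)
                : new IngestResponse(503, "5");
    }
}
```

Keeping this mapping in one place means BackpressureIT exercises the same code path for all three endpoints rather than three copies of the same if/else.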

</tasks>

<verification>
- `mvn test -pl cameleer3-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q` -- all integration tests pass
- POST to /api/v1/data/executions returns 202
- POST to /api/v1/data/diagrams returns 202
- POST to /api/v1/data/metrics returns 202
- Buffer full returns 503
</verification>

<success_criteria>
All four integration test classes green. Data flows from HTTP POST through WriteBuffer through FlushScheduler to ClickHouse. Backpressure returns 503 when the buffer is full without losing existing data.
</success_criteria>

<output>
After completion, create `.planning/phases/01-ingestion-pipeline-api-foundation/01-02-SUMMARY.md`
</output>
@@ -0,0 +1,148 @@
---
phase: 01-ingestion-pipeline-api-foundation
plan: 02
subsystem: api
tags: [rest, ingestion, clickhouse, jdbc, backpressure, spring-boot, integration-tests]

requires:
  - phase: 01-01
    provides: WriteBuffer, repository interfaces, IngestionConfig, ClickHouse schema
  - phase: 01-03
    provides: AbstractClickHouseIT base class, ProtocolVersionInterceptor, application bootstrap

provides:
  - IngestionService routing data to WriteBuffer instances
  - ClickHouseExecutionRepository with batch insert and parallel processor arrays
  - ClickHouseDiagramRepository with SHA-256 content-hash deduplication
  - ClickHouseMetricsRepository with batch insert for agent_metrics
  - ClickHouseFlushScheduler with SmartLifecycle shutdown flush
  - POST /api/v1/data/executions endpoint (single and array)
  - POST /api/v1/data/diagrams endpoint (single and array)
  - POST /api/v1/data/metrics endpoint (array)
  - Backpressure: 503 with Retry-After when buffer full
  - 11 integration tests verifying the end-to-end ingestion pipeline

affects: [02-search, 03-agent-registry]

tech-stack:
  added: []
  patterns: [single/array JSON payload parsing via raw String body, SmartLifecycle for graceful shutdown flush, BatchPreparedStatementSetter for ClickHouse batch inserts, SHA-256 content-hash dedup for diagrams]

key-files:
  created:
    - cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseDiagramRepository.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseMetricsRepository.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionBeanConfig.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/DiagramController.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/MetricsController.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ExecutionControllerIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/DiagramControllerIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/MetricsControllerIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/BackpressureIT.java
  modified:
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java

key-decisions:
  - "Controllers accept raw String body and detect single vs array JSON to support both payload formats"
  - "IngestionService is a plain class in core module, wired as a bean by IngestionBeanConfig in app module"
  - "Removed @Configuration from IngestionConfig to fix duplicate bean with @EnableConfigurationProperties"

patterns-established:
  - "Controller pattern: accept raw String body, parse single/array JSON, delegate to IngestionService, return 202 or 503"
  - "Repository pattern: BatchPreparedStatementSetter for ClickHouse batch inserts"
  - "FlushScheduler pattern: SmartLifecycle for graceful shutdown, loop-drain until empty"
  - "Backpressure pattern: WriteBuffer.offer returns false -> controller returns 503 + Retry-After"

requirements-completed: [INGST-01, INGST-02, INGST-03, INGST-05]

duration: 7min
completed: 2026-03-11
---
|
||||
|
||||
# Phase 1 Plan 02: Ingestion Endpoints and ClickHouse Pipeline Summary

**Three REST ingestion endpoints with ClickHouse batch insert repositories, scheduled flush with graceful shutdown, and 11 green integration tests verifying end-to-end data flow and backpressure**

## Performance

- **Duration:** 7 min
- **Started:** 2026-03-11T11:06:47Z
- **Completed:** 2026-03-11T11:14:00Z
- **Tasks:** 2
- **Files modified:** 14

## Accomplishments

- Complete ingestion pipeline: HTTP POST -> IngestionService -> WriteBuffer -> ClickHouseFlushScheduler -> ClickHouse repositories
- Three REST endpoints accepting both single and array JSON payloads with 202 Accepted response
- Backpressure returns 503 with Retry-After header when write buffer is full
- ClickHouse repositories: batch insert for executions (with flattened processor arrays), JSON storage with SHA-256 dedup for diagrams, batch insert for metrics
- Graceful shutdown via SmartLifecycle drains all remaining buffered data
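
The SHA-256 content-hash dedup mentioned above can be sketched in a few lines; this is a hypothetical illustration of the idea (the class and method names are not the actual repository API), where identical diagram JSON hashes to the same key so a re-posted diagram is stored only once:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of content-hash dedup for diagram payloads.
public class DiagramDedup {
    private final Set<String> seenHashes = new HashSet<>();

    public static String contentHash(String diagramJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(diagramJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available on the JVM
        }
    }

    /** Returns true if the diagram content was new and should be inserted. */
    public boolean offer(String diagramJson) {
        return seenHashes.add(contentHash(diagramJson));
    }

    public static void main(String[] args) {
        DiagramDedup dedup = new DiagramDedup();
        System.out.println(dedup.offer("{\"route\":\"A\"}")); // true: first sighting
        System.out.println(dedup.offer("{\"route\":\"A\"}")); // false: duplicate content
    }
}
```

The real repository keys on the hash in ClickHouse rather than an in-memory set, but the hashing step is the same idea.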

## Task Commits

Each task was committed atomically:

1. **Task 1: IngestionService, ClickHouse repositories, and flush scheduler** - `17a18cf` (feat)
2. **Task 2 RED: Failing integration tests for ingestion endpoints** - `d55ebc1` (test)
3. **Task 2 GREEN: Ingestion REST controllers with backpressure** - `8fe65f0` (feat)

## Files Created/Modified

- `cameleer3-server-core/.../ingestion/IngestionService.java` - Routes data to WriteBuffer instances
- `cameleer3-server-app/.../storage/ClickHouseExecutionRepository.java` - Batch insert with parallel processor arrays
- `cameleer3-server-app/.../storage/ClickHouseDiagramRepository.java` - JSON storage with SHA-256 content-hash dedup
- `cameleer3-server-app/.../storage/ClickHouseMetricsRepository.java` - Batch insert for agent_metrics
- `cameleer3-server-app/.../ingestion/ClickHouseFlushScheduler.java` - Scheduled drain + SmartLifecycle shutdown
- `cameleer3-server-app/.../config/IngestionBeanConfig.java` - WriteBuffer and IngestionService bean wiring
- `cameleer3-server-app/.../controller/ExecutionController.java` - POST /api/v1/data/executions
- `cameleer3-server-app/.../controller/DiagramController.java` - POST /api/v1/data/diagrams
- `cameleer3-server-app/.../controller/MetricsController.java` - POST /api/v1/data/metrics
- `cameleer3-server-app/.../config/IngestionConfig.java` - Removed @Configuration (fix duplicate bean)
- `cameleer3-server-app/.../controller/ExecutionControllerIT.java` - 4 tests: single, array, flush, unknown fields
- `cameleer3-server-app/.../controller/DiagramControllerIT.java` - 3 tests: single, array, flush
- `cameleer3-server-app/.../controller/MetricsControllerIT.java` - 2 tests: POST, flush
- `cameleer3-server-app/.../controller/BackpressureIT.java` - 2 tests: 503 response, data not lost

## Decisions Made

- Controllers accept raw String body and detect single vs array JSON (starts with `[`), supporting both payload formats per protocol spec
- IngestionService is a plain class in core module (no Spring annotations), wired as a bean by IngestionBeanConfig in app module
- Removed `@Configuration` from IngestionConfig to fix duplicate bean conflict with `@EnableConfigurationProperties`
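
The single-vs-array detection decision reduces to a check on the first non-whitespace character of the raw body. A minimal sketch (the helper name is hypothetical, not the actual controller code):

```java
// Hypothetical sketch of the payload-format decision: controllers take the
// raw body as a String and branch on whether it starts with '['.
public class PayloadFormat {

    /** True if the JSON body is an array payload rather than a single object. */
    public static boolean isArrayPayload(String body) {
        return body.strip().startsWith("[");
    }

    public static void main(String[] args) {
        System.out.println(isArrayPayload("{\"id\":1}"));       // false: single object
        System.out.println(isArrayPayload("  [ {\"id\":1} ]")); // true: array payload
    }
}
```

In the controllers, the array branch would deserialize to a list and the object branch to a single model before handing off to IngestionService.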

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed duplicate IngestionConfig bean**

- **Found during:** Task 2 (integration test context startup)
- **Issue:** IngestionConfig had both `@Configuration` and `@ConfigurationProperties`, while `@EnableConfigurationProperties(IngestionConfig.class)` on the app class created a second bean, causing "expected single matching bean but found 2"
- **Fix:** Removed `@Configuration` from IngestionConfig, relying solely on `@EnableConfigurationProperties`
- **Files modified:** cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/IngestionConfig.java
- **Verification:** Application context starts successfully, all tests pass
- **Committed in:** 8fe65f0

---

**Total deviations:** 1 auto-fixed (1 bug)
**Impact on plan:** Necessary fix for Spring context startup. No scope creep.

## Issues Encountered

- BackpressureIT initially failed because the scheduled flush drained the buffer before the test could fill it. Fixed by using a 60s flush interval and a batch POST to fill the buffer atomically.

## User Setup Required

None - no external service configuration required.

## Next Phase Readiness

- All three ingestion endpoints operational and tested
- Phase 1 complete: ClickHouse infrastructure, API foundation, and ingestion pipeline all working
- Ready for Phase 2 (search) and Phase 3 (agent registry), which both depend only on Phase 1

## Self-Check: PASSED

All 13 created files verified present. All 3 task commits verified in git log.

---

*Phase: 01-ingestion-pipeline-api-foundation*
*Completed: 2026-03-11*
@@ -0,0 +1,202 @@
---
phase: 01-ingestion-pipeline-api-foundation
plan: 03
type: execute
wave: 2
depends_on: ["01-01"]
files_modified:
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java
  - cameleer3-server-app/src/main/java/com/cameleer3/server/app/Cameleer3ServerApplication.java
  - cameleer3-server-app/src/test/resources/application-test.yml
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/interceptor/ProtocolVersionIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/HealthControllerIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/OpenApiIT.java
  - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ForwardCompatIT.java
autonomous: true
requirements:
  - API-01
  - API-02
  - API-03
  - API-04
  - API-05
  - INGST-06

must_haves:
  truths:
    - "GET /api/v1/health returns 200 with health status"
    - "GET /api/v1/swagger-ui returns Swagger UI page"
    - "GET /api/v1/api-docs returns OpenAPI JSON spec listing all endpoints"
    - "Requests without X-Cameleer-Protocol-Version header to /api/v1/data/* return 400"
    - "Requests with wrong protocol version return 400"
    - "Health and swagger endpoints work without protocol version header"
    - "Unknown JSON fields in request body do not cause deserialization errors"
    - "ClickHouse tables have TTL clause for 30-day retention"
  artifacts:
    - path: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java"
      provides: "Validates X-Cameleer-Protocol-Version:1 header on data endpoints"
      min_lines: 20
    - path: "cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java"
      provides: "Registers interceptor with path patterns"
      min_lines: 10
    - path: "cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java"
      provides: "Shared Testcontainers base class for integration tests"
      min_lines: 20
  key_links:
    - from: "WebConfig.java"
      to: "ProtocolVersionInterceptor.java"
      via: "addInterceptors registration"
      pattern: "addInterceptors.*ProtocolVersionInterceptor"
    - from: "application.yml"
      to: "Actuator health endpoint"
      via: "management.endpoints config"
      pattern: "management\\.endpoints"
    - from: "application.yml"
      to: "springdoc"
      via: "springdoc.api-docs.path and swagger-ui.path"
      pattern: "springdoc"
---

<objective>
Implement the API foundation: test infrastructure, health endpoint, OpenAPI documentation, protocol version header validation, forward compatibility, and TTL verification.

Purpose: Establishes the Testcontainers base class used by all integration tests across plans. Completes the API scaffolding so all endpoints follow the protocol v1 contract. The health endpoint enables monitoring, OpenAPI enables discoverability, the protocol version interceptor enforces compatibility, and TTL verification confirms data retention.
Output: AbstractClickHouseIT base class, a working health endpoint, Swagger UI, protocol header enforcement, and verified TTL retention.
</objective>

<execution_context>
@C:/Users/Hendrik/.claude/get-shit-done/workflows/execute-plan.md
@C:/Users/Hendrik/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-ingestion-pipeline-api-foundation/01-RESEARCH.md
@.planning/phases/01-ingestion-pipeline-api-foundation/01-01-SUMMARY.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Test infrastructure, protocol version interceptor, WebConfig, and Spring Boot application class</name>
<files>
cameleer3-server-app/src/test/resources/application-test.yml,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java,
cameleer3-server-app/src/main/java/com/cameleer3/server/app/Cameleer3ServerApplication.java
</files>
<action>
1. Create application-test.yml for test profile:
   - Placeholder datasource config (overridden by Testcontainers in AbstractClickHouseIT)
   - ingestion: small buffer for tests (capacity=100, batch-size=10, flush-interval-ms=100)

2. Create AbstractClickHouseIT base class:
   - @Testcontainers + @Container with ClickHouseContainer("clickhouse/clickhouse-server:25.3")
   - @DynamicPropertySource to override spring.datasource.url/username/password
   - @SpringBootTest
   - @ActiveProfiles("test")
   - @BeforeAll: read clickhouse/init/01-schema.sql and execute it against the container via JDBC
   - Expose protected JdbcTemplate for subclasses

3. Create ProtocolVersionInterceptor implementing HandlerInterceptor:
   - preHandle: read X-Cameleer-Protocol-Version header
   - If null or not "1": set response status 400, write JSON error body {"error": "Missing or unsupported X-Cameleer-Protocol-Version header"}, return false
   - If "1": return true

4. Create WebConfig implementing WebMvcConfigurer:
   - @Configuration
   - Inject ProtocolVersionInterceptor (declare it as @Component or @Bean)
   - Override addInterceptors: register interceptor with pathPatterns "/api/v1/data/**" and "/api/v1/agents/**"
   - Explicitly EXCLUDE: "/api/v1/health", "/api/v1/api-docs/**", "/api/v1/swagger-ui/**", "/api/v1/swagger-ui.html"

5. Create or update Cameleer3ServerApplication:
   - @SpringBootApplication in package com.cameleer3.server.app
   - @EnableScheduling (needed for ClickHouseFlushScheduler from Plan 02)
   - @EnableConfigurationProperties(IngestionConfig.class)
   - Main method with SpringApplication.run()
   - Ensure package scanning covers com.cameleer3.server.app and com.cameleer3.server.core
</action>
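
The header check in step 3 reduces to a small pure predicate. A sketch under the assumption that only version "1" is supported (the real interceptor additionally writes the JSON error body and sets the 400 status):

```java
// Hypothetical extraction of the interceptor's core check: the request
// proceeds only when X-Cameleer-Protocol-Version is exactly "1".
public class ProtocolVersionCheck {
    static final String SUPPORTED_VERSION = "1";

    /** Mirrors preHandle: true means continue, false means respond 400. */
    public static boolean isSupported(String headerValue) {
        return SUPPORTED_VERSION.equals(headerValue);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("1"));   // true: accepted
        System.out.println(isSupported(null));  // false: missing header -> 400
        System.out.println(isSupported("2"));   // false: unsupported version -> 400
    }
}
```

Comparing with `SUPPORTED_VERSION.equals(headerValue)` keeps the null case (missing header) falling through to the reject branch without an explicit null check.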
<verify>
<automated>mvn clean compile -pl cameleer3-server-app -q 2>&1 | tail -5</automated>
</verify>
<done>AbstractClickHouseIT base class ready for integration tests. ProtocolVersionInterceptor validates header on data/agent paths. Health, swagger, and api-docs paths excluded. Application class enables scheduling and config properties.</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Health, OpenAPI, protocol version, forward compat, and TTL integration tests</name>
<files>
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/HealthControllerIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/OpenApiIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/interceptor/ProtocolVersionIT.java,
cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ForwardCompatIT.java
</files>
<behavior>
- GET /api/v1/health returns 200 with JSON containing status field
- GET /api/v1/api-docs returns 200 with OpenAPI JSON containing paths
- GET /api/v1/swagger-ui/index.html returns 200 (Swagger UI page)
- POST /api/v1/data/executions without protocol header returns 400
- POST /api/v1/data/executions with X-Cameleer-Protocol-Version:1 does NOT return 400 (may return 202 or other based on body)
- POST /api/v1/data/executions with X-Cameleer-Protocol-Version:2 returns 400
- GET /api/v1/health without protocol header returns 200 (not intercepted)
- POST /api/v1/data/executions with extra unknown JSON fields returns 202 (not 400/422)
- ClickHouse route_executions table SHOW CREATE TABLE includes TTL
- ClickHouse agent_metrics table SHOW CREATE TABLE includes TTL
</behavior>
<action>
1. Create HealthControllerIT (extends AbstractClickHouseIT):
   - Test: GET /api/v1/health -> 200 with body containing "status"
   - Test: No X-Cameleer-Protocol-Version header required for health

2. Create OpenApiIT (extends AbstractClickHouseIT):
   - Test: GET /api/v1/api-docs -> 200, body contains "openapi" and "paths"
   - Test: GET /api/v1/swagger-ui/index.html -> 200 or 302 (redirect to UI)
   - Test: api-docs lists /api/v1/data/executions path

3. Create ProtocolVersionIT (extends AbstractClickHouseIT):
   - Test: POST /api/v1/data/executions without header -> 400 with error message
   - Test: POST /api/v1/data/executions with header value "2" -> 400
   - Test: POST /api/v1/data/executions with header value "1" and valid body -> 202 (not 400)
   - Test: GET /api/v1/health without header -> 200 (excluded from interceptor)
   - Test: GET /api/v1/api-docs without header -> 200 (excluded from interceptor)

4. Create ForwardCompatIT (extends AbstractClickHouseIT):
   - Test: POST /api/v1/data/executions with valid RouteExecution JSON plus extra unknown fields (e.g., "futureField": "value") -> 202 (Jackson does not fail on unknown properties)
   - This validates API-05 requirement explicitly.

5. TTL verification (add to HealthControllerIT as dedicated test methods):
   - Test method: ttlConfiguredOnRouteExecutions
     Query ClickHouse: SHOW CREATE TABLE route_executions
     Assert result contains "TTL start_time + toIntervalDay(30)"
   - Test method: ttlConfiguredOnAgentMetrics
     Query ClickHouse: SHOW CREATE TABLE agent_metrics
     Assert result contains TTL clause

Note: All tests that POST to data endpoints must include X-Cameleer-Protocol-Version:1 header.
</action>
<verify>
<automated>mvn test -pl cameleer3-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q 2>&1 | tail -15</automated>
</verify>
<done>Health returns 200. OpenAPI docs are available and list endpoints. Protocol version header enforced on data paths, not on health/docs. Unknown JSON fields accepted. TTL confirmed in ClickHouse DDL via HealthControllerIT test methods.</done>
</task>

</tasks>

<verification>
- `mvn test -pl cameleer3-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q` -- all tests pass
- GET /api/v1/health returns 200
- GET /api/v1/api-docs returns OpenAPI spec
- Missing protocol header returns 400 on data endpoints
- Health works without protocol header
</verification>

<success_criteria>
All four integration test classes green. AbstractClickHouseIT base class works with Testcontainers. Health endpoint accessible. OpenAPI docs list all ingestion endpoints. Protocol version header enforced on data/agent paths but not health/docs. Forward compatibility confirmed. TTL verified in ClickHouse schema via HealthControllerIT.
</success_criteria>

<output>
After completion, create `.planning/phases/01-ingestion-pipeline-api-foundation/01-03-SUMMARY.md`
</output>
@@ -0,0 +1,161 @@
---
phase: 01-ingestion-pipeline-api-foundation
plan: 03
subsystem: api
tags: [spring-boot, testcontainers, openapi, interceptor, clickhouse, integration-tests]

requires:
  - phase: 01-01
    provides: ClickHouse schema, application.yml, dependencies

provides:
  - AbstractClickHouseIT base class for all integration tests
  - ProtocolVersionInterceptor enforcing X-Cameleer-Protocol-Version:1 on data/agent paths
  - WebConfig with interceptor registration and path exclusions
  - Cameleer3ServerApplication with @EnableScheduling and component scanning
  - 12 passing integration tests (health, OpenAPI, protocol version, forward compat, TTL)

affects: [01-02, 02-search, 03-agent-registry]

tech-stack:
  added: [testcontainers 2.0.3, docker-java 3.7.0]
  patterns: [shared static Testcontainers ClickHouse container, HandlerInterceptor for protocol validation, WebMvcConfigurer for path-based interceptor registration]

key-files:
  created:
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/Cameleer3ServerApplication.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java
    - cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java
    - cameleer3-server-app/src/test/resources/application-test.yml
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/HealthControllerIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/OpenApiIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/interceptor/ProtocolVersionIT.java
    - cameleer3-server-app/src/test/java/com/cameleer3/server/app/controller/ForwardCompatIT.java
  modified:
    - cameleer3-server-app/pom.xml
    - pom.xml
    - clickhouse/init/01-schema.sql

key-decisions:
  - "Upgraded testcontainers from 1.20.5 to 2.0.3 for Docker Desktop 29.x compatibility (docker-java 3.7.0)"
  - "Removed junit-jupiter dependency; manual container lifecycle via static initializer instead"
  - "Changed error_message/error_stacktrace from Nullable(String) to String DEFAULT '' for tokenbf_v1 skip index compatibility"
  - "TTL expressions use toDateTime() cast for DateTime64 columns in ClickHouse 25.3"

patterns-established:
  - "AbstractClickHouseIT: static container shared across test classes, @DynamicPropertySource for datasource, @BeforeAll for schema init"
  - "ProtocolVersionInterceptor: all data/agent endpoints require X-Cameleer-Protocol-Version:1 header"
  - "Path exclusions: health, api-docs, swagger-ui bypass protocol version check"

requirements-completed: [API-01, API-02, API-03, API-04, API-05, INGST-06]

duration: 10min
completed: 2026-03-11
---

# Phase 1 Plan 03: API Foundation Summary

**Protocol version interceptor, health/OpenAPI endpoints, Testcontainers IT base class, and 12 green integration tests covering TTL, forward compat, and interceptor exclusions**

## Performance

- **Duration:** 10 min
- **Started:** 2026-03-11T10:52:21Z
- **Completed:** 2026-03-11T11:03:08Z
- **Tasks:** 2
- **Files modified:** 12

## Accomplishments

- ProtocolVersionInterceptor validates X-Cameleer-Protocol-Version:1 on /api/v1/data/** and /api/v1/agents/** paths, returning 400 JSON error for missing or wrong version
- AbstractClickHouseIT base class with Testcontainers ClickHouse 25.3, shared static container, schema init from 01-schema.sql
- 12 integration tests: health endpoint (2), OpenAPI docs (2), protocol version enforcement (5), forward compatibility (1), TTL verification (2)
- Cameleer3ServerApplication with @EnableScheduling, @EnableConfigurationProperties, and dual package scanning

## Task Commits

Each task was committed atomically:

1. **Task 1: Test infrastructure, protocol version interceptor, WebConfig, and app bootstrap** - `b8a4739` (feat)
2. **Task 2: Integration tests for health, OpenAPI, protocol version, forward compat, and TTL** - `2d3fde3` (test)

## Files Created/Modified

- `cameleer3-server-app/src/main/java/.../Cameleer3ServerApplication.java` - Spring Boot entry point with scheduling and config properties
- `cameleer3-server-app/src/main/java/.../interceptor/ProtocolVersionInterceptor.java` - Validates protocol version header on data/agent paths
- `cameleer3-server-app/src/main/java/.../config/WebConfig.java` - Registers interceptor with path patterns and exclusions
- `cameleer3-server-app/src/test/java/.../AbstractClickHouseIT.java` - Shared Testcontainers base class for ITs
- `cameleer3-server-app/src/test/resources/application-test.yml` - Test profile with small buffer config
- `cameleer3-server-app/src/test/java/.../controller/HealthControllerIT.java` - Health endpoint and TTL tests
- `cameleer3-server-app/src/test/java/.../controller/OpenApiIT.java` - OpenAPI and Swagger UI tests
- `cameleer3-server-app/src/test/java/.../interceptor/ProtocolVersionIT.java` - Protocol header enforcement tests
- `cameleer3-server-app/src/test/java/.../controller/ForwardCompatIT.java` - Unknown JSON fields test
- `pom.xml` - Override testcontainers.version to 2.0.3
- `cameleer3-server-app/pom.xml` - Remove junit-jupiter, upgrade testcontainers-clickhouse to 2.0.3
- `clickhouse/init/01-schema.sql` - Fix TTL expressions and error column types

## Decisions Made

- Upgraded Testcontainers to 2.0.3 because docker-java 3.4.1 (from TC 1.20.5) is incompatible with Docker Desktop 29.x API 1.52
- Removed junit-jupiter module (doesn't exist in TC 2.x); manage container lifecycle via static initializer block instead
- Changed error_message and error_stacktrace from Nullable(String) to String DEFAULT '' because ClickHouse tokenbf_v1 skip indexes require non-nullable String columns
- Added toDateTime() cast in TTL expressions because ClickHouse 25.3 requires DateTime (not DateTime64) in TTL column references

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] Fixed testcontainers junit-jupiter dependency not available in 2.0.x**

- **Found during:** Task 2 (compilation)
- **Issue:** org.testcontainers:junit-jupiter:2.0.2 does not exist in Maven Central
- **Fix:** Removed junit-jupiter dependency, upgraded to TC 2.0.3, managed container lifecycle manually via static initializer
- **Files modified:** cameleer3-server-app/pom.xml, pom.xml, AbstractClickHouseIT.java
- **Verification:** All tests compile and pass
- **Committed in:** 2d3fde3

**2. [Rule 3 - Blocking] Fixed Docker Desktop 29.x incompatibility with Testcontainers**

- **Found during:** Task 2 (test execution)
- **Issue:** docker-java 3.4.1 (from TC 1.20.5) sends unversioned API calls to Docker API 1.52, receiving 400 errors
- **Fix:** Override testcontainers.version to 2.0.3 in parent POM (brings docker-java 3.7.0 with proper API version negotiation)
- **Files modified:** pom.xml
- **Verification:** ClickHouse container starts successfully
- **Committed in:** 2d3fde3

**3. [Rule 1 - Bug] Fixed ClickHouse TTL expression for DateTime64 columns**

- **Found during:** Task 2 (schema init in tests)
- **Issue:** ClickHouse 25.3 requires TTL expressions to resolve to DateTime, not DateTime64(3, 'UTC')
- **Fix:** Changed `TTL start_time + INTERVAL 30 DAY` to `TTL toDateTime(start_time) + toIntervalDay(30)`
- **Files modified:** clickhouse/init/01-schema.sql
- **Verification:** Schema creates without errors in ClickHouse 25.3 container
- **Committed in:** 2d3fde3

**4. [Rule 1 - Bug] Fixed tokenbf_v1 index on Nullable column**

- **Found during:** Task 2 (schema init in tests)
- **Issue:** ClickHouse 25.3 does not allow tokenbf_v1 skip indexes on Nullable(String) columns
- **Fix:** Changed error_message and error_stacktrace from Nullable(String) to String DEFAULT ''
- **Files modified:** clickhouse/init/01-schema.sql
- **Verification:** Schema creates without errors, all 12 tests pass
- **Committed in:** 2d3fde3

---

**Total deviations:** 4 auto-fixed (2 blocking, 2 bugs)
**Impact on plan:** All fixes necessary for test execution on current Docker Desktop and ClickHouse versions. No scope creep.

## Issues Encountered

- Surefire runs from the module directory, not the project root; fixed the schema path lookup in AbstractClickHouseIT to check both locations

## User Setup Required

None - no external service configuration required.

## Next Phase Readiness

- AbstractClickHouseIT base class ready for all future integration tests
- Protocol version interceptor active for data/agent endpoints
- API foundation complete, ready for Plan 02 (REST controllers, ClickHouse repositories, flush scheduler)
- Health endpoint at /api/v1/health, OpenAPI at /api/v1/api-docs, Swagger UI at /api/v1/swagger-ui

## Self-Check: PASSED

All 9 created files verified present. Both task commits verified in git log.

---

*Phase: 01-ingestion-pipeline-api-foundation*
*Completed: 2026-03-11*
@@ -0,0 +1,578 @@
# Phase 1: Ingestion Pipeline + API Foundation - Research

**Researched:** 2026-03-11
**Domain:** ClickHouse batch ingestion, Spring Boot REST API, write buffer with backpressure
**Confidence:** HIGH

## Summary

Phase 1 establishes the data pipeline and API skeleton for Cameleer3 Server. Agents POST execution data, diagrams, and metrics to REST endpoints; the server buffers these in memory and batch-flushes to ClickHouse. The ClickHouse schema design is the most critical and least reversible decision in this phase -- ORDER BY and partitioning cannot be changed without table recreation.

The ClickHouse Java ecosystem has undergone significant changes. The recommended approach is **clickhouse-jdbc v0.9.7** (the JDBC V2 driver) with Spring Boot's JdbcTemplate for batch inserts. An alternative is the standalone **client-v2** artifact, which offers a POJO-based insert API, but JDBC integration with Spring Boot is more conventional and better documented. ClickHouse now has a native full-text index (TYPE text, GA as of March 2026) that supersedes the older tokenbf_v1 bloom filter approach -- this is relevant for Phase 2 but should be accounted for in schema design now.

**Primary recommendation:** Use clickhouse-jdbc 0.9.7 with Spring JdbcTemplate, an ArrayBlockingQueue write buffer with scheduled batch flush, daily partitioning with TTL + ttl_only_drop_parts, and Docker Compose for local ClickHouse. Keep Spring Security out of Phase 1 -- all endpoints open, security layered in Phase 4.

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|-----------------|
| INGST-01 (#1) | Accept RouteExecution via POST /api/v1/data/executions, return 202 | REST controller + async write buffer pattern; Jackson deserialization of cameleer3-common models |
| INGST-02 (#2) | Accept RouteGraph via POST /api/v1/data/diagrams, return 202 | Same pattern; separate ClickHouse table for diagrams with content-hash dedup |
| INGST-03 (#3) | Accept metrics via POST /api/v1/data/metrics, return 202 | Same pattern; separate ClickHouse table for metrics |
| INGST-04 (#4) | In-memory batch buffer with configurable flush interval/size | ArrayBlockingQueue + @Scheduled flush; configurable via application.yml |
| INGST-05 (#5) | Return 503 when write buffer full (backpressure) | queue.offer() returns false when full -> controller returns 503 + Retry-After header |
| INGST-06 (#6) | ClickHouse TTL expires data after 30 days (configurable) | Daily partitioning + TTL + ttl_only_drop_parts=1; configurable interval |
| API-01 (#28) | All endpoints under /api/v1/ path | Spring @RequestMapping("/api/v1") base path |
| API-02 (#29) | OpenAPI/Swagger via springdoc-openapi | springdoc-openapi-starter-webmvc-ui 2.8.6 |
| API-03 (#30) | GET /api/v1/health endpoint | Spring Boot Actuator or custom health controller |
| API-04 (#31) | Validate X-Cameleer-Protocol-Version: 1 header | Spring HandlerInterceptor or servlet filter |
| API-05 (#32) | Accept unknown JSON fields (forward compat) | Spring Boot default: FAIL_ON_UNKNOWN_PROPERTIES=false (already the default) |
</phase_requirements>
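
The INGST-05 row can be sketched end to end: `offer()` on a bounded queue fails immediately without blocking, and that failure maps to a 503 response. The class below is an illustrative assumption, not the planned controller code; the status constants and retry value are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch of the backpressure path: a full bounded buffer makes
// offer() return false, which the controller translates into HTTP 503
// (the real response also carries a Retry-After header).
public class BackpressureSketch {
    static final int HTTP_ACCEPTED = 202;
    static final int HTTP_SERVICE_UNAVAILABLE = 503;

    private final BlockingQueue<String> buffer;

    public BackpressureSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns the HTTP status the ingestion endpoint would respond with. */
    public int ingest(String payload) {
        return buffer.offer(payload) ? HTTP_ACCEPTED : HTTP_SERVICE_UNAVAILABLE;
    }

    public static void main(String[] args) {
        BackpressureSketch server = new BackpressureSketch(2);
        System.out.println(server.ingest("a")); // 202
        System.out.println(server.ingest("b")); // 202
        System.out.println(server.ingest("c")); // 503: buffer full, caller should retry
    }
}
```

The key property is that the producer side never blocks the HTTP thread: the agent gets an immediate 503 and backs off rather than piling up requests.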

## Standard Stack

### Core (Phase 1 specific)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| clickhouse-jdbc | 0.9.7 (classifier: all) | ClickHouse JDBC V2 driver | Latest stable; V2 rewrite with improved type handling, batch support; works with Spring JdbcTemplate |
| Spring Boot Starter Web | 3.4.3 (parent) | REST controllers, Jackson | Already in POM |
| Spring Boot Starter Actuator | 3.4.3 (parent) | Health endpoint, metrics | Standard for health checks |
| springdoc-openapi-starter-webmvc-ui | 2.8.6 | OpenAPI 3.1 + Swagger UI | Latest stable for Spring Boot 3.4; generates from annotations |
| Testcontainers (clickhouse) | 2.0.2 | Integration tests with real ClickHouse | Spins up ClickHouse in Docker for tests |
| Testcontainers (junit-jupiter) | 2.0.2 | JUnit 5 integration | Lifecycle management for test containers |
| HikariCP | (Spring Boot managed) | JDBC connection pool | Default Spring Boot pool; works with ClickHouse JDBC |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| Jackson JavaTimeModule | (Spring Boot managed) | Instant/Duration serialization | Already noted in project; needed for all timestamp fields |
| Micrometer | (Spring Boot managed) | Buffer depth metrics, ingestion rate | Expose queue.size() and flush latency as metrics |
| Awaitility | (Spring Boot managed) | Async test assertions | Testing batch flush timing in integration tests |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| clickhouse-jdbc 0.9.7 | client-v2 0.9.7 (standalone) | client-v2 has POJO insert API but no JdbcTemplate/Spring integration; JDBC is more conventional |
| ArrayBlockingQueue | LMAX Disruptor | Disruptor is faster under extreme contention but adds complexity; ABQ is sufficient for this throughput |
| Spring JdbcTemplate | Raw JDBC PreparedStatement | JdbcTemplate provides cleaner error handling and resource management; no meaningful overhead |
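
A minimal sketch of the queue-plus-batch-flush pattern the table above compares. Batch size and element type are illustrative; in the planned design the flush would run on a Spring @Scheduled method and each batch would go to a ClickHouse batch insert:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the write buffer: drainTo moves up to batchSize elements per
// call, and flushAll loops until the buffer is empty -- the same loop-drain
// that a graceful shutdown would use.
public class WriteBufferSketch {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    private final int batchSize;

    public WriteBufferSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    public boolean offer(String row) {
        return queue.offer(row); // non-blocking; false signals backpressure
    }

    /** Drains everything in batches; returns the number of batches flushed. */
    public int flushAll() {
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        while (queue.drainTo(batch, batchSize) > 0) {
            // In the real scheduler each batch becomes one ClickHouse batch insert.
            batches++;
            batch.clear();
        }
        return batches;
    }

    public static void main(String[] args) {
        WriteBufferSketch buf = new WriteBufferSketch(10);
        for (int i = 0; i < 25; i++) buf.offer("row-" + i);
        System.out.println(buf.flushAll()); // 25 rows in batches of 10 -> 3 batches
    }
}
```

`drainTo` is the piece that makes ArrayBlockingQueue attractive here: it moves a whole batch out in one operation, so the flush thread contends with producers only briefly.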
**Installation (add to cameleer3-server-app/pom.xml):**

```xml
<!-- ClickHouse JDBC V2 -->
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.9.7</version>
    <classifier>all</classifier>
</dependency>

<!-- API Documentation -->
<dependency>
    <groupId>org.springdoc</groupId>
    <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
    <version>2.8.6</version>
</dependency>

<!-- Actuator for health endpoint -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Testing -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers-clickhouse</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.awaitility</groupId>
    <artifactId>awaitility</artifactId>
    <scope>test</scope>
</dependency>
```

**Add to cameleer3-server-core/pom.xml:**

```xml
<!-- SLF4J for logging (no Spring dependency) -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
</dependency>
```

## Architecture Patterns

### Recommended Project Structure

```
cameleer3-server-core/src/main/java/com/cameleer3/server/core/
  ingestion/
    WriteBuffer.java          # Bounded queue + flush logic
    IngestionService.java     # Accepts data, routes to buffer
  storage/
    ExecutionRepository.java  # Interface: batch insert + query
    DiagramRepository.java    # Interface: store/retrieve diagrams
    MetricsRepository.java    # Interface: store metrics
  model/
    (extend/complement cameleer3-common models as needed)

cameleer3-server-app/src/main/java/com/cameleer3/server/app/
  config/
    ClickHouseConfig.java     # DataSource + JdbcTemplate bean
    IngestionConfig.java      # Buffer size, flush interval from YAML
    WebConfig.java            # Protocol version interceptor
  controller/
    ExecutionController.java  # POST /api/v1/data/executions
    DiagramController.java    # POST /api/v1/data/diagrams
    MetricsController.java    # POST /api/v1/data/metrics
    HealthController.java     # GET /api/v1/health (or use Actuator)
  storage/
    ClickHouseExecutionRepository.java
    ClickHouseDiagramRepository.java
    ClickHouseMetricsRepository.java
  interceptor/
    ProtocolVersionInterceptor.java
```

### Pattern 1: Bounded Write Buffer with Scheduled Flush

**What:** An ArrayBlockingQueue sits between the HTTP endpoint and ClickHouse; a scheduled task drains it and batch-inserts.
**When to use:** Always for ClickHouse ingestion.

```java
// In core module -- no Spring dependency
public class WriteBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int capacity;

    public WriteBuffer(int capacity) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when buffer is full (caller should return 503) */
    public boolean offer(T item) {
        return queue.offer(item);
    }

    public boolean offerBatch(List<T> items) {
        // Best-effort: items enqueued before a failure stay queued, so a
        // retry after 503 may duplicate them. Nothing is lost -- the caller
        // still holds the full list.
        for (T item : items) {
            if (!queue.offer(item)) return false;
        }
        return true;
    }

    /** Drain up to maxBatch items. Called by scheduled flush. */
    public List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public int size() { return queue.size(); }
    public int capacity() { return capacity; }
    public boolean isFull() { return queue.remainingCapacity() == 0; }
}
```

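The controller-facing semantics above come straight from `java.util.concurrent`. A JDK-only sketch (class name illustrative) that exercises the same offer/drain behavior the buffer relies on:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Demonstrates the semantics WriteBuffer builds on: offer() is non-blocking
// and returns false once capacity is exhausted (the 503 path); drainTo()
// removes at most maxBatch items per scheduled flush.
public class BufferSemanticsDemo {
    /** Returns {fourthOfferAccepted(0/1), drainedCount, remainingCount}. */
    public static int[] run() {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(3);
        queue.offer("e1");
        queue.offer("e2");
        queue.offer("e3");
        boolean accepted = queue.offer("e4"); // false: full -> controller returns 503
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, 2);              // flush drains up to batch size
        return new int[] { accepted ? 1 : 0, batch.size(), queue.size() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("accepted4th=" + r[0]
                + " drained=" + r[1] + " remaining=" + r[2]);
    }
}
```

Note that the rejected fourth item is never silently dropped from inside the queue; the caller still holds it and can retry after the flush frees capacity.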
```java
// In app module -- Spring wiring
@Component
public class ClickHouseFlushScheduler {
    private final WriteBuffer<RouteExecution> executionBuffer;
    private final ExecutionRepository repository;
    private final IngestionConfig ingestionConfig; // constructor-injected

    @Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
    public void flushExecutions() {
        List<RouteExecution> batch = executionBuffer.drain(
                ingestionConfig.getBatchSize()); // default 5000
        if (!batch.isEmpty()) {
            repository.insertBatch(batch);
        }
    }
}
```

### Pattern 2: Controller Returns 202 or 503

**What:** Ingestion endpoints accept data asynchronously. Return 202 on success, 503 when the buffer is full.
**When to use:** All ingestion POST endpoints.

```java
@RestController
@RequestMapping("/api/v1/data")
public class ExecutionController {

    private final IngestionService ingestionService; // constructor-injected

    @PostMapping("/executions")
    public ResponseEntity<Void> ingestExecutions(
            @RequestBody List<RouteExecution> executions) {
        if (!ingestionService.accept(executions)) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .header("Retry-After", "5")
                    .build();
        }
        return ResponseEntity.accepted().build();
    }
}
```

### Pattern 3: ClickHouse Batch Insert via JdbcTemplate

**What:** Use JdbcTemplate.batchUpdate with a PreparedStatement for efficient ClickHouse inserts.

```java
@Repository
public class ClickHouseExecutionRepository implements ExecutionRepository {

    private final JdbcTemplate jdbc; // constructor-injected

    @Override
    public void insertBatch(List<RouteExecution> executions) {
        String sql = "INSERT INTO route_executions (execution_id, route_id, "
                + "agent_id, status, start_time, end_time, duration_ms, "
                + "correlation_id, error_message) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)";

        jdbc.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                RouteExecution e = executions.get(i);
                ps.setString(1, e.getExecutionId());
                ps.setString(2, e.getRouteId());
                ps.setString(3, e.getAgentId());
                ps.setString(4, e.getStatus().name());
                ps.setObject(5, e.getStartTime()); // Instant -> DateTime64
                ps.setObject(6, e.getEndTime());
                ps.setLong(7, e.getDurationMs());
                ps.setString(8, e.getCorrelationId());
                ps.setString(9, e.getErrorMessage());
            }

            @Override
            public int getBatchSize() { return executions.size(); }
        });
    }
}
```

### Pattern 4: Protocol Version Interceptor

**What:** Validate the X-Cameleer-Protocol-Version header on all /api/v1/ requests.

```java
public class ProtocolVersionInterceptor implements HandlerInterceptor {
    @Override
    public boolean preHandle(HttpServletRequest request,
            HttpServletResponse response, Object handler) throws Exception {
        String version = request.getHeader("X-Cameleer-Protocol-Version");
        if (version == null || !"1".equals(version)) {
            response.setStatus(HttpStatus.BAD_REQUEST.value());
            response.getWriter().write(
                "{\"error\":\"Missing or unsupported X-Cameleer-Protocol-Version header\"}");
            return false;
        }
        return true;
    }
}
```

Note: Health and OpenAPI endpoints should be excluded from this interceptor.

### Anti-Patterns to Avoid

- **Individual row inserts to ClickHouse:** Each insert creates a data part. At 50+ agents, you get "too many parts" errors within hours. Always batch.
- **Unbounded write buffer:** Without a capacity limit, agent reconnection storms cause OOM. ArrayBlockingQueue with fixed capacity is mandatory.
- **Synchronous ClickHouse writes in controller:** Blocks HTTP threads during ClickHouse inserts. Always decouple via the buffer.
- **Using JPA/Hibernate with ClickHouse:** ClickHouse is a columnar analytical store, not a transactional relational database. JPA adds friction with zero benefit. Use JdbcTemplate directly.
- **Bare DateTime in ClickHouse (no timezone):** Defaults to server timezone. Always use DateTime64(3, 'UTC').

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| JDBC connection pooling | Custom connection management | HikariCP (Spring Boot default) | Handles timeouts, leak detection, sizing |
| OpenAPI documentation | Manual JSON/YAML spec | springdoc-openapi | Generates from code; stays in sync automatically |
| Health endpoint | Custom /health servlet | Spring Boot Actuator | Standard format, integrates with Docker healthchecks |
| JSON serialization config | Custom ObjectMapper setup | Spring Boot auto-config + application.yml | Spring Boot already configures Jackson correctly |
| Test database lifecycle | Manual Docker commands | Testcontainers | Automatic container lifecycle per test class |

## Common Pitfalls

### Pitfall 1: Wrong ClickHouse ORDER BY Design
**What goes wrong:** Choosing ORDER BY (execution_id) makes time-range queries scan entire partitions.
**Why it happens:** Habit carried over from relational databases, where the primary key is a unique row identifier such as a UUID.
**How to avoid:** ORDER BY must match the dominant query pattern. For this project, `ORDER BY (agent_id, status, start_time, execution_id)` puts the most-filtered columns first; execution_id goes last because it is high-cardinality.
**Warning signs:** EXPLAIN shows rows_read >> result set size.
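A quick way to sanity-check the ORDER BY choice is to run the dominant query shape through EXPLAIN. A sketch against the `route_executions` schema defined under Code Examples (literal filter values are illustrative): because `agent_id`, `status`, and `start_time` are a prefix of the sorting key, the primary index can skip granules instead of scanning the whole partition.

```sql
EXPLAIN indexes = 1
SELECT execution_id, start_time, duration_ms
FROM route_executions
WHERE agent_id = 'agent-7'
  AND status = 'FAILED'
  AND start_time >= now() - INTERVAL 1 HOUR
ORDER BY start_time DESC
LIMIT 100;
```

If the reported granule count is close to the table total despite selective filters, the sorting key does not match the query pattern.
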

### Pitfall 2: ClickHouse TTL Fragmenting Partitions
**What goes wrong:** Row-level TTL rewrites data parts, causing merge pressure.
**Why it happens:** Default TTL behavior deletes individual rows.
**How to avoid:** Use daily partitioning (`PARTITION BY toYYYYMMDD(start_time)`) combined with `SETTINGS ttl_only_drop_parts = 1`. This drops entire parts instead of rewriting them. Alternatively, use a scheduled job with `ALTER TABLE DROP PARTITION` for partitions older than 30 days.
**Warning signs:** Continuous high merge activity, elevated CPU during TTL cleanup.

### Pitfall 3: Data Loss on Server Restart
**What goes wrong:** The in-memory buffer loses unflushed data on SIGTERM or crash.
**Why it happens:** Default Spring Boot shutdown does not drain custom queues.
**How to avoid:** Implement `SmartLifecycle` with ordered shutdown: flush the buffer before stopping. Accept that a crash (as opposed to graceful shutdown) may lose up to flush-interval-ms of data -- this is acceptable for observability.
**Warning signs:** Missing transactions around deployment timestamps.
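The flush-before-stop logic can be sketched framework-free; in the Spring wiring it would live in `SmartLifecycle.stop()`, ordered to run before the DataSource closes. Class and method names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Drains the buffer completely on shutdown, handing each batch to the same
// sink the scheduled flush uses (e.g. repository::insertBatch).
public class ShutdownDrain {
    public static <T> int drainAll(BlockingQueue<T> queue, int maxBatch,
                                   Consumer<List<T>> sink) {
        int flushed = 0;
        while (!queue.isEmpty()) {
            List<T> batch = new ArrayList<>(maxBatch);
            queue.drainTo(batch, maxBatch);
            if (batch.isEmpty()) break; // defensive: a concurrent consumer emptied it
            sink.accept(batch);
            flushed += batch.size();
        }
        return flushed;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(10);
        for (int i = 0; i < 7; i++) q.offer(i);
        List<List<Integer>> batches = new ArrayList<>();
        int flushed = drainAll(q, 3, batches::add);
        System.out.println("flushed=" + flushed + " batches=" + batches.size());
    }
}
```

Seven buffered items with a batch size of three drain as batches of 3, 3, and 1, so nothing buffered at shutdown is dropped.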

### Pitfall 4: DateTime Timezone Mismatch
**What goes wrong:** Agents send UTC Instants, ClickHouse stores them in the server-local timezone, and queries return wrong time ranges.
**Why it happens:** ClickHouse DateTime defaults to the server timezone if none is specified.
**How to avoid:** Always use `DateTime64(3, 'UTC')` in the schema. Ensure Jackson serializes Instants as ISO-8601 with a Z suffix. Add a `server_received_at` timestamp for clock skew detection.
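A JDK-only check of the wire format this depends on (class name illustrative): formatting an `Instant` at UTC with millisecond precision yields the Z-suffixed ISO-8601 string that lines up with `DateTime64(3, 'UTC')` columns.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Millisecond-precision, Z-suffixed rendering: the unambiguous UTC form
// agents should send regardless of server timezone.
public class UtcWireFormat {
    public static String render(Instant instant) {
        // 'X' prints "Z" for a zero offset; SSS matches DateTime64(3, ...).
        return DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX")
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }

    public static void main(String[] args) {
        System.out.println(render(Instant.ofEpochMilli(1_700_000_000_123L)));
        // -> 2023-11-14T22:13:20.123Z
    }
}
```

With `spring.jackson.serialization.write-dates-as-timestamps: false` (as in the application.yml below), Spring Boot's Jackson produces this shape for `Instant` fields automatically.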

### Pitfall 5: springdoc Not Scanning Controllers
**What goes wrong:** The OpenAPI spec is empty; Swagger UI shows no endpoints.
**Why it happens:** springdoc defaults to scanning the main application package. If controllers live in a different package hierarchy, they are missed.
**How to avoid:** Ensure `@SpringBootApplication` is in a parent package of all controllers, or configure `springdoc.packagesToScan` in application.yml.
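If the controllers cannot live under the application package, the scan can be widened explicitly. A minimal application.yml sketch, assuming controllers sit under `com.cameleer3.server.app` (the package name is illustrative):

```yaml
springdoc:
  packages-to-scan: com.cameleer3.server.app
```
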

## Code Examples

### ClickHouse Schema: Route Executions Table

```sql
-- Source: ClickHouse MergeTree docs + project requirements
CREATE TABLE route_executions (
    execution_id String,
    route_id LowCardinality(String),
    agent_id LowCardinality(String),
    status LowCardinality(String),  -- COMPLETED, FAILED, RUNNING
    start_time DateTime64(3, 'UTC'),
    end_time Nullable(DateTime64(3, 'UTC')),
    duration_ms UInt64,
    correlation_id String,
    exchange_id String,
    error_message Nullable(String),
    error_stacktrace Nullable(String),
    -- Nested processor executions stored as arrays (ClickHouse nested pattern)
    processor_ids Array(String),
    processor_types Array(LowCardinality(String)),
    processor_starts Array(DateTime64(3, 'UTC')),
    processor_ends Array(DateTime64(3, 'UTC')),
    processor_durations Array(UInt64),
    processor_statuses Array(LowCardinality(String)),
    -- Metadata
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC'),
    -- Skip indexes for future full-text search (Phase 2)
    INDEX idx_correlation correlation_id TYPE bloom_filter GRANULARITY 4,
    INDEX idx_error error_message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(start_time)
ORDER BY (agent_id, status, start_time, execution_id)
TTL start_time + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;
```


### ClickHouse Schema: Route Diagrams Table

```sql
CREATE TABLE route_diagrams (
    content_hash String,              -- SHA-256 of definition
    route_id LowCardinality(String),
    agent_id LowCardinality(String),
    definition String,                -- JSON graph definition
    created_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
    -- No TTL -- diagrams are small and versioned
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (content_hash);
```


### ClickHouse Schema: Metrics Table

```sql
CREATE TABLE agent_metrics (
    agent_id LowCardinality(String),
    collected_at DateTime64(3, 'UTC'),
    metric_name LowCardinality(String),
    metric_value Float64,
    tags Map(String, String),
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(collected_at)
ORDER BY (agent_id, metric_name, collected_at)
TTL collected_at + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;
```


### Docker Compose: Local ClickHouse

```yaml
# docker-compose.yml (development)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    ports:
      - "8123:8123"  # HTTP interface
      - "9000:9000"  # Native protocol
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./clickhouse/init:/docker-entrypoint-initdb.d
    environment:
      CLICKHOUSE_USER: cameleer
      CLICKHOUSE_PASSWORD: cameleer_dev
      CLICKHOUSE_DB: cameleer3
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse-data:
```


### application.yml Configuration

```yaml
server:
  port: 8081

spring:
  datasource:
    url: jdbc:ch://localhost:8123/cameleer3
    username: cameleer
    password: cameleer_dev
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver
  jackson:
    serialization:
      write-dates-as-timestamps: false
    deserialization:
      fail-on-unknown-properties: false  # API-05: forward compat (also Spring Boot default)

ingestion:
  buffer-capacity: 50000
  batch-size: 5000
  flush-interval-ms: 1000

clickhouse:
  ttl-days: 30

springdoc:
  api-docs:
    path: /api/v1/api-docs
  swagger-ui:
    path: /api/v1/swagger-ui

management:
  endpoints:
    web:
      base-path: /api/v1
      exposure:
        include: health
  endpoint:
    health:
      show-details: always
```


## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| clickhouse-http-client 0.6.x | clickhouse-jdbc 0.9.7 (V2) | 2025 | V1 client deprecated; V2 has proper type mapping, batch support |
| tokenbf_v1 bloom filter index | TYPE text() full-text index | March 2026 (GA) | Native full-text search in ClickHouse; may eliminate need for OpenSearch in Phase 2 |
| springdoc-openapi 2.3.x | springdoc-openapi 2.8.6 | 2025 | Latest for Spring Boot 3.4; v3.x is for Spring Boot 4 only |
| Testcontainers 1.19.x | Testcontainers 2.0.2 | 2025 | Major version bump; new artifact names (testcontainers-clickhouse) |

**Deprecated/outdated:**
- `clickhouse-http-client` artifact: replaced by `clickhouse-jdbc` with JDBC V2
- `tokenbf_v1` / `ngrambf_v1` skip indexes: deprecated in favor of the TYPE text() index (though still functional)
- Testcontainers artifact `org.testcontainers:clickhouse`: replaced by `org.testcontainers:testcontainers-clickhouse`


## Open Questions

1. **Exact cameleer3-common model structure**
   - What we know: Models include RouteExecution, ProcessorExecution, ExchangeSnapshot, RouteGraph, RouteNode, RouteEdge
   - What's unclear: Exact field names, types, nesting structure -- needed to design the ClickHouse schema precisely
   - Recommendation: Read the cameleer3-common source code before implementing the schema. The schema must match the wire format.

2. **ClickHouse JDBC V2 + HikariCP compatibility**
   - What we know: clickhouse-jdbc 0.9.7 implements the JDBC spec; HikariCP is the Spring Boot default
   - What's unclear: Whether HikariCP validation queries work correctly with ClickHouse JDBC V2
   - Recommendation: Test in an integration test; may need `spring.datasource.hikari.connection-test-query=SELECT 1`

3. **Nested data: arrays vs separate table for ProcessorExecutions**
   - What we know: ClickHouse supports Array columns and the Nested type
   - What's unclear: Whether flattening processor executions into arrays in the execution row is better than a separate table with a JOIN
   - Recommendation: Arrays are faster for co-located reads (no JOIN) but harder to query individually. Start with arrays; add a materialized view if individual processor queries are needed in Phase 2.
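Should individual processor lookups be needed before Phase 2, the array columns can still be queried row-wise with ARRAY JOIN. A sketch against the `route_executions` schema above (filter values are illustrative):

```sql
-- Unnests parallel arrays into one row per processor execution.
SELECT execution_id, proc_id, proc_duration
FROM route_executions
ARRAY JOIN
    processor_ids AS proc_id,
    processor_durations AS proc_duration
WHERE agent_id = 'agent-7'
  AND proc_duration > 1000;
```

This is slower than a dedicated table for processor-centric workloads, which is why the materialized-view escape hatch is worth keeping in mind.
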

## Validation Architecture

### Test Framework

| Property | Value |
|----------|-------|
| Framework | JUnit 5 (Spring Boot managed) + Testcontainers 2.0.2 |
| Config file | cameleer3-server-app/src/test/resources/application-test.yml (Wave 0) |
| Quick run command | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` |
| Full suite command | `mvn verify` |

### Phase Requirements -> Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|--------------|
| INGST-01 | POST /api/v1/data/executions returns 202, data in ClickHouse | integration | `mvn test -pl cameleer3-server-app -Dtest=ExecutionControllerIT -q` | Wave 0 |
| INGST-02 | POST /api/v1/data/diagrams returns 202 | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramControllerIT -q` | Wave 0 |
| INGST-03 | POST /api/v1/data/metrics returns 202 | integration | `mvn test -pl cameleer3-server-app -Dtest=MetricsControllerIT -q` | Wave 0 |
| INGST-04 | Buffer flushes at interval/size | unit | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` | Wave 0 |
| INGST-05 | 503 when buffer full | unit+integration | `mvn test -pl cameleer3-server-app -Dtest=BackpressureIT -q` | Wave 0 |
| INGST-06 | TTL removes old data | integration | `mvn test -pl cameleer3-server-app -Dtest=ClickHouseTtlIT -q` | Wave 0 |
| API-01 | Endpoints under /api/v1/ | integration | Covered by controller ITs | Wave 0 |
| API-02 | OpenAPI docs available | integration | `mvn test -pl cameleer3-server-app -Dtest=OpenApiIT -q` | Wave 0 |
| API-03 | GET /api/v1/health responds | integration | `mvn test -pl cameleer3-server-app -Dtest=HealthControllerIT -q` | Wave 0 |
| API-04 | Protocol version header validated | integration | `mvn test -pl cameleer3-server-app -Dtest=ProtocolVersionIT -q` | Wave 0 |
| API-05 | Unknown JSON fields accepted | unit | `mvn test -pl cameleer3-server-app -Dtest=ForwardCompatIT -q` | Wave 0 |

### Sampling Rate

- **Per task commit:** `mvn test -pl cameleer3-server-core -q` (unit tests, fast)
- **Per wave merge:** `mvn verify` (full suite with Testcontainers integration tests)
- **Phase gate:** Full suite green before verification

### Wave 0 Gaps

- [ ] `cameleer3-server-app/src/test/resources/application-test.yml` -- test ClickHouse config
- [ ] `cameleer3-server-core/src/test/java/.../WriteBufferTest.java` -- buffer unit tests
- [ ] `cameleer3-server-app/src/test/java/.../AbstractClickHouseIT.java` -- shared Testcontainers base class
- [ ] `cameleer3-server-app/src/test/java/.../ExecutionControllerIT.java` -- ingestion integration test
- [ ] Docker available on test machine for Testcontainers

## Sources

### Primary (HIGH confidence)
- [ClickHouse Java Client releases](https://github.com/ClickHouse/clickhouse-java/releases) -- confirmed v0.9.7 as latest (March 2026)
- [ClickHouse JDBC V2 docs](https://clickhouse.com/docs/integrations/language-clients/java/jdbc) -- JDBC driver API, batch insert patterns
- [ClickHouse Java Client V2 docs](https://clickhouse.com/docs/en/integrations/java/client-v2) -- standalone client API, POJO insert
- [ClickHouse full-text search blog](https://clickhouse.com/blog/clickhouse-full-text-search) -- TYPE text() index GA March 2026
- [ClickHouse MergeTree settings](https://clickhouse.com/docs/operations/settings/merge-tree-settings) -- ttl_only_drop_parts
- [Testcontainers ClickHouse module](https://java.testcontainers.org/modules/databases/clickhouse/) -- v2.0.2, dependency coordinates
- [springdoc-openapi releases](https://github.com/springdoc/springdoc-openapi/releases) -- v2.8.x for Spring Boot 3.4

### Secondary (MEDIUM confidence)
- [Spring Boot Jackson default config](https://github.com/spring-projects/spring-boot/issues/12684) -- FAIL_ON_UNKNOWN_PROPERTIES=false is default
- [ClickHouse Docker Compose docs](https://clickhouse.com/docs/use-cases/observability/clickstack/deployment/docker-compose) -- container setup
- [Baeldung ClickHouse + Spring Boot](https://www.baeldung.com/spring-boot-olap-clickhouse-database) -- integration patterns

### Tertiary (LOW confidence)
- ClickHouse ORDER BY optimization -- based on training data knowledge of MergeTree internals; should be validated with EXPLAIN on real data

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH -- versions verified against live sources (GitHub releases, Maven Central)
- Architecture: HIGH -- write buffer + batch flush is an established ClickHouse pattern used by SigNoz, Uptrace
- ClickHouse schema: MEDIUM -- ORDER BY design is sound but should be validated with realistic query patterns
- Pitfalls: HIGH -- well-documented ClickHouse failure modes, confirmed by multiple sources

**Research date:** 2026-03-11
**Valid until:** 2026-04-11 (30 days -- stack is stable)

---
phase: 1
slug: ingestion-pipeline-api-foundation
status: draft
nyquist_compliant: false
wave_0_complete: false
created: 2026-03-11
---

# Phase 1 — Validation Strategy

> Per-phase validation contract for feedback sampling during execution.

---

## Test Infrastructure

| Property | Value |
|----------|-------|
| **Framework** | JUnit 5 (Spring Boot managed) + Testcontainers 2.0.2 |
| **Config file** | cameleer3-server-app/src/test/resources/application-test.yml (Wave 0) |
| **Quick run command** | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` |
| **Full suite command** | `mvn verify` |
| **Estimated runtime** | ~30 seconds |

---

## Sampling Rate

- **After every task commit:** Run `mvn test -pl cameleer3-server-core -q`
- **After every plan wave:** Run `mvn verify`
- **Before `/gsd:verify-work`:** Full suite must be green
- **Max feedback latency:** 30 seconds

---

## Per-Task Verification Map

| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
|---------|------|------|-------------|-----------|-------------------|-------------|--------|
| 1-01-01 | 01 | 1 | INGST-04 | unit | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` | no W0 | pending |
| 1-01-02 | 01 | 1 | INGST-01 | integration | `mvn test -pl cameleer3-server-app -Dtest=ExecutionControllerIT -q` | no W0 | pending |
| 1-01-03 | 01 | 1 | INGST-05 | integration | `mvn test -pl cameleer3-server-app -Dtest=BackpressureIT -q` | no W0 | pending |
| 1-01-04 | 01 | 1 | INGST-06 | integration | `mvn test -pl cameleer3-server-app -Dtest=HealthControllerIT#ttlConfigured* -q` | no W0 | pending |
| 1-02-01 | 02 | 1 | INGST-01 | integration | `mvn test -pl cameleer3-server-app -Dtest=ExecutionControllerIT -q` | no W0 | pending |
| 1-02-02 | 02 | 1 | INGST-02 | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramControllerIT -q` | no W0 | pending |
| 1-02-03 | 02 | 1 | INGST-03 | integration | `mvn test -pl cameleer3-server-app -Dtest=MetricsControllerIT -q` | no W0 | pending |
| 1-02-04 | 02 | 1 | API-02 | integration | `mvn test -pl cameleer3-server-app -Dtest=OpenApiIT -q` | no W0 | pending |
| 1-02-05 | 02 | 1 | API-03 | integration | `mvn test -pl cameleer3-server-app -Dtest=HealthControllerIT -q` | no W0 | pending |
| 1-02-06 | 02 | 1 | API-04 | integration | `mvn test -pl cameleer3-server-app -Dtest=ProtocolVersionIT -q` | no W0 | pending |
| 1-02-07 | 02 | 1 | API-05 | unit | `mvn test -pl cameleer3-server-app -Dtest=ForwardCompatIT -q` | no W0 | pending |

*Status: pending / green / red / flaky*

---

## Wave 0 Requirements

- [ ] `cameleer3-server-app/src/test/resources/application-test.yml` — test ClickHouse config
- [ ] `cameleer3-server-core/src/test/java/.../WriteBufferTest.java` — buffer unit tests
- [ ] `cameleer3-server-app/src/test/java/.../AbstractClickHouseIT.java` — shared Testcontainers base class
- [ ] `cameleer3-server-app/src/test/java/.../ExecutionControllerIT.java` — ingestion integration test
- [ ] Docker available on test machine for Testcontainers

*If none: "Existing infrastructure covers all phase requirements."*

---

## Manual-Only Verifications

| Behavior | Requirement | Why Manual | Test Instructions |
|----------|-------------|------------|-------------------|
| ClickHouse TTL removes data after 30 days | INGST-06 | Cannot fast-forward time in ClickHouse | Verify TTL clause in CREATE TABLE DDL; automated test in HealthControllerIT asserts the DDL contains TTL |

---

## Validation Sign-Off

- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
- [ ] Wave 0 covers all MISSING references
- [ ] No watch-mode flags
- [ ] Feedback latency < 30s
- [ ] `nyquist_compliant: true` set in frontmatter

**Approval:** pending

---
phase: 01-ingestion-pipeline-api-foundation
verified: 2026-03-11T12:00:00Z
status: passed
score: 5/5 must-haves verified
re_verification: false
---

# Phase 1: Ingestion Pipeline + API Foundation Verification Report

**Phase Goal:** Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection
**Verified:** 2026-03-11
**Status:** PASSED
**Re-verification:** No — initial verification

## Goal Achievement

### Observable Truths (from ROADMAP.md Success Criteria)

| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | An HTTP client can POST a RouteExecution payload to `/api/v1/data/executions` and receive 202 Accepted, and the data appears in ClickHouse within the flush interval | VERIFIED | `ExecutionController` returns 202; `ExecutionControllerIT.postExecution_dataAppearsInClickHouseAfterFlush` uses Awaitility to confirm the row in `route_executions` |
| 2 | An HTTP client can POST RouteGraph and metrics payloads to their respective endpoints and receive 202 Accepted | VERIFIED | `DiagramController` and `MetricsController` both return 202; `DiagramControllerIT.postDiagram_dataAppearsInClickHouseAfterFlush` and `MetricsControllerIT.postMetrics_dataAppearsInClickHouseAfterFlush` confirm ClickHouse persistence |
| 3 | When the write buffer is full, the server returns 503 and does not lose already-buffered data | VERIFIED | `BackpressureIT.whenBufferFull_returns503WithRetryAfter` confirms 503 + `Retry-After` header; `bufferedDataNotLost_afterBackpressure` confirms buffered items remain in the buffer (the diagram flush-to-ClickHouse path is separately covered by DiagramControllerIT) |
| 4 | Data older than the configured TTL (default 30 days) is automatically removed by ClickHouse | VERIFIED | `HealthControllerIT.ttlConfiguredOnRouteExecutions` and `ttlConfiguredOnAgentMetrics` query `SHOW CREATE TABLE` and assert `TTL` + `toIntervalDay(30)` present in the schema |
| 5 | The health endpoint responds at `/api/v1/health`, OpenAPI docs are available, the protocol version header is validated, and unknown JSON fields are accepted | VERIFIED | `HealthControllerIT` confirms 200; `OpenApiIT` confirms OpenAPI JSON + Swagger UI accessible; `ProtocolVersionIT` confirms 400 without the header, 400 on wrong version, passes on version "1"; `ForwardCompatIT` confirms unknown fields do not cause 400/422 |

**Score:** 5/5 truths verified

---

### Required Artifacts

#### Plan 01-01 Artifacts

| Artifact | Expected | Status | Details |
|---|---|---|---|
| `cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/WriteBuffer.java` | Generic bounded write buffer with offer/drain/isFull | VERIFIED | 80 lines; `ArrayBlockingQueue`-backed; implements `offer`, `offerBatch` (all-or-nothing), `drain`, `isFull`, `size`, `capacity`, `remainingCapacity` |
| `clickhouse/init/01-schema.sql` | ClickHouse DDL for all three tables | VERIFIED | Contains `CREATE TABLE route_executions`, `route_diagrams`, `agent_metrics`; correct ENGINE, ORDER BY, PARTITION BY, TTL with `toDateTime()` cast |
| `docker-compose.yml` | Local ClickHouse service | VERIFIED | `clickhouse/clickhouse-server:25.3`; ports 8123/9000; init volume mount; credentials configured |
| `cameleer3-server-core/src/main/java/com/cameleer3/server/core/storage/ExecutionRepository.java` | Repository interface for execution batch inserts | VERIFIED | Declares `void insertBatch(List<RouteExecution> executions)` |

#### Plan 01-02 Artifacts

| Artifact | Expected | Status | Details |
|---|---|---|---|
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java` | POST /api/v1/data/executions endpoint | VERIFIED | 79 lines; `@PostMapping("/executions")`; handles single/array via raw String parsing; returns 202 or 503 + Retry-After |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java` | Batch insert to route_executions via JdbcTemplate | VERIFIED | 118 lines; `@Repository`; `BatchPreparedStatementSetter`; flattens processor tree to parallel arrays |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java` | Scheduled drain of WriteBuffer into ClickHouse | VERIFIED | 160 lines; `@Scheduled(fixedDelayString="${ingestion.flush-interval-ms:1000}")`; implements `SmartLifecycle` for shutdown drain |
| `cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java` | Routes data to appropriate WriteBuffer instances | VERIFIED | 115 lines; plain class; `acceptExecution`, `acceptExecutions`, `acceptDiagram`, `acceptDiagrams`, `acceptMetrics`; delegates to typed `WriteBuffer` instances |
#### Plan 01-03 Artifacts

| Artifact | Expected | Status | Details |
|---|---|---|---|
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java` | Validates X-Cameleer-Protocol-Version:1 header on data endpoints | VERIFIED | 47 lines; implements `HandlerInterceptor.preHandle`; returns 400 JSON on missing/wrong version |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java` | Registers interceptor with path patterns | VERIFIED | 35 lines; `addInterceptors` registers interceptor on `/api/v1/data/**` and `/api/v1/agents/**`; excludes health, api-docs, swagger-ui |
| `cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java` | Shared Testcontainers base class for integration tests | VERIFIED | 73 lines; static `ClickHouseContainer`; `@DynamicPropertySource`; `@BeforeAll` schema init from SQL file; `JdbcTemplate` exposed to subclasses |

---
### Key Link Verification

#### Plan 01-01 Key Links

| From | To | Via | Status | Details |
|---|---|---|---|---|
| `ClickHouseConfig.java` | `application.yml` | `spring.datasource` properties | VERIFIED | `application.yml` defines `spring.datasource.url`, `username`, `password`, `driver-class-name`; `ClickHouseConfig` creates `JdbcTemplate(dataSource)` relying on Spring Boot auto-config |
| `IngestionConfig.java` | `application.yml` | `ingestion.*` properties | VERIFIED | `application.yml` defines `ingestion.buffer-capacity`, `batch-size`, `flush-interval-ms`; `IngestionConfig` is `@ConfigurationProperties(prefix="ingestion")` |
#### Plan 01-02 Key Links

| From | To | Via | Status | Details |
|---|---|---|---|---|
| `ExecutionController.java` | `IngestionService.java` | Constructor injection | VERIFIED | `ExecutionController(IngestionService ingestionService, ...)` -- `IngestionService` injected and called on every POST |
| `IngestionService.java` | `WriteBuffer.java` | offer/offerBatch calls | VERIFIED | `executionBuffer.offerBatch(executions)` and `executionBuffer.offer(execution)` in `acceptExecutions`/`acceptExecution` |
| `ClickHouseFlushScheduler.java` | `WriteBuffer.java` | drain call on scheduled interval | VERIFIED | `executionBuffer.drain(batchSize)` inside `flushExecutions()` called by `@Scheduled flushAll()` |
| `ClickHouseFlushScheduler.java` | `ClickHouseExecutionRepository.java` | insertBatch call | VERIFIED | `executionRepository.insertBatch(batch)` in `flushExecutions()` |
| `ClickHouseFlushScheduler.java` | `ClickHouseDiagramRepository.java` | store call after drain | VERIFIED | `diagramRepository.store(graph)` for each item drained in `flushDiagrams()` |
| `ClickHouseFlushScheduler.java` | `ClickHouseMetricsRepository.java` | insertBatch call after drain | VERIFIED | `metricsRepository.insertBatch(batch)` in `flushMetrics()` |
#### Plan 01-03 Key Links

| From | To | Via | Status | Details |
|---|---|---|---|---|
| `WebConfig.java` | `ProtocolVersionInterceptor.java` | addInterceptors registration | VERIFIED | `registry.addInterceptor(protocolVersionInterceptor).addPathPatterns(...)` |
| `application.yml` | Actuator health endpoint | `management.endpoints` config | VERIFIED | `management.endpoints.web.base-path: /api/v1` and `exposure.include: health` |
| `application.yml` | springdoc | `springdoc.api-docs.path` and `swagger-ui.path` | VERIFIED | `springdoc.api-docs.path: /api/v1/api-docs` and `springdoc.swagger-ui.path: /api/v1/swagger-ui` |

---
### Requirements Coverage

| Requirement | Source Plan | Description | Status | Evidence |
|---|---|---|---|---|
| INGST-01 (#1) | 01-02 | POST /api/v1/data/executions returns 202 | SATISFIED | `ExecutionController`, `ExecutionControllerIT.postSingleExecution_returns202` and `postArrayOfExecutions_returns202` |
| INGST-02 (#2) | 01-02 | POST /api/v1/data/diagrams returns 202 | SATISFIED | `DiagramController`, `DiagramControllerIT.postSingleDiagram_returns202` and `postArrayOfDiagrams_returns202` |
| INGST-03 (#3) | 01-02 | POST /api/v1/data/metrics returns 202 | SATISFIED | `MetricsController`, `MetricsControllerIT.postMetrics_returns202` |
| INGST-04 (#4) | 01-01 | In-memory batch buffer with configurable flush interval/size | SATISFIED | `WriteBuffer` with `ArrayBlockingQueue`; `IngestionConfig` with `buffer-capacity`, `batch-size`, `flush-interval-ms`; `ClickHouseFlushScheduler` drains on interval |
| INGST-05 (#5) | 01-01, 01-02 | 503 when write buffer is full | SATISFIED | `ExecutionController` checks `!accepted` and returns `503` + `Retry-After: 5`; `BackpressureIT.whenBufferFull_returns503WithRetryAfter` |
| INGST-06 (#6) | 01-01, 01-03 | ClickHouse TTL expires data after 30 days | SATISFIED | `01-schema.sql` TTL clauses `toDateTime(start_time) + toIntervalDay(30)` on `route_executions` and `agent_metrics`; `HealthControllerIT.ttlConfiguredOnRouteExecutions` and `ttlConfiguredOnAgentMetrics` |
| API-01 (#28) | 01-03 | All endpoints follow /api/v1/... path structure | SATISFIED | All controllers use `@RequestMapping("/api/v1/data")`; actuator at `/api/v1`; springdoc at `/api/v1/api-docs` |
| API-02 (#29) | 01-03 | API documented via OpenAPI/Swagger | SATISFIED | `springdoc-openapi 2.8.6` in pom; `@Operation`/`@Tag` annotations on controllers; `OpenApiIT.apiDocsReturnsOpenApiSpec` |
| API-03 (#30) | 01-03 | GET /api/v1/health endpoint | SATISFIED | Spring Boot Actuator health at `/api/v1/health`; `HealthControllerIT.healthEndpointReturns200WithStatus` |
| API-04 (#31) | 01-03 | X-Cameleer-Protocol-Version:1 header validated | SATISFIED | `ProtocolVersionInterceptor` returns 400 on missing/wrong version; `ProtocolVersionIT` with 5 test cases |
| API-05 (#32) | 01-03 | Unknown JSON fields accepted | SATISFIED | `spring.jackson.deserialization.fail-on-unknown-properties: false` in `application.yml`; `ForwardCompatIT.unknownFieldsInRequestBodyDoNotCauseError` |

**All 11 phase-1 requirements: SATISFIED**

No orphaned requirements -- all 11 IDs declared in plan frontmatter match the REQUIREMENTS.md Phase 1 assignment.

---
### Anti-Patterns Found

No anti-patterns detected. Scanned all source files in `cameleer3-server-app/src/main` and `cameleer3-server-core/src/main` for TODO/FIXME/PLACEHOLDER/stub return patterns. None found.

One minor observation (not a blocker):

| File | Observation | Severity | Impact |
|---|---|---|---|
| `BackpressureIT.java:79-103` | `bufferedDataNotLost_afterBackpressure` asserts `getDiagramBufferDepth() >= 3` rather than querying ClickHouse after a flush. Verifies data stays in the buffer, not that it ultimately persists. | Info | Not a blocker -- the scheduler flush path for diagrams is fully verified by `DiagramControllerIT.postDiagram_dataAppearsInClickHouseAfterFlush`. The test correctly guards against the buffer accepting data but discarding it before flush. |

Also notable (by design): `ClickHouseExecutionRepository` sets `agent_id = ""` for all inserts (line 59), since the HTTP controller does not extract an agent ID from headers. This is an intentional gap left for Phase 3 (agent registry) and does not block Phase 1 goal achievement.

---
### Human Verification Required

None. All phase-1 success criteria are verifiable programmatically. Integration tests with Testcontainers cover the full stack, including ClickHouse.

One item would benefit from a quick runtime smoke test if the team wants confidence beyond the test suite:

**Optional smoke test:** Run `docker compose up -d`, POST to `/api/v1/data/executions` with curl, wait 2 seconds, then query ClickHouse directly to confirm the row arrived. This path is already covered by `ExecutionControllerIT` against Testcontainers but can be exercised end-to-end against Docker Compose if desired.

---
### Gaps Summary

No gaps. All phase goal truths are verified, all required artifacts exist and are substantively implemented, all key wiring links are confirmed, and all 11 requirements are satisfied. The phase delivers on its stated goal:

> Agents can POST execution data, diagrams, and metrics to the server, which batch-writes them to ClickHouse with TTL retention and backpressure protection.

Specific confirmations:

- **Batch buffering and flush:** `WriteBuffer` (ArrayBlockingQueue) decouples HTTP from ClickHouse; `ClickHouseFlushScheduler` drains at a configurable interval with graceful shutdown drain via `SmartLifecycle`
- **Backpressure:** `WriteBuffer.offer/offerBatch` returning false causes controllers to return `503 Service Unavailable` with a `Retry-After: 5` header
- **TTL retention:** ClickHouse DDL includes `TTL toDateTime(start_time) + toIntervalDay(30)` on `route_executions` and `TTL toDateTime(collected_at) + toIntervalDay(30)` on `agent_metrics`, verified by an integration test querying `SHOW CREATE TABLE`
- **API foundation:** Health at `/api/v1/health`, OpenAPI at `/api/v1/api-docs`, protocol version header enforced on data/agent paths, unknown JSON fields accepted

---

*Verified: 2026-03-11*
*Verifier: Claude (gsd-verifier)*
593
.planning/research/ARCHITECTURE.md
Normal file
@@ -0,0 +1,593 @@
# Architecture Patterns

**Domain:** Transaction monitoring / observability server for Apache Camel route executions
**Researched:** 2026-03-11
**Confidence:** MEDIUM (based on established observability architecture patterns; no live web verification available)
## Recommended Architecture

### High-Level Overview

The system follows a **write-heavy, read-occasional** observability pattern with three distinct data paths:

```
Agents (50+)                         Users / UI
      |                                  |
      v                                  v
[Ingestion Pipeline]               [Query Engine]
      |                                  |
      v                                  |
[Write Buffer / Batcher]                 |
      |                                  |
      v                                  v
[ClickHouse]  <----- reads ----------+
[Text Index]  <----- full-text ------+
      ^
      |
[Diagram Store] (versioned)

[SSE Channel Manager] --push--> Agents
```
### Component Boundaries

| Component | Module | Responsibility | Communicates With |
|-----------|--------|---------------|-------------------|
| **Ingestion Controller** | app | HTTP POST endpoint, request validation, deserialization | Write Buffer |
| **Write Buffer** | core | In-memory batching, backpressure signaling | ClickHouse Writer, Text Indexer |
| **ClickHouse Writer** | core | Batch INSERT into ClickHouse, retry logic | ClickHouse |
| **Text Indexer** | core | Extract searchable text, write to text index | Text index (ClickHouse or external) |
| **Transaction Service** | core | Domain logic: transactions, activities, correlations | Storage interfaces |
| **Query Engine** | core | Combines structured + full-text queries, pagination | ClickHouse, Text index |
| **Agent Registry** | core | Track agent instances, lifecycle (LIVE/STALE/DEAD), heartbeat | SSE Channel Manager |
| **SSE Channel Manager** | core (interface) + app (impl) | Manage SSE connections, push config/commands | Agent Registry |
| **Diagram Service** | core | Version diagrams, link to transactions, trigger rendering | Diagram Store |
| **Diagram Renderer** | core | Server-side rendering of route definitions to visual output | Diagram Service |
| **Auth Service** | core | JWT validation, Ed25519 signing, bootstrap token flow | All controllers |
| **REST Controllers** | app | HTTP endpoints for transactions, agents, diagrams, config | All core services |
| **SSE Controller** | app | SSE endpoint, connection lifecycle | SSE Channel Manager |
| **Config Controller** | app | Config CRUD, push triggers | SSE Channel Manager, Config store |
### Data Flow

#### 1. Transaction Ingestion (Hot Path)

```
Agent POST /api/v1/ingest
      |
      v
[IngestController] -- validates JWT, deserializes using cameleer3-common models
      |
      v
[IngestionService.accept(batch)] -- accepts TransactionData/ActivityData
      |
      v
[WriteBuffer] -- in-memory queue (bounded, per-partition)
      |          signals backpressure via HTTP 429 when full
      |
      +---(flush trigger: size threshold OR time interval)---+
      |                                                      |
      v                                                      v
[ClickHouseWriter.insertBatch()]            [TextIndexer.indexBatch()]
      |                                                      |
      v                                                      v
ClickHouse (MergeTree tables)              ClickHouse full-text index
                                           (or separate text index)
```
#### 2. Transaction Query (Read Path)

```
UI GET /api/v1/transactions?state=ERROR&from=...&to=...&q=free+text
      |
      v
[TransactionController] -- validates, builds query criteria
      |
      v
[QueryEngine.search(criteria)] -- combines structured filters + full-text
      |
      +--- structured filters --> ClickHouse WHERE clauses
      +--- full-text query -----> text index lookup (returns transaction IDs)
      +--- merge results -------> intersect, sort, paginate
      |
      v
[Page<TransactionSummary>] -- paginated response with cursor
```
#### 3. Agent SSE Communication

```
Agent GET /api/v1/agents/{id}/events (SSE)
      |
      v
[SseController] -- authenticates, registers SseEmitter
      |
      v
[SseChannelManager.register(agentId, emitter)]
      |
      v
[AgentRegistry.markLive(agentId)]

--- Later, when config changes ---

[ConfigController.update(config)]
      |
      v
[SseChannelManager.broadcast(configEvent)]
      |
      v
Each registered SseEmitter sends event to connected agent
```
#### 4. Diagram Versioning

```
Agent POST /api/v1/diagrams (on startup or route change)
      |
      v
[DiagramController] -- receives route definition (XML/YAML/JSON from cameleer3-common)
      |
      v
[DiagramService.storeVersion(definition)]
      |
      +--- compute content hash
      +--- if hash differs from latest: store new version with timestamp
      +--- if identical: skip (idempotent)
      |
      v
[DiagramStore] -- versioned storage (content-addressable)

--- On transaction query ---

[TransactionService] -- looks up diagram version active at transaction timestamp
      |
      v
[DiagramService.getVersionAt(routeId, instant)]
      |
      v
[DiagramRenderer.render(definition)] -- produces SVG/PNG for display
```
## Patterns to Follow

### Pattern 1: Bounded Write Buffer with Backpressure

**What:** In-memory queue between the ingestion endpoint and storage writes. Bounded size. When full, return HTTP 429 to agents so they back off and retry.

**When:** Always -- this is the critical buffer between high-throughput ingestion and batch-oriented database writes.

**Why:** ClickHouse performs best with large batch inserts (thousands of rows). Individual inserts per HTTP request would destroy write performance. The buffer decouples ingestion rate from write rate.

**Example:**
```java
public class WriteBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;
    private final Consumer<List<T>> flushAction;

    public WriteBuffer(int capacity, int batchSize, Consumer<List<T>> flushAction) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    public boolean offer(T item) {
        // Returns false when the queue is full -> caller returns 429
        return queue.offer(item);
    }

    // Scheduled flush: drains up to batchSize items
    @Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
    void flush() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            flushAction.accept(batch);
        }
    }
}
```

**Implementation detail:** Use `ArrayBlockingQueue` with a capacity that matches your memory budget. At ~2KB per transaction record and 10,000 capacity, that is ~20MB -- well within bounds.
### Pattern 2: Repository Abstraction over ClickHouse

**What:** Define storage interfaces in the core module, implement with ClickHouse JDBC in the app module. Core never imports the ClickHouse driver directly.

**When:** Always -- this is the key module boundary principle.

**Why:** Keeps core testable without a database. Allows swapping storage in tests (in-memory) and theoretically in production. More importantly, it enforces that domain logic does not leak storage concerns.

**Example:**
```java
// In core module
public interface TransactionRepository {
    void insertBatch(List<Transaction> transactions);
    Page<TransactionSummary> search(TransactionQuery query, PageRequest page);
    Optional<Transaction> findById(String transactionId);
}

// In app module
@Repository
public class ClickHouseTransactionRepository implements TransactionRepository {
    private final JdbcTemplate jdbc;
    // ClickHouse-specific SQL, batch inserts, etc.
}
```
### Pattern 3: SseEmitter Registry with Heartbeat

**What:** Maintain a concurrent map of agent ID to SseEmitter. Send periodic heartbeat events. Remove on timeout, error, or completion.

**When:** For all SSE connections.

**Why:** SSE connections are long-lived. Without heartbeat, you cannot distinguish between a healthy idle connection and a silently dropped one. The registry is the source of truth for which agents are reachable.

**Example:**
```java
public class SseChannelManager {
    private final ConcurrentHashMap<String, SseEmitter> emitters = new ConcurrentHashMap<>();

    public SseEmitter register(String agentId) {
        SseEmitter emitter = new SseEmitter(Long.MAX_VALUE); // no framework timeout
        emitter.onCompletion(() -> remove(agentId));
        emitter.onTimeout(() -> remove(agentId));
        emitter.onError(e -> remove(agentId));
        emitters.put(agentId, emitter);
        return emitter;
    }

    @Scheduled(fixedDelay = 15_000)
    void heartbeat() {
        emitters.forEach((id, emitter) -> {
            try {
                emitter.send(SseEmitter.event().name("heartbeat").data(""));
            } catch (IOException e) {
                remove(id);
            }
        });
    }

    public void send(String agentId, String eventName, Object data) {
        SseEmitter emitter = emitters.get(agentId);
        if (emitter != null) {
            try {
                emitter.send(SseEmitter.event().name(eventName).data(data));
            } catch (IOException e) {
                // SseEmitter.send throws a checked IOException; treat a
                // failed send as a dead connection and drop the emitter
                remove(agentId);
            }
        }
    }

    private void remove(String agentId) {
        emitters.remove(agentId);
    }
}
```
### Pattern 4: Content-Addressable Diagram Versioning

**What:** Hash diagram definitions. Store each unique definition once. Link transactions to the definition hash + a version timestamp.

**When:** For diagram storage.

**Why:** Many transactions reference the same diagram version. Content-addressing deduplicates storage. A separate version table maps (routeId, timestamp) to content hash, enabling "what diagram was active at time T?" queries.

**Schema sketch:**
```sql
-- Diagram definitions (content-addressable)
CREATE TABLE diagram_definitions (
    content_hash String,      -- SHA-256 of definition
    route_id     String,
    definition   String,      -- raw XML/YAML/JSON
    rendered_svg String,      -- pre-rendered SVG (nullable, filled async)
    created_at   DateTime64(3)
) ENGINE = MergeTree()
ORDER BY (content_hash);

-- Version history (which definition was active when)
CREATE TABLE diagram_versions (
    route_id     String,
    active_from  DateTime64(3),
    content_hash String
) ENGINE = MergeTree()
ORDER BY (route_id, active_from);
```
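The content-addressing step reduces to hashing the raw definition and comparing against the latest stored hash. A minimal sketch, assuming SHA-256 hex strings as keys; the `DiagramHasher` class is illustrative, not part of the actual codebase:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: computes the SHA-256 content hash used as the
// key in diagram_definitions. Identical definitions always map to the
// same hash, which is what makes storeVersion idempotent.
public final class DiagramHasher {

    public static String contentHash(String definition) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(definition.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the hash is deterministic, an agent re-posting an unchanged route diagram on every restart produces the same `content_hash` and is skipped without a new version row.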
### Pattern 5: Cursor-Based Pagination for Time-Series Data

**What:** Use cursor-based pagination (keyset pagination) instead of OFFSET/LIMIT for transaction listing.

**When:** For all list/search endpoints returning time-ordered transaction data.

**Why:** OFFSET-based pagination degrades as the offset grows -- ClickHouse must scan and skip rows. Cursor-based pagination using `(timestamp, id) > (last_seen_timestamp, last_seen_id)` gives constant-time page fetches regardless of how deep you paginate.

**Example:**
```java
public record PageCursor(Instant timestamp, String id) {}

// Query: WHERE (timestamp, id) < (:cursorTs, :cursorId) ORDER BY timestamp DESC, id DESC LIMIT :size
```
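A sketch of how the cursor translates into SQL; the `CursorQueries` builder and table name below are illustrative, not classes from the repository:

```java
import java.time.Instant;

// Illustrative keyset-pagination query builder. The tuple comparison
// (timestamp, id) < (?, ?) lets ClickHouse seek directly to the page
// boundary instead of scanning and skipping OFFSET rows.
public final class CursorQueries {

    public record PageCursor(Instant timestamp, String id) {}

    public static String nextPageSql(PageCursor cursor, int size) {
        // First page has no cursor; subsequent pages filter past it
        String where = cursor == null
                ? ""
                : "WHERE (timestamp, id) < (?, ?) ";
        return "SELECT * FROM transactions "
                + where
                + "ORDER BY timestamp DESC, id DESC LIMIT " + size;
    }
}
```

The response then carries the `(timestamp, id)` of the last row as the opaque cursor for the next request.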
## Anti-Patterns to Avoid

### Anti-Pattern 1: Individual Row Inserts to ClickHouse

**What:** Inserting one transaction per HTTP request directly to ClickHouse.

**Why bad:** ClickHouse is designed for bulk inserts. Individual inserts create excessive parts in MergeTree tables, causing merge pressure and degraded read performance. At 50+ agents posting concurrently, this would quickly become a bottleneck.

**Instead:** Buffer in memory, flush in batches of 1,000-10,000 rows per insert.
### Anti-Pattern 2: Storing Rendered Diagrams in ClickHouse BLOBs

**What:** Putting SVG/PNG binary data directly in the main ClickHouse tables alongside transaction data.

**Why bad:** ClickHouse is columnar and optimized for analytical queries. Large binary data in columns degrades compression ratios and query performance for all queries touching that table.

**Instead:** Store rendered output in the filesystem or object storage and keep only the content hash reference in ClickHouse. Alternatively, use a separate ClickHouse table for rendered content that is rarely queried alongside transaction data.
### Anti-Pattern 3: Blocking SSE Writes on the Request Thread

**What:** Sending SSE events synchronously from the thread handling a config update request.

**Why bad:** If an agent's connection is slow or dead, the config update request blocks. With 50+ agents, this creates cascading latency.

**Instead:** Send SSE events asynchronously. Use a thread pool or virtual threads (Java 21+) to handle SSE writes. Return success to the config updater immediately, handle delivery failures in the background.
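The decoupling can be sketched with a small executor-backed publisher. `AsyncSsePublisher` is a stand-in, not the real `SseChannelManager`; with Java 21+, `Executors.newVirtualThreadPerTaskExecutor()` could replace the fixed pool:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative async fan-out: the config-update thread only enqueues
// send tasks; a slow or dead agent connection never blocks the caller.
public final class AsyncSsePublisher {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public void broadcast(List<Runnable> sendTasks) {
        // Each task wraps one emitter.send(...); failures are handled
        // inside the task (e.g. by removing the emitter), not here.
        sendTasks.forEach(pool::submit);
    }

    /** Drains pending sends and stops the pool; true if all completed. */
    public boolean awaitShutdown(long millis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The caller returns 200 to the config updater as soon as the tasks are queued; delivery outcomes surface later through the registry's remove-on-failure path.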
### Anti-Pattern 4: Fat Core Module with Spring Dependencies

**What:** Adding Spring annotations (@Service, @Repository, @Autowired) throughout the core module.

**Why bad:** Couples domain logic to Spring. Makes unit testing harder. Violates the purpose of the core/app split.

**Instead:** The core module defines plain Java interfaces and classes. The app module wires them with Spring. Core can use `@Scheduled` or similar only if Spring is already a dependency; otherwise, keep scheduling in app.
### Anti-Pattern 5: Unbounded SSE Emitter Timeouts

**What:** Setting the SseEmitter timeout to 0 or Long.MAX_VALUE without any heartbeat or cleanup.

**Why bad:** Dead connections accumulate. Memory leaks. The agent registry shows agents as LIVE when they are actually gone.

**Instead:** Use heartbeat (Pattern 3). Track the last successful send. Transition agents to STALE after N missed heartbeats, DEAD after M.
## Module Boundary Design

### Core Module (`cameleer3-server-core`)

The core module is the domain layer. It contains:

- **Domain models** -- Transaction, Activity, Agent, DiagramVersion, etc. (may extend or complement cameleer3-common models)
- **Service interfaces and implementations** -- TransactionService, AgentRegistryService, DiagramService, QueryEngine
- **Repository interfaces** -- TransactionRepository, DiagramRepository, AgentRepository (interfaces only, no implementations)
- **Ingestion logic** -- WriteBuffer, batch assembly, backpressure signaling
- **Text indexing abstraction** -- TextIndexer interface
- **Event/notification abstractions** -- SseChannelManager interface (not the Spring SseEmitter impl)
- **Security abstractions** -- JwtValidator interface, Ed25519Signer/Verifier
- **Query model** -- TransactionQuery, PageCursor, search criteria builders

**No Spring Boot dependencies.** Jackson is acceptable (already present). JUnit for tests.
### App Module (`cameleer3-server-app`)

The app module is the infrastructure/adapter layer. It contains:

- **Spring Boot application class**
- **REST controllers** -- IngestController, TransactionController, AgentController, DiagramController, ConfigController, SseController
- **Repository implementations** -- ClickHouseTransactionRepository, etc.
- **SSE implementation** -- SpringSseChannelManager using SseEmitter
- **Security filters** -- JWT filter, bootstrap token filter
- **Configuration** -- application.yml, ClickHouse connection config, scheduler config
- **Diagram rendering implementation** -- if using an external library for SVG generation
- **Static resources** -- UI assets (later phase)

**Depends on core.** Wires everything together with Spring configuration.
### Boundary Rule

```
app  --> core               (allowed)
core --> app                (NEVER)
core --> cameleer3-common   (allowed)
app  --> cameleer3-common   (transitively via core)
```
## Ingestion Pipeline Detail

### Buffering Strategy

Use a two-stage approach:

1. **Accept stage** -- IngestController deserializes, validates, places into the WriteBuffer. Returns 202 Accepted (or 429 if the buffer is full).
2. **Flush stage** -- A scheduled task drains the buffer into batches. Each batch goes to the ClickHouseWriter and TextIndexer.
### Backpressure Mechanism

- WriteBuffer has a bounded capacity (configurable, default 50,000 items).
- When the buffer is >80% full, respond with HTTP 429 + `Retry-After` header.
- Agents (cameleer3) should implement exponential backoff on 429.
- Monitor the buffer fill level as a metric.
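The fill-level rule above can be sketched as a small gate in front of the buffer; the 80% threshold and the `BackpressureGate` class name are illustrative, not from the codebase:

```java
// Illustrative fill-level gate: reject ingestion with 429 once the
// buffer crosses the configured high-water mark, before it is
// completely full, so agents back off while headroom remains.
public final class BackpressureGate {

    private final int capacity;
    private final double highWaterMark; // e.g. 0.8 for 80%

    public BackpressureGate(int capacity, double highWaterMark) {
        this.capacity = capacity;
        this.highWaterMark = highWaterMark;
    }

    /** True when the caller should respond 429 + Retry-After. */
    public boolean shouldReject(int currentSize) {
        return currentSize >= capacity * highWaterMark;
    }
}
```

With the defaults above (capacity 50,000, mark 0.8), ingestion starts rejecting at 40,000 buffered items.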
### Batch Size Tuning

- Target: 5,000-10,000 rows per ClickHouse INSERT.
- Flush interval: 1-2 seconds (configurable).
- Flush triggers: whichever comes first -- batch size reached OR interval elapsed.
## Storage Architecture

### Write Path (ClickHouse)

ClickHouse excels at:

- Columnar compression (10:1 or better for structured transaction data)
- Time-partitioned tables with automatic TTL-based expiry (30-day retention)
- Massive batch INSERT throughput
- Analytical queries over time ranges

**Table design principles:**

- Partition by month: `PARTITION BY toYYYYMM(execution_time)`
- Order by query pattern: `ORDER BY (execution_time, transaction_id)` for time-range scans
- TTL: `TTL execution_time + INTERVAL 30 DAY`
- Use `LowCardinality(String)` for state, agent_id, route_id columns
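Putting those principles together, a transactions table might look like the following sketch; the column names are illustrative, not the actual schema (note the `toDateTime()` cast, needed when the TTL column is `DateTime64`):

```sql
CREATE TABLE transactions (
    execution_time  DateTime64(3),
    transaction_id  String,
    state           LowCardinality(String),
    agent_id        LowCardinality(String),
    route_id        LowCardinality(String),
    duration_ms     UInt64,
    search_text     String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(execution_time)
ORDER BY (execution_time, transaction_id)
TTL toDateTime(execution_time) + INTERVAL 30 DAY;
```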
### Full-Text Search

Two viable approaches:

**Option A: ClickHouse built-in full-text index (recommended for simplicity)**

- ClickHouse supports `tokenbf_v1` and `ngrambf_v1` bloom filter indexes
- Not as powerful as Elasticsearch/Lucene but avoids a separate system
- Good enough for "find transactions containing this string" queries
- Add a `search_text` column that concatenates searchable fields

**Option B: External search index (Elasticsearch/OpenSearch)**

- More powerful: fuzzy matching, relevance scoring, complex text analysis
- Additional infrastructure to manage
- Only justified if full-text search quality is a key differentiator

**Recommendation:** Start with ClickHouse bloom filter indexes. The query pattern described (incident-driven, searching by known strings like correlation IDs or error messages) does not require Lucene-level text analysis. If users need fuzzy/ranked search later, add an external index as a separate phase.
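As a sketch of Option A, the bloom filter index attaches to the `search_text` column like this; the table name and `tokenbf_v1` parameters (filter size in bytes, hash functions, seed) are illustrative and should be tuned against real data:

```sql
ALTER TABLE transactions
    ADD INDEX idx_search_text search_text TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- The index prunes granules that cannot contain the token;
-- ClickHouse then verifies exact matches on the surviving rows.
SELECT transaction_id, execution_time, state
FROM transactions
WHERE hasToken(search_text, 'ORD-12345')
  AND execution_time >= now() - INTERVAL 1 DAY;
```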
### Read Path

- Structured queries go directly to ClickHouse SQL.
- Full-text queries use the bloom filter index for pre-filtering, then exact match.
- Results are merged at the QueryEngine level.
- Pagination uses the cursor-based approach (Pattern 5).
## SSE Connection Management at Scale

### Connection Lifecycle

```
Agent connects --> authenticate JWT --> register SseEmitter --> mark LIVE
      |
      +-- heartbeat every 15s --> success: stays LIVE
      |                       --> failure: mark STALE, remove emitter
      |
      +-- agent reconnects    --> new SseEmitter replaces old one
      |
      +-- no reconnect within 5min --> mark DEAD
```
### Scaling Considerations

- 50 agents = 50 concurrent SSE connections. This is trivially handled by a single Spring Boot instance.
- At 500+ agents: consider sticky sessions behind a load balancer, or move to a pub/sub system (Redis Pub/Sub) for cross-instance coordination.
- Spring's SseEmitter uses Servlet async support, so idle connections do not pin request threads; container threads are only used while events are actually being written.
- With virtual threads (Java 21+), SSE connection overhead becomes negligible even at scale.
### Reconnection Protocol

- Agents should reconnect with the `Last-Event-ID` header.
- The server tracks the last event ID per agent.
- On reconnect, missed events (if any) are replayed from a small in-memory or persistent event log.
- For config push, replaying only the latest config on reconnect is sufficient, since applying config is idempotent.
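The replay step above only needs a small bounded log per agent with monotonically increasing event ids. A minimal sketch of such a log; the class name and capacity are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-agent replay log: keeps the last N events with monotonically
// increasing ids; on reconnect, events after the client's Last-Event-ID are replayed.
public final class ReplayLog {
    public record Event(long id, String payload) {}

    private final ArrayDeque<Event> buffer = new ArrayDeque<>();
    private final int capacity;

    public ReplayLog(int capacity) { this.capacity = capacity; }

    public synchronized void append(Event e) {
        if (buffer.size() == capacity) buffer.removeFirst(); // drop the oldest event
        buffer.addLast(e);
    }

    // Everything the agent missed since its Last-Event-ID header value.
    public synchronized List<Event> eventsAfter(long lastEventId) {
        List<Event> missed = new ArrayList<>();
        for (Event e : buffer) if (e.id() > lastEventId) missed.add(e);
        return missed;
    }
}
```

If the requested id has already been evicted from the bounded buffer, the server can fall back to the "replay latest config" path, since config application is idempotent.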
## REST API Organization

### Controller Structure

```
/api/v1/
  ingest/                      IngestController
    POST /transactions         -- batch ingest transactions from agents
    POST /activities           -- batch ingest activities

  transactions/                TransactionController
    GET  /                     -- search/list with filters
    GET  /{id}                 -- single transaction detail
    GET  /{id}/activities      -- activities within a transaction

  agents/                      AgentController
    GET  /                     -- list all agents with status
    GET  /{id}                 -- agent detail
    GET  /{id}/events          -- SSE stream (SseController)
    POST /register             -- bootstrap registration

  diagrams/                    DiagramController
    POST /                     -- store new diagram version
    GET  /{routeId}            -- latest diagram
    GET  /{routeId}/at         -- diagram at a specific timestamp
    GET  /{routeId}/rendered   -- rendered SVG/PNG

  config/                      ConfigController
    GET  /                     -- current config
    PUT  /                     -- update config (triggers SSE push)
    POST /commands             -- send ad-hoc command to agent(s)
```
### Response Conventions

- List endpoints return `Page<T>` with cursor-based pagination.
- All timestamps are ISO-8601 in UTC.
- Error responses follow RFC 7807 Problem Details.
- Use `@RestControllerAdvice` for global exception handling.
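The RFC 7807 convention above fixes the error body to a small set of named fields. A minimal stdlib sketch of that payload shape; in a real Spring 6 app the built-in `ProblemDetail` type plus `@RestControllerAdvice` would normally produce this, and the builder below is purely illustrative (no JSON escaping):

```java
// Hypothetical RFC 7807 payload builder -- illustration of the field shape only.
// Note: does no string escaping; a real implementation would use a JSON library.
public final class Problems {
    public static String problemJson(int status, String title, String detail, String instance) {
        return """
                {"type":"about:blank","title":"%s","status":%d,"detail":"%s","instance":"%s"}"""
                .formatted(title, status, detail, instance);
    }
}
```

Keeping every error in this one shape lets the frontend handle all failures with a single code path.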
## Scalability Considerations

| Concern | At 50 agents | At 500 agents | At 5,000 agents |
|---------|--------------|---------------|-----------------|
| **Ingestion throughput** | Single instance, in-memory buffer | Single instance, larger buffer | Multiple instances behind an LB, partitioned by agent |
| **SSE connections** | Single instance, ConcurrentHashMap | Sticky sessions + Redis Pub/Sub for cross-instance events | Dedicated SSE gateway service |
| **ClickHouse writes** | Single writer thread, batch every 1-2 s | Multiple writer threads, parallel batches | ClickHouse cluster with sharding |
| **Query latency** | Single ClickHouse node | Read replicas | Distributed ClickHouse cluster |
| **Diagram rendering** | Synchronous on request | Async pre-rendering on store | Worker pool with rendering queue |
## Suggested Build Order

Based on component dependencies:

```
Phase 1: Foundation
  Domain models (core)
  Repository interfaces (core)
  Basic Spring Boot wiring (app)

Phase 2: Ingestion Pipeline
  WriteBuffer (core)
  ClickHouse schema + connection (app)
  ClickHouseWriter (app)
  IngestController (app)
  --> Can receive and store transactions

Phase 3: Query Engine
  TransactionQuery model (core)
  QueryEngine (core)
  ClickHouse query implementation (app)
  TransactionController (app)
  --> Can search stored transactions

Phase 4: Agent Registry + SSE
  AgentRegistryService (core)
  SseChannelManager interface (core) + impl (app)
  AgentController + SseController (app)
  --> Agents can register and receive push events

Phase 5: Diagram Service
  DiagramService (core)
  DiagramRepository interface (core) + impl (app)
  DiagramRenderer (core/app)
  DiagramController (app)
  --> Versioned diagrams linked to transactions

Phase 6: Security
  JWT validation (core interface, app impl)
  Ed25519 config signing (core)
  Bootstrap token flow (app)
  Security filters (app)
  --> All endpoints secured

Phase 7: Full-Text Search
  TextIndexer (core interface, app impl)
  ClickHouse bloom filter index setup
  QueryEngine full-text integration
  --> Combined structured + text search

Phase 8: UI
  Static resources (app)
  Frontend consuming REST API
```
**Ordering rationale:**

- Storage before query (you need data to query)
- Ingestion before agents (agents need an endpoint to POST to)
- Query before full-text (structured search first; text search layers on top)
- Security can be added at any point, but is cleanest as a cross-cutting concern once the core flows work
- Diagrams are semi-independent but reference transactions, so they come after query
- UI is last because API-first means the API must be stable first
## Sources

- ClickHouse documentation on MergeTree engines, TTL, and bloom filter indexes (official docs, verified against training data)
- Spring Boot SseEmitter documentation (Spring Framework reference)
- Observability system architecture patterns from Jaeger, Zipkin, and SigNoz (well-established open-source projects)
- Content-addressable storage patterns from Git internals and Docker image layers
- Cursor-based pagination patterns from the Slack and Stripe API design guides
- Confidence: MEDIUM -- based on established patterns in training data, not live-verified against current documentation
194 .planning/research/FEATURES.md Normal file
@@ -0,0 +1,194 @@
# Feature Landscape

**Domain:** Transaction monitoring / observability for Apache Camel route executions
**Researched:** 2026-03-11
**Confidence:** MEDIUM (based on domain expertise from njams Server, Jaeger, Zipkin, Dynatrace; web search unavailable for latest feature sets)
## Table Stakes

Features users expect; missing them makes the product feel incomplete.
### Transaction Search and Filtering

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Search by time range | Every monitoring tool has this; the primary axis for incident investigation | Low | Date picker with presets (last 15m, 1h, 24h, 7d, custom) |
| Filter by transaction state | SUCCESS/ERROR/WARNING is the first thing ops checks | Low | Multi-select checkboxes, counts per state |
| Filter by duration | Finding slow transactions is a core use case | Low | Min/max duration inputs, or predefined buckets |
| Full-text search across payload/attributes | Users need to find "that one order ID" across millions of records | Medium | Requires a text index; match highlighting in results |
| Combined/compound filters | Users always combine: "errors in the last hour on instance X" | Medium | AND-composition of all filter criteria |
| Paginated result list | Cannot load millions of rows; must page or virtual-scroll | Low | Cursor-based pagination preferred over offset for large datasets |
| Sort by time, duration, state | Basic result ordering | Low | Default: newest first |
| Filter by agent/instance | "Show me only transactions from production-instance-3" | Low | Dropdown populated from the agent registry |
| Filter by route name | Users think in routes, not raw IDs | Low | Autocomplete from known route definitions |
| Save/bookmark search queries | Ops teams reuse the same searches during incidents | Medium | Named saved searches, shareable via URL |
### Transaction Detail and Drill-Down

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Transaction summary view | One glance: state, start time, duration, instance, route entry point | Low | Header card on the detail page |
| Activity list (per-route breakdown) | Hierarchical view of all route executions within a transaction | Medium | Tree or table showing each activity with timing |
| Activity timing waterfall | Visual timeline showing which routes executed when, and their overlap | Medium | Horizontal bar chart; critical for finding bottlenecks |
| Payload/attribute inspection | View message body, headers, properties at each activity step | Medium | Expandable sections; JSON/XML pretty-printing |
| Error detail with stack trace | When a transaction fails, users need the exception detail immediately | Low | Rendered stack trace with a copy button |
| Cross-instance correlation | Transaction spans instances A and B -- show the full chain | High | Requires correlation ID propagation; single unified view |
| Link to route diagram | From any activity, jump to the diagram showing the route definition | Low | Hyperlink; depends on diagram storage existing |
### Route Diagram Visualization

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Render route diagram from stored definition | The core differentiator vs generic tracing tools; users think in Camel routes | High | Server-side or client-side rendering from the graph model |
| Diagram versioning | The route changed last Tuesday -- show the diagram as it was when the transaction ran | Medium | Version stored per diagram; each transaction references a specific version |
| Zoom and pan | Diagrams can be large (50+ nodes); must be navigable | Medium | Standard canvas controls; a minimap helps for large diagrams |
| Execution overlay on diagram | Highlight the path the transaction actually took through the route | High | Color/annotate nodes with state (success/error) and timing |
| Node click for activity detail | Click a node in the diagram to see the activity data for that step | Medium | Links diagram nodes to activity records |
### Agent Management

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Agent list with status | See all connected agents and their lifecycle state (LIVE/STALE/DEAD) | Low | Table with status indicator; auto-refresh |
| Agent heartbeat monitoring | Detect when an agent goes silent | Low | Timestamp of last heartbeat; threshold-based state transitions |
| Agent detail view | Instance name, version, connected routes, uptime, config | Low | Detail page per agent |
| Agent registration/deregistration | New agents register via bootstrap token; dead agents get cleaned up | Medium | Registration endpoint; TTL-based cleanup |
### Authentication and Security

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| JWT-based API authentication | Secures the REST API; every enterprise monitoring tool requires auth | Medium | Token issuance, validation, refresh |
| Bootstrap token for agent registration | Agents need a way to register initially without pre-existing credentials | Low | Shared secret, single-use or time-limited |
| Ed25519 config signing | Agents must verify that config came from the server and was not tampered with | Medium | Key management, signature generation/verification |
### Dashboard and Overview

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Transaction volume chart (time series) | "How many transactions are we processing?" -- the first question on login | Medium | Bar or line chart, grouped by time bucket |
| Error rate chart | "Is something broken right now?" -- the second question | Medium | Error count or percentage over time |
| Active agents count | Quick health check of the agent fleet | Low | Simple counter with status breakdown |
| Recent errors list | Quick access to the latest failures without searching | Low | Pre-filtered list, auto-refreshing |
## Differentiators

Features that set the product apart from generic tracing tools. Not expected, but valued.
### Diagram-Centric Experience

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Route diagram as primary navigation | Instead of a trace waterfall, users navigate via the Camel route diagram -- this is how they think | High | The diagram becomes the entry point, not just a visualization |
| Execution heatmap on diagram | Color nodes by frequency/error rate over a time window -- shows hotspots | High | Aggregate stats per node; requires efficient querying |
| Side-by-side diagram comparison | Compare two diagram versions to see what changed in a route | Medium | Diff view highlighting added/removed/changed nodes |
| Diagram-based search | "Show me all failed transactions that passed through this node" | High | Click a node, get a filtered transaction list |
### Advanced Search and Analytics

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Statistical duration analysis | P50/P95/P99 duration for a route over time -- detect degradation trends | Medium | Requires ClickHouse aggregation queries |
| Transaction comparison | Side-by-side diff of two transactions through the same route | Medium | Useful for "why did this one fail but that one succeed?" |
| Search result aggregations | Faceted counts: N errors, N warnings, distribution by route, by instance | Medium | ClickHouse GROUP BY queries alongside search results |
| Correlation graph | Visual graph showing how transactions flow across instances | High | Network diagram; requires correlation data |
### Configuration Push

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Per-route tracing level control | Turn on detailed tracing for one problematic route without restarting the agent | Medium | SSE push of config change; the agent applies it dynamically |
| Bulk config push to agent groups | "Enable debug tracing on all production instances" | Medium | Agent tagging/grouping + batch SSE dispatch |
| Config history and rollback | See what config was active when; roll back a bad change | Medium | Versioned config storage with timestamps |
| Ad-hoc command dispatch | Send a "flush cache" or "reconnect" command to specific agents | Medium | Command/response pattern over SSE; command status tracking |
### Operational Intelligence

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Alerting on error rate thresholds | Notify when the error rate exceeds a threshold for a route | High | Threshold evaluation, notification channels (email, webhook) |
| Anomaly detection on duration | Alert when P95 duration spikes compared to baseline | High | Statistical baseline computation; deviation detection |
| Scheduled data export | Export transaction data as CSV/JSON for compliance or reporting | Medium | Job scheduler; file generation; download endpoint |
| Retention policy management | Configure per-route or per-instance retention periods | Medium | TTL management in ClickHouse; UI for policy CRUD |
## Anti-Features

Features to explicitly NOT build.

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|--------------------|
| General APM metrics (CPU, memory, GC) | Out of scope; Cameleer is transaction-focused, not an APM tool. Adding metrics creates scope creep and competes with Prometheus/Grafana, which do it better | Provide a link/integration point to external metrics tools if needed |
| Log aggregation/viewer | Transactions are not logs. Mixing them confuses the data model and competes with ELK/Loki | Store transaction payloads and attributes, not raw log lines |
| Custom dashboard builder | Enormous complexity for marginal value. Ops teams already have Grafana for custom dashboards | Provide good built-in dashboards; expose metrics via a Prometheus endpoint for Grafana |
| Multi-tenancy | Adds auth complexity, data isolation, and billing concerns. Single-tenant deployment is simpler and sufficient for the target audience | Deploy separate instances per environment/team |
| Mobile app | Ops teams use desktop browsers during incidents. Mobile adds huge UI complexity | Responsive web UI that works on tablets if needed |
| Plugin/extension system | Premature abstraction; adds an API stability burden before the core is stable | Build features directly; consider plugins much later if demand emerges |
| Real-time streaming transaction view | "Firehose" views of all transactions look impressive but are useless at scale (millions/day); users cannot process the stream | Provide auto-refreshing search results and a recent errors list |
| AI/ML-powered root cause analysis | Hype-driven feature with poor reliability; requires massive training data and domain-specific models | Provide good search, filtering, and comparison tools so humans can find root causes efficiently |
## Feature Dependencies

```
Agent Registration --> Agent List/Status
Agent Registration --> SSE Connection --> Config Push
Agent Registration --> SSE Connection --> Ad-hoc Commands

Transaction Ingestion --> Transaction Storage
Transaction Storage --> Transaction Search/Filtering
Transaction Search --> Transaction Detail View
Transaction Detail --> Activity Waterfall
Transaction Detail --> Payload Inspection
Transaction Detail --> Error Detail

Diagram Storage --> Diagram Rendering
Diagram Versioning --> Transaction-to-Diagram Linking
Diagram Rendering --> Execution Overlay (requires both diagram + activity data)
Diagram Rendering --> Execution Heatmap (requires aggregated activity data)
Diagram Rendering --> Diagram-based Search

Transaction Search --> Statistical Duration Analysis (aggregation of search results)
Transaction Search --> Search Result Aggregations

JWT Auth --> All REST API endpoints
Bootstrap Token --> Agent Registration
Ed25519 Signing --> Config Push

Transaction Volume Chart --> Transaction Storage (aggregation queries)
Error Rate Chart --> Transaction Storage (aggregation queries)
```
## MVP Recommendation

**Prioritize (Phase 1 -- Foundation):**

1. Transaction ingestion and storage -- nothing works without data flowing in
2. Agent registration and lifecycle -- must know who is sending data
3. Basic transaction search (time range, state, duration) -- the core value proposition
4. Transaction detail with activity breakdown -- users need to drill down

**Prioritize (Phase 2 -- Core Experience):**

5. Full-text search -- the "find that one transaction" use case
6. Route diagram rendering with version linking -- the Camel-specific differentiator
7. JWT authentication -- required before any production deployment
8. Dashboard overview (volume chart, error rate, agent status)

**Prioritize (Phase 3 -- Differentiation):**

9. Execution overlay on diagrams -- the killer feature that generic tools cannot offer
10. Config push via SSE -- operational value that justifies the agent-server architecture
11. Cross-instance correlation -- required for complex multi-instance Camel deployments

**Defer:**

- Alerting: defer until core search and dashboard are solid; alerting without good data is noise
- Data export: useful but not blocking; add when compliance demands arise
- Anomaly detection: requires baseline data that only accumulates over time
- Diagram-based search: powerful but depends on both diagram rendering and search being mature
- Execution heatmap: requires significant aggregation infrastructure
## Sources

- Domain knowledge of the njams Server (Integration Matters) feature set -- transaction monitoring for integration platforms, hierarchical transaction/activity model, route diagram visualization
- Jaeger UI and Zipkin UI -- distributed tracing search, trace detail waterfall views, service dependency graphs
- Dynatrace PurePath -- transaction-level drill-down, service flow visualization, statistical analysis
- Apache Camel route model -- EIP-based visual representation, route definition structure
- Project context from PROJECT.md and CLAUDE.md -- specific requirements, constraints, and architectural decisions

**Confidence note:** Feature categorization is based on training-data knowledge of these products. Web search was unavailable to verify the latest feature additions in 2025-2026 releases. The core feature landscape for this domain is mature and unlikely to have shifted dramatically, but specific UI patterns and newer differentiators may be missed. Confidence: MEDIUM.
322 .planning/research/PITFALLS.md Normal file
@@ -0,0 +1,322 @@
# Domain Pitfalls

**Domain:** Transaction monitoring / observability server (Cameleer3 Server)
**Researched:** 2026-03-11
**Confidence:** MEDIUM (based on established patterns for ClickHouse, SSE, high-volume ingestion; no web verification available)

---

## Critical Pitfalls

Mistakes that cause data loss, rewrites, or production outages.
### Pitfall 1: Inserting Rows One-at-a-Time into ClickHouse

**What goes wrong:** ClickHouse is a columnar OLAP engine optimized for bulk inserts. Sending one INSERT per incoming transaction (or per activity) creates a new data part per insert. ClickHouse merges parts in the background, but if parts accumulate faster than merges complete, you get "too many parts" errors and the table becomes read-only, or the server OOMs.

**Why it happens:** Developers coming from PostgreSQL/MySQL treat ClickHouse like an OLTP database. The agent sends a transaction, the server writes it immediately -- natural, but catastrophic at scale.

**Consequences:** At 50+ agents sending thousands of transactions per minute, row-by-row inserts produce hundreds of parts per second. ClickHouse will reject inserts within hours. Data loss follows.

**Warning signs:**

- The `system.parts` table shows thousands of active parts per partition
- ClickHouse logs show "too many parts" warnings
- Insert latency increases progressively over hours

**Prevention:**

- Buffer incoming transactions in memory (or a local queue) and flush in batches of 1,000-10,000 rows every 1-5 seconds
- Use ClickHouse's `Buffer` table engine as a safety net, but do not rely on it as the primary batching mechanism -- it has its own quirks (data visible before flush, lost on crash)
- Alternatively, write to a Kafka topic and use ClickHouse's Kafka engine for consumption (adds infrastructure but is the most robust pattern at high scale)
- Set `max_insert_block_size` and monitor `system.parts` in your health checks

**Phase relevance:** Must be correct from the very first storage implementation (Phase 1). Retrofitting batching into a synchronous write path is painful.

---
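The buffer-and-flush prevention above can be sketched as a small size-or-time batcher between the HTTP endpoint and the ClickHouse writer. A minimal sketch; the class name and thresholds are hypothetical, and the flush callback stands in for a single bulk INSERT:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical size-or-time write buffer: the HTTP layer calls add(); each
// flush hands a whole batch to one bulk INSERT instead of one INSERT per row.
public final class WriteBuffer<T> {
    private final List<T> pending = new ArrayList<>();
    private final int maxBatch;
    private final long maxAgeMillis;
    private final Consumer<List<T>> flusher;
    private long oldestMillis = -1;

    public WriteBuffer(int maxBatch, long maxAgeMillis, Consumer<List<T>> flusher) {
        this.maxBatch = maxBatch;
        this.maxAgeMillis = maxAgeMillis;
        this.flusher = flusher;
    }

    public synchronized void add(T row) {
        if (pending.isEmpty()) oldestMillis = System.currentTimeMillis();
        pending.add(row);
        if (pending.size() >= maxBatch) flush(); // size-triggered flush
    }

    // Called by a scheduled task every second or so (time-triggered flush).
    public synchronized void tick() {
        if (!pending.isEmpty() && System.currentTimeMillis() - oldestMillis >= maxAgeMillis) flush();
    }

    private void flush() {
        flusher.accept(new ArrayList<>(pending)); // one bulk INSERT per batch
        pending.clear();
    }
}
```

The time trigger bounds latency under light load; the size trigger bounds memory under heavy load.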
### Pitfall 2: Wrong ClickHouse Primary Key / ORDER BY Design

**What goes wrong:** ClickHouse does not have traditional indexes. The `ORDER BY` clause defines how data is physically sorted on disk, and this sorting IS the primary access optimization. Choosing the wrong ORDER BY makes your most common queries scan entire partitions.

**Why it happens:** Developers pick `ORDER BY (id)` by instinct (UUID primary key). But ClickHouse queries for this project will filter by time range, agent, state, and transaction attributes -- not by UUID.

**Consequences:** A query like "find all ERROR transactions in the last hour from agent X" does a full partition scan instead of reading a narrow range. At millions of rows per day with 30-day retention, this means scanning tens of millions of rows for simple queries.

**Warning signs:**

- `EXPLAIN` shows a large `rows_read` relative to the result set
- Queries that should take milliseconds take seconds
- CPU spikes on simple filtered queries

**Prevention:**

- Design the ORDER BY around your dominant query pattern: `ORDER BY (agent_id, status, toStartOfHour(execution_time), transaction_id)` or similar
- PARTITION BY month or day (e.g., `toYYYYMM(execution_time)`) to enable efficient TTL and partition dropping
- Put high-cardinality columns (like transaction_id) last in the ORDER BY
- Add `GRANULARITY`-based skip indexes (e.g., `INDEX idx_text content TYPE tokenbf_v1(...)`) for full-text-like searches
- Test with realistic data volumes before committing to a schema -- ClickHouse schema changes require table recreation or materialized views

**Phase relevance:** Must be designed correctly before any data is stored (Phase 1). Changing ORDER BY requires recreating the table and re-ingesting all data.

---
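The schema guidance above could take roughly the following shape. This is an unverified sketch, not the project's actual DDL: every column name, the bloom filter parameters, and the partitioning choice are assumptions to be validated against the ClickHouse documentation and realistic data volumes:

```
CREATE TABLE transactions (
    transaction_id String,
    agent_id       LowCardinality(String),
    status         LowCardinality(String),
    execution_time DateTime64(3, 'UTC'),
    duration_ms    UInt32,
    content        String,
    -- token bloom filter for "contains this exact token" pre-filtering
    INDEX idx_content content TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(execution_time)      -- daily partitions: cheap TTL drops
ORDER BY (agent_id, status, execution_time, transaction_id)
TTL toDateTime(execution_time) + INTERVAL 30 DAY;
```

Low-cardinality filter columns lead the ORDER BY; the high-cardinality `transaction_id` comes last, matching the prevention bullets above.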
### Pitfall 3: SSE Connection Leaks and Unbounded Memory Growth

**What goes wrong:** Each connected agent holds an open SSE connection. If the server does not detect dead connections, does not limit per-agent connections, and does not bound the per-connection event buffer, memory grows without bound. Agents that disconnect uncleanly (network failure, OOM kill) leave orphaned `SseEmitter` objects on the server.

**Why it happens:** Spring's `SseEmitter` does not automatically detect a dead TCP connection. The server happily buffers events for a dead connection until memory runs out. HTTP keep-alive and TCP timeouts are often far too long (minutes to hours).

**Consequences:** With 50+ agents, each potentially disconnecting and reconnecting multiple times per day, orphaned emitters accumulate. The server eventually OOMs or becomes unresponsive. Config pushes go to dead connections and are silently lost.

**Warning signs:**

- Heap usage grows steadily over days without a corresponding increase in agent count
- The `SseEmitter` count in metrics diverges from the known active agent count
- Config pushes succeed (no error) but agents never receive them

**Prevention:**

- Set the `SseEmitter` timeout explicitly (e.g., 60 seconds idle, with periodic heartbeat/ping events)
- Implement a server-side heartbeat: send a comment event (`: ping`) every 15-30 seconds. If the write fails, the connection is dead -- clean it up immediately
- Register `onCompletion`, `onTimeout`, and `onError` callbacks on every `SseEmitter` to remove it from the registry
- Limit to one SSE connection per agent instance (keyed by agent ID). If a new connection arrives for the same agent, close the old one
- Bound the outbound event queue per connection (drop the oldest events if the agent is too slow)
- Use Spring WebFlux `Flux<ServerSentEvent>` instead of `SseEmitter` if possible -- it integrates better with reactive backpressure and connection lifecycle

**Phase relevance:** Must be correct from the first SSE implementation (Phase 2, or whenever SSE is introduced). Connection leaks are silent and cumulative.

---
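The one-connection-per-agent and heartbeat-cleanup rules above can be sketched with the emitter abstracted behind an interface, which keeps the cleanup logic testable without Spring. A sketch under those assumptions; `SseRegistry` and `Connection` are hypothetical names, and a real implementation would wrap `SseEmitter` and also hook `onCompletion`/`onTimeout`/`onError`:

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: one connection per agent; a failed heartbeat write
// means the TCP connection is dead, so the entry is removed immediately.
public final class SseRegistry {
    public interface Connection {
        void send(String event) throws IOException;
        void close();
    }

    private final Map<String, Connection> byAgent = new ConcurrentHashMap<>();

    // A new connection for the same agent replaces (and closes) the old one.
    public void register(String agentId, Connection c) {
        Connection old = byAgent.put(agentId, c);
        if (old != null) old.close();
    }

    // Periodic heartbeat: write failure == dead connection == cleanup now.
    public void heartbeat() {
        byAgent.forEach((agentId, c) -> {
            try {
                c.send(": ping");
            } catch (IOException dead) {
                byAgent.remove(agentId, c);
                c.close();
            }
        });
    }

    public int activeConnections() { return byAgent.size(); }
}
```

Because status is detected by an actual write, this catches disconnects that neither the Servlet container nor TCP keep-alive would report for minutes.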
### Pitfall 4: No Backpressure on the Ingestion Endpoint

**What goes wrong:** The HTTP POST endpoint that receives transaction data from agents accepts requests without bound. Under burst load (an agent reconnection storm, batch replay), the server runs out of memory buffering writes, or overwhelms ClickHouse with insert pressure.

**Why it happens:** The default Spring Boot behavior is to accept all incoming requests. Without explicit rate limiting or queue depth control, the server cannot signal agents to slow down.

**Consequences:** The server OOMs during agent reconnection storms (all 50+ agents replay buffered data simultaneously). Or ClickHouse falls behind on merges, enters the "too many parts" state, and rejects writes -- causing data loss.

**Warning signs:**

- Memory spikes correlated with agent reconnect events
- HTTP 503s during burst periods
- The ClickHouse merge queue grows faster than it drains

**Prevention:**

- Implement a bounded in-memory queue (e.g., `ArrayBlockingQueue` or a Disruptor ring buffer) between the HTTP endpoint and the ClickHouse writer
- Return HTTP 429 (Too Many Requests) with a `Retry-After` header when the queue is full -- agents should implement exponential backoff
- Size the queue for the expected burst duration (e.g., 30 seconds of peak throughput)
- Monitor queue depth as a key metric
- Consider writing to local disk (an append-only log) as overflow when the queue is full, then draining asynchronously

**Phase relevance:** Should be designed into the ingestion layer from the start (Phase 1). Retrofitting backpressure requires changing both server and agent behavior.

---
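The bounded-queue prevention above hinges on one detail: the enqueue must never block, so a full queue immediately translates into a 429 for the agent. A minimal sketch; the class name and the string-batch payload type are hypothetical:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded ingest queue. offer() never blocks: when the queue is
// full, the HTTP layer answers 429 Too Many Requests (with Retry-After), and
// the agent backs off exponentially instead of piling up requests.
public final class IngestQueue {
    public static final int HTTP_ACCEPTED = 202;
    public static final int HTTP_TOO_MANY_REQUESTS = 429;

    private final BlockingQueue<List<String>> queue;

    public IngestQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns the HTTP status the controller should respond with.
    public int submit(List<String> batch) {
        return queue.offer(batch) ? HTTP_ACCEPTED : HTTP_TOO_MANY_REQUESTS;
    }
}
```

A separate writer thread drains the queue into the batching layer; the queue capacity is what bounds server memory during a reconnection storm.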
### Pitfall 5: Storing Full Transaction Payloads in ClickHouse for Full-Text Search

**What goes wrong:** Developers store large text fields (message bodies, stack traces, XML/JSON payloads) directly in ClickHouse columns and try to search them with `LIKE '%term%'` or `hasToken()`. ClickHouse is not a text search engine. These queries scan every row in the partition and are extremely slow at scale.

**Why it happens:** The requirement says "full-text search." ClickHouse can technically do string matching. So developers avoid adding a second storage system.

**Consequences:** Full-text queries on 30 days of data (hundreds of millions of rows) take 30+ seconds or time out entirely. Users cannot find transactions by content, which is a core value proposition.

**Warning signs:**

- Full-text queries take >5 seconds even on recent data
- ClickHouse CPU pegged at 100% during text searches
- Users avoid the search feature because it is too slow

**Prevention:**

- Use a dedicated text search index alongside ClickHouse. Options:
  - **OpenSearch/Elasticsearch:** Battle-tested for log/observability search. Index the searchable text fields (message content, stack traces) with the transaction ID as a foreign key. Query OpenSearch for matching transaction IDs, then fetch details from ClickHouse.
  - **ClickHouse `tokenbf_v1` or `ngrambf_v1` skip indexes:** Viable for token-based search on specific columns if the search vocabulary is limited. Not a replacement for real full-text search but can handle "find transactions containing this exact correlation ID" well.
  - **Tantivy/Lucene sidecar:** If you want to avoid a full OpenSearch cluster, embed a Lucene-based index in the server process. Higher coupling but lower infrastructure cost.
- For MVP, ClickHouse token bloom filter indexes may suffice for exact-token searches. Plan the architecture to swap in OpenSearch later without changing the query API.

**Phase relevance:** Architecture decision needed in Phase 1 (storage design). Implementation can be phased -- start with ClickHouse skip indexes, add OpenSearch when query patterns demand it.

---
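For the MVP path, a token bloom filter index is declared directly in DDL. The table and column names below are illustrative, not the project's actual schema:

```sql
-- Illustrative schema: exact-token search support for correlation IDs and error codes
ALTER TABLE transactions
    ADD INDEX idx_payload_tokens payload TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- Queries must use token matching for the index to apply
SELECT transaction_id
FROM transactions
WHERE hasToken(payload, 'ORDER-12345')
  AND execution_time >= now() - INTERVAL 1 DAY;
```

Note that `ADD INDEX` only applies to newly inserted parts; run `ALTER TABLE ... MATERIALIZE INDEX` to build it for existing data.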

### Pitfall 6: Losing Data During Server Restart or Crash

**What goes wrong:** If the server buffers transactions in memory before batch-flushing to ClickHouse (as recommended in Pitfall 1), a server crash or restart loses all buffered data.

**Why it happens:** In-memory buffering is the obvious first implementation. Nobody thinks about crash recovery until data is lost.

**Consequences:** Every server restart during deployment loses 1-5 seconds of transaction data. In a crash scenario, potentially more.

**Warning signs:**

- Missing transactions in ClickHouse around server restart timestamps
- Agents report successful POSTs but transactions are absent from storage

**Prevention:**

- Accept that some data loss on crash is tolerable for an observability system (this is not a financial ledger). Document the guarantee: "at-most-once delivery with bounded loss window of N seconds"
- Implement graceful shutdown: on SIGTERM, flush the current buffer before stopping (`@PreDestroy` or `SmartLifecycle` with ordered shutdown)
- For zero data loss: write to a Write-Ahead Log (local append file) before acknowledging the HTTP POST, then batch from the WAL to ClickHouse. This adds complexity -- only do it if the data loss window from in-memory buffering is unacceptable
- Size the flush interval to minimize the loss window (1 second flush = max 1 second of data lost)

**Phase relevance:** Graceful shutdown should be in Phase 1. WAL-based durability is a later optimization if needed.

---
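A minimal, framework-free sketch of the flush-on-shutdown idea (in the real server this would hang off `@PreDestroy` or `SmartLifecycle`); the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded in-memory buffer with a final flush on shutdown (illustrative). */
public final class TransactionBuffer {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000);

    /** Non-blocking enqueue; returns false when the buffer is full. */
    public boolean offer(String transactionJson) {
        return buffer.offer(transactionJson);
    }

    /** Drains everything still buffered; call from @PreDestroy / SmartLifecycle.stop(). */
    public List<String> flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        // a real implementation would batch-insert into ClickHouse here
        return batch;
    }
}
```

Wiring `Runtime.getRuntime().addShutdownHook(new Thread(buffer::flush))` covers plain JVM shutdown; Spring's lifecycle callbacks give the same guarantee with ordered shutdown.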

## Moderate Pitfalls

### Pitfall 7: Timezone and Instant Handling Inconsistency

**What goes wrong:** Transaction timestamps arrive from agents in various formats or timezones. The server stores them inconsistently, leading to queries that miss transactions or return wrong time ranges. ClickHouse's `DateTime` type is timezone-aware but defaults to server timezone if not specified.

**Prevention:**

- Mandate UTC everywhere: agents send `Instant` (epoch millis or ISO-8601 with Z), server stores as ClickHouse `DateTime64(3, 'UTC')`, UI converts to local timezone for display only
- Use Jackson's `JavaTimeModule` (already noted in CLAUDE.md) and ensure `WRITE_DATES_AS_TIMESTAMPS` is disabled so Instant serializes as ISO-8601
- ClickHouse: always use `DateTime64(3, 'UTC')` not bare `DateTime`
- Add a server-received timestamp alongside the agent-reported timestamp so you can detect clock skew

**Phase relevance:** Must be correct from first data model design (Phase 1).

---
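A sketch of the normalization rule, assuming agents send either epoch millis or ISO-8601 with `Z` (the class and method names are illustrative):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Normalizes agent timestamps to UTC Instants and renders them for ClickHouse. */
public final class Timestamps {

    /** Accepts epoch millis ("1718000000000") or ISO-8601 with Z ("2026-03-11T10:15:30Z"). */
    public static Instant parse(String raw) {
        if (raw.chars().allMatch(Character::isDigit)) {
            return Instant.ofEpochMilli(Long.parseLong(raw));
        }
        return Instant.parse(raw);
    }

    // Render for a ClickHouse DateTime64(3, 'UTC') column, e.g. "2026-03-11 10:15:30.000"
    private static final DateTimeFormatter CH_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    public static String toClickHouse(Instant instant) {
        return CH_FORMAT.format(instant);
    }
}
```

The key property: whatever the wire form, the value is an `Instant` by the time it reaches the data layer, and the zone only reappears in the UI.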

### Pitfall 8: Correlation ID Design That Cannot Span Instances

**What goes wrong:** Transactions that span multiple Camel instances (route A on instance 1 calls route B on instance 2) need a shared correlation ID. If the correlation ID is generated per-instance or per-route, you cannot reconstruct the full transaction path.

**Prevention:**

- Use a single correlation ID (propagated via message headers) that is generated at the entry point and carried through all downstream calls
- Store both `transactionId` (the correlation ID spanning instances) and `activityId` (unique per route execution) as separate fields
- Ensure the agent propagates the correlation ID through Camel exchange properties and any external endpoint calls (HTTP headers, JMS properties, etc.)
- Index `transactionId` in ClickHouse ORDER BY so correlation lookups are fast
- This is primarily an agent-side concern, but the server schema must support it

**Phase relevance:** Data model design (Phase 1). Agent protocol must define correlation ID propagation.

---
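The two-ID split can be sketched as follows; the header name and record shape are assumptions for illustration, not the agreed protocol:

```java
import java.util.Map;
import java.util.UUID;

/** One correlation ID per transaction, one activity ID per route execution (illustrative). */
public record ActivityRecord(String transactionId, String activityId) {

    /** At the entry point: reuse an upstream correlation ID if present, else mint one. */
    public static String correlationIdFrom(Map<String, String> headers) {
        return headers.getOrDefault("X-Correlation-Id", UUID.randomUUID().toString());
    }

    /** Every route execution gets its own activity ID but shares the transaction ID. */
    public static ActivityRecord newActivity(String transactionId) {
        return new ActivityRecord(transactionId, UUID.randomUUID().toString());
    }
}
```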

### Pitfall 9: N+1 Queries When Loading Transaction Details

**What goes wrong:** A transaction detail view needs: the transaction record, all activities within it, the route diagram for each activity, and possibly the message content. If each is a separate query, a transaction with 20 activities generates 40+ queries.

**Prevention:**

- Design the API to return a fully hydrated transaction in one call: transaction + activities in a single ClickHouse query (they share the same `transactionId`, and if ORDER BY is designed correctly, they are physically co-located)
- Cache route diagrams aggressively (they are versioned and immutable once stored) -- a transaction with 20 activities likely references only 2-3 distinct diagrams
- For list views (search results), return summary data only (no activities, no content). Load details on demand via a separate detail endpoint
- Consider storing the diagram version hash with each activity so the detail endpoint can batch-fetch unique diagrams

**Phase relevance:** API design (Phase 2). Must be considered during data model design (Phase 1).

---
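With `transactionId` in the ORDER BY, the hydrated detail view reduces to a single query; table and column names are illustrative:

```sql
-- One round trip: all activities of a transaction, physically co-located on disk.
-- {tid:String} is ClickHouse's server-side query parameter syntax.
SELECT *
FROM activities
WHERE transaction_id = {tid:String}
ORDER BY execution_time;
```

The detail endpoint then batch-fetches the 2-3 distinct diagram hashes referenced by the result, most of which should already be cached.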

### Pitfall 10: SSE Reconnection Without Last-Event-ID

**What goes wrong:** When an agent's SSE connection drops and reconnects, it misses all events sent during the disconnection. Without `Last-Event-ID` support, the agent has no way to request missed events, so configuration changes are silently lost.

**Prevention:**

- Assign a monotonically increasing ID to every SSE event
- On reconnection, the agent sends `Last-Event-ID` header. The server replays events since that ID
- Keep a bounded event log (last N events or last T minutes) for replay. Events older than the replay window trigger a full state sync instead
- For config push specifically: make config idempotent and include a version number. On reconnection, always send the current full config state rather than relying on event replay. This is simpler and more robust than event sourcing for config

**Phase relevance:** SSE implementation (Phase 2). The "full config sync on reconnect" pattern should be the default from day one.

---
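A sketch of the bounded replay log (names are illustrative); when the requested ID has already fallen out of the window, the caller falls back to a full config sync:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Bounded SSE event log supporting Last-Event-ID replay (illustrative). */
public final class EventLog {
    public record Event(long id, String data) {}

    private final Deque<Event> log = new ArrayDeque<>();
    private final int capacity;
    private long nextId = 1;

    public EventLog(int capacity) {
        this.capacity = capacity;
    }

    /** Appends an event with a monotonically increasing ID, evicting the oldest if full. */
    public synchronized Event append(String data) {
        Event e = new Event(nextId++, data);
        log.addLast(e);
        if (log.size() > capacity) {
            log.removeFirst();
        }
        return e;
    }

    /** Events after lastEventId, or empty if the ID fell out of the window (=> full sync). */
    public synchronized Optional<List<Event>> replayAfter(long lastEventId) {
        if (!log.isEmpty() && lastEventId < log.peekFirst().id() - 1) {
            return Optional.empty(); // too old: caller must push full config state instead
        }
        return Optional.of(log.stream().filter(e -> e.id() > lastEventId).toList());
    }
}
```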

### Pitfall 11: ClickHouse TTL That Fragments Partitions

**What goes wrong:** ClickHouse row-level TTL removes expired rows by rewriting data parts during background merges. At high data volumes with daily TTL expiration, this creates continuous background merge pressure and degrades query performance.

**Prevention:**

- PARTITION BY `toYYYYMMDD(execution_time)` (daily partitions) and use `ALTER TABLE DROP PARTITION` via a scheduled job instead of row-level TTL
- Dropping a partition is an instant metadata operation -- no data scanning, no merge pressure
- A simple daily cron (or Spring `@Scheduled`) that drops partitions older than 30 days is more predictable than TTL
- If you use TTL, set `ttl_only_drop_parts = 1` in the table settings so ClickHouse drops entire parts rather than rewriting them with rows removed (available in recent ClickHouse versions)

**Phase relevance:** Storage design (Phase 1). Must be decided before data accumulates.

---
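Both options in DDL form, with an illustrative table name and partition key:

```sql
-- Option A: a scheduled job drops whole partitions (instant metadata operation)
ALTER TABLE transactions DROP PARTITION '20260209';

-- Option B: if row-level TTL stays, at least drop whole parts instead of rewriting them
ALTER TABLE transactions MODIFY SETTING ttl_only_drop_parts = 1;
```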

### Pitfall 12: JWT Token Management Without Rotation

**What goes wrong:** JWT tokens are issued with no expiration or with very long expiration. If a token is compromised, there is no way to revoke it. Alternatively, tokens expire too quickly and agents disconnect/reconnect constantly.

**Prevention:**

- Use short-lived access tokens (15-60 minutes) with a refresh token mechanism
- For agent authentication specifically: the bootstrap token is used once to register and obtain a long-lived agent credential. The agent credential is used to obtain short-lived JWTs
- Maintain a server-side token denylist (or use token versioning per agent) so compromised tokens can be revoked
- Ed25519 signing for config push is separate from JWT auth -- do not conflate the two. Ed25519 ensures config integrity (agent verifies server signature). JWT ensures identity (server verifies agent identity)
- Store agent public keys server-side so you can revoke individual agents

**Phase relevance:** Security implementation (later phase). Design the token lifecycle model early even if implementation comes later.

---

### Pitfall 13: Schema Evolution Without Migration Strategy

**What goes wrong:** The agent protocol is "still evolving" (per project constraints). When the data model changes, existing data in ClickHouse becomes incompatible. ClickHouse cannot change an existing table's ORDER BY key (`MODIFY ORDER BY` can only append newly added columns to it), and column type changes are limited.

**Prevention:**

- Version your schema explicitly (e.g., `schema_version` column or table naming convention)
- For additive changes (new nullable columns): `ALTER TABLE ADD COLUMN` works fine in ClickHouse
- For breaking changes (ORDER BY change, column type change): create a new table with the new schema and use a materialized view to transform data from old tables, or accept that old data stays in old format and queries span both tables
- Design the ingestion layer to normalize incoming data to the current schema version, handling backward compatibility with older agents
- Include a `protocol_version` field in agent registration so the server knows what format to expect

**Phase relevance:** Must be considered in Phase 1 data model design. The migration strategy becomes critical as soon as you have production data.

---
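An additive change in DDL form (illustrative names); this is the only category of schema change that is cheap in place:

```sql
-- Additive migration: safe on a populated table, no data rewrite
ALTER TABLE transactions
    ADD COLUMN IF NOT EXISTS protocol_version LowCardinality(String) DEFAULT 'v1';
```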

## Minor Pitfalls

### Pitfall 14: Overloading Agents via SSE Event Storms

**What goes wrong:** A bulk config change pushes 50 events in rapid succession to all agents. Agents process events synchronously and stall their main Camel routes while handling config updates.

**Prevention:**

- Batch config changes into a single SSE event containing the full updated config
- Rate-limit SSE event emission (no more than 1 config event per second per agent)
- Agents should process SSE events asynchronously on a separate thread from the Camel context

**Phase relevance:** SSE implementation (Phase 2).

---

### Pitfall 15: Not Monitoring the Monitoring System

**What goes wrong:** The observability server itself has no observability. When it degrades, nobody knows until agents report failures or users complain about missing data.

**Prevention:**

- Expose Prometheus/Micrometer metrics for: ingestion rate, batch flush latency, ClickHouse insert latency, SSE active connections, queue depth, error rates
- Add a `/health` endpoint that checks ClickHouse connectivity and queue depth
- Alert on: ingestion rate dropping below expected baseline, queue depth exceeding threshold, ClickHouse insert errors

**Phase relevance:** Should be layered in alongside each component as it is built, not deferred to a "monitoring phase."

---

### Pitfall 16: Route Diagram Versioning Without Content Hashing

**What goes wrong:** Storing a new diagram version every time an agent reports, even if the diagram has not changed. With 50 agents reporting the same routes, you get 50 copies of identical diagrams.

**Prevention:**

- Content-hash each diagram definition (SHA-256 of the normalized diagram content)
- Store diagrams keyed by content hash. If the hash already exists, skip the insert
- Link activities to diagrams via the content hash, not a sequential version number
- This deduplicates across agents running the same routes and across deployments where routes did not change

**Phase relevance:** Diagram storage design (Phase 2 or whenever diagrams are implemented).

---
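A sketch of the content hash; the normalization step here is a bare `strip()`, whereas a real implementation would also normalize key order and whitespace inside the diagram JSON:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Content-addressed key for deduplicated diagram storage (illustrative). */
public final class DiagramHash {

    /** SHA-256 over normalized content; identical diagrams always map to the same key. */
    public static String contentHash(String diagramContent) {
        try {
            String normalized = diagramContent.strip(); // placeholder for real normalization
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e); // cannot happen on JDK 17
        }
    }
}
```

Insert logic then becomes "if key exists, skip" -- the same hash arriving from 50 agents stores one row.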

## Phase-Specific Warnings

| Phase Topic | Likely Pitfall | Mitigation |
|-------------|---------------|------------|
| Storage / ClickHouse setup | Row-by-row inserts (Pitfall 1), wrong ORDER BY (Pitfall 2), TTL fragmentation (Pitfall 11) | Design batch ingestion and ORDER BY before writing any code. Prototype with realistic volume. |
| Ingestion endpoint | No backpressure (Pitfall 4), crash data loss (Pitfall 6) | Bounded queue + graceful shutdown from day one. |
| Full-text search | ClickHouse as text engine (Pitfall 5) | Start with skip indexes, design API to allow backend swap. |
| SSE implementation | Connection leaks (Pitfall 3), no reconnection handling (Pitfall 10), event storms (Pitfall 14) | Heartbeat + timeout + one-connection-per-agent from first implementation. |
| Data model | Timezone inconsistency (Pitfall 7), correlation ID design (Pitfall 8), schema evolution (Pitfall 13) | UTC everywhere, correlation ID in protocol spec, versioned schema. |
| Security | Token management (Pitfall 12) | Design token lifecycle early, implement in security phase. |
| API design | N+1 queries (Pitfall 9) | Co-locate activities with transactions in storage, cache diagrams. |
| Operations | No self-monitoring (Pitfall 15) | Add metrics alongside each component, not as a separate phase. |

---

## Sources

- ClickHouse documentation on MergeTree engine, partitioning, and TTL (training data, MEDIUM confidence)
- Spring Framework SSE / SseEmitter documentation (training data, MEDIUM confidence)
- Production experience patterns from observability platforms (Jaeger, Zipkin, Grafana Tempo architecture docs) (training data, MEDIUM confidence)
- General distributed systems ingestion patterns (training data, MEDIUM confidence)

**Note:** WebSearch was unavailable during this research session. All findings are based on training data (cutoff May 2025). Confidence is MEDIUM across the board -- the patterns are well-established but specific version details (e.g., ClickHouse TTL settings, Spring Boot 3.4.3 SSE behavior) should be verified against current documentation during implementation.
271
.planning/research/STACK.md
Normal file
# Technology Stack

**Project:** Cameleer3 Server
**Researched:** 2026-03-11
**Overall confidence:** MEDIUM (no live source verification available; versions based on training data up to May 2025)

## Recommended Stack

### Core Framework (Already Decided)

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Java | 17+ | Runtime | Already established; LTS, well-supported | HIGH |
| Spring Boot | 3.4.3 | Application framework | Already in POM; provides web, security, configuration | HIGH |
| Maven | 3.9+ | Build system | Already established; multi-module project | HIGH |
### Primary Data Store: ClickHouse

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| ClickHouse | 24.x+ | Transaction/activity storage | Column-oriented, built for billions of rows, native TTL, excellent time-range queries, MergeTree engine handles millions of inserts/day trivially | MEDIUM |
| clickhouse-java (HTTP) | 0.6.x+ | Java client | Official ClickHouse Java client; HTTP transport is simpler and more reliable than native TCP for Spring Boot apps | MEDIUM |

**Why ClickHouse over alternatives:**

- **vs Elasticsearch/OpenSearch:** ClickHouse is 5-10x more storage-efficient for structured columnar data. For time-series-like transaction data with known schema, ClickHouse drastically outperforms ES on aggregation queries (avg duration, count by state, time bucketing). ES is overkill when you don't need its inverted index for *every* field.
- **vs TimescaleDB:** TimescaleDB is PostgreSQL-based and good for moderate scale, but ClickHouse handles the "millions of inserts per day" tier with less operational overhead. TimescaleDB's row-oriented heritage means a larger storage footprint for wide transaction records. ClickHouse's columnar compression achieves 10-20x compression on typical observability data.
- **vs PostgreSQL (plain):** PostgreSQL cannot efficiently handle this insert volume with 30-day retention and fast analytical queries. Partitioning and vacuuming become operational nightmares at this scale.

**ClickHouse key features for this project:**

- **TTL on tables:** `TTL executionDate + INTERVAL 30 DAY` — automatic 30-day retention with zero application code
- **MergeTree engine:** Handles high insert throughput; batch inserts of 10K+ rows are trivial
- **Materialized views:** Pre-aggregate common queries (transactions by state per hour, etc.)
- **Low storage cost:** 10-20x compression means 30 days of millions of transactions fits in modest disk
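These features combine into a single table definition. The column names below are illustrative, and the ORDER BY shown is one plausible choice, not a settled design:

```sql
CREATE TABLE transactions
(
    transaction_id String,
    state          LowCardinality(String),
    execution_time DateTime64(3, 'UTC'),
    duration_ms    UInt32,
    payload        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(execution_time)          -- daily partitions: retention = DROP PARTITION
ORDER BY (state, execution_time, transaction_id) -- serves the common filter patterns
TTL toDateTime(execution_time) + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;                -- expire whole parts, no row rewrites
```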
### Full-Text Search: OpenSearch

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| OpenSearch | 2.x | Full-text search over payloads, metadata, attributes | True inverted index for arbitrary text search; ClickHouse's full-text is rudimentary | MEDIUM |
| opensearch-java | 2.x | Java client | Official OpenSearch Java client; works well with Spring Boot | MEDIUM |

**Why a separate search engine instead of ClickHouse alone:**

ClickHouse has token-level bloom filter indexes and `hasToken()`/`LIKE` matching, but these are not true full-text search. For the requirement "search by any content in payloads, metadata, and attributes," you need an inverted index with:

- Tokenization and analysis (stemming, case folding)
- Relevance scoring
- Phrase matching
- Highlighting of matched terms in results

**Why OpenSearch over Elasticsearch:**

- Apache 2.0 licensed (no SSPL concerns for self-hosted deployment)
- API-compatible with Elasticsearch 7.x
- Active development, large community
- OpenSearch Dashboards available if needed later
- No licensing ambiguity for Docker deployment

**Dual-store pattern:**

- ClickHouse = source of truth for structured queries (time range, state, duration, aggregations)
- OpenSearch = search index for full-text queries
- Application writes to both; OpenSearch indexed asynchronously from an internal queue
- Structured filters (time, state) applied in ClickHouse; full-text queries in OpenSearch return transaction IDs, then ClickHouse fetches full records
### Caching Layer: Caffeine + Redis (phased)

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Caffeine | 3.1.x | In-process cache for agent registry, diagram versions, hot config | Fastest JVM cache; zero network overhead; perfect for single-instance start | MEDIUM |
| Spring Cache (`@Cacheable`) | (Spring Boot) | Cache abstraction | Switch cache backends without code changes | HIGH |
| Redis | 7.x | Distributed cache (Phase 2+, when horizontal scaling) | Shared state across multiple server instances; SSE session coordination | MEDIUM |

**Phased approach:**

1. **Phase 1:** Caffeine only. Single server instance. Agent registry, diagram cache, recent query results all in-process.
2. **Phase 2 (horizontal scaling):** Add Redis for shared state. Agent registry must be consistent across instances. SSE sessions need coordination.
### Message Ingestion: Internal Buffer with Backpressure

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| LMAX Disruptor | 4.0.x | High-performance ring buffer for ingestion | Lock-free, single-writer principle, handles burst traffic without blocking HTTP threads | MEDIUM |
| *Alternative:* `java.util.concurrent.LinkedBlockingQueue` | (JDK) | Simpler bounded queue | Good enough for initial implementation; switch to Disruptor if profiling shows contention | HIGH |

**Why an internal buffer, not Kafka:**

Kafka is the standard answer for "high-volume ingestion," but it adds massive operational complexity for a system that:

- Has a single data producer type (Cameleer agents via HTTP POST)
- Does not need replay from an external topic
- Does not need multi-consumer fan-out
- Is already receiving data via HTTP (not streaming)

The right pattern here: **HTTP POST -> bounded in-memory queue -> batch writer to ClickHouse + async indexer to OpenSearch**. If the queue fills up, return HTTP 503 with `Retry-After` header — agents should implement exponential backoff.

**When to add Kafka:** Only if you need cross-datacenter replication, multi-consumer processing, or guaranteed exactly-once delivery beyond what the internal buffer provides. This is a "maybe Phase 3+" decision.
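The admit-or-reject decision at the front of that pipeline can be sketched like this (names are made up; a real controller would also set the `Retry-After` header on the 503):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded queue in front of the batch writer; full queue => reject, never block. */
public final class IngestGate {
    public static final int HTTP_ACCEPTED = 202;
    public static final int HTTP_SERVICE_UNAVAILABLE = 503;

    private final BlockingQueue<String> queue;

    public IngestGate(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking admit: the HTTP thread never waits on the batch writer. */
    public int admit(String transactionJson) {
        return queue.offer(transactionJson) ? HTTP_ACCEPTED : HTTP_SERVICE_UNAVAILABLE;
    }
}
```

The batch writer drains this queue on a fixed interval or size threshold; the 503 plus agent-side exponential backoff is the backpressure loop.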
### API Documentation: springdoc-openapi

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| springdoc-openapi-starter-webmvc-ui | 2.x | OpenAPI 3.1 spec generation + Swagger UI | De facto standard for Spring Boot 3.x API docs; annotation-driven, zero-config for basic setup | MEDIUM |

**Why springdoc over alternatives:**

- **vs SpringFox:** SpringFox is effectively dead; no Spring Boot 3 support
- **vs manual OpenAPI:** Too much maintenance overhead; springdoc generates from code
- springdoc supports Spring Boot 3.x natively, including Spring Security integration
### Security

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Spring Security | (Spring Boot 3.4.3) | Authentication/authorization framework | Already part of Spring Boot; JWT filter chain, method security | HIGH |
| java-jwt (Auth0) | 4.x | JWT creation and validation | Lightweight, well-maintained; simpler than Nimbus for this use case | MEDIUM |
| Ed25519 (JDK `java.security`) | (JDK 17) | Config signing | JDK 15+ has native EdDSA support; no external library needed | HIGH |
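The JDK-native Ed25519 path needs no dependency at all; a sketch of sign-then-verify as the server and agent would each do it (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

/** Config signing with the JDK's built-in EdDSA provider (illustrative round trip). */
public final class ConfigSigner {

    /** Signs a config blob and immediately verifies it, as server and agent would. */
    public static boolean signAndVerify(byte[] config) {
        try {
            KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();

            // Server side: sign the config blob before pushing it over SSE
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(kp.getPrivate());
            signer.update(config);
            byte[] sig = signer.sign();

            // Agent side: verify with the server's public key before applying
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(kp.getPublic());
            verifier.update(config);
            return verifier.verify(sig);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signAndVerify("{\"tracing\":true}".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In the real system the key pair is generated once and the public key is distributed to agents, not regenerated per call.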
### Testing

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| JUnit 5 | (Spring Boot) | Unit/integration testing | Already in POM; standard | HIGH |
| Testcontainers | 1.19.x+ | Integration tests with ClickHouse and OpenSearch | Spin up real databases in Docker for tests; no mocking storage layer | MEDIUM |
| Spring Boot Test | (Spring Boot) | Controller/integration testing | `@SpringBootTest`, `MockMvc`, etc. | HIGH |
| Awaitility | 4.2.x | Async testing (SSE, queue processing) | Clean API for testing eventually-consistent behavior | MEDIUM |
### Containerization

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Docker | - | Container runtime | Required per project constraints | HIGH |
| Docker Compose | - | Local dev + simple deployment | Single command to run server + ClickHouse + OpenSearch + Redis | HIGH |
| Eclipse Temurin JDK 17 | - | Base image | Official OpenJDK distribution; `eclipse-temurin:17-jre-alpine` for small image | HIGH |
### Monitoring (Server Self-Observability)

| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Micrometer | (Spring Boot) | Metrics facade | Built into Spring Boot; exposes ingestion rates, queue depth, query latencies | HIGH |
| Spring Boot Actuator | (Spring Boot) | Health checks, metrics endpoint | `/actuator/health` for Docker health checks, `/actuator/prometheus` for metrics | HIGH |
## Supporting Libraries

| Library | Version | Purpose | When to Use | Confidence |
|---------|---------|---------|-------------|------------|
| MapStruct | 1.5.x | DTO <-> entity mapping | Compile-time mapping; avoids reflection overhead in hot path | MEDIUM |
| Jackson JavaTimeModule | (already used) | `Instant` serialization | Already in project for `java.time` types | HIGH |
| SLF4J + Logback | (Spring Boot) | Logging | Default Spring Boot logging; structured JSON logging for production | HIGH |
## What NOT to Use

| Technology | Why Not |
|------------|---------|
| Elasticsearch | SSPL license; OpenSearch is API-compatible and Apache 2.0 |
| Kafka | Massive operational overhead for a system with a single producer type; internal buffer is sufficient initially |
| MongoDB | Poor fit for time-series analytical queries; no native TTL with the efficiency of ClickHouse's MergeTree |
| PostgreSQL (as primary) | Cannot handle millions of inserts/day with fast analytical queries at 30-day retention |
| SpringFox | Dead project; no Spring Boot 3 support |
| Hibernate/JPA | ClickHouse is not a relational DB; JPA adds friction with no benefit. Use the ClickHouse Java client directly. |
| Lombok | Controversial; Java 17 records cover most use cases; explicit code is clearer |
| gRPC | Agents already use HTTP POST; adding gRPC doubles protocol complexity for marginal throughput gain |
## Alternatives Considered

| Category | Recommended | Alternative | Why Not Alternative |
|----------|-------------|-------------|---------------------|
| Primary store | ClickHouse | TimescaleDB | Row-oriented heritage; larger storage footprint; less efficient for wide analytical queries |
| Primary store | ClickHouse | PostgreSQL + partitioning | Vacuum overhead; partition management; slower aggregations |
| Search | OpenSearch | Elasticsearch | SSPL license risk; functionally equivalent |
| Search | OpenSearch | ClickHouse full-text indexes | Not true full-text search; no relevance scoring, no phrase matching |
| Ingestion buffer | Internal queue | Apache Kafka | Operational complexity not justified; single producer type |
| Cache | Caffeine | Guava Cache | Caffeine is the successor to Guava Cache with better performance |
| API docs | springdoc-openapi | SpringFox | SpringFox has no Spring Boot 3 support |
| JWT | java-jwt (Auth0) | Nimbus JOSE+JWT | Nimbus is more complex; java-jwt sufficient for symmetric/asymmetric JWT |
## Installation (Maven Dependencies)

```xml
<!-- ClickHouse -->
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-http-client</artifactId>
    <version>0.6.5</version> <!-- verify latest -->
</dependency>

<!-- OpenSearch -->
<dependency>
    <groupId>org.opensearch.client</groupId>
    <artifactId>opensearch-java</artifactId>
    <version>2.13.0</version> <!-- verify latest -->
</dependency>

<!-- Caffeine Cache -->
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <!-- version managed by Spring Boot -->
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>

<!-- Security -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-security</artifactId>
</dependency>
<dependency>
    <groupId>com.auth0</groupId>
    <artifactId>java-jwt</artifactId>
    <version>4.4.0</version> <!-- verify latest -->
</dependency>

<!-- API Documentation -->
<dependency>
    <groupId>org.springdoc</groupId>
    <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
    <version>2.6.0</version> <!-- verify latest -->
</dependency>

<!-- Monitoring -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Testing -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>clickhouse</artifactId>
    <version>1.19.8</version> <!-- verify latest -->
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>1.19.8</version> <!-- verify latest -->
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.awaitility</groupId>
    <artifactId>awaitility</artifactId>
    <scope>test</scope>
    <!-- version managed by Spring Boot -->
</dependency>

<!-- Mapping -->
<dependency>
    <groupId>org.mapstruct</groupId>
    <artifactId>mapstruct</artifactId>
    <version>1.5.5.Final</version> <!-- verify latest -->
</dependency>
```
## Version Verification Notes

All version numbers are from training data (up to May 2025). Before adding dependencies, verify the latest stable versions on Maven Central:

- `clickhouse-http-client`: check https://github.com/ClickHouse/clickhouse-java/releases
- `opensearch-java`: check https://github.com/opensearch-project/opensearch-java/releases
- `springdoc-openapi-starter-webmvc-ui`: check https://springdoc.org/
- `java-jwt`: check https://github.com/auth0/java-jwt/releases
- `testcontainers`: check https://github.com/testcontainers/testcontainers-java/releases
## Sources

- Training data knowledge (ClickHouse architecture, OpenSearch capabilities, Spring Boot ecosystem)
- Project POM analysis (Spring Boot 3.4.3, Jackson 2.17.3, existing module structure)
- CLAUDE.md project instructions (ClickHouse mentioned as storage target, JWT/Ed25519 security model)

**Note:** All external source verification was unavailable during this research session. Version numbers should be validated before implementation.
101
.planning/research/SUMMARY.md
Normal file
# Research Summary: Cameleer3 Server

**Domain:** Transaction observability server for Apache Camel integrations
**Researched:** 2026-03-11
**Overall confidence:** MEDIUM (established domain with mature patterns; version numbers unverified against live sources)

## Executive Summary

Cameleer3 Server is a write-heavy, read-occasional observability system that receives millions of transaction records per day from distributed Apache Camel agents, stores them with 30-day retention, and provides structured + full-text search. The architecture closely parallels established observability platforms like Jaeger, Zipkin, and njams Server, with the key differentiator being Camel route diagram visualization tied to individual transactions.

The recommended stack centers on **ClickHouse** as the primary data store. ClickHouse's columnar MergeTree engine provides the exact properties this project needs: massive batch insert throughput, excellent time-range query performance, native TTL-based retention, and 10-20x compression on structured observability data. This is a well-established pattern used by production observability platforms (SigNoz, Uptrace, PostHog all run on ClickHouse).
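As a rough illustration of those properties, a MergeTree table for executions could look like the following. This is a sketch under assumptions: the table name, column names, and types are invented for illustration here, and the real schema must be derived from the agent protocol (PROTOCOL.md) and the actual query patterns.

```sql
-- Illustrative only: names and types are assumptions, not the project schema.
CREATE TABLE route_executions
(
    transaction_id String,
    agent_id       LowCardinality(String),
    route_id       LowCardinality(String),
    state          LowCardinality(String),
    started_at     DateTime64(3),
    duration_ms    UInt64,
    payload        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(started_at)          -- daily parts: ~30 live partitions at 30-day retention
ORDER BY (agent_id, route_id, started_at)    -- prefix should match the dominant filter columns
TTL toDateTime(started_at) + INTERVAL 30 DAY;
```

Daily partitioning plus the TTL clause makes retention cheap (expired parts are dropped wholesale rather than deleted row by row), while the ORDER BY prefix is what makes the time-range and per-agent queries fast.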

For full-text search, the recommendation is a **phased approach**: start with ClickHouse's built-in token bloom filter skip indexes (`tokenbf_v1`), which handle exact-token search (correlation IDs, error messages, specific values) well enough for MVP. When query patterns demand fuzzy matching or relevance scoring, add **OpenSearch** as a secondary search index. The architecture should be designed from the start to allow this swap transparently via the repository abstraction in the core module.
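A minimal sketch of that phase-1 index, assuming a hypothetical `route_executions` table with a `payload` text column (both names are illustrative, not the project schema):

```sql
-- tokenbf_v1(bloom_filter_size_in_bytes, number_of_hash_functions, seed);
-- the parameter values below are illustrative starting points.
ALTER TABLE route_executions
    ADD INDEX payload_tokens payload TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;

-- Exact-token lookups (correlation IDs, error codes) can then skip granules:
SELECT transaction_id
FROM route_executions
WHERE hasToken(payload, 'ORD-12345')
  AND started_at >= now() - INTERVAL 1 DAY;
```

The bloom filter only accelerates exact-token predicates such as `hasToken`; substring and fuzzy matching still scan, which is precisely the signal that would trigger the later OpenSearch phase.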
|
||||
|
||||
The critical architectural pattern is an **in-memory write buffer** between the HTTP ingestion endpoint and ClickHouse. ClickHouse performs best with batch inserts of 1K-10K rows; individual row inserts are the single most common and most damaging mistake when building on ClickHouse. The buffer also provides the backpressure mechanism (HTTP 429) that prevents the server from being overwhelmed during agent reconnection storms.
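The buffer described above can be sketched as a bounded queue. Class and method names here are invented for illustration, not the project's actual `WriteBuffer` API: `offer()` fails fast when the buffer is full so the HTTP layer can answer with a backpressure status, and `drain()` hands the flusher at most one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of a bounded write buffer; not the project's real API.
class BoundedWriteBuffer<T> {

    private final ArrayBlockingQueue<T> queue;

    BoundedWriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking add; false means "buffer full", i.e. respond 429/503. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /** Remove up to maxBatch items (FIFO) for one batch insert. */
    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    int size() {
        return queue.size();
    }
}
```

The producer side (HTTP controllers) and consumer side (flush scheduler) share one instance; rejecting at `offer()` rather than blocking is what keeps ingestion latency bounded during reconnection storms.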

The two-module structure (core for domain logic + interfaces, app for Spring Boot wiring + implementations) enforces clean boundaries. Core defines repository interfaces, service implementations, and the write buffer. App provides ClickHouse repository implementations, Spring SseEmitter integration, REST controllers, and security filters. The boundary rule is strict: app depends on core, never the reverse.

## Key Findings

**Stack:** Java 17 / Spring Boot 3.4.3 + ClickHouse (primary store) + ClickHouse skip indexes for text search (phase 1), OpenSearch optional (phase 2+) + Caffeine cache + springdoc-openapi + Auth0 java-jwt. No Kafka, no Elasticsearch, no JPA.

**Architecture:** Write-heavy CQRS-lite with three data paths: (1) buffered ingestion pipeline to ClickHouse, (2) query engine combining structured ClickHouse queries with text search, (3) SSE connection registry for agent push. Repository abstraction keeps core module storage-agnostic. Content-addressable diagram versioning with async pre-rendering.
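The content-addressable diagram versioning mentioned above can be sketched as hashing the canonical serialized form of a route graph and using the digest as the version key, so identical diagrams deduplicate naturally. The class and method names below are illustrative assumptions, not the project's API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch: SHA-256 over a canonical JSON string becomes the diagram version id.
final class DiagramContentHash {

    static String contentHash(String canonicalJson) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonicalJson.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash); // 64 hex chars; HexFormat requires Java 17+
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The important precondition is *canonical* serialization (stable key order, no timestamps); otherwise two structurally identical graphs hash to different version ids and the dedup property is lost.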

**Critical pitfall:** Row-by-row ClickHouse inserts and wrong ORDER BY design. These two mistakes together will make the system fail within hours under load and cannot be fixed without table recreation. Batch buffering and schema design must be correct from the first implementation.

## Implications for Roadmap

Based on research, suggested phase structure:

1. **Foundation + Ingestion Pipeline** - Data model, ClickHouse schema design, batch write buffer, ingestion endpoint
   - Addresses: Transaction ingestion, storage with TTL retention
   - Avoids: Row-by-row inserts, wrong ORDER BY, no backpressure
   - This phase needs careful design; ClickHouse ORDER BY and partition strategy are nearly impossible to change later

2. **Transaction Query + API** - Query engine, structured filters (time/state/duration), cursor-based pagination, REST controllers
   - Addresses: Core search experience, API-first design
   - Avoids: OFFSET pagination degradation, N+1 queries by co-locating data access

3. **Agent Registry + SSE** - Agent lifecycle management (LIVE/STALE/DEAD), heartbeat monitoring, SSE connection registry, config push
   - Addresses: Agent management, real-time server-to-agent communication
   - Avoids: SSE connection leaks, ghost agents, reconnection without Last-Event-ID

4. **Diagram Service** - Content-addressable versioned storage, async rendering, transaction-diagram linking
   - Addresses: Route diagram visualization (key Camel-specific differentiator)
   - Avoids: Duplicate diagram storage via content hashing, synchronous rendering bottleneck

5. **Security** - JWT authentication, Ed25519 config signing, bootstrap token registration
   - Addresses: Production-ready security
   - Avoids: Token management without rotation
   - Can be partially layered in earlier if needed for integration testing with agents

6. **Full-Text Search** - ClickHouse skip indexes initially; OpenSearch integration if bloom filters prove insufficient
   - Addresses: "Find any transaction by content" requirement
   - Avoids: Using LIKE/hasToken on large text columns without proper indexing
   - Decision point: ClickHouse bloom filters may suffice; evaluate before adding OpenSearch

7. **Dashboard + Aggregations** - Overview charts, error rates, volume trends using ClickHouse aggregation queries
   - Addresses: At-a-glance operational awareness

8. **Web UI** - Frontend consuming the REST API exclusively
   - Addresses: User-facing interface
   - Must come after API is stable per API-first principle

**Phase ordering rationale:**

- Storage before query: you need data to query
- Ingestion before agents: agents need somewhere to POST
- Query before full-text: structured search first, text layers on top
- Agent registry before config push: must know who to push to
- Diagrams after query engine: transactions must exist to link diagrams to
- Security is cross-cutting but cleanest after core flows work
- UI last because API-first means the API must be stable first

**Research flags for phases:**

- Phase 1 (Storage): NEEDS DEEPER RESEARCH -- ClickHouse Java client API, optimal ORDER BY for the specific query patterns, Docker configuration
- Phase 4 (Diagrams): NEEDS DEEPER RESEARCH -- server-side graph rendering library selection (Batik, jsvg, JGraphX, or client-side rendering)
- Phase 6 (Full-Text): NEEDS DEEPER RESEARCH -- ClickHouse skip index capabilities vs OpenSearch integration complexity; decision point
- Phase 8 (UI): NEEDS DEEPER RESEARCH -- frontend framework selection
- Phase 2 (Query): Standard patterns, unlikely to need research
- Phase 5 (Security): Standard patterns, unlikely to need research

## Confidence Assessment

| Area | Confidence | Notes |
|------|------------|-------|
| Stack (ClickHouse choice) | HIGH | Well-established pattern for observability; used by SigNoz, Uptrace, PostHog |
| Stack (version numbers) | LOW | Could not verify against live sources; all versions from training data (May 2025 cutoff) |
| Features | MEDIUM | Based on domain knowledge of njams, Jaeger, Zipkin; could not verify latest feature trends |
| Architecture | MEDIUM | Patterns are well-established; batch buffer, SSE registry, content-addressable storage are standard |
| Pitfalls | HIGH | ClickHouse pitfalls are well-documented; SSE lifecycle issues are common; ingestion backpressure is standard |
| Full-text search approach | MEDIUM | ClickHouse skip indexes vs OpenSearch is a legitimate decision point that needs hands-on evaluation |

## Gaps to Address

- **ClickHouse Java client API:** The clickhouse-java library has undergone significant changes. Exact API, connection pooling, and Spring Boot integration patterns need phase-specific research
- **cameleer3-common PROTOCOL.md:** Must read the agent protocol definition before designing the ClickHouse schema -- this defines the exact data structures being ingested
- **ClickHouse Docker setup:** Optimal ClickHouse Docker configuration (memory limits, merge settings) for development and production
- **Full-text search decision:** ClickHouse skip indexes may or may not meet the "search by any content" requirement. This needs prototyping with realistic data
- **Diagram rendering library:** Server-side route diagram rendering is a significant unknown; needs prototyping with actual Camel route graph data from cameleer3-common
- **Frontend framework:** No research on UI technology -- deferred to UI phase
- **Agent protocol stability:** The cameleer3-common protocol is still evolving. Schema evolution strategy needs alignment with agent development
@@ -27,11 +27,41 @@
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-websocket</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>com.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>0.9.7</version>
            <classifier>all</classifier>
        </dependency>
        <dependency>
            <groupId>org.springdoc</groupId>
            <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
            <version>2.8.6</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.testcontainers</groupId>
            <artifactId>testcontainers-clickhouse</artifactId>
            <version>2.0.3</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.awaitility</groupId>
            <artifactId>awaitility</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
@@ -0,0 +1,25 @@
package com.cameleer3.server.app;

import com.cameleer3.server.app.config.IngestionConfig;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.scheduling.annotation.EnableScheduling;

/**
 * Main entry point for the Cameleer3 Server application.
 * <p>
 * Scans {@code com.cameleer3.server.app} and {@code com.cameleer3.server.core} packages.
 */
@SpringBootApplication(scanBasePackages = {
        "com.cameleer3.server.app",
        "com.cameleer3.server.core"
})
@EnableScheduling
@EnableConfigurationProperties(IngestionConfig.class)
public class Cameleer3ServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(Cameleer3ServerApplication.class, args);
    }
}
@@ -0,0 +1,22 @@
package com.cameleer3.server.app.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

import javax.sql.DataSource;

/**
 * ClickHouse configuration.
 * <p>
 * Spring Boot auto-configures the DataSource from {@code spring.datasource.*} properties.
 * This class exposes a JdbcTemplate bean for repository implementations.
 */
@Configuration
public class ClickHouseConfig {

    @Bean
    public JdbcTemplate jdbcTemplate(DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
}
@@ -0,0 +1,41 @@
package com.cameleer3.server.app.config;

import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Creates the write buffer and ingestion service beans.
 * <p>
 * The {@link WriteBuffer} instances are shared between the
 * {@link IngestionService} (producer side) and the flush scheduler (consumer side).
 */
@Configuration
public class IngestionBeanConfig {

    @Bean
    public WriteBuffer<RouteExecution> executionBuffer(IngestionConfig config) {
        return new WriteBuffer<>(config.getBufferCapacity());
    }

    @Bean
    public WriteBuffer<RouteGraph> diagramBuffer(IngestionConfig config) {
        return new WriteBuffer<>(config.getBufferCapacity());
    }

    @Bean
    public WriteBuffer<MetricsSnapshot> metricsBuffer(IngestionConfig config) {
        return new WriteBuffer<>(config.getBufferCapacity());
    }

    @Bean
    public IngestionService ingestionService(WriteBuffer<RouteExecution> executionBuffer,
                                             WriteBuffer<RouteGraph> diagramBuffer,
                                             WriteBuffer<MetricsSnapshot> metricsBuffer) {
        return new IngestionService(executionBuffer, diagramBuffer, metricsBuffer);
    }
}
@@ -0,0 +1,41 @@
package com.cameleer3.server.app.config;

import org.springframework.boot.context.properties.ConfigurationProperties;

/**
 * Configuration properties for the ingestion write buffer.
 * Bound from the {@code ingestion.*} namespace in application.yml.
 * <p>
 * Registered via {@code @EnableConfigurationProperties} on the application class.
 */
@ConfigurationProperties(prefix = "ingestion")
public class IngestionConfig {

    private int bufferCapacity = 50_000;
    private int batchSize = 5_000;
    private long flushIntervalMs = 1_000;

    public int getBufferCapacity() {
        return bufferCapacity;
    }

    public void setBufferCapacity(int bufferCapacity) {
        this.bufferCapacity = bufferCapacity;
    }

    public int getBatchSize() {
        return batchSize;
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public long getFlushIntervalMs() {
        return flushIntervalMs;
    }

    public void setFlushIntervalMs(long flushIntervalMs) {
        this.flushIntervalMs = flushIntervalMs;
    }
}
@@ -0,0 +1,34 @@
package com.cameleer3.server.app.config;

import com.cameleer3.server.app.interceptor.ProtocolVersionInterceptor;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

/**
 * Web MVC configuration.
 * <p>
 * Registers the {@link ProtocolVersionInterceptor} on data and agent endpoint paths,
 * excluding health, API docs, and Swagger UI paths that do not require protocol versioning.
 */
@Configuration
public class WebConfig implements WebMvcConfigurer {

    private final ProtocolVersionInterceptor protocolVersionInterceptor;

    public WebConfig(ProtocolVersionInterceptor protocolVersionInterceptor) {
        this.protocolVersionInterceptor = protocolVersionInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(protocolVersionInterceptor)
                .addPathPatterns("/api/v1/data/**", "/api/v1/agents/**")
                .excludePathPatterns(
                        "/api/v1/health",
                        "/api/v1/api-docs/**",
                        "/api/v1/swagger-ui/**",
                        "/api/v1/swagger-ui.html"
                );
    }
}
@@ -0,0 +1,77 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * Ingestion endpoint for route diagrams.
 * <p>
 * Accepts a single {@link RouteGraph} or an array of them. Data is buffered
 * and flushed to ClickHouse by the flush scheduler.
 */
@RestController
@RequestMapping("/api/v1/data")
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class DiagramController {

    private static final Logger log = LoggerFactory.getLogger(DiagramController.class);

    private final IngestionService ingestionService;
    private final ObjectMapper objectMapper;

    public DiagramController(IngestionService ingestionService, ObjectMapper objectMapper) {
        this.ingestionService = ingestionService;
        this.objectMapper = objectMapper;
    }

    @PostMapping("/diagrams")
    @Operation(summary = "Ingest route diagram data",
            description = "Accepts a single RouteGraph or an array of RouteGraphs")
    @ApiResponse(responseCode = "202", description = "Data accepted for processing")
    @ApiResponse(responseCode = "503", description = "Buffer full, retry later")
    public ResponseEntity<Void> ingestDiagrams(@RequestBody String body) throws JsonProcessingException {
        List<RouteGraph> graphs = parsePayload(body);
        boolean accepted;

        if (graphs.size() == 1) {
            accepted = ingestionService.acceptDiagram(graphs.get(0));
        } else {
            accepted = ingestionService.acceptDiagrams(graphs);
        }

        if (!accepted) {
            log.warn("Diagram buffer full, returning 503");
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .header("Retry-After", "5")
                    .build();
        }

        return ResponseEntity.accepted().build();
    }

    private List<RouteGraph> parsePayload(String body) throws JsonProcessingException {
        String trimmed = body.strip();
        if (trimmed.startsWith("[")) {
            return objectMapper.readValue(trimmed, new TypeReference<>() {});
        } else {
            RouteGraph single = objectMapper.readValue(trimmed, RouteGraph.class);
            return List.of(single);
        }
    }
}
@@ -0,0 +1,78 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * Ingestion endpoint for route execution data.
 * <p>
 * Accepts a single {@link RouteExecution} or an array of them. Data is buffered
 * in a {@link com.cameleer3.server.core.ingestion.WriteBuffer} and flushed
 * to ClickHouse by the flush scheduler.
 */
@RestController
@RequestMapping("/api/v1/data")
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class ExecutionController {

    private static final Logger log = LoggerFactory.getLogger(ExecutionController.class);

    private final IngestionService ingestionService;
    private final ObjectMapper objectMapper;

    public ExecutionController(IngestionService ingestionService, ObjectMapper objectMapper) {
        this.ingestionService = ingestionService;
        this.objectMapper = objectMapper;
    }

    @PostMapping("/executions")
    @Operation(summary = "Ingest route execution data",
            description = "Accepts a single RouteExecution or an array of RouteExecutions")
    @ApiResponse(responseCode = "202", description = "Data accepted for processing")
    @ApiResponse(responseCode = "503", description = "Buffer full, retry later")
    public ResponseEntity<Void> ingestExecutions(@RequestBody String body) throws JsonProcessingException {
        List<RouteExecution> executions = parsePayload(body);
        boolean accepted;

        if (executions.size() == 1) {
            accepted = ingestionService.acceptExecution(executions.get(0));
        } else {
            accepted = ingestionService.acceptExecutions(executions);
        }

        if (!accepted) {
            log.warn("Execution buffer full, returning 503");
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .header("Retry-After", "5")
                    .build();
        }

        return ResponseEntity.accepted().build();
    }

    private List<RouteExecution> parsePayload(String body) throws JsonProcessingException {
        String trimmed = body.strip();
        if (trimmed.startsWith("[")) {
            return objectMapper.readValue(trimmed, new TypeReference<>() {});
        } else {
            RouteExecution single = objectMapper.readValue(trimmed, RouteExecution.class);
            return List.of(single);
        }
    }
}
@@ -0,0 +1,71 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.core.ingestion.IngestionService;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * Ingestion endpoint for agent metrics.
 * <p>
 * Accepts a single {@link MetricsSnapshot} or an array of them. Data is buffered
 * and flushed to ClickHouse by the flush scheduler.
 */
@RestController
@RequestMapping("/api/v1/data")
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class MetricsController {

    private static final Logger log = LoggerFactory.getLogger(MetricsController.class);

    private final IngestionService ingestionService;
    private final ObjectMapper objectMapper;

    public MetricsController(IngestionService ingestionService, ObjectMapper objectMapper) {
        this.ingestionService = ingestionService;
        this.objectMapper = objectMapper;
    }

    @PostMapping("/metrics")
    @Operation(summary = "Ingest agent metrics",
            description = "Accepts a single MetricsSnapshot or an array of MetricsSnapshot objects")
    @ApiResponse(responseCode = "202", description = "Data accepted for processing")
    @ApiResponse(responseCode = "503", description = "Buffer full, retry later")
    public ResponseEntity<Void> ingestMetrics(@RequestBody String body) throws JsonProcessingException {
        List<MetricsSnapshot> metrics = parsePayload(body);
        boolean accepted = ingestionService.acceptMetrics(metrics);

        if (!accepted) {
            log.warn("Metrics buffer full, returning 503");
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .header("Retry-After", "5")
                    .build();
        }

        return ResponseEntity.accepted().build();
    }

    private List<MetricsSnapshot> parsePayload(String body) throws JsonProcessingException {
        String trimmed = body.strip();
        if (trimmed.startsWith("[")) {
            return objectMapper.readValue(trimmed, new TypeReference<>() {});
        } else {
            MetricsSnapshot single = objectMapper.readValue(trimmed, MetricsSnapshot.class);
            return List.of(single);
        }
    }
}
@@ -0,0 +1,159 @@
package com.cameleer3.server.app.ingestion;

import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.app.config.IngestionConfig;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import com.cameleer3.server.core.storage.DiagramRepository;
import com.cameleer3.server.core.storage.ExecutionRepository;
import com.cameleer3.server.core.storage.MetricsRepository;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.SmartLifecycle;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.function.Consumer;

/**
 * Scheduled task that drains the write buffers and batch-inserts into ClickHouse.
 * <p>
 * Implements {@link SmartLifecycle} to ensure all remaining buffered data is
 * flushed on application shutdown.
 */
@Component
public class ClickHouseFlushScheduler implements SmartLifecycle {

    private static final Logger log = LoggerFactory.getLogger(ClickHouseFlushScheduler.class);

    private final WriteBuffer<RouteExecution> executionBuffer;
    private final WriteBuffer<RouteGraph> diagramBuffer;
    private final WriteBuffer<MetricsSnapshot> metricsBuffer;
    private final ExecutionRepository executionRepository;
    private final DiagramRepository diagramRepository;
    private final MetricsRepository metricsRepository;
    private final int batchSize;

    private volatile boolean running = false;

    public ClickHouseFlushScheduler(WriteBuffer<RouteExecution> executionBuffer,
                                    WriteBuffer<RouteGraph> diagramBuffer,
                                    WriteBuffer<MetricsSnapshot> metricsBuffer,
                                    ExecutionRepository executionRepository,
                                    DiagramRepository diagramRepository,
                                    MetricsRepository metricsRepository,
                                    IngestionConfig config) {
        this.executionBuffer = executionBuffer;
        this.diagramBuffer = diagramBuffer;
        this.metricsBuffer = metricsBuffer;
        this.executionRepository = executionRepository;
        this.diagramRepository = diagramRepository;
        this.metricsRepository = metricsRepository;
        this.batchSize = config.getBatchSize();
    }

    @Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
    public void flushAll() {
        flushExecutions();
        flushDiagrams();
        flushMetrics();
    }

    private void flushExecutions() {
        try {
            List<RouteExecution> batch = executionBuffer.drain(batchSize);
            if (!batch.isEmpty()) {
                executionRepository.insertBatch(batch);
                log.debug("Flushed {} executions to ClickHouse", batch.size());
            }
        } catch (Exception e) {
            log.error("Failed to flush executions to ClickHouse", e);
        }
    }

    private void flushDiagrams() {
        try {
            List<RouteGraph> batch = diagramBuffer.drain(batchSize);
            for (RouteGraph graph : batch) {
                diagramRepository.store(graph);
            }
            if (!batch.isEmpty()) {
                log.debug("Flushed {} diagrams to ClickHouse", batch.size());
            }
        } catch (Exception e) {
            log.error("Failed to flush diagrams to ClickHouse", e);
        }
    }

    private void flushMetrics() {
        try {
            List<MetricsSnapshot> batch = metricsBuffer.drain(batchSize);
            if (!batch.isEmpty()) {
                metricsRepository.insertBatch(batch);
                log.debug("Flushed {} metrics to ClickHouse", batch.size());
            }
        } catch (Exception e) {
            log.error("Failed to flush metrics to ClickHouse", e);
        }
    }

    // SmartLifecycle -- flush remaining data on shutdown

    @Override
    public void start() {
        running = true;
        log.info("ClickHouseFlushScheduler started");
    }

    @Override
    public void stop() {
        log.info("ClickHouseFlushScheduler stopping -- flushing remaining data");
        drainAll();
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    @Override
    public int getPhase() {
        // Run after most beans but before DataSource shutdown
        return Integer.MAX_VALUE - 1;
    }

    /**
     * Drain all buffers completely (loop until empty).
     */
    private void drainAll() {
        drainBufferCompletely("executions", executionBuffer, executionRepository::insertBatch);
        drainBufferCompletely("diagrams", diagramBuffer, batch -> {
            for (RouteGraph g : batch) {
                diagramRepository.store(g);
            }
        });
        drainBufferCompletely("metrics", metricsBuffer, metricsRepository::insertBatch);
    }

    private <T> void drainBufferCompletely(String name, WriteBuffer<T> buffer, Consumer<List<T>> inserter) {
        int total = 0;
        while (buffer.size() > 0) {
            List<T> batch = buffer.drain(batchSize);
            if (batch.isEmpty()) {
                break;
            }
            try {
                inserter.accept(batch);
                total += batch.size();
            } catch (Exception e) {
                log.error("Failed to flush remaining {} during shutdown", name, e);
                break;
            }
        }
        if (total > 0) {
            log.info("Flushed {} remaining {} during shutdown", total, name);
        }
    }
}
@@ -0,0 +1,46 @@
package com.cameleer3.server.app.interceptor;

import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

import java.util.Map;

/**
 * Validates that all requests to data and agent endpoints include the
 * {@code X-Cameleer-Protocol-Version} header with value {@code "1"}.
 * <p>
 * Requests missing the header or using an unsupported version receive a 400 response
 * with a JSON error body.
 */
@Component
public class ProtocolVersionInterceptor implements HandlerInterceptor {

    private static final String HEADER_NAME = "X-Cameleer-Protocol-Version";
    private static final String SUPPORTED_VERSION = "1";

    private final ObjectMapper objectMapper;

    public ProtocolVersionInterceptor(ObjectMapper objectMapper) {
        this.objectMapper = objectMapper;
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        String version = request.getHeader(HEADER_NAME);

        // equals(null) is false, so this also covers the missing-header case
        if (!SUPPORTED_VERSION.equals(version)) {
            response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
            response.setContentType(MediaType.APPLICATION_JSON_VALUE);
            objectMapper.writeValue(response.getWriter(),
                    Map.of("error", "Missing or unsupported X-Cameleer-Protocol-Version header"));
            return false;
        }

        return true;
    }
}
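Every agent request must carry this header. A minimal client-side sketch using `java.net.http` that builds (but does not send) such a request — the endpoint path is taken from the integration tests in this changeset, and the host/port are assumptions:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ProtocolHeaderDemo {

    // Build a POST request carrying the required protocol header.
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create("http://localhost:8081/api/v1/data/executions"))
                .header("X-Cameleer-Protocol-Version", "1")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("[]"))
                .build();
    }

    public static void main(String[] args) {
        // The interceptor answers 400 when this header is absent or not "1".
        System.out.println(buildRequest().headers()
                .firstValue("X-Cameleer-Protocol-Version").orElse("missing"));
    }
}
```

Pinning the version in one client-side builder method keeps a future protocol bump to a single change.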
@@ -0,0 +1,105 @@
package com.cameleer3.server.app.storage;

import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.server.core.storage.DiagramRepository;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * ClickHouse implementation of {@link DiagramRepository}.
 * <p>
 * Stores route graphs as JSON with SHA-256 content-hash deduplication.
 * The underlying table uses ReplacingMergeTree keyed on content_hash.
 */
@Repository
public class ClickHouseDiagramRepository implements DiagramRepository {

    private static final Logger log = LoggerFactory.getLogger(ClickHouseDiagramRepository.class);

    private static final String INSERT_SQL = """
            INSERT INTO route_diagrams (content_hash, route_id, agent_id, definition)
            VALUES (?, ?, ?, ?)
            """;

    private static final String SELECT_BY_HASH = """
            SELECT definition FROM route_diagrams WHERE content_hash = ? LIMIT 1
            """;

    private static final String SELECT_HASH_FOR_ROUTE = """
            SELECT content_hash FROM route_diagrams
            WHERE route_id = ? AND agent_id = ?
            ORDER BY created_at DESC LIMIT 1
            """;

    private final JdbcTemplate jdbcTemplate;
    private final ObjectMapper objectMapper;

    public ClickHouseDiagramRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        this.objectMapper = new ObjectMapper();
        this.objectMapper.registerModule(new JavaTimeModule());
    }

    @Override
    public void store(RouteGraph graph) {
        try {
            String json = objectMapper.writeValueAsString(graph);
            String contentHash = sha256Hex(json);
            String routeId = graph.getRouteId() != null ? graph.getRouteId() : "";
            // agent_id is not part of RouteGraph -- set empty, controllers can enrich
            String agentId = "";

            jdbcTemplate.update(INSERT_SQL, contentHash, routeId, agentId, json);
            log.debug("Stored diagram for route={} with hash={}", routeId, contentHash);
        } catch (JsonProcessingException e) {
            throw new RuntimeException("Failed to serialize RouteGraph to JSON", e);
        }
    }

    @Override
    public Optional<RouteGraph> findByContentHash(String contentHash) {
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(SELECT_BY_HASH, contentHash);
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        String json = (String) rows.get(0).get("definition");
        try {
            return Optional.of(objectMapper.readValue(json, RouteGraph.class));
        } catch (JsonProcessingException e) {
            log.error("Failed to deserialize RouteGraph from ClickHouse", e);
            return Optional.empty();
        }
    }

    @Override
    public Optional<String> findContentHashForRoute(String routeId, String agentId) {
        List<Map<String, Object>> rows = jdbcTemplate.queryForList(SELECT_HASH_FOR_ROUTE, routeId, agentId);
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of((String) rows.get(0).get("content_hash"));
    }

    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("SHA-256 not available", e);
        }
    }
}
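The deduplication above only works because identical diagram JSON always hashes to the same `content_hash`, which ReplacingMergeTree then collapses to one row. A quick standalone check of that property, using the same hashing approach as the repository's `sha256Hex` helper:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class HashDedupDemo {

    // Same technique as ClickHouseDiagramRepository.sha256Hex.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        String a = sha256Hex("{\"routeId\":\"r1\"}");
        String b = sha256Hex("{\"routeId\":\"r1\"}");
        String c = sha256Hex("{\"routeId\":\"r2\"}");
        // Same content -> same content_hash, so ReplacingMergeTree collapses duplicates.
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false
        System.out.println(a.length());  // 64 hex characters
    }
}
```

One caveat: the hash is over the serialized JSON, so dedup depends on Jackson producing byte-identical output for equal graphs (stable field order); a semantically equal graph serialized differently would get a new hash.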
@@ -0,0 +1,117 @@
package com.cameleer3.server.app.storage;

import com.cameleer3.common.model.ProcessorExecution;
import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.core.storage.ExecutionRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * ClickHouse implementation of {@link ExecutionRepository}.
 * <p>
 * Performs batch inserts into the {@code route_executions} table.
 * Processor executions are flattened into parallel arrays.
 */
@Repository
public class ClickHouseExecutionRepository implements ExecutionRepository {

    private static final Logger log = LoggerFactory.getLogger(ClickHouseExecutionRepository.class);

    private static final String INSERT_SQL = """
            INSERT INTO route_executions (
                execution_id, route_id, agent_id, status, start_time, end_time,
                duration_ms, correlation_id, exchange_id, error_message, error_stacktrace,
                processor_ids, processor_types, processor_starts, processor_ends,
                processor_durations, processor_statuses
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """;

    private final JdbcTemplate jdbcTemplate;

    public ClickHouseExecutionRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void insertBatch(List<RouteExecution> executions) {
        if (executions.isEmpty()) {
            return;
        }

        jdbcTemplate.batchUpdate(INSERT_SQL, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                RouteExecution exec = executions.get(i);
                List<ProcessorExecution> processors = flattenProcessors(exec.getProcessors());

                ps.setString(1, UUID.randomUUID().toString());
                ps.setString(2, nullSafe(exec.getRouteId()));
                ps.setString(3, ""); // agent_id set by controller header or empty
                ps.setString(4, exec.getStatus() != null ? exec.getStatus().name() : "RUNNING");
                ps.setObject(5, toTimestamp(exec.getStartTime()));
                ps.setObject(6, toTimestamp(exec.getEndTime()));
                ps.setLong(7, exec.getDurationMs());
                ps.setString(8, nullSafe(exec.getCorrelationId()));
                ps.setString(9, nullSafe(exec.getExchangeId()));
                ps.setString(10, nullSafe(exec.getErrorMessage()));
                ps.setString(11, nullSafe(exec.getErrorStackTrace()));

                // Parallel arrays for processor executions
                ps.setObject(12, processors.stream().map(p -> nullSafe(p.getProcessorId())).toArray(String[]::new));
                ps.setObject(13, processors.stream().map(p -> nullSafe(p.getProcessorType())).toArray(String[]::new));
                ps.setObject(14, processors.stream().map(p -> toTimestamp(p.getStartTime())).toArray(Timestamp[]::new));
                ps.setObject(15, processors.stream().map(p -> toTimestamp(p.getEndTime())).toArray(Timestamp[]::new));
                ps.setObject(16, processors.stream().mapToLong(ProcessorExecution::getDurationMs).boxed().toArray(Long[]::new));
                ps.setObject(17, processors.stream().map(p -> p.getStatus() != null ? p.getStatus().name() : "RUNNING").toArray(String[]::new));
            }

            @Override
            public int getBatchSize() {
                return executions.size();
            }
        });

        log.debug("Inserted batch of {} route executions into ClickHouse", executions.size());
    }

    /**
     * Flatten the processor tree into a flat list (depth-first).
     */
    private List<ProcessorExecution> flattenProcessors(List<ProcessorExecution> processors) {
        if (processors == null || processors.isEmpty()) {
            return List.of();
        }
        var result = new ArrayList<ProcessorExecution>();
        for (ProcessorExecution p : processors) {
            flatten(p, result);
        }
        return result;
    }

    private void flatten(ProcessorExecution processor, List<ProcessorExecution> result) {
        result.add(processor);
        if (processor.getChildren() != null) {
            for (ProcessorExecution child : processor.getChildren()) {
                flatten(child, result);
            }
        }
    }

    private static String nullSafe(String value) {
        return value != null ? value : "";
    }

    private static Timestamp toTimestamp(Instant instant) {
        return instant != null ? Timestamp.from(instant) : Timestamp.from(Instant.EPOCH);
    }
}
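The depth-first flattening above is what turns the nested processor tree into the parallel-array columns (`processor_ids`, `processor_types`, ...), where index i in every array describes the same processor. A minimal sketch with a hypothetical `Node` record standing in for `ProcessorExecution`:

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenDemo {

    // Hypothetical stand-in for ProcessorExecution: an id plus children.
    record Node(String id, List<Node> children) {}

    // Depth-first traversal, same shape as ClickHouseExecutionRepository.flatten.
    static void flatten(Node node, List<String> out) {
        out.add(node.id());
        for (Node child : node.children()) {
            flatten(child, out);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("split",
                List.of(new Node("to-http", List.of()),
                        new Node("choice", List.of(new Node("log", List.of())))));
        List<String> ids = new ArrayList<>();
        flatten(tree, ids);
        // Depth-first order: parent before its subtree, so position i in
        // processor_ids lines up with position i in every other array.
        System.out.println(ids); // [split, to-http, choice, log]
    }
}
```

The trade-off of this layout is that parent/child links are lost in storage; what survives is a stable per-execution ordering that keeps all parallel columns aligned.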
@@ -0,0 +1,67 @@
package com.cameleer3.server.app.storage;

import com.cameleer3.server.core.storage.MetricsRepository;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * ClickHouse implementation of {@link MetricsRepository}.
 * <p>
 * Performs batch inserts into the {@code agent_metrics} table.
 */
@Repository
public class ClickHouseMetricsRepository implements MetricsRepository {

    private static final Logger log = LoggerFactory.getLogger(ClickHouseMetricsRepository.class);

    private static final String INSERT_SQL = """
            INSERT INTO agent_metrics (agent_id, collected_at, metric_name, metric_value, tags)
            VALUES (?, ?, ?, ?, ?)
            """;

    private final JdbcTemplate jdbcTemplate;

    public ClickHouseMetricsRepository(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void insertBatch(List<MetricsSnapshot> metrics) {
        if (metrics.isEmpty()) {
            return;
        }

        jdbcTemplate.batchUpdate(INSERT_SQL, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                MetricsSnapshot m = metrics.get(i);
                ps.setString(1, m.agentId() != null ? m.agentId() : "");
                ps.setObject(2, m.collectedAt() != null ? Timestamp.from(m.collectedAt()) : Timestamp.from(Instant.EPOCH));
                ps.setString(3, m.metricName() != null ? m.metricName() : "");
                ps.setDouble(4, m.metricValue());
                // ClickHouse Map(String, String) -- pass as a java.util.Map
                Map<String, String> tags = m.tags() != null ? m.tags() : new HashMap<>();
                ps.setObject(5, tags);
            }

            @Override
            public int getBatchSize() {
                return metrics.size();
            }
        });

        log.debug("Inserted batch of {} metrics into ClickHouse", metrics.size());
    }
}
@@ -1,2 +1,38 @@
server:
  port: 8081

spring:
  datasource:
    url: jdbc:ch://localhost:8123/cameleer3
    username: cameleer
    password: cameleer_dev
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver
  jackson:
    serialization:
      write-dates-as-timestamps: false
    deserialization:
      fail-on-unknown-properties: false

ingestion:
  buffer-capacity: 50000
  batch-size: 5000
  flush-interval-ms: 1000

clickhouse:
  ttl-days: 30

springdoc:
  api-docs:
    path: /api/v1/api-docs
  swagger-ui:
    path: /api/v1/swagger-ui

management:
  endpoints:
    web:
      base-path: /api/v1
      exposure:
        include: health
  endpoint:
    health:
      show-details: always
@@ -0,0 +1,73 @@
package com.cameleer3.server.app;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.clickhouse.ClickHouseContainer;

import org.junit.jupiter.api.BeforeAll;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/**
 * Base class for integration tests requiring a ClickHouse instance.
 * <p>
 * Uses Testcontainers to spin up a ClickHouse server and initializes the schema
 * from {@code clickhouse/init/01-schema.sql} before the first test runs.
 * Subclasses get a {@link JdbcTemplate} for direct database assertions.
 * <p>
 * Container lifecycle is managed manually (started once, shared across all test classes).
 */
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@ActiveProfiles("test")
public abstract class AbstractClickHouseIT {

    protected static final ClickHouseContainer CLICKHOUSE;

    static {
        CLICKHOUSE = new ClickHouseContainer("clickhouse/clickhouse-server:25.3");
        CLICKHOUSE.start();
    }

    @Autowired
    protected JdbcTemplate jdbcTemplate;

    @DynamicPropertySource
    static void overrideProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", CLICKHOUSE::getJdbcUrl);
        registry.add("spring.datasource.username", CLICKHOUSE::getUsername);
        registry.add("spring.datasource.password", CLICKHOUSE::getPassword);
    }

    @BeforeAll
    static void initSchema() throws Exception {
        // Surefire runs from the module directory; schema is in the project root
        Path schemaPath = Path.of("clickhouse/init/01-schema.sql");
        if (!Files.exists(schemaPath)) {
            schemaPath = Path.of("../clickhouse/init/01-schema.sql");
        }
        String sql = Files.readString(schemaPath, StandardCharsets.UTF_8);

        try (Connection conn = DriverManager.getConnection(
                CLICKHOUSE.getJdbcUrl(),
                CLICKHOUSE.getUsername(),
                CLICKHOUSE.getPassword());
             Statement stmt = conn.createStatement()) {
            // Execute each statement separately (separated by semicolons)
            for (String statement : sql.split(";")) {
                String trimmed = statement.trim();
                if (!trimmed.isEmpty()) {
                    stmt.execute(trimmed);
                }
            }
        }
    }
}
@@ -0,0 +1,105 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import com.cameleer3.server.core.ingestion.IngestionService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.test.context.TestPropertySource;

import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

/**
 * Tests backpressure behavior when write buffers are full.
 * Uses a tiny buffer (capacity=5) and a very long flush interval
 * to prevent the scheduler from draining the buffer during the test.
 */
@TestPropertySource(properties = {
        "ingestion.buffer-capacity=5",
        "ingestion.batch-size=5",
        "ingestion.flush-interval-ms=60000" // 60s -- effectively no flush during test
})
class BackpressureIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Autowired
    private IngestionService ingestionService;

    @Test
    void whenBufferFull_returns503WithRetryAfter() {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        // Wait until any initial scheduled flush has emptied the buffer
        await().atMost(5, SECONDS).until(() -> ingestionService.getExecutionBufferDepth() == 0);

        // Fill the buffer completely with a batch of 5
        String batchJson = """
                [
                  {"routeId":"bp-0","exchangeId":"bp-e0","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]},
                  {"routeId":"bp-1","exchangeId":"bp-e1","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]},
                  {"routeId":"bp-2","exchangeId":"bp-e2","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]},
                  {"routeId":"bp-3","exchangeId":"bp-e3","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]},
                  {"routeId":"bp-4","exchangeId":"bp-e4","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]}
                ]
                """;

        ResponseEntity<String> batchResponse = restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(batchJson, headers),
                String.class);
        assertThat(batchResponse.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);

        // Now the buffer is full -- the next POST should get 503
        String overflowJson = """
                {"routeId":"bp-overflow","exchangeId":"bp-overflow-e","status":"COMPLETED","startTime":"2026-03-11T10:00:00Z","durationMs":100,"processors":[]}
                """;

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(overflowJson, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.SERVICE_UNAVAILABLE);
        assertThat(response.getHeaders().getFirst("Retry-After")).isNotNull();
    }

    @Test
    void bufferedDataNotLost_afterBackpressure() {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        // Post data to the diagram buffer (separate from executions used above)
        for (int i = 0; i < 3; i++) {
            String json = String.format("""
                    {
                      "routeId": "bp-persist-diagram-%d",
                      "version": 1,
                      "nodes": [],
                      "edges": []
                    }
                    """, i);

            restTemplate.postForEntity(
                    "/api/v1/data/diagrams",
                    new HttpEntity<>(json, headers),
                    String.class);
        }

        // The scheduled flush won't run for 60s in this test, so instead of waiting
        // for persistence, verify the data is safely held in the buffer.
        assertThat(ingestionService.getDiagramBufferDepth()).isGreaterThanOrEqualTo(3);
    }
}
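The 503/Retry-After contract verified above implies the agent side should back off before resending. A sketch of the delay parsing a client might use — `retryDelayMillis` is a hypothetical helper, and only the delay-seconds form of Retry-After is handled:

```java
public class RetryAfterDemo {

    // Parse a Retry-After delay-seconds value, falling back to a default.
    static long retryDelayMillis(String retryAfterHeader, long defaultMillis) {
        if (retryAfterHeader == null) {
            return defaultMillis;
        }
        try {
            return Long.parseLong(retryAfterHeader.trim()) * 1000L;
        } catch (NumberFormatException e) {
            return defaultMillis; // HTTP-date form not handled in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(retryDelayMillis("5", 1000));    // 5000
        System.out.println(retryDelayMillis(null, 1000));   // 1000
        System.out.println(retryDelayMillis("soon", 1000)); // 1000
    }
}
```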
@@ -0,0 +1,105 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;

import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

class DiagramControllerIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void postSingleDiagram_returns202() {
        String json = """
                {
                  "routeId": "diagram-route-1",
                  "description": "Test route",
                  "version": 1,
                  "nodes": [],
                  "edges": [],
                  "processorNodeMapping": {}
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/diagrams",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }

    @Test
    void postDiagram_dataAppearsInClickHouseAfterFlush() {
        String json = """
                {
                  "routeId": "diagram-flush-route",
                  "description": "Flush test",
                  "version": 1,
                  "nodes": [],
                  "edges": [],
                  "processorNodeMapping": {}
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        restTemplate.postForEntity(
                "/api/v1/data/diagrams",
                new HttpEntity<>(json, headers),
                String.class);

        await().atMost(10, SECONDS).untilAsserted(() -> {
            Integer count = jdbcTemplate.queryForObject(
                    "SELECT count() FROM route_diagrams WHERE route_id = 'diagram-flush-route'",
                    Integer.class);
            assertThat(count).isGreaterThanOrEqualTo(1);
        });
    }

    @Test
    void postArrayOfDiagrams_returns202() {
        String json = """
                [{
                  "routeId": "diagram-arr-1",
                  "version": 1,
                  "nodes": [],
                  "edges": []
                },
                {
                  "routeId": "diagram-arr-2",
                  "version": 1,
                  "nodes": [],
                  "edges": []
                }]
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/diagrams",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }
}
@@ -0,0 +1,146 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;

import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

class ExecutionControllerIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void postSingleExecution_returns202() {
        String json = """
                {
                  "routeId": "route-1",
                  "exchangeId": "exchange-1",
                  "correlationId": "corr-1",
                  "status": "COMPLETED",
                  "startTime": "2026-03-11T10:00:00Z",
                  "endTime": "2026-03-11T10:00:01Z",
                  "durationMs": 1000,
                  "errorMessage": "",
                  "errorStackTrace": "",
                  "processors": []
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }

    @Test
    void postArrayOfExecutions_returns202() {
        String json = """
                [{
                  "routeId": "route-2",
                  "exchangeId": "exchange-2",
                  "status": "COMPLETED",
                  "startTime": "2026-03-11T10:00:00Z",
                  "endTime": "2026-03-11T10:00:01Z",
                  "durationMs": 1000,
                  "processors": []
                },
                {
                  "routeId": "route-3",
                  "exchangeId": "exchange-3",
                  "status": "FAILED",
                  "startTime": "2026-03-11T10:00:00Z",
                  "endTime": "2026-03-11T10:00:02Z",
                  "durationMs": 2000,
                  "errorMessage": "Something went wrong",
                  "processors": []
                }]
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }

    @Test
    void postExecution_dataAppearsInClickHouseAfterFlush() {
        String json = """
                {
                  "routeId": "flush-test-route",
                  "exchangeId": "flush-exchange-1",
                  "correlationId": "flush-corr-1",
                  "status": "COMPLETED",
                  "startTime": "2026-03-11T10:00:00Z",
                  "endTime": "2026-03-11T10:00:01Z",
                  "durationMs": 1000,
                  "processors": []
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(json, headers),
                String.class);

        await().atMost(10, SECONDS).untilAsserted(() -> {
            Integer count = jdbcTemplate.queryForObject(
                    "SELECT count() FROM route_executions WHERE route_id = 'flush-test-route'",
                    Integer.class);
            assertThat(count).isGreaterThanOrEqualTo(1);
        });
    }

    @Test
    void postExecution_unknownFieldsAccepted() {
        String json = """
                {
                  "routeId": "route-unk",
                  "exchangeId": "exchange-unk",
                  "status": "COMPLETED",
                  "startTime": "2026-03-11T10:00:00Z",
                  "durationMs": 500,
                  "unknownField": "should-be-ignored",
                  "anotherUnknown": 42,
                  "processors": []
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/executions",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }
}
@@ -0,0 +1,50 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration test for forward compatibility (API-05).
 * Verifies that unknown JSON fields in request bodies do not cause deserialization errors.
 */
class ForwardCompatIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void unknownFieldsInRequestBodyDoNotCauseError() {
        // JSON body with an unknown field that should not cause a 400 deserialization error.
        // Jackson is configured with fail-on-unknown-properties: false in application.yml.
        // Without the ExecutionController (Plan 01-02), this returns 404 -- which is acceptable.
        // The key assertion: it must NOT be 400 (i.e., Jackson did not reject unknown fields).
        String jsonWithUnknownFields = """
                {
                  "futureField": "value",
                  "anotherUnknown": 42
                }
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");
        var entity = new HttpEntity<>(jsonWithUnknownFields, headers);

        var response = restTemplate.exchange(
                "/api/v1/data/executions", HttpMethod.POST, entity, String.class);

        // The interceptor passes (correct protocol header), and Jackson should not reject
        // unknown fields. Without a controller, expect 404 (not 400 or 422).
        assertThat(response.getStatusCode().value())
                .as("Unknown JSON fields must not cause 400 or 422 deserialization error")
                .isNotIn(400, 422);
    }
}
@@ -0,0 +1,47 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for the health endpoint and ClickHouse TTL verification.
 */
class HealthControllerIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void healthEndpointReturns200WithStatus() {
        var response = restTemplate.getForEntity("/api/v1/health", String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(200);
        assertThat(response.getBody()).contains("status");
    }

    @Test
    void healthEndpointDoesNotRequireProtocolVersionHeader() {
        // Health should be accessible without X-Cameleer-Protocol-Version header
        var response = restTemplate.getForEntity("/api/v1/health", String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(200);
    }

    @Test
    void ttlConfiguredOnRouteExecutions() {
        String createTable = jdbcTemplate.queryForObject(
                "SHOW CREATE TABLE route_executions", String.class);
        assertThat(createTable).containsIgnoringCase("TTL");
        assertThat(createTable).contains("toIntervalDay(30)");
    }

    @Test
    void ttlConfiguredOnAgentMetrics() {
        String createTable = jdbcTemplate.queryForObject(
                "SHOW CREATE TABLE agent_metrics", String.class);
        assertThat(createTable).containsIgnoringCase("TTL");
        assertThat(createTable).contains("toIntervalDay(30)");
    }
}
@@ -0,0 +1,74 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;

import static java.util.concurrent.TimeUnit.SECONDS;
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

class MetricsControllerIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void postMetrics_returns202() {
        String json = """
                [{
                  "agentId": "agent-1",
                  "collectedAt": "2026-03-11T10:00:00Z",
                  "metricName": "cpu.usage",
                  "metricValue": 75.5,
                  "tags": {"host": "server-1"}
                }]
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        ResponseEntity<String> response = restTemplate.postForEntity(
                "/api/v1/data/metrics",
                new HttpEntity<>(json, headers),
                String.class);

        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.ACCEPTED);
    }

    @Test
    void postMetrics_dataAppearsInClickHouseAfterFlush() {
        String json = """
                [{
                  "agentId": "agent-flush-test",
                  "collectedAt": "2026-03-11T10:00:00Z",
                  "metricName": "memory.used",
                  "metricValue": 1024.0,
                  "tags": {}
                }]
                """;

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");

        restTemplate.postForEntity(
                "/api/v1/data/metrics",
                new HttpEntity<>(json, headers),
                String.class);

        await().atMost(10, SECONDS).untilAsserted(() -> {
            Integer count = jdbcTemplate.queryForObject(
                    "SELECT count() FROM agent_metrics WHERE agent_id = 'agent-flush-test'",
                    Integer.class);
            assertThat(count).isGreaterThanOrEqualTo(1);
        });
    }
}
@@ -0,0 +1,32 @@
package com.cameleer3.server.app.controller;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for OpenAPI documentation endpoints.
 */
class OpenApiIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void apiDocsReturnsOpenApiSpec() {
        var response = restTemplate.getForEntity("/api/v1/api-docs", String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(200);
        assertThat(response.getBody()).contains("openapi");
        assertThat(response.getBody()).contains("paths");
    }

    @Test
    void swaggerUiIsAccessible() {
        var response = restTemplate.getForEntity("/api/v1/swagger-ui/index.html", String.class);
        // Swagger UI may return 200 directly or a 302 redirect
        assertThat(response.getStatusCode().value()).isIn(200, 302);
    }
}
@@ -0,0 +1,71 @@
package com.cameleer3.server.app.interceptor;

import com.cameleer3.server.app.AbstractClickHouseIT;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Integration tests for the protocol version interceptor.
 */
class ProtocolVersionIT extends AbstractClickHouseIT {

    @Autowired
    private TestRestTemplate restTemplate;

    @Test
    void requestWithoutProtocolHeaderReturns400() {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        var entity = new HttpEntity<>("{}", headers);

        var response = restTemplate.exchange(
                "/api/v1/data/executions", HttpMethod.POST, entity, String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(400);
        assertThat(response.getBody()).contains("Missing or unsupported X-Cameleer-Protocol-Version header");
    }

    @Test
    void requestWithWrongProtocolVersionReturns400() {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "2");
        var entity = new HttpEntity<>("{}", headers);

        var response = restTemplate.exchange(
                "/api/v1/data/executions", HttpMethod.POST, entity, String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(400);
    }

    @Test
    void requestWithCorrectProtocolVersionPassesInterceptor() {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("X-Cameleer-Protocol-Version", "1");
        var entity = new HttpEntity<>("{}", headers);

        var response = restTemplate.exchange(
                "/api/v1/data/executions", HttpMethod.POST, entity, String.class);
        // The interceptor should NOT reject this request (not 400 from interceptor).
        // Without the controller (Plan 01-02), this will be 404 -- which is fine.
        assertThat(response.getStatusCode().value()).isNotEqualTo(400);
    }

    @Test
    void healthEndpointExcludedFromInterceptor() {
        var response = restTemplate.getForEntity("/api/v1/health", String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(200);
    }

    @Test
    void apiDocsExcludedFromInterceptor() {
        var response = restTemplate.getForEntity("/api/v1/api-docs", String.class);
        assertThat(response.getStatusCode().value()).isEqualTo(200);
    }
}
11
cameleer3-server-app/src/test/resources/application-test.yml
Normal file
@@ -0,0 +1,11 @@
spring:
  datasource:
    url: jdbc:ch://placeholder:8123/cameleer3
    username: default
    password: ""
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver

ingestion:
  buffer-capacity: 100
  batch-size: 10
  flush-interval-ms: 100
@@ -23,6 +23,10 @@
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </dependency>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter</artifactId>
@@ -0,0 +1,115 @@
package com.cameleer3.server.core.ingestion;

import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;

import java.util.List;

/**
 * Routes incoming data to the appropriate {@link WriteBuffer} instances.
 * <p>
 * This is a plain class (no Spring annotations) -- it lives in the core module
 * and is wired as a bean by the app module configuration.
 */
public class IngestionService {

    private final WriteBuffer<RouteExecution> executionBuffer;
    private final WriteBuffer<RouteGraph> diagramBuffer;
    private final WriteBuffer<MetricsSnapshot> metricsBuffer;

    public IngestionService(WriteBuffer<RouteExecution> executionBuffer,
                            WriteBuffer<RouteGraph> diagramBuffer,
                            WriteBuffer<MetricsSnapshot> metricsBuffer) {
        this.executionBuffer = executionBuffer;
        this.diagramBuffer = diagramBuffer;
        this.metricsBuffer = metricsBuffer;
    }

    /**
     * Accept a batch of route executions into the buffer.
     *
     * @return true if all items were buffered, false if buffer is full (backpressure)
     */
    public boolean acceptExecutions(List<RouteExecution> executions) {
        return executionBuffer.offerBatch(executions);
    }

    /**
     * Accept a single route execution into the buffer.
     *
     * @return true if the item was buffered, false if buffer is full (backpressure)
     */
    public boolean acceptExecution(RouteExecution execution) {
        return executionBuffer.offer(execution);
    }

    /**
     * Accept a single route diagram into the buffer.
     *
     * @return true if the item was buffered, false if buffer is full (backpressure)
     */
    public boolean acceptDiagram(RouteGraph graph) {
        return diagramBuffer.offer(graph);
    }

    /**
     * Accept a batch of route diagrams into the buffer.
     *
     * @return true if all items were buffered, false if buffer is full (backpressure)
     */
    public boolean acceptDiagrams(List<RouteGraph> graphs) {
        return diagramBuffer.offerBatch(graphs);
    }

    /**
     * Accept a batch of metrics snapshots into the buffer.
     *
     * @return true if all items were buffered, false if buffer is full (backpressure)
     */
    public boolean acceptMetrics(List<MetricsSnapshot> metrics) {
        return metricsBuffer.offerBatch(metrics);
    }

    /**
     * @return current number of items in the execution buffer
     */
    public int getExecutionBufferDepth() {
        return executionBuffer.size();
    }

    /**
     * @return current number of items in the diagram buffer
     */
    public int getDiagramBufferDepth() {
        return diagramBuffer.size();
    }

    /**
     * @return current number of items in the metrics buffer
     */
    public int getMetricsBufferDepth() {
        return metricsBuffer.size();
    }

    /**
     * @return the execution write buffer (for use by flush scheduler)
     */
    public WriteBuffer<RouteExecution> getExecutionBuffer() {
        return executionBuffer;
    }

    /**
     * @return the diagram write buffer (for use by flush scheduler)
     */
    public WriteBuffer<RouteGraph> getDiagramBuffer() {
        return diagramBuffer;
    }

    /**
     * @return the metrics write buffer (for use by flush scheduler)
     */
    public WriteBuffer<MetricsSnapshot> getMetricsBuffer() {
        return metricsBuffer;
    }
}
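On the consumer side, a scheduled flush task (planned for a later commit, not part of this one) periodically drains each buffer and hands the batch to the matching repository for a single ClickHouse insert. A minimal sketch of one flush pass under the `batch-size: 10` setting from `application-test.yml` — `FlushSketch` and the stand-in `insertBatch` are illustrative names, not from this changeset:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch of the flush loop that pairs with the ingestion buffers.
public class FlushSketch {
    static final ArrayBlockingQueue<String> executionBuffer = new ArrayBlockingQueue<>(100);
    static int inserted = 0;

    // Stand-in for ExecutionRepository.insertBatch: one batch insert per call.
    static void insertBatch(List<String> batch) {
        inserted += batch.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 25; i++) {
            executionBuffer.offer("exec-" + i);
        }

        int batchSize = 10; // mirrors ingestion.batch-size
        // One flush pass: drain up to batchSize items per insert until empty.
        // In the real server a scheduler repeats this every flush-interval-ms.
        List<String> batch = new ArrayList<>(batchSize);
        while (executionBuffer.drainTo(batch, batchSize) > 0) {
            insertBatch(batch); // 10 + 10 + 5 items across three inserts
            batch.clear();
        }

        System.out.println(inserted);               // 25
        System.out.println(executionBuffer.size()); // 0
    }
}
```

The bounded `drainTo` keeps each ClickHouse insert at a predictable size regardless of how fast agents post data.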
@@ -0,0 +1,80 @@
package com.cameleer3.server.core.ingestion;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Bounded write buffer that decouples HTTP ingestion from ClickHouse batch inserts.
 * <p>
 * Items are offered to the buffer by controllers and drained in batches by a
 * scheduled flush task. When the buffer is full, {@link #offer} returns false,
 * signaling the caller to apply backpressure (HTTP 503).
 *
 * @param <T> the type of items buffered
 */
public class WriteBuffer<T> {

    private final BlockingQueue<T> queue;
    private final int capacity;

    public WriteBuffer(int capacity) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Offer a single item to the buffer.
     *
     * @return true if the item was added, false if the buffer is full
     */
    public synchronized boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * Offer a batch of items with all-or-nothing semantics.
     * If the buffer does not have enough remaining capacity for the entire batch,
     * no items are added and false is returned.
     * <p>
     * Synchronized together with {@link #offer}: without mutual exclusion, a
     * concurrent producer could fill slots between the capacity check and the
     * inserts, breaking the all-or-nothing guarantee. Draining never reduces
     * capacity, so {@link #drain} need not hold the lock.
     *
     * @return true if all items were added, false if insufficient capacity
     */
    public synchronized boolean offerBatch(List<T> items) {
        if (queue.remainingCapacity() < items.size()) {
            return false;
        }
        for (T item : items) {
            queue.offer(item);
        }
        return true;
    }

    /**
     * Drain up to {@code maxBatch} items from the buffer.
     * Called by the scheduled flush task.
     *
     * @return list of drained items (may be empty)
     */
    public List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public int size() {
        return queue.size();
    }

    public int capacity() {
        return capacity;
    }

    public boolean isFull() {
        return queue.remainingCapacity() == 0;
    }

    public int remainingCapacity() {
        return queue.remainingCapacity();
    }
}
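The contract WriteBuffer encodes — a non-blocking offer whose `false` result maps to HTTP 503, and a bounded drain on the flush side — can be sketched directly against the JDK type it wraps. Variable names below are illustrative, not from the changeset:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the producer/consumer contract behind WriteBuffer,
// shown against the underlying ArrayBlockingQueue.
public class BackpressureSketch {
    public static void main(String[] args) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(3);

        // Producer side (HTTP controller): offer() never blocks; a false
        // return signals a full buffer, which the controller maps to 503.
        boolean accepted = queue.offer("exec-1")
                && queue.offer("exec-2")
                && queue.offer("exec-3");
        boolean overflow = queue.offer("exec-4"); // buffer full: false

        // Consumer side (flush task): drainTo() removes up to maxBatch
        // items in one call, giving one bounded ClickHouse batch insert.
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, 2);

        System.out.println(accepted);     // true
        System.out.println(overflow);     // false
        System.out.println(batch.size()); // 2
        System.out.println(queue.size()); // 1
    }
}
```

Because neither side ever blocks, a slow ClickHouse cannot stall agent HTTP threads; the cost is that overloaded agents must retry after a 503.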
@@ -0,0 +1,26 @@
package com.cameleer3.server.core.storage;

import com.cameleer3.common.graph.RouteGraph;

import java.util.Optional;

/**
 * Repository for route diagram storage with content-hash deduplication.
 */
public interface DiagramRepository {

    /**
     * Store a route graph. Uses content-hash deduplication via ReplacingMergeTree.
     */
    void store(RouteGraph graph);

    /**
     * Find a route graph by its content hash.
     */
    Optional<RouteGraph> findByContentHash(String contentHash);

    /**
     * Find the content hash for the latest diagram of a given route and agent.
     */
    Optional<String> findContentHashForRoute(String routeId, String agentId);
}
@@ -0,0 +1,17 @@
package com.cameleer3.server.core.storage;

import com.cameleer3.common.model.RouteExecution;

import java.util.List;

/**
 * Repository for route execution batch inserts into ClickHouse.
 */
public interface ExecutionRepository {

    /**
     * Insert a batch of route executions.
     * Implementations must perform a single batch insert for efficiency.
     */
    void insertBatch(List<RouteExecution> executions);
}
@@ -0,0 +1,17 @@
package com.cameleer3.server.core.storage;

import com.cameleer3.server.core.storage.model.MetricsSnapshot;

import java.util.List;

/**
 * Repository for agent metrics batch inserts into ClickHouse.
 */
public interface MetricsRepository {

    /**
     * Insert a batch of metrics snapshots.
     * Implementations must perform a single batch insert for efficiency.
     */
    void insertBatch(List<MetricsSnapshot> metrics);
}
@@ -0,0 +1,16 @@
package com.cameleer3.server.core.storage.model;

import java.time.Instant;
import java.util.Map;

/**
 * A single metrics data point from an agent.
 */
public record MetricsSnapshot(
        String agentId,
        Instant collectedAt,
        String metricName,
        double metricValue,
        Map<String, String> tags
) {
}
@@ -0,0 +1,99 @@
package com.cameleer3.server.core.ingestion;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.*;

class WriteBufferTest {

    private WriteBuffer<String> buffer;

    @BeforeEach
    void setUp() {
        buffer = new WriteBuffer<>(10);
    }

    @Test
    void offerSucceedsUntilCapacity() {
        for (int i = 0; i < 10; i++) {
            assertTrue(buffer.offer("item-" + i), "offer should succeed for item " + i);
        }
        assertEquals(10, buffer.size());
    }

    @Test
    void offerReturnsFalseWhenFull() {
        for (int i = 0; i < 10; i++) {
            buffer.offer("item-" + i);
        }
        assertFalse(buffer.offer("overflow"), "offer should return false when buffer is full");
    }

    @Test
    void offerBatchSucceedsWhenCapacitySufficient() {
        List<String> batch = List.of("a", "b", "c");
        assertTrue(buffer.offerBatch(batch));
        assertEquals(3, buffer.size());
    }

    @Test
    void offerBatchReturnsFalseWithoutPartialInsertWhenOverflow() {
        for (int i = 0; i < 8; i++) {
            buffer.offer("item-" + i);
        }
        // Buffer has 2 remaining, batch of 3 should fail entirely
        List<String> batch = List.of("a", "b", "c");
        assertFalse(buffer.offerBatch(batch));
        assertEquals(8, buffer.size(), "no items should have been added on failed batch");
    }

    @Test
    void drainReturnsItemsAndRemovesThem() {
        buffer.offer("a");
        buffer.offer("b");
        buffer.offer("c");

        List<String> drained = buffer.drain(2);
        assertEquals(2, drained.size());
        assertEquals(1, buffer.size());
    }

    @Test
    void drainWithEmptyQueueReturnsEmptyList() {
        List<String> drained = buffer.drain(5);
        assertTrue(drained.isEmpty());
    }

    @Test
    void isFullReturnsTrueAtCapacity() {
        assertFalse(buffer.isFull());
        for (int i = 0; i < 10; i++) {
            buffer.offer("item-" + i);
        }
        assertTrue(buffer.isFull());
    }

    @Test
    void sizeTracksCurrentDepth() {
        assertEquals(0, buffer.size());
        buffer.offer("a");
        assertEquals(1, buffer.size());
        buffer.drain(1);
        assertEquals(0, buffer.size());
    }

    @Test
    void capacityReturnsConfiguredCapacity() {
        assertEquals(10, buffer.capacity());
    }

    @Test
    void remainingCapacityDecreasesWithOffers() {
        assertEquals(10, buffer.remainingCapacity());
        buffer.offer("a");
        assertEquals(9, buffer.remainingCapacity());
    }
}
57
clickhouse/init/01-schema.sql
Normal file
@@ -0,0 +1,57 @@
-- Cameleer3 ClickHouse Schema
-- Tables for route executions, route diagrams, and agent metrics.

CREATE TABLE IF NOT EXISTS route_executions (
    execution_id String,
    route_id LowCardinality(String),
    agent_id LowCardinality(String),
    status LowCardinality(String),
    start_time DateTime64(3, 'UTC'),
    end_time Nullable(DateTime64(3, 'UTC')),
    duration_ms UInt64,
    correlation_id String,
    exchange_id String,
    error_message String DEFAULT '',
    error_stacktrace String DEFAULT '',
    -- Nested processor executions stored as parallel arrays
    processor_ids Array(String),
    processor_types Array(LowCardinality(String)),
    processor_starts Array(DateTime64(3, 'UTC')),
    processor_ends Array(DateTime64(3, 'UTC')),
    processor_durations Array(UInt64),
    processor_statuses Array(LowCardinality(String)),
    -- Metadata
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC'),
    -- Skip indexes
    INDEX idx_correlation correlation_id TYPE bloom_filter GRANULARITY 4,
    INDEX idx_error error_message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(start_time)
ORDER BY (agent_id, status, start_time, execution_id)
TTL toDateTime(start_time) + toIntervalDay(30)
SETTINGS ttl_only_drop_parts = 1;

CREATE TABLE IF NOT EXISTS route_diagrams (
    content_hash String,
    route_id LowCardinality(String),
    agent_id LowCardinality(String),
    definition String,
    created_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (content_hash);

CREATE TABLE IF NOT EXISTS agent_metrics (
    agent_id LowCardinality(String),
    collected_at DateTime64(3, 'UTC'),
    metric_name LowCardinality(String),
    metric_value Float64,
    tags Map(String, String),
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(collected_at)
ORDER BY (agent_id, metric_name, collected_at)
TTL toDateTime(collected_at) + toIntervalDay(30)
SETTINGS ttl_only_drop_parts = 1;
20
docker-compose.yml
Normal file
@@ -0,0 +1,20 @@
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./clickhouse/init:/docker-entrypoint-initdb.d
    environment:
      CLICKHOUSE_USER: cameleer
      CLICKHOUSE_PASSWORD: cameleer_dev
      CLICKHOUSE_DB: cameleer3
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse-data: