.planning/research/SUMMARY.md

# Research Summary: Cameleer Server

**Domain:** Transaction observability server for Apache Camel integrations
**Researched:** 2026-03-11
**Overall confidence:** MEDIUM (established domain with mature patterns; version numbers unverified against live sources)

## Executive Summary

Cameleer Server is a write-heavy, read-occasional observability system that receives millions of transaction records per day from distributed Apache Camel agents, stores them with 30-day retention, and provides structured + full-text search. The architecture closely parallels established observability platforms like Jaeger, Zipkin, and njams Server, with the key differentiator being Camel route diagram visualization tied to individual transactions.

The recommended stack centers on **ClickHouse** as the primary data store. ClickHouse's columnar MergeTree engine provides the exact properties this project needs: massive batch insert throughput, excellent time-range query performance, native TTL-based retention, and 10-20x compression on structured observability data. This is a well-established pattern used by production observability platforms (SigNoz, Uptrace, PostHog all run on ClickHouse).

For full-text search, the recommendation is a **phased approach**: start with ClickHouse's built-in token bloom filter skip indexes (`tokenbf_v1`), which handle exact-token search (correlation IDs, error messages, specific values) well enough for MVP. When query patterns demand fuzzy matching or relevance scoring, add **OpenSearch** as a secondary search index. The architecture should be designed from the start to allow this swap transparently via the repository abstraction in the core module.
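
As a minimal sketch of the exact-token approach, the helper below splits user input into tokens and builds one `hasToken` predicate per token, since `hasToken` expects a single token as its needle. The class name `TokenSearch`, the column name passed in, and the tokenization regex (splitting on ASCII characters that are not letters or digits, approximating how `tokenbf_v1` tokenizes) are assumptions for illustration, not part of the researched design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: turns free-text input into hasToken() predicates. */
final class TokenSearch {
    private TokenSearch() {}

    /** Splits on ASCII characters that are neither letters nor digits (assumed tokenbf_v1-like). */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        for (String part : input.split("[\\p{ASCII}&&[^A-Za-z0-9]]+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    /**
     * Builds "hasToken(col, ?) AND hasToken(col, ?)" with one placeholder per token.
     * The column name must be a trusted constant, never user input.
     */
    static String whereClause(String column, List<String> tokens) {
        if (tokens.isEmpty()) {
            return "1 = 1"; // no tokens means no text filter
        }
        return tokens.stream()
                .map(t -> "hasToken(" + column + ", ?)")
                .collect(Collectors.joining(" AND "));
    }
}
```

The token values are bound as query parameters rather than concatenated, so the same predicate builder could sit behind the core repository interface and later be replaced by an OpenSearch-backed implementation.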

The critical architectural pattern is an **in-memory write buffer** between the HTTP ingestion endpoint and ClickHouse. ClickHouse performs best with batch inserts of 1K-10K rows; individual row inserts are the single most common and most damaging mistake when building on ClickHouse. The buffer also provides the backpressure mechanism (HTTP 429) that prevents the server from being overwhelmed during agent reconnection storms.
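
The buffer described above can be sketched as a bounded queue with a non-blocking `offer` and a batch drain. This is a sketch under stated assumptions: `WriteBuffer`, its method names, and the capacity handling are hypothetical, and the flush scheduling and the actual ClickHouse insert are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded in-memory buffer; the class and method names are hypothetical. */
final class WriteBuffer<T> {
    private final BlockingQueue<T> queue;

    WriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue. A false return is the caller's signal to answer with HTTP 429. */
    boolean offer(T record) {
        return queue.offer(record);
    }

    /** Removes up to maxBatchSize records for one batch insert; may return an empty list. */
    List<T> drainBatch(int maxBatchSize) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    int size() {
        return queue.size();
    }
}
```

An ingestion controller would call `offer` per record and map `false` to a 429 response, while a scheduled task would call `drainBatch` with a size in the 1K-10K range and hand the result to the repository as one insert.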

The two-module structure (core for domain logic + interfaces, app for Spring Boot wiring + implementations) enforces clean boundaries. Core defines repository interfaces, service implementations, and the write buffer. App provides ClickHouse repository implementations, Spring SseEmitter integration, REST controllers, and security filters. The boundary rule is strict: app depends on core, never the reverse.
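
A compact illustration of that boundary, with all type names hypothetical: the record, the repository interface, and the service would live in core, while the implementation would live in app. An in-memory implementation stands in for the ClickHouse one here so the sketch has no external dependencies.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// --- core module (hypothetical names): domain type, repository interface, service ---
record TransactionRecord(String id, Instant startedAt, String state) {}

interface TransactionRepository {
    void insertBatch(List<TransactionRecord> batch);
}

final class IngestionService {
    private final TransactionRepository repository;

    IngestionService(TransactionRepository repository) {
        this.repository = repository;
    }

    /** Forwards a non-empty batch to whatever storage the app module wired in. */
    void flush(List<TransactionRecord> batch) {
        if (!batch.isEmpty()) {
            repository.insertBatch(batch);
        }
    }
}

// --- app module: a ClickHouse-backed implementation would go here; in-memory stand-in ---
final class InMemoryTransactionRepository implements TransactionRepository {
    final List<TransactionRecord> stored = new ArrayList<>();

    @Override
    public void insertBatch(List<TransactionRecord> batch) {
        stored.addAll(batch);
    }
}
```

Because `IngestionService` only sees the interface, core code can be unit-tested without ClickHouse or Spring on the classpath.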

## Key Findings

**Stack:** Java 17 / Spring Boot 3.4.3 + ClickHouse (primary store) + ClickHouse skip indexes for text search (initial search stage), OpenSearch optional (later stage, if needed) + Caffeine cache + springdoc-openapi + Auth0 java-jwt. No Kafka, no Elasticsearch, no JPA.

**Architecture:** Write-heavy CQRS-lite with three data paths: (1) buffered ingestion pipeline to ClickHouse, (2) query engine combining structured ClickHouse queries with text search, (3) SSE connection registry for agent push. Repository abstraction keeps core module storage-agnostic. Content-addressable diagram versioning with async pre-rendering.
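
Content-addressable versioning can be sketched as storing each diagram definition under the hash of its content, so identical definitions collapse to one entry. The sketch assumes SHA-256 as the hash and a string-serialized definition; `DiagramStore` and its methods are hypothetical names, and a real implementation would persist to the database rather than a map.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory store keyed by the SHA-256 digest of the diagram definition. */
final class DiagramStore {
    private final Map<String, String> byHash = new ConcurrentHashMap<>();

    /** Stores the definition under its content hash and returns that hash as the version id. */
    String put(String definition) {
        String hash = sha256Hex(definition);
        byHash.putIfAbsent(hash, definition); // identical content is stored only once
        return hash;
    }

    String get(String hash) {
        return byHash.get(hash);
    }

    int size() {
        return byHash.size();
    }

    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}
```

A transaction would then reference its diagram by the returned hash, which also gives the async renderer a stable cache key.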

**Critical pitfall:** Row-by-row ClickHouse inserts and wrong ORDER BY design. Row-by-row inserts create a new data part per insert and will make the system fail within hours under load; a wrong ORDER BY cannot be fixed without recreating the table. Batch buffering and schema design must be correct from the first implementation.

## Implications for Roadmap

Based on research, suggested phase structure:

1. **Foundation + Ingestion Pipeline** - Data model, ClickHouse schema design, batch write buffer, ingestion endpoint
   - Addresses: Transaction ingestion, storage with TTL retention
   - Avoids: Row-by-row inserts, wrong ORDER BY, no backpressure
   - This phase needs careful design; ClickHouse ORDER BY and partition strategy are nearly impossible to change later

2. **Transaction Query + API** - Query engine, structured filters (time/state/duration), cursor-based pagination, REST controllers
   - Addresses: Core search experience, API-first design
   - Avoids: OFFSET pagination degradation, N+1 queries (prevented by co-locating data access)

3. **Agent Registry + SSE** - Agent lifecycle management (LIVE/STALE/DEAD), heartbeat monitoring, SSE connection registry, config push
   - Addresses: Agent management, real-time server-to-agent communication
   - Avoids: SSE connection leaks, ghost agents, reconnection without Last-Event-ID

4. **Diagram Service** - Content-addressable versioned storage, async rendering, transaction-diagram linking
   - Addresses: Route diagram visualization (key Camel-specific differentiator)
   - Avoids: Duplicate diagram storage (prevented by content hashing), synchronous rendering bottleneck

5. **Security** - JWT authentication, Ed25519 config signing, bootstrap token registration
   - Addresses: Production-ready security
   - Avoids: Token management without rotation
   - Can be partially layered in earlier if needed for integration testing with agents

6. **Full-Text Search** - ClickHouse skip indexes initially; OpenSearch integration if bloom filters prove insufficient
   - Addresses: "Find any transaction by content" requirement
   - Avoids: Using LIKE/hasToken on large text columns without proper indexing
   - Decision point: ClickHouse bloom filters may suffice; evaluate before adding OpenSearch

7. **Dashboard + Aggregations** - Overview charts, error rates, volume trends using ClickHouse aggregation queries
   - Addresses: At-a-glance operational awareness

8. **Web UI** - Frontend consuming the REST API exclusively
   - Addresses: User-facing interface
   - Must come after API is stable per API-first principle
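
The cursor-based pagination named in phase 2 could look roughly like the sketch below: the cursor carries the sort key of the last row returned, and the next page asks for rows strictly before it. The cursor layout, the record and class names, and the table and column names in the SQL are all assumptions; the real schema depends on the agent protocol.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical opaque cursor: the (startedAt, id) sort key of the last row on a page. */
record PageCursor(long startedAtMillis, String id) {

    String encode() {
        String raw = startedAtMillis + ":" + id;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static PageCursor decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        int sep = raw.indexOf(':'); // split on the first colon so ids may contain colons
        if (sep < 0) {
            throw new IllegalArgumentException("malformed cursor");
        }
        return new PageCursor(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
    }
}

final class KeysetQuery {
    private KeysetQuery() {}

    /** Newest-first page after a cursor; table and column names are placeholders. */
    static final String NEXT_PAGE_SQL = """
            SELECT id, started_at, state
            FROM transactions
            WHERE (started_at, id) < (?, ?)
            ORDER BY started_at DESC, id DESC
            LIMIT ?
            """;
}
```

Unlike OFFSET, the cost of this query does not grow with how deep the client has paged, because the filter starts at the cursor position instead of skipping rows.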

**Phase ordering rationale:**
- Storage before query: you need data to query
- Ingestion before agents: agents need somewhere to POST
- Query before full-text: structured search first, text layers on top
- Agent registry before config push: must know who to push to
- Diagrams after query engine: transactions must exist to link diagrams to
- Security is cross-cutting but cleanest after core flows work
- UI last because API-first means the API must be stable first
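
For the agent registry, the LIVE/STALE/DEAD lifecycle from phase 3 can be derived from the age of the last heartbeat. The sketch below assumes two configurable thresholds; the class name and the idea of computing state on read (rather than via a background sweeper) are illustrative choices, not findings from the research.

```java
import java.time.Duration;
import java.time.Instant;

enum AgentState { LIVE, STALE, DEAD }

/** Hypothetical classifier: maps heartbeat silence to a lifecycle state. */
final class AgentLifecycle {
    private final Duration staleAfter;
    private final Duration deadAfter;

    AgentLifecycle(Duration staleAfter, Duration deadAfter) {
        if (deadAfter.compareTo(staleAfter) <= 0) {
            throw new IllegalArgumentException("deadAfter must be longer than staleAfter");
        }
        this.staleAfter = staleAfter;
        this.deadAfter = deadAfter;
    }

    /** Returns the state implied by how long the agent has been silent at time 'now'. */
    AgentState stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(deadAfter) >= 0) {
            return AgentState.DEAD;
        }
        if (silence.compareTo(staleAfter) >= 0) {
            return AgentState.STALE;
        }
        return AgentState.LIVE;
    }
}
```

Passing `now` explicitly keeps the classifier deterministic in tests; a registry could call it on each lookup or from a periodic sweep that also closes SSE connections of DEAD agents.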

**Research flags for phases:**
- Phase 1 (Storage): NEEDS DEEPER RESEARCH -- ClickHouse Java client API, optimal ORDER BY for the specific query patterns, Docker configuration
- Phase 4 (Diagrams): NEEDS DEEPER RESEARCH -- server-side graph rendering library selection (Batik, jsvg, JGraphX, or client-side rendering)
- Phase 6 (Full-Text): NEEDS DEEPER RESEARCH -- ClickHouse skip index capabilities vs OpenSearch integration complexity; decision point
- Phase 8 (UI): NEEDS DEEPER RESEARCH -- frontend framework selection
- Phase 2 (Query): Standard patterns, unlikely to need research
- Phase 5 (Security): Standard patterns, unlikely to need research
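
Part of why phase 5 counts as standard is that Ed25519 signing is available in the JDK itself (Java 15 and later), so config signing needs no extra library. The sketch below signs and verifies a config payload as a UTF-8 string; `ConfigSigner`, the string payload format, and generating the key pair in-process are assumptions for illustration, and key storage and rotation are out of scope here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical helper around the JDK's built-in Ed25519 support. */
final class ConfigSigner {
    private ConfigSigner() {}

    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }

    /** Server side: signs the config payload with the private key. */
    static byte[] sign(PrivateKey key, String config) {
        try {
            Signature signature = Signature.getInstance("Ed25519");
            signature.initSign(key);
            signature.update(config.getBytes(StandardCharsets.UTF_8));
            return signature.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    /** Agent side: checks the payload against the signature using the public key. */
    static boolean verify(PublicKey key, String config, byte[] signatureBytes) {
        try {
            Signature signature = Signature.getInstance("Ed25519");
            signature.initVerify(key);
            signature.update(config.getBytes(StandardCharsets.UTF_8));
            return signature.verify(signatureBytes);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("verification failed", e);
        }
    }
}
```

An agent holding only the public key can then reject any pushed config whose signature does not verify.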

## Confidence Assessment

| Area | Confidence | Notes |
|------|------------|-------|
| Stack (ClickHouse choice) | HIGH | Well-established pattern for observability; used by SigNoz, Uptrace, PostHog |
| Stack (version numbers) | LOW | Could not verify against live sources; all versions from training data (May 2025 cutoff) |
| Features | MEDIUM | Based on domain knowledge of njams, Jaeger, Zipkin; could not verify latest feature trends |
| Architecture | MEDIUM | Patterns are well-established; batch buffer, SSE registry, content-addressable storage are standard |
| Pitfalls | HIGH | ClickHouse pitfalls are well-documented; SSE lifecycle issues are common; ingestion backpressure is standard |
| Full-text search approach | MEDIUM | ClickHouse skip indexes vs OpenSearch is a legitimate decision point that needs hands-on evaluation |

## Gaps to Address

- **ClickHouse Java client API:** The clickhouse-java library has undergone significant changes. Exact API, connection pooling, and Spring Boot integration patterns need phase-specific research
- **cameleer-common PROTOCOL.md:** Must read the agent protocol definition before designing ClickHouse schema -- this defines the exact data structures being ingested
- **ClickHouse Docker setup:** Optimal ClickHouse Docker configuration (memory limits, merge settings) for development and production
- **Full-text search decision:** ClickHouse skip indexes may or may not meet the "search by any content" requirement. This needs prototyping with realistic data
- **Diagram rendering library:** Server-side route diagram rendering is a significant unknown; needs prototyping with actual Camel route graph data from cameleer-common
- **Frontend framework:** No research on UI technology -- deferred to UI phase
- **Agent protocol stability:** The cameleer-common protocol is still evolving. Schema evolution strategy needs alignment with agent development