Research Summary: Cameleer3 Server
Domain: Transaction observability server for Apache Camel integrations
Researched: 2026-03-11
Overall confidence: MEDIUM (established domain with mature patterns; version numbers unverified against live sources)
Executive Summary
Cameleer3 Server is a write-heavy, read-occasional observability system that receives millions of transaction records per day from distributed Apache Camel agents, stores them with 30-day retention, and provides structured + full-text search. The architecture closely parallels established observability platforms like Jaeger, Zipkin, and njams Server, with the key differentiator being Camel route diagram visualization tied to individual transactions.
The recommended stack centers on ClickHouse as the primary data store. ClickHouse's columnar MergeTree engine provides the exact properties this project needs: massive batch insert throughput, excellent time-range query performance, native TTL-based retention, and 10-20x compression on structured observability data. This is a well-established pattern used by production observability platforms (SigNoz, Uptrace, PostHog all run on ClickHouse).
For full-text search, the recommendation is a phased approach: start with ClickHouse's built-in token bloom filter skip indexes (tokenbf_v1), which handle exact-token search (correlation IDs, error messages, specific values) well enough for MVP. When query patterns demand fuzzy matching or relevance scoring, add OpenSearch as a secondary search index. The architecture should be designed from the start to allow this swap transparently via the repository abstraction in the core module.
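The pieces above come together in the table definition. The following ClickHouse DDL is an illustrative sketch only: table, column, and index parameter values are assumptions, not the project schema, which must follow the cameleer3-common protocol.

```sql
-- Illustrative only: names and tuning values are assumptions, not the real schema.
CREATE TABLE transactions
(
    ts             DateTime64(3),
    agent_id       LowCardinality(String),
    state          LowCardinality(String),
    correlation_id String,
    duration_ms    UInt32,
    payload        String,
    -- Phase-1 text search: token bloom filter skip index over the payload
    INDEX payload_tokens payload TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toDate(ts)
ORDER BY (agent_id, ts)
TTL toDateTime(ts) + INTERVAL 30 DAY;   -- native 30-day retention

-- Exact-token search can use the bloom filter instead of scanning payload:
SELECT *
FROM transactions
WHERE hasToken(payload, 'TimeoutException')
  AND ts >= now() - INTERVAL 1 HOUR;
```

Note that `ORDER BY` and `PARTITION BY` are effectively frozen at creation time, which is why the schema design belongs in phase 1.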
The critical architectural pattern is an in-memory write buffer between the HTTP ingestion endpoint and ClickHouse. ClickHouse performs best with batch inserts of 1K-10K rows; individual row inserts are the single most common and most damaging mistake when building on ClickHouse. The buffer also provides the backpressure mechanism (HTTP 429) that prevents the server from being overwhelmed during agent reconnection storms.
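A minimal sketch of that buffer, under the assumption of a hypothetical `WriteBuffer` class (not the project's actual code): a bounded queue whose rejection signal maps to HTTP 429, drained in ClickHouse-friendly batches by a background flush loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the ingestion buffer; capacity and batch size are illustrative.
public class WriteBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;

    public WriteBuffer(int capacity, int batchSize) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
    }

    /** Non-blocking accept; false means "buffer full" and the HTTP layer answers 429. */
    public boolean offer(T record) {
        return queue.offer(record);
    }

    /** Drains up to one batch; the flush loop turns this into a single batch INSERT. */
    public List<T> drainBatch() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        return batch;
    }
}
```

The flush loop would call `drainBatch()` on a timer (or when a batch fills) and issue one ClickHouse insert per batch, never per row.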
The two-module structure (core for domain logic + interfaces, app for Spring Boot wiring + implementations) enforces clean boundaries. Core defines repository interfaces, service implementations, and the write buffer. App provides ClickHouse repository implementations, Spring SseEmitter integration, REST controllers, and security filters. The boundary rule is strict: app depends on core, never the reverse.
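The boundary can be sketched as a port-and-adapter pair; all names here are illustrative, not the project's actual API. Core owns the interface, app owns the ClickHouse adapter, and an in-memory stand-in keeps core-module tests storage-free.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// core module: defines the port; knows nothing about ClickHouse.
interface TransactionRepository {
    void saveBatch(List<TransactionRecord> batch);
    List<TransactionRecord> findByTimeRange(Instant from, Instant to);
}

record TransactionRecord(String id, Instant timestamp, String state) {}

// In-memory stand-in, useful for core-module tests. The app module would
// supply the real adapter, e.g. ClickHouseTransactionRepository.
class InMemoryTransactionRepository implements TransactionRepository {
    private final List<TransactionRecord> store = new ArrayList<>();

    public void saveBatch(List<TransactionRecord> batch) {
        store.addAll(batch);
    }

    public List<TransactionRecord> findByTimeRange(Instant from, Instant to) {
        return store.stream()
                .filter(r -> !r.timestamp().isBefore(from) && r.timestamp().isBefore(to))
                .toList();
    }
}
```

Because core only sees the interface, swapping or augmenting the storage backend (e.g. adding OpenSearch in phase 2+) never touches domain logic.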
Key Findings
- Stack: Java 17 / Spring Boot 3.4.3 + ClickHouse (primary store) + ClickHouse skip indexes for text search (phase 1), OpenSearch optional (phase 2+) + Caffeine cache + springdoc-openapi + Auth0 java-jwt. No Kafka, no Elasticsearch, no JPA.
- Architecture: Write-heavy CQRS-lite with three data paths: (1) buffered ingestion pipeline to ClickHouse, (2) query engine combining structured ClickHouse queries with text search, (3) SSE connection registry for agent push. Repository abstraction keeps the core module storage-agnostic. Content-addressable diagram versioning with async pre-rendering.
- Critical pitfall: Row-by-row ClickHouse inserts and wrong ORDER BY design. Together, these two mistakes will make the system fail within hours under load, and neither can be fixed without recreating the table. Batch buffering and schema design must be correct from the first implementation.
Implications for Roadmap
Based on research, suggested phase structure:
1. Foundation + Ingestion Pipeline - Data model, ClickHouse schema design, batch write buffer, ingestion endpoint
   - Addresses: Transaction ingestion, storage with TTL retention
   - Avoids: Row-by-row inserts, wrong ORDER BY, no backpressure
   - This phase needs careful design; ClickHouse ORDER BY and partition strategy are nearly impossible to change later
2. Transaction Query + API - Query engine, structured filters (time/state/duration), cursor-based pagination, REST controllers
   - Addresses: Core search experience, API-first design
   - Avoids: OFFSET pagination degradation; N+1 query patterns (mitigated by co-locating data access)
3. Agent Registry + SSE - Agent lifecycle management (LIVE/STALE/DEAD), heartbeat monitoring, SSE connection registry, config push
   - Addresses: Agent management, real-time server-to-agent communication
   - Avoids: SSE connection leaks, ghost agents, reconnection without Last-Event-ID
4. Diagram Service - Content-addressable versioned storage, async rendering, transaction-diagram linking
   - Addresses: Route diagram visualization (key Camel-specific differentiator)
   - Avoids: Duplicate diagram storage (via content hashing), synchronous rendering bottlenecks
5. Security - JWT authentication, Ed25519 config signing, bootstrap token registration
   - Addresses: Production-ready security
   - Avoids: Token management without rotation
   - Can be partially layered in earlier if needed for integration testing with agents
6. Full-Text Search - ClickHouse skip indexes initially; OpenSearch integration if bloom filters prove insufficient
   - Addresses: "Find any transaction by content" requirement
   - Avoids: Using LIKE/hasToken on large text columns without proper indexing
   - Decision point: ClickHouse bloom filters may suffice; evaluate before adding OpenSearch
7. Dashboard + Aggregations - Overview charts, error rates, volume trends using ClickHouse aggregation queries
   - Addresses: At-a-glance operational awareness
8. Web UI - Frontend consuming the REST API exclusively
   - Addresses: User-facing interface
   - Must come after the API is stable, per the API-first principle
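The cursor-based pagination called out in the Transaction Query phase is keyset pagination: the cursor encodes the last row's (timestamp, id), and the next page filters on that tuple instead of using OFFSET. A sketch of an opaque cursor token, with all names hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of an opaque keyset cursor encoding the last-seen (timestampMillis, id).
public class Cursor {
    public final long timestampMillis;
    public final String id;

    public Cursor(long timestampMillis, String id) {
        this.timestampMillis = timestampMillis;
        this.id = id;
    }

    /** URL-safe token handed to the client as the "next page" cursor. */
    public String encode() {
        String raw = timestampMillis + ":" + id;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static Cursor decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        int sep = raw.indexOf(':');
        return new Cursor(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
    }
}
```

The query side then filters with `WHERE (ts, id) < (cursorTs, cursorId) ORDER BY ts DESC, id DESC LIMIT n`, which stays cheap at any page depth, whereas OFFSET re-scans all skipped rows.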
Phase ordering rationale:
- Storage before query: you need data to query
- Ingestion before agents: agents need somewhere to POST
- Query before full-text: structured search first, text layers on top
- Agent registry before config push: must know who to push to
- Diagrams after query engine: transactions must exist to link diagrams to
- Security is cross-cutting but cleanest after core flows work
- UI last because API-first means the API must be stable first
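The agent-registry-before-config-push ordering rests on the SSE connection registry. A framework-free sketch of that registry follows; all names are illustrative, and in the app module the `Connection` would wrap Spring's `SseEmitter`, whose completion and timeout callbacks must call `unregister()` to avoid the connection-leak pitfall.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the SSE connection registry ("who can we push config to right now?").
public class AgentConnectionRegistry {
    public interface Connection {
        void send(String event) throws Exception;
    }

    private final Map<String, Connection> connections = new ConcurrentHashMap<>();

    public void register(String agentId, Connection conn) {
        connections.put(agentId, conn);
    }

    public void unregister(String agentId) {
        connections.remove(agentId);
    }

    /** Pushes one event to one agent; drops the connection on failure so it never goes stale. */
    public boolean push(String agentId, String event) {
        Connection conn = connections.get(agentId);
        if (conn == null) return false;           // ghost or disconnected agent
        try {
            conn.send(event);
            return true;
        } catch (Exception e) {
            unregister(agentId);                  // broken pipe => treat as disconnected
            return false;
        }
    }

    public int liveCount() {
        return connections.size();
    }
}
```

A heartbeat monitor would drive the LIVE/STALE/DEAD transitions by comparing last-seen timestamps against this registry.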
Research flags for phases:
- Phase 1 (Storage): NEEDS DEEPER RESEARCH -- ClickHouse Java client API, optimal ORDER BY for the specific query patterns, Docker configuration
- Phase 4 (Diagrams): NEEDS DEEPER RESEARCH -- server-side graph rendering library selection (Batik, jsvg, JGraphX, or client-side rendering)
- Phase 6 (Full-Text): NEEDS DEEPER RESEARCH -- ClickHouse skip index capabilities vs OpenSearch integration complexity; decision point
- Phase 8 (UI): NEEDS DEEPER RESEARCH -- frontend framework selection
- Phase 2 (Query): Standard patterns, unlikely to need research
- Phase 5 (Security): Standard patterns, unlikely to need research
Confidence Assessment
| Area | Confidence | Notes |
|---|---|---|
| Stack (ClickHouse choice) | HIGH | Well-established pattern for observability; used by SigNoz, Uptrace, PostHog |
| Stack (version numbers) | LOW | Could not verify against live sources; all versions from training data (May 2025 cutoff) |
| Features | MEDIUM | Based on domain knowledge of njams, Jaeger, Zipkin; could not verify latest feature trends |
| Architecture | MEDIUM | Patterns are well-established; batch buffer, SSE registry, content-addressable storage are standard |
| Pitfalls | HIGH | ClickHouse pitfalls are well-documented; SSE lifecycle issues are common; ingestion backpressure is standard |
| Full-text search approach | MEDIUM | ClickHouse skip indexes vs OpenSearch is a legitimate decision point that needs hands-on evaluation |
Gaps to Address
- ClickHouse Java client API: The clickhouse-java library has undergone significant changes. Exact API, connection pooling, and Spring Boot integration patterns need phase-specific research
- cameleer3-common PROTOCOL.md: Must read the agent protocol definition before designing ClickHouse schema -- this defines the exact data structures being ingested
- ClickHouse Docker setup: Optimal ClickHouse Docker configuration (memory limits, merge settings) for development and production
- Full-text search decision: ClickHouse skip indexes may or may not meet the "search by any content" requirement. This needs prototyping with realistic data
- Diagram rendering library: Server-side route diagram rendering is a significant unknown; needs prototyping with actual Camel route graph data from cameleer3-common
- Frontend framework: No research on UI technology -- deferred to UI phase
- Agent protocol stability: The cameleer3-common protocol is still evolving. Schema evolution strategy needs alignment with agent development
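One piece of the Diagram Service is independent of the open rendering-library question: content-addressable versioning. A sketch, assuming a hypothetical `DiagramKeys` helper: the storage key is the SHA-256 of the diagram's canonical definition, so identical route diagrams deduplicate naturally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch of content-addressable diagram keys: identical definitions hash to the
// same key, so re-uploading an unchanged route diagram stores nothing new.
public class DiagramKeys {
    public static String keyFor(String canonicalDiagramDefinition) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalDiagramDefinition.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);  // 64 hex chars (Java 17 HexFormat)
    }
}
```

The caveat is canonicalization: the definition must be serialized deterministically (stable field order, normalized whitespace) before hashing, or semantically identical diagrams will produce different keys.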