| ClickHouse | 24.x+ | Transaction/activity storage | Column-oriented, built for billions of rows, native TTL, excellent time-range queries, MergeTree engine handles millions of inserts/day trivially | MEDIUM |
| clickhouse-java (HTTP) | 0.6.x+ | Java client | Official ClickHouse Java client; HTTP transport is simpler and more reliable than native TCP for Spring Boot apps | MEDIUM |
**Why ClickHouse over alternatives:**
- **vs Elasticsearch/OpenSearch:** ClickHouse is 5-10x more storage-efficient for structured columnar data. For time-series-like transaction data with known schema, ClickHouse drastically outperforms ES on aggregation queries (avg duration, count by state, time bucketing). ES is overkill when you don't need its inverted index for *every* field.
- **vs TimescaleDB:** TimescaleDB is PostgreSQL-based and good for moderate scale, but ClickHouse handles the "millions of inserts per day" tier with less operational overhead. TimescaleDB's row-oriented heritage means a larger storage footprint for wide transaction records unless its native compression is enabled. ClickHouse's columnar compression can reach 10-20x on typical observability data.
- **vs PostgreSQL (plain):** Plain PostgreSQL struggles to combine this insert volume with 30-day retention and fast analytical queries. Partitioning and vacuuming become operational nightmares at this scale.
**ClickHouse key features for this project:**
- **TTL on tables:** `TTL executionDate + INTERVAL 30 DAY` gives automatic 30-day retention with zero application code (expired rows are dropped during background merges, so deletion is not instantaneous)
- **MergeTree engine:** Handles high insert throughput; batch inserts of 10K+ rows are trivial
- **Materialized views:** Pre-aggregate common queries (transactions by state per hour, etc.)
- **Low storage cost:** 10-20x compression means 30 days of millions of transactions fits on a modest disk
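A minimal sketch of what the table definition might look like, held as a Java text block so the application could run it at startup. Only the `executionDate` column and the 30-day TTL come from the list above; the table name `transactions`, the other columns, and the partition and sort keys are assumptions chosen for illustration, not a tuned schema.

```java
/** Sketch: builds the ClickHouse DDL for a hypothetical `transactions` table. */
final class TransactionSchema {

    private TransactionSchema() {}

    /**
     * Returns a CREATE TABLE statement with a row-level TTL.
     * The table name and every column except executionDate are illustrative assumptions.
     */
    static String createTableDdl(int retentionDays) {
        if (retentionDays <= 0) {
            throw new IllegalArgumentException("retentionDays must be positive");
        }
        // MergeTree with daily partitions; the TTL clause handles retention.
        return """
            CREATE TABLE IF NOT EXISTS transactions
            (
                transactionId String,
                executionDate DateTime,
                state         LowCardinality(String),
                durationMs    UInt32
            )
            ENGINE = MergeTree
            PARTITION BY toYYYYMMDD(executionDate)
            ORDER BY (executionDate, transactionId)
            TTL executionDate + INTERVAL %d DAY
            """.formatted(retentionDays);
    }
}
```

Calling `TransactionSchema.createTableDdl(30)` yields a statement ending in the TTL clause quoted above; the string would then be sent to ClickHouse through whichever client the project adopts.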
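| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| OpenSearch | 2.x | Full-text search over payloads, metadata, attributes | True inverted index for arbitrary text search; ClickHouse's full-text is rudimentary | MEDIUM |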
| opensearch-java | 2.x | Java client | Official OpenSearch Java client; works well with Spring Boot | MEDIUM |
**Why a separate search engine instead of ClickHouse alone:**
ClickHouse has token-level bloom filter indexes and `hasToken()`/`LIKE` matching, but these are not true full-text search. For the requirement "search by any content in payloads, metadata, and attributes," you need an inverted index with:
- Tokenization and analysis (stemming, case folding)
- Relevance scoring
- Phrase matching
- Highlighting of matched terms in results
**Why OpenSearch over Elasticsearch:**
- Apache 2.0 licensed (no SSPL concerns for self-hosted deployment)
- Forked from Elasticsearch 7.10.2 and largely API-compatible with Elasticsearch 7.10
- Active development, large community
- OpenSearch Dashboards available if needed later
- No licensing ambiguity for Docker deployment
**Dual-store pattern:**
- ClickHouse = source of truth for structured queries (time range, state, duration, aggregations)
- OpenSearch = search index for full-text queries
- The application writes to both; OpenSearch is indexed asynchronously from an internal queue
- Structured filters (time, state) are applied in ClickHouse; full-text queries in OpenSearch return transaction IDs, then ClickHouse fetches the full records
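The query flow in the last bullet can be sketched as follows. `FullTextIndex`, `TransactionStore`, `TransactionRecord`, and `TransactionSearchService` are hypothetical names introduced for illustration; real implementations would wrap the OpenSearch and ClickHouse clients behind these two interfaces.

```java
import java.time.Instant;
import java.util.List;
import java.util.Set;

/** Hypothetical port for the OpenSearch side: returns matching transaction IDs only. */
interface FullTextIndex {
    Set<String> searchIds(String queryText, int limit);
}

/** Hypothetical port for the ClickHouse side: structured filters plus an ID restriction. */
interface TransactionStore {
    List<TransactionRecord> fetch(Set<String> ids, Instant from, Instant to, String state);
}

/** Illustrative record shape; the real schema is not specified here. */
record TransactionRecord(String id, Instant executionDate, String state, long durationMs) {}

final class TransactionSearchService {
    private final FullTextIndex index;
    private final TransactionStore store;

    TransactionSearchService(FullTextIndex index, TransactionStore store) {
        this.index = index;
        this.store = store;
    }

    /** Full-text match first (IDs only), then fetch full records with structured filters. */
    List<TransactionRecord> search(String queryText, Instant from, Instant to,
                                   String state, int limit) {
        Set<String> ids = index.searchIds(queryText, limit);
        if (ids.isEmpty()) {
            return List.of(); // nothing matched, so skip the ClickHouse round trip
        }
        return store.fetch(ids, from, to, state);
    }
}
```

Keeping both stores behind narrow interfaces also makes the service testable with in-memory fakes before either database is wired in.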
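| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Caffeine | 3.1.x | In-process cache for agent registry, diagram versions, hot config | Fastest JVM cache; zero network overhead; perfect for single-instance start | MEDIUM |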
| Spring Cache (`@Cacheable`) | (Spring Boot) | Cache abstraction | Switch cache backends without code changes | HIGH |
| Redis | 7.x | Distributed cache (Phase 2+, when scaling horizontally) | Shared state across multiple server instances; SSE session coordination | MEDIUM |
**Phased approach:**
1. **Phase 1:** Caffeine only. Single server instance. Agent registry, diagram cache, recent query results all in-process.
2. **Phase 2 (horizontal scaling):** Add Redis for shared state. Agent registry must be consistent across instances. SSE sessions need coordination.
### Message Ingestion: Internal Buffer with Backpressure
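| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| LMAX Disruptor | 4.0.x | High-performance ring buffer for ingestion | Lock-free, single-writer principle, handles burst traffic without blocking HTTP threads | MEDIUM |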
| *Alternative:* `java.util.concurrent.LinkedBlockingQueue` | (JDK) | Simpler bounded queue | Good enough for initial implementation; switch to Disruptor if profiling shows contention | HIGH |
**Why an internal buffer, not Kafka:**
Kafka is the standard answer for "high-volume ingestion," but it adds massive operational complexity for a system that:
- Has a single data producer type (Cameleer agents via HTTP POST)
- Does not need replay from an external topic
- Does not need multi-consumer fan-out
- Is already receiving data via HTTP (not streaming)
The right pattern here: **HTTP POST -> bounded in-memory queue -> batch writer to ClickHouse + async indexer to OpenSearch**. If the queue fills up, return HTTP 503 with a `Retry-After` header; agents should implement exponential backoff.
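A minimal sketch of the bounded buffer, using the JDK `LinkedBlockingQueue` alternative from the table rather than the Disruptor. `IngestionBuffer` and its method names are hypothetical; the capacity, batch size, and wait time are parameters the project would need to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch of a bounded ingestion buffer with non-blocking admission and batch draining. */
final class IngestionBuffer<T> {
    private final BlockingQueue<T> queue;

    IngestionBuffer(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /**
     * Non-blocking enqueue for the HTTP thread.
     * A false return means the buffer is full and the caller should answer 503.
     */
    boolean tryAccept(T item) {
        return queue.offer(item);
    }

    /**
     * Called by the writer thread: waits up to maxWaitMillis for a first item,
     * then drains whatever else is already queued, up to maxBatch items in total.
     */
    List<T> nextBatch(int maxBatch, long maxWaitMillis) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be at least 1");
        }
        List<T> batch = new ArrayList<>(maxBatch);
        try {
            T first = queue.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return batch;
    }

    /** Current queue depth, suitable for exposing as a metric. */
    int depth() {
        return queue.size();
    }
}
```

In a controller, a `false` from `tryAccept` would map to a 503 response carrying `Retry-After`, while a single background thread loops on `nextBatch` and hands each non-empty batch to the ClickHouse writer and the OpenSearch indexer.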
**When to add Kafka:** Only if you need cross-datacenter replication, multi-consumer processing, or stronger delivery guarantees (durability across restarts, replay, exactly-once semantics) than an in-memory buffer can provide. This is a "maybe Phase 3+" decision.
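| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| springdoc-openapi-starter-webmvc-ui | 2.x | OpenAPI 3.1 spec generation + Swagger UI | De facto standard for Spring Boot 3.x API docs; annotation-driven, zero-config for basic setup | MEDIUM |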
**Why springdoc over alternatives:**
- **vs SpringFox:** SpringFox is effectively dead; no Spring Boot 3 support
- **vs manual OpenAPI:** Too much maintenance overhead; springdoc generates from code
- springdoc supports Spring Boot 3.x natively, including Spring Security integration
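| Technology | Version | Purpose | Why | Confidence |
|------------|---------|---------|-----|------------|
| Spring Security | (Spring Boot 3.4.3) | Authentication/authorization framework | Already part of Spring Boot; JWT filter chain, method security | HIGH |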
| java-jwt (Auth0) | 4.x | JWT creation and validation | Lightweight, well-maintained; simpler than Nimbus for this use case | MEDIUM |
| Ed25519 (JDK `java.security`) | (JDK 17) | Config signing | JDK 15+ has native EdDSA support; no external library needed | HIGH |
| JUnit 5 | (Spring Boot) | Unit/integration testing | Already in POM; standard | HIGH |
| Testcontainers | 1.19.x+ | Integration tests with ClickHouse and OpenSearch | Spin up real databases in Docker for tests; no mocking storage layer | MEDIUM |
| Spring Boot Test | (Spring Boot) | Controller/integration testing | `@SpringBootTest`, `MockMvc`, etc. | HIGH |
| Awaitility | 4.2.x | Async testing (SSE, queue processing) | Clean API for testing eventually-consistent behavior | MEDIUM |
| Micrometer | (Spring Boot) | Metrics facade | Built into Spring Boot; exposes ingestion rates, queue depth, query latencies | HIGH |
| Spring Boot Actuator | (Spring Boot) | Health checks, metrics endpoint | `/actuator/health` for Docker health checks, `/actuator/prometheus` for metrics | HIGH |
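The Ed25519 row above relies on the JDK's built-in EdDSA provider (available since JDK 15). A minimal sketch of signing and verifying a config payload with only `java.security` follows; `ConfigSigner` and the sample payload are hypothetical, and key storage and distribution are deliberately left out.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Sketch: Ed25519 sign/verify using only the JDK (requires JDK 15 or newer). */
final class ConfigSigner {

    private ConfigSigner() {}

    /** Generates a fresh Ed25519 key pair; a real deployment would load persisted keys. */
    static KeyPair generateKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 is not available on this JDK", e);
        }
    }

    /** Signs the raw payload bytes and returns the detached signature. */
    static byte[] sign(PrivateKey key, byte[] payload) {
        try {
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(key);
            signer.update(payload);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    /** Returns true only if the signature matches the payload under the given public key. */
    static boolean verify(PublicKey key, byte[] payload, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(key);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }
}
```

The server would hold the private key and sign each config document before pushing it; agents would ship with the public key and reject any config for which `verify` returns false.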
## Supporting Libraries
| Library | Version | Purpose | When to Use | Confidence |