# Phase 1: Ingestion Pipeline + API Foundation - Research

**Researched:** 2026-03-11
**Domain:** ClickHouse batch ingestion, Spring Boot REST API, write buffer with backpressure
**Confidence:** HIGH

## Summary

Phase 1 establishes the data pipeline and API skeleton for Cameleer3 Server. Agents POST execution data, diagrams, and metrics to REST endpoints; the server buffers these in memory and batch-flushes to ClickHouse. The ClickHouse schema design is the most critical and least reversible decision in this phase -- ORDER BY and partitioning cannot be changed without table recreation.

The ClickHouse Java ecosystem has undergone significant changes. The recommended approach is **clickhouse-jdbc v0.9.7** (JDBC V2 driver) with Spring Boot's JdbcTemplate for batch inserts. An alternative is the standalone **client-v2** artifact which offers a POJO-based insert API, but JDBC integration with Spring Boot is more conventional and better documented. ClickHouse now has a native full-text index (TYPE text, GA as of March 2026) that supersedes the older tokenbf_v1 bloom filter approach -- this is relevant for Phase 2 but should be accounted for in schema design now.

**Primary recommendation:** Use clickhouse-jdbc 0.9.7 with Spring JdbcTemplate, ArrayBlockingQueue write buffer with scheduled batch flush, daily partitioning with TTL + ttl_only_drop_parts, and Docker Compose for local ClickHouse. Keep Spring Security out of Phase 1 -- all endpoints open, security layered in Phase 4.

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|-----------------|
| INGST-01 (#1) | Accept RouteExecution via POST /api/v1/data/executions, return 202 | REST controller + async write buffer pattern; Jackson deserialization of cameleer3-common models |
| INGST-02 (#2) | Accept RouteGraph via POST /api/v1/data/diagrams, return 202 | Same pattern; separate ClickHouse table for diagrams with content-hash dedup |
| INGST-03 (#3) | Accept metrics via POST /api/v1/data/metrics, return 202 | Same pattern; separate ClickHouse table for metrics |
| INGST-04 (#4) | In-memory batch buffer with configurable flush interval/size | ArrayBlockingQueue + @Scheduled flush; configurable via application.yml |
| INGST-05 (#5) | Return 503 when write buffer full (backpressure) | queue.offer() returns false when full -> controller returns 503 + Retry-After header |
| INGST-06 (#6) | ClickHouse TTL expires data after 30 days (configurable) | Daily partitioning + TTL + ttl_only_drop_parts=1; configurable interval |
| API-01 (#28) | All endpoints under /api/v1/ path | Spring @RequestMapping("/api/v1") base path |
| API-02 (#29) | OpenAPI/Swagger via springdoc-openapi | springdoc-openapi-starter-webmvc-ui 2.8.6 |
| API-03 (#30) | GET /api/v1/health endpoint | Spring Boot Actuator or custom health controller |
| API-04 (#31) | Validate X-Cameleer-Protocol-Version: 1 header | Spring HandlerInterceptor or servlet filter |
| API-05 (#32) | Accept unknown JSON fields (forward compat) | Spring Boot default: FAIL_ON_UNKNOWN_PROPERTIES=false (already the default) |
</phase_requirements>

## Standard Stack

### Core (Phase 1 specific)

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| clickhouse-jdbc | 0.9.7 (classifier: all) | ClickHouse JDBC V2 driver | Latest stable; V2 rewrite with improved type handling, batch support; works with Spring JdbcTemplate |
| Spring Boot Starter Web | 3.4.3 (parent) | REST controllers, Jackson | Already in POM |
| Spring Boot Starter Actuator | 3.4.3 (parent) | Health endpoint, metrics | Standard for health checks |
| springdoc-openapi-starter-webmvc-ui | 2.8.6 | OpenAPI 3.1 + Swagger UI | Latest stable for Spring Boot 3.4; generates from annotations |
| Testcontainers (clickhouse) | 2.0.2 | Integration tests with real ClickHouse | Spins up ClickHouse in Docker for tests |
| Testcontainers (junit-jupiter) | 2.0.2 | JUnit 5 integration | Lifecycle management for test containers |
| HikariCP | (Spring Boot managed) | JDBC connection pool | Default Spring Boot pool; works with ClickHouse JDBC |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| Jackson JavaTimeModule | (Spring Boot managed) | Instant/Duration serialization | Already noted in project; needed for all timestamp fields |
| Micrometer | (Spring Boot managed) | Buffer depth metrics, ingestion rate | Expose queue.size() and flush latency as metrics |
| Awaitility | (Spring Boot managed) | Async test assertions | Testing batch flush timing in integration tests |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| clickhouse-jdbc 0.9.7 | client-v2 0.9.7 (standalone) | client-v2 has POJO insert API but no JdbcTemplate/Spring integration; JDBC is more conventional |
| ArrayBlockingQueue | LMAX Disruptor | Disruptor is faster under extreme contention but adds complexity; ABQ is sufficient for this throughput |
| Spring JdbcTemplate | Raw JDBC PreparedStatement | JdbcTemplate provides cleaner error handling and resource management; no meaningful overhead |

**Installation (add to cameleer3-server-app/pom.xml):**
```xml
<!-- ClickHouse JDBC V2 -->
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.9.7</version>
    <classifier>all</classifier>
</dependency>

<!-- API Documentation -->
<dependency>
    <groupId>org.springdoc</groupId>
    <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
    <version>2.8.6</version>
</dependency>

<!-- Actuator for health endpoint -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Testing -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers-clickhouse</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<!-- Testcontainers 2.x prefixes module artifacts with "testcontainers-";
     verify the exact coordinates on Maven Central before committing -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers-junit-jupiter</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.awaitility</groupId>
    <artifactId>awaitility</artifactId>
    <scope>test</scope>
</dependency>
```

**Add to cameleer3-server-core/pom.xml:**
```xml
<!-- SLF4J for logging (no Spring dependency) -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
</dependency>
```

## Architecture Patterns

### Recommended Project Structure

```
cameleer3-server-core/src/main/java/com/cameleer3/server/core/
  ingestion/
    WriteBuffer.java              # Bounded queue + flush logic
    IngestionService.java         # Accepts data, routes to buffer
  storage/
    ExecutionRepository.java      # Interface: batch insert + query
    DiagramRepository.java        # Interface: store/retrieve diagrams
    MetricsRepository.java        # Interface: store metrics
  model/
    (extend/complement cameleer3-common models as needed)

cameleer3-server-app/src/main/java/com/cameleer3/server/app/
  config/
    ClickHouseConfig.java         # DataSource + JdbcTemplate bean
    IngestionConfig.java          # Buffer size, flush interval from YAML
    WebConfig.java                # Protocol version interceptor
  controller/
    ExecutionController.java      # POST /api/v1/data/executions
    DiagramController.java        # POST /api/v1/data/diagrams
    MetricsController.java        # POST /api/v1/data/metrics
    HealthController.java         # GET /api/v1/health (or use Actuator)
  storage/
    ClickHouseExecutionRepository.java
    ClickHouseDiagramRepository.java
    ClickHouseMetricsRepository.java
  interceptor/
    ProtocolVersionInterceptor.java
```

### Pattern 1: Bounded Write Buffer with Scheduled Flush

**What:** ArrayBlockingQueue between HTTP endpoint and ClickHouse. Scheduled task drains and batch-inserts.
**When to use:** Always for ClickHouse ingestion.

```java
// In core module -- no Spring dependency
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class WriteBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int capacity;

    public WriteBuffer(int capacity) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when buffer is full (caller should return 503) */
    public synchronized boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * All-or-nothing: rejects the whole batch when it does not fit, so a
     * retried request cannot leave a partial batch (and later duplicates)
     * in the queue. Producers are synchronized so the capacity check and
     * the inserts happen as one step; drain() only frees space.
     */
    public synchronized boolean offerBatch(List<T> items) {
        if (items.size() > queue.remainingCapacity()) return false;
        for (T item : items) {
            queue.offer(item); // cannot fail: space was checked above
        }
        return true;
    }

    /** Drain up to maxBatch items. Called by scheduled flush. */
    public List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public int size() { return queue.size(); }
    public int capacity() { return capacity; }
    public boolean isFull() { return queue.remainingCapacity() == 0; }
}
```

```java
// In app module -- Spring wiring (requires @EnableScheduling on a config class)
@Component
public class ClickHouseFlushScheduler {
    private final WriteBuffer<RouteExecution> executionBuffer;
    private final ExecutionRepository repository;
    private final IngestionConfig ingestionConfig;

    public ClickHouseFlushScheduler(WriteBuffer<RouteExecution> executionBuffer,
                                    ExecutionRepository repository,
                                    IngestionConfig ingestionConfig) {
        this.executionBuffer = executionBuffer;
        this.repository = repository;
        this.ingestionConfig = ingestionConfig;
    }

    @Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
    public void flushExecutions() {
        List<RouteExecution> batch = executionBuffer.drain(
                ingestionConfig.getBatchSize()); // default 5000
        if (!batch.isEmpty()) {
            repository.insertBatch(batch);
        }
    }
}
```
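The buffer's backpressure signal comes entirely from two JDK queue calls: `offer` is non-blocking and returns `false` on a full queue, and `drainTo` removes at most the requested number of items. The sketch below isolates that behaviour; `BackpressureDemo` and its method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class, not part of the project: it shows the JDK queue
// behaviour that the WriteBuffer pattern relies on.
final class BackpressureDemo {

    /** Offers n items to a queue of the given capacity; returns how many were rejected. */
    static int countRejected(int capacity, int n) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < n; i++) {
            if (!queue.offer(i)) { // non-blocking: false when the queue is full
                rejected++;
            }
        }
        return rejected;
    }

    /** Fills a queue, drains one batch, and returns the free capacity afterwards. */
    static int freeAfterDrain(int capacity, int maxBatch) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(i);
        }
        List<Integer> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch); // removes at most maxBatch items
        return queue.remainingCapacity();
    }
}
```

Each rejected `offer` corresponds to a 503 at the controller; each scheduled drain frees up to `batch-size` slots for new requests.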

### Pattern 2: Controller Returns 202 or 503

**What:** Ingestion endpoints accept data asynchronously. Return 202 on success, 503 when buffer full.
**When to use:** All ingestion POST endpoints.

```java
@RestController
@RequestMapping("/api/v1/data")
public class ExecutionController {

    private final IngestionService ingestionService;

    public ExecutionController(IngestionService ingestionService) {
        this.ingestionService = ingestionService;
    }

    @PostMapping("/executions")
    public ResponseEntity<Void> ingestExecutions(
            @RequestBody List<RouteExecution> executions) {
        if (!ingestionService.accept(executions)) {
            // Buffer full: ask the agent to retry after 5 seconds
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .header("Retry-After", "5")
                    .build();
        }
        return ResponseEntity.accepted().build();
    }
}
```
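The controller depends only on a boolean contract from the service: `true` means the payload was buffered, `false` means it was rejected as a whole. A framework-free sketch of that contract follows; `IngestionServiceSketch`, `statusFor`, and `buffered` are hypothetical names, and a real implementation would presumably delegate to `WriteBuffer` rather than own a queue.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the accept() contract used by the controller.
final class IngestionServiceSketch<T> {
    private final BlockingQueue<T> queue;

    IngestionServiceSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** All-or-nothing accept: either every item is buffered or none is. */
    synchronized boolean accept(List<T> items) {
        if (items.size() > queue.remainingCapacity()) {
            return false;
        }
        queue.addAll(items); // safe: capacity was checked under the same lock
        return true;
    }

    /** HTTP status the controller would return for this payload. */
    int statusFor(List<T> items) {
        return accept(items) ? 202 : 503;
    }

    int buffered() {
        return queue.size();
    }
}
```

Rejecting the whole payload keeps agent retries simple: a 503 always means "resend everything", never "resend the tail".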

### Pattern 3: ClickHouse Batch Insert via JdbcTemplate

**What:** Use JdbcTemplate.batchUpdate with PreparedStatement for efficient ClickHouse inserts.

```java
@Repository
public class ClickHouseExecutionRepository implements ExecutionRepository {

    private final JdbcTemplate jdbc;

    public ClickHouseExecutionRepository(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Override
    public void insertBatch(List<RouteExecution> executions) {
        // Getter names below are placeholders until the cameleer3-common
        // model structure is confirmed (see Open Questions).
        String sql = "INSERT INTO route_executions (execution_id, route_id, "
                + "agent_id, status, start_time, end_time, duration_ms, "
                + "correlation_id, error_message) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)";

        jdbc.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                RouteExecution e = executions.get(i);
                ps.setString(1, e.getExecutionId());
                ps.setString(2, e.getRouteId());
                ps.setString(3, e.getAgentId());
                ps.setString(4, e.getStatus().name());
                ps.setObject(5, e.getStartTime()); // Instant -> DateTime64
                ps.setObject(6, e.getEndTime());   // may be null (Nullable column)
                ps.setLong(7, e.getDurationMs());
                ps.setString(8, e.getCorrelationId());
                ps.setString(9, e.getErrorMessage());
            }

            @Override
            public int getBatchSize() { return executions.size(); }
        });
    }
}
```

### Pattern 4: Protocol Version Interceptor

**What:** Validate X-Cameleer-Protocol-Version header on all /api/v1/ requests.

```java
public class ProtocolVersionInterceptor implements HandlerInterceptor {
    @Override
    public boolean preHandle(HttpServletRequest request,
            HttpServletResponse response, Object handler) throws Exception {
        String version = request.getHeader("X-Cameleer-Protocol-Version");
        if (!"1".equals(version)) { // also rejects a missing (null) header
            response.setStatus(HttpStatus.BAD_REQUEST.value());
            response.setContentType("application/json");
            response.getWriter().write(
                "{\"error\":\"Missing or unsupported X-Cameleer-Protocol-Version header\"}");
            return false;
        }
        return true;
    }
}
```

Note: Health and OpenAPI endpoints should be excluded from this interceptor.
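
Combined with the exclusions in the note above, the interceptor's decision reduces to a small pure function. The sketch below is framework-free so it can be unit-tested on its own; `ProtocolVersionCheck` is a hypothetical helper, and the excluded prefixes are assumptions taken from the health and springdoc paths configured elsewhere in this document. In Spring MVC the same exclusions would more likely be declared with `excludePathPatterns(...)` when registering the interceptor.

```java
import java.util.List;

// Hypothetical helper: the excluded prefixes are assumptions based on the
// health and springdoc paths configured in this document.
final class ProtocolVersionCheck {
    static final String SUPPORTED_VERSION = "1";
    static final List<String> EXCLUDED_PREFIXES = List.of(
            "/api/v1/health", "/api/v1/api-docs", "/api/v1/swagger-ui");

    /** True when the request may proceed; false when it should get a 400. */
    static boolean isAllowed(String path, String headerValue) {
        for (String prefix : EXCLUDED_PREFIXES) {
            if (path.startsWith(prefix)) {
                return true; // excluded endpoints need no header
            }
        }
        return SUPPORTED_VERSION.equals(headerValue); // null-safe comparison
    }
}
```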

### Anti-Patterns to Avoid

- **Individual row inserts to ClickHouse:** Each insert creates a data part. At 50+ agents, you get "too many parts" errors within hours. Always batch.
- **Unbounded write buffer:** Without a capacity limit, agent reconnection storms cause OOM. ArrayBlockingQueue with fixed capacity is mandatory.
- **Synchronous ClickHouse writes in controller:** Blocks HTTP threads during ClickHouse inserts. Always decouple via buffer.
- **Using JPA/Hibernate with ClickHouse:** ClickHouse is not relational. JPA adds friction with zero benefit. Use JdbcTemplate directly.
- **Bare DateTime in ClickHouse (no timezone):** Defaults to server timezone. Always use DateTime64(3, 'UTC').

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| JDBC connection pooling | Custom connection management | HikariCP (Spring Boot default) | Handles timeouts, leak detection, sizing |
| OpenAPI documentation | Manual JSON/YAML spec | springdoc-openapi | Generates from code; stays in sync automatically |
| Health endpoint | Custom /health servlet | Spring Boot Actuator | Standard format, integrates with Docker healthchecks |
| JSON serialization config | Custom ObjectMapper setup | Spring Boot auto-config + application.yml | Spring Boot already configures Jackson correctly |
| Test database lifecycle | Manual Docker commands | Testcontainers | Automatic container lifecycle per test class |

## Common Pitfalls

### Pitfall 1: Wrong ClickHouse ORDER BY Design
**What goes wrong:** Choosing ORDER BY (execution_id) makes time-range queries scan entire partitions.
**Why it happens:** Instinct from relational DB where primary key = UUID.
**How to avoid:** ORDER BY must match the dominant query pattern. For this project, `ORDER BY (agent_id, status, start_time, execution_id)` puts the most-filtered columns first; execution_id comes last because it is high-cardinality.
**Warning signs:** `read_rows` in `system.query_log` far exceeds the result set size, or `EXPLAIN indexes = 1` shows nearly all granules selected.

### Pitfall 2: ClickHouse TTL Fragmenting Partitions
**What goes wrong:** Row-level TTL rewrites data parts, causing merge pressure.
**Why it happens:** Default TTL behavior deletes individual rows.
**How to avoid:** Use daily partitioning (`PARTITION BY toYYYYMMDD(start_time)`) combined with `SETTINGS ttl_only_drop_parts = 1`. This drops entire parts instead of rewriting. Alternatively, use a scheduled job with `ALTER TABLE DROP PARTITION` for partitions older than 30 days.
**Warning signs:** Continuous high merge activity, elevated CPU during TTL cleanup.
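
For the scheduled-job alternative, the job needs to map a calendar day to the partition value that `toYYYYMMDD` produces and decide which days fall outside the retention window. A minimal sketch, assuming UTC dates and a table name that comes from trusted configuration (it is concatenated into SQL); `PartitionRetention` and its methods are hypothetical:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for a partition-dropping job; not part of the project.
final class PartitionRetention {
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, matching toYYYYMMDD.
    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.BASIC_ISO_DATE;

    /** Partition value for the given UTC day, e.g. 20260311. */
    static String partitionId(LocalDate day) {
        return day.format(YYYYMMDD);
    }

    /** True when the partition's day is strictly older than the retention window. */
    static boolean isExpired(LocalDate partitionDay, LocalDate today, int ttlDays) {
        return partitionDay.isBefore(today.minusDays(ttlDays));
    }

    /** Builds the DROP PARTITION statement; table must be a trusted identifier. */
    static String dropStatement(String table, LocalDate partitionDay) {
        return "ALTER TABLE " + table + " DROP PARTITION " + partitionId(partitionDay);
    }
}
```

Dropping a partition is a metadata operation, so it avoids the merge pressure that row-level TTL causes.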

### Pitfall 3: Data Loss on Server Restart
**What goes wrong:** In-memory buffer loses unflushed data on SIGTERM or crash.
**Why it happens:** Default Spring Boot shutdown does not drain custom queues.
**How to avoid:** Implement `SmartLifecycle` with ordered shutdown: flush buffer before stopping. Accept that crash (not graceful shutdown) may lose up to flush-interval-ms of data -- this is acceptable for observability.
**Warning signs:** Missing transactions around deployment timestamps.
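
The core of the shutdown hook is a loop that keeps draining until the queue is empty, rather than the single bounded drain the scheduler performs. The sketch below shows only that loop; `ShutdownDrain` is a hypothetical helper, and the `sink` stands in for `repository.insertBatch`. Wiring it into a `SmartLifecycle.stop()` implementation is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical helper: the final flush a lifecycle stop() hook could run.
final class ShutdownDrain {

    /** Drains the queue in batches until it is empty; returns the number of items flushed. */
    static <T> int drainAll(BlockingQueue<T> queue, int batchSize, Consumer<List<T>> sink) {
        int flushed = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (queue.drainTo(batch, batchSize) > 0) {
            sink.accept(List.copyOf(batch)); // hand the sink an immutable snapshot
            flushed += batch.size();
            batch.clear();
        }
        return flushed;
    }
}
```

Ingestion endpoints should stop accepting requests before this runs; otherwise producers can keep refilling the queue while it drains.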

### Pitfall 4: DateTime Timezone Mismatch
**What goes wrong:** Agents send UTC Instants, ClickHouse stores in server-local timezone, queries return wrong time ranges.
**Why it happens:** ClickHouse DateTime defaults to server timezone if not specified.
**How to avoid:** Always use `DateTime64(3, 'UTC')` in schema. Ensure Jackson serializes Instants as ISO-8601 with Z suffix. Add `server_received_at` timestamp for clock skew detection.
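
On the Java side this comes down to two `java.time` operations: truncating to the millisecond precision that `DateTime64(3)` stores, and comparing the agent-reported time with the server receive time. A small sketch with hypothetical helper names; note that the measured difference includes network latency as well as clock skew:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper illustrating UTC timestamp handling.
final class TimestampHandling {

    /** Truncates to millisecond precision, matching DateTime64(3). */
    static Instant toMillis(Instant t) {
        return t.truncatedTo(ChronoUnit.MILLIS);
    }

    /** Receive time minus agent time; positive when the agent timestamp is earlier. */
    static Duration skew(Instant agentTime, Instant serverReceivedAt) {
        return Duration.between(agentTime, serverReceivedAt);
    }
}
```

`Instant.toString()` always renders in UTC with a `Z` suffix, which is the wire format the pitfall asks for.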

### Pitfall 5: springdoc Not Scanning Controllers
**What goes wrong:** OpenAPI spec is empty; Swagger UI shows no endpoints.
**Why it happens:** springdoc defaults to scanning the main application package. If controllers are in a different package hierarchy, they are missed.
**How to avoid:** Ensure `@SpringBootApplication` is in a parent package of all controllers, or configure `springdoc.packagesToScan` in application.yml.

## Code Examples

### ClickHouse Schema: Route Executions Table

```sql
-- Source: ClickHouse MergeTree docs + project requirements
CREATE TABLE route_executions (
    execution_id        String,
    route_id            LowCardinality(String),
    agent_id            LowCardinality(String),
    status              LowCardinality(String),  -- COMPLETED, FAILED, RUNNING
    start_time          DateTime64(3, 'UTC'),
    end_time            Nullable(DateTime64(3, 'UTC')),
    duration_ms         UInt64,
    correlation_id      String,
    exchange_id         String,
    error_message       Nullable(String),
    error_stacktrace    Nullable(String),
    -- Nested processor executions stored as arrays (ClickHouse nested pattern)
    processor_ids       Array(String),
    processor_types     Array(LowCardinality(String)),
    processor_starts    Array(DateTime64(3, 'UTC')),
    processor_ends      Array(DateTime64(3, 'UTC')),
    processor_durations Array(UInt64),
    processor_statuses  Array(LowCardinality(String)),
    -- Metadata
    server_received_at  DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC'),
    -- Skip index for future full-text search (Phase 2)
    INDEX idx_correlation correlation_id TYPE bloom_filter GRANULARITY 4,
    INDEX idx_error error_message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(start_time)
ORDER BY (agent_id, status, start_time, execution_id)
TTL start_time + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;
```

### ClickHouse Schema: Route Diagrams Table

```sql
-- No TTL -- diagrams are small and versioned
CREATE TABLE route_diagrams (
    content_hash String,                  -- SHA-256 of definition
    route_id     LowCardinality(String),
    agent_id     LowCardinality(String),
    definition   String,                  -- JSON graph definition
    created_at   DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (content_hash);
```
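The `content_hash` key can be computed with the JDK alone. The sketch below hashes the definition string as UTF-8 with SHA-256 and renders it as lowercase hex; `ContentHash` is a hypothetical helper, and it assumes the JSON definition is serialized deterministically (same key order and whitespace), since otherwise identical graphs would hash differently and deduplication would not apply. `HexFormat` requires Java 17 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper for computing the route_diagrams.content_hash value.
final class ContentHash {

    /** Returns the lowercase hex SHA-256 digest of the UTF-8 encoded definition. */
    static String sha256Hex(String definition) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(definition.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because ReplacingMergeTree deduplicates only at merge time, reads that need exactly one row per hash still have to account for not-yet-merged duplicates.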

### ClickHouse Schema: Metrics Table

```sql
CREATE TABLE agent_metrics (
    agent_id           LowCardinality(String),
    collected_at       DateTime64(3, 'UTC'),
    metric_name        LowCardinality(String),
    metric_value       Float64,
    tags               Map(String, String),
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(collected_at)
ORDER BY (agent_id, metric_name, collected_at)
TTL collected_at + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;
```

### Docker Compose: Local ClickHouse

```yaml
# docker-compose.yml (development)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # Native protocol
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./clickhouse/init:/docker-entrypoint-initdb.d
    environment:
      CLICKHOUSE_USER: cameleer
      CLICKHOUSE_PASSWORD: cameleer_dev
      CLICKHOUSE_DB: cameleer3
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse-data:
```

### application.yml Configuration

```yaml
server:
  port: 8081

spring:
  datasource:
    url: jdbc:ch://localhost:8123/cameleer3
    username: cameleer
    password: cameleer_dev
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver
  jackson:
    serialization:
      write-dates-as-timestamps: false
    deserialization:
      fail-on-unknown-properties: false  # API-05: forward compat (also Spring Boot default)

ingestion:
  buffer-capacity: 50000
  batch-size: 5000
  flush-interval-ms: 1000

clickhouse:
  ttl-days: 30

springdoc:
  api-docs:
    path: /api/v1/api-docs
  swagger-ui:
    path: /api/v1/swagger-ui

management:
  endpoints:
    web:
      base-path: /api/v1
      exposure:
        include: health
  endpoint:
    health:
      show-details: always
```

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| clickhouse-http-client 0.6.x | clickhouse-jdbc 0.9.7 (V2) | 2025 | V1 client deprecated; V2 has proper type mapping, batch support |
| tokenbf_v1 bloom filter index | TYPE text() full-text index | March 2026 (GA) | Native full-text search in ClickHouse; may eliminate need for OpenSearch in Phase 2 |
| springdoc-openapi 2.3.x | springdoc-openapi 2.8.6 | 2025 | Latest for Spring Boot 3.4; v3.x is for Spring Boot 4 only |
| Testcontainers 1.19.x | Testcontainers 2.0.2 | 2025 | Major version bump; new artifact names (testcontainers-clickhouse) |

**Deprecated/outdated:**
- `clickhouse-http-client` artifact: replaced by `clickhouse-jdbc` with JDBC V2
- `tokenbf_v1` / `ngrambf_v1` skip indexes: deprecated in favor of TYPE text() index (though still functional)
- Testcontainers artifact `org.testcontainers:clickhouse`: replaced by `org.testcontainers:testcontainers-clickhouse`

## Open Questions

1. **Exact cameleer3-common model structure**
   - What we know: Models include RouteExecution, ProcessorExecution, ExchangeSnapshot, RouteGraph, RouteNode, RouteEdge
   - What's unclear: Exact field names, types, nesting structure -- needed to design ClickHouse schema precisely
   - Recommendation: Read cameleer3-common source code before implementing schema. Schema must match the wire format.

2. **ClickHouse JDBC V2 + HikariCP compatibility**
   - What we know: clickhouse-jdbc 0.9.7 implements JDBC spec; HikariCP is Spring Boot default
   - What's unclear: Whether HikariCP validation queries work correctly with ClickHouse JDBC V2
   - Recommendation: Test in integration test; may need `spring.datasource.hikari.connection-test-query=SELECT 1`

3. **Nested data: arrays vs separate table for ProcessorExecutions**
   - What we know: ClickHouse supports Array columns and Nested type
   - What's unclear: Whether flattening processor executions into arrays in the execution row is better than a separate table with JOIN
   - Recommendation: Arrays are faster for co-located reads (no JOIN) but harder to query individually. Start with arrays; add a materialized view if individual processor queries are needed in Phase 2.
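
To make the array option in question 3 concrete: flattening means turning a list of processor executions into parallel arrays, where index `i` in every array describes the same processor. The sketch below uses a hypothetical `ProcessorExec` record as a stand-in, because the real `ProcessorExecution` fields are still unknown (question 1), and covers only three of the six `processor_*` columns.

```java
import java.util.List;

// Hypothetical sketch: ProcessorExec stands in for the real ProcessorExecution
// model, whose fields have not been confirmed yet.
final class ProcessorFlattening {

    record ProcessorExec(String id, String type, long durationMs) {}

    /** Parallel arrays: index i in each array describes the same processor. */
    record Columns(String[] ids, String[] types, long[] durations) {}

    static Columns flatten(List<ProcessorExec> processors) {
        int n = processors.size();
        String[] ids = new String[n];
        String[] types = new String[n];
        long[] durations = new long[n];
        for (int i = 0; i < n; i++) {
            ProcessorExec p = processors.get(i);
            ids[i] = p.id();
            types[i] = p.type();
            durations[i] = p.durationMs();
        }
        return new Columns(ids, types, durations);
    }
}
```

All arrays for one row must have equal length, or per-processor reads that pair the arrays by index will misalign.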

## Validation Architecture

### Test Framework

| Property | Value |
|----------|-------|
| Framework | JUnit 5 (Spring Boot managed) + Testcontainers 2.0.2 |
| Config file | cameleer3-server-app/src/test/resources/application-test.yml (Wave 0) |
| Quick run command | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` |
| Full suite command | `mvn verify` |

### Phase Requirements -> Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| INGST-01 | POST /api/v1/data/executions returns 202, data in ClickHouse | integration | `mvn test -pl cameleer3-server-app -Dtest=ExecutionControllerIT -q` | Wave 0 |
| INGST-02 | POST /api/v1/data/diagrams returns 202 | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramControllerIT -q` | Wave 0 |
| INGST-03 | POST /api/v1/data/metrics returns 202 | integration | `mvn test -pl cameleer3-server-app -Dtest=MetricsControllerIT -q` | Wave 0 |
| INGST-04 | Buffer flushes at interval/size | unit | `mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q` | Wave 0 |
| INGST-05 | 503 when buffer full | unit+integration | `mvn test -pl cameleer3-server-app -Dtest=BackpressureIT -q` | Wave 0 |
| INGST-06 | TTL removes old data | integration | `mvn test -pl cameleer3-server-app -Dtest=ClickHouseTtlIT -q` | Wave 0 |
| API-01 | Endpoints under /api/v1/ | integration | Covered by controller ITs | Wave 0 |
| API-02 | OpenAPI docs available | integration | `mvn test -pl cameleer3-server-app -Dtest=OpenApiIT -q` | Wave 0 |
| API-03 | GET /api/v1/health responds | integration | `mvn test -pl cameleer3-server-app -Dtest=HealthControllerIT -q` | Wave 0 |
| API-04 | Protocol version header validated | integration | `mvn test -pl cameleer3-server-app -Dtest=ProtocolVersionIT -q` | Wave 0 |
| API-05 | Unknown JSON fields accepted | unit | `mvn test -pl cameleer3-server-app -Dtest=ForwardCompatIT -q` | Wave 0 |

### Sampling Rate
- **Per task commit:** `mvn test -pl cameleer3-server-core -q` (unit tests, fast)
- **Per wave merge:** `mvn verify` (full suite with Testcontainers integration tests)
- **Phase gate:** Full suite green before verification

### Wave 0 Gaps
- [ ] `cameleer3-server-app/src/test/resources/application-test.yml` -- test ClickHouse config
- [ ] `cameleer3-server-core/src/test/java/.../WriteBufferTest.java` -- buffer unit tests
- [ ] `cameleer3-server-app/src/test/java/.../AbstractClickHouseIT.java` -- shared Testcontainers base class
- [ ] `cameleer3-server-app/src/test/java/.../ExecutionControllerIT.java` -- ingestion integration test
- [ ] Docker available on test machine for Testcontainers

## Sources

### Primary (HIGH confidence)
- [ClickHouse Java Client releases](https://github.com/ClickHouse/clickhouse-java/releases) -- confirmed v0.9.7 as latest (March 2026)
- [ClickHouse JDBC V2 docs](https://clickhouse.com/docs/integrations/language-clients/java/jdbc) -- JDBC driver API, batch insert patterns
- [ClickHouse Java Client V2 docs](https://clickhouse.com/docs/en/integrations/java/client-v2) -- standalone client API, POJO insert
- [ClickHouse full-text search blog](https://clickhouse.com/blog/clickhouse-full-text-search) -- TYPE text() index GA March 2026
- [ClickHouse MergeTree settings](https://clickhouse.com/docs/operations/settings/merge-tree-settings) -- ttl_only_drop_parts
- [Testcontainers ClickHouse module](https://java.testcontainers.org/modules/databases/clickhouse/) -- v2.0.2, dependency coordinates
- [springdoc-openapi releases](https://github.com/springdoc/springdoc-openapi/releases) -- v2.8.x for Spring Boot 3.4

### Secondary (MEDIUM confidence)
- [Spring Boot Jackson default config](https://github.com/spring-projects/spring-boot/issues/12684) -- FAIL_ON_UNKNOWN_PROPERTIES=false is default
- [ClickHouse Docker Compose docs](https://clickhouse.com/docs/use-cases/observability/clickstack/deployment/docker-compose) -- container setup
- [Baeldung ClickHouse + Spring Boot](https://www.baeldung.com/spring-boot-olap-clickhouse-database) -- integration patterns

### Tertiary (LOW confidence)
- ClickHouse ORDER BY optimization -- based on training data knowledge of MergeTree internals; should validate with EXPLAIN on real data

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH -- versions verified against live sources (GitHub releases, Maven Central)
- Architecture: HIGH -- write buffer + batch flush is established ClickHouse pattern used by SigNoz, Uptrace
- ClickHouse schema: MEDIUM -- ORDER BY design is sound but should be validated with realistic query patterns
- Pitfalls: HIGH -- well-documented ClickHouse failure modes, confirmed by multiple sources

**Research date:** 2026-03-11
**Valid until:** 2026-04-11 (30 days -- stack is stable)