cameleer/cameleer-server

hsiegeln cb3ebfea7c


chore: rename cameleer3 to cameleer

Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-15 15:28:42 +02:00


Phase 1: Ingestion Pipeline + API Foundation - Research

Researched: 2026-03-11
Domain: ClickHouse batch ingestion, Spring Boot REST API, write buffer with backpressure
Confidence: HIGH

Summary

Phase 1 establishes the data pipeline and API skeleton for Cameleer Server. Agents POST execution data, diagrams, and metrics to REST endpoints; the server buffers these in memory and batch-flushes to ClickHouse. The ClickHouse schema design is the most critical and least reversible decision in this phase -- ORDER BY and partitioning cannot be changed without table recreation.

The ClickHouse Java ecosystem has undergone significant changes. The recommended approach is clickhouse-jdbc v0.9.7 (JDBC V2 driver) with Spring Boot's JdbcTemplate for batch inserts. An alternative is the standalone client-v2 artifact which offers a POJO-based insert API, but JDBC integration with Spring Boot is more conventional and better documented. ClickHouse now has a native full-text index (TYPE text, GA as of March 2026) that supersedes the older tokenbf_v1 bloom filter approach -- this is relevant for Phase 2 but should be accounted for in schema design now.

Primary recommendation: Use clickhouse-jdbc 0.9.7 with Spring JdbcTemplate, ArrayBlockingQueue write buffer with scheduled batch flush, daily partitioning with TTL + ttl_only_drop_parts, and Docker Compose for local ClickHouse. Keep Spring Security out of Phase 1 -- all endpoints open, security layered in Phase 4.

<phase_requirements>

Phase Requirements

ID	Description	Research Support
INGST-01 (#1)	Accept RouteExecution via POST /api/v1/data/executions, return 202	REST controller + async write buffer pattern; Jackson deserialization of cameleer-common models
INGST-02 (#2)	Accept RouteGraph via POST /api/v1/data/diagrams, return 202	Same pattern; separate ClickHouse table for diagrams with content-hash dedup
INGST-03 (#3)	Accept metrics via POST /api/v1/data/metrics, return 202	Same pattern; separate ClickHouse table for metrics
INGST-04 (#4)	In-memory batch buffer with configurable flush interval/size	ArrayBlockingQueue + @Scheduled flush; configurable via application.yml
INGST-05 (#5)	Return 503 when write buffer full (backpressure)	queue.offer() returns false when full -> controller returns 503 + Retry-After header
INGST-06 (#6)	ClickHouse TTL expires data after 30 days (configurable)	Daily partitioning + TTL + ttl_only_drop_parts=1; configurable interval
API-01 (#28)	All endpoints under /api/v1/ path	Spring @RequestMapping("/api/v1") base path
API-02 (#29)	OpenAPI/Swagger via springdoc-openapi	springdoc-openapi-starter-webmvc-ui 2.8.6
API-03 (#30)	GET /api/v1/health endpoint	Spring Boot Actuator or custom health controller
API-04 (#31)	Validate X-Cameleer-Protocol-Version: 1 header	Spring HandlerInterceptor or servlet filter
API-05 (#32)	Accept unknown JSON fields (forward compat)	Spring Boot's auto-configured ObjectMapper already disables FAIL_ON_UNKNOWN_PROPERTIES
</phase_requirements>

Standard Stack

Core (Phase 1 specific)

Library	Version	Purpose	Why Standard
clickhouse-jdbc	0.9.7 (classifier: all)	ClickHouse JDBC V2 driver	Latest stable; V2 rewrite with improved type handling, batch support; works with Spring JdbcTemplate
Spring Boot Starter Web	3.4.3 (parent)	REST controllers, Jackson	Already in POM
Spring Boot Starter Actuator	3.4.3 (parent)	Health endpoint, metrics	Standard for health checks
springdoc-openapi-starter-webmvc-ui	2.8.6	OpenAPI 3.1 + Swagger UI	Latest stable for Spring Boot 3.4; generates from annotations
Testcontainers (clickhouse)	2.0.2	Integration tests with real ClickHouse	Spins up ClickHouse in Docker for tests
Testcontainers (junit-jupiter)	2.0.2	JUnit 5 integration	Lifecycle management for test containers
HikariCP	(Spring Boot managed)	JDBC connection pool	Default Spring Boot pool; works with ClickHouse JDBC

Supporting

Library	Version	Purpose	When to Use
Jackson JavaTimeModule	(Spring Boot managed)	Instant/Duration serialization	Already noted in project; needed for all timestamp fields
Micrometer	(Spring Boot managed)	Buffer depth metrics, ingestion rate	Expose queue.size() and flush latency as metrics
Awaitility	(Spring Boot managed)	Async test assertions	Testing batch flush timing in integration tests

Alternatives Considered

Instead of	Could Use	Tradeoff
clickhouse-jdbc 0.9.7	client-v2 0.9.7 (standalone)	client-v2 has POJO insert API but no JdbcTemplate/Spring integration; JDBC is more conventional
ArrayBlockingQueue	LMAX Disruptor	Disruptor is faster under extreme contention but adds complexity; ABQ is sufficient for this throughput
Spring JdbcTemplate	Raw JDBC PreparedStatement	JdbcTemplate provides cleaner error handling and resource management; no meaningful overhead

Installation (add to cameleer-server-app/pom.xml):

<!-- ClickHouse JDBC V2 -->
<dependency>
    <groupId>com.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.9.7</version>
    <classifier>all</classifier>
</dependency>

<!-- API Documentation -->
<dependency>
    <groupId>org.springdoc</groupId>
    <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
    <version>2.8.6</version>
</dependency>

<!-- Actuator for health endpoint -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Testing -->
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers-clickhouse</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>2.0.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.awaitility</groupId>
    <artifactId>awaitility</artifactId>
    <scope>test</scope>
</dependency>

Add to cameleer-server-core/pom.xml:

<!-- SLF4J for logging (no Spring dependency) -->
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
</dependency>

Architecture Patterns

Recommended Project Structure

cameleer-server-core/src/main/java/com/cameleer/server/core/
    ingestion/
        WriteBuffer.java              # Bounded queue + flush logic
        IngestionService.java         # Accepts data, routes to buffer
    storage/
        ExecutionRepository.java      # Interface: batch insert + query
        DiagramRepository.java        # Interface: store/retrieve diagrams
        MetricsRepository.java        # Interface: store metrics
    model/
        (extend/complement cameleer-common models as needed)

cameleer-server-app/src/main/java/com/cameleer/server/app/
    config/
        ClickHouseConfig.java         # DataSource + JdbcTemplate bean
        IngestionConfig.java          # Buffer size, flush interval from YAML
        WebConfig.java                # Protocol version interceptor
    controller/
        ExecutionController.java      # POST /api/v1/data/executions
        DiagramController.java        # POST /api/v1/data/diagrams
        MetricsController.java        # POST /api/v1/data/metrics
        HealthController.java         # GET /api/v1/health (or use Actuator)
    storage/
        ClickHouseExecutionRepository.java
        ClickHouseDiagramRepository.java
        ClickHouseMetricsRepository.java
    interceptor/
        ProtocolVersionInterceptor.java

Pattern 1: Bounded Write Buffer with Scheduled Flush

What: ArrayBlockingQueue between HTTP endpoint and ClickHouse. Scheduled task drains and batch-inserts.
When to use: Always for ClickHouse ingestion.

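// In core module -- no Spring dependency
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class WriteBuffer<T> {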
    private final BlockingQueue<T> queue;
    private final int capacity;

    public WriteBuffer(int capacity) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

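    /** Returns false when buffer is full (caller should return 503) */
    public synchronized boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * All-or-nothing: enqueues every item or none, so a 503 never leaves a
     * partially accepted batch in the queue. Producers are serialized via
     * synchronized; a concurrent drain only frees space, so the check holds.
     */
    public synchronized boolean offerBatch(List<T> items) {
        if (queue.remainingCapacity() < items.size()) return false;
        queue.addAll(items);
        return true;
    }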

    /** Drain up to maxBatch items. Called by scheduled flush. */
    public List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public int size() { return queue.size(); }
    public int capacity() { return capacity; }
    public boolean isFull() { return queue.remainingCapacity() == 0; }
}

// In app module -- Spring wiring
@Component
public class ClickHouseFlushScheduler {
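    private final WriteBuffer<RouteExecution> executionBuffer;
    private final ExecutionRepository repository;
    private final IngestionConfig ingestionConfig;

    // Constructor injection; @Scheduled also requires @EnableScheduling on a config class
    public ClickHouseFlushScheduler(WriteBuffer<RouteExecution> executionBuffer,
            ExecutionRepository repository, IngestionConfig ingestionConfig) {
        this.executionBuffer = executionBuffer;
        this.repository = repository;
        this.ingestionConfig = ingestionConfig;
    }
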
    @Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
    public void flushExecutions() {
        List<RouteExecution> batch = executionBuffer.drain(
            ingestionConfig.getBatchSize()); // default 5000
        if (!batch.isEmpty()) {
            repository.insertBatch(batch);
        }
    }
}
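The drain-and-flush cycle can also be exercised without Spring, which is handy for the core-module unit tests. The following is a minimal sketch under that assumption: `BatchFlusher` and its `sink` parameter are hypothetical names that do not appear in the project, and the sink stands in for `repository.insertBatch`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical helper: drains a bounded queue in batches and hands each batch to a sink. */
final class BatchFlusher<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;
    private final Consumer<List<T>> sink; // stand-in for repository.insertBatch

    BatchFlusher(BlockingQueue<T> queue, int batchSize, Consumer<List<T>> sink) {
        this.queue = queue;
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** One flush tick: drains at most batchSize items and returns how many were flushed. */
    int flushOnce() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
        return batch.size();
    }
}
```

Outside Spring, a `ScheduledExecutorService.scheduleWithFixedDelay` call invoking `flushOnce()` would play the role of the `@Scheduled(fixedDelayString = ...)` annotation. With a queue of capacity 5 and a batch size of 3, a full queue is emptied in two ticks (3 items, then 2).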

Pattern 2: Controller Returns 202 or 503

What: Ingestion endpoints accept data asynchronously. Return 202 on success, 503 when buffer full.
When to use: All ingestion POST endpoints.

@RestController
@RequestMapping("/api/v1/data")
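public class ExecutionController {

    private final IngestionService ingestionService;

    public ExecutionController(IngestionService ingestionService) {
        this.ingestionService = ingestionService;
    }
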
    @PostMapping("/executions")
    public ResponseEntity<Void> ingestExecutions(
            @RequestBody List<RouteExecution> executions) {
        if (!ingestionService.accept(executions)) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .header("Retry-After", "5")
                .build();
        }
        return ResponseEntity.accepted().build();
    }
}

Pattern 3: ClickHouse Batch Insert via JdbcTemplate

What: Use JdbcTemplate.batchUpdate with PreparedStatement for efficient ClickHouse inserts.

@Repository
public class ClickHouseExecutionRepository implements ExecutionRepository {

    private final JdbcTemplate jdbc;

    public ClickHouseExecutionRepository(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Override
    public void insertBatch(List<RouteExecution> executions) {
        String sql = "INSERT INTO route_executions (execution_id, route_id, "
            + "agent_id, status, start_time, end_time, duration_ms, "
            + "correlation_id, error_message) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)";

        jdbc.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                RouteExecution e = executions.get(i);
                ps.setString(1, e.getExecutionId());
                ps.setString(2, e.getRouteId());
                ps.setString(3, e.getAgentId());
                ps.setString(4, e.getStatus().name());
                ps.setObject(5, e.getStartTime()); // Instant -> DateTime64
                ps.setObject(6, e.getEndTime());
                ps.setLong(7, e.getDurationMs());
                ps.setString(8, e.getCorrelationId());
                ps.setString(9, e.getErrorMessage());
            }
            @Override
            public int getBatchSize() { return executions.size(); }
        });
    }
}

Pattern 4: Protocol Version Interceptor

What: Validate X-Cameleer-Protocol-Version header on all /api/v1/ requests.

public class ProtocolVersionInterceptor implements HandlerInterceptor {
    @Override
    public boolean preHandle(HttpServletRequest request,
            HttpServletResponse response, Object handler) throws Exception {
        String version = request.getHeader("X-Cameleer-Protocol-Version");
        // "1".equals(null) is false, so a missing header is rejected as well
        if (!"1".equals(version)) {
            response.setStatus(HttpStatus.BAD_REQUEST.value());
            response.setContentType("application/json");
            response.getWriter().write(
                "{\"error\":\"Missing or unsupported X-Cameleer-Protocol-Version header\"}");
            return false;
        }
        return true;
    }
}

Note: Health and OpenAPI endpoints should be excluded from this interceptor.
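The exclusion rule can be sketched as a plain function that is easy to unit-test. This is an illustration only: `ProtocolVersionPolicy` is a hypothetical class, and the exempt prefixes are assumed from the sample application.yml paths (/api/v1/health, /api/v1/api-docs, /api/v1/swagger-ui).

```java
/** Hypothetical policy object: decides whether a request passes the protocol version check. */
final class ProtocolVersionPolicy {
    static final String SUPPORTED_VERSION = "1";

    // Assumed exempt prefixes, taken from the sample application.yml in this document.
    private static final String[] EXEMPT_PREFIXES = {
        "/api/v1/health", "/api/v1/api-docs", "/api/v1/swagger-ui"
    };

    /** True for an exempt path itself or anything nested below it. */
    static boolean isExempt(String path) {
        for (String prefix : EXEMPT_PREFIXES) {
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return true;
            }
        }
        return false;
    }

    /** True when the request may proceed; false means the caller should answer 400. */
    static boolean allows(String path, String headerValue) {
        return isExempt(path) || SUPPORTED_VERSION.equals(headerValue);
    }
}
```

In Spring MVC the same exclusion is usually expressed declaratively, by calling `excludePathPatterns(...)` on the interceptor registration in WebConfig, so the interceptor itself stays free of path logic.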

Anti-Patterns to Avoid

Individual row inserts to ClickHouse: Each insert creates a data part. At 50+ agents, you get "too many parts" errors within hours. Always batch.
Unbounded write buffer: Without a capacity limit, agent reconnection storms cause OOM. ArrayBlockingQueue with fixed capacity is mandatory.
Synchronous ClickHouse writes in controller: Blocks HTTP threads during ClickHouse inserts. Always decouple via buffer.
Using JPA/Hibernate with ClickHouse: ClickHouse is not relational. JPA adds friction with zero benefit. Use JdbcTemplate directly.
Bare DateTime in ClickHouse (no timezone): Defaults to server timezone. Always use DateTime64(3, 'UTC').

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
JDBC connection pooling	Custom connection management	HikariCP (Spring Boot default)	Handles timeouts, leak detection, sizing
OpenAPI documentation	Manual JSON/YAML spec	springdoc-openapi	Generates from code; stays in sync automatically
Health endpoint	Custom /health servlet	Spring Boot Actuator	Standard format, integrates with Docker healthchecks
JSON serialization config	Custom ObjectMapper setup	Spring Boot auto-config + application.yml	Spring Boot already configures Jackson correctly
Test database lifecycle	Manual Docker commands	Testcontainers	Automatic container lifecycle per test class

Common Pitfalls

Pitfall 1: Wrong ClickHouse ORDER BY Design

What goes wrong: Choosing ORDER BY (execution_id) makes time-range queries scan entire partitions.
Why it happens: Instinct from relational databases, where the primary key is typically a UUID.
How to avoid: ORDER BY must match the dominant query pattern. For this project, ORDER BY (agent_id, status, start_time, execution_id) puts the most-filtered columns first; execution_id goes last because it is high-cardinality.
Warning signs: Rows read (read_rows in system.query_log) far exceed the result set size, or EXPLAIN indexes = 1 shows most granules being selected.

Pitfall 2: ClickHouse TTL Fragmenting Partitions

What goes wrong: Row-level TTL rewrites data parts, causing merge pressure.
Why it happens: Default TTL behavior deletes individual rows.
How to avoid: Use daily partitioning (PARTITION BY toYYYYMMDD(start_time)) combined with SETTINGS ttl_only_drop_parts = 1. This drops entire parts instead of rewriting them. Alternatively, use a scheduled job with ALTER TABLE DROP PARTITION for partitions older than 30 days.
Warning signs: Continuous high merge activity, elevated CPU during TTL cleanup.
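For the scheduled-job alternative, the statements to run can be derived from the partition dates. The sketch below assumes the list of existing partition days has already been obtained (for example from system.parts) and that days are interpreted in UTC; `PartitionRetention` and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds DROP PARTITION statements for daily toYYYYMMDD partitions. */
final class PartitionRetention {

    /** Partition value in the toYYYYMMDD form, e.g. 2026-02-08 becomes "20260208". */
    static String partitionId(LocalDate day) {
        return day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** Returns one statement per partition strictly older than today minus ttlDays. */
    static List<String> dropStatements(String table, List<LocalDate> existingPartitions,
                                       LocalDate today, int ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        List<String> statements = new ArrayList<>();
        for (LocalDate day : existingPartitions) {
            if (day.isBefore(cutoff)) {
                statements.add("ALTER TABLE " + table + " DROP PARTITION " + partitionId(day));
            }
        }
        return statements;
    }
}
```

With today = 2026-03-11 and a 30-day retention, the cutoff is 2026-02-09, so a partition for 2026-02-08 is dropped while 2026-02-09 is kept. The table name is concatenated into the SQL, so it should come from configuration rather than from user input.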

Pitfall 3: Data Loss on Server Restart

What goes wrong: In-memory buffer loses unflushed data on SIGTERM or crash.
Why it happens: Default Spring Boot shutdown does not drain custom queues.
How to avoid: Implement SmartLifecycle with ordered shutdown: flush buffer before stopping. Accept that a crash (as opposed to a graceful shutdown) may lose up to flush-interval-ms of data -- this is acceptable for observability.
Warning signs: Missing transactions around deployment timestamps.
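The drain step that a SmartLifecycle stop() would call can be sketched independently of Spring. `ShutdownDrain`, the `sink` parameter, and the timeout are assumptions for illustration; the sink stands in for the repository's batch insert.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical shutdown helper: flushes everything still queued, bounded by a timeout. */
final class ShutdownDrain {

    /** Drains the queue in batches until it is empty or the timeout elapses; returns items flushed. */
    static <T> int drainAll(BlockingQueue<T> queue, int batchSize,
                            Consumer<List<T>> sink, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        // Overflow-safe deadline comparison, as recommended for System.nanoTime()
        while (System.nanoTime() - deadline < 0 && queue.drainTo(batch, batchSize) > 0) {
            sink.accept(new ArrayList<>(batch)); // copy so the sink may keep the list
            total += batch.size();
            batch.clear();
        }
        return total;
    }
}
```

The timeout keeps shutdown bounded if ClickHouse is slow or unreachable; whatever remains in the queue after the deadline is lost, which matches the accepted trade-off described above. Ingestion endpoints should stop accepting new items before this runs, otherwise the loop can chase a moving target.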

Pitfall 4: DateTime Timezone Mismatch

What goes wrong: Agents send UTC Instants, ClickHouse stores in server-local timezone, queries return wrong time ranges.
Why it happens: ClickHouse DateTime defaults to server timezone if not specified.
How to avoid: Always use DateTime64(3, 'UTC') in schema. Ensure Jackson serializes Instants as ISO-8601 with Z suffix. Add server_received_at timestamp for clock skew detection.
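A short java.time sketch of the two checks mentioned here: the ISO-8601 form with the Z suffix, and skew detection between the agent timestamp and server_received_at. `ClockSkew` and the tolerance parameter are hypothetical; the document does not specify a threshold.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: compares an agent-reported timestamp with the server receive time. */
final class ClockSkew {

    /** Positive when the server received the event after the agent's timestamp. */
    static Duration skew(Instant agentTimestamp, Instant serverReceivedAt) {
        return Duration.between(agentTimestamp, serverReceivedAt);
    }

    /** True when the absolute skew exceeds the given tolerance. */
    static boolean isSuspicious(Instant agentTimestamp, Instant serverReceivedAt,
                                Duration tolerance) {
        return skew(agentTimestamp, serverReceivedAt).abs().compareTo(tolerance) > 0;
    }
}
```

`Instant.parse("2026-03-11T10:15:30.123Z").toString()` round-trips to the same string, which is the millisecond-precision UTC form that lines up with DateTime64(3, 'UTC').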

Pitfall 5: springdoc Not Scanning Controllers

What goes wrong: OpenAPI spec is empty; Swagger UI shows no endpoints.
Why it happens: springdoc defaults to scanning the main application package. If controllers are in a different package hierarchy, they are missed.
How to avoid: Ensure @SpringBootApplication is in a parent package of all controllers, or configure springdoc.packagesToScan in application.yml.

Code Examples

ClickHouse Schema: Route Executions Table

-- Source: ClickHouse MergeTree docs + project requirements
CREATE TABLE route_executions (
    execution_id     String,
    route_id         LowCardinality(String),
    agent_id         LowCardinality(String),
    status           LowCardinality(String),  -- COMPLETED, FAILED, RUNNING
    start_time       DateTime64(3, 'UTC'),
    end_time         Nullable(DateTime64(3, 'UTC')),
    duration_ms      UInt64,
    correlation_id   String,
    exchange_id      String,
    error_message    Nullable(String),
    error_stacktrace Nullable(String),
    -- Nested processor executions stored as arrays (ClickHouse nested pattern)
    processor_ids    Array(String),
    processor_types  Array(LowCardinality(String)),
    processor_starts Array(DateTime64(3, 'UTC')),
    processor_ends   Array(DateTime64(3, 'UTC')),
    processor_durations Array(UInt64),
    processor_statuses  Array(LowCardinality(String)),
    -- Metadata
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC'),
    -- Skip index for future full-text search (Phase 2)
    INDEX idx_correlation correlation_id TYPE bloom_filter GRANULARITY 4,
    INDEX idx_error error_message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(start_time)
ORDER BY (agent_id, status, start_time, execution_id)
TTL start_time + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;

ClickHouse Schema: Route Diagrams Table

CREATE TABLE route_diagrams (
    content_hash     String,          -- SHA-256 of definition
    route_id         LowCardinality(String),
    agent_id         LowCardinality(String),
    definition       String,          -- JSON graph definition
    -- No TTL -- diagrams are small and versioned
    created_at       DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (content_hash);
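The content_hash column is described as the SHA-256 of the definition. A minimal sketch of computing it on the server side with the JDK follows; `ContentHash` is a hypothetical class name, and hashing the raw UTF-8 bytes of the JSON string is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives the content_hash value for a diagram definition. */
final class ContentHash {

    /** Lowercase hex SHA-256 of the definition's UTF-8 bytes (always 64 characters). */
    static String sha256Hex(String definition) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(definition.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Deduplication only works if the same graph always serializes to the same bytes, so key order and whitespace in the JSON need to be stable before hashing. ReplacingMergeTree collapses duplicates at merge time, not at insert time, so readers may briefly see more than one row per hash.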

ClickHouse Schema: Metrics Table

CREATE TABLE agent_metrics (
    agent_id         LowCardinality(String),
    collected_at     DateTime64(3, 'UTC'),
    metric_name      LowCardinality(String),
    metric_value     Float64,
    tags             Map(String, String),
    server_received_at DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC')
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(collected_at)
ORDER BY (agent_id, metric_name, collected_at)
TTL collected_at + INTERVAL 30 DAY
SETTINGS ttl_only_drop_parts = 1;

Docker Compose: Local ClickHouse

# docker-compose.yml (development)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.3
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # Native protocol
    volumes:
      - clickhouse-data:/var/lib/clickhouse
      - ./clickhouse/init:/docker-entrypoint-initdb.d
    environment:
      CLICKHOUSE_USER: cameleer
      CLICKHOUSE_PASSWORD: cameleer_dev
      CLICKHOUSE_DB: cameleer
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse-data:

application.yml Configuration

server:
  port: 8081

spring:
  datasource:
    url: jdbc:ch://localhost:8123/cameleer
    username: cameleer
    password: cameleer_dev
    driver-class-name: com.clickhouse.jdbc.ClickHouseDriver
  jackson:
    serialization:
      write-dates-as-timestamps: false
    deserialization:
      fail-on-unknown-properties: false  # API-05: forward compat (also Spring Boot default)

ingestion:
  buffer-capacity: 50000
  batch-size: 5000
  flush-interval-ms: 1000

clickhouse:
  ttl-days: 30

springdoc:
  api-docs:
    path: /api/v1/api-docs
  swagger-ui:
    path: /api/v1/swagger-ui

management:
  endpoints:
    web:
      base-path: /api/v1
      exposure:
        include: health
  endpoint:
    health:
      show-details: always

State of the Art

Old Approach	Current Approach	When Changed	Impact
clickhouse-http-client 0.6.x	clickhouse-jdbc 0.9.7 (V2)	2025	V1 client deprecated; V2 has proper type mapping, batch support
tokenbf_v1 bloom filter index	TYPE text() full-text index	March 2026 (GA)	Native full-text search in ClickHouse; may eliminate need for OpenSearch in Phase 2
springdoc-openapi 2.3.x	springdoc-openapi 2.8.6	2025	Latest for Spring Boot 3.4; v3.x is for Spring Boot 4 only
Testcontainers 1.19.x	Testcontainers 2.0.2	2025	Major version bump; new artifact names (testcontainers-clickhouse)

Deprecated/outdated:

clickhouse-http-client artifact: replaced by clickhouse-jdbc with JDBC V2
tokenbf_v1 / ngrambf_v1 skip indexes: deprecated in favor of TYPE text() index (though still functional)
Testcontainers artifact org.testcontainers:clickhouse: replaced by org.testcontainers:testcontainers-clickhouse

Open Questions

Exact cameleer-common model structure
- What we know: Models include RouteExecution, ProcessorExecution, ExchangeSnapshot, RouteGraph, RouteNode, RouteEdge
- What's unclear: Exact field names, types, nesting structure -- needed to design ClickHouse schema precisely
- Recommendation: Read cameleer-common source code before implementing schema. Schema must match the wire format.
ClickHouse JDBC V2 + HikariCP compatibility
- What we know: clickhouse-jdbc 0.9.7 implements JDBC spec; HikariCP is Spring Boot default
- What's unclear: Whether HikariCP validation queries work correctly with ClickHouse JDBC V2
- Recommendation: Test in integration test; may need spring.datasource.hikari.connection-test-query=SELECT 1
Nested data: arrays vs separate table for ProcessorExecutions
- What we know: ClickHouse supports Array columns and Nested type
- What's unclear: Whether flattening processor executions into arrays in the execution row is better than a separate table with JOIN
- Recommendation: Arrays are faster for co-located reads (no JOIN) but harder to query individually. Start with arrays; add a materialized view if individual processor queries are needed in Phase 2.
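The array approach amounts to flattening each execution's processor list into index-aligned columns before insert. The sketch below illustrates that shape only: `Proc` is a hypothetical stand-in for ProcessorExecution, and its fields are assumptions, since the real cameleer-common field names are still an open question.

```java
import java.util.ArrayList;
import java.util.List;

/** Parallel lists mirroring the processor_ids, processor_types and processor_durations columns. */
final class ProcessorColumns {

    /** Hypothetical stand-in for the real ProcessorExecution model. */
    static final class Proc {
        final String id;
        final String type;
        final long durationMs;

        Proc(String id, String type, long durationMs) {
            this.id = id;
            this.type = type;
            this.durationMs = durationMs;
        }
    }

    final List<String> ids = new ArrayList<>();
    final List<String> types = new ArrayList<>();
    final List<Long> durations = new ArrayList<>();

    /** Flattens processors in order, so element i of every list describes the same processor. */
    static ProcessorColumns from(List<Proc> processors) {
        ProcessorColumns cols = new ProcessorColumns();
        for (Proc p : processors) {
            cols.ids.add(p.id);
            cols.types.add(p.type);
            cols.durations.add(p.durationMs);
        }
        return cols;
    }

    /** Index-aligned read of one processor across all columns. */
    String describe(int i) {
        return ids.get(i) + " (" + types.get(i) + ") took " + durations.get(i) + " ms";
    }
}
```

The invariant to protect is that all arrays in a row have the same length and order; any nesting of processors would have to be encoded separately, which is one reason the separate-table option stays on the list.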

Validation Architecture

Test Framework

Property	Value
Framework	JUnit 5 (Spring Boot managed) + Testcontainers 2.0.2
Config file	cameleer-server-app/src/test/resources/application-test.yml (Wave 0)
Quick run command	`mvn test -pl cameleer-server-core -Dtest=WriteBufferTest -q`
Full suite command	`mvn verify`

Phase Requirements -> Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
INGST-01	POST /api/v1/data/executions returns 202, data in ClickHouse	integration	`mvn test -pl cameleer-server-app -Dtest=ExecutionControllerIT -q`	Wave 0
INGST-02	POST /api/v1/data/diagrams returns 202	integration	`mvn test -pl cameleer-server-app -Dtest=DiagramControllerIT -q`	Wave 0
INGST-03	POST /api/v1/data/metrics returns 202	integration	`mvn test -pl cameleer-server-app -Dtest=MetricsControllerIT -q`	Wave 0
INGST-04	Buffer flushes at interval/size	unit	`mvn test -pl cameleer-server-core -Dtest=WriteBufferTest -q`	Wave 0
INGST-05	503 when buffer full	unit+integration	`mvn test -pl cameleer-server-app -Dtest=BackpressureIT -q`	Wave 0
INGST-06	TTL removes old data	integration	`mvn test -pl cameleer-server-app -Dtest=ClickHouseTtlIT -q`	Wave 0
API-01	Endpoints under /api/v1/	integration	Covered by controller ITs	Wave 0
API-02	OpenAPI docs available	integration	`mvn test -pl cameleer-server-app -Dtest=OpenApiIT -q`	Wave 0
API-03	GET /api/v1/health responds	integration	`mvn test -pl cameleer-server-app -Dtest=HealthControllerIT -q`	Wave 0
API-04	Protocol version header validated	integration	`mvn test -pl cameleer-server-app -Dtest=ProtocolVersionIT -q`	Wave 0
API-05	Unknown JSON fields accepted	unit	`mvn test -pl cameleer-server-app -Dtest=ForwardCompatIT -q`	Wave 0

Sampling Rate

Per task commit: mvn test -pl cameleer-server-core -q (unit tests, fast)
Per wave merge: mvn verify (full suite with Testcontainers integration tests)
Phase gate: Full suite green before verification

Wave 0 Gaps

cameleer-server-app/src/test/resources/application-test.yml -- test ClickHouse config
cameleer-server-core/src/test/java/.../WriteBufferTest.java -- buffer unit tests
cameleer-server-app/src/test/java/.../AbstractClickHouseIT.java -- shared Testcontainers base class
cameleer-server-app/src/test/java/.../ExecutionControllerIT.java -- ingestion integration test
Docker available on test machine for Testcontainers

Sources

Primary (HIGH confidence)

ClickHouse Java Client releases -- confirmed v0.9.7 as latest (March 2026)
ClickHouse JDBC V2 docs -- JDBC driver API, batch insert patterns
ClickHouse Java Client V2 docs -- standalone client API, POJO insert
ClickHouse full-text search blog -- TYPE text() index GA March 2026
ClickHouse MergeTree settings -- ttl_only_drop_parts
Testcontainers ClickHouse module -- v2.0.2, dependency coordinates
springdoc-openapi releases -- v2.8.x for Spring Boot 3.4

Secondary (MEDIUM confidence)

Spring Boot Jackson default config -- FAIL_ON_UNKNOWN_PROPERTIES=false is default
ClickHouse Docker Compose docs -- container setup
Baeldung ClickHouse + Spring Boot -- integration patterns

Tertiary (LOW confidence)

ClickHouse ORDER BY optimization -- based on training data knowledge of MergeTree internals; should validate with EXPLAIN on real data

Metadata

Confidence breakdown:

Standard stack: HIGH -- versions verified against live sources (GitHub releases, Maven Central)
Architecture: HIGH -- write buffer + batch flush is established ClickHouse pattern used by SigNoz, Uptrace
ClickHouse schema: MEDIUM -- ORDER BY design is sound but should be validated with realistic query patterns
Pitfalls: HIGH -- well-documented ClickHouse failure modes, confirmed by multiple sources

Research date: 2026-03-11
Valid until: 2026-04-11 (30 days -- stack is stable)
