# ClickHouse Phase 3: Stats & Analytics — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace TimescaleDB continuous aggregates with ClickHouse materialized views and implement a `ClickHouseStatsStore` that reads from them using `-Merge` aggregate functions.
**Architecture:** 5 DDL scripts create AggregatingMergeTree target tables + materialized views that trigger on INSERT to `executions` and `processor_executions`. A `ClickHouseStatsStore` implements the existing `StatsStore` interface, translating `time_bucket()` → `toStartOfInterval()`, `SUM(total_count)` → `countMerge(total_count)`, `approx_percentile` → `quantileMerge`, etc. SLA and topErrors queries hit the raw `executions` / `processor_executions` tables with `FINAL`. Feature flag `cameleer.storage.stats=postgres|clickhouse` controls which implementation is active.
All 5 table+MV pairs go in a single DDL file. Tables use `AggregatingMergeTree()`. MVs use `-State` combinators and trigger on INSERT to `executions` or `processor_executions`. Example pair (the target-table DDL is shown only from its `ORDER BY` clause onward):
```sql
ORDER BY (tenant_id, application_name, route_id, processor_id, bucket)
TTL bucket + INTERVAL 365 DAY DELETE;
CREATE MATERIALIZED VIEW IF NOT EXISTS stats_1m_processor_detail_mv TO stats_1m_processor_detail AS
SELECT
tenant_id,
application_name,
route_id,
processor_id,
toStartOfMinute(start_time) AS bucket,
countState() AS total_count,
countIfState(status = 'FAILED') AS failed_count,
sumState(duration_ms) AS duration_sum,
maxState(duration_ms) AS duration_max,
quantileState(0.99)(duration_ms) AS p99_duration
FROM processor_executions
GROUP BY tenant_id, application_name, route_id, processor_id, bucket;
```
Note: `ClickHouseSchemaInitializer` currently runs each `.sql` file as a single statement. ClickHouse itself accepts multiple `;`-separated statements in one call, but the JDBC driver may not. Read the initializer first: if it uses `jdbc.execute(sql)` per file, the semicolons between statements will cause failures. In that case, either split the DDL into separate files (V4a, V4b, etc.) or modify the initializer to split on `;`. Verify during testing.
Read `cameleer-server-app/src/main/java/com/cameleer/server/app/config/ClickHouseSchemaInitializer.java`. If it runs each file as a single `jdbc.execute()` call, modify it to split on `;` and execute each statement separately. If it already handles multi-statement files, proceed.
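If the initializer does need to split, a minimal splitter might look like the sketch below. This is an illustrative helper, not existing project code, and it assumes the DDL files use only simple single-quoted literals (no escaped quotes inside strings):

```java
import java.util.ArrayList;
import java.util.List;

public final class SqlSplitter {

    /**
     * Splits a multi-statement SQL script on ';', ignoring semicolons
     * inside single-quoted string literals. Empty fragments (e.g. after
     * a trailing ';') are dropped.
     */
    public static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote; // naive: does not handle escaped quotes
            }
            if (c == ';' && !inQuote) {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) statements.add(stmt);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) statements.add(tail);
        return statements;
    }
}
```

The initializer would then loop over `splitStatements(fileContents)` and call `jdbc.execute(stmt)` per statement.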
The store implements `StatsStore` using ClickHouse `-Merge` functions. It follows the same pattern as `PostgresStatsStore` but with ClickHouse SQL syntax.
**Key implementation patterns:**
1. **Stats queries** (`queryStats`): read from the `stats_1m_*` tables using `-Merge` combinators:
```sql
SELECT
countMerge(total_count) AS total_count,
countIfMerge(failed_count) AS failed_count,
CASE WHEN countMerge(total_count) > 0
THEN sumMerge(duration_sum) / countMerge(total_count) ELSE 0 END AS avg_duration,
quantileMerge(0.99)(p99_duration) AS p99_duration,
countIfMerge(running_count) AS active_count
FROM stats_1m_all
WHERE tenant_id = 'default' AND bucket >= ? AND bucket < ?
```
Same pattern for prev-24h and today queries (identical to PostgresStatsStore logic).
2. **Timeseries queries** (`queryTimeseries`): group by time period:
```sql
SELECT
toStartOfInterval(bucket, INTERVAL ? SECOND) AS period,
countMerge(total_count) AS total_count,
countIfMerge(failed_count) AS failed_count,
CASE WHEN countMerge(total_count) > 0
THEN sumMerge(duration_sum) / countMerge(total_count) ELSE 0 END AS avg_duration,
quantileMerge(0.99)(p99_duration) AS p99_duration,
countIfMerge(running_count) AS active_count
FROM stats_1m_app
WHERE tenant_id = 'default' AND bucket >= ? AND bucket < ? AND application_name = ?
GROUP BY period ORDER BY period
```
3. **Grouped timeseries**: same as timeseries, but with an extra GROUP BY column (`application_name` or `route_id`), returned as `Map<String, StatsTimeseries>`.
4. **SLA compliance**: hit the raw `executions` table with `FINAL`:
```sql
SELECT
countIf(duration_ms <= ? AND status != 'RUNNING') AS compliant,
countIf(status != 'RUNNING') AS total
FROM executions FINAL
WHERE tenant_id = 'default' AND start_time >= ? AND start_time < ?
AND application_name = ?
```
5. **SLA counts by app/route**: same pattern, with a GROUP BY.
6. **Top errors**: hit the raw `executions` (with `FINAL`) or `processor_executions` table, using a CTE for counts + velocity. ClickHouse differences from Postgres:
- No `FILTER (WHERE ...)` → use `countIf(...)`
- No `LEFT(s, n)` → use `substring(s, 1, n)`
- CTE syntax is identical (`WITH ... AS (...)`)
7. **Active error types**: `SELECT uniq(...)` or `COUNT(DISTINCT ...)` from the raw executions table.
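The grouped-timeseries shaping in item 3 is plain row-to-map folding once the GROUP BY query returns. A sketch under assumed types (`Row` and `Point` here are illustrative stand-ins, not the project's actual `StatsTimeseries` shape):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class GroupedTimeseries {

    /** One timeseries point: bucket start plus merged counters. */
    public record Point(Instant period, long totalCount, long failedCount) {}

    /** One result row as returned by the grouped GROUP BY query. */
    public record Row(String groupKey, Instant period, long totalCount, long failedCount) {}

    /**
     * Folds rows (already ORDER BY period from the query) into one
     * series per group key, preserving encounter order.
     */
    public static Map<String, List<Point>> byGroup(List<Row> rows) {
        Map<String, List<Point>> result = new LinkedHashMap<>();
        for (Row r : rows) {
            result.computeIfAbsent(r.groupKey(), k -> new ArrayList<>())
                  .add(new Point(r.period(), r.totalCount(), r.failedCount()));
        }
        return result;
    }
}
```

The real implementation would map each `Point` list into a `StatsTimeseries` before returning.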
**Test approach**: Seed data by inserting directly into `executions` and `processor_executions` tables (the MVs trigger automatically on INSERT). Then query via the StatsStore methods and verify results.
**Test data seeding**: Insert 10 executions across 2 apps, 3 routes, spanning 10 minutes. Include some FAILED, some COMPLETED, varying durations. Then verify:
- `stats()` returns correct totals
- `statsForApp()` filters correctly
- `timeseries()` returns multiple buckets
- `slaCompliance()` returns the correct percentage
- `topErrors()` returns ranked errors
- `punchcard()` returns non-empty cells
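The seed set described above can be generated deterministically before being batch-inserted via JDBC. A sketch with a hypothetical record shape (the real `executions` columns may differ):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public final class SeedData {

    /** Minimal stand-in for one row of the executions table. */
    public record Execution(String app, String route, String status,
                            long durationMs, Instant startTime) {}

    /**
     * 10 executions across 2 apps and 3 routes, one per minute over
     * 10 minutes, with a few FAILED rows and varying durations.
     */
    public static List<Execution> executions(Instant base) {
        String[] apps = {"app-a", "app-b"};
        String[] routes = {"route-1", "route-2", "route-3"};
        List<Execution> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new Execution(
                apps[i % 2],
                routes[i % 3],
                i % 4 == 0 ? "FAILED" : "COMPLETED", // 3 failures: i = 0, 4, 8
                50L + 37L * i,                       // varying durations
                base.plus(i, ChronoUnit.MINUTES)));  // one row per minute
        }
        return rows;
    }
}
```

Inserting these rows lets every MV populate in one pass, so all six assertions above run against the same seed.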
- [ ] **Step 1: Write the failing integration test**
Follow the `PostgresStatsStore` structure closely: the same private `Filter` record and the same `queryStats`/`queryTimeseries`/`queryGroupedTimeseries` helper methods. Replace the PG-specific SQL with the ClickHouse equivalents per the translations described above.