**Scope:** Replace ClickHouse with PostgreSQL/TimescaleDB (source of truth, analytics) and OpenSearch (full-text search index)
## Motivation
The current all-in-ClickHouse storage layer has structural problems at scale:
- **OOM on batch inserts**: Wide rows with parallel arrays of variable-length blobs (exchange bodies, headers, stacktraces) exhaust ClickHouse server memory during batch insert processing.
- **CRUD misfit**: Users, OIDC config, and diagrams use ReplacingMergeTree, requiring `FINAL` on every read and workarounds to prevent version row accumulation.
- **Weak full-text search**: LIKE patterns with tokenbf_v1 skip indexes provide no ranking, stemming, fuzzy matching, or infix wildcard support.
- **Rigid data model**: Parallel arrays for processor executions prevent chunked/streaming ingestion and require ~100 lines of array type conversion workarounds.
## Requirements
- Hundreds of applications, thousands of routes, billions of records
- Statistics at four levels: all, application, route, processor
- Arbitrary time bucket sizes for statistics queries
- Full-text wildcard (infix) search across all fields
- P99 response time < 2000ms
- Support chunked/streaming execution ingestion (partial updates for long-running routes)
- Idempotent inserts (deduplication on execution_id)
- All software must be free (Apache 2.0, MIT, BSD, PostgreSQL License)
- Deployment target: Kubernetes (k3s)
- Data expired per day by dropping partitions/indexes — no row-level deletes
## Architecture
Two backends:
1. **PostgreSQL + TimescaleDB** — source of truth for all data, analytics via continuous aggregates
2. **OpenSearch** — asynchronous search index for full-text and wildcard queries
OpenSearch is a derived index, not a source of truth. If it goes down, writes and detail views continue via PostgreSQL. If an index corrupts, it is rebuilt from PostgreSQL.
| Column | Type | Constraints |
|---|---|---|
| created_at | TIMESTAMPTZ | NOT NULL DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL DEFAULT now() |
- Hypertable chunk interval: 1 day
- Primary key: `(execution_id, start_time)` — TimescaleDB requires the partition column in unique constraints
- `ON CONFLICT (execution_id, start_time) DO UPDATE` for dedup and status progression (RUNNING -> COMPLETED/FAILED guard: only update if the new status supersedes the old)
- `ON CONFLICT (execution_id, processor_id, start_time) DO UPDATE` for re-sent processors
- `group_name` and `route_id` denormalized from the parent execution (immutable per execution, set at ingestion) to enable JOIN-free continuous aggregates
- `timescaledb_toolkit` — `percentile_agg()` and `approx_percentile()` for P99 calculations
The TimescaleDB Docker image (`timescale/timescaledb`) includes both extensions. The Flyway V1 migration must `CREATE EXTENSION IF NOT EXISTS` for both before creating hypertables.
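A minimal sketch of that migration (the filename and the placement of the hypertable conversion are illustrative; the real V1 migration also contains the table DDL itself):

```sql
-- V1__init.sql (illustrative filename)
-- Extensions must exist before any hypertable is created.
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit;

-- ... table DDL for executions, processor_executions, etc. ...

-- Convert to a hypertable with the 1-day chunk interval described above.
SELECT create_hypertable('executions', 'start_time',
    chunk_time_interval => INTERVAL '1 day');
```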
## Continuous Aggregates (Statistics)
Four continuous aggregates at 1-minute resolution, one per aggregation level.
### `stats_1m_all` — global
```sql
CREATE MATERIALIZED VIEW stats_1m_all
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', start_time) AS bucket,
COUNT(*) AS total_count,
COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
COUNT(*) FILTER (WHERE status = 'RUNNING') AS running_count,
SUM(duration_ms) AS duration_sum,
MAX(duration_ms) AS duration_max,
approx_percentile(0.99, percentile_agg(duration_ms::DOUBLE PRECISION)) AS p99_duration
FROM executions
WHERE status IS NOT NULL
GROUP BY bucket;
```
### `stats_1m_app` — per application
Group by: `bucket, group_name`
### `stats_1m_route` — per route
Group by: `bucket, group_name, route_id`
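For reference, the route-level aggregate follows the same shape as `stats_1m_all`; a sketch (column names mirror the global view):

```sql
CREATE MATERIALIZED VIEW stats_1m_route
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', start_time) AS bucket,
    group_name,
    route_id,
    COUNT(*) AS total_count,
    COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
    SUM(duration_ms) AS duration_sum,
    MAX(duration_ms) AS duration_max,
    approx_percentile(0.99, percentile_agg(duration_ms::DOUBLE PRECISION)) AS p99_duration
FROM executions
WHERE status IS NOT NULL
GROUP BY bucket, group_name, route_id;
```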
### `stats_1m_processor` — per processor type within a route
```sql
-- No JOIN needed: group_name and route_id are denormalized on processor_executions
CREATE MATERIALIZED VIEW stats_1m_processor
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', start_time) AS bucket,
group_name,
route_id,
processor_type,
COUNT(*) AS total_count,
COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
SUM(duration_ms) AS duration_sum,
MAX(duration_ms) AS duration_max,
approx_percentile(0.99, percentile_agg(duration_ms::DOUBLE PRECISION)) AS p99_duration
FROM processor_executions
GROUP BY bucket, group_name, route_id, processor_type;
```
Note: TimescaleDB continuous aggregates only support single-hypertable queries (no JOINs). This is why `group_name` and `route_id` are denormalized onto `processor_executions`.
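Each continuous aggregate also needs a refresh policy to stay current; a sketch using TimescaleDB's `add_continuous_aggregate_policy` (the offsets and schedule here are assumptions, not decided values):

```sql
-- Materialize up to one minute behind now, refreshing every minute (illustrative values).
SELECT add_continuous_aggregate_policy('stats_1m_all',
    start_offset      => INTERVAL '1 hour',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');
-- Repeat for stats_1m_app, stats_1m_route, stats_1m_processor.
```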
### Query pattern for arbitrary buckets
```sql
SELECT time_bucket('30 minutes', bucket) AS period,
SUM(total_count) AS total_count,
SUM(failed_count) AS failed_count,
SUM(duration_sum) / NULLIF(SUM(total_count), 0) AS avg_duration
FROM stats_1m_route
WHERE route_id = ? AND bucket >= now() - interval '24 hours'
GROUP BY period
ORDER BY period;
```

Metrics ingestion retains the write buffer pattern:

```
MetricsController (HTTP POST)
|-- WriteBuffer<MetricsSnapshot>.offer(batch)
MetricsFlushScheduler (@Scheduled)
|-- drain(batchSize)
'-- MetricsStore.insertBatch(batch)
```
### What gets deleted
- `ClickHouseExecutionRepository` — replaced by `PostgresExecutionStore`
- `ClickHouseSearchEngine` — replaced by `OpenSearchIndex`
- `ClickHouseFlushScheduler` — simplified, only retained for metrics
- `ClickHouseDiagramRepository` — replaced by `PostgresDiagramStore`
- `ClickHouseUserRepository` — replaced by `PostgresUserStore`
- `ClickHouseOidcConfigRepository` — replaced by `PostgresOidcConfigStore`
- `ClickHouseMetricsRepository` — replaced by `PostgresMetricsStore`
- `ClickHouseSchemaInitializer` — replaced by Flyway
- All `clickhouse/*.sql` migration files — replaced by Flyway migrations
- Array type conversion helpers, `FINAL` workarounds, `ifNotFinite()` guards
## Error Handling and Resilience
### PostgreSQL (source of truth)
- **Write failure**: Return 503 to agent with `Retry-After` header.
- **Connection pool exhaustion**: HikariCP handles queueing. Sustained exhaustion triggers backpressure via 503.
- **Schema migrations**: Flyway with versioned migrations. Validates on startup.
### OpenSearch (search index)
- **Unavailable at write time**: Events accumulate in a bounded in-memory queue. If the queue fills, new index events are dropped and a warning is logged. No data loss — PostgreSQL retains the data.
- **Unavailable at search time**: Search endpoint returns 503. Stats and detail endpoints still work via PostgreSQL.
- **Index corruption/drift**: Rebuild via admin API endpoint (`POST /api/v1/admin/search/rebuild?from=&to=`) that re-indexes from PostgreSQL, scoped by time range. Processes in background, returns job status.
- **Document staleness**: Debouncer provides eventual consistency with 1-2 second typical lag.
### Partial execution handling
1. Execution arrives with `status=RUNNING` -> `INSERT ... ON CONFLICT (execution_id, start_time) DO UPDATE` with status progression guard (only update if new status supersedes old: RUNNING < COMPLETED/FAILED)
2. Processors arrive in chunks -> `INSERT ... ON CONFLICT (execution_id, processor_id, start_time) DO UPDATE` per processor
3. Completion signal -> same upsert as step 1, with `status='COMPLETED'`, `duration_ms`, `end_time` — the progression guard allows this update
4. Each mutation publishes `ExecutionUpdatedEvent` -> debounced OpenSearch re-index
5. Timeout: configurable threshold marks executions stuck in RUNNING as STALE
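The upsert in steps 1 and 3 can be sketched as follows (column list abbreviated; the `WHERE` clause implements the progression guard):

```sql
INSERT INTO executions (execution_id, start_time, status, end_time, duration_ms)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (execution_id, start_time) DO UPDATE
SET status      = EXCLUDED.status,
    end_time    = EXCLUDED.end_time,
    duration_ms = EXCLUDED.duration_ms,
    updated_at  = now()
-- Progression guard: only rows still in RUNNING may be updated, so a
-- terminal COMPLETED/FAILED status is never demoted by a late chunk.
WHERE executions.status = 'RUNNING';
```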
### Data lifecycle
- All time-series data partitioned by day
- Expiry by dropping entire daily partitions/indexes — no row-level deletes
- PostgreSQL: `SELECT drop_chunks(<hypertable>, older_than => INTERVAL 'N days')` run against each hypertable
- OpenSearch: ILM delete action on daily indexes
- Retention periods (configurable):
- Raw data (executions, processor_executions, metrics): default 30 days
- 1-minute continuous aggregates: default 90 days
- Users, diagrams, OIDC config: no expiry
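Instead of scheduling `drop_chunks` manually, the same lifecycle can be expressed with TimescaleDB's built-in retention policies; a sketch using the default periods above (table names follow the schema described earlier):

```sql
-- Raw hypertables: 30-day retention (chunks dropped whole, no row-level deletes).
SELECT add_retention_policy('executions',           INTERVAL '30 days');
SELECT add_retention_policy('processor_executions', INTERVAL '30 days');
SELECT add_retention_policy('metrics',              INTERVAL '30 days');

-- Continuous aggregates keep their own, longer retention.
SELECT add_retention_policy('stats_1m_route', INTERVAL '90 days');
```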
## Testing Strategy
### Unit tests
- Store interface tests using Testcontainers `PostgreSQLContainer` with TimescaleDB image
- SearchIndex tests using Testcontainers `OpensearchContainer`
- Dedup: insert same execution_id twice, verify single row
- Chunked arrival: insert execution, then processors in separate calls, verify consolidated state
- Upsert: update execution status from RUNNING to COMPLETED, verify single row with updated fields
### Integration tests
- Full ingestion flow: POST execution -> verify PostgreSQL row -> verify OpenSearch document
- Partial execution: POST RUNNING -> POST processors -> POST COMPLETED -> verify state transitions and OpenSearch updates
- Search -> detail roundtrip: index document, search by text, fetch detail by returned execution_id
### Configuration
- `CAMELEER_RETENTION_DAYS=30` (applies to both PostgreSQL and OpenSearch)
- `CAMELEER_BODY_SIZE_LIMIT=16384` (body size limit in bytes)
- `CAMELEER_OPENSEARCH_QUEUE_SIZE=10000` (bounded in-memory queue for async indexing)
### Health checks
- PostgreSQL: Spring Boot actuator datasource health
- OpenSearch: cluster health endpoint
- Combined health at `/api/v1/health`
## Migration Strategy
This is a clean cutover, not a live migration. No data migration from ClickHouse to PostgreSQL/OpenSearch is planned. Existing ClickHouse data will be abandoned (it has a 30-day TTL anyway). The refactor is implemented on a separate git branch and deployed as a replacement.