ClickHouse scaling plan for 1,000-10,000 tx/s production load #122
Context
At 3 tx/s testbed load, ClickHouse CPU was reduced from 407m to 130m by batching processor and log inserts (commit 633a61d). However, production loads are expected at 1,000-10,000 tx/s (~333-3,333x current), which will require further optimization.

Current State (3 tx/s baseline)
- quantileState(0.99) on every insert

Scaling Levers (in priority order)
1. Remove quantileState from MVs
- p99_duration column from all 5 stats MV target tables
- quantile(0.99)(duration_ms) over raw data

2. Consolidate 5 MVs into 2
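A rough sketch of what a consolidated target table and a query-time rollup could look like. Everything here is an assumption for illustration: the table name `stats_1m_combined`, the column names and types, and the tenant value are hypothetical, not the existing schema.

```sql
-- Hypothetical consolidated target table carrying all three dimensions.
-- SimpleAggregateFunction(sum, ...) lets background merges add partial rows.
CREATE TABLE stats_1m_combined
(
    bucket        DateTime,
    tenant_id     LowCardinality(String),
    application   LowCardinality(String),
    route_id      LowCardinality(String),
    total_count   SimpleAggregateFunction(sum, UInt64),
    duration_sum  SimpleAggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (tenant_id, application, route_id, bucket);

-- Per-app view: route_id is simply left out of GROUP BY,
-- so the finer-grained rows are rolled up at query time.
SELECT
    bucket,
    application,
    sum(total_count)                     AS total,
    sum(duration_sum) / sum(total_count) AS avg_duration_ms
FROM stats_1m_combined
WHERE tenant_id = 'tenant-a'             -- hypothetical tenant
  AND bucket >= now() - INTERVAL 1 HOUR
GROUP BY bucket, application
ORDER BY bucket, application;
```

The trade-off in this shape is that one MV fires per insert instead of three, while reads over the coarser dimensions scan more rows than a dedicated table would.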
- stats_1m_all + stats_1m_app + stats_1m_route into one MV with all dimensions (tenant, app, route). Query-time GROUP BY filters dimensions.
- stats_1m_processor + stats_1m_processor_detail similarly

3. Switch to native TCP protocol
- jdbc:clickhouse://host:8123 to jdbc:ch://host:9000
- async_insert as safety net (native protocol separates query from data)
- clickhouse-jdbc:0.9.7:all)

4. Increase flush interval at scale
- buffer-capacity beyond 50,000

5. Consider ClickHouse Buffer engine
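A minimal sketch of a Buffer table definition, for reference while evaluating this lever. The table names `executions` and `executions_buffer` are hypothetical stand-ins for the real MergeTree table, and the thresholds are the example values from the ClickHouse documentation, not tuned for this workload.

```sql
-- Hypothetical names: `executions` stands in for the real MergeTree table.
-- Writers would insert into executions_buffer instead of executions.
CREATE TABLE executions_buffer AS executions
ENGINE = Buffer(
    currentDatabase(), executions,  -- destination database and table
    16,                             -- num_layers (independent in-memory buffers)
    10, 100,                        -- min_time, max_time in seconds
    10000, 1000000,                 -- min_rows, max_rows
    10000000, 100000000             -- min_bytes, max_bytes
);
```

A layer is flushed to the destination when all of the min thresholds are met or any one of the max thresholds is reached. Rows still held in the buffer are lost if the server stops abnormally, which is worth weighing against the batching already done on the application side.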
- Buffer engine sits in front of MergeTree and auto-flushes

Diagnostics
Useful queries for measuring impact at each step:
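As a sketch of the kind of checks that fit here, assuming the standard system tables are available and query logging is enabled (the one-hour window is an arbitrary choice):

```sql
-- Active part counts per table: many small parts suggest inserts are
-- too frequent or too small for the merge rate.
SELECT
    table,
    count()                                AS active_parts,
    sum(rows)                              AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table
ORDER BY active_parts DESC;

-- Insert rate and cost per minute over the last hour.
SELECT
    toStartOfMinute(event_time)       AS minute,
    count()                           AS insert_queries,
    sum(written_rows)                 AS rows_written,
    round(avg(query_duration_ms), 1)  AS avg_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;
```

Running both before and after each lever gives a like-for-like comparison of part counts and rows per insert alongside the CPU figures.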
Acceptance Criteria