Verify processor stats dedup after deployment #128
`e2f784b` fixes inflated processor execution counts caused by duplicate inserts into the plain MergeTree `processor_executions` table. The fix replaces `count()` with `uniq(execution_id)` in both the `stats_1m_processor` and `stats_1m_processor_detail` materialized views.

### What to verify after deployment
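As context for reviewers, the shape of the fixed view is roughly the following, a minimal sketch only: the `processor_id`, `minute`, `timestamp`, and `duration_ms` names are assumptions, not taken from the actual schema, and the real views likely carry more columns.

```sql
-- Sketch of the deduplicated materialized view (assumed column names;
-- only execution_id, total_count, and duration_sum come from the issue).
CREATE MATERIALIZED VIEW stats_1m_processor
ENGINE = AggregatingMergeTree()
ORDER BY (processor_id, minute)
AS SELECT
    processor_id,
    toStartOfMinute(timestamp)  AS minute,
    uniqState(execution_id)     AS total_count,  -- was countState(); dupes inflated it
    sumState(duration_ms)       AS duration_sum  -- still double-counts dupes (see follow-up)
FROM processor_executions
GROUP BY processor_id, minute;
```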
- **Counts are correct:** In the dashboard process diagram tooltip, processors earlier in a route should have counts >= processors later in the same route. Check the `try-catch-test` route in `quarkus-native-app` specifically: `process5` should have a count >= `log12`.
- **Backfill ran successfully:** The init.sql drops and recreates both processor stats tables on startup, backfilling from `processor_executions`. Check the server logs for `ClickHouse schema initialization complete` without errors.
- **Startup time:** The backfill scans the full `processor_executions` table (365-day TTL). Monitor startup duration; if it becomes a problem with large datasets, consider making the migration one-time (check the column type before dropping).
- **Duration metrics still accurate:** `avg_duration_ms` now divides `sumMerge(duration_sum)` by `uniqMerge(total_count)`. With duplicates, `duration_sum` is still inflated (a sum over all rows, including dupes) while the denominator is now deduplicated, so avg duration may read slightly higher than reality for historically duplicated data. New data going forward is correct.

### Verification query
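A comparison along these lines should work, as a hedged sketch: the `raw_unique` and `mv_total` aliases match the expected-result note, but the exact query shape (whole-table totals rather than a per-processor `GROUP BY`) is an assumption.

```sql
-- Compare the deduplicated raw count against the MV rollup.
-- uniqMerge over all MV rows yields one overall unique-execution count.
SELECT
    (SELECT uniq(execution_id)      FROM processor_executions) AS raw_unique,
    (SELECT uniqMerge(total_count)  FROM stats_1m_processor)   AS mv_total;
```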
`raw_unique` and `mv_total` should match.

### Follow-up consideration
The `duration_sum` aggregate (`sumState`) still double-counts duplicate inserts. A full fix would require deduplicating `processor_executions` itself (e.g., a `ReplacingMergeTree` keyed on `execution_id + seq`) or changing `duration_sum` to also use a dedup-aware aggregate. Low priority, since avg/p99 are approximately correct and new data is clean.
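If the `ReplacingMergeTree` route is taken, it could look roughly like this; the column list beyond `execution_id` and `seq` is illustrative, not the real schema.

```sql
-- Hypothetical dedup of the raw table: ReplacingMergeTree collapses rows
-- sharing the sorting key during merges, keeping one per (execution_id, seq).
CREATE TABLE processor_executions_dedup
(
    execution_id String,
    seq          UInt32,
    processor_id String,
    timestamp    DateTime,
    duration_ms  UInt64
)
ENGINE = ReplacingMergeTree
ORDER BY (execution_id, seq)
TTL timestamp + INTERVAL 365 DAY;
-- Caveat: merges are asynchronous, so queries may still need FINAL
-- (or an explicit GROUP BY dedup) until background merges run.
```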