Commit Graph

1004 Commits

Author SHA1 Message Date
hsiegeln
9057981cf7 fix: use composite ID for routes in command palette search data
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 55s
Routes with the same name across different applications (e.g., "route1"
in both QUARKUS-APP and BACKEND-APP) were deduplicated because they
shared the same id (routeId). Use appId/routeId as the id so all
routes appear in cmd-k results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:33:23 +02:00
hsiegeln
b30a5b5760 fix: prevent cmd-k scroll reset on catalog poll refresh
All checks were successful
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 2m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m0s
The searchData useMemo recomputed on every catalog poll cycle because
catalogData got a new array reference even when content was unchanged.
This caused the CommandPalette list to re-render and reset scroll.

Use a ref with deep equality check to keep a stable catalog reference,
only updating when the actual data changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:22:50 +02:00
hsiegeln
910230cbf8 fix: add <mark> highlighting to search match context snippets
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The command palette renders matchContext via dangerouslySetInnerHTML
expecting HTML with <mark> tags, but extractSnippet() returned plain
text. Wrap the matched term in <mark> tags and escape surrounding
text to prevent XSS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:18:04 +02:00
hsiegeln
1d791bb329 fix: use exact match for ID fields in full-text search
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
ID fields (execution_id, correlation_id, exchange_id) should use
exact equality, not LIKE with wildcards. LIKE is only needed for
the _search_text full-text columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:13:54 +02:00
hsiegeln
9781fe0d7c fix: include execution/correlation/exchange IDs in full-text search
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The _search_text materialized column only contained error messages,
bodies, and headers — not execution_id, correlation_id, exchange_id,
or route_id. Searching by ID via cmd-k returned no results.

- Add ID fields to _search_text in ClickHouse DDL (covered by ngram
  bloom filter index)
- Add direct LIKE matches on execution_id, correlation_id, exchange_id
  in the text search WHERE clause for faster exact ID lookups

Requires ClickHouse table recreation (fresh install).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:12:15 +02:00
hsiegeln
92951f1dcf chore: update @cameleer/design-system to v0.1.22
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m27s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Sidebar selectedPath now uses sidebarReveal on all tabs, not just
exchanges. This fixes sidebar highlighting on dashboard and runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:09:20 +02:00
hsiegeln
a7d256b38a fix: compute hasTraceData from processor records in chunk accumulator
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The chunked ingestion path hardcoded hasTraceData=false because the
execution envelope doesn't carry processor bodies. But the processor
records DO have inputBody/outputBody — we just need to check them.

Track hasTraceData across chunks in PendingExchange and pass it to
MergedExecution when the final chunk arrives or on stale sweep.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:04:34 +02:00
hsiegeln
e26266532a fix: regenerate OpenAPI types, fix search scoping by applicationId
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The identity rename (application→applicationId) broke search filtering
because the stale schema.d.ts still had 'application' as the field name.
The backend silently ignored the unknown field, returning unfiltered results.

- Regenerate openapi.json and schema.d.ts from live backend
- Fix Dashboard: application→applicationId in search request
- Fix RouteDetail: application→applicationId in search request (2 places)
- LayoutShell: scope command palette search by appId/routeId
- LayoutShell: pass sidebarReveal state on sidebar click navigation

Note for DS team: the Sidebar selectedPath logic (line 5451 in dist)
has a hardcoded pathname.startsWith("/exchanges/") guard. This should
be broadened to simply `S ? S : $.pathname` so sidebarReveal works on
all tabs (dashboard, runtime), not just exchanges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:55:19 +02:00
hsiegeln
178bc40706 Revert "fix: sidebar selection highlight and scoped command palette search"
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
This reverts commit 4168a6d45b.
2026-04-01 20:43:27 +02:00
hsiegeln
4168a6d45b fix: sidebar selection highlight and scoped command palette search
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Two fixes:
- Pass sidebarReveal state on sidebar navigation so the design system
  can highlight the selected entry (it compares internal /apps/... paths
  against this state value, not the browser URL)
- Command palette search now includes scope.appId and scope.routeId
  so results are filtered to the current sidebar selection

Note: sidebar highlighting works on the exchanges tab. The design
system's selectedPath logic only checks pathname.startsWith("/exchanges/")
for sidebarReveal — a DS update is needed to support /dashboard/ and
/runtime/ tabs too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:41:42 +02:00
hsiegeln
a028905e41 fix: update agent field names in frontend to match backend DTO
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The AgentInstanceResponse backend DTO uses instanceId, displayName,
applicationId, status — but the stale schema.d.ts still had id, name,
application, state. This caused the runtime table to show no data.

- Update schema.d.ts AgentInstanceResponse fields
- Fix AgentHealth: row.id→instanceId, row.name→displayName,
  row.application→applicationId, inst.id→instanceId
- Fix AgentInstance: agent.id→instanceId, agent.name→displayName
- Fix ExchangeHeader: agent.id→instanceId, agent.state→status
- Fix LayoutShell search: agent.state→status, agentTps→tps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:36:31 +02:00
hsiegeln
f82aa26371 fix: improve ClickHouse admin page, fix AgentHealth type error
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 3m46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Rewrite ClickHouse admin to show useful storage metrics instead of
often-empty system.events data. Add active queries section.

- Replace performance endpoint: query system.parts for disk size,
  uncompressed size, compression ratio, total rows, part count
- Add /queries endpoint querying system.processes for active queries
- Frontend: storage overview strip, tables with total size, active
  queries DataTable
- Fix AgentHealth.tsx type: agentId → instanceId in inline type cast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:18:06 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00
hsiegeln
5ed7d38bf7 fix: sort sidebar entries alphanumerically
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 29s
CI / docker (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been skipped
Applications, routes within each app, and agents within each app
are now sorted by name using localeCompare.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:24:39 +02:00
hsiegeln
4cdbcdaeea fix: update frontend field names for identity rename (applicationId, instanceId)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
The backend identity rename (applicationName → applicationId,
agentId → instanceId) was not reflected in the frontend. This caused
drilldown to fail (detail.applicationName was undefined, disabling
the diagram fetch) and various display issues.

Updated schema.d.ts, ExchangeHeader, ExecutionDiagram, Dashboard,
AgentHealth, AgentInstance, LayoutShell, LogTab, InfoTab, DetailPanel,
ExchangesPage, and tracing-store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:22:16 +02:00
hsiegeln
aa2d203f4e feat: add UI usage analytics tracking
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m14s
CI / deploy (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:53:32 +02:00
hsiegeln
ce4abaf862 fix: infer compound node color from descendants when no own overlay state
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
Path containers (EIP_WHEN, EIP_OTHERWISE, etc.) don't have their own
processor records, so they never get an overlay entry. Now inferred
from descendants: green if any descendant executed, red if any failed.
Gated (amber) only when no descendants executed at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:37:47 +02:00
hsiegeln
40ce4a57b4 fix: only show amber on containers where gate blocked all children
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
A container is only gated (amber) when filterMatched=false or
duplicateMessage=true AND no descendants were executed. Containers
with executed children (split, choice, idempotent that passed) now
correctly show green/red based on their execution status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:32:39 +02:00
hsiegeln
b44ffd08be fix: color compound nodes by execution status in overlay
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 51s
CompoundNode now uses execution overlay status to color its header:
failed (red) > completed (green) > default. Previously only used
static type-based color regardless of execution state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:20:59 +02:00
hsiegeln
cf439248b5 feat: expose iteration/iterationSize fields for diagram overlay
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Replace synthetic wrapper node approach with direct iteration fields:
- ProcessorNode gains iteration (child's index) and iterationSize
  (container's total) fields, populated from ClickHouse flat records
- Frontend hooks detect iteration containers from iterationSize != null
  instead of scanning for wrapper processorTypes
- useExecutionOverlay filters children by iteration field instead of
  wrapper nodes, eliminating ITERATION_WRAPPER_TYPES entirely
- Cleaner data contract: API returns exactly what the DB stores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:14:36 +02:00
hsiegeln
e8f9ada1d1 fix: inject ClickHouse JdbcTemplate into stats-querying controllers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 49s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
RouteCatalogController, RouteMetricsController, and AgentRegistrationController
had unqualified JdbcTemplate injection, receiving the PostgreSQL template
instead of ClickHouse. The stats queries silently failed (caught exception)
returning 0 counts. Added @Qualifier("clickHouseJdbcTemplate") to all three.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:34:56 +02:00
hsiegeln
bc70797e31 fix: force UTC timezone in Docker runtime
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Sets TZ=UTC and -Duser.timezone=UTC to guarantee all JVM time operations
use UTC regardless of the container's base image or host configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:24:23 +02:00
hsiegeln
f6123b8a7c fix: use explicit UTC formatting in ClickHouse DateTime literals
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 57s
Timestamp.toString() uses JVM local timezone which can mismatch with
ClickHouse's UTC timezone, causing time-filtered queries to return empty
results. Replaced with DateTimeFormatter.withZone(UTC) in all lit() methods.

Also added warn logging to RouteCatalogController catch blocks to surface
query errors instead of silently swallowing them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:13:52 +02:00
hsiegeln
d739094a56 fix: update ClickHouse DDL files with new column names instead of ALTER RENAME
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
ClickHouse can't rename columns that are part of ORDER BY keys.
Updated V1-V8 DDL files directly with new column names (instance_id,
application_id) and removed V9 migration. Wipe ClickHouse and restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:40:54 +02:00
hsiegeln
91400defe9 fix: add missing V9 (ClickHouse) and V14 (PostgreSQL) identity column rename migrations
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
Migration files were lost during worktree merge — recreated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:33:02 +02:00
hsiegeln
909d713837 feat: rename agent identity fields for protocol v2 + add SHUTDOWN lifecycle state
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 22s
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId

Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)

Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:22:42 +02:00
hsiegeln
ad8dd73596 fix: update ChunkAccumulator tests for DiagramStore constructor param
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:58:27 +02:00
hsiegeln
e50c9fa60d fix: address SonarQube reliability issues
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 39s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
- ElkDiagramRenderer.getElkRoot(): add null guard to prevent NPE
  when node is null (SQ java:S2259)
- WriteBuffer: add offerOrWarn() that logs when buffer is full instead
  of silently dropping data. ChunkAccumulator now uses this method
  so ingestion backpressure is visible in logs (SQ java:S899)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:55:31 +02:00
hsiegeln
d4dbfa7ae6 fix: populate diagramContentHash in chunked ingestion pipeline
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:50:34 +02:00
hsiegeln
59374482bc fix: replace PostgreSQL aggregate functions with ClickHouse -Merge combinators
RouteCatalogController, RouteMetricsController, AgentRegistrationController
all had inline SQL using SUM() on AggregateFunction columns from stats_1m_*
AggregatingMergeTree tables. Replace with countMerge/countIfMerge/sumMerge.
Also fix time_bucket() → toStartOfInterval() and ::double → toFloat64().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:49:06 +02:00
hsiegeln
43e187a023 fix: ChunkIngestionController ObjectMapper missing FAIL_ON_UNKNOWN_PROPERTIES
Adds DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES=false (required
by PROTOCOL.md) and explicit TypeReference<List<ExecutionChunk>> for
array parsing. Without this, batched chunks from ChunkedExporter
(2+ chunks in a JSON array) were silently rejected, causing final:true
chunks to be lost and all exchanges to go stale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:45:12 +02:00
hsiegeln
bc1c71277c fix: resolve duplicate ExecutionStore bean conflict
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 49s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 55s
ClickHouseExecutionStore implements ExecutionStore, so the concrete bean
already satisfies the interface — remove redundant wrapper bean. Align
ChunkAccumulator and ExecutionFlushScheduler conditions to
cameleer.storage.executions flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:44:02 +02:00
hsiegeln
520181d241 test(clickhouse): add integration tests for execution read path and tree reconstruction
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m16s
SonarQube / sonarqube (push) Failing after 2m21s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:11:44 +02:00
hsiegeln
95b9dea5c4 feat(clickhouse): wire ClickHouseExecutionStore as active ExecutionStore
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:09:14 +02:00
hsiegeln
151b96a680 feat: seq-based tree reconstruction for ClickHouse flat processor model
Dual-mode buildTree: detects seq presence and uses seq/parentSeq linkage
instead of processorId map. Handles duplicate processorIds across
iterations correctly. Old processorId-based mode kept for PG compat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:07:20 +02:00
hsiegeln
0661fd995f feat(clickhouse): add read methods to ClickHouseExecutionStore
Implements ExecutionStore interface with findById (FINAL for
ReplacingMergeTree), findProcessors (ORDER BY seq), findProcessorById,
and findProcessorBySeq. Write methods unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:04:03 +02:00
hsiegeln
190ae2797d refactor: extend ProcessorRecord with seq/iteration fields for ClickHouse model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:02:03 +02:00
hsiegeln
968117c41a feat(clickhouse): wire Phase 4 stores with feature flags
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:44:10 +02:00
hsiegeln
7d7eb52afb feat(clickhouse): add ClickHouseLogStore with LogIndex interface
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:42:07 +02:00
hsiegeln
c73e4abf68 feat(clickhouse): add ClickHouseAgentEventRepository with integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:37:51 +02:00
hsiegeln
cd63d300b3 feat(clickhouse): add ClickHouseDiagramStore with integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:35:32 +02:00
hsiegeln
f7daadaaa9 feat(clickhouse): add DDL for route_diagrams, agent_events, and logs tables
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:30:38 +02:00
hsiegeln
af080337f5 feat: comprehensive ClickHouse low-memory tuning and switch all storage to ClickHouse
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.

Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:27:10 +02:00
hsiegeln
606f81a970 fix: align server with protocol v2 chunked transport spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m45s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 46s
- ChunkIngestionController: /data/chunks → /data/executions (matches
  PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
  avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
  envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:18:35 +02:00
hsiegeln
154bce366a fix: remove references to deleted ProcessorExecution tree fields
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m15s
CI / docker (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m0s
cameleer3-common removed children, loopIndex, splitIndex,
multicastIndex from ProcessorExecution (flat model only now).
Iteration context lives on synthetic wrapper nodes via processorType.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:00:11 +02:00
hsiegeln
a669df08bd fix(clickhouse): tune memory settings to prevent OOM on insert
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 40s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:54:43 +02:00
hsiegeln
af18fc4142 Merge branch 'worktree-clickhouse-phase2'
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
2026-03-31 22:06:35 +02:00
hsiegeln
1a00eed389 fix: schema initializer skips comment-only SQL segments
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:06:31 +02:00
hsiegeln
0423518f72 feat: ClickHouse Phase 3 — Stats & Analytics (materialized views)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
- DDL for 5 AggregatingMergeTree tables + 5 materialized views
- ClickHouseStatsStore: all 15 StatsStore methods using -Merge combinators
- Stats/timeseries read from pre-aggregated MVs (countMerge, sumMerge, quantileMerge)
- SLA/topErrors/punchcard query raw executions FINAL table
- Feature flag: cameleer.storage.stats (default: clickhouse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:52:13 +02:00