Applications, routes within each app, and agents within each app
are now sorted by name using localeCompare.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend identity rename (applicationName → applicationId,
agentId → instanceId) was not reflected in the frontend. This caused
drilldown to fail (detail.applicationName was undefined, disabling
the diagram fetch) and various display issues.
Updated schema.d.ts, ExchangeHeader, ExecutionDiagram, Dashboard,
AgentHealth, AgentInstance, LayoutShell, LogTab, InfoTab, DetailPanel,
ExchangesPage, and tracing-store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Path containers (EIP_WHEN, EIP_OTHERWISE, etc.) don't have their own
processor records, so they never get an overlay entry. Now inferred
from descendants: green if any descendant executed, red if any failed.
Gated (amber) only when no descendants executed at all.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A container is only gated (amber) when filterMatched=false or
duplicateMessage=true AND no descendants were executed. Containers
with executed children (split, choice, idempotent that passed) now
correctly show green/red based on their execution status.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompoundNode now uses execution overlay status to color its header:
failed (red) > completed (green) > default. Previously only used
static type-based color regardless of execution state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace synthetic wrapper node approach with direct iteration fields:
- ProcessorNode gains iteration (child's index) and iterationSize
(container's total) fields, populated from ClickHouse flat records
- Frontend hooks detect iteration containers from iterationSize != null
instead of scanning for wrapper processorTypes
- useExecutionOverlay filters children by iteration field instead of
wrapper nodes, eliminating ITERATION_WRAPPER_TYPES entirely
- Cleaner data contract: API returns exactly what the DB stores
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteCatalogController, RouteMetricsController, and AgentRegistrationController
had unqualified JdbcTemplate injection, receiving the PostgreSQL template
instead of ClickHouse. The stats queries silently failed (caught exception)
returning 0 counts. Added @Qualifier("clickHouseJdbcTemplate") to all three.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sets TZ=UTC and -Duser.timezone=UTC to guarantee all JVM time operations
use UTC regardless of the container's base image or host configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Timestamp.toString() uses JVM local timezone which can mismatch with
ClickHouse's UTC timezone, causing time-filtered queries to return empty
results. Replaced with DateTimeFormatter.withZone(UTC) in all lit() methods.
Also added warn logging to RouteCatalogController catch blocks to surface
query errors instead of silently swallowing them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse can't rename columns that are part of ORDER BY keys.
Updated V1-V8 DDL files directly with new column names (instance_id,
application_id) and removed V9 migration. Wipe ClickHouse and restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId
Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)
Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer.getElkRoot(): add null guard to prevent NPE
when node is null (SQ java:S2259)
- WriteBuffer: add offerOrWarn() that logs when buffer is full instead
of silently dropping data. ChunkAccumulator now uses this method
so ingestion backpressure is visible in logs (SQ java:S899)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteCatalogController, RouteMetricsController, AgentRegistrationController
all had inline SQL using SUM() on AggregateFunction columns from stats_1m_*
AggregatingMergeTree tables. Replace with countMerge/countIfMerge/sumMerge.
Also fix time_bucket() → toStartOfInterval() and ::double → toFloat64().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES=false (required
by PROTOCOL.md) and explicit TypeReference<List<ExecutionChunk>> for
array parsing. Without this, batched chunks from ChunkedExporter
(2+ chunks in a JSON array) were silently rejected, causing final:true
chunks to be lost and all exchanges to go stale.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouseExecutionStore implements ExecutionStore, so the concrete bean
already satisfies the interface — remove redundant wrapper bean. Align
ChunkAccumulator and ExecutionFlushScheduler conditions to
cameleer.storage.executions flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dual-mode buildTree: detects seq presence and uses seq/parentSeq linkage
instead of processorId map. Handles duplicate processorIds across
iterations correctly. Old processorId-based mode kept for PG compat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements ExecutionStore interface with findById (FINAL for
ReplacingMergeTree), findProcessors (ORDER BY seq), findProcessorById,
and findProcessorBySeq. Write methods unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.
Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkIngestionController: /data/chunks → /data/executions (matches
PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cameleer3-common removed children, loopIndex, splitIndex,
multicastIndex from ProcessorExecution (flat model only now).
Iteration context lives on synthetic wrapper nodes via processorType.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.
Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The jdbcTemplate() method was calling dataSource(properties) directly,
creating a new DataSource instance instead of using the Spring-managed
@Primary bean. This caused some repositories to receive the ClickHouse
connection instead of PostgreSQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DDL for executions (ReplacingMergeTree) and processor_executions (MergeTree with seq/parentSeq/iteration)
- ClickHouseExecutionStore with batch INSERT for both tables
- ChunkAccumulator: buffers exchange envelope across chunks, inserts processors immediately, writes execution on final chunk
- ExecutionFlushScheduler drains WriteBuffers to ClickHouse
- ChunkIngestionController: POST /api/v1/data/chunks endpoint
- ClickHouseSearchIndex: ngram-accelerated SQL search implementing SearchIndex interface
- Feature flags: cameleer.storage.search=opensearch|clickhouse
- Uses cameleer3-common ExecutionChunk and FlatProcessorRecord models
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ExecutionFlushScheduler drains MergedExecution and ProcessorBatch write
buffers on a fixed interval and delegates batch inserts to
ClickHouseExecutionStore. Also sweeps stale exchanges every 60s.
ChunkIngestionController exposes POST /api/v1/data/chunks, accepts
single or array ExecutionChunk payloads, and feeds them into the
ChunkAccumulator. Conditional on ChunkAccumulator bean (clickhouse.enabled).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>