ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteCatalogController, RouteMetricsController, AgentRegistrationController
all had inline SQL using SUM() on AggregateFunction columns from stats_1m_*
AggregatingMergeTree tables. Replace with countMerge/countIfMerge/sumMerge.
Also fix time_bucket() → toStartOfInterval() and ::double → toFloat64().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES=false (required
by PROTOCOL.md) and explicit TypeReference<List<ExecutionChunk>> for
array parsing. Without this, batched chunks from ChunkedExporter
(2+ chunks in a JSON array) were silently rejected, causing final:true
chunks to be lost and all exchanges to go stale.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouseExecutionStore implements ExecutionStore, so the concrete bean
already satisfies the interface — remove redundant wrapper bean. Align
ChunkAccumulator and ExecutionFlushScheduler conditions to
cameleer.storage.executions flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements ExecutionStore interface with findById (FINAL for
ReplacingMergeTree), findProcessors (ORDER BY seq), findProcessorById,
and findProcessorBySeq. Write methods unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkIngestionController: /data/chunks → /data/executions (matches
PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.
Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The jdbcTemplate() method was calling dataSource(properties) directly,
creating a new DataSource instance instead of using the Spring-managed
@Primary bean. This caused some repositories to receive the ClickHouse
connection instead of PostgreSQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ExecutionFlushScheduler drains MergedExecution and ProcessorBatch write
buffers on a fixed interval and delegates batch inserts to
ClickHouseExecutionStore. Also sweeps stale exchanges every 60s.
ChunkIngestionController exposes POST /api/v1/data/chunks, accepts
single or array ExecutionChunk payloads, and feeds them into the
ChunkAccumulator. Conditional on ChunkAccumulator bean (clickhouse.enabled).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When clickhouse.enabled=true, the ClickHouse JdbcTemplate bean prevents
Spring Boot auto-config from creating the default PG JdbcTemplate.
All PG repositories then get the CH JdbcTemplate and fail with
"Table cameleer.audit_log does not exist".
Fix: explicitly create @Primary DataSource and JdbcTemplate from
DataSourceProperties so PG remains the default for unqualified injections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set CLICKHOUSE_USER/PASSWORD via k8s secret (fixes "disabling network
access for user 'default'" when no password is set)
- Add clickhouse-credentials secret to CI deploy + feature branch copy
- Pass CLICKHOUSE_USERNAME/PASSWORD env vars to server pod
- Make schema initializer non-fatal so server starts even if CH is
temporarily unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Testcontainers tests need Docker which isn't available in CI.
Rename to *IT so Surefire skips them (Failsafe runs them with -DskipITs=false).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements MetricsQueryStore using ClickHouse toStartOfInterval() for
time-bucketed aggregation queries; verified with 4 Testcontainers tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TDD implementation of MetricsStore backed by ClickHouse. Uses native
Map(String,String) column type (no JSON cast), relies on ClickHouse
DEFAULT for server_received_at, and handles null tags by substituting
an empty HashMap. All 4 Testcontainers tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ClickHouseSchemaInitializer that runs on ApplicationReadyEvent,
scanning classpath:clickhouse/*.sql in filename order and executing each
statement. Adds V1__agent_metrics.sql with MergeTree table, tenant/agent
partitioning, and 365-day TTL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ClickHouseProperties (bound to clickhouse.*), ClickHouseConfig
(conditional HikariDataSource + JdbcTemplate beans), and extends
application.yml with clickhouse.enabled/url/username/password and
cameleer.storage.metrics properties.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ExecutionDocument and ExecutionRecord records gained an isReplay
field but the integration tests were not updated, breaking CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer: guard against null containingNode before getElkRoot()
- OpenSearchAdminController: return 503/502 instead of 200 on errors
- DatabaseAdminController: return 503 instead of 200 on connection failure
- SpaForwardController: replace unbound {path} variables with /** wildcards
- WriteBuffer: check offer() return value and log on unexpected rejection
- ApiExceptionHandler: extract getReason() to local var for null safety
- Admin UI pages: handle isError state for disconnected service display
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect replayed exchanges via X-Cameleer-Replay header during ingestion,
persist the flag through PostgreSQL and OpenSearch, and surface it in
the dashboard (amber replay icon) and exchange detail chain view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replay audit log now records the agent's reply status (SUCCESS/FAILURE),
message, and error details. Timeout and internal errors are also logged
as FAILURE with the cause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add dedicated POST /agents/{id}/replay endpoint that uses
addCommandWithReply to wait for the agent ACK (30s timeout).
Returns the actual replay result (status, message, data) instead
of just a delivery confirmation.
Frontend toast now reflects the agent's response: "Replay completed"
on success, agent error message on failure, timeout message if the
agent doesn't respond.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ROUTE_CONTROL command type and route-control mapping in
AgentCommandController. New RouteControlBar component in the exchange
header shows Start/Stop/Suspend/Resume actions (grouped pill bar) and
a Replay button, gated by agent capabilities and OPERATOR/ADMIN role.
Fix useReplayExchange hook to match protocol section 16: payload now
uses { routeId, exchange: { body, headers }, originalExchangeId, nonce }
instead of the flat { headers, body } format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treemap: rectangle area = transaction volume, color = SLA compliance
(green→red). Shows apps at L1, routes at L2. Click navigates deeper.
Punchcard heatmap: 7-day rolling weekday x 24-hour grid showing
transaction volume and error patterns. Two side-by-side views
(transactions + errors) reveal temporal clustering.
Backend: new GET /search/stats/punchcard endpoint aggregating
stats_1m_all/app by DOW x hour over rolling 7 days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteGraph no longer stores a separate nodes list; getNodes() computes
from root tree. Tests now build proper tree via setRoot() + setChildren()
instead of calling setNodes().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent now sends shallow copies (without children) in the flat nodes
list. Build nodeById map by walking graph.getRoot() tree which preserves
children, falling back to flat list via putIfAbsent for compatibility.
Also adds EIP_FILTER, EIP_IDEMPOTENT_CONSUMER, EIP_RECIPIENT_LIST as
new compound container types per updated DIAGRAMS.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restores e8039f9. The compound rendering regression was caused by
the agent sending flat nodes without children, not the renderer code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>