Replace ACK-based route state inference with agent-reported state.
Heartbeats now carry optional routeStates map, and ROUTE_STATE_CHANGED
events update the registry immediately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In-memory registry that infers route state (started/stopped/suspended)
from successful route-control command ACKs. Updates state only when all
agents in a group confirm success.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add addGroupCommandWithReplies() to AgentRegistryService that sends commands
to all LIVE agents in a group and returns CompletableFuture per agent for
collecting replies. Update sendGroupCommand() and pushConfigToAgents() to
wait with a shared 10-second deadline, returning CommandGroupResponse with
per-agent status, timeouts, and overall success. Config update endpoint now
returns ConfigUpdateResponse wrapping both the saved config and push result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /search/attributes/keys endpoint that queries distinct
attribute key names from ClickHouse using JSONExtractKeys. Attribute
keys appear in the cmd-k Attributes tab alongside attribute value
matches from exchange results.
- SearchIndex.distinctAttributeKeys() interface method
- ClickHouseSearchIndex implementation using arrayJoin(JSONExtractKeys)
- SearchController /attributes/keys endpoint
- useAttributeKeys() React Query hook
- buildSearchData includes attribute keys as 'attribute' category items
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The chunked ingestion path hardcoded hasTraceData=false because the
execution envelope doesn't carry processor bodies. But the processor
records DO have inputBody/outputBody — we just need to check them.
Track hasTraceData across chunks in PendingExchange and pass it to
MergedExecution when the final chunk arrives or on stale sweep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace synthetic wrapper node approach with direct iteration fields:
- ProcessorNode gains iteration (child's index) and iterationSize
(container's total) fields, populated from ClickHouse flat records
- Frontend hooks detect iteration containers from iterationSize != null
instead of scanning for wrapper processorTypes
- useExecutionOverlay filters children by iteration field instead of
wrapper nodes, eliminating ITERATION_WRAPPER_TYPES entirely
- Cleaner data contract: API returns exactly what the DB stores
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId
Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)
Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer.getElkRoot(): add null guard to prevent NPE
when node is null (SQ java:S2259)
- WriteBuffer: add offerOrWarn() that logs when buffer is full instead
of silently dropping data. ChunkAccumulator now uses this method
so ingestion backpressure is visible in logs (SQ java:S899)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dual-mode buildTree: detects seq presence and uses seq/parentSeq linkage
instead of processorId map. Handles duplicate processorIds across
iterations correctly. Old processorId-based mode kept for PG compat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkIngestionController: /data/chunks → /data/executions (matches
PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cameleer3-common removed children, loopIndex, splitIndex,
multicastIndex from ProcessorExecution (flat model only now).
Iteration context lives on synthetic wrapper nodes via processorType.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The X-Cameleer-Replay header is only available when inputSnapshot is
captured (DETAILED/DEEP engine level). The agent always sets
replayExchangeId on RouteExecution, so check that first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer: guard against null containingNode before getElkRoot()
- OpenSearchAdminController: return 503/502 instead of 200 on errors
- DatabaseAdminController: return 503 instead of 200 on connection failure
- SpaForwardController: replace unbound {path} variables with /** wildcards
- WriteBuffer: check offer() return value and log on unexpected rejection
- ApiExceptionHandler: extract getReason() to local var for null safety
- Admin UI pages: handle isError state for disconnected service display
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect replayed exchanges via X-Cameleer-Replay header during ingestion,
persist the flag through PostgreSQL and OpenSearch, and surface it in
the dashboard (amber replay icon) and exchange detail chain view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ROUTE_CONTROL command type and route-control mapping in
AgentCommandController. New RouteControlBar component in the exchange
header shows Start/Stop/Suspend/Resume actions (grouped pill bar) and
a Replay button, gated by agent capabilities and OPERATOR/ADMIN role.
Fix useReplayExchange hook to match protocol section 16: payload now
uses { routeId, exchange: { body, headers }, originalExchangeId, nonce }
instead of the flat { headers, body } format.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a filter processor rejects a message (filterMatched=false) or an
idempotent consumer detects a duplicate (duplicateMessage=true), the
compound container turns amber (header, border, body tint).
Also adds red pulsing rings on the failed processor badge (same SMIL
pattern as the teal hasTraceData pulse).
Backend: ProcessorNode gains filterMatched/duplicateMessage fields,
threaded from ProcessorExecution JSON path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treemap: rectangle area = transaction volume, color = SLA compliance
(green→red). Shows apps at L1, routes at L2. Click navigates deeper.
Punchcard heatmap: 7-day rolling weekday x 24-hour grid showing
transaction volume and error patterns. Two side-by-side views
(transactions + errors) reveal temporal clustering.
Backend: new GET /search/stats/punchcard endpoint aggregating
stats_1m_all/app by DOW x hour over rolling 7 days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trace data visibility:
- ProcessorNode now includes hasTraceData flag computed from captured
body/headers during tree conversion
- ConfigBadge shows teal for tracing configured, green when data captured
- Search results show green footprints icon for exchanges with trace data
- New has_trace_data column on executions table (V11 migration with backfill)
- OpenSearch documents and ExecutionSummary include the flag
Inline tap configuration:
- Extracted reusable TapConfigModal component from RouteDetail
- Diagram context menu opens tap modal inline instead of navigating away
- Toggle-trace action works immediately with toast feedback
- Modal closes only on ESC, Cancel, Save, or Delete (not backdrop click)
Detail panel tab gating:
- Headers, Input, Output tabs disabled when no data is available
- Works at both exchange and processor level
- Falls back to Info tab when active tab becomes empty
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same issue as IngestionService — the ObjectMapper deserializing
processors_json lacked JavaTimeModule, causing Instant parsing to fail
silently and falling back to the broken flat reconstruction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ObjectMapper used to serialize the processor tree to JSON lacked
JavaTimeModule, causing Instant fields (startTime, endTime) to fail
silently — processors_json was always null.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes iteration overlay corruption caused by flat storage collapsing
duplicate processorIds across loop iterations.
Server:
- Store raw processor tree as processors_json JSONB on executions table
- Detail endpoint serves from processors_json (faithful tree), falls back
to flat record reconstruction for older executions
- V10 migration: processors_json, error categorization (errorType,
errorCategory, rootCauseType, rootCauseMessage), OTel (traceId, spanId),
circuit breaker (circuitBreakerState, fallbackTriggered), drops
erroneous splitDepth/loopDepth columns
- Add all new fields through full ingestion/storage/API chain
UI:
- Fix overlay wrapper filtering: check wrapper type before status filter
- Add new fields to schema.d.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add resolvedEndpointUri, splitDepth, loopDepth arguments to
ProcessorRecord constructors in TreeReconstructionTest and
PostgresExecutionStoreIT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire resolvedEndpointUri through the full chain:
- V9 migration adds resolved_endpoint_uri column
- IngestionService extracts from ProcessorExecution
- PostgresExecutionStore persists and reads the column
- ProcessorNode includes field in detail API response
- UI schema updated for ProcessorNode and PositionedNode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server:
- Add endpointUri to PositionedNode (from RouteNode)
- Add fromEndpointUri to RouteSummary (catalog API)
- Catalog controller resolves endpoint URI from diagram store
UI:
- Build endpointRouteMap from catalog's fromEndpointUri field
- Drill-down uses exact match on node.endpointUri against the map
- Remove label parsing heuristics (extractTargetEndpoint, camelToKebab)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests were using the old 18-param constructor, missing the 5 new
iteration fields (loopIndex, loopSize, splitIndex, splitSize,
multicastIndex) added in V8 migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /executions/{id}/processors/by-id/{processorId}/snapshot endpoint
that fetches processor snapshot data by processorId instead of positional
index, which is fragile when the tree structure changes. The existing
index-based endpoint remains unchanged for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add loop_index, loop_size, split_index, split_size, multicast_index
columns to processor_executions table and thread them through the
full storage → ingestion → detail pipeline. These fields enable
execution overlay to display iteration context for loop, split,
and multicast EIPs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New interactive route diagram component with SVG rendering using
server-computed ELK layout coordinates. TIBCO BW5-inspired top-bar
card node style with zoom/pan, hover toolbars, config badges, and
error handler sections below the main flow.
Backend: add direction query parameter (LR/TB) to diagram render
endpoints, defaulting to left-to-right layout.
Frontend: 14-file ProcessDiagram component in ui/src/components/
with DiagramNode, CompoundNode, DiagramEdge, ConfigBadge, NodeToolbar,
ErrorSection, ZoomControls, and supporting hooks. Dev test page at
/dev/diagram for validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TreeReconstructionTest and PostgresExecutionStoreIT still passed the
removed diagramNodeId parameter. Missed by mvn compile (main only);
caught by mvn verify (test compilation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store application_name in route_diagrams at ingestion time (V7 migration),
resolve from agent registry same as ExecutionController. Move
findProcessorRouteMapping from ExecutionStore to DiagramStore using a
JSONB query that extracts node IDs directly from stored RouteGraph
definitions. This makes the mapping available as soon as diagrams are
sent, before any executions are recorded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent now uses Camel processorId as RouteNode.id, eliminating the
nodeId mapping layer. Drop diagram_node_id column (V6 migration),
remove from ProcessorRecord/ProcessorNode/IngestionService/DetailService,
add /processor-routes endpoint for processorId→routeId lookup,
simplify frontend diagram-mapping and ExchangeDetail overlays,
replace N diagram fetches in AppConfigPage with single hook.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces null placeholders with actual getAttributes() calls now that
cameleer3-common SNAPSHOT is resolved with attributes support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests were not updated when attributes field was added to ExecutionRecord,
ProcessorRecord, ProcessorDoc, and ExecutionDocument records.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds CompletableFuture-based request-reply mechanism for commands that
need synchronous results. CommandReply record in core, pendingReplies
map in AgentRegistryService, test-expression endpoint on config controller
with 5s timeout. CommandAckRequest extended with optional data field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DetailService deserializes attributes JSON from ExecutionRecord/ProcessorRecord and
passes them to ExecutionDetail and ProcessorNode constructors. ExecutionDocument and
ProcessorDoc carry attributes as a JSON string. SearchIndexer passes attributes when
building documents. OpenSearchIndex includes attributes in indexed maps and
deserializes them when constructing ExecutionSummary from search hits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>