
Moat-Strengthening Features — Design Specification

Date: 2026-03-29
Status: Draft — Awaiting Review
Author: Boardroom simulation (Strategist, Skeptic, Architect, Growth Hacker)
Gitea Issues: cameleer/cameleer #57-#72 (label: MOAT)

Executive Summary

Three features designed to convert Cameleer's technical moat (ByteBuddy agent) into a workflow moat (debugger + lineage) and ultimately a network moat (cross-service correlation) before the vibe-coding window closes.

| Feature | Ship Target | Moat Type | Agent Changes | Server Changes |
|---|---|---|---|---|
| Live Route Debugger | Weeks 8-14 | Workflow | Heavy (DebugSessionManager, breakpoints) | Heavy (WebSocket, session mgmt) |
| Payload Flow Lineage | Weeks 3-6 | Technical | Light (one capture mode check) | Medium (DiffEngine) |
| Cross-Service Correlation | Weeks 1-9 | Network effect | Light (header propagation) | Medium (trace assembly, topology) |

Build Order

Week 1-3:  Foundation + Topology Graph (from existing data, zero agent changes)
Week 3-6:  Payload Flow Lineage (agent + server + UI)
Week 5-9:  Distributed Trace Correlation (agent header + server joins + UI)
Week 8-14: Live Route Debugger (agent + server + UI)

Gitea Issue Map

Epics:

  • #57 — Live Route Debugger
  • #58 — Payload Flow Lineage
  • #59 — Cross-Service Trace Correlation + Topology Map

Debugger sub-issues:

  • #60 — Protocol: Debug session command types (cameleer-common)
  • #61 — Agent: DebugSessionManager + breakpoint InterceptStrategy integration
  • #62 — Agent: ExchangeStateSerializer + synthetic direct route wrapper
  • #63 — Server: DebugSessionService + WebSocket + REST API
  • #70 — UI: Debug session frontend components

Lineage sub-issues:

  • #64 — Protocol: Lineage command types (cameleer-common)
  • #65 — Agent: LineageManager + capture mode integration
  • #66 — Server: LineageService + DiffEngine + REST API
  • #71 — UI: Lineage timeline + diff viewer components

Correlation sub-issues:

  • #67 — Agent: Enhanced trace context header propagation
  • #68 — Server: CorrelationService — distributed trace assembly
  • #69 — Server: DependencyGraphService + service topology materialized view
  • #72 — UI: Distributed trace view + service topology graph

1. Live Route Debugger

1.1 Concept

Extend the existing replay command with a debug-session wrapper. Users provide an exchange (from a prior failed execution or manually constructed) and replay it through a route with breakpoints. Only the replayed exchange's thread blocks at breakpoints — production traffic flows normally.

User Story: A developer sees a failed exchange. They click "Debug This Exchange." Cameleer pre-fills the body/headers. They set breakpoints, click "Start Debug Session." The exchange replays through the route, pausing at each breakpoint. They inspect state, modify the body, step forward. Total: 3 minutes. Without Cameleer: 45 minutes.

1.2 Architecture

Browser (SaaS UI)
    |
    v
WebSocket <--------------------------------------+
    |                                             |
    v                                             |
cameleer-server                                  |
    |  POST /api/v1/debug/sessions                |
    |  POST /api/v1/debug/sessions/{id}/step      |
    |  POST /api/v1/debug/sessions/{id}/resume    |
    |  DELETE /api/v1/debug/sessions/{id}         |
    |                                             |
    v                                             |
SSE Command Channel --> cameleer agent           |
    |                       |                     |
    |  "start-debug"        |                     |
    |  command               v                    |
    |               DebugSessionManager           |
    |                       |                     |
    |               Replay exchange via           |
    |               ProducerTemplate              |
    |                       |                     |
    |               InterceptStrategy checks      |
    |               breakpoints before each       |
    |               processor                     |
    |                       |                     |
    |               On breakpoint hit:            |
    |               > LockSupport.park()          |
    |               > Serialize exchange state     |
    |               > POST state to server -------+
    |                       |                    (server pushes to
    |               Wait for resume/step/skip     browser via WS)
    |               command via SSE
    |                       |
    |               On resume: LockSupport.unpark()
    |               Continue to next processor

1.3 Protocol Additions (cameleer-common)

New SSE Commands

| Command | Direction | Purpose |
|---|---|---|
| START_DEBUG | Server -> Agent | Create session, spawn thread, replay exchange with breakpoints |
| DEBUG_RESUME | Server -> Agent | Unpark thread, continue to next breakpoint |
| DEBUG_STEP | Server -> Agent | Unpark thread, break at next processor (STEP_OVER/STEP_INTO) |
| DEBUG_SKIP | Server -> Agent | Skip current processor, continue |
| DEBUG_MODIFY | Server -> Agent | Apply body/header changes at current breakpoint before resuming |
| DEBUG_ABORT | Server -> Agent | Abort session, release thread |

StartDebugPayload

{
  "sessionId": "dbg-a1b2c3",
  "routeId": "route-orders",
  "exchange": {
    "body": "{\"orderId\": 42, \"amount\": 150.00}",
    "headers": { "Content-Type": "application/json" }
  },
  "breakpoints": [
    { "processorId": "choice1", "condition": null },
    { "processorId": "to5", "condition": "${body.amount} > 100" }
  ],
  "mode": "STEP_OVER",
  "timeoutSeconds": 300,
  "originalExchangeId": "ID-failed-789",
  "replayToken": "...",
  "nonce": "..."
}

BreakpointHitReport (Agent -> Server)

{
  "sessionId": "dbg-a1b2c3",
  "processorId": "to5",
  "processorType": "TO",
  "endpointUri": "http://payment-service/charge",
  "depth": 2,
  "stepIndex": 4,
  "exchangeState": {
    "body": "{\"orderId\": 42, \"amount\": 150.00, \"validated\": true}",
    "headers": { "..." },
    "properties": { "CamelSplitIndex": 0 },
    "exception": null,
    "bodyType": "java.util.LinkedHashMap"
  },
  "executionTree": ["...partial tree up to this point..."],
  "parentProcessorId": "split1",
  "routeId": "route-orders",
  "timestamp": "2026-03-29T14:22:05.123Z"
}

1.4 Agent Implementation (cameleer-agent)

DebugSessionManager

  • Location: com.cameleer.agent.debug.DebugSessionManager
  • Stores active sessions: ConcurrentHashMap<sessionId, DebugSession>
  • Enforces max concurrent sessions (default 3, configurable via cameleer.debug.maxSessions)
  • Allocates dedicated Thread per session (NOT from Camel thread pool)
  • Timeout watchdog: ScheduledExecutorService auto-aborts expired sessions
  • Handles all DEBUG_* commands via DefaultCommandHandler delegation

DebugSession

  • Stores breakpoint definitions, current step mode, parked thread reference
  • shouldBreak(processorId, Exchange): evaluates processor match + Simple condition + step mode
  • reportBreakpointHit(): serializes state, POSTs to server, calls LockSupport.park()
  • applyModifications(Exchange): sets body/headers from DEBUG_MODIFY command
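
The park/unpark handshake behind reportBreakpointHit() can be modeled with plain java.util.concurrent primitives. The sketch below is a stdlib-only illustration (class and method names are invented, not the real agent API): the debug thread parks at a breakpoint and the SSE command-handler thread later delivers a command and unparks it. The loop around park() also covers spurious wakeups and the race where the command arrives before the thread parks.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Minimal model of the breakpoint handshake: the debug thread parks until
// the SSE command handler delivers a command and unparks it.
public class ParkDemo {
    public enum Command { RESUME, STEP, SKIP, ABORT }

    public static class Session {
        private final AtomicReference<Command> pending = new AtomicReference<>();
        private volatile Thread parked;

        // Runs on the dedicated debug thread at a breakpoint hit.
        public Command awaitCommand() {
            parked = Thread.currentThread();
            Command cmd;
            // Loop guards against spurious wakeups of LockSupport.park()
            // and against deliver() arriving before we park.
            while ((cmd = pending.getAndSet(null)) == null) {
                LockSupport.park(this);
            }
            parked = null;
            return cmd;
        }

        // Runs on the SSE command-handler thread.
        public void deliver(Command cmd) {
            pending.set(cmd);          // publish the command first...
            Thread t = parked;
            if (t != null) LockSupport.unpark(t);  // ...then wake the thread
        }
    }

    // Drives one round trip for demonstration purposes.
    public static Command roundTrip(Command toSend) {
        Session s = new Session();
        AtomicReference<Command> received = new AtomicReference<>();
        Thread debugThread = new Thread(() -> received.set(s.awaitCommand()));
        debugThread.start();
        try {
            Thread.sleep(50);          // give the debug thread time to park
            s.deliver(toSend);
            debugThread.join(2000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return received.get();
    }
}
```

Setting the pending command before unparking is what makes the handshake lost-wakeup-free without any lock.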

InterceptStrategy Integration

In CameleerInterceptStrategy.DelegateAsyncProcessor.process():

DebugSession session = debugSessionManager.getSession(exchange);
if (session != null && session.shouldBreak(processorId, exchange)) {
    ExchangeState state = ExchangeStateSerializer.capture(exchange);
    List<ProcessorExecution> tree = executionCollector.getPartialTree(exchange);
    session.reportBreakpointHit(processorId, state, tree);
    // Thread parked until server sends resume/step/skip/abort

    if (session.isAborted()) throw new DebugSessionAbortedException();
    if (session.shouldSkip()) { callback.done(true); return true; }
    if (session.hasModifications()) session.applyModifications(exchange);
}

Zero production overhead: debug exchanges carry a CameleerDebugSessionId exchange property, and getSession() checks only that property — a single null-check. Production exchanges never have the property, so the check returns null and no further work is done.
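
A stdlib-only sketch of that fast path (names are illustrative; the real check reads a Camel exchange property rather than a map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Model of the hot-path check: production exchanges carry no debug
// property, so getSession() is a single failed lookup and nothing else.
public class FastPathDemo {
    static final String DEBUG_PROP = "CameleerDebugSessionId";

    // Stand-in for the DebugSessionManager's session registry.
    public static final Map<Object, String> SESSIONS = new ConcurrentHashMap<>();

    // exchangeProps stands in for Exchange.getProperty(...) here.
    public static String getSession(Map<String, Object> exchangeProps) {
        Object id = exchangeProps.get(DEBUG_PROP);   // the single null-check
        return id == null ? null : SESSIONS.get(id);
    }
}
```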

ExchangeStateSerializer

  • TypeConverter chain: String -> byte[] as Base64 -> class name fallback
  • Stream bodies: wrap in CachedOutputStream (same pattern as Camel's stream caching)
  • Sensitive header redaction (reuses PayloadCapture redaction logic)
  • Size limit: cameleer.debug.maxBodySize (default 64KB)
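
The fallback chain can be sketched as follows. This is a simplified, stdlib-only model (the real serializer would try Camel's TypeConverter first, which is not modeled here); the class name and truncation behavior are illustrative assumptions:

```java
import java.util.Base64;

// Sketch of the serializer's fallback chain: String as-is, byte[] as
// Base64, anything else degrades to its class name. Bodies over
// maxBodySize are truncated.
public class BodySerializerDemo {
    public static String serialize(Object body, int maxBodySize) {
        String s;
        if (body == null) {
            s = null;
        } else if (body instanceof String) {
            s = (String) body;                                   // step 1: String
        } else if (body instanceof byte[]) {
            s = Base64.getEncoder().encodeToString((byte[]) body); // step 2: Base64
        } else {
            s = "<unserializable: " + body.getClass().getName() + ">"; // fallback
        }
        if (s != null && s.length() > maxBodySize) {
            s = s.substring(0, maxBodySize);                     // size limit
        }
        return s;
    }
}
```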

Synthetic Direct Route Wrapper

For non-direct routes (timer, jms, http, file):

  1. Extract route's processor chain from CamelContext
  2. Create temporary direct:__debug_{routeId} route with same processors (shared by reference)
  3. Debug exchange enters via ProducerTemplate.send()
  4. Remove temporary route on session completion
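
The wrapper lifecycle from the four steps above can be sketched without any Camel API (a real implementation would add and remove routes through the CamelContext; the registry map and method names here are hypothetical). The key point is the try/finally: the temporary route is always removed, even if the replay fails:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual lifecycle of the synthetic debug wrapper route.
public class DebugWrapperDemo {
    // Stand-in for routes currently registered in the CamelContext.
    public static final Map<String, String> ACTIVE_WRAPPERS = new ConcurrentHashMap<>();

    public static String wrapperUri(String routeId) {
        return "direct:__debug_" + routeId;     // naming scheme from the design
    }

    public static String runDebugSession(String routeId, Runnable replay) {
        String uri = wrapperUri(routeId);
        ACTIVE_WRAPPERS.put(routeId, uri);      // step 2: install temporary route
        try {
            replay.run();                       // step 3: exchange enters via wrapper
        } finally {
            ACTIVE_WRAPPERS.remove(routeId);    // step 4: always clean up
        }
        return uri;
    }
}
```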

1.5 Server Implementation (cameleer-server)

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| POST | /api/v1/debug/sessions | OPERATOR | Create debug session |
| GET | /api/v1/debug/sessions/{id} | VIEWER | Get session state |
| POST | /api/v1/debug/sessions/{id}/step | OPERATOR | Step over/into |
| POST | /api/v1/debug/sessions/{id}/resume | OPERATOR | Resume to next breakpoint |
| POST | /api/v1/debug/sessions/{id}/skip | OPERATOR | Skip current processor |
| POST | /api/v1/debug/sessions/{id}/modify | OPERATOR | Modify exchange at breakpoint |
| DELETE | /api/v1/debug/sessions/{id} | OPERATOR | Abort session |
| POST | /api/v1/debug/sessions/{id}/breakpoint-hit | AGENT | Agent reports breakpoint |
| GET | /api/v1/debug/sessions/{id}/compare | VIEWER | Compare debug vs original |

WebSocket Channel

Endpoint: WS /api/v1/debug/ws?token={jwt}

Server -> Browser events:
  { "type": "breakpoint-hit", "sessionId": "...", "data": { ...state... } }
  { "type": "session-completed", "sessionId": "...", "execution": { ... } }
  { "type": "session-error", "sessionId": "...", "error": "Agent disconnected" }
  { "type": "session-timeout", "sessionId": "..." }

Data Model

CREATE TABLE debug_sessions (
    session_id        TEXT PRIMARY KEY,
    agent_id          TEXT NOT NULL,
    route_id          TEXT NOT NULL,
    original_exchange TEXT,
    status            TEXT NOT NULL DEFAULT 'PENDING',
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at      TIMESTAMPTZ,
    breakpoints       JSONB,
    current_state     JSONB,
    step_count        INT DEFAULT 0,
    replay_exchange   TEXT,
    created_by        TEXT NOT NULL
);

DebugSessionService

Lifecycle: PENDING -> ACTIVE -> PAUSED -> COMPLETED/ABORTED/TIMEOUT

  1. Generate sessionId + nonce + replay token
  2. Send START_DEBUG via existing SSE channel
  3. Receive breakpoint-hit POSTs, store state, push to WebSocket
  4. Translate browser actions (step/resume/skip/modify) into SSE commands
  5. Detect agent SSE disconnect via SseConnectionManager callback
  6. Store completed execution in normal pipeline (tagged with debugSessionId)
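
The lifecycle can be made explicit as a guarded state machine. The transition set below is inferred from the PENDING -> ACTIVE -> PAUSED -> terminal flow described above (e.g. PAUSED -> ACTIVE on resume, any non-terminal state may abort or time out) and should be read as a sketch, not the definitive rule set:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Debug session lifecycle as an explicit transition table.
public class SessionLifecycle {
    public enum State { PENDING, ACTIVE, PAUSED, COMPLETED, ABORTED, TIMEOUT }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.PENDING,   EnumSet.of(State.ACTIVE, State.ABORTED, State.TIMEOUT));
        ALLOWED.put(State.ACTIVE,    EnumSet.of(State.PAUSED, State.COMPLETED, State.ABORTED, State.TIMEOUT));
        ALLOWED.put(State.PAUSED,    EnumSet.of(State.ACTIVE, State.COMPLETED, State.ABORTED, State.TIMEOUT));
        // Terminal states have no outgoing transitions.
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.ABORTED,   EnumSet.noneOf(State.class));
        ALLOWED.put(State.TIMEOUT,   EnumSet.noneOf(State.class));
    }

    public static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }
}
```

Rejecting illegal transitions at the service boundary (e.g. a step command against a COMPLETED session) keeps the WebSocket and SSE sides from drifting out of sync.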

1.6 SaaS Layer (cameleer-saas)

  • Tenant isolation: debug sessions scoped to tenant's agents
  • Concurrent session limits per tier (free: 1, pro: 5, enterprise: unlimited)
  • Usage metering: session creation counted as billable event

1.7 UI Components

  • DebugLauncher.tsx — "Debug This Exchange" button on failed execution detail, pre-fills exchange data
  • DebugSession.tsx — Main view: route diagram with status coloring (green/yellow/gray), exchange state panel, step controls (F10/F11/F5/F6 keyboard shortcuts)
  • DebugCompare.tsx — Side-by-side: original execution vs debug replay with diff highlighting
  • BreakpointEditor.tsx — Click processor nodes to toggle breakpoints, conditional expression input

1.8 Safety Mechanisms

| Concern | Mitigation |
|---|---|
| Thread leak | Session timeout auto-aborts (default 5 min) |
| Memory leak | Exchange state captured on-demand, not buffered |
| Agent restart | Server detects SSE disconnect, notifies browser |
| High-throughput route | Only the debug exchange hits breakpoints (property check) |
| Concurrent sessions | Hard limit (default 3), FAILURE ack if exceeded |
| Non-direct routes | Synthetic direct:__debug_* wrapper with same processor chain |

2. Payload Flow Lineage

2.1 Concept

Capture the full transformation history of a message flowing through a route. At each processor, snapshot body before and after. Server computes structural diffs. UI renders a visual "data flow" timeline showing exactly where and how data transforms.

User Story: A developer has an exchange where customerName is null. They click "Trace Payload Flow." Vertical timeline: at each processor, before/after body with structural diff. Processor 7 (enrich1) returned a response missing the name field. Root cause in 30 seconds.

2.2 Architecture

cameleer agent
    |
    |  On lineage-enabled exchange:
    |  Before processor: capture INPUT
    |  After processor: capture OUTPUT
    |  Attach to ProcessorExecution as inputBody/outputBody
    |
    v
POST /api/v1/data/executions (processors carry full snapshots)
    |
    v
cameleer-server
    |
    |  LineageService:
    |  > Flatten processor tree to ordered list
    |  > Compute diffs between processor[n].output and processor[n+1].input
    |  > Classify transformation type
    |  > Generate human-readable summary
    |
    v
GET /api/v1/executions/{id}/lineage
    |
    v
Browser: LineageTimeline + DiffViewer

2.3 Protocol Additions (cameleer-common)

New SSE Commands

| Command | Direction | Purpose |
|---|---|---|
| ENABLE_LINEAGE | Server -> Agent | Activate targeted payload capture |
| DISABLE_LINEAGE | Server -> Agent | Deactivate lineage capture |

EnableLineagePayload

{
  "lineageId": "lin-x1y2z3",
  "scope": {
    "type": "ROUTE",
    "routeId": "route-orders"
  },
  "predicate": "${header.orderId} == 'ORD-500'",
  "predicateLanguage": "simple",
  "maxCaptures": 10,
  "duration": "PT10M",
  "captureHeaders": true,
  "captureProperties": false
}

Scope Types

| Scope | Meaning |
|---|---|
| ROUTE | All exchanges on a specific route |
| CORRELATION | All exchanges with a specific correlationId |
| EXPRESSION | Any exchange matching a Simple/JsonPath predicate |
| NEXT_N | Next N exchanges on the route (countdown) |

2.4 Agent Implementation (cameleer-agent)

LineageManager

  • Location: com.cameleer.agent.lineage.LineageManager
  • Stores active configs: ConcurrentHashMap<lineageId, LineageConfig>
  • Tracks capture count per lineageId: auto-disables at maxCaptures
  • Duration timeout via ScheduledExecutorService: auto-disables after expiry
  • shouldCaptureLineage(Exchange): evaluates scope + predicate, sets CameleerLineageActive property
  • isLineageActive(Exchange): single null-check on exchange property (HOT PATH, O(1))
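
The maxCaptures bookkeeping can be sketched with an atomic counter. This is a hypothetical model (predicate evaluation against Simple/JsonPath is stubbed as a boolean); a NEXT_N scope is the same mechanism with the predicate always true:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of per-config capture counting: the config auto-disables once
// maxCaptures matching exchanges have been captured.
public class LineageConfigDemo {
    final int maxCaptures;
    final AtomicInteger captured = new AtomicInteger();
    volatile boolean disabled;

    LineageConfigDemo(int maxCaptures) { this.maxCaptures = maxCaptures; }

    // True => this exchange gets BOTH-mode capture at lineage body limits.
    boolean tryCapture(boolean predicateMatches) {
        if (disabled || !predicateMatches) return false;
        int n = captured.incrementAndGet();
        if (n > maxCaptures) { disabled = true; return false; } // lost race
        if (n == maxCaptures) disabled = true;  // auto-disable at the limit
        return true;
    }

    // Helper for demonstration: outcomes of `attempts` matching exchanges.
    public static boolean[] simulate(int maxCaptures, int attempts) {
        LineageConfigDemo cfg = new LineageConfigDemo(maxCaptures);
        boolean[] out = new boolean[attempts];
        for (int i = 0; i < attempts; i++) out[i] = cfg.tryCapture(true);
        return out;
    }
}
```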

Integration Points (Minimal Agent Changes)

1. CameleerEventNotifier.onExchangeCreated():

lineageManager.shouldCaptureLineage(exchange);
// Sets CameleerLineageActive property if matching

2. ExecutionCollector.resolveProcessorCaptureMode():

if (lineageManager.isLineageActive(exchange)) {
    return PayloadCaptureMode.BOTH;
}

3. PayloadCapture body size:

int maxSize = lineageManager.isLineageActive(exchange)
    ? config.getLineageMaxBodySize()    // 64KB
    : config.getMaxBodySize();           // 4KB

Production overhead when lineage is disabled: effectively zero. The isLineageActive() check is a single null-check on an exchange property that doesn't exist on non-lineage exchanges.

Configuration

cameleer.lineage.maxBodySize=65536    # 64KB for lineage captures (vs 4KB normal)
cameleer.lineage.enabled=true          # master switch

2.5 Server Implementation (cameleer-server)

LineageService

  • getLineage(executionId): fetch execution, flatten tree to ordered processor list, compute diffs
  • enableLineage(request): send ENABLE_LINEAGE to target agents
  • disableLineage(lineageId): send DISABLE_LINEAGE
  • getActiveLineages(): list active configs across all agents

DiffEngine

Format-aware diff computation:

| Format | Detection | Library | Output |
|---|---|---|---|
| JSON | Jackson parse success | zjsonpatch (RFC 6902) or custom tree walk | FIELD_ADDED, FIELD_REMOVED, FIELD_MODIFIED with JSON path |
| XML | DOM parse success | xmlunit-core | ELEMENT_ADDED, ELEMENT_REMOVED, ATTRIBUTE_CHANGED |
| Text | Fallback | java-diff-utils (Myers) | LINE_ADDED, LINE_REMOVED, LINE_CHANGED |
| Binary | Type detection | N/A | Size comparison only |

Transformation Classification

UNCHANGED      — No diff
MUTATION       — Existing fields modified, same format
ENRICHMENT     — Fields only added (e.g., enrich processor)
REDUCTION      — Fields only removed
FORMAT_CHANGED — Content type changed (XML -> JSON)
TYPE_CHANGED   — Java type changed but content equivalent
MIXED          — Combination of additions, removals, modifications
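
The classification reduces to a small decision function over the diff result. A sketch over the counts of added, removed, and modified fields plus a format-change flag (TYPE_CHANGED is omitted here because it requires Java type comparison, which the counts alone don't carry):

```java
// Decision function mapping a field diff to a transformation type.
public class TransformClassifier {
    public static String classify(int added, int removed, int modified, boolean formatChanged) {
        if (formatChanged) return "FORMAT_CHANGED";                    // XML -> JSON etc.
        if (added == 0 && removed == 0 && modified == 0) return "UNCHANGED";
        if (added > 0 && removed == 0 && modified == 0) return "ENRICHMENT"; // fields only added
        if (removed > 0 && added == 0 && modified == 0) return "REDUCTION";  // fields only removed
        if (modified > 0 && added == 0 && removed == 0) return "MUTATION";   // fields only modified
        return "MIXED";                                                // any combination
    }
}
```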

Summary Generation

Auto-generated human-readable summaries:

  • "XML -> JSON conversion" (FORMAT_CHANGED)
  • "Added customer object from external API" (ENRICHMENT + field names)
  • "Modified amount field: 150.00 -> 135.00" (MUTATION + values)

Lineage Response Schema

{
  "executionId": "exec-123",
  "routeId": "route-orders",
  "processors": [
    {
      "processorId": "unmarshal1",
      "processorType": "UNMARSHAL",
      "input": {
        "body": "<order><id>42</id></order>",
        "bodyType": "java.lang.String",
        "contentType": "application/xml"
      },
      "output": {
        "body": "{\"id\": 42}",
        "bodyType": "java.util.LinkedHashMap",
        "contentType": "application/json"
      },
      "diff": {
        "transformationType": "FORMAT_CHANGED",
        "summary": "XML -> JSON conversion",
        "bodyChanged": true,
        "headersChanged": true,
        "changes": [
          { "type": "FORMAT_CHANGED", "from": "XML", "to": "JSON" }
        ]
      },
      "durationMs": 12,
      "status": "COMPLETED"
    }
  ]
}

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| GET | /api/v1/executions/{id}/lineage | VIEWER | Full lineage with diffs |
| POST | /api/v1/lineage/enable | OPERATOR | Enable lineage on agents |
| DELETE | /api/v1/lineage/{lineageId} | OPERATOR | Disable lineage |
| GET | /api/v1/lineage/active | VIEWER | List active lineage configs |

2.6 SaaS Layer (cameleer-saas)

  • Lineage captures counted as premium events (higher billing weight)
  • Active lineage config limits per tier
  • Post-hoc lineage from COMPLETE engine level available on all tiers (resource-intensive fallback)
  • Targeted lineage-on-demand is a paid-tier feature (upgrade driver)

2.7 UI Components

  • LineageTimeline.tsx — Vertical processor list, color-coded by transformation type (green/yellow/blue/red/purple), expandable diffs, auto-generated summaries
  • LineageDiffViewer.tsx — Side-by-side or unified diff, format-aware (JSON tree-diff, XML element-diff, text line-diff, binary hex)
  • LineageEnableDialog.tsx — "Trace Payload Flow" button, scope/predicate builder, max captures slider
  • LineageSummaryStrip.tsx — Compact horizontal strip on execution detail page, transformation icons per processor

3. Cross-Service Trace Correlation + Topology Map

3.1 Concept

Stitch executions across services into unified distributed traces. Build a service dependency topology graph automatically from observed traffic. Design the protocol for future cross-tenant federation.

User Story: Platform team with 8 Camel microservices. Order stuck in "processing." Engineer searches by orderId, sees distributed trace: horizontal timeline across all services, each expandable to route detail. Service C (pricing) timed out. Root cause across 4 boundaries in 60 seconds.

3.2 Phase 1: Intra-Tenant Trace Correlation

Enhanced Trace Context Header

Current (exists):
  X-Cameleer-CorrelationId: corr-abc-123

New (added):
  X-Cameleer-TraceContext: {
    "traceId": "trc-xyz",
    "parentSpanId": "span-001",
    "hopIndex": 2,
    "sourceApp": "order-service",
    "sourceRoute": "route-validate"
  }

Transport-Specific Propagation

| Transport | Detection | Mechanism |
|---|---|---|
| HTTP/REST | URI prefix http:, https:, rest: | HTTP header X-Cameleer-TraceContext |
| JMS | URI prefix jms:, activemq:, amqp: | JMS property CameleerTraceContext |
| Kafka | URI prefix kafka: | Kafka header cameleer-trace-context |
| Direct/SEDA | URI prefix direct:, seda:, vm: | Exchange property (in-process) |
| File/FTP | URI prefix file:, ftp: | Not propagated (async) |
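
The prefix dispatch above can be sketched as a single function (the carrier constants and helper names here are illustrative; the real injection would write the actual HTTP/JMS/Kafka header):

```java
// Maps an endpoint URI to the trace-context carrier for that transport.
public class TransportDispatch {
    public static String carrierFor(String uri) {
        if (startsWithAny(uri, "http:", "https:", "rest:"))  return "HTTP_HEADER";
        if (startsWithAny(uri, "jms:", "activemq:", "amqp:")) return "JMS_PROPERTY";
        if (uri.startsWith("kafka:"))                         return "KAFKA_HEADER";
        if (startsWithAny(uri, "direct:", "seda:", "vm:"))    return "EXCHANGE_PROPERTY";
        if (startsWithAny(uri, "file:", "ftp:"))              return "NONE"; // async, not propagated
        return "NONE"; // unknown transports: do not propagate
    }

    private static boolean startsWithAny(String uri, String... prefixes) {
        for (String p : prefixes) if (uri.startsWith(p)) return true;
        return false;
    }
}
```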

3.3 Agent Implementation (cameleer-agent)

Outgoing Propagation (InterceptStrategy)

Before delegating to TO/ENRICH/WIRE_TAP processors:

if (isOutgoingEndpoint(processorType, endpointUri)) {
    TraceContext ctx = new TraceContext(
        executionCollector.getTraceId(exchange),
        currentProcessorExecution.getId(),
        executionCollector.getHopIndex(exchange) + 1,
        config.getApplicationName(),
        exchange.getFromRouteId()
    );
    injectTraceContext(exchange, endpointUri, ctx);
}

Incoming Extraction (CameleerEventNotifier)

In onExchangeCreated():

String traceCtxJson = extractTraceContext(exchange);
if (traceCtxJson != null) {
    TraceContext ctx = objectMapper.readValue(traceCtxJson, TraceContext.class);
    exchange.setProperty("CameleerParentSpanId", ctx.parentSpanId);
    exchange.setProperty("CameleerSourceApp", ctx.sourceApp);
    exchange.setProperty("CameleerSourceRoute", ctx.sourceRoute);
    exchange.setProperty("CameleerHopIndex", ctx.hopIndex);
}

New RouteExecution Fields

execution.setParentSpanId(...);   // processor execution ID from calling service
execution.setSourceApp(...);      // application name of caller
execution.setSourceRoute(...);    // routeId of caller
execution.setHopIndex(...);       // depth in distributed trace

Safety

  • Header size always <256 bytes
  • Parse failure: log warning, continue without context (no exchange failure)
  • Only inject on outgoing processors, never on FROM consumers

3.4 Server Implementation: Trace Assembly (cameleer-server)

CorrelationService

buildDistributedTrace(correlationId):
  1. SELECT * FROM executions WHERE correlation_id = ? ORDER BY start_time
  2. Index by executionId for O(1) lookup
  3. Build tree: roots = executions where parentSpanId IS NULL
     For each with parentSpanId: find parent, attach as child hop
  4. Compute gaps: child.startTime - parent.processor.startTime = network latency
     If gap < 0: flag clock skew warning
  5. Aggregate: totalDuration, serviceCount, hopCount, status
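
Steps 2-4 of the assembly can be sketched in stdlib Java. This is a simplified model, not the real CorrelationService: parentSpanId is collapsed to the parent execution id here (the real design joins on the calling *processor* execution id), and an execution whose parent is missing (uninstrumented service) is treated as an extra root:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model of distributed-trace assembly from flat execution rows.
public class TraceAssemblyDemo {
    public static class Exec {
        public final String id;
        public final String parentSpanId;   // null => trace root
        public final long startMs;
        public final List<Exec> children = new ArrayList<>();
        public Exec(String id, String parentSpanId, long startMs) {
            this.id = id; this.parentSpanId = parentSpanId; this.startMs = startMs;
        }
    }

    public static List<Exec> assemble(List<Exec> execs) {
        Map<String, Exec> byId = new HashMap<>();
        for (Exec e : execs) byId.put(e.id, e);     // step 2: O(1) index
        List<Exec> roots = new ArrayList<>();
        for (Exec e : execs) {                      // step 3: attach children
            Exec parent = e.parentSpanId == null ? null : byId.get(e.parentSpanId);
            if (parent == null) roots.add(e);       // missing hop => extra root
            else parent.children.add(e);
        }
        return roots;
    }

    // Step 4: a negative gap between parent start and child start means skew.
    public static boolean clockSkew(Exec parent, Exec child) {
        return child.startMs - parent.startMs < 0;
    }
}
```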

Distributed Trace Response

{
  "traceId": "trc-xyz",
  "correlationId": "corr-abc-123",
  "totalDurationMs": 1250,
  "hopCount": 4,
  "serviceCount": 3,
  "status": "FAILED",
  "entryPoint": {
    "application": "api-gateway",
    "routeId": "route-incoming-orders",
    "executionId": "exec-001",
    "durationMs": 1250,
    "children": [
      {
        "calledFrom": {
          "processorId": "to3",
          "processorType": "TO",
          "endpointUri": "http://order-service/validate"
        },
        "application": "order-service",
        "routeId": "route-validate",
        "executionId": "exec-002",
        "durationMs": 350,
        "networkLatencyMs": 12,
        "children": []
      }
    ]
  }
}

Data Model Changes

ALTER TABLE executions ADD COLUMN parent_span_id TEXT;
ALTER TABLE executions ADD COLUMN source_app TEXT;
ALTER TABLE executions ADD COLUMN source_route TEXT;
ALTER TABLE executions ADD COLUMN hop_index INT;

CREATE INDEX idx_executions_parent_span
    ON executions(parent_span_id) WHERE parent_span_id IS NOT NULL;

Edge Cases

  • Missing hops: uninstrumented service shown as "unknown" node
  • Clock skew: flagged as warning, still rendered
  • Fan-out: parallel multicast creates multiple children from same processor
  • Circular calls: detected via hopIndex (max depth 20)

3.5 Server Implementation: Topology Graph (cameleer-server)

DependencyGraphService

Builds service dependency graph from existing execution data — zero additional agent overhead.

Data source: processor_executions where processor_type IN (TO, TO_DYNAMIC, EIP_ENRICH, EIP_POLL_ENRICH, EIP_WIRE_TAP) and resolved_endpoint_uri IS NOT NULL.

Endpoint-to-Service Resolution

  1. Direct/SEDA match: direct:processOrder -> route's applicationName
  2. Agent registration match: URI base URL matches registered agent
  3. Kubernetes hostname: extract hostname from URI -> applicationName
  4. Manual mapping: admin-configured regex/glob patterns
  5. Unresolved: external:{hostname} node
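
The resolution chain is first-hit-wins. A stdlib sketch with illustrative data structures (the Kubernetes-hostname step is folded into the final hostname extraction here, and all maps/rule types are assumptions, not the real DependencyGraphService API):

```java
import java.util.Map;
import java.util.regex.Pattern;

// First-hit-wins endpoint-to-service resolution.
public class EndpointResolverDemo {
    public static String resolve(String uri,
                                 Map<String, String> directRoutes,   // "direct:processOrder" -> app
                                 Map<String, String> agentBaseUrls,  // "http://order-service" -> app
                                 Map<Pattern, String> manualRules) { // admin-configured patterns
        String app = directRoutes.get(uri);
        if (app != null) return app;                                 // 1. direct/seda match
        for (Map.Entry<String, String> e : agentBaseUrls.entrySet())
            if (uri.startsWith(e.getKey())) return e.getValue();     // 2. agent registration
        for (Map.Entry<Pattern, String> e : manualRules.entrySet())
            if (e.getKey().matcher(uri).matches()) return e.getValue(); // 4. manual mapping
        return "external:" + hostOf(uri);                            // 5. unresolved
    }

    // Crude hostname extraction (steps 3/5); a real version would use java.net.URI.
    static String hostOf(String uri) {
        int s = uri.indexOf("//");
        if (s < 0) return uri;
        String rest = uri.substring(s + 2);
        int end = rest.length();
        for (int i = 0; i < rest.length(); i++) {
            char ch = rest.charAt(i);
            if (ch == '/' || ch == ':') { end = i; break; }
        }
        return rest.substring(0, end);
    }
}
```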

Materialized View

CREATE MATERIALIZED VIEW service_dependencies AS
SELECT
    e.application_name AS source_app,
    pe.resolved_endpoint_uri AS target_uri,
    COUNT(*) AS call_count,
    AVG(pe.duration_ms) AS avg_latency_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY pe.duration_ms) AS p99_latency_ms,
    SUM(CASE WHEN pe.status = 'FAILED' THEN 1 ELSE 0 END)::FLOAT
        / NULLIF(COUNT(*), 0) AS error_rate,
    MAX(pe.start_time) AS last_seen,
    MIN(pe.start_time) AS first_seen
FROM executions e
JOIN processor_executions pe
    ON e.execution_id = pe.execution_id
    AND e.start_time = pe.start_time
WHERE pe.processor_type IN ('TO','TO_DYNAMIC','EIP_ENRICH','EIP_POLL_ENRICH','EIP_WIRE_TAP')
    AND pe.resolved_endpoint_uri IS NOT NULL
    AND e.start_time > NOW() - INTERVAL '24 hours'
GROUP BY e.application_name, pe.resolved_endpoint_uri;

-- Refresh every 5 minutes

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| GET | /api/v1/traces/{correlationId} | VIEWER | Assembled distributed trace |
| GET | /api/v1/traces/{correlationId}/timeline | VIEWER | Flat timeline for Gantt |
| GET | /api/v1/topology/dependencies | VIEWER | Service dependency graph |
| GET | /api/v1/topology/diff | VIEWER | Topology changes between windows |
| GET | /api/v1/topology/dependencies/{source}/{target} | VIEWER | Dependency detail |

3.6 Phase 2: Cross-Tenant Federation (Design Only)

Reserve sourceTenantHash in TraceContext for future use:

{
  "traceId": "trc-xyz",
  "parentSpanId": "span-001",
  "hopIndex": 2,
  "sourceApp": "order-service",
  "sourceRoute": "route-validate",
  "sourceTenantHash": null
}

Consent model (v2):

  • Both tenants opt-in to "Federation" in SaaS settings
  • Shared: trace structure (timing, status, service names)
  • NOT shared: payload content, headers, internal route details
  • Either tenant can revoke at any time

3.7 SaaS Layer (cameleer-saas)

  • All trace correlation intra-tenant in v1
  • Topology graph scoped to tenant's applications
  • External dependencies shown as opaque nodes
  • Cross-tenant federation as enterprise-tier feature (v2)

3.8 UI Components

  • DistributedTraceView.tsx — Horizontal Gantt timeline, rows=services, bars=executions, arrows=call flow, click-to-expand to route detail
  • ServiceTopologyGraph.tsx — Force-directed graph, nodes sized by throughput, edges colored by error rate, animated traffic pulse, click drill-down
  • TopologyDiff.tsx — "What changed?" view, new/removed dependencies highlighted, latency/error changes annotated
  • TraceSearchEnhanced.tsx — Search by correlationId/traceId/business attributes, results show trace summaries with service count and hop count

4. Cross-Feature Integration Points

| From -> To | Integration |
|---|---|
| Correlation -> Debugger | "Debug This Hop": from distributed trace, click a service hop to replay and debug |
| Correlation -> Lineage | "Trace Payload Across Services": enable lineage on a correlationId, see transforms across boundaries |
| Lineage -> Debugger | "Debug From Diff": unexpected processor output -> one-click launch debug with breakpoint on that processor |
| Debugger -> Lineage | Debug sessions auto-capture full lineage (all processors at BOTH mode) |
| Topology -> Correlation | Click dependency edge -> show recent traces between those services |
| Topology -> Lineage | "How does data transform?" -> aggregated lineage summary for a dependency edge |

5. Competitive Analysis

What an LLM + Junior Dev Can Replicate

| Capability | Replicable? | Time | Barrier |
|---|---|---|---|
| JMX metrics dashboard | Yes | 1 weekend | None |
| Log parsing + display | Yes | 1 weekend | None |
| Basic replay (re-send exchange) | Yes | 1 week | Need agent access |
| Per-processor payload capture | No* | 2-3 months | Requires bytecode instrumentation |
| Nested EIP execution trees | No* | 3-6 months | Requires deep Camel internals knowledge |
| Breakpoint debugging in route | No | 6+ months | Thread management + InterceptStrategy + serialization |
| Format-aware payload diffing | Partially | 2 weeks | Diff library exists, but data pipeline doesn't |
| Distributed trace assembly | Partially | 1 month | OTel exists but lacks Camel-specific depth |
| Service topology from execution data | Partially | 2 weeks | Istio does this at network layer, not route layer |

*Achievable with OTel Camel instrumentation (spans only, not payload content)

Where Each Feature Creates Unreplicable Value

  • Debugger: Requires InterceptStrategy breakpoints + thread parking + exchange serialization. The combination is unique — no other Camel tool offers browser-based route stepping.
  • Lineage: Requires per-processor INPUT/OUTPUT capture with correct ordering. OTel spans don't carry body content. JMX doesn't capture payloads. Only bytecode instrumentation provides this data.
  • Correlation + Topology: The trace assembly is achievable elsewhere. The differentiation is Camel-specific depth: each hop shows processor-level execution trees, not just "Service B took 350ms."

6. Implementation Sequencing

Phase A: Foundation + Topology (Weeks 1-3)

| Work | Repo | Issue |
|---|---|---|
| Service topology materialized view | cameleer-server | #69 |
| Topology REST API | cameleer-server | #69 |
| ServiceTopologyGraph.tsx | cameleer-server + saas | #72 |
| WebSocket infrastructure (for debugger) | cameleer-server | #63 |
| TraceContext DTO in cameleer-common | cameleer | #67 |

Ship: Topology graph visible from existing data. Zero agent changes. Immediate visual payoff.

Phase B: Lineage (Weeks 3-6)

| Work | Repo | Issue |
|---|---|---|
| Lineage protocol DTOs | cameleer-common | #64 |
| LineageManager + capture integration | cameleer-agent | #65 |
| LineageService + DiffEngine | cameleer-server | #66 |
| Lineage UI components | cameleer-server + saas | #71 |

Ship: Payload flow lineage independently usable.

Phase C: Distributed Trace Correlation (Weeks 5-9, overlaps B)

| Work | Repo | Issue |
|---|---|---|
| Trace context header propagation | cameleer-agent | #67 |
| Executions table migration (new columns) | cameleer-server | #68 |
| CorrelationService + trace assembly | cameleer-server | #68 |
| DistributedTraceView + TraceSearch UI | cameleer-server + saas | #72 |

Ship: Distributed traces + topology — full correlation story.

Phase D: Live Route Debugger (Weeks 8-14)

| Work | Repo | Issue |
|---|---|---|
| Debug protocol DTOs | cameleer-common | #60 |
| DebugSessionManager + InterceptStrategy | cameleer-agent | #61 |
| ExchangeStateSerializer + synthetic wrapper | cameleer-agent | #62 |
| DebugSessionService + WS + REST | cameleer-server | #63 |
| Debug UI components | cameleer-server + saas | #70 |

Ship: Full browser-based route debugger with integration to lineage and correlation.


7. Open Questions

  1. Debugger concurrency model: Should we support debugging through parallel Split branches? Current design follows the main thread. Parallel branches would require multiple parked threads per session.

  2. Lineage storage costs: Full INPUT+OUTPUT at every processor generates significant data. Should we add a separate lineage retention policy (e.g., 7 days) shorter than normal execution retention?

  3. Topology graph refresh frequency: 5-minute materialized view refresh is a trade-off. Real-time would require streaming aggregation (e.g., Kafka Streams). Is 5 minutes acceptable for v1?

  4. Cross-tenant federation security model: The v2 sourceTenantHash design needs a full threat model. Can a malicious tenant forge trace context to see another tenant's data?

  5. OTel interop: Should the trace context header be compatible with W3C Trace Context format? This would enable mixed environments where some services use OTel and others use Cameleer.