
Moat-Strengthening Features — Design Specification

Date: 2026-03-29
Status: Draft — Awaiting Review
Author: Boardroom simulation (Strategist, Skeptic, Architect, Growth Hacker)
Gitea Issues: cameleer/cameleer #57-#72 (label: MOAT)

Executive Summary

Three features designed to convert Cameleer's technical moat (ByteBuddy agent) into a workflow moat (debugger + lineage) and ultimately a network moat (cross-service correlation) before the vibe-coding window closes.

| Feature | Ship Target | Moat Type | Agent Changes | Server Changes |
|---|---|---|---|---|
| Live Route Debugger | Weeks 8-14 | Workflow | Heavy (DebugSessionManager, breakpoints) | Heavy (WebSocket, session mgmt) |
| Payload Flow Lineage | Weeks 3-6 | Technical | Light (one capture mode check) | Medium (DiffEngine) |
| Cross-Service Correlation | Weeks 1-9 | Network effect | Light (header propagation) | Medium (trace assembly, topology) |

Build Order

Week 1-3:  Foundation + Topology Graph (from existing data, zero agent changes)
Week 3-6:  Payload Flow Lineage (agent + server + UI)
Week 5-9:  Distributed Trace Correlation (agent header + server joins + UI)
Week 8-14: Live Route Debugger (agent + server + UI)

Gitea Issue Map

Epics:

  • #57 — Live Route Debugger
  • #58 — Payload Flow Lineage
  • #59 — Cross-Service Trace Correlation + Topology Map

Debugger sub-issues:

  • #60 — Protocol: Debug session command types (cameleer-common)
  • #61 — Agent: DebugSessionManager + breakpoint InterceptStrategy integration
  • #62 — Agent: ExchangeStateSerializer + synthetic direct route wrapper
  • #63 — Server: DebugSessionService + WebSocket + REST API
  • #70 — UI: Debug session frontend components

Lineage sub-issues:

  • #64 — Protocol: Lineage command types (cameleer-common)
  • #65 — Agent: LineageManager + capture mode integration
  • #66 — Server: LineageService + DiffEngine + REST API
  • #71 — UI: Lineage timeline + diff viewer components

Correlation sub-issues:

  • #67 — Agent: Enhanced trace context header propagation
  • #68 — Server: CorrelationService — distributed trace assembly
  • #69 — Server: DependencyGraphService + service topology materialized view
  • #72 — UI: Distributed trace view + service topology graph

1. Live Route Debugger

1.1 Concept

Extend the existing replay command with a debug-session wrapper. Users provide an exchange (from a prior failed execution or manually constructed) and replay it through a route with breakpoints. Only the replayed exchange's thread blocks at breakpoints — production traffic flows normally.

User Story: A developer sees a failed exchange. They click "Debug This Exchange." Cameleer pre-fills the body/headers. They set breakpoints, click "Start Debug Session." The exchange replays through the route, pausing at each breakpoint. They inspect state, modify the body, step forward. Total: 3 minutes. Without Cameleer: 45 minutes.

1.2 Architecture

Browser (SaaS UI)
    |
    v
WebSocket <--------------------------------------+
    |                                             |
    v                                             |
cameleer-server                                  |
    |  POST /api/v1/debug/sessions                |
    |  POST /api/v1/debug/sessions/{id}/step      |
    |  POST /api/v1/debug/sessions/{id}/resume    |
    |  DELETE /api/v1/debug/sessions/{id}         |
    |                                             |
    v                                             |
SSE Command Channel --> cameleer agent           |
    |                       |                     |
    |  "start-debug"        |                     |
    |  command               v                    |
    |               DebugSessionManager           |
    |                       |                     |
    |               Replay exchange via           |
    |               ProducerTemplate              |
    |                       |                     |
    |               InterceptStrategy checks      |
    |               breakpoints before each       |
    |               processor                     |
    |                       |                     |
    |               On breakpoint hit:            |
    |               > LockSupport.park()          |
    |               > Serialize exchange state     |
    |               > POST state to server -------+
    |                       |                    (server pushes to
    |               Wait for resume/step/skip     browser via WS)
    |               command via SSE
    |                       |
    |               On resume: LockSupport.unpark()
    |               Continue to next processor

1.3 Protocol Additions (cameleer-common)

New SSE Commands

| Command | Direction | Purpose |
|---|---|---|
| START_DEBUG | Server -> Agent | Create session, spawn thread, replay exchange with breakpoints |
| DEBUG_RESUME | Server -> Agent | Unpark thread, continue to next breakpoint |
| DEBUG_STEP | Server -> Agent | Unpark thread, break at next processor (STEP_OVER/STEP_INTO) |
| DEBUG_SKIP | Server -> Agent | Skip current processor, continue |
| DEBUG_MODIFY | Server -> Agent | Apply body/header changes at current breakpoint before resuming |
| DEBUG_ABORT | Server -> Agent | Abort session, release thread |

StartDebugPayload

{
  "sessionId": "dbg-a1b2c3",
  "routeId": "route-orders",
  "exchange": {
    "body": "{\"orderId\": 42, \"amount\": 150.00}",
    "headers": { "Content-Type": "application/json" }
  },
  "breakpoints": [
    { "processorId": "choice1", "condition": null },
    { "processorId": "to5", "condition": "${body.amount} > 100" }
  ],
  "mode": "STEP_OVER",
  "timeoutSeconds": 300,
  "originalExchangeId": "ID-failed-789",
  "replayToken": "...",
  "nonce": "..."
}

BreakpointHitReport (Agent -> Server)

{
  "sessionId": "dbg-a1b2c3",
  "processorId": "to5",
  "processorType": "TO",
  "endpointUri": "http://payment-service/charge",
  "depth": 2,
  "stepIndex": 4,
  "exchangeState": {
    "body": "{\"orderId\": 42, \"amount\": 150.00, \"validated\": true}",
    "headers": { "..." },
    "properties": { "CamelSplitIndex": 0 },
    "exception": null,
    "bodyType": "java.util.LinkedHashMap"
  },
  "executionTree": ["...partial tree up to this point..."],
  "parentProcessorId": "split1",
  "routeId": "route-orders",
  "timestamp": "2026-03-29T14:22:05.123Z"
}

1.4 Agent Implementation (cameleer-agent)

DebugSessionManager

  • Location: com.cameleer.agent.debug.DebugSessionManager
  • Stores active sessions: ConcurrentHashMap<sessionId, DebugSession>
  • Enforces max concurrent sessions (default 3, configurable via cameleer.debug.maxSessions)
  • Allocates dedicated Thread per session (NOT from Camel thread pool)
  • Timeout watchdog: ScheduledExecutorService auto-aborts expired sessions
  • Handles all DEBUG_* commands via DefaultCommandHandler delegation

DebugSession

  • Stores breakpoint definitions, current step mode, parked thread reference
  • shouldBreak(processorId, Exchange): evaluates processor match + Simple condition + step mode
  • reportBreakpointHit(): serializes state, POSTs to server, calls LockSupport.park()
  • applyModifications(Exchange): sets body/headers from DEBUG_MODIFY command
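
The park/unpark handshake behind reportBreakpointHit() can be modeled with plain java.util.concurrent primitives. The sketch below is a stdlib-only illustration (class and method names are invented, not the real agent API): the debug thread parks at a breakpoint and the SSE command-handler thread later delivers a command and unparks it. The loop around park() also covers spurious wakeups and the race where the command arrives before the thread parks.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Minimal model of the breakpoint handshake: the debug thread parks until
// the SSE command handler delivers a command and unparks it.
public class ParkDemo {
    public enum Command { RESUME, STEP, SKIP, ABORT }

    public static class Session {
        private final AtomicReference<Command> pending = new AtomicReference<>();
        private volatile Thread parked;

        // Runs on the dedicated debug thread at a breakpoint hit.
        public Command awaitCommand() {
            parked = Thread.currentThread();
            Command cmd;
            // Loop guards against spurious wakeups of LockSupport.park()
            // and against deliver() arriving before we park.
            while ((cmd = pending.getAndSet(null)) == null) {
                LockSupport.park(this);
            }
            parked = null;
            return cmd;
        }

        // Runs on the SSE command-handler thread.
        public void deliver(Command cmd) {
            pending.set(cmd);          // publish the command first...
            Thread t = parked;
            if (t != null) LockSupport.unpark(t);  // ...then wake the thread
        }
    }

    // Drives one round trip for demonstration purposes.
    public static Command roundTrip(Command toSend) {
        Session s = new Session();
        AtomicReference<Command> received = new AtomicReference<>();
        Thread debugThread = new Thread(() -> received.set(s.awaitCommand()));
        debugThread.start();
        try {
            Thread.sleep(50);          // give the debug thread time to park
            s.deliver(toSend);
            debugThread.join(2000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return received.get();
    }
}
```

Setting the pending command before unparking is what makes the handshake lost-wakeup-free without any lock.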

InterceptStrategy Integration

In CameleerInterceptStrategy.DelegateAsyncProcessor.process():

DebugSession session = debugSessionManager.getSession(exchange);
if (session != null && session.shouldBreak(processorId, exchange)) {
    ExchangeState state = ExchangeStateSerializer.capture(exchange);
    List<ProcessorExecution> tree = executionCollector.getPartialTree(exchange);
    session.reportBreakpointHit(processorId, state, tree);
    // Thread parked until server sends resume/step/skip/abort

    if (session.isAborted()) throw new DebugSessionAbortedException();
    if (session.shouldSkip()) { callback.done(true); return true; }
    if (session.hasModifications()) session.applyModifications(exchange);
}

Zero production overhead: debug exchanges carry a CameleerDebugSessionId exchange property, and getSession() checks only that property — a single null-check. Production exchanges never have the property, so the check returns null and no further work is done.
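
A stdlib-only sketch of that fast path (names are illustrative; the real check reads a Camel exchange property rather than a map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Model of the hot-path check: production exchanges carry no debug
// property, so getSession() is a single failed lookup and nothing else.
public class FastPathDemo {
    static final String DEBUG_PROP = "CameleerDebugSessionId";

    // Stand-in for the DebugSessionManager's session registry.
    public static final Map<Object, String> SESSIONS = new ConcurrentHashMap<>();

    // exchangeProps stands in for Exchange.getProperty(...) here.
    public static String getSession(Map<String, Object> exchangeProps) {
        Object id = exchangeProps.get(DEBUG_PROP);   // the single null-check
        return id == null ? null : SESSIONS.get(id);
    }
}
```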

ExchangeStateSerializer

  • TypeConverter chain: String -> byte[] as Base64 -> class name fallback
  • Stream bodies: wrap in CachedOutputStream (same pattern as Camel's stream caching)
  • Sensitive header redaction (reuses PayloadCapture redaction logic)
  • Size limit: cameleer.debug.maxBodySize (default 64KB)
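
The fallback chain can be sketched as follows. This is a simplified, stdlib-only model (the real serializer would try Camel's TypeConverter first, which is not modeled here); the class name and truncation behavior are illustrative assumptions:

```java
import java.util.Base64;

// Sketch of the serializer's fallback chain: String as-is, byte[] as
// Base64, anything else degrades to its class name. Bodies over
// maxBodySize are truncated.
public class BodySerializerDemo {
    public static String serialize(Object body, int maxBodySize) {
        String s;
        if (body == null) {
            s = null;
        } else if (body instanceof String) {
            s = (String) body;                                   // step 1: String
        } else if (body instanceof byte[]) {
            s = Base64.getEncoder().encodeToString((byte[]) body); // step 2: Base64
        } else {
            s = "<unserializable: " + body.getClass().getName() + ">"; // fallback
        }
        if (s != null && s.length() > maxBodySize) {
            s = s.substring(0, maxBodySize);                     // size limit
        }
        return s;
    }
}
```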

Synthetic Direct Route Wrapper

For non-direct routes (timer, jms, http, file):

  1. Extract route's processor chain from CamelContext
  2. Create temporary direct:__debug_{routeId} route with same processors (shared by reference)
  3. Debug exchange enters via ProducerTemplate.send()
  4. Remove temporary route on session completion
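
The wrapper lifecycle from the four steps above can be sketched without any Camel API (a real implementation would add and remove routes through the CamelContext; the registry map and method names here are hypothetical). The key point is the try/finally: the temporary route is always removed, even if the replay fails:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual lifecycle of the synthetic debug wrapper route.
public class DebugWrapperDemo {
    // Stand-in for routes currently registered in the CamelContext.
    public static final Map<String, String> ACTIVE_WRAPPERS = new ConcurrentHashMap<>();

    public static String wrapperUri(String routeId) {
        return "direct:__debug_" + routeId;     // naming scheme from the design
    }

    public static String runDebugSession(String routeId, Runnable replay) {
        String uri = wrapperUri(routeId);
        ACTIVE_WRAPPERS.put(routeId, uri);      // step 2: install temporary route
        try {
            replay.run();                       // step 3: exchange enters via wrapper
        } finally {
            ACTIVE_WRAPPERS.remove(routeId);    // step 4: always clean up
        }
        return uri;
    }
}
```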

1.5 Server Implementation (cameleer-server)

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| POST | /api/v1/debug/sessions | OPERATOR | Create debug session |
| GET | /api/v1/debug/sessions/{id} | VIEWER | Get session state |
| POST | /api/v1/debug/sessions/{id}/step | OPERATOR | Step over/into |
| POST | /api/v1/debug/sessions/{id}/resume | OPERATOR | Resume to next breakpoint |
| POST | /api/v1/debug/sessions/{id}/skip | OPERATOR | Skip current processor |
| POST | /api/v1/debug/sessions/{id}/modify | OPERATOR | Modify exchange at breakpoint |
| DELETE | /api/v1/debug/sessions/{id} | OPERATOR | Abort session |
| POST | /api/v1/debug/sessions/{id}/breakpoint-hit | AGENT | Agent reports breakpoint |
| GET | /api/v1/debug/sessions/{id}/compare | VIEWER | Compare debug vs original |

WebSocket Channel

Endpoint: WS /api/v1/debug/ws?token={jwt}

Server -> Browser events:
  { "type": "breakpoint-hit", "sessionId": "...", "data": { ...state... } }
  { "type": "session-completed", "sessionId": "...", "execution": { ... } }
  { "type": "session-error", "sessionId": "...", "error": "Agent disconnected" }
  { "type": "session-timeout", "sessionId": "..." }

Data Model

CREATE TABLE debug_sessions (
    session_id        TEXT PRIMARY KEY,
    agent_id          TEXT NOT NULL,
    route_id          TEXT NOT NULL,
    original_exchange TEXT,
    status            TEXT NOT NULL DEFAULT 'PENDING',
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at      TIMESTAMPTZ,
    breakpoints       JSONB,
    current_state     JSONB,
    step_count        INT DEFAULT 0,
    replay_exchange   TEXT,
    created_by        TEXT NOT NULL
);

DebugSessionService

Lifecycle: PENDING -> ACTIVE -> PAUSED -> COMPLETED/ABORTED/TIMEOUT

  1. Generate sessionId + nonce + replay token
  2. Send START_DEBUG via existing SSE channel
  3. Receive breakpoint-hit POSTs, store state, push to WebSocket
  4. Translate browser actions (step/resume/skip/modify) into SSE commands
  5. Detect agent SSE disconnect via SseConnectionManager callback
  6. Store completed execution in normal pipeline (tagged with debugSessionId)
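
The lifecycle can be made explicit as a guarded state machine. The transition set below is inferred from the PENDING -> ACTIVE -> PAUSED -> terminal flow described above (e.g. PAUSED -> ACTIVE on resume, any non-terminal state may abort or time out) and should be read as a sketch, not the definitive rule set:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Debug session lifecycle as an explicit transition table.
public class SessionLifecycle {
    public enum State { PENDING, ACTIVE, PAUSED, COMPLETED, ABORTED, TIMEOUT }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.PENDING,   EnumSet.of(State.ACTIVE, State.ABORTED, State.TIMEOUT));
        ALLOWED.put(State.ACTIVE,    EnumSet.of(State.PAUSED, State.COMPLETED, State.ABORTED, State.TIMEOUT));
        ALLOWED.put(State.PAUSED,    EnumSet.of(State.ACTIVE, State.COMPLETED, State.ABORTED, State.TIMEOUT));
        // Terminal states have no outgoing transitions.
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.ABORTED,   EnumSet.noneOf(State.class));
        ALLOWED.put(State.TIMEOUT,   EnumSet.noneOf(State.class));
    }

    public static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }
}
```

Rejecting illegal transitions at the service boundary (e.g. a step command against a COMPLETED session) keeps the WebSocket and SSE sides from drifting out of sync.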

1.6 SaaS Layer (cameleer-saas)

  • Tenant isolation: debug sessions scoped to tenant's agents
  • Concurrent session limits per tier (free: 1, pro: 5, enterprise: unlimited)
  • Usage metering: session creation counted as billable event

1.7 UI Components

  • DebugLauncher.tsx — "Debug This Exchange" button on failed execution detail, pre-fills exchange data
  • DebugSession.tsx — Main view: route diagram with status coloring (green/yellow/gray), exchange state panel, step controls (F10/F11/F5/F6 keyboard shortcuts)
  • DebugCompare.tsx — Side-by-side: original execution vs debug replay with diff highlighting
  • BreakpointEditor.tsx — Click processor nodes to toggle breakpoints, conditional expression input

1.8 Safety Mechanisms

| Concern | Mitigation |
|---|---|
| Thread leak | Session timeout auto-aborts (default 5 min) |
| Memory leak | Exchange state captured on-demand, not buffered |
| Agent restart | Server detects SSE disconnect, notifies browser |
| High-throughput route | Only the debug exchange hits breakpoints (property check) |
| Concurrent sessions | Hard limit (default 3), FAILURE ack if exceeded |
| Non-direct routes | Synthetic direct:__debug_* wrapper with same processor chain |

2. Payload Flow Lineage

2.1 Concept

Capture the full transformation history of a message flowing through a route. At each processor, snapshot body before and after. Server computes structural diffs. UI renders a visual "data flow" timeline showing exactly where and how data transforms.

User Story: A developer has an exchange where customerName is null. They click "Trace Payload Flow." Vertical timeline: at each processor, before/after body with structural diff. Processor 7 (enrich1) returned a response missing the name field. Root cause in 30 seconds.

2.2 Architecture

cameleer agent
    |
    |  On lineage-enabled exchange:
    |  Before processor: capture INPUT
    |  After processor: capture OUTPUT
    |  Attach to ProcessorExecution as inputBody/outputBody
    |
    v
POST /api/v1/data/executions (processors carry full snapshots)
    |
    v
cameleer-server
    |
    |  LineageService:
    |  > Flatten processor tree to ordered list
    |  > Compute diffs between processor[n].output and processor[n+1].input
    |  > Classify transformation type
    |  > Generate human-readable summary
    |
    v
GET /api/v1/executions/{id}/lineage
    |
    v
Browser: LineageTimeline + DiffViewer

2.3 Protocol Additions (cameleer-common)

New SSE Commands

| Command | Direction | Purpose |
|---|---|---|
| ENABLE_LINEAGE | Server -> Agent | Activate targeted payload capture |
| DISABLE_LINEAGE | Server -> Agent | Deactivate lineage capture |

EnableLineagePayload

{
  "lineageId": "lin-x1y2z3",
  "scope": {
    "type": "ROUTE",
    "routeId": "route-orders"
  },
  "predicate": "${header.orderId} == 'ORD-500'",
  "predicateLanguage": "simple",
  "maxCaptures": 10,
  "duration": "PT10M",
  "captureHeaders": true,
  "captureProperties": false
}

Scope Types

| Scope | Meaning |
|---|---|
| ROUTE | All exchanges on a specific route |
| CORRELATION | All exchanges with a specific correlationId |
| EXPRESSION | Any exchange matching a Simple/JsonPath predicate |
| NEXT_N | Next N exchanges on the route (countdown) |

2.4 Agent Implementation (cameleer-agent)

LineageManager

  • Location: com.cameleer.agent.lineage.LineageManager
  • Stores active configs: ConcurrentHashMap<lineageId, LineageConfig>
  • Tracks capture count per lineageId: auto-disables at maxCaptures
  • Duration timeout via ScheduledExecutorService: auto-disables after expiry
  • shouldCaptureLineage(Exchange): evaluates scope + predicate, sets CameleerLineageActive property
  • isLineageActive(Exchange): single null-check on exchange property (HOT PATH, O(1))
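
The maxCaptures bookkeeping can be sketched with an atomic counter. This is a hypothetical model (predicate evaluation against Simple/JsonPath is stubbed as a boolean); a NEXT_N scope is the same mechanism with the predicate always true:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of per-config capture counting: the config auto-disables once
// maxCaptures matching exchanges have been captured.
public class LineageConfigDemo {
    final int maxCaptures;
    final AtomicInteger captured = new AtomicInteger();
    volatile boolean disabled;

    LineageConfigDemo(int maxCaptures) { this.maxCaptures = maxCaptures; }

    // True => this exchange gets BOTH-mode capture at lineage body limits.
    boolean tryCapture(boolean predicateMatches) {
        if (disabled || !predicateMatches) return false;
        int n = captured.incrementAndGet();
        if (n > maxCaptures) { disabled = true; return false; } // lost race
        if (n == maxCaptures) disabled = true;  // auto-disable at the limit
        return true;
    }

    // Helper for demonstration: outcomes of `attempts` matching exchanges.
    public static boolean[] simulate(int maxCaptures, int attempts) {
        LineageConfigDemo cfg = new LineageConfigDemo(maxCaptures);
        boolean[] out = new boolean[attempts];
        for (int i = 0; i < attempts; i++) out[i] = cfg.tryCapture(true);
        return out;
    }
}
```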

Integration Points (Minimal Agent Changes)

1. CameleerEventNotifier.onExchangeCreated():

lineageManager.shouldCaptureLineage(exchange);
// Sets CameleerLineageActive property if matching

2. ExecutionCollector.resolveProcessorCaptureMode():

if (lineageManager.isLineageActive(exchange)) {
    return PayloadCaptureMode.BOTH;
}

3. PayloadCapture body size:

int maxSize = lineageManager.isLineageActive(exchange)
    ? config.getLineageMaxBodySize()    // 64KB
    : config.getMaxBodySize();           // 4KB

Production overhead when lineage is disabled: effectively zero. The isLineageActive() check is a single null-check on an exchange property that doesn't exist on non-lineage exchanges.

Configuration

cameleer.lineage.maxBodySize=65536    # 64KB for lineage captures (vs 4KB normal)
cameleer.lineage.enabled=true          # master switch

2.5 Server Implementation (cameleer-server)

LineageService

  • getLineage(executionId): fetch execution, flatten tree to ordered processor list, compute diffs
  • enableLineage(request): send ENABLE_LINEAGE to target agents
  • disableLineage(lineageId): send DISABLE_LINEAGE
  • getActiveLineages(): list active configs across all agents

DiffEngine

Format-aware diff computation:

| Format | Detection | Library | Output |
|---|---|---|---|
| JSON | Jackson parse success | zjsonpatch (RFC 6902) or custom tree walk | FIELD_ADDED, FIELD_REMOVED, FIELD_MODIFIED with JSON path |
| XML | DOM parse success | xmlunit-core | ELEMENT_ADDED, ELEMENT_REMOVED, ATTRIBUTE_CHANGED |
| Text | Fallback | java-diff-utils (Myers) | LINE_ADDED, LINE_REMOVED, LINE_CHANGED |
| Binary | Type detection | N/A | Size comparison only |

Transformation Classification

UNCHANGED      — No diff
MUTATION       — Existing fields modified, same format
ENRICHMENT     — Fields only added (e.g., enrich processor)
REDUCTION      — Fields only removed
FORMAT_CHANGED — Content type changed (XML -> JSON)
TYPE_CHANGED   — Java type changed but content equivalent
MIXED          — Combination of additions, removals, modifications
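
The classification reduces to a small decision function over the diff result. A sketch over the counts of added, removed, and modified fields plus a format-change flag (TYPE_CHANGED is omitted here because it requires Java type comparison, which the counts alone don't carry):

```java
// Decision function mapping a field diff to a transformation type.
public class TransformClassifier {
    public static String classify(int added, int removed, int modified, boolean formatChanged) {
        if (formatChanged) return "FORMAT_CHANGED";                    // XML -> JSON etc.
        if (added == 0 && removed == 0 && modified == 0) return "UNCHANGED";
        if (added > 0 && removed == 0 && modified == 0) return "ENRICHMENT"; // fields only added
        if (removed > 0 && added == 0 && modified == 0) return "REDUCTION";  // fields only removed
        if (modified > 0 && added == 0 && removed == 0) return "MUTATION";   // fields only modified
        return "MIXED";                                                // any combination
    }
}
```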

Summary Generation

Auto-generated human-readable summaries:

  • "XML -> JSON conversion" (FORMAT_CHANGED)
  • "Added customer object from external API" (ENRICHMENT + field names)
  • "Modified amount field: 150.00 -> 135.00" (MUTATION + values)

Lineage Response Schema

{
  "executionId": "exec-123",
  "routeId": "route-orders",
  "processors": [
    {
      "processorId": "unmarshal1",
      "processorType": "UNMARSHAL",
      "input": {
        "body": "<order><id>42</id></order>",
        "bodyType": "java.lang.String",
        "contentType": "application/xml"
      },
      "output": {
        "body": "{\"id\": 42}",
        "bodyType": "java.util.LinkedHashMap",
        "contentType": "application/json"
      },
      "diff": {
        "transformationType": "FORMAT_CHANGED",
        "summary": "XML -> JSON conversion",
        "bodyChanged": true,
        "headersChanged": true,
        "changes": [
          { "type": "FORMAT_CHANGED", "from": "XML", "to": "JSON" }
        ]
      },
      "durationMs": 12,
      "status": "COMPLETED"
    }
  ]
}

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| GET | /api/v1/executions/{id}/lineage | VIEWER | Full lineage with diffs |
| POST | /api/v1/lineage/enable | OPERATOR | Enable lineage on agents |
| DELETE | /api/v1/lineage/{lineageId} | OPERATOR | Disable lineage |
| GET | /api/v1/lineage/active | VIEWER | List active lineage configs |

2.6 SaaS Layer (cameleer-saas)

  • Lineage captures counted as premium events (higher billing weight)
  • Active lineage config limits per tier
  • Post-hoc lineage from COMPLETE engine level available on all tiers (resource-intensive fallback)
  • Targeted lineage-on-demand is a paid-tier feature (upgrade driver)

2.7 UI Components

  • LineageTimeline.tsx — Vertical processor list, color-coded by transformation type (green/yellow/blue/red/purple), expandable diffs, auto-generated summaries
  • LineageDiffViewer.tsx — Side-by-side or unified diff, format-aware (JSON tree-diff, XML element-diff, text line-diff, binary hex)
  • LineageEnableDialog.tsx — "Trace Payload Flow" button, scope/predicate builder, max captures slider
  • LineageSummaryStrip.tsx — Compact horizontal strip on execution detail page, transformation icons per processor

3. Cross-Service Trace Correlation + Topology Map

3.1 Concept

Stitch executions across services into unified distributed traces. Build a service dependency topology graph automatically from observed traffic. Design the protocol for future cross-tenant federation.

User Story: Platform team with 8 Camel microservices. Order stuck in "processing." Engineer searches by orderId, sees distributed trace: horizontal timeline across all services, each expandable to route detail. Service C (pricing) timed out. Root cause across 4 boundaries in 60 seconds.

3.2 Phase 1: Intra-Tenant Trace Correlation

Enhanced Trace Context Header

Current (exists):
  X-Cameleer-CorrelationId: corr-abc-123

New (added):
  X-Cameleer-TraceContext: {
    "traceId": "trc-xyz",
    "parentSpanId": "span-001",
    "hopIndex": 2,
    "sourceApp": "order-service",
    "sourceRoute": "route-validate"
  }

Transport-Specific Propagation

| Transport | Detection | Mechanism |
|---|---|---|
| HTTP/REST | URI prefix http:, https:, rest: | HTTP header X-Cameleer-TraceContext |
| JMS | URI prefix jms:, activemq:, amqp: | JMS property CameleerTraceContext |
| Kafka | URI prefix kafka: | Kafka header cameleer-trace-context |
| Direct/SEDA | URI prefix direct:, seda:, vm: | Exchange property (in-process) |
| File/FTP | URI prefix file:, ftp: | Not propagated (async) |
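
The prefix dispatch above can be sketched as a single function (the carrier constants and helper names here are illustrative; the real injection would write the actual HTTP/JMS/Kafka header):

```java
// Maps an endpoint URI to the trace-context carrier for that transport.
public class TransportDispatch {
    public static String carrierFor(String uri) {
        if (startsWithAny(uri, "http:", "https:", "rest:"))  return "HTTP_HEADER";
        if (startsWithAny(uri, "jms:", "activemq:", "amqp:")) return "JMS_PROPERTY";
        if (uri.startsWith("kafka:"))                         return "KAFKA_HEADER";
        if (startsWithAny(uri, "direct:", "seda:", "vm:"))    return "EXCHANGE_PROPERTY";
        if (startsWithAny(uri, "file:", "ftp:"))              return "NONE"; // async, not propagated
        return "NONE"; // unknown transports: do not propagate
    }

    private static boolean startsWithAny(String uri, String... prefixes) {
        for (String p : prefixes) if (uri.startsWith(p)) return true;
        return false;
    }
}
```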

3.3 Agent Implementation (cameleer-agent)

Outgoing Propagation (InterceptStrategy)

Before delegating to TO/ENRICH/WIRE_TAP processors:

if (isOutgoingEndpoint(processorType, endpointUri)) {
    TraceContext ctx = new TraceContext(
        executionCollector.getTraceId(exchange),
        currentProcessorExecution.getId(),
        executionCollector.getHopIndex(exchange) + 1,
        config.getApplicationName(),
        exchange.getFromRouteId()
    );
    injectTraceContext(exchange, endpointUri, ctx);
}

Incoming Extraction (CameleerEventNotifier)

In onExchangeCreated():

String traceCtxJson = extractTraceContext(exchange);
if (traceCtxJson != null) {
    TraceContext ctx = objectMapper.readValue(traceCtxJson, TraceContext.class);
    exchange.setProperty("CameleerParentSpanId", ctx.parentSpanId);
    exchange.setProperty("CameleerSourceApp", ctx.sourceApp);
    exchange.setProperty("CameleerSourceRoute", ctx.sourceRoute);
    exchange.setProperty("CameleerHopIndex", ctx.hopIndex);
}

New RouteExecution Fields

execution.setParentSpanId(...);   // processor execution ID from calling service
execution.setSourceApp(...);      // application name of caller
execution.setSourceRoute(...);    // routeId of caller
execution.setHopIndex(...);       // depth in distributed trace

Safety

  • Header size always <256 bytes
  • Parse failure: log warning, continue without context (no exchange failure)
  • Only inject on outgoing processors, never on FROM consumers

3.4 Server Implementation: Trace Assembly (cameleer-server)

CorrelationService

buildDistributedTrace(correlationId):
  1. SELECT * FROM executions WHERE correlation_id = ? ORDER BY start_time
  2. Index by executionId for O(1) lookup
  3. Build tree: roots = executions where parentSpanId IS NULL
     For each with parentSpanId: find parent, attach as child hop
  4. Compute gaps: child.startTime - parent.processor.startTime = network latency
     If gap < 0: flag clock skew warning
  5. Aggregate: totalDuration, serviceCount, hopCount, status
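
Steps 2-4 of the assembly can be sketched in stdlib Java. This is a simplified model, not the real CorrelationService: parentSpanId is collapsed to the parent execution id here (the real design joins on the calling *processor* execution id), and an execution whose parent is missing (uninstrumented service) is treated as an extra root:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model of distributed-trace assembly from flat execution rows.
public class TraceAssemblyDemo {
    public static class Exec {
        public final String id;
        public final String parentSpanId;   // null => trace root
        public final long startMs;
        public final List<Exec> children = new ArrayList<>();
        public Exec(String id, String parentSpanId, long startMs) {
            this.id = id; this.parentSpanId = parentSpanId; this.startMs = startMs;
        }
    }

    public static List<Exec> assemble(List<Exec> execs) {
        Map<String, Exec> byId = new HashMap<>();
        for (Exec e : execs) byId.put(e.id, e);     // step 2: O(1) index
        List<Exec> roots = new ArrayList<>();
        for (Exec e : execs) {                      // step 3: attach children
            Exec parent = e.parentSpanId == null ? null : byId.get(e.parentSpanId);
            if (parent == null) roots.add(e);       // missing hop => extra root
            else parent.children.add(e);
        }
        return roots;
    }

    // Step 4: a negative gap between parent start and child start means skew.
    public static boolean clockSkew(Exec parent, Exec child) {
        return child.startMs - parent.startMs < 0;
    }
}
```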

Distributed Trace Response

{
  "traceId": "trc-xyz",
  "correlationId": "corr-abc-123",
  "totalDurationMs": 1250,
  "hopCount": 4,
  "serviceCount": 3,
  "status": "FAILED",
  "entryPoint": {
    "application": "api-gateway",
    "routeId": "route-incoming-orders",
    "executionId": "exec-001",
    "durationMs": 1250,
    "children": [
      {
        "calledFrom": {
          "processorId": "to3",
          "processorType": "TO",
          "endpointUri": "http://order-service/validate"
        },
        "application": "order-service",
        "routeId": "route-validate",
        "executionId": "exec-002",
        "durationMs": 350,
        "networkLatencyMs": 12,
        "children": []
      }
    ]
  }
}

Data Model Changes

ALTER TABLE executions ADD COLUMN parent_span_id TEXT;
ALTER TABLE executions ADD COLUMN source_app TEXT;
ALTER TABLE executions ADD COLUMN source_route TEXT;
ALTER TABLE executions ADD COLUMN hop_index INT;

CREATE INDEX idx_executions_parent_span
    ON executions(parent_span_id) WHERE parent_span_id IS NOT NULL;

Edge Cases

  • Missing hops: uninstrumented service shown as "unknown" node
  • Clock skew: flagged as warning, still rendered
  • Fan-out: parallel multicast creates multiple children from same processor
  • Circular calls: detected via hopIndex (max depth 20)

3.5 Server Implementation: Topology Graph (cameleer-server)

DependencyGraphService

Builds service dependency graph from existing execution data — zero additional agent overhead.

Data source: processor_executions where processor_type IN (TO, TO_DYNAMIC, EIP_ENRICH, EIP_POLL_ENRICH, EIP_WIRE_TAP) and resolved_endpoint_uri IS NOT NULL.

Endpoint-to-Service Resolution

  1. Direct/SEDA match: direct:processOrder -> route's applicationName
  2. Agent registration match: URI base URL matches registered agent
  3. Kubernetes hostname: extract hostname from URI -> applicationName
  4. Manual mapping: admin-configured regex/glob patterns
  5. Unresolved: external:{hostname} node
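
The resolution chain is first-hit-wins. A stdlib sketch with illustrative data structures (the Kubernetes-hostname step is folded into the final hostname extraction here, and all maps/rule types are assumptions, not the real DependencyGraphService API):

```java
import java.util.Map;
import java.util.regex.Pattern;

// First-hit-wins endpoint-to-service resolution.
public class EndpointResolverDemo {
    public static String resolve(String uri,
                                 Map<String, String> directRoutes,   // "direct:processOrder" -> app
                                 Map<String, String> agentBaseUrls,  // "http://order-service" -> app
                                 Map<Pattern, String> manualRules) { // admin-configured patterns
        String app = directRoutes.get(uri);
        if (app != null) return app;                                 // 1. direct/seda match
        for (Map.Entry<String, String> e : agentBaseUrls.entrySet())
            if (uri.startsWith(e.getKey())) return e.getValue();     // 2. agent registration
        for (Map.Entry<Pattern, String> e : manualRules.entrySet())
            if (e.getKey().matcher(uri).matches()) return e.getValue(); // 4. manual mapping
        return "external:" + hostOf(uri);                            // 5. unresolved
    }

    // Crude hostname extraction (steps 3/5); a real version would use java.net.URI.
    static String hostOf(String uri) {
        int s = uri.indexOf("//");
        if (s < 0) return uri;
        String rest = uri.substring(s + 2);
        int end = rest.length();
        for (int i = 0; i < rest.length(); i++) {
            char ch = rest.charAt(i);
            if (ch == '/' || ch == ':') { end = i; break; }
        }
        return rest.substring(0, end);
    }
}
```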

Materialized View

CREATE MATERIALIZED VIEW service_dependencies AS
SELECT
    e.application_name AS source_app,
    pe.resolved_endpoint_uri AS target_uri,
    COUNT(*) AS call_count,
    AVG(pe.duration_ms) AS avg_latency_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY pe.duration_ms) AS p99_latency_ms,
    SUM(CASE WHEN pe.status = 'FAILED' THEN 1 ELSE 0 END)::FLOAT
        / NULLIF(COUNT(*), 0) AS error_rate,
    MAX(pe.start_time) AS last_seen,
    MIN(pe.start_time) AS first_seen
FROM executions e
JOIN processor_executions pe
    ON e.execution_id = pe.execution_id
    AND e.start_time = pe.start_time
WHERE pe.processor_type IN ('TO','TO_DYNAMIC','EIP_ENRICH','EIP_POLL_ENRICH','EIP_WIRE_TAP')
    AND pe.resolved_endpoint_uri IS NOT NULL
    AND e.start_time > NOW() - INTERVAL '24 hours'
GROUP BY e.application_name, pe.resolved_endpoint_uri;

-- Refresh every 5 minutes

REST Endpoints

| Method | Path | Role | Purpose |
|---|---|---|---|
| GET | /api/v1/traces/{correlationId} | VIEWER | Assembled distributed trace |
| GET | /api/v1/traces/{correlationId}/timeline | VIEWER | Flat timeline for Gantt |
| GET | /api/v1/topology/dependencies | VIEWER | Service dependency graph |
| GET | /api/v1/topology/diff | VIEWER | Topology changes between windows |
| GET | /api/v1/topology/dependencies/{source}/{target} | VIEWER | Dependency detail |

3.6 Phase 2: Cross-Tenant Federation (Design Only)

Reserve sourceTenantHash in TraceContext for future use:

{
  "traceId": "trc-xyz",
  "parentSpanId": "span-001",
  "hopIndex": 2,
  "sourceApp": "order-service",
  "sourceRoute": "route-validate",
  "sourceTenantHash": null
}

Consent model (v2):

  • Both tenants opt-in to "Federation" in SaaS settings
  • Shared: trace structure (timing, status, service names)
  • NOT shared: payload content, headers, internal route details
  • Either tenant can revoke at any time

3.7 SaaS Layer (cameleer-saas)

  • All trace correlation intra-tenant in v1
  • Topology graph scoped to tenant's applications
  • External dependencies shown as opaque nodes
  • Cross-tenant federation as enterprise-tier feature (v2)

3.8 UI Components

  • DistributedTraceView.tsx — Horizontal Gantt timeline, rows=services, bars=executions, arrows=call flow, click-to-expand to route detail
  • ServiceTopologyGraph.tsx — Force-directed graph, nodes sized by throughput, edges colored by error rate, animated traffic pulse, click drill-down
  • TopologyDiff.tsx — "What changed?" view, new/removed dependencies highlighted, latency/error changes annotated
  • TraceSearchEnhanced.tsx — Search by correlationId/traceId/business attributes, results show trace summaries with service count and hop count

4. Cross-Feature Integration Points

| From -> To | Integration |
|---|---|
| Correlation -> Debugger | "Debug This Hop": from distributed trace, click a service hop to replay and debug |
| Correlation -> Lineage | "Trace Payload Across Services": enable lineage on a correlationId, see transforms across boundaries |
| Lineage -> Debugger | "Debug From Diff": unexpected processor output -> one-click launch debug with breakpoint on that processor |
| Debugger -> Lineage | Debug sessions auto-capture full lineage (all processors at BOTH mode) |
| Topology -> Correlation | Click dependency edge -> show recent traces between those services |
| Topology -> Lineage | "How does data transform?" -> aggregated lineage summary for a dependency edge |

5. Competitive Analysis

What an LLM + Junior Dev Can Replicate

| Capability | Replicable? | Time | Barrier |
|---|---|---|---|
| JMX metrics dashboard | Yes | 1 weekend | None |
| Log parsing + display | Yes | 1 weekend | None |
| Basic replay (re-send exchange) | Yes | 1 week | Need agent access |
| Per-processor payload capture | No* | 2-3 months | Requires bytecode instrumentation |
| Nested EIP execution trees | No* | 3-6 months | Requires deep Camel internals knowledge |
| Breakpoint debugging in route | No | 6+ months | Thread management + InterceptStrategy + serialization |
| Format-aware payload diffing | Partially | 2 weeks | Diff library exists, but data pipeline doesn't |
| Distributed trace assembly | Partially | 1 month | OTel exists but lacks Camel-specific depth |
| Service topology from execution data | Partially | 2 weeks | Istio does this at network layer, not route layer |

*Achievable with OTel Camel instrumentation (spans only, not payload content)

Where Each Feature Creates Unreplicable Value

  • Debugger: Requires InterceptStrategy breakpoints + thread parking + exchange serialization. The combination is unique — no other Camel tool offers browser-based route stepping.
  • Lineage: Requires per-processor INPUT/OUTPUT capture with correct ordering. OTel spans don't carry body content. JMX doesn't capture payloads. Only bytecode instrumentation provides this data.
  • Correlation + Topology: The trace assembly is achievable elsewhere. The differentiation is Camel-specific depth: each hop shows processor-level execution trees, not just "Service B took 350ms."

6. Implementation Sequencing

Phase A: Foundation + Topology (Weeks 1-3)

| Work | Repo | Issue |
|---|---|---|
| Service topology materialized view | cameleer-server | #69 |
| Topology REST API | cameleer-server | #69 |
| ServiceTopologyGraph.tsx | cameleer-server + saas | #72 |
| WebSocket infrastructure (for debugger) | cameleer-server | #63 |
| TraceContext DTO in cameleer-common | cameleer | #67 |

Ship: Topology graph visible from existing data. Zero agent changes. Immediate visual payoff.

Phase B: Lineage (Weeks 3-6)

| Work | Repo | Issue |
|---|---|---|
| Lineage protocol DTOs | cameleer-common | #64 |
| LineageManager + capture integration | cameleer-agent | #65 |
| LineageService + DiffEngine | cameleer-server | #66 |
| Lineage UI components | cameleer-server + saas | #71 |

Ship: Payload flow lineage independently usable.

Phase C: Distributed Trace Correlation (Weeks 5-9, overlaps B)

| Work | Repo | Issue |
|---|---|---|
| Trace context header propagation | cameleer-agent | #67 |
| Executions table migration (new columns) | cameleer-server | #68 |
| CorrelationService + trace assembly | cameleer-server | #68 |
| DistributedTraceView + TraceSearch UI | cameleer-server + saas | #72 |

Ship: Distributed traces + topology — full correlation story.

Phase D: Live Route Debugger (Weeks 8-14)

| Work | Repo | Issue |
|---|---|---|
| Debug protocol DTOs | cameleer-common | #60 |
| DebugSessionManager + InterceptStrategy | cameleer-agent | #61 |
| ExchangeStateSerializer + synthetic wrapper | cameleer-agent | #62 |
| DebugSessionService + WS + REST | cameleer-server | #63 |
| Debug UI components | cameleer-server + saas | #70 |

Ship: Full browser-based route debugger with integration to lineage and correlation.


7. Open Questions

  1. Debugger concurrency model: Should we support debugging through parallel Split branches? Current design follows the main thread. Parallel branches would require multiple parked threads per session.

  2. Lineage storage costs: Full INPUT+OUTPUT at every processor generates significant data. Should we add a separate lineage retention policy (e.g., 7 days) shorter than normal execution retention?

  3. Topology graph refresh frequency: 5-minute materialized view refresh is a trade-off. Real-time would require streaming aggregation (e.g., Kafka Streams). Is 5 minutes acceptable for v1?

  4. Cross-tenant federation security model: The v2 sourceTenantHash design needs a full threat model. Can a malicious tenant forge trace context to see another tenant's data?

  5. OTel interop: Should the trace context header be compatible with W3C Trace Context format? This would enable mixed environments where some services use OTel and others use Cameleer.