docs: add execution overlay & debugger design spec (sub-project 2)

Design for overlaying real execution data onto the ProcessDiagram:
- Node status visualization (green OK, red failed, dimmed skipped)
- Per-compound iteration stepping for loops/splits
- Tabbed detail panel (Info, Headers, Input, Output, Error, Config, Timeline)
- Jump to Error with cross-route drill-down
- Backend prerequisites for iteration fields and snapshot-by-id endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hsiegeln
2026-03-27 18:13:03 +01:00
parent 30c8fe1091
commit 509159417b


@@ -0,0 +1,417 @@
# Execution Overlay & Debugger — Design Spec
**Sub-project:** 2 of 3 (Component → **Execution Overlay** → Page Integration)
**Scope:** Overlay real execution data onto the ProcessDiagram component from sub-project 1. Adds node status visualization, per-compound iteration stepping, a tabbed detail panel, and error navigation. Does NOT include page integration — that is sub-project 3.
---
## Problem
The ProcessDiagram from sub-project 1 shows route topology but cannot display what actually happened during an exchange's execution. Users investigating failures must cross-reference between the diagram and separate execution detail views. There is no way to see which processors were hit, which were skipped, where errors occurred, or what the message looked like at each step.
## Goal
Build an `ExecutionDiagram` wrapper component that overlays execution data onto ProcessDiagram, turning it into an "after-the-fact debugger." Users can see the execution path at a glance (green = OK, red = failed, dimmed = skipped), step through loop/split iterations independently, and inspect processor-level details (input/output body, headers, errors, timing) in a tabbed detail panel below the diagram.
---
## Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Architecture | Wrapper component (`ExecutionDiagram`) composing `ProcessDiagram` | Keeps topology component pure; execution concerns isolated |
| Layout | Top/bottom IDE split (diagram top, detail panel bottom) | Left-to-right diagram needs full width; familiar IDE pattern |
| Node status | Tinted backgrounds + status badges | Green tint + checkmark for OK, red tint + ! for failed, dimmed for skipped — scannable at a glance |
| Duration display | Badge on each executed node (bottom-right) | Quick bottleneck identification without opening detail panel |
| Iteration stepping | Per-compound stepper in header bar | Independent stepping at each nesting level; contextually placed |
| Error navigation | Passive highlighting + "Jump to Error" action | Red border + ! badge on failed node; jump action drills into sub-routes if needed |
| Cross-route errors | Red border + drill-down arrow on calling node | Communicates failure exists here; arrow signals root cause is deeper |
| Detail panel tabs | Info, Headers, Input, Output, Error, Config, Timeline | Comprehensive debugging context |
| Error tab visibility | Always visible, grayed out when no error | No layout shift; consistent tab bar |
| Reusability | Component usable standalone and embedded | Immediately replaces ExchangeDetail flow view; usable elsewhere |
---
## 0. Backend Prerequisites
### Iteration fields on ProcessorNode
The `ProcessorExecution` model in `cameleer3-common` has iteration tracking fields (`loopIndex`, `loopSize`, `splitIndex`, `splitSize`, `multicastIndex`), but the server's storage layer and API response model do not surface them. The following changes are needed:
**Storage:**
- Add columns to `processor_records` table: `loop_index`, `loop_size`, `split_index`, `split_size`, `multicast_index` (all nullable integers)
- Flyway migration to add columns
- Update `ExecutionStore` to persist and read these fields
**Detail model:**
- Add fields to `ProcessorNode.java`: `loopIndex`, `loopSize`, `splitIndex`, `splitSize`, `multicastIndex`
- Update `DetailService.buildTree()` to populate them from storage
**API:**
- Regenerate `openapi.json` and `schema.d.ts` to include the new fields
### Snapshot endpoint: accept processorId
The current snapshot endpoint `GET /executions/{id}/processors/{index}/snapshot` uses a positional index into the flat processor list. This is fragile because a processor's index can shift when the tree structure changes. Add an alternative endpoint keyed by processor ID:
- `GET /executions/{id}/processors/by-id/{processorId}/snapshot` — fetches snapshot by processor ID
- Add corresponding `useProcessorSnapshotById(executionId, processorId)` hook on the frontend
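A minimal sketch of how the frontend might build the path for the by-id endpoint. The helper name `snapshotByIdPath` is hypothetical and no base-path prefix is assumed; only the route shape is taken from the spec above.

```typescript
// Hypothetical helper (not part of the existing codebase): builds the path
// for the by-id snapshot endpoint described above.
function snapshotByIdPath(executionId: string, processorId: string): string {
  // Encode both segments so IDs containing '/' or spaces cannot break the path.
  const exec = encodeURIComponent(executionId);
  const proc = encodeURIComponent(processorId);
  return `/executions/${exec}/processors/by-id/${proc}/snapshot`;
}
```

A `useProcessorSnapshotById` hook could pass this path to whatever fetch layer the UI already uses.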
### Diagram loading by content hash
`ExecutionDetail` includes `diagramContentHash` linking to the diagram version active during the execution. The existing `useDiagramLayout(contentHash, direction)` hook already supports loading by content hash. The `ExecutionDiagram` wrapper uses this path instead of `useDiagramByRoute(application, routeId)`.
---
## 1. ExecutionDiagram Wrapper Component
### Location
```
ui/src/components/ExecutionDiagram/
├── ExecutionDiagram.tsx # Root: top/bottom split, orchestrates overlay + detail panel
├── ExecutionDiagram.module.css # Layout styles (splitter, exchange bar, panel)
├── useExecutionOverlay.ts # Hook: maps execution data → node overlay state
├── useIterationState.ts # Hook: per-compound iteration tracking
├── ExecutionContext.tsx # React context: shares execution data + iteration state
├── DetailPanel.tsx # Bottom panel: tabs container
├── tabs/InfoTab.tsx # Processor metadata + attributes
├── tabs/HeadersTab.tsx # Input/output headers side-by-side
├── tabs/BodyTab.tsx # Shared: formatted message body (used by Input + Output)
├── tabs/ErrorTab.tsx # Exception details + stack trace
├── tabs/ConfigTab.tsx # Processor configuration (TODO: agent data)
├── tabs/TimelineTab.tsx # Gantt-style processor duration chart
├── types.ts # Overlay-specific types
└── index.ts # Public exports
```
### Props API
```typescript
interface ExecutionDiagramProps {
  /** ID of the execution to overlay; fetched internally unless `executionDetail` is provided */
  executionId: string;
  /** Optional: pre-fetched execution detail (skips internal fetch) */
  executionDetail?: ExecutionDetail;
  /** Diagram direction */
  direction?: 'LR' | 'TB';
  /** Known route IDs for drill-down resolution */
  knownRouteIds?: Set<string>;
  /** Called when user triggers node actions (trace toggle, tap config) */
  onNodeAction?: (nodeId: string, action: NodeAction) => void;
  /** Active node configs (trace/tap badges) */
  nodeConfigs?: Map<string, NodeConfig>;
  className?: string;
}
```
### Behavior
1. Fetches `ExecutionDetail` via `useExecutionDetail(executionId)` (or uses pre-fetched prop)
2. Extracts the `diagramContentHash` from the execution to load the correct diagram version
3. Maps processor execution tree to diagram node IDs (processor IDs match diagram node IDs)
4. Passes overlay data to ProcessDiagram via new overlay props
5. Manages selected node state, detail panel content, and iteration stepping
---
## 2. ProcessDiagram Overlay Props Extension
The existing `ProcessDiagramProps` gains optional overlay props. When absent, the diagram renders in topology-only mode (sub-project 1 behavior). When present, nodes render with execution state.
```typescript
interface ProcessDiagramProps {
  // ... existing props from sub-project 1 ...
  /** Execution overlay: maps diagram node ID → execution state */
  executionOverlay?: Map<string, NodeExecutionState>;
  /** Per-compound iteration state: maps compound node ID → current iteration index */
  iterationState?: Map<string, number>;
  /** Called when user changes iteration on a compound stepper */
  onIterationChange?: (compoundNodeId: string, iterationIndex: number) => void;
}

interface NodeExecutionState {
  status: 'COMPLETED' | 'FAILED';
  durationMs: number;
  /** True if this node's target sub-route failed (for DIRECT/SEDA nodes) */
  subRouteFailed?: boolean;
  /** True if trace data (input/output body) is available */
  hasTraceData?: boolean;
  /** Loop/split iteration info for the compound containing this node */
  iterationIndex?: number;
  iterationCount?: number;
}
```
---
## 3. Node Visual States
### Executed — Completed
- Background: green tint (`#F0F9F1`)
- Border: 1.5px solid `--success` (`#3D7C47`) + 4px green left accent
- Badge: green circle with white checkmark (top-right corner, 16px diameter)
- Duration: green text bottom-right (e.g., "5ms")
### Executed — Failed
- Background: red tint (`#FDF2F0`)
- Border: 2px solid `--error` (`#C0392B`)
- Badge: red circle with white `!` (top-right corner, 16px diameter)
- Duration: red text bottom-right
- Label text turns red, subtitle shows "FAILED"
### Sub-Route Failure (DIRECT/SEDA node whose target route failed)
- Same visual as Failed (red tint, red border, red ! badge)
- Additional: drill-down arrow icon (bottom-left corner)
- "Jump to Error" action on this node auto-drills into the sub-route
### Not Executed (Skipped)
- Opacity: 35%
- No status badge, no duration badge
- Original topology styling (no tint)
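The leaf-node states above can be sketched as a single resolver. This is an illustration only: the `NodeVisual` shape and the function name are assumptions, and it presumes overlay mode is active, so a node missing from the overlay counts as skipped. The colors, border widths, and 35% opacity are the values listed above.

```typescript
// Illustrative types; only the literal style values come from the spec.
interface NodeVisual {
  background: string | null; // null = keep original topology styling
  border: string | null;
  badge: 'check' | 'bang' | null;
  opacity: number;
  showDrillDownArrow: boolean;
}

function resolveNodeVisual(
  state: { status: 'COMPLETED' | 'FAILED'; subRouteFailed?: boolean } | undefined,
): NodeVisual {
  if (!state) {
    // Not executed (skipped): dimmed, no badge, no tint.
    return { background: null, border: null, badge: null, opacity: 0.35, showDrillDownArrow: false };
  }
  if (state.status === 'FAILED' || state.subRouteFailed === true) {
    // Failed and sub-route failure share the red treatment;
    // only the sub-route case adds the drill-down arrow.
    return {
      background: '#FDF2F0',
      border: '2px solid #C0392B',
      badge: 'bang',
      opacity: 1,
      showDrillDownArrow: state.subRouteFailed === true,
    };
  }
  // Completed: green tint (the 4px left accent is left to CSS in this sketch).
  return { background: '#F0F9F1', border: '1.5px solid #3D7C47', badge: 'check', opacity: 1, showDrillDownArrow: false };
}
```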
### Compound Node Status
Compound nodes (CHOICE, LOOP, SPLIT, etc.) derive their status from their children:
- If any child failed → compound shows as COMPLETED (the compound itself executed) but the failed child shows individually
- The compound does not get its own status badge — only leaf processors do
- Compound background tint: subtle green if all children OK, no tint if mixed results
### RUNNING Executions
Live tracking of RUNNING executions is out of scope (see Non-Goals), but the overlay still handles them. If `ExecutionDetail.status` is `RUNNING`, the ExecutionDiagram shows a static overlay for the processors recorded so far: completed processors get the green/red treatment, and processors not yet reached are dimmed. No special "in-progress" visual is needed.
### Edge States
- **Traversed edge:** solid, `--success` green (`#3D7C47`), 1.5px stroke
- **Not traversed edge:** dashed, `#9CA3AF` gray, 1px stroke
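A small sketch of edge styling under one simplifying assumption: an edge counts as traversed when both of its endpoints appear in the overlay. The spec does not define how traversal is detected, so treat this heuristic, the `EdgeStyle` shape, and the function name as hypothetical.

```typescript
interface EdgeStyle {
  stroke: string;
  strokeWidth: number;
  dashed: boolean;
}

// Assumed heuristic: traversed = source and target were both executed.
function resolveEdgeStyle(
  sourceId: string,
  targetId: string,
  executedNodeIds: ReadonlySet<string>,
): EdgeStyle {
  const traversed = executedNodeIds.has(sourceId) && executedNodeIds.has(targetId);
  return traversed
    ? { stroke: '#3D7C47', strokeWidth: 1.5, dashed: false }
    : { stroke: '#9CA3AF', strokeWidth: 1, dashed: true };
}
```

This heuristic can mark an edge as traversed when both nodes ran but were reached via different paths; a real implementation may need explicit path information.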
---
## 4. Per-Compound Iteration Stepper
### Placement
Small control widget embedded in the compound node's header bar (right-aligned). Rendered as part of the `CompoundNode` component when overlay data includes iteration info.
### Visual
Semi-transparent background pill inside the purple/colored header:
```
LOOP [< 3 / 5 >]
```
Prev/next buttons with the current iteration and total count.
### Behavior
- Each compound (LOOP, SPLIT, MULTICAST) tracks its iteration independently via `iterationState` map
- Changing iteration updates the overlay data for all children of that compound
- Nested compounds step independently: an outer loop can sit at iteration 2 while an inner split shows branch 1
- CHOICE compounds: no stepper. The taken branch renders with execution state; untaken branches are dimmed
- Keyboard: left/right arrow keys step when compound is hovered
- Detail panel syncs: selecting a processor inside a loop shows that iteration's snapshot data
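The prev/next stepping could look like the sketch below. It assumes zero-based iteration indices that are displayed one-based, and it clamps at the ends rather than wrapping around; neither choice is stated in the spec, and both helper names are hypothetical.

```typescript
// Hypothetical helper: next index after a prev (-1) or next (+1) click,
// clamped to [0, total - 1].
function stepIteration(current: number, total: number, delta: -1 | 1): number {
  if (total <= 0) return 0;
  return Math.min(total - 1, Math.max(0, current + delta));
}

// Label shown in the stepper pill, e.g. "3 / 5" for index 2 of 5.
function stepperLabel(current: number, total: number): string {
  return `${current + 1} / ${total}`;
}
```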
### Data Flow
The `useIterationState` hook maintains a `Map<compoundNodeId, currentIndex>`. When an iteration changes:
1. The hook recalculates which `ProcessorExecution` children correspond to the selected iteration (using `loopIndex`, `splitIndex`, or `multicastIndex` fields)
2. Rebuilds the `executionOverlay` map for that compound's children
3. ProcessDiagram re-renders with updated overlay
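Step 1 above can be sketched as a filter over a compound's children. The field names `loopIndex`, `splitIndex`, and `multicastIndex` come from the spec; the `IterationChild` shape, the assumption that the index lives on each direct child, and the function names are illustrative.

```typescript
type CompoundKind = 'LOOP' | 'SPLIT' | 'MULTICAST';

// Minimal stand-in for the parts of ProcessorNode this sketch needs.
interface IterationChild {
  processorId: string;
  loopIndex?: number | null;
  splitIndex?: number | null;
  multicastIndex?: number | null;
}

// Pick the index field that matches the compound type.
function iterationIndexOf(child: IterationChild, kind: CompoundKind): number | null {
  switch (kind) {
    case 'LOOP':
      return child.loopIndex ?? null;
    case 'SPLIT':
      return child.splitIndex ?? null;
    case 'MULTICAST':
      return child.multicastIndex ?? null;
  }
}

// Keep only the children recorded for the selected iteration.
function childrenForIteration<T extends IterationChild>(
  children: readonly T[],
  kind: CompoundKind,
  index: number,
): T[] {
  return children.filter((child) => iterationIndexOf(child, kind) === index);
}
```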
---
## 5. Exchange Summary Bar
A thin bar above the diagram showing exchange-level information:
- Exchange ID (monospace, copyable)
- Status badge (COMPLETED green, FAILED red)
- Application / route ID
- Total duration
- "Jump to Error" button (only for FAILED exchanges) — scrolls diagram to failed node, drills into sub-route if needed
---
## 6. Detail Panel
### Layout
Below the diagram, separated by a resizable splitter. Default split: 60% diagram / 40% panel. Minimum panel height: 120px. The panel can be collapsed by dragging the splitter to the bottom.
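The sizing rules above can be sketched as two pure functions. The 40% default and 120px minimum come from the spec; the collapse threshold (half the minimum height) and the function names are assumptions.

```typescript
const MIN_PANEL_PX = 120;
const DEFAULT_PANEL_RATIO = 0.4; // 60% diagram / 40% panel

// Initial panel height for a given container height.
function defaultPanelHeight(containerPx: number): number {
  return Math.max(MIN_PANEL_PX, Math.round(containerPx * DEFAULT_PANEL_RATIO));
}

// Panel height for a splitter dragged to `splitterY` (px from the container top).
// Returns 0 when the panel should collapse.
function panelHeightFromDrag(containerPx: number, splitterY: number): number {
  const raw = containerPx - splitterY;
  // Assumed rule: dragging past half the minimum height collapses the panel.
  if (raw < MIN_PANEL_PX / 2) return 0;
  return Math.min(containerPx, Math.max(MIN_PANEL_PX, raw));
}
```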
The panel has:
1. **Processor header:** selected processor name, status badge, processor ID, duration
2. **Tab bar:** Info | Headers | Input | Output | Error | Config | Timeline
3. **Tab content area:** scrollable
When no processor is selected, the panel shows exchange-level data:
- **Info tab:** exchange metadata (exchangeId, correlationId, route, application, total duration, engine level, route-level attributes)
- **Headers tab:** route-level input/output headers
- **Input tab:** route-level input body
- **Output tab:** route-level output body
- **Error tab:** route-level error (if failed)
- **Config tab:** grayed out (not applicable at exchange level)
- **Timeline tab:** Gantt chart of all processors (always available)
### Tab: Info
Grid layout showing processor metadata:
- Processor ID, Type, Status
- Start time, End time, Duration
- Endpoint URI, Resolved Endpoint URI
- Attributes section: tap-extracted attributes as pill badges
### Tab: Headers
Side-by-side layout:
- Left: Input headers (key/value table)
- Right: Output headers (key/value table)
- New/changed headers highlighted in green
Data source: `useProcessorSnapshotById(executionId, processorId)` → `inputHeaders`, `outputHeaders`
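One way to decide which output headers to highlight is a key-by-key comparison against the input headers. This sketch assumes headers arrive as flat string records, which the spec does not confirm; `classifyOutputHeaders` is a hypothetical name.

```typescript
type HeaderChange = 'added' | 'changed' | 'unchanged';

// Classify each output header relative to the input headers.
function classifyOutputHeaders(
  input: Record<string, string>,
  output: Record<string, string>,
): Map<string, HeaderChange> {
  const result = new Map<string, HeaderChange>();
  for (const [key, value] of Object.entries(output)) {
    if (!Object.prototype.hasOwnProperty.call(input, key)) {
      result.set(key, 'added');
    } else if (input[key] !== value) {
      result.set(key, 'changed');
    } else {
      result.set(key, 'unchanged');
    }
  }
  return result;
}
```

Rows classified as `added` or `changed` would receive the green highlight.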
### Tab: Input
Formatted message body at processor entry:
- Auto-detect format (JSON, XML, plain text)
- Syntax-highlighted code block (dark theme)
- Copy button
- Byte size indicator
Data source: `useProcessorSnapshotById(executionId, processorId)` → `inputBody`
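Format auto-detection could be as simple as the heuristic below: attempt a JSON parse when the body looks like an object or array, fall back to a cheap angle-bracket check for XML, and otherwise treat the body as plain text. The heuristic and the name `detectBodyFormat` are assumptions, not the spec's algorithm.

```typescript
type BodyFormat = 'json' | 'xml' | 'text';

function detectBodyFormat(body: string): BodyFormat {
  const trimmed = body.trim();
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed);
      return 'json';
    } catch (err) {
      // Not valid JSON; fall through to the remaining checks.
    }
  }
  // Cheap XML check: starts and ends with angle brackets.
  if (trimmed.startsWith('<') && trimmed.endsWith('>')) {
    return 'xml';
  }
  return 'text';
}
```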
### Tab: Output
Same layout as Input tab, showing processor exit body.
Data source: `useProcessorSnapshotById(executionId, processorId)` → `outputBody`
### Tab: Error
Shown for all processors but grayed out when the selected processor has no error.
When error exists:
- Exception type (class name)
- Error message
- Root cause type + message
- Stack trace in monospace block
Data source: `ProcessorNode.errorMessage`, `ProcessorNode.errorStackTrace` from the execution detail tree
### Tab: Config
Processor configuration from the route definition. **TODO:** Requires agent-side work to capture and expose processor configuration metadata on `RouteNode`. Initially shows a placeholder indicating config data is not yet available.
### Tab: Timeline
Gantt-style horizontal bar chart showing executed processors' relative durations:
- One row per processor from the `ProcessorNode` execution tree (flattened in execution order) — only executed processors, not all diagram nodes
- Bar width proportional to duration relative to total route duration
- Green bars for completed, red for failed
- Clicking a bar selects that processor in the diagram and scrolls to it
- Duration label on the right of each row
- When inside a loop/split compound, shows the current iteration's processors
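The bar-width rule can be sketched as a percentage of the total route duration. Clamping to the 0..100 range and returning 0 for a zero-length route are defensive assumptions; `barWidthPercent` is a hypothetical name.

```typescript
// Width of a timeline bar as a percentage of the total route duration.
function barWidthPercent(durationMs: number, totalDurationMs: number): number {
  if (totalDurationMs <= 0) return 0;
  const pct = (durationMs / totalDurationMs) * 100;
  return Math.min(100, Math.max(0, pct));
}
```

For example, a 50ms processor in a 200ms route gets a bar 25% wide.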
---
## 7. Data Flow
```
ExecutionDiagram
├── useExecutionDetail(executionId)
│     → ExecutionDetail { processors: ProcessorNode[], diagramContentHash, ... }
├── useExecutionOverlay(executionDetail, iterationState)
│     → Maps ProcessorNode tree → Map<diagramNodeId, NodeExecutionState>
│     → Handles iteration filtering (loopIndex, splitIndex matching)
│     → Detects sub-route failures on DIRECT/SEDA nodes
├── useIterationState()
│     → Map<compoundNodeId, currentIterationIndex>
│     → onIterationChange(compoundId, index) callback
├── ProcessDiagram
│     props: { application, routeId, executionOverlay, iterationState, onIterationChange, ... }
│     Renders nodes with overlay visual states
└── DetailPanel
    ├── useProcessorSnapshotById(executionId, selectedProcessorId)
    │     → { inputBody, outputBody, inputHeaders, outputHeaders }
    └── Tabs render from ProcessorNode + snapshot data
```
### Processor-to-Node Mapping
The `processorId` field on `ProcessorNode` is the same value as the `id` field on diagram `PositionedNode`. The agent uses diagram node IDs as processor IDs during route model extraction, so no separate mapping or `diagramNodeId` field is needed. The `useExecutionOverlay` hook builds its map by walking the `ProcessorNode` tree and keying on `processorId`, which directly matches diagram node IDs.
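A simplified sketch of the tree walk described above. It ignores iteration filtering, so a processor that ran several times inside a loop is overwritten by its last occurrence; the real hook would filter by the selected iteration first. The `children` and `durationMs` field names on the processor type are assumptions.

```typescript
// Minimal stand-ins for the spec's types; the field set is assumed.
interface ProcessorNodeLike {
  processorId: string;
  status: 'COMPLETED' | 'FAILED';
  durationMs: number;
  children?: ProcessorNodeLike[];
}

interface OverlayState {
  status: 'COMPLETED' | 'FAILED';
  durationMs: number;
}

// Walk the execution tree and key each entry by processorId,
// which doubles as the diagram node ID.
function buildOverlay(roots: readonly ProcessorNodeLike[]): Map<string, OverlayState> {
  const overlay = new Map<string, OverlayState>();
  const visit = (node: ProcessorNodeLike): void => {
    overlay.set(node.processorId, { status: node.status, durationMs: node.durationMs });
    for (const child of node.children ?? []) {
      visit(child);
    }
  };
  roots.forEach(visit);
  return overlay;
}
```

Any diagram node whose ID is missing from the returned map is rendered as skipped.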
### Snapshot Loading
Per-processor body/header data is fetched lazily via `useProcessorSnapshotById(executionId, processorId)` when a processor is selected and the user switches to Input/Output/Headers tabs. This avoids loading all snapshot data upfront for routes with many processors. The snapshot endpoint accepts `processorId` (see Backend Prerequisites, Section 0).
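The lazy-loading condition can be expressed as a predicate that a data-fetching hook could use as its "enabled" flag. The tab names come from the tab bar in Section 6; the `DetailTab` type and `shouldFetchSnapshot` are hypothetical.

```typescript
type DetailTab = 'Info' | 'Headers' | 'Input' | 'Output' | 'Error' | 'Config' | 'Timeline';

// Tabs whose content comes from the per-processor snapshot.
const SNAPSHOT_TABS: ReadonlySet<DetailTab> = new Set<DetailTab>(['Headers', 'Input', 'Output']);

// Fetch only when a processor is selected and a snapshot-backed tab is active.
function shouldFetchSnapshot(selectedProcessorId: string | null, activeTab: DetailTab): boolean {
  return selectedProcessorId !== null && SNAPSHOT_TABS.has(activeTab);
}
```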
---
## 8. Jump to Error
When the user clicks "Jump to Error":
1. Find the first `ProcessorNode` with `status === 'FAILED'` in the execution tree
2. If the failed processor is a DIRECT/SEDA node with `subRouteFailed: true`:
   a. Drill down into the target route (same as double-click drill-down from sub-project 1)
   b. Recursively find the failed processor in the sub-route's execution
3. Select the failed processor node
4. Pan/zoom the diagram to center the failed node
5. Show the Error tab in the detail panel
This handles arbitrarily deep cross-route error chains: for example, route A calls `direct:B`, which calls `direct:C`, where the actual failure occurs.
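The lookup in steps 1 and 2 can be sketched as a recursive search. Several things here are assumptions: the `ExecNode` shape (which merges `subRouteFailed` into the processor node for brevity), the pre-order traversal, and the `subRouteOf` callback that returns a calling node's sub-route execution tree. The spec does not say how sub-route executions are resolved.

```typescript
interface ExecNode {
  processorId: string;
  status: 'COMPLETED' | 'FAILED';
  subRouteFailed?: boolean;
  children?: ExecNode[];
}

// Pre-order search for the first FAILED node.
function findFirstFailed(nodes: readonly ExecNode[]): ExecNode | undefined {
  for (const node of nodes) {
    if (node.status === 'FAILED') return node;
    const nested = findFirstFailed(node.children ?? []);
    if (nested) return nested;
  }
  return undefined;
}

// Follows sub-route failures until the root-cause processor is reached.
// `drillPath` lists the calling nodes to drill through, outermost first.
// Assumes the route call chain is acyclic.
function locateError(
  nodes: readonly ExecNode[],
  subRouteOf: (processorId: string) => ExecNode[] | undefined,
  drillPath: string[] = [],
): { drillPath: string[]; processorId: string } | undefined {
  const failed = findFirstFailed(nodes);
  if (!failed) return undefined;
  if (failed.subRouteFailed === true) {
    const sub = subRouteOf(failed.processorId);
    if (sub) {
      const deeper = locateError(sub, subRouteOf, [...drillPath, failed.processorId]);
      if (deeper) return deeper;
    }
  }
  // No deeper failure found: this node is the one to select.
  return { drillPath, processorId: failed.processorId };
}
```

The caller would then drill through each entry of `drillPath`, select `processorId`, center it in the viewport, and open the Error tab.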
---
## 9. Integration with ExchangeDetail Page
The `ExecutionDiagram` component replaces the existing "Flow" view tab on the `ExchangeDetail` page. The page passes `executionId` and the component handles everything internally.
```typescript
// In ExchangeDetail page
<ExecutionDiagram
  executionId={executionId}
  knownRouteIds={knownRouteIds}
  onNodeAction={handleNodeAction}
  nodeConfigs={nodeConfigs}
/>
```
The existing Gantt timeline view on ExchangeDetail can be removed or kept as an alternative view — the Timeline tab inside the detail panel provides the same functionality.
---
## Non-Goals (Sub-project 3)
- Replacing RouteFlow on the Dashboard or RouteDetail pages
- Aggregate execution heatmaps (showing hot processors across many exchanges)
- Live execution tracking (watching a RUNNING exchange in real-time)
- Diff between two executions
- Export/share execution view
---
## Verification
1. `npx tsc -p tsconfig.app.json --noEmit` passes
2. ExecutionDiagram renders on ExchangeDetail page for a known failed exchange
3. Completed nodes show green tint + checkmark + duration badge
4. Failed nodes show red tint + ! badge + red duration
5. Skipped nodes are dimmed to 35% opacity
6. Edges between executed nodes turn green; edges to skipped nodes are dashed gray
7. Loop/split compounds show iteration stepper; stepping updates child overlay
8. CHOICE compounds highlight taken branch, dim untaken branches
9. Nested loops step independently
10. Clicking a node shows its data in the detail panel
11. Detail panel tabs: Info shows metadata + attributes, Headers shows side-by-side, Input/Output show formatted body, Error shows exception + stack trace, Timeline shows Gantt chart
12. "Jump to Error" navigates to and selects the failed processor, drilling into sub-routes if needed
13. Error tab grayed out for non-failed processors
14. Config tab shows placeholder (TODO)
15. Resizable splitter between diagram and detail panel works