refactor: rename group/groupName to application/applicationName

The execution-related "group" concept actually represents the application name. Rename all Java fields, API parameters, and frontend types from groupName→applicationName and group→application for clarity.

- Java records: ExecutionSummary, ExecutionDetail, ExecutionDocument, ExecutionRecord, ProcessorRecord
- API params: SearchRequest.group→application, SearchController @RequestParam group→application
- Services: IngestionService, DetailService, SearchIndexer, StatsStore
- Frontend: schema.d.ts, Dashboard, ExchangeDetail, RouteDetail, executions query hooks

Database column names (group_name) and OpenSearch field names are unchanged; only the API-facing Java/TS field names are renamed.

RBAC group references (groups table, GroupRepository, GroupsTab) are a separate domain concept and are NOT affected by this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 21:21:38 +01:00
parent 3c226de62f
commit 8ad0016a8e
54 changed files with 21442 additions and 73 deletions
--- a/docs/ui-mocks/camel-developer-review.md
+++ b/docs/ui-mocks/camel-developer-review.md
@@ -0,0 +1,261 @@
+# Cameleer3 Dashboard Review -- Senior Camel Developer Perspective
+
+**Reviewer**: Senior Apache Camel Developer (10+ years, Java DSL / Spring Boot)
+**Artifact reviewed**: `mock-v2-light.html` -- Operations Dashboard (v2 synthesis)
+**Date**: 2026-03-17
+
+---
+
+## 1. What the Dashboard Gets RIGHT
+
+### Business ID as First-Class Citizen
+The Order ID and Customer columns in the execution table are exactly what I need. When support calls me about "order OP-88421", I can paste that into the search and find the execution immediately. Every other monitoring tool I have used forces me to map business IDs to correlation IDs manually. This alone would save me 10-15 minutes per incident.
+
+### Inline Error Previews
+Showing the exception message directly in the table row without requiring a click-through is genuinely useful. The two error examples in the mock (`HttpOperationFailedException` with a 504, `SQLTransientConnectionException` with HikariPool exhaustion) are realistic Camel exceptions. I can scan the error list and immediately tell whether it is a downstream timeout or a connection pool issue. That distinction determines whether I investigate our code or page the DBA.
+
+### Processor Timeline (Gantt View)
+The processor timeline in the detail panel is the single most valuable feature. Seeing that `to(payment-api)` consumed 280ms out of a 412ms total execution, while `enrich(inventory)` took 85ms, immediately tells me WHERE the bottleneck is. In my experience, 95% of Camel performance issues are in external calls, and this view pinpoints them. The color coding (green/yellow/red) for processor bars makes the slow step obvious at a glance.
+
+### SLA Awareness Baked In
+The SLA threshold line on the latency chart, the "SLA" tag on slow durations, and the "CLOSE" warning on the p99 card are exactly the kind of proactive indicators I want. Most monitoring tools show me raw numbers; this dashboard shows me numbers in context. I know immediately that 287ms p99 is dangerously close to our 300ms SLA.
+
+### Shift-Aware Time Context
+The "since 06:00" shift concept is something I have never seen in a developer tool but actually matches how production support works. When I start my day shift, I want to see what happened overnight and what is happening now, not a rolling 24-hour window that mixes yesterday afternoon with this morning.
+
+### Agent Health in Sidebar
+Seeing agent status (live/stale/dead), throughput per agent, and error rates at a glance in the sidebar is practical. When an agent goes stale, I know to check if a pod restarted or if there is a network partition.
+
+### Application-to-Route Navigation Hierarchy
+The sidebar tree (Applications > order-service > Routes > order-intake, order-enrichment, etc.) matches how I think about Camel deployments. I have multiple applications, each with multiple routes. Being able to filter by application first, then drill into routes, is the right hierarchy.
+
+---
+
+## 2. What is MISSING or Could Be Better
+
+### 2.1 Exchange Body/Header Inspection -- CRITICAL GAP
+
+**Pain point**: The "Exchange" tab exists in the detail panel tabs but its content is not shown. This is the single most important debugging feature for a Camel developer. When a message fails at step 5 of 7, I need to see:
+- What was the original inbound message (before any transformation)?
+- What did the exchange body look like at each processor step?
+- Which headers were present at each step, and which were added/removed?
+- What was the exception body (often different from the exception message)?
+
+**How to address it**: The Exchange tab should show a step-by-step diff view of the exchange. For each processor in the route, show the body (with a JSON/XML pretty-printer) and the headers map. Highlight headers that were added at that step. Allow comparing any two steps side-by-side. Show the original inbound message prominently at the top.
+
+**Priority**: **Must-Have**. Without this, the dashboard is an operations monitor, not a debugging tool. This is the difference between "I can see something failed" and "I can see WHY it failed."
+
+### 2.2 Route Diagram / Visual Graph -- MENTIONED BUT NOT SHOWN
+
+**Pain point**: The "View Route Diagram" button exists in the detail actions, but there is no mockup of what the route diagram looks like. As a Camel developer, I need to see the DAG (directed acyclic graph) of my route: `from(jms:orders) -> unmarshal -> validate -> choice -> [branch A: enrich -> transform -> to(http)] [branch B: log -> to(dlq)]`. I also need to see an execution overlay on the diagram -- which path did THIS specific exchange take, and how long did each node take.
+
+**How to address it**: Add a Route Diagram page/view that shows:
+- The route definition as an interactive DAG (nodes = processors, edges = flow)
+- Execution overlay: color-code each node by success/failure for a specific execution
+- Aggregate overlay: color-code each node by throughput/error rate over a time window
+- Highlight the path taken by the selected exchange (dim the branches not taken)
+- Show inter-route connections (e.g., `direct:`, `seda:`, `vm:` endpoints linking routes)
+
+**Priority**: **Must-Have**. Cameleer already has `RouteGraph` data from agents -- this is the tool's differentiating feature.
+
+### 2.3 Cross-Route Correlation / Message Tracing
+
+**Pain point**: A single business transaction (e.g., an order) often spans multiple routes: `order-intake` -> `order-enrichment` -> `payment-process` -> `shipment-dispatch`. The dashboard shows each route execution as a separate row. There is no way to see the full journey of order OP-88421 across all routes.
+
+**How to address it**: Add a "Transaction Trace" or "Message Flow" view that:
+- Groups all executions sharing a breadcrumbId or correlation ID
+- Shows them as a horizontal timeline or waterfall chart
+- Highlights which route in the chain failed
+- Works across `direct:`, `seda:`, and `vm:` endpoints that link routes
+
+The search bar says "Search by Order ID, correlation ID" which is a good start, but the results should show the correlated group, not just individual rows.
+
+**Priority**: **Must-Have**. Splitter/aggregator patterns and multi-route flows are the norm, not the exception, in real Camel applications.
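
The grouping asked for in 2.3 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Execution` record, its fields, and the `TransactionTrace` class are hypothetical and do not reflect Cameleer's actual types. It groups executions by correlation ID, orders each group by start time, and reports the first route in the chain that failed.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class TransactionTrace {
    // Hypothetical, simplified execution summary (not Cameleer's actual record).
    record Execution(String correlationId, String routeId, Instant startTime,
                     long durationMs, boolean failed) {}

    // Groups executions by correlation ID; each group is ordered by start time.
    static Map<String, List<Execution>> groupByCorrelation(List<Execution> executions) {
        return executions.stream()
                .sorted(Comparator.comparing(Execution::startTime))
                .collect(Collectors.groupingBy(Execution::correlationId));
    }

    // Returns the earliest failed route in a time-ordered trace, if any.
    static Optional<String> firstFailedRoute(List<Execution> trace) {
        return trace.stream()
                .filter(Execution::failed)
                .map(Execution::routeId)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Execution> all = List.of(
                new Execution("c-1", "payment-process", Instant.parse("2026-03-17T06:00:02Z"), 412, true),
                new Execution("c-1", "order-intake", Instant.parse("2026-03-17T06:00:00Z"), 35, false));
        List<Execution> trace = groupByCorrelation(all).get("c-1");
        System.out.println(firstFailedRoute(trace).orElse("no failure"));
    }
}
```

A waterfall view could then render each group as one row per route, using `startTime` and `durationMs` for bar position and width.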
+
+### 2.4 Dead Letter Queue Monitoring
+
+**Pain point**: When messages fail and are routed to a dead letter channel (which is the standard Camel error handling pattern), I need to know: how many messages are in the DLQ, what are they, how long have they been there, and can I retry them?
+
+**How to address it**: Add a DLQ section or page showing:
+- Count of messages per dead letter endpoint
+- Age distribution (how many are from today vs. last week)
+- Message preview (body + headers + the exception that caused routing to DLQ)
+- Retry action (re-submit the message to the original route)
+- Purge action (acknowledge and discard)
+
+**Priority**: **Must-Have**. DLQ management is a daily production task.
+
+### 2.5 Per-Processor Statistics (Aggregate View)
+
+**Pain point**: The processor timeline in the detail panel shows per-processor timing for a single execution. But I also need aggregate statistics: for processor `to(payment-api)`, what is the p50/p95/p99 latency over the last hour? How many times did it fail? Is it getting slower over time?
+
+**How to address it**: Clicking a processor name in the timeline should show aggregate stats for that processor. Alternatively, the Route Detail page should have a "Processors" tab with a table of all processors in the route, their call count, success rate, and latency percentiles.
+
+**Priority**: **Must-Have**. Identifying a chronically slow processor is different from identifying a one-off slow execution.
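
To make the aggregate statistics in 2.5 concrete, here is a minimal sketch of a nearest-rank percentile over the recorded durations of one processor. The `ProcessorLatency` class is a hypothetical name, and a production system would more likely use a streaming estimator or the storage backend's own percentile aggregation than sort raw samples in memory.

```java
import java.util.Arrays;

final class ProcessorLatency {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] durationsMs, double p) {
        if (durationsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = durationsMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Illustrative durations (ms) for a single processor.
        long[] samples = {10, 20, 30, 40, 280};
        System.out.println("p50=" + percentile(samples, 50)
                + " p99=" + percentile(samples, 99));
    }
}
```

With only a handful of samples the p95 and p99 both collapse to the maximum, which is one reason to show the call count next to the percentiles in the processor table.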
+
+### 2.6 Error Pattern Grouping / Top Errors
+
+**Pain point**: The dashboard shows individual error rows. When there are 38 errors, I do not want to scroll through all 38. I want to see: "23 of the 38 errors are `HttpOperationFailedException` on `payment-process`, 10 are `SQLTransientConnectionException` on `order-enrichment`, 5 are `ValidationException` on `order-intake`." The design notes mention "Top error pattern grouping panel" from the operator expert, but it is not in the final mock.
+
+**How to address it**: Add an error summary panel above or alongside the execution table showing errors grouped by exception class + route. Each group should show count, first/last occurrence, and whether the count is trending up.
+
+**Priority**: **Must-Have**. Pattern recognition is more important than individual error viewing.
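
The error summary described in 2.6 amounts to a group-by on exception class plus route. The following sketch assumes a hypothetical `FailedExecution` record and `ErrorPatterns` class (neither is taken from the Cameleer codebase) and computes count plus first and last occurrence per group, sorted by count.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ErrorPatterns {
    // Hypothetical, simplified view of a failed execution.
    record FailedExecution(String routeId, String exceptionClass, Instant timestamp) {}

    // Grouping key: exception class + route.
    record PatternKey(String exceptionClass, String routeId) {}

    record Pattern(PatternKey key, long count, Instant first, Instant last) {}

    // Groups failures by (exception class, route), most frequent pattern first.
    static List<Pattern> topPatterns(List<FailedExecution> failures) {
        Map<PatternKey, List<FailedExecution>> groups = failures.stream()
                .collect(Collectors.groupingBy(
                        f -> new PatternKey(f.exceptionClass(), f.routeId())));
        return groups.entrySet().stream()
                .map(e -> {
                    List<FailedExecution> group = e.getValue();
                    Instant first = group.stream().map(FailedExecution::timestamp)
                            .min(Comparator.naturalOrder()).orElseThrow();
                    Instant last = group.stream().map(FailedExecution::timestamp)
                            .max(Comparator.naturalOrder()).orElseThrow();
                    return new Pattern(e.getKey(), group.size(), first, last);
                })
                .sorted(Comparator.comparingLong(Pattern::count).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<FailedExecution> failures = List.of(
                new FailedExecution("payment-process", "HttpOperationFailedException", Instant.parse("2026-03-17T06:10:00Z")),
                new FailedExecution("payment-process", "HttpOperationFailedException", Instant.parse("2026-03-17T06:12:00Z")),
                new FailedExecution("order-enrichment", "SQLTransientConnectionException", Instant.parse("2026-03-17T06:11:00Z")));
        topPatterns(failures).forEach(System.out::println);
    }
}
```

Trend detection (is the count rising?) would need a second pass that buckets each group by time window; that is left out here.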
+
+### 2.7 Route Status Management
+
+**Pain point**: I need to know which routes are started, stopped, or suspended. And I need the ability to stop/start/suspend individual routes without redeploying. This is routine in production -- temporarily suspending a route that is flooding a downstream system.
+
+**How to address it**: The sidebar route list should show route status (started/stopped/suspended) with icons. Right-click or action menu on a route should offer start/stop/suspend. This maps directly to Camel's route controller API.
+
+**Priority**: **Nice-to-Have** for v1, **Must-Have** for v2. Operators will ask for this quickly.
+
+### 2.8 Route Version Comparison
+
+**Pain point**: After a deployment, I want to compare the current route definition with the previous version. Did someone add a processor? Change an endpoint URI? Route definition drift is a real source of production issues.
+
+**How to address it**: Store route graph snapshots per deployment/version. Show a diff view highlighting added/removed/modified processors.
+
+**Priority**: **Nice-to-Have**. Valuable but less urgent than the above.
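
One way to picture the version diff in 2.8: treat each route snapshot as a map from processor ID to its definition (for example an endpoint URI) and compare the two maps. The `RouteGraphDiff` class and the map-based snapshot format are assumptions made for illustration only; the real `RouteGraph` structure may look quite different.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RouteGraphDiff {
    record Diff(Set<String> added, Set<String> removed, Set<String> modified) {}

    // Compares two snapshots keyed by processor ID; values are the processor
    // definitions (hypothetical format, e.g. an endpoint URI).
    static Diff diff(Map<String, String> previous, Map<String, String> current) {
        Set<String> added = new TreeSet<>(current.keySet());
        added.removeAll(previous.keySet());

        Set<String> removed = new TreeSet<>(previous.keySet());
        removed.removeAll(current.keySet());

        Set<String> modified = new TreeSet<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String old = previous.get(entry.getKey());
            if (old != null && !old.equals(entry.getValue())) {
                modified.add(entry.getKey());
            }
        }
        return new Diff(added, removed, modified);
    }

    public static void main(String[] args) {
        Map<String, String> v1 = Map.of("to1", "http://payment-api/v1", "log1", "log:orders");
        Map<String, String> v2 = Map.of("to1", "http://payment-api/v2", "enrich1", "direct:inventory");
        System.out.println(diff(v1, v2));
    }
}
```

Matching by processor ID only works if IDs are stable across deployments; auto-generated IDs would make every processor look added or removed.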
+
+### 2.9 Thread Pool / Resource Monitoring
+
+**Pain point**: Camel's default thread pool max is 20. When all threads are consumed, messages queue up silently. The HikariPool error in the mock shows the same failure mode in a JDBC connection pool: exhaustion that stays invisible until requests start failing. I need visibility into thread pool utilization, connection pool utilization, and inflight exchange count.
+
+**How to address it**: Add a "Resources" section (either in the agent detail or a separate page) showing:
+- Camel thread pool utilization (active/max)
+- Connection pool utilization (from endpoint components)
+- Inflight exchange count per route
+- Consumer prefetch/backlog (for JMS/Kafka consumers)
+
+**Priority**: **Nice-to-Have** initially, but becomes **Must-Have** when debugging pool exhaustion issues.
+
+### 2.10 Saved Searches / Alert Rules
+
+**Pain point**: I find myself searching for the same patterns repeatedly: "errors on payment-process in the last hour", "executions over 500ms for order-enrichment". There is no way to save these as bookmarks or convert them into alert rules.
+
+**How to address it**: Allow saving filter configurations as named views. Allow converting a saved search into an alerting rule (email/webhook when count exceeds threshold).
+
+**Priority**: **Nice-to-Have**.
+
+---
+
+## 3. Specific Page/Feature Recommendations
+
+### 3.1 Route Detail Page
+
+When I click a route name (e.g., `order-intake`) from the sidebar, I should see:
+
+- **Header**: Route name, status (started/stopped), uptime, route definition source (Java DSL / XML / YAML)
+- **KPI Strip**: Total executions, success rate, p50/p99 latency, inflight count, throughput -- all for this route only
+- **Processor Table**: Every processor in the route with columns: name, type, call count, success rate, p50 latency, p99 latency, total time %. Sortable by any column. This is where I find the bottleneck processor.
+- **Route Diagram**: Interactive DAG with execution overlay. Nodes sized by throughput, colored by error rate. Clicking a node filters the execution list to that processor.
+- **Recent Executions**: Filtered version of the main table, showing only this route's executions.
+- **Error Patterns**: Top errors for this route, grouped by exception class.
+
+### 3.2 Exchange / Message Inspector
+
+When I click "Exchange" tab in the detail panel:
+
+- **Inbound Message**: The original message as received by the route's consumer. Body + headers. Shown prominently, always visible.
+- **Step-by-Step Trace**: For each processor, show the exchange state AFTER that processor ran. Diff mode should highlight what changed (body mutations, added headers, removed headers).
+- **Properties**: Camel exchange properties (not just headers). Properties often carry routing decisions.
+- **Exception**: If the exchange failed, show the caught exception, the handled flag, and whether it was routed to a dead letter channel.
+- **Response**: If the route produces a response (e.g., REST endpoint), show the outbound body.
+
+Display format should auto-detect JSON/XML and pretty-print. Binary payloads should show hex dump with size.
+
+### 3.3 Metrics Dashboard (Developer vs. Operator KPIs)
+
+The current metrics (throughput, latency p99, error rate) are operator KPIs. A Camel developer also needs:
+
+**Developer KPIs** (add a "Developer" metrics view):
+- Per-processor latency breakdown (stacked bar: which processors consume the most time)
+- External endpoint response time (HTTP, DB, JMS) -- separate from Camel processing time
+- Type converter cache hit rate (rarely needed, but valuable when debugging serialization issues)
+- Redelivery count (how many messages required retries before succeeding)
+- Content-based router distribution (for `choice()` routes: how many messages went down each branch)
+
+**Operator KPIs** (already well-covered):
+- Throughput, error rate, latency percentiles -- these are solid as-is
+
+### 3.4 Dead Letter Queue View
+
+A dedicated DLQ page:
+
+- **Summary Cards**: One card per DLQ endpoint (e.g., `jms:DLQ.orders`, `seda:error-handler`), showing message count, oldest message age, newest message timestamp.
+- **Message List**: Table with columns: original route, exception class, business ID, timestamp, retry count.
+- **Message Detail**: Click a DLQ message to see the exchange snapshot (body + headers + exception) at the time of failure.
+- **Actions**: Retry (re-submit to original endpoint), Retry All (bulk retry for a pattern), Discard, Move to another queue.
+- **Filters**: By exception type, by route, by age.
+
+### 3.5 Route Comparison
+
+Two use cases:
+
+1. **Version diff**: Compare route graph v3.2.0 vs. v3.2.1. Show added/removed/modified processors as a visual diff on the DAG.
+2. **Performance comparison**: Compare this week's latency distribution for `payment-process` with last week's. Overlay histograms. Useful for validating that a deployment improved (or degraded) performance.
+
+---
+
+## 4. Information Architecture Critique
+
+### What Works
+- **Sidebar hierarchy** (Applications > Routes) is correct and matches how Camel projects are structured.
+- **Health strip at top** provides instant situational awareness without scrolling.
+- **Master-detail pattern** (table + slide-in panel) avoids page navigation for quick inspection. This keeps context.
+- **Keyboard shortcuts** (Ctrl+K search, arrow navigation, Esc to close) are the right accelerators for power users.
+
+### What Needs Adjustment
+
+**The sidebar is too flat.** It shows applications and routes in the same list, but there is no way to navigate to:
+- A dedicated Route Detail page (with per-processor stats, diagram, error patterns)
+- An Agent Detail page (with resource utilization, version info, configuration)
+- A DLQ page
+- A Search/Trace page (for cross-route correlation)
+
+Recommendation: Add top-level navigation items to the sidebar:
+```
+Dashboard  (the current view)
+Routes     (route list with status, drill into route detail)
+Traces     (cross-route message flow / correlation)
+Errors     (grouped error patterns, DLQ)
+Agents     (agent health, resource utilization)
+Diagrams   (route graph visualization)
+```
+
+**Route click should go deeper.** Currently, clicking a route in the sidebar filters the execution table. This is useful, but clicking the route NAME in a table row or in the detail panel should navigate to a dedicated Route Detail page with per-processor aggregate stats and the route diagram.
+
+**Search results need grouping.** The Ctrl+K search bar says "Search by Order ID, route, error..." but search results should group by correlation ID when searching by business ID. If I search for "OP-88421", I want to see ALL executions related to that order across all routes, not just the one row in `payment-process`.
+
+**1-click access priorities:**
+- Health overview: 1 click (current: 0 clicks -- it is the home page -- good)
+- Filter by errors only: 1 click (current: 1 click on Error pill -- good)
+- View a specific execution's processor timeline: 2 clicks (current: 1 click on row -- good)
+- View exchange body/headers: should be 2 clicks (click row, click Exchange tab). Currently not implemented.
+- View route diagram: should be 2 clicks (click route name, see diagram). Currently requires finding the button in the detail panel.
+- Cross-route trace: should be 2 clicks (click correlation ID or business ID, see trace). Currently not possible.
+- DLQ status: should be 1 click from sidebar. Currently not available.
+
+---
+
+## 5. Score Card
+
+| Dimension                   | Score (1-10) | Notes |
+|-----------------------------|:---:|-------|
+| Transaction tracking        | 4   | Individual executions visible, but no cross-route transaction view. Correlation ID shown but not actionable. |
+| Root cause analysis         | 6   | Processor timeline identifies the slow/failing step. Error messages shown inline. But no exchange body inspection, no stack trace expansion, no header diff. |
+| Performance monitoring      | 7   | Throughput, latency p99, error rate charts with SLA lines are solid. Missing per-processor aggregate stats and resource utilization. |
+| Route visualization         | 3   | Route names in sidebar, but no actual route diagram/DAG. The "View Route Diagram" button exists with no destination. This is Cameleer's key differentiator -- it must ship. |
+| Exchange/message visibility | 2   | Exchange tab exists but has no content. No body inspection, no header view, no step-by-step diff. This is the most critical gap. |
+| Correlation/tracing         | 3   | Correlation ID displayed in detail panel, but no way to trace a message across routes. No breadcrumb linking. No transaction waterfall. |
+| Overall daily usefulness    | 5   | As an operations monitor (is anything broken right now?), it scores 7-8. As a developer debugging tool (why is it broken and how do I fix it?), it scores 3-4. The gap is in the debugging/inspection features. |
+
+### Summary Verdict
+
+The dashboard is a **strong operations monitor** -- it answers "what is happening right now?" effectively. The health strip, SLA awareness, shift context, business ID columns, and inline error previews are genuinely useful and better than most tools I have used.
+
+However, it is a **weak debugging tool** -- it does not yet answer "why did this specific message fail?" or "what did the exchange look like at each step?" The Exchange tab, route diagram, cross-route tracing, and error pattern grouping are the features that would make this a daily-driver tool rather than a pretty overview I glance at in the morning.
+
+The processor Gantt chart in the detail panel is the single best feature in the entire dashboard. Build on that. Make it clickable (click a processor to see the exchange state at that point). Add aggregate stats. Link it to the route diagram. That is where this tool becomes indispensable.
+
+**Bottom line**: Ship the exchange inspector, the route diagram, and cross-route tracing, and this goes from a 5/10 to an 8/10 daily-use tool.