refactor: rename group/groupName to application/applicationName

The execution-related "group" concept actually represents the application name. Rename all Java fields, API parameters, and frontend types from groupName→applicationName and group→application for clarity.

- Java records: ExecutionSummary, ExecutionDetail, ExecutionDocument, ExecutionRecord, ProcessorRecord
- API params: SearchRequest.group→application, SearchController @RequestParam group→application
- Services: IngestionService, DetailService, SearchIndexer, StatsStore
- Frontend: schema.d.ts, Dashboard, ExchangeDetail, RouteDetail, executions query hooks

Database column names (group_name) and OpenSearch field names are unchanged; only the API-facing Java/TS field names are renamed.

RBAC group references (groups table, GroupRepository, GroupsTab) are a separate domain concept and are NOT affected by this change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs/ui-mocks/camel-developer-review.md
# Cameleer3 Dashboard Review -- Senior Camel Developer Perspective

**Reviewer**: Senior Apache Camel Developer (10+ years, Java DSL / Spring Boot)
**Artifact reviewed**: `mock-v2-light.html` -- Operations Dashboard (v2 synthesis)
**Date**: 2026-03-17

---

## 1. What the Dashboard Gets RIGHT

### Business ID as First-Class Citizen
The Order ID and Customer columns in the execution table are exactly what I need. When support calls me about "order OP-88421", I can paste that into the search and find the execution immediately. Every other monitoring tool I have used forces me to map business IDs to correlation IDs manually. This alone would save me 10-15 minutes per incident.

### Inline Error Previews
Showing the exception message directly in the table row without requiring a click-through is genuinely useful. The two error examples in the mock (`HttpOperationFailedException` with a 504, `SQLTransientConnectionException` with HikariPool exhaustion) are realistic Camel exceptions. I can scan the error list and immediately tell whether it is a downstream timeout or a connection pool issue. That distinction determines whether I investigate our code or page the DBA.

### Processor Timeline (Gantt View)
The processor timeline in the detail panel is the single most valuable feature. Seeing that `to(payment-api)` consumed 280ms out of a 412ms total execution, while `enrich(inventory)` took 85ms, immediately tells me WHERE the bottleneck is. In my experience, 95% of Camel performance issues are in external calls, and this view pinpoints them. The color coding (green/yellow/red) for processor bars makes the slow step obvious at a glance.

### SLA Awareness Baked In
The SLA threshold line on the latency chart, the "SLA" tag on slow durations, and the "CLOSE" warning on the p99 card are exactly the kind of proactive indicators I want. Most monitoring tools show me raw numbers; this dashboard shows me numbers in context. I know immediately that 287ms p99 is dangerously close to our 300ms SLA.

### Shift-Aware Time Context
The "since 06:00" shift concept is something I have never seen in a developer tool but actually matches how production support works. When I start my day shift, I want to see what happened overnight and what is happening now, not a rolling 24-hour window that mixes yesterday afternoon with this morning.

### Agent Health in Sidebar
Seeing agent status (live/stale/dead), throughput per agent, and error rates at a glance in the sidebar is practical. When an agent goes stale, I know to check if a pod restarted or if there is a network partition.

### Application-to-Route Navigation Hierarchy
The sidebar tree (Applications > order-service > Routes > order-intake, order-enrichment, etc.) matches how I think about Camel deployments. I have multiple applications, each with multiple routes. Being able to filter by application first, then drill into routes, is the right hierarchy.

---

## 2. What is MISSING or Could Be Better

### 2.1 Exchange Body/Header Inspection -- CRITICAL GAP

**Pain point**: The "Exchange" tab exists in the detail panel tabs but its content is not shown. This is the single most important debugging feature for a Camel developer. When a message fails at step 5 of 7, I need to see:
- What was the original inbound message (before any transformation)?
- What did the exchange body look like at each processor step?
- Which headers were present at each step, and which were added/removed?
- What was the exception body (often different from the exception message)?

**How to address it**: The Exchange tab should show a step-by-step diff view of the exchange. For each processor in the route, show the body (with a JSON/XML pretty-printer) and the headers map. Highlight headers that were added at that step. Allow comparing any two steps side-by-side. Show the original inbound message prominently at the top.

**Priority**: **Must-Have**. Without this, the dashboard is an operations monitor, not a debugging tool. This is the difference between "I can see something failed" and "I can see WHY it failed."
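
A minimal sketch of the per-step header diff, assuming the agent captures each step's headers as a `Map<String, ?>`. The `HeaderDiff` class and its snapshot shape are hypothetical and do not come from an existing Cameleer or Camel API. The sketch compares header names exactly. Camel's default header map is case-insensitive, so a real implementation may want to normalize names first.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical helper: compares the headers captured after two processor steps. */
final class HeaderDiff {

    /** A header whose value differs between the two steps. */
    record Change(Object before, Object after) {}

    /** Added, removed and changed headers; TreeMap keeps names sorted for display. */
    record Result(Map<String, Object> added,
                  Map<String, Object> removed,
                  Map<String, Change> changed) {}

    static Result diff(Map<String, ?> before, Map<String, ?> after) {
        Map<String, Object> added = new TreeMap<>();
        Map<String, Object> removed = new TreeMap<>();
        Map<String, Change> changed = new TreeMap<>();

        for (Map.Entry<String, ?> e : after.entrySet()) {
            String name = e.getKey();
            if (!before.containsKey(name)) {
                added.put(name, e.getValue());                   // set at this step
            } else if (!Objects.equals(before.get(name), e.getValue())) {
                changed.put(name, new Change(before.get(name), e.getValue()));
            }
        }
        for (Map.Entry<String, ?> e : before.entrySet()) {
            if (!after.containsKey(e.getKey())) {
                removed.put(e.getKey(), e.getValue());           // dropped at this step
            }
        }
        return new Result(added, removed, changed);
    }
}
```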

### 2.2 Route Diagram / Visual Graph -- MENTIONED BUT NOT SHOWN

**Pain point**: The "View Route Diagram" button exists in the detail actions, but there is no mockup of what the route diagram looks like. As a Camel developer, I need to see the DAG (directed acyclic graph) of my route: from(jms:orders) -> unmarshal -> validate -> choice -> [branch A: enrich -> transform -> to(http)] [branch B: log -> to(dlq)]. I also need to see execution overlay on the diagram -- which path did THIS specific exchange take, and how long did each node take.

**How to address it**: Add a Route Diagram page/view that shows:
- The route definition as an interactive DAG (nodes = processors, edges = flow)
- Execution overlay: color-code each node by success/failure for a specific execution
- Aggregate overlay: color-code each node by throughput/error rate over a time window
- Highlight the path taken by the selected exchange (dim the branches not taken)
- Show inter-route connections (e.g., `direct:`, `seda:`, `vm:` endpoints linking routes)

**Priority**: **Must-Have**. Cameleer already has `RouteGraph` data from agents -- this is the tool's differentiating feature.
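
Inter-route connections can be inferred by matching a producer URI in one route against the consumer (`from`) URI of another. This is a minimal sketch that rests on a few assumptions. `RouteInfo` is a hypothetical, simplified shape and not Cameleer's `RouteGraph` type. Query options are stripped before matching. Only the in-JVM `direct:`, `seda:` and `vm:` schemes named above are linked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: infers route-to-route edges from matching endpoint URIs. */
final class RouteLinks {

    /** Simplified route description; not Cameleer's RouteGraph type. */
    record RouteInfo(String routeId, String fromUri, List<String> toUris) {}

    /** Edge from the producing route to the consuming route. */
    record Link(String fromRoute, String toRoute, String viaEndpoint) {}

    private static final Set<String> LINKING_SCHEMES = Set.of("direct", "seda", "vm");

    static List<Link> infer(List<RouteInfo> routes) {
        // Index each route by the endpoint it consumes from.
        Map<String, String> consumerByEndpoint = new LinkedHashMap<>();
        for (RouteInfo r : routes) {
            consumerByEndpoint.put(endpointKey(r.fromUri()), r.routeId());
        }
        List<Link> links = new ArrayList<>();
        for (RouteInfo r : routes) {
            for (String to : r.toUris()) {
                String key = endpointKey(to);
                int colon = key.indexOf(':');
                if (colon < 0 || !LINKING_SCHEMES.contains(key.substring(0, colon))) {
                    continue;                              // only in-JVM schemes are matched
                }
                String target = consumerByEndpoint.get(key);
                if (target != null) {
                    links.add(new Link(r.routeId(), target, key));
                }
            }
        }
        return links;
    }

    /** "seda://payments?size=100" -> "seda:payments": drops query options and "//". */
    static String endpointKey(String uri) {
        int q = uri.indexOf('?');
        String base = q >= 0 ? uri.substring(0, q) : uri;
        return base.replace("://", ":");
    }
}
```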

### 2.3 Cross-Route Correlation / Message Tracing

**Pain point**: A single business transaction (e.g., an order) often spans multiple routes: `order-intake` -> `order-enrichment` -> `payment-process` -> `shipment-dispatch`. The dashboard shows each route execution as a separate row. There is no way to see the full journey of order OP-88421 across all routes.

**How to address it**: Add a "Transaction Trace" or "Message Flow" view that:
- Groups all executions sharing a breadcrumbId or correlation ID
- Shows them as a horizontal timeline or waterfall chart
- Highlights which route in the chain failed
- Works across `direct:`, `seda:`, and `vm:` endpoints that link routes

The search bar says "Search by Order ID, correlation ID" which is a good start, but the results should show the correlated group, not just individual rows.

**Priority**: **Must-Have**. Splitter/aggregator patterns and multi-route flows are the norm, not the exception, in real Camel applications.
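
One way to build the waterfall has three steps. First, group execution records by their shared correlation key. Next, sort each group by start time. Finally, express every row as an offset from the first execution in the group. This is a sketch under assumptions: the `Execution` record is hypothetical, and the code presumes the agent already stores a correlation value (for example a `breadcrumbId`) on each execution.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: turns correlated route executions into waterfall rows. */
final class TransactionTrace {

    /** One route execution; the correlation key is assumed to be shared across routes. */
    record Execution(String routeId, String correlationId, Instant start, Instant end, boolean failed) {}

    /** One waterfall bar: offset from the first execution in the trace, plus duration. */
    record Row(String routeId, long offsetMs, long durationMs, boolean failed) {}

    static Map<String, List<Row>> waterfalls(List<Execution> executions) {
        Map<String, List<Execution>> byCorrelation = executions.stream()
                .collect(Collectors.groupingBy(Execution::correlationId, TreeMap::new, Collectors.toList()));

        Map<String, List<Row>> result = new TreeMap<>();
        byCorrelation.forEach((id, group) -> {
            List<Execution> sorted = new ArrayList<>(group);
            sorted.sort(Comparator.comparing(Execution::start));
            Instant t0 = sorted.get(0).start();
            List<Row> rows = new ArrayList<>();
            for (Execution e : sorted) {
                rows.add(new Row(e.routeId(),
                        Duration.between(t0, e.start()).toMillis(),
                        Duration.between(e.start(), e.end()).toMillis(),
                        e.failed()));
            }
            result.put(id, rows);
        });
        return result;
    }

    /** The earliest failing route in the chain, if any: the one to highlight. */
    static Optional<String> firstFailure(List<Row> rows) {
        return rows.stream().filter(Row::failed).map(Row::routeId).findFirst();
    }
}
```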

### 2.4 Dead Letter Queue Monitoring

**Pain point**: When messages fail and are routed to a dead letter channel (a standard Camel error-handling pattern), I need to know: how many messages are in the DLQ, what are they, how long have they been there, and can I retry them?

**How to address it**: Add a DLQ section or page showing:
- Count of messages per dead letter endpoint
- Age distribution (how many are from today vs. last week)
- Message preview (body + headers + the exception that caused routing to DLQ)
- Retry action (re-submit the message to the original route)
- Purge action (acknowledge and discard)

**Priority**: **Must-Have**. DLQ management is a daily production task.
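
This sketch computes the first two DLQ aggregates listed above: the message count per dead letter endpoint and the age distribution. The `DlqMessage` record and the bucket boundaries are illustrative assumptions. Ages are measured from the moment the message was routed to the DLQ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper for DLQ summary numbers. */
final class DlqAges {

    record DlqMessage(String endpoint, Instant failedAt) {}

    /** Message count per dead letter endpoint, sorted by endpoint name. */
    static Map<String, Long> countByEndpoint(List<DlqMessage> messages) {
        return messages.stream()
                .collect(Collectors.groupingBy(DlqMessage::endpoint, TreeMap::new, Collectors.counting()));
    }

    /** Age histogram; the bucket boundaries are illustrative, not a recommendation. */
    static Map<String, Integer> ageBuckets(List<DlqMessage> messages, Instant now) {
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (String label : List.of("< 1h", "1h-24h", "1d-7d", ">= 7d")) {
            buckets.put(label, 0);                         // keep empty buckets visible
        }
        for (DlqMessage m : messages) {
            Duration age = Duration.between(m.failedAt(), now);
            String label;
            if (age.compareTo(Duration.ofHours(1)) < 0) {
                label = "< 1h";
            } else if (age.compareTo(Duration.ofDays(1)) < 0) {
                label = "1h-24h";
            } else if (age.compareTo(Duration.ofDays(7)) < 0) {
                label = "1d-7d";
            } else {
                label = ">= 7d";
            }
            buckets.merge(label, 1, Integer::sum);
        }
        return buckets;
    }
}
```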

### 2.5 Per-Processor Statistics (Aggregate View)

**Pain point**: The processor timeline in the detail panel shows per-processor timing for a single execution. But I also need aggregate statistics: for processor `to(payment-api)`, what is the p50/p95/p99 latency over the last hour? How many times did it fail? Is it getting slower over time?

**How to address it**: Clicking a processor name in the timeline should show aggregate stats for that processor. Alternatively, the Route Detail page should have a "Processors" tab with a table of all processors in the route, their call count, success rate, and latency percentiles.

**Priority**: **Must-Have**. Identifying a chronically slow processor is different from identifying a one-off slow execution.
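
At its core, the aggregate view groups per-processor timings and computes percentiles for each group. This sketch uses the nearest-rank method with integer arithmetic, so the p95 of 20 samples is exactly the 19th smallest value. The `Sample` record is hypothetical. A production store might prefer a streaming histogram over sorting raw samples.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical aggregation of per-processor timings. */
final class ProcessorStats {

    record Sample(String processorId, long durationMs, boolean failed) {}

    record Summary(long count, long failures, long p50, long p95, long p99) {}

    /** Nearest-rank percentile on a sorted, non-empty array: rank = ceil(p * n / 100). */
    static long percentile(long[] sorted, int p) {
        int rank = (p * sorted.length + 99) / 100;         // integer ceiling, no floating point
        return sorted[Math.max(rank, 1) - 1];
    }

    static Map<String, Summary> summarize(List<Sample> samples) {
        Map<String, List<Sample>> byProcessor = samples.stream()
                .collect(Collectors.groupingBy(Sample::processorId, TreeMap::new, Collectors.toList()));

        Map<String, Summary> result = new TreeMap<>();
        byProcessor.forEach((id, list) -> {
            long[] d = list.stream().mapToLong(Sample::durationMs).sorted().toArray();
            long failures = list.stream().filter(Sample::failed).count();
            result.put(id, new Summary(d.length, failures,
                    percentile(d, 50), percentile(d, 95), percentile(d, 99)));
        });
        return result;
    }
}
```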

### 2.6 Error Pattern Grouping / Top Errors

**Pain point**: The dashboard shows individual error rows. When there are 38 errors, I do not want to scroll through all 38. I want to see: "23 of the 38 errors are `HttpOperationFailedException` on `payment-process`, 10 are `SQLTransientConnectionException` on `order-enrichment`, 5 are `ValidationException` on `order-intake`." The design notes mention "Top error pattern grouping panel" from the operator expert, but it is not in the final mock.

**How to address it**: Add an error summary panel above or alongside the execution table showing errors grouped by exception class + route. Each group should show count, first/last occurrence, and whether the count is trending up.

**Priority**: **Must-Have**. Pattern recognition is more important than individual error viewing.
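
A minimal sketch of the grouping. It keys on (exception class, route), keeps the first and last occurrence for each group, and sorts groups by count. The `ErrorEvent` and `ErrorGroup` records are hypothetical. The trend indicator is left out; one option would be to compare each group's count in the current window against the previous window.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper behind a "top errors" panel. */
final class ErrorPatterns {

    record ErrorEvent(String routeId, String exceptionClass, Instant at) {}

    record ErrorGroup(String exceptionClass, String routeId, int count, Instant first, Instant last) {}

    /** Groups errors by (exception class, route); the largest groups come first. */
    static List<ErrorGroup> group(List<ErrorEvent> events) {
        Map<List<String>, ErrorGroup> byKey = new HashMap<>();
        for (ErrorEvent e : events) {
            ErrorGroup single = new ErrorGroup(e.exceptionClass(), e.routeId(), 1, e.at(), e.at());
            // merge() folds this single occurrence into the existing group, if any.
            byKey.merge(List.of(e.exceptionClass(), e.routeId()), single, (a, b) -> new ErrorGroup(
                    a.exceptionClass(), a.routeId(), a.count() + b.count(),
                    a.first().isBefore(b.first()) ? a.first() : b.first(),
                    a.last().isAfter(b.last()) ? a.last() : b.last()));
        }
        List<ErrorGroup> groups = new ArrayList<>(byKey.values());
        groups.sort(Comparator.comparingInt(ErrorGroup::count).reversed()
                .thenComparing(ErrorGroup::exceptionClass)
                .thenComparing(ErrorGroup::routeId));
        return groups;
    }
}
```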

### 2.7 Route Status Management

**Pain point**: I need to know which routes are started, stopped, or suspended. And I need the ability to stop/start/suspend individual routes without redeploying. This is routine in production -- temporarily suspending a route that is flooding a downstream system.

**How to address it**: The sidebar route list should show route status (started/stopped/suspended) with icons. A right-click or action menu on a route should offer start/stop/suspend. This maps directly to Camel's route controller API.

**Priority**: **Nice-to-Have** for v1, **Must-Have** for v2. Operators will ask for this quickly.

### 2.8 Route Version Comparison

**Pain point**: After a deployment, I want to compare the current route definition with the previous version. Did someone add a processor? Change an endpoint URI? Route definition drift is a real source of production issues.

**How to address it**: Store route graph snapshots per deployment/version. Show a diff view highlighting added/removed/modified processors.

**Priority**: **Nice-to-Have**. Valuable but less urgent than the above.
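
This sketch diffs two snapshots, assuming each deployment's route graph can be flattened into a map from processor ID to a definition string (a hypothetical shape). Keying on processor IDs only works well when those IDs are stable. When no ID is set, Camel generates one such as `to1`, and inserting a processor can shift these generated IDs. Setting explicit `.id(...)` values on processors makes a diff like this much more reliable.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical diff of two route snapshots, each flattened to processorId -> definition. */
final class RouteSnapshotDiff {

    record Diff(Set<String> added, Set<String> removed, Set<String> modified) {}

    static Diff diff(Map<String, String> previous, Map<String, String> current) {
        Set<String> added = new TreeSet<>(current.keySet());
        added.removeAll(previous.keySet());                // only in the new version

        Set<String> removed = new TreeSet<>(previous.keySet());
        removed.removeAll(current.keySet());               // only in the old version

        Set<String> modified = new TreeSet<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String old = previous.get(e.getKey());
            if (old != null && !old.equals(e.getValue())) {
                modified.add(e.getKey());                  // same ID, different definition
            }
        }
        return new Diff(added, removed, modified);
    }
}
```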

### 2.9 Thread Pool / Resource Monitoring

**Pain point**: Camel's default thread pool max is 20. When all threads are consumed, messages queue up silently. The HikariPool error in the mock is a perfect example -- pool exhaustion. I need visibility into thread pool utilization, connection pool utilization, and inflight exchange count.

**How to address it**: Add a "Resources" section (either in the agent detail or a separate page) showing:
- Camel thread pool utilization (active/max)
- Connection pool utilization (from endpoint components)
- Inflight exchange count per route
- Consumer prefetch/backlog (for JMS/Kafka consumers)

**Priority**: **Nice-to-Have** initially, but becomes **Must-Have** when debugging pool exhaustion issues.
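
Thread pool utilization can be read from the standard JDK `ThreadPoolExecutor` getters. The 20-thread maximum in this sketch comes from the text above. The core size of 10 and the 1000-slot queue are assumptions, and `PoolUtilization` is a hypothetical class. With a bounded queue, `ThreadPoolExecutor` only starts threads beyond its core size once the queue is full. As a result, queue depth often climbs before active/max does, which is one way a backlog can build up quietly.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical snapshot of a thread pool, as an agent might report it. */
final class PoolUtilization {

    record Snapshot(int active, int max, int queued, int remainingQueueCapacity) {
        /** Fraction of the maximum pool size that is busy right now. */
        double utilization() {
            return max == 0 ? 0.0 : (double) active / max;
        }
    }

    static Snapshot snapshot(ThreadPoolExecutor pool) {
        return new Snapshot(
                pool.getActiveCount(),                     // approximate, per the JDK docs
                pool.getMaximumPoolSize(),
                pool.getQueue().size(),
                pool.getQueue().remainingCapacity());
    }

    /** Example pool: 10 core, 20 max, 1000 queued tasks (sizes are assumptions). */
    static ThreadPoolExecutor examplePool() {
        return new ThreadPoolExecutor(10, 20, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1000));
    }
}
```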

### 2.10 Saved Searches / Alert Rules

**Pain point**: I find myself searching for the same patterns repeatedly: "errors on payment-process in the last hour", "executions over 500ms for order-enrichment". There is no way to save these as bookmarks or convert them into alert rules.

**How to address it**: Allow saving filter configurations as named views. Allow converting a saved search into an alerting rule (email/webhook when count exceeds threshold).

**Priority**: **Nice-to-Have**.
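
This sketch shows a saved search that doubles as an alert rule. The filter is a plain record, and the rule fires when more than `threshold` matching executions fall inside a sliding window. All names here are hypothetical, and sending the email or webhook notification is out of scope.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical saved-search and alert-rule model. */
final class SavedSearches {

    record Execution(String routeId, String status, long durationMs, Instant at) {}

    /** A named filter; a null routeId or status means "any". */
    record SavedSearch(String name, String routeId, String status, long minDurationMs) {
        boolean matches(Execution e) {
            return (routeId == null || routeId.equals(e.routeId()))
                    && (status == null || status.equals(e.status()))
                    && e.durationMs() >= minDurationMs;
        }
    }

    /** Fires when more than `threshold` matching executions fall inside [now - window, now]. */
    record AlertRule(SavedSearch search, int threshold, Duration window) {
        boolean shouldFire(List<Execution> recent, Instant now) {
            Instant from = now.minus(window);
            long hits = recent.stream()
                    .filter(e -> !e.at().isBefore(from) && !e.at().isAfter(now))
                    .filter(search::matches)
                    .count();
            return hits > threshold;
        }
    }
}
```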

---

## 3. Specific Page/Feature Recommendations

### 3.1 Route Detail Page

When I click a route name (e.g., `order-intake`) from the sidebar, I should see:

- **Header**: Route name, status (started/stopped), uptime, route definition source (Java DSL / XML / YAML)
- **KPI Strip**: Total executions, success rate, p50/p99 latency, inflight count, throughput -- all for this route only
- **Processor Table**: Every processor in the route with columns: name, type, call count, success rate, p50 latency, p99 latency, total time %. Sortable by any column. This is where I find the bottleneck processor.
- **Route Diagram**: Interactive DAG with execution overlay. Nodes sized by throughput, colored by error rate. Clicking a node filters the execution list to that processor.
- **Recent Executions**: Filtered version of the main table, showing only this route's executions.
- **Error Patterns**: Top errors for this route, grouped by exception class.

### 3.2 Exchange / Message Inspector

When I click the "Exchange" tab in the detail panel:

- **Inbound Message**: The original message as received by the route's consumer. Body + headers. Shown prominently, always visible.
- **Step-by-Step Trace**: For each processor, show the exchange state AFTER that processor ran. Diff mode should highlight what changed (body mutations, added headers, removed headers).
- **Properties**: Camel exchange properties (not just headers). Properties often carry routing decisions.
- **Exception**: If the exchange failed, show the caught exception, the handled flag, and whether it was routed to a dead letter channel.
- **Response**: If the route produces a response (e.g., REST endpoint), show the outbound body.

Display format should auto-detect JSON/XML and pretty-print. Binary payloads should show a hex dump along with the payload size.
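
This is a rough sketch of format detection and the hex dump. Detection is only a heuristic in three steps: a strict UTF-8 decode, a check for control characters, and then a look at the first non-whitespace character. Pretty-printing itself would be delegated to a JSON or XML library and is not shown. `PayloadView` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper behind an exchange body viewer. */
final class PayloadView {

    enum Kind { JSON, XML, TEXT, BINARY }

    /** Heuristic only: strict UTF-8 decode, reject control characters, then sniff the first character. */
    static Kind detect(byte[] body) {
        String text;
        try {
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
        } catch (CharacterCodingException e) {
            return Kind.BINARY;                            // not valid UTF-8
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\r' && c != '\n') {
                return Kind.BINARY;                        // e.g. NUL bytes
            }
        }
        String t = text.strip();
        if (t.startsWith("{") || t.startsWith("[")) return Kind.JSON;
        if (t.startsWith("<")) return Kind.XML;
        return Kind.TEXT;
    }

    /** Hex dump (offset, 16 hex bytes, printable ASCII), preceded by the total size. */
    static String hexDump(byte[] body, int maxBytes) {
        StringBuilder sb = new StringBuilder().append(body.length).append(" bytes\n");
        int n = Math.min(body.length, maxBytes);
        for (int off = 0; off < n; off += 16) {
            StringBuilder ascii = new StringBuilder();
            sb.append(String.format("%08x  ", off));
            for (int i = off; i < off + 16; i++) {
                if (i < n) {
                    int b = body[i] & 0xff;
                    sb.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    sb.append("   ");                      // pad a short final line
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }
}
```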

### 3.3 Metrics Dashboard (Developer vs. Operator KPIs)

The current metrics (throughput, latency p99, error rate) are operator KPIs. A Camel developer also needs:

**Developer KPIs** (add a "Developer" metrics view):
- Per-processor latency breakdown (stacked bar: which processors consume the most time)
- External endpoint response time (HTTP, DB, JMS) -- separate from Camel processing time
- Type converter cache hit rate (rarely needed, but valuable when debugging serialization issues)
- Redelivery count (how many messages required retries before succeeding)
- Content-based router distribution (for `choice()` routes: how many messages went down each branch)

**Operator KPIs** (already well-covered):
- Throughput, error rate, latency percentiles -- these are solid as-is

### 3.4 Dead Letter Queue View

A dedicated DLQ page:

- **Summary Cards**: One card per DLQ endpoint (e.g., `jms:DLQ.orders`, `seda:error-handler`), showing message count, oldest message age, newest message timestamp.
- **Message List**: Table with columns: original route, exception class, business ID, timestamp, retry count.
- **Message Detail**: Click a DLQ message to see the exchange snapshot (body + headers + exception) at the time of failure.
- **Actions**: Retry (re-submit to original endpoint), Retry All (bulk retry for a pattern), Discard, Move to another queue.
- **Filters**: By exception type, by route, by age.

### 3.5 Route Comparison

Two use cases:

1. **Version diff**: Compare route graph v3.2.0 vs. v3.2.1. Show added/removed/modified processors as a visual diff on the DAG.
2. **Performance comparison**: Compare this week's latency distribution for `payment-process` with last week's. Overlay histograms. Useful for validating that a deployment improved (or degraded) performance.
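
Before the two weeks' histograms can be overlaid, both need the same bucket boundaries. Normalizing counts to fractions then keeps a high-traffic week from dwarfing a quiet one. This is a minimal sketch. `LatencyHistogram` is a hypothetical name, and the bucket bounds used with it are illustrative assumptions.

```java
import java.util.List;

/** Hypothetical helper for overlaying two latency distributions. */
final class LatencyHistogram {

    /**
     * Bucket 0 holds values below upperBounds[0], bucket i holds [upperBounds[i-1], upperBounds[i]),
     * and the extra last bucket holds everything >= the final bound.
     */
    static int[] histogram(List<Long> latenciesMs, long[] upperBounds) {
        int[] counts = new int[upperBounds.length + 1];
        for (long v : latenciesMs) {
            int i = 0;
            while (i < upperBounds.length && v >= upperBounds[i]) {
                i++;
            }
            counts[i]++;
        }
        return counts;
    }

    /** Converts counts to fractions so weeks with different traffic overlay fairly. */
    static double[] normalize(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] fractions = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            fractions[i] = total == 0 ? 0.0 : (double) counts[i] / total;
        }
        return fractions;
    }
}
```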

---

## 4. Information Architecture Critique

### What Works
- **Sidebar hierarchy** (Applications > Routes) is correct and matches how Camel projects are structured.
- **Health strip at top** provides instant situational awareness without scrolling.
- **Master-detail pattern** (table + slide-in panel) avoids page navigation for quick inspection. This keeps context.
- **Keyboard shortcuts** (Ctrl+K search, arrow navigation, Esc to close) are the right accelerators for power users.

### What Needs Adjustment

**The sidebar is too flat.** It shows applications and routes in the same list, but there is no way to navigate to:
- A dedicated Route Detail page (with per-processor stats, diagram, error patterns)
- An Agent Detail page (with resource utilization, version info, configuration)
- A DLQ page
- A Search/Trace page (for cross-route correlation)

Recommendation: Add top-level navigation items to the sidebar:
```
Dashboard (the current view)
Routes (route list with status, drill into route detail)
Traces (cross-route message flow / correlation)
Errors (grouped error patterns, DLQ)
Agents (agent health, resource utilization)
Diagrams (route graph visualization)
```

**Route click should go deeper.** Currently, clicking a route in the sidebar filters the execution table. This is useful, but clicking the route NAME in a table row or in the detail panel should navigate to a dedicated Route Detail page with per-processor aggregate stats and the route diagram.

**Search results need grouping.** The Ctrl+K search bar says "Search by Order ID, route, error..." but search results should group by correlation ID when searching by business ID. If I search for "OP-88421", I want to see ALL executions related to that order across all routes, not just the one row in `payment-process`.

**1-click access priorities:**
- Health overview: 1 click (current: 0 clicks -- it is the home page -- good)
- Filter by errors only: 1 click (current: 1 click on Error pill -- good)
- View a specific execution's processor timeline: 2 clicks (current: 1 click on row -- good)
- View exchange body/headers: should be 2 clicks (click row, click Exchange tab). Currently not implemented.
- View route diagram: should be 2 clicks (click route name, see diagram). Currently requires finding the button in the detail panel.
- Cross-route trace: should be 2 clicks (click correlation ID or business ID, see trace). Currently not possible.
- DLQ status: should be 1 click from sidebar. Currently not available.

---

## 5. Score Card

| Dimension | Score (1-10) | Notes |
|-----------------------------|:---:|-------|
| Transaction tracking | 4 | Individual executions visible, but no cross-route transaction view. Correlation ID shown but not actionable. |
| Root cause analysis | 6 | Processor timeline identifies the slow/failing step. Error messages shown inline. But no exchange body inspection, no stack trace expansion, no header diff. |
| Performance monitoring | 7 | Throughput, latency p99, error rate charts with SLA lines are solid. Missing per-processor aggregate stats and resource utilization. |
| Route visualization | 3 | Route names in sidebar, but no actual route diagram/DAG. The "View Route Diagram" button exists with no destination. This is Cameleer's key differentiator -- it must ship. |
| Exchange/message visibility | 2 | Exchange tab exists but has no content. No body inspection, no header view, no step-by-step diff. This is the most critical gap. |
| Correlation/tracing | 3 | Correlation ID displayed in detail panel, but no way to trace a message across routes. No breadcrumb linking. No transaction waterfall. |
| Overall daily usefulness | 5 | As an operations monitor (is anything broken right now?), it scores 7-8. As a developer debugging tool (why is it broken and how do I fix it?), it scores 3-4. The gap is in the debugging/inspection features. |

### Summary Verdict

The dashboard is a **strong operations monitor** -- it answers "what is happening right now?" effectively. The health strip, SLA awareness, shift context, business ID columns, and inline error previews are genuinely useful and better than most tools I have used.

However, it is a **weak debugging tool** -- it does not yet answer "why did this specific message fail?" or "what did the exchange look like at each step?" The Exchange tab, route diagram, cross-route tracing, and error pattern grouping are the features that would make this a daily-driver tool rather than a pretty overview I glance at in the morning.

The processor Gantt chart in the detail panel is the single best feature in the entire dashboard. Build on that. Make it clickable (click a processor to see the exchange state at that point). Add aggregate stats. Link it to the route diagram. That is where this tool becomes indispensable.

**Bottom line**: Ship the exchange inspector, the route diagram, and cross-route tracing, and this goes from a 5/10 to an 8/10 daily-use tool.