# Feature Landscape
**Domain:** Transaction monitoring / observability for Apache Camel route executions
**Researched:** 2026-03-11
**Confidence:** MEDIUM (based on domain expertise from nJAMS Server, Jaeger, Zipkin, Dynatrace; web search unavailable for latest feature sets)
## Table Stakes
Features users expect. Missing = product feels incomplete.
### Transaction Search and Filtering
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Search by time range | Every monitoring tool has this; primary axis for incident investigation | Low | Date picker with presets (last 15m, 1h, 24h, 7d, custom) |
| Filter by transaction state | SUCCESS/ERROR/WARNING is the first thing ops checks | Low | Multi-select checkboxes, counts per state |
| Filter by duration | Finding slow transactions is a core use case | Low | Min/max duration inputs, or predefined buckets |
| Full-text search across payload/attributes | Users need to find "that one order ID" across millions of records | Medium | Requires text index; match highlighting in results |
| Combined/compound filters | Users always combine: "errors in last hour on instance X" | Medium | AND-composition of all filter criteria |
| Paginated result list | Cannot load millions of rows; must page or virtual-scroll | Low | Cursor-based pagination preferred over offset for large datasets |
| Sort by time, duration, state | Basic result ordering | Low | Default: newest first |
| Filter by agent/instance | "Show me only transactions from production-instance-3" | Low | Dropdown populated from agent registry |
| Filter by route name | Users think in routes, not raw IDs | Low | Autocomplete from known route definitions |
| Save/bookmark search queries | Ops teams reuse the same searches during incidents | Medium | Named saved searches, shareable via URL |
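
The AND-composition and cursor-based pagination rows above can be sketched as follows. This is a minimal in-memory illustration, not the storage-backed query: `Transaction`, `SearchFilter`, and `search_page` are hypothetical names, and the cursor shape `(start_ms, tx_id)` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:  # hypothetical record shape
    tx_id: str
    start_ms: int      # start time, epoch milliseconds
    duration_ms: int
    state: str         # "SUCCESS", "WARNING", or "ERROR"
    instance: str


@dataclass(frozen=True)
class SearchFilter:  # hypothetical; None means "criterion not set"
    from_ms: Optional[int] = None
    to_ms: Optional[int] = None
    states: Optional[frozenset] = None
    min_duration_ms: Optional[int] = None
    instance: Optional[str] = None

    def matches(self, tx: Transaction) -> bool:
        # Every criterion that is set must hold (AND-composition).
        return (
            (self.from_ms is None or tx.start_ms >= self.from_ms)
            and (self.to_ms is None or tx.start_ms < self.to_ms)
            and (self.states is None or tx.state in self.states)
            and (self.min_duration_ms is None or tx.duration_ms >= self.min_duration_ms)
            and (self.instance is None or tx.instance == self.instance)
        )


def search_page(transactions, flt, page_size, cursor=None):
    """Return (page, next_cursor), newest first.

    The cursor is the (start_ms, tx_id) key of the last row already returned,
    so the next page resumes strictly after that row instead of using an offset.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    ordered = sorted(
        (tx for tx in transactions if flt.matches(tx)),
        key=lambda tx: (tx.start_ms, tx.tx_id),
        reverse=True,
    )
    if cursor is not None:
        ordered = [tx for tx in ordered if (tx.start_ms, tx.tx_id) < cursor]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_cursor = (page[-1].start_ms, page[-1].tx_id) if has_more else None
    return page, next_cursor
```

Including `tx_id` in the sort key keeps the ordering stable when two transactions share a start time, which is what makes the cursor safe to resume from.
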
### Transaction Detail and Drill-Down
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Transaction summary view | One-glance: state, start time, duration, instance, route entry point | Low | Header card in detail page |
| Activity list (per-route breakdown) | Hierarchical view of all route executions within a transaction | Medium | Tree or table showing each activity with timing |
| Activity timing waterfall | Visual timeline showing which routes executed when, and their overlap | Medium | Horizontal bar chart; critical for finding bottlenecks |
| Payload/attribute inspection | View message body, headers, properties at each activity step | Medium | Expandable sections; JSON/XML pretty-printing |
| Error detail with stack trace | When a transaction fails, users need the exception detail immediately | Low | Rendered stack trace with copy button |
| Cross-instance correlation | Transaction spans instances A and B -- show the full chain | High | Requires correlation ID propagation; single unified view |
| Link to route diagram | From any activity, jump to the diagram showing the route definition | Low | Hyperlink; depends on diagram storage existing |
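
As a rough illustration of cross-instance correlation, the sketch below groups activities by a propagated correlation ID and orders each chain by start time. The field names (`correlation_id`, `instance`, `start_ms`) and the function `correlate` are assumptions made for this example.

```python
from collections import defaultdict


def correlate(activities):
    """Group activity records into one chain per correlation ID.

    Each activity is a dict with hypothetical keys "correlation_id",
    "instance", and "start_ms". Chains are ordered by start time so a
    transaction spanning instances A and B reads as one sequence.
    """
    chains = defaultdict(list)
    for activity in activities:
        chains[activity["correlation_id"]].append(activity)
    for chain in chains.values():
        chain.sort(key=lambda a: a["start_ms"])
    return dict(chains)
```
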
### Route Diagram Visualization
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Render route diagram from stored definition | The core differentiator vs generic tracing tools; users think in Camel routes | High | Server-side or client-side rendering from graph model |
| Diagram versioning | Route changed last Tuesday -- show the diagram as it was when the transaction ran | Medium | Version stored per diagram; transaction references specific version |
| Zoom and pan | Diagrams can be large (50+ nodes); must be navigable | Medium | Standard canvas controls; minimap helpful for large diagrams |
| Execution overlay on diagram | Highlight which path the transaction actually took through the route | High | Color/annotate nodes with state (success/error), timing |
| Node click for activity detail | Click a node in the diagram to see the activity data for that step | Medium | Links diagram nodes to activity records |
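
One way to picture the execution overlay is as a per-node annotation computed from a transaction's activities. The sketch below assumes activities carry a `node_id` matching the diagram's node IDs. That linkage, the `SEVERITY` ranking, and the name `build_overlay` are hypothetical.

```python
# Assumed ranking: the most severe state seen on a node wins.
SEVERITY = {"SUCCESS": 0, "WARNING": 1, "ERROR": 2}


def build_overlay(node_ids, activities):
    """Annotate each diagram node with execution state and total time.

    Nodes the transaction never reached stay executed=False, which lets the
    UI grey out the path not taken.
    """
    overlay = {
        node_id: {"executed": False, "state": None, "duration_ms": 0}
        for node_id in node_ids
    }
    for activity in activities:
        entry = overlay.get(activity["node_id"])
        if entry is None:
            continue  # activity refers to a node absent from this diagram version
        entry["executed"] = True
        entry["duration_ms"] += activity["duration_ms"]
        if entry["state"] is None or SEVERITY[activity["state"]] > SEVERITY[entry["state"]]:
            entry["state"] = activity["state"]
    return overlay
```
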
### Agent Management
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Agent list with status | See all connected agents and their lifecycle state (LIVE/STALE/DEAD) | Low | Table with status indicator; auto-refresh |
| Agent heartbeat monitoring | Detect when an agent goes silent | Low | Timestamp of last heartbeat; threshold-based state transitions |
| Agent detail view | Instance name, version, connected routes, uptime, config | Low | Detail page per agent |
| Agent registration/deregistration | New agents register via bootstrap token; dead agents get cleaned up | Medium | Registration endpoint; TTL-based cleanup |
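
The threshold-based state transitions could look like the sketch below. The two thresholds are placeholder values chosen for illustration, and `agent_state` is a hypothetical helper. The source only specifies the LIVE/STALE/DEAD states and that they derive from the last heartbeat.

```python
from datetime import datetime, timedelta, timezone

# Assumed thresholds; real values would be configuration.
STALE_AFTER = timedelta(seconds=30)
DEAD_AFTER = timedelta(minutes=5)


def agent_state(last_heartbeat: datetime, now: datetime) -> str:
    """Derive the lifecycle state from how long the agent has been silent."""
    silence = now - last_heartbeat
    if silence >= DEAD_AFTER:
        return "DEAD"
    if silence >= STALE_AFTER:
        return "STALE"
    return "LIVE"


def live_or_stale(registry: dict, now: datetime) -> dict:
    """TTL-style cleanup: keep only agents that are not DEAD.

    `registry` maps an agent ID to its last heartbeat timestamp.
    """
    return {
        agent_id: seen
        for agent_id, seen in registry.items()
        if agent_state(seen, now) != "DEAD"
    }
```
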
### Authentication and Security
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| JWT-based API authentication | Secure the REST API; every enterprise monitoring tool requires auth | Medium | Token issuance, validation, refresh |
| Bootstrap token for agent registration | Agents need a way to initially register without pre-existing credentials | Low | Shared secret, single-use or time-limited |
| Ed25519 config signing | Agents must verify that config came from the server and was not tampered with | Medium | Key management, signature generation/verification |
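
A bootstrap token that is both single-use and time-limited can be sketched with the standard library alone. The class `BootstrapTokens` and its in-memory dict are assumptions made for illustration. A real server would persist tokens and tie redemption to the registration endpoint.

```python
import secrets


class BootstrapTokens:
    """Hypothetical in-memory store of outstanding bootstrap tokens."""

    def __init__(self):
        self._expiry_by_token = {}

    def issue(self, now_s: float, ttl_s: float) -> str:
        """Create a random token that expires ttl_s seconds after now_s."""
        token = secrets.token_urlsafe(32)
        self._expiry_by_token[token] = now_s + ttl_s
        return token

    def redeem(self, token: str, now_s: float) -> bool:
        """Accept a token at most once, and only before it expires."""
        # pop() removes the token whether or not it is still valid,
        # so a second redemption attempt always fails.
        expiry = self._expiry_by_token.pop(token, None)
        return expiry is not None and now_s < expiry
```
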
### Dashboard and Overview
| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Transaction volume chart (time series) | "How many transactions are we processing?" -- first question on login | Medium | Bar or line chart, grouped by time bucket |
| Error rate chart | "Is something broken right now?" -- second question | Medium | Error count or percentage over time |
| Active agents count | Quick health check of the agent fleet | Low | Simple counter with status breakdown |
| Recent errors list | Quick access to the latest failures without searching | Low | Pre-filtered list, auto-refreshing |
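
Both charts reduce to counting transactions per time bucket. The sketch below shows that aggregation in memory. In the product it would presumably be a storage-side query, and the tuple layout and the name `error_rate_series` are assumptions.

```python
from collections import defaultdict


def error_rate_series(transactions, bucket_ms):
    """Aggregate (start_ms, state) pairs into per-bucket volume and error rate.

    Returns a list of (bucket_start_ms, total, errors, error_rate) sorted by
    time. Buckets with no transactions are not emitted, so a chart would need
    to fill those gaps with zeros.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for start_ms, state in transactions:
        bucket = start_ms - start_ms % bucket_ms  # floor to the bucket start
        totals[bucket] += 1
        if state == "ERROR":
            errors[bucket] += 1
    return [
        (bucket, totals[bucket], errors[bucket], errors[bucket] / totals[bucket])
        for bucket in sorted(totals)
    ]
```
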
## Differentiators
Features that set the product apart from generic tracing tools. Not expected, but valued.
### Diagram-Centric Experience
| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Route diagram as primary navigation | Instead of trace waterfall, users navigate via the Camel route diagram -- this is how they think | High | Diagram becomes the entry point, not just a visualization |
| Execution heatmap on diagram | Color nodes by frequency/error rate over a time window -- shows hotspots | High | Aggregate stats per node; requires efficient querying |
| Side-by-side diagram comparison | Compare two diagram versions to see what changed in a route | Medium | Diff view highlighting added/removed/changed nodes |
| Diagram-based search | "Show me all failed transactions that passed through this node" | High | Click a node, get filtered transaction list |
### Advanced Search and Analytics
| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Statistical duration analysis | P50/P95/P99 duration for a route over time -- detect degradation trends | Medium | Requires ClickHouse aggregation queries |
| Transaction comparison | Side-by-side diff of two transactions through the same route | Medium | Useful for "why did this one fail but that one succeed?" |
| Search result aggregations | Faceted counts: N errors, N warnings, distribution by route, by instance | Medium | ClickHouse GROUP BY queries alongside search results |
| Correlation graph | Visual graph showing how transactions flow across instances | High | Network diagram; requires correlation data |
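
To make the P50/P95/P99 row concrete, the sketch below computes nearest-rank percentiles over a list of durations. It only illustrates the arithmetic. The table assumes the real computation runs as an aggregation query in ClickHouse, and the function names here are hypothetical.

```python
def percentile(durations_ms, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not durations_ms:
        raise ValueError("no durations")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(durations_ms)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def duration_summary(durations_ms):
    """Return the three percentiles named in the table above."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}
```

Comparing these summaries across consecutive time windows is what surfaces a degradation trend. A single window on its own says little.
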
### Configuration Push
| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Per-route tracing level control | Turn on detailed tracing for one problematic route without restarting the agent | Medium | SSE push of config change; agent applies dynamically |
| Bulk config push to agent groups | "Enable debug tracing on all production instances" | Medium | Agent tagging/grouping + batch SSE dispatch |
| Config history and rollback | See what config was active when, roll back a bad change | Medium | Versioned config storage with timestamps |
| Ad-hoc command dispatch | Send a "flush cache" or "reconnect" command to specific agents | Medium | Command/response pattern over SSE; command status tracking |
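
A config change pushed over SSE is ultimately a text frame in the Server-Sent Events wire format. The sketch below formats one. The event name `config-update` and the payload keys `route` and `level` are hypothetical, since the source does not define the message schema.

```python
import json


def sse_event(event_type, payload, event_id=None):
    """Serialize one Server-Sent Events frame with a JSON payload.

    A frame is a sequence of "field: value" lines terminated by a blank line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event_type}")
    # json.dumps without indent yields one line; splitlines() keeps the frame
    # valid even if that ever changes, because each data line gets its own prefix.
    for data_line in json.dumps(payload, sort_keys=True).splitlines():
        lines.append(f"data: {data_line}")
    return "\n".join(lines) + "\n\n"


# Example: raise the tracing level for a single route.
frame = sse_event("config-update", {"route": "order-intake", "level": "DEBUG"}, event_id=7)
```

Setting `id` lets a reconnecting agent report the last event it saw through the `Last-Event-ID` header, which is useful for replaying missed config changes.
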
### Operational Intelligence
| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Alerting on error rate thresholds | Notify when error rate exceeds threshold for a route | High | Threshold evaluation, notification channels (email, webhook) |
| Anomaly detection on duration | Alert when P95 duration spikes compared to baseline | High | Statistical baseline computation; deviation detection |
| Scheduled data export | Export transaction data as CSV/JSON for compliance or reporting | Medium | Job scheduler; file generation; download endpoint |
| Retention policy management | Configure per-route or per-instance retention periods | Medium | TTL management in ClickHouse; UI for policy CRUD |
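
Deviation detection against a statistical baseline can be as simple as a mean-plus-k-standard-deviations rule. The sketch below shows that rule. The choice of rule, the default `k=3.0`, and the function name are assumptions, not a method the source prescribes.

```python
from statistics import mean, stdev


def is_duration_anomaly(baseline_p95_ms, current_p95_ms, k=3.0):
    """Flag the current P95 if it exceeds baseline mean + k * sample stdev.

    `baseline_p95_ms` is a list of P95 values from earlier time windows.
    """
    if len(baseline_p95_ms) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = mean(baseline_p95_ms) + k * stdev(baseline_p95_ms)
    return current_p95_ms > threshold
```

A perfectly flat baseline has zero deviation, so any increase would trigger. A production rule would likely add a minimum absolute margin.
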
## Anti-Features
Features to explicitly NOT build.

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| General APM metrics (CPU, memory, GC) | Out of scope; Cameleer is transaction-focused, not an APM tool. Adding metrics creates scope creep and competes with Prometheus/Grafana, which do it better | Provide a link/integration point to external metrics tools if needed |
| Log aggregation/viewer | Transactions are not logs. Mixing them confuses the data model and competes with ELK/Loki | Store transaction payloads and attributes, not raw log lines |
| Custom dashboard builder | Enormous complexity for marginal value. Ops teams already have Grafana for custom dashboards | Provide good built-in dashboards; expose metrics via Prometheus endpoint for Grafana |
| Multi-tenancy | Adds auth complexity, data isolation, billing concerns. Single-tenant deployment is simpler and sufficient for the target audience | Deploy separate instances per environment/team |
| Mobile app | Ops teams use desktop browsers during incidents. Mobile adds huge UI complexity | Responsive web UI that works on tablets if needed |
| Plugin/extension system | Premature abstraction; adds API stability burden before the core is stable | Build features directly; consider plugins much later if demand emerges |
| Real-time streaming transaction view | "Firehose" views of all transactions in real-time look impressive but are useless at scale (millions/day). Users cannot process the stream | Provide auto-refreshing search results and recent errors list |
| AI/ML-powered root cause analysis | Hype-driven feature with poor reliability. Requires massive training data and domain-specific models | Provide good search, filtering, and comparison tools so humans can find root causes efficiently |
## Feature Dependencies
```
Agent Registration --> Agent List/Status
Agent Registration --> SSE Connection --> Config Push
Agent Registration --> SSE Connection --> Ad-hoc Commands
Transaction Ingestion --> Transaction Storage
Transaction Storage --> Transaction Search/Filtering
Transaction Search --> Transaction Detail View
Transaction Detail --> Activity Waterfall
Transaction Detail --> Payload Inspection
Transaction Detail --> Error Detail
Diagram Storage --> Diagram Rendering
Diagram Versioning --> Transaction-to-Diagram Linking
Diagram Rendering --> Execution Overlay (requires both diagram + activity data)
Diagram Rendering --> Execution Heatmap (requires aggregated activity data)
Diagram Rendering --> Diagram-based Search
Transaction Search --> Statistical Duration Analysis (aggregation of search results)
Transaction Search --> Search Result Aggregations
JWT Auth --> All REST API endpoints
Bootstrap Token --> Agent Registration
Ed25519 Signing --> Config Push
Transaction Storage --> Transaction Volume Chart (aggregation queries)
Transaction Storage --> Error Rate Chart (aggregation queries)
```
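
Reading each arrow as "prerequisite --> dependent", a valid build order falls out of a topological sort. The sketch below runs one over a subset of the edges above using the standard-library `graphlib` module (Python 3.9+). The shortened node names and the helper `build_order` are illustrative.

```python
from graphlib import TopologicalSorter

# (prerequisite, dependent) pairs taken from a subset of the list above.
EDGES = [
    ("Bootstrap Token", "Agent Registration"),
    ("Agent Registration", "SSE Connection"),
    ("SSE Connection", "Config Push"),
    ("Ed25519 Signing", "Config Push"),
    ("Transaction Ingestion", "Transaction Storage"),
    ("Transaction Storage", "Transaction Search"),
    ("Transaction Search", "Transaction Detail"),
    ("Diagram Storage", "Diagram Rendering"),
    ("Diagram Rendering", "Execution Overlay"),
]


def build_order(edges):
    """Return the features in an order where prerequisites come first.

    static_order() raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter()
    for prerequisite, dependent in edges:
        sorter.add(dependent, prerequisite)
    return list(sorter.static_order())


order = build_order(EDGES)
```
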
## MVP Recommendation
**Prioritize (Phase 1 -- Foundation):**

1. Transaction ingestion and storage -- nothing works without data flowing in
2. Agent registration and lifecycle -- must know who is sending data
3. Basic transaction search (time range, state, duration) -- core value proposition
4. Transaction detail with activity breakdown -- users need to drill down

**Prioritize (Phase 2 -- Core Experience):**

5. Full-text search -- the "find that one transaction" use case
6. Route diagram rendering with version linking -- the Camel-specific differentiator
7. JWT authentication -- required before any production deployment
8. Dashboard overview (volume chart, error rate, agent status)

**Prioritize (Phase 3 -- Differentiation):**

9. Execution overlay on diagrams -- the killer feature that generic tools cannot offer
10. Config push via SSE -- operational value that justifies the agent-server architecture
11. Cross-instance correlation -- required for complex multi-instance Camel deployments

**Defer:**

- Alerting: defer until core search and dashboard are solid; alerting without good data is noise
- Data export: useful but not blocking; add when compliance demands arise
- Anomaly detection: requires baseline data that only accumulates over time
- Diagram-based search: powerful but depends on both diagram rendering and search being mature
- Execution heatmap: requires significant aggregation infrastructure
## Sources
- Domain knowledge from nJAMS Server (Integration Matters) feature set -- transaction monitoring for integration platforms, hierarchical transaction/activity model, route diagram visualization
- Jaeger UI and Zipkin UI -- distributed tracing search, trace detail waterfall views, service dependency graphs
- Dynatrace PurePath -- transaction-level drill-down, service flow visualization, statistical analysis
- Apache Camel route model -- EIP-based visual representation, route definition structure
- Project context from PROJECT.md and CLAUDE.md -- specific requirements, constraints, and architectural decisions

**Confidence note:** Feature categorization is based on training data knowledge of these products. Web search was unavailable to verify latest feature additions in 2025-2026 releases. The core feature landscape for this domain is mature and unlikely to have shifted dramatically, but specific UI patterns and newer differentiators may be missed. Confidence: MEDIUM.