diff --git a/.planning/phases/02-transaction-search-diagrams/02-RESEARCH.md b/.planning/phases/02-transaction-search-diagrams/02-RESEARCH.md
new file mode 100644
index 00000000..a25ce16f
--- /dev/null
+++ b/.planning/phases/02-transaction-search-diagrams/02-RESEARCH.md
@@ -0,0 +1,577 @@
# Phase 2: Transaction Search + Diagrams - Research

**Researched:** 2026-03-11
**Domain:** ClickHouse querying, full-text search, SVG diagram rendering, REST API design
**Confidence:** HIGH

## Summary

Phase 2 transforms the ingestion-only server into a queryable observability platform. The work divides into three domains: (1) structured search with ClickHouse WHERE clauses over the existing `route_executions` table plus schema extensions for exchange snapshot data, (2) full-text search using ClickHouse's `tokenbf_v1` skip indexes (the text index GA feature requires ClickHouse 26.2+ and we run 25.3), and (3) route diagram retrieval and server-side SVG rendering using Eclipse ELK for layout and JFreeSVG for output.

The existing Phase 1 code provides a solid foundation: `ClickHouseExecutionRepository` already flattens processor trees into parallel arrays, `ClickHouseDiagramRepository` already stores diagrams with SHA-256 content-hash deduplication, and `AbstractClickHouseIT` provides the Testcontainers base class. Phase 2 extends these with query methods, schema additions for exchange data and tree reconstruction metadata, and new search/diagram REST controllers.

**Primary recommendation:** Extend the existing repository interfaces with query methods, add a `SearchService` abstraction in core (for future OpenSearch swap), store exchange snapshot data as JSON strings in new columns on `route_executions`, and use Eclipse ELK 0.11.0 + JFreeSVG 5.0.7 for diagram rendering.
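Because substring search is built on LIKE patterns, user input must be escaped before it is wrapped in `%...%` -- otherwise `%`, `_`, or `\` in a search term act as wildcards. A minimal sketch of such a helper (the class name `LikePatterns` is illustrative; `escapeLike` matches the name used in the query-building example later in this document):

```java
public final class LikePatterns {

    private LikePatterns() {
    }

    /**
     * Escapes ClickHouse LIKE metacharacters so a user-supplied term
     * matches literally inside a '%term%' pattern.
     */
    public static String escapeLike(String term) {
        // Escape the escape character itself first, then the two wildcards.
        return term.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_");
    }
}
```

The caller then builds the pattern as `"%" + LikePatterns.escapeLike(input) + "%"` and binds it as an ordinary `?` parameter.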
## User Constraints (from CONTEXT.md)

### Locked Decisions
- Both GET and POST endpoints for search: GET /api/v1/search/executions for basic filters, POST for advanced filters including full-text and per-field targeting
- Response envelope: `{ "data": [...], "total": N, "offset": 0, "limit": 50 }`
- Substring matching (LIKE '%term%') for full-text search -- not token-based only
- Global `text` parameter searches all text fields; optional per-field targeting: textInBody, textInHeaders, textInErrors
- Search service interface designed for future OpenSearch swap
- Nested JSON tree returned by server for transaction detail -- server reconstructs processor tree from flat storage
- Add depth and parent index arrays to ClickHouse schema (processor_depths, processor_parent_indexes) -- populated at ingestion time
- Exchange snapshot data fetched separately per processor -- not inlined in detail response
- Diagram accessed via separate endpoint; detail response includes diagram content hash for linking
- Both SVG and JSON layout formats via Accept header content negotiation
- Top-to-bottom node layout flow
- Nested processors in swimlanes to highlight nesting/scope
- Color-coded node types matching route-diagram-example.html style
- Store everything the agent sends -- no server-side truncation
- API designed to support future cmd+k cross-entity search UI

### Claude's Discretion
- Pagination implementation details (offset/limit vs cursor)
- ClickHouse schema extension approach for exchange snapshot storage
- SVG rendering library choice
- Layout algorithm for diagram node positioning
- Search service abstraction layer design

### Deferred Ideas (OUT OF SCOPE)
- Cursor-based pagination (ASRCH-01) -- v2
- Saved search queries (ASRCH-02) -- v2
- Web UI with cmd+k search overlay -- v2
- Execution overlay on diagrams -- UI responsibility
- OpenSearch for full-text search -- evaluate after Phase 2

## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| SRCH-01 (#7) | Search transactions by execution status (COMPLETED, FAILED, RUNNING) | WHERE clause on `status` column (LowCardinality, in ORDER BY) -- highly efficient |
| SRCH-02 (#8) | Search transactions by date/time range | WHERE clause on `start_time` (in ORDER BY, partition key) -- primary index range scan |
| SRCH-03 (#9) | Search transactions by duration range (min/max ms) | WHERE clause on `duration_ms` -- simple range filter |
| SRCH-04 (#10) | Search by correlationId for cross-instance correlation | WHERE + bloom_filter skip index on `correlation_id` (already exists) |
| SRCH-05 (#11) | Full-text search across bodies, headers, errors, stack traces | LIKE '%term%' on text columns + tokenbf_v1 skip indexes; schema extension needed for body/header storage |
| SRCH-06 (#12) | Transaction detail with nested processor execution tree | Reconstruct tree from parallel arrays using processor_depths + processor_parent_indexes; ARRAY JOIN query |
| DIAG-01 (#20) | Content-addressable diagram versioning | Already implemented: ReplacingMergeTree with SHA-256 content_hash |
| DIAG-02 (#21) | Transaction links to active diagram version | Add `diagram_content_hash` column to `route_executions`; populated at ingestion from latest diagram |
| DIAG-03 (#22) | Server renders route diagrams from stored definitions | Eclipse ELK for layout + JFreeSVG for SVG output; JSON layout alternative via Accept header |

## Standard Stack

### Core (already in project)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Spring Boot | 3.4.3 | Web framework, DI, JdbcTemplate | Already established in Phase 1 |
| ClickHouse JDBC | 0.9.7 | Database driver | Already established in Phase 1 |
| Jackson | 2.17.3 | JSON serialization | Already established in Phase 1 |
| springdoc-openapi | 2.8.6 | API documentation | Already established in Phase 1 |
| Testcontainers | 2.0.3 | ClickHouse integration tests | Already established in Phase 1 |

### New for Phase 2
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| Eclipse ELK Core | 0.11.0 | Graph layout algorithm (layered/hierarchical) | Diagram node positioning |
| Eclipse ELK Layered | 0.11.0 | Sugiyama-style top-to-bottom layout | The actual layout algorithm |
| JFreeSVG | 5.0.7 | Programmatic SVG generation via Graphics2D API | Rendering diagram to SVG string |

### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Eclipse ELK | Manual layout algorithm | ELK handles edge crossing minimization, node spacing, layer assignment -- non-trivial to implement correctly |
| JFreeSVG | Apache Batik | Batik needs roughly 50x the memory of JSVG (JSVG reports ~98% less); JFreeSVG is lightweight, ~5x faster, zero dependencies beyond JDK 11+ |
| JFreeSVG | Manual SVG string building | JFreeSVG handles SVG escaping, coordinate systems, text metrics correctly; manual strings are error-prone |
| Separate exchange table | JSON columns on route_executions | Separate table adds JOINs; JSON strings on the main table keep queries simple and align with "fetch snapshot separately" pattern |

**Installation (new dependencies for app module pom.xml):**
```xml
<dependency>
  <groupId>org.eclipse.elk</groupId>
  <artifactId>org.eclipse.elk.core</artifactId>
  <version>0.11.0</version>
</dependency>
<dependency>
  <groupId>org.eclipse.elk</groupId>
  <artifactId>org.eclipse.elk.alg.layered</artifactId>
  <version>0.11.0</version>
</dependency>
<dependency>
  <groupId>org.jfree</groupId>
  <artifactId>org.jfree.svg</artifactId>
  <version>5.0.7</version>
</dependency>
```

## Architecture Patterns

### Recommended Project Structure (additions for Phase 2)
```
cameleer3-server-core/src/main/java/com/cameleer3/server/core/
├── search/
│   ├── SearchService.java           # Orchestrates search, delegates to SearchEngine
│   ├── SearchEngine.java            # Interface for search backends (ClickHouse now, OpenSearch later)
│   ├── SearchRequest.java           # Immutable search criteria record
│   └── SearchResult.java            # Paginated result envelope record
├── detail/
│   ├── DetailService.java           # Reconstructs execution tree from flat data
│   └── ExecutionDetail.java         # Rich detail model with nested tree
├── diagram/
│   └── DiagramRenderer.java         # Interface: render RouteGraph -> SVG or JSON layout
└── storage/
    ├── ExecutionRepository.java     # Extended with query methods
    └── DiagramRepository.java       # Extended with lookup methods

cameleer3-server-app/src/main/java/com/cameleer3/server/app/
├── controller/
│   ├── SearchController.java        # GET + POST /api/v1/search/executions
│   ├── DetailController.java        # GET /api/v1/executions/{id}
│   └── DiagramRenderController.java # GET /api/v1/diagrams/{hash} with content negotiation
├── search/
│   └── ClickHouseSearchEngine.java  # SearchEngine impl using JdbcTemplate
├── diagram/
│   ├── ElkDiagramRenderer.java      # DiagramRenderer impl: ELK layout + JFreeSVG
│   └── DiagramLayoutResult.java     # JSON layout format DTO
└── storage/
    └── ClickHouseExecutionRepository.java # Extended with query + detail methods
```

### Pattern 1: Search Engine Abstraction (for future OpenSearch swap)
**What:** Interface in core module, ClickHouse implementation in app module
**When to use:** All search operations go through this interface
**Example:**
```java
// Core module: search engine interface
// (ExecutionSummary is the list-view projection type; name illustrative)
public interface SearchEngine {
    SearchResult<ExecutionSummary> search(SearchRequest request);
    long count(SearchRequest request);
}

// Core module: search service orchestrates
public class SearchService {
    private final SearchEngine engine;

    public SearchService(SearchEngine engine) {
        this.engine = engine;
    }

    public SearchResult<ExecutionSummary> search(SearchRequest request) {
        return engine.search(request);
    }
}

// App module: ClickHouse implementation
@Repository
public class ClickHouseSearchEngine implements SearchEngine {
    private final JdbcTemplate jdbcTemplate;

    public ClickHouseSearchEngine(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public SearchResult<ExecutionSummary> search(SearchRequest request) {
        // Build dynamic WHERE clause from SearchRequest
        // Execute against route_executions table
    }
}
```

### Pattern 2: Dynamic SQL Query Building
**What:** Build WHERE clauses from optional filter parameters
**When to use:** Search queries with combinable filters
**Example:**
```java
public SearchResult<ExecutionSummary> search(SearchRequest req) {
    var conditions = new ArrayList<String>();
    var params = new ArrayList<Object>();

    if (req.status() != null) {
        conditions.add("status = ?");
        params.add(req.status().name());
    }
    if (req.timeFrom() != null) {
        conditions.add("start_time >= ?");
        params.add(Timestamp.from(req.timeFrom()));
    }
    if (req.text() != null) {
        conditions.add("(error_message LIKE ? OR error_stacktrace LIKE ? OR exchange_bodies LIKE ? OR exchange_headers LIKE ?)");
        String pattern = "%" + escapeLike(req.text()) + "%";
        params.addAll(List.of(pattern, pattern, pattern, pattern));
    }

    String where = conditions.isEmpty() ? "" : "WHERE " + String.join(" AND ", conditions);
    String countSql = "SELECT count() FROM route_executions " + where;
    String dataSql = "SELECT ... FROM route_executions " + where
        + " ORDER BY start_time DESC LIMIT ? OFFSET ?";
    // ...
}
```

### Pattern 3: Processor Tree Reconstruction
**What:** Rebuild nested tree from flat parallel arrays using depth + parent index
**When to use:** Transaction detail endpoint
**Example:**
```java
// At ingestion: compute depth and parent index while flattening
private record FlatProcessor(ProcessorExecution proc, int depth, int parentIndex) {}

private List<FlatProcessor> flattenWithMetadata(List<ProcessorExecution> processors) {
    var result = new ArrayList<FlatProcessor>();
    flattenRecursive(processors, 0, -1, result);
    return result;
}

private void flattenRecursive(List<ProcessorExecution> procs, int depth, int parentIdx,
                              List<FlatProcessor> result) {
    for (ProcessorExecution p : procs) {
        int myIndex = result.size();
        result.add(new FlatProcessor(p, depth, parentIdx));
        if (p.getChildren() != null) {
            flattenRecursive(p.getChildren(), depth + 1, myIndex, result);
        }
    }
}

// At query: reconstruct tree from arrays
public List<ProcessorNode> reconstructTree(String[] ids, String[] types, int[] depths, int[] parents, ...) {
    var nodes = new ProcessorNode[ids.length];
    for (int i = 0; i < ids.length; i++) {
        nodes[i] = new ProcessorNode(ids[i], types[i], ...);
    }
    var roots = new ArrayList<ProcessorNode>();
    for (int i = 0; i < nodes.length; i++) {
        if (parents[i] == -1) {
            roots.add(nodes[i]);
        } else {
            nodes[parents[i]].addChild(nodes[i]);
        }
    }
    return roots;
}
```

### Anti-Patterns to Avoid
- **Building full SQL strings with concatenation:** Use parameterized queries with `?` placeholders to prevent SQL injection, even for ClickHouse
- **Returning all columns in search results:** Search list endpoint should return summary (id, routeId, status, time, duration, correlationId, errorMessage) -- not the full processor arrays or body data
- **Inlining exchange snapshots in tree response:** Decision explicitly states snapshots are fetched separately per processor to keep tree response lightweight
- **Coupling to ClickHouse SQL in the service layer:** Keep ClickHouse-specific SQL in repository/engine implementations; services work with domain objects only

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Graph layout (node positioning) | Custom layered layout algorithm | Eclipse ELK Layered | Sugiyama algorithm has 5 phases (cycle breaking, layer assignment, crossing minimization, node placement, edge routing) -- each is a research paper |
| SVG generation | String concatenation of SVG XML | JFreeSVG SVGGraphics2D | Handles text metrics, coordinate transforms, SVG escaping, viewBox computation |
| LIKE pattern escaping | Manual string replace | Utility method that escapes `%`, `_`, `\` | ClickHouse LIKE uses these as wildcards; unescaped user input breaks queries or enables injection |
| Pagination math | Ad-hoc offset/limit calculations | Reusable `PageRequest` record | Off-by-one errors, negative offsets, exceeding total count |
| Content hash computation | Inline SHA-256 logic | Reuse `ClickHouseDiagramRepository.sha256Hex()` or extract to utility | Already implemented correctly in Phase 1 |

**Key insight:** The diagram rendering pipeline (graph model to positioned layout to SVG output) involves three distinct concerns. Mixing layout logic with rendering logic creates an untestable mess. ELK handles layout, JFreeSVG handles rendering, and your code just bridges them.

## Common Pitfalls

### Pitfall 1: ClickHouse LIKE on Non-Indexed Columns is Full Scan
**What goes wrong:** `LIKE '%term%'` on a column without a skip index scans every granule, making queries slow at scale
**Why it happens:** Unlike PostgreSQL, ClickHouse has no built-in trigram index; skip indexes (tokenbf_v1) are the only acceleration for LIKE
**How to avoid:** Add `tokenbf_v1` skip indexes on ALL text-searchable columns (error_message already has one; add for exchange_bodies, exchange_headers). The existing `idx_error` index is the template
**Warning signs:** Search queries taking > 1 second on test data; query EXPLAIN showing all granules scanned

### Pitfall 2: tokenbf_v1 Does Not Accelerate Substring LIKE
**What goes wrong:** `tokenbf_v1` indexes work with token-based matching (hasToken, =) but do NOT skip granules for arbitrary substring LIKE '%partial%' patterns. The index helps when the search term matches complete tokens
**Why it happens:** Bloom filters check token membership, not substring containment. A search for '%ord%' cannot be answered from a filter that only knows the whole token "order", so no granules can be skipped
**How to avoid:** Accept this limitation for v1 (documented in CONTEXT.md). The LIKE query still works correctly, just without index acceleration for partial-word matches. For common searches (error messages, stack traces), users typically search for complete words or phrases, where tokenbf_v1 helps. If performance is insufficient, this is the trigger to evaluate OpenSearch
**Warning signs:** Slow searches on short substring patterns; users reporting "search is slow for partial words"

### Pitfall 3: ClickHouse Text Index Requires Version 26.2+
**What goes wrong:** Attempting to use the newer GA `text` index type on ClickHouse 25.3 fails or requires experimental settings
**Why it happens:** The project uses ClickHouse 25.3 (see docker-compose and AbstractClickHouseIT). The GA text index with direct-read optimization is only in 26.2+
**How to avoid:** Stick with `tokenbf_v1` and `ngrambf_v1` skip indexes for Phase 2. These are stable and well-supported on 25.3. Consider upgrading ClickHouse version later if full-text performance demands it
**Warning signs:** Schema DDL errors mentioning "unknown index type text"

### Pitfall 4: Parallel Array ARRAY JOIN Produces Cartesian Product
**What goes wrong:** Using multiple `ARRAY JOIN` clauses on different array groups produces a cartesian product instead of aligned expansion
**Why it happens:** ClickHouse ARRAY JOIN expands one set of arrays at a time; multiple ARRAY JOINs multiply rows
**How to avoid:** For transaction detail, either (a) use a single ARRAY JOIN on all processor arrays together (they are parallel and same length), or (b) fetch the raw arrays and reconstruct in Java. Recommendation: fetch raw arrays and reconstruct in Java -- this gives full control over tree building and avoids SQL complexity
**Warning signs:** Query returning N^2 rows instead of N rows; detail endpoint returning wrong processor counts

### Pitfall 5: Eclipse ELK Requires Explicit Algorithm Registration
**What goes wrong:** ELK layout returns empty or throws exception because no layout algorithm is registered
**Why it happens:** ELK uses a service-loader pattern; the layered algorithm must be on classpath AND may need explicit registration depending on how it's loaded
**How to avoid:** Include both `org.eclipse.elk.core` and `org.eclipse.elk.alg.layered` dependencies. Use `RecursiveGraphLayoutEngine` and set layout algorithm property to `LayeredOptions.ALGORITHM_ID`
**Warning signs:** NullPointerException or empty layout results from ELK

### Pitfall 6: ClickHouse ORDER BY Determines Primary Index Efficiency
**What goes wrong:** Filters on columns NOT in the ORDER BY key (like `duration_ms`) scan more granules
**Why it happens:** ClickHouse primary index is sparse and follows ORDER BY column order. `route_executions` ORDER BY is `(agent_id, status, start_time, execution_id)`. Duration is not indexed
**How to avoid:** Accept that duration range queries are less efficient than status/time queries. This is fine for the expected query patterns (users usually filter by time first, then refine). If duration-first queries become common, consider a materialized view with different ORDER BY
**Warning signs:** Duration-only queries scanning excessive data

## Code Examples

### ClickHouse Schema Extension for Phase 2
```sql
-- Migration: add exchange snapshot storage and tree reconstruction metadata
ALTER TABLE route_executions
    ADD COLUMN IF NOT EXISTS exchange_bodies String DEFAULT '',
    ADD COLUMN IF NOT EXISTS exchange_headers String DEFAULT '',
    ADD COLUMN IF NOT EXISTS processor_depths Array(UInt16) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_parent_indexes Array(Int32) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_error_messages Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_error_stacktraces Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_input_bodies Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_output_bodies Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_input_headers Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_output_headers Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS processor_diagram_node_ids Array(String) DEFAULT [],
    ADD COLUMN IF NOT EXISTS diagram_content_hash String DEFAULT '';

-- Skip indexes for full-text search on new columns
ALTER TABLE route_executions
    ADD INDEX IF NOT EXISTS idx_exchange_bodies exchange_bodies TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4,
    ADD INDEX IF NOT EXISTS idx_exchange_headers exchange_headers TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4;
```

### Search Request Record (core module)
```java
public record SearchRequest(
    ExecutionStatus status,
    Instant timeFrom,
    Instant timeTo,
    Long durationMin,
    Long durationMax,
    String correlationId,
    String text,          // global full-text across all fields
    String textInBody,    // per-field targeting
    String textInHeaders,
    String textInErrors,
    int offset,
    int limit
) {
    public SearchRequest {
        if (limit <= 0) limit = 50;
        if (limit > 500) limit = 500;
        if (offset < 0) offset = 0;
    }
}
```

### Paginated Result Envelope (core module)
```java
public record SearchResult<T>(
    List<T> data,
    long total,
    int offset,
    int limit
) {
    public static <T> SearchResult<T> empty(int offset, int limit) {
        return new SearchResult<>(List.of(), 0, offset, limit);
    }
}
```

### ELK Layout Integration Pattern
```java
// Convert RouteGraph to ELK graph, run layout, extract positions
public DiagramLayout layoutGraph(RouteGraph graph) {
    ElkNode rootNode = ElkGraphUtil.createGraph();
    // Set layout options
    rootNode.setProperty(CoreOptions.ALGORITHM, LayeredOptions.ALGORITHM_ID);
    rootNode.setProperty(CoreOptions.DIRECTION, Direction.DOWN); // top-to-bottom
    rootNode.setProperty(LayeredOptions.SPACING_NODE_NODE, 40.0);
    rootNode.setProperty(LayeredOptions.SPACING_EDGE_NODE, 20.0);

    // Create ELK nodes from RouteGraph nodes
    Map<String, ElkNode> elkNodes = new HashMap<>();
    for (RouteNode node : graph.getNodes()) {
        ElkNode elkNode = ElkGraphUtil.createNode(rootNode);
        elkNode.setIdentifier(node.getId());
        elkNode.setWidth(estimateWidth(node));
        elkNode.setHeight(estimateHeight(node));
        elkNodes.put(node.getId(), elkNode);
    }

    // Create ELK edges from RouteGraph edges
    for (RouteEdge edge : graph.getEdges()) {
        ElkEdge elkEdge = ElkGraphUtil.createSimpleEdge(
            elkNodes.get(edge.getSource()),
            elkNodes.get(edge.getTarget())
        );
    }

    // Run layout
    new RecursiveGraphLayoutEngine().layout(rootNode, new BasicProgressMonitor());

    // Extract positions into DiagramLayout
    return extractLayout(rootNode, elkNodes);
}
```

### SVG Rendering with JFreeSVG
```java
public String renderSvg(RouteGraph graph, DiagramLayout layout) {
    SVGGraphics2D g2 = new SVGGraphics2D(layout.width(), layout.height());

    // Draw edges first (behind nodes)
    g2.setStroke(new BasicStroke(2f));
    for (var edge : layout.edges()) {
        g2.setColor(Color.GRAY);
        g2.drawLine(edge.x1(), edge.y1(), edge.x2(), edge.y2());
    }

    // Draw nodes with type-based colors
    for (var positioned : layout.nodes()) {
        Color fill = colorForNodeType(positioned.node().getType());
        g2.setColor(fill);
        g2.fillRoundRect(positioned.x(), positioned.y(), positioned.width(), positioned.height(), 8, 8);
        g2.setColor(Color.WHITE);
        g2.drawString(positioned.node().getLabel(), positioned.x() + 8, positioned.y() + 20);
    }

    return g2.getSVGDocument();
}

private Color colorForNodeType(NodeType type) {
    return switch (type) {
        case ENDPOINT, TO, TO_DYNAMIC, DIRECT, SEDA -> new Color(59, 130, 246);   // blue
        case PROCESSOR, BEAN, LOG, SET_HEADER, SET_BODY, TRANSFORM, MARSHAL, UNMARSHAL
            -> new Color(34, 197, 94);   // green
        case ERROR_HANDLER, ON_EXCEPTION, TRY_CATCH, DO_TRY, DO_CATCH, DO_FINALLY
            -> new Color(239, 68, 68);   // red
        default -> new Color(168, 85, 247); // purple for EIPs
    };
}
```

### Exchange Snapshot Storage Approach
```java
// At ingestion: serialize exchange data per processor into JSON strings
// for the parallel arrays. Concatenate all bodies/headers into searchable columns.
private void populateExchangeColumns(PreparedStatement ps, List<FlatProcessor> processors,
                                     RouteExecution exec, int col) throws SQLException {
    // col: parameter index of exchange_bodies in the INSERT, supplied by the caller

    // Concatenated searchable text (for LIKE queries)
    StringBuilder allBodies = new StringBuilder();
    StringBuilder allHeaders = new StringBuilder();

    String[] inputBodies = new String[processors.size()];
    String[] outputBodies = new String[processors.size()];
    String[] inputHeaders = new String[processors.size()];
    String[] outputHeaders = new String[processors.size()];

    for (int i = 0; i < processors.size(); i++) {
        ProcessorExecution p = processors.get(i).proc();
        inputBodies[i] = nullSafe(p.getInputBody());
        outputBodies[i] = nullSafe(p.getOutputBody());
        inputHeaders[i] = mapToJson(p.getInputHeaders());
        outputHeaders[i] = mapToJson(p.getOutputHeaders());

        allBodies.append(inputBodies[i]).append(' ').append(outputBodies[i]).append(' ');
        allHeaders.append(inputHeaders[i]).append(' ').append(outputHeaders[i]).append(' ');
    }

    // Also include route-level input/output snapshot
    if (exec.getInputSnapshot() != null) {
        allBodies.append(nullSafe(exec.getInputSnapshot().getBody())).append(' ');
        allHeaders.append(mapToJson(exec.getInputSnapshot().getHeaders())).append(' ');
    }

    ps.setString(col++, allBodies.toString());   // exchange_bodies (searchable)
    ps.setString(col++, allHeaders.toString());  // exchange_headers (searchable)
    ps.setObject(col++, inputBodies);
    ps.setObject(col++, outputBodies);
    ps.setObject(col++, inputHeaders);
    ps.setObject(col++, outputHeaders);
}
```

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| ClickHouse tokenbf_v1 for full-text | ClickHouse native text index (inverted) | GA in 26.2 (late 2025) | 7-10x faster cold queries; direct-read optimization. Not available on our 25.3 |
| Apache Batik for SVG | JSVG/JFreeSVG | ~2023 adoption wave | 98% less memory (JSVG), 5x faster generation (JFreeSVG) |
| Manual graph layout | Eclipse ELK | Stable since 0.7+ | Production-grade Sugiyama algorithm with compound node support |
| ClickHouse Map(String,String) | ClickHouse JSON type | July 2025 | 9x faster queries. Not critical for Phase 2 since we store serialized JSON strings |

**Deprecated/outdated:**
- `allow_experimental_full_text_index` setting: replaced by `enable_full_text_index` in newer ClickHouse versions. Neither needed for tokenbf_v1 skip indexes (our approach)
- Apache Batik for generation-only use cases: heavyweight, SVG 1.1 only, excessive memory. Use JFreeSVG instead

## Open Questions

1. **ClickHouse 25.3 tokenbf_v1 performance at scale with LIKE '%term%'**
   - What we know: tokenbf_v1 accelerates token-based queries (hasToken, =) well but LIKE with leading wildcard may not benefit from the skip index
   - What's unclear: Exact performance characteristics at millions of rows with LIKE on 25.3
   - Recommendation: Implement with tokenbf_v1, add ngrambf_v1 indexes as well for substring acceleration. Benchmark during integration testing. This is the documented trigger point for evaluating OpenSearch

2. **Eclipse ELK compound node support for swimlanes**
   - What we know: ELK Layered supports hierarchical/compound nodes where child nodes can be laid out inside parent nodes
   - What's unclear: Exact API for creating compound nodes to represent for-each/split/try-catch swimlanes
   - Recommendation: Start with flat layout first, then add compound nodes for nesting as an enhancement. The ELK compound node feature maps directly to the swimlane requirement

3. **Exchange snapshot data volume impact on ClickHouse performance**
   - What we know: Bodies and headers can be large (JSON payloads, XML messages). Storing all of it (per user decision: no truncation) increases storage and scan cost
   - What's unclear: Real-world data volume impact on query performance
   - Recommendation: Use String columns (not JSON type) for searchable text. The concatenated `exchange_bodies` and `exchange_headers` columns enable LIKE search without ARRAY JOIN. Per-processor detail arrays are fetched only for the detail endpoint (single row)

## Validation Architecture

### Test Framework
| Property | Value |
|----------|-------|
| Framework | JUnit 5 + Spring Boot Test + Testcontainers ClickHouse 25.3 |
| Config file | cameleer3-server-app/pom.xml (testcontainers dep), AbstractClickHouseIT base class |
| Quick run command | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT -Dfailsafe.skip=true` |
| Full suite command | `mvn clean verify` |

### Phase Requirements -> Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|--------------|
| SRCH-01 | Filter by status returns matching executions | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByStatus` | No -- Wave 0 |
| SRCH-02 | Filter by time range returns matching executions | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByTimeRange` | No -- Wave 0 |
| SRCH-03 | Filter by duration range returns matching | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByDuration` | No -- Wave 0 |
| SRCH-04 | Filter by correlationId returns correlated | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByCorrelationId` | No -- Wave 0 |
| SRCH-05 | Full-text search across bodies/headers/errors | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#fullTextSearch` | No -- Wave 0 |
| SRCH-06 | Detail returns nested processor tree | integration | `mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT#detailReturnsNestedTree` | No -- Wave 0 |
| DIAG-01 | Content-hash dedup stores identical defs once | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramControllerIT#contentHashDedup` | Partial (ingestion test exists) |
| DIAG-02 | Transaction links to active diagram version | integration | `mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT#detailIncludesDiagramHash` | No -- Wave 0 |
| DIAG-03 | Diagram rendered as SVG or JSON layout | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramRenderControllerIT#renderSvg` | No -- Wave 0 |

### Sampling Rate
- **Per task commit:** `mvn test -pl cameleer3-server-app -Dtest=IT`
- **Per wave merge:** `mvn clean verify`
- **Phase gate:** Full suite green before `/gsd:verify-work`

### Wave 0 Gaps
- [ ] `SearchControllerIT.java` -- covers SRCH-01 through SRCH-05
- [ ] `DetailControllerIT.java` -- covers SRCH-06, DIAG-02
- [ ] `DiagramRenderControllerIT.java` -- covers DIAG-03
- [ ] `TreeReconstructionTest.java` -- unit test for tree rebuild logic (core module)
- [ ] Schema migration script `02-search-columns.sql` -- extends schema for Phase 2 columns
- [ ] Update `AbstractClickHouseIT.initSchema()` to load both `01-schema.sql` and `02-search-columns.sql`

## Sources

### Primary (HIGH confidence)
- ClickHouse JDBC 0.9.7, ClickHouse 25.3 -- verified from project pom.xml and AbstractClickHouseIT
- cameleer3-common 1.0-SNAPSHOT JAR -- decompiled to verify RouteGraph, RouteNode, RouteEdge, NodeType, ProcessorExecution, ExchangeSnapshot field structures
- Existing Phase 1 codebase -- ClickHouseExecutionRepository, ClickHouseDiagramRepository, schema, test patterns

### Secondary (MEDIUM confidence)
- [ClickHouse Text Indexes docs](https://clickhouse.com/docs/engines/table-engines/mergetree-family/textindexes) -- GA in 26.2, experimental settings for 25.3
- [ClickHouse Full-Text Search blog](https://clickhouse.com/blog/clickhouse-full-text-search) -- tokenbf_v1 limitations vs text index
- [Eclipse ELK Layered reference](https://eclipse.dev/elk/reference/algorithms/org-eclipse-elk-layered.html) -- algorithm details, properties
- [JFreeSVG GitHub](https://github.com/jfree/jfreesvg) -- version 5.0.7, Java 11+ requirement, SVGGraphics2D API
- [Maven Central: org.eclipse.elk](https://mvnrepository.com/artifact/org.eclipse.elk) -- version 0.11.0 available

### Tertiary (LOW confidence)
- Eclipse ELK compound node API for swimlanes -- not directly verified from docs; based on ELK architecture description of hierarchical layout support
- ngrambf_v1 acceleration of substring LIKE patterns -- mentioned in ClickHouse community but exact behavior with leading wildcards needs testing

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH -- building directly on established Phase 1 patterns with JdbcTemplate
- Architecture: HIGH -- search abstraction layer, dynamic SQL, tree reconstruction are well-understood patterns
- Pitfalls: HIGH -- ClickHouse LIKE/index behavior well-documented; ELK registration pattern from official docs
- Diagram rendering: MEDIUM -- ELK + JFreeSVG individually well-documented, but the integration (especially swimlanes) needs implementation-time validation

**Research date:** 2026-03-11
**Valid until:** 2026-04-10 (stable stack, no fast-moving dependencies)