Commit Graph

287 Commits

Author SHA1 Message Date
hsiegeln
3c0e615fb7 fix: use timescaledb-ha image which includes toolkit extension 2026-03-16 19:13:47 +01:00
hsiegeln
589da1b6d6 fix: use asCompatibleSubstituteFor for TimescaleDB Testcontainer image 2026-03-16 19:06:54 +01:00
hsiegeln
41e2038190 fix: use ChronoUnit for Instant arithmetic in PostgresStatsStoreIT 2026-03-16 19:04:42 +01:00
hsiegeln
ea687a342c deploy: remove obsolete ClickHouse K8s manifest 2026-03-16 19:01:26 +01:00
hsiegeln
cea16b38ed ci: update workflow for PostgreSQL + OpenSearch deployment
Replace ClickHouse credentials secret with postgres-credentials and
opensearch-credentials secrets. Update deploy step to apply postgres.yaml
and opensearch.yaml manifests instead of clickhouse.yaml, with appropriate
rollout status checks for each StatefulSet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 19:00:20 +01:00
hsiegeln
a344be3a49 deploy: replace ClickHouse with PostgreSQL/TimescaleDB + OpenSearch in K8s manifests
- Dockerfile: update default SPRING_DATASOURCE_URL to jdbc:postgresql, add OPENSEARCH_URL default env
- deploy/postgres.yaml: new TimescaleDB StatefulSet + headless Service (10Gi PVC, pg_isready probes)
- deploy/opensearch.yaml: new OpenSearch 2.19.0 StatefulSet + headless Service (10Gi PVC, single-node, security disabled)
- deploy/server.yaml: switch datasource env from clickhouse-credentials to postgres-credentials, add OPENSEARCH_URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:58:35 +01:00
hsiegeln
565b548ac1 refactor: remove all ClickHouse code, old interfaces, and SQL migrations
- Delete all ClickHouse storage implementations and config
- Delete old core interfaces (ExecutionRepository, DiagramRepository, MetricsRepository, SearchEngine, RawExecutionRow)
- Delete ClickHouse SQL migration files
- Delete AbstractClickHouseIT
- Update controllers to use new store interfaces (DiagramStore, ExecutionStore)
- Fix IngestionService calls in controllers for new synchronous API
- Migrate all ITs from AbstractClickHouseIT to AbstractPostgresIT
- Fix count() syntax and remove ClickHouse-specific test assertions
- Update TreeReconstructionTest for new buildTree() method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:56:13 +01:00
hsiegeln
7dbfaf0932 feat: wire new storage beans, add MetricsFlushScheduler and RetentionScheduler
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:27:58 +01:00
hsiegeln
f7d7302694 feat: implement OpenSearchIndex with full-text and wildcard search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:25:55 +01:00
hsiegeln
c48e0bdfde feat: implement debounced SearchIndexer for async OpenSearch indexing 2026-03-16 18:25:54 +01:00
hsiegeln
5932b5d969 feat: implement PostgresDiagramStore, PostgresUserRepository, PostgresOidcConfigRepository, PostgresMetricsStore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:23:21 +01:00
hsiegeln
527e2cf017 feat: implement PostgresStatsStore querying continuous aggregates 2026-03-16 18:22:44 +01:00
hsiegeln
9fd02c4edb feat: implement PostgresExecutionStore with upsert and dedup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:20:57 +01:00
hsiegeln
85ebe76111 refactor: IngestionService uses synchronous ExecutionStore writes with event publishing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:18:54 +01:00
hsiegeln
adf4b44d78 refactor: DetailService uses ExecutionStore, tree built from parentProcessorId 2026-03-16 18:18:53 +01:00
hsiegeln
84b93d74c7 refactor: SearchService uses SearchIndex + StatsStore instead of SearchEngine 2026-03-16 18:18:52 +01:00
hsiegeln
a55fc3c10d feat: add new storage interfaces for PostgreSQL/OpenSearch backends
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:16:53 +01:00
hsiegeln
55ed3be71a feat: add ExecutionDocument model and ExecutionUpdatedEvent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:16:42 +01:00
hsiegeln
41a9a975fd config: switch datasource to PostgreSQL, add OpenSearch and Flyway config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:15:33 +01:00
hsiegeln
0eeae70369 test: add TimescaleDB test base class and Flyway migration smoke test 2026-03-16 18:15:32 +01:00
hsiegeln
8a637df65c feat: add Flyway migrations for PostgreSQL/TimescaleDB schema
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:13:53 +01:00
hsiegeln
5bed108d3b chore: swap ClickHouse deps for PostgreSQL, Flyway, OpenSearch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:13:45 +01:00
hsiegeln
ccc3f9fd92 Add storage layer refactor spec and implementation plan
All checks were successful
CI / build (push) Successful in 1m25s
CI / docker (push) Successful in 21s
CI / deploy (push) Successful in 32s
Design to replace ClickHouse with PostgreSQL/TimescaleDB + OpenSearch.
PostgreSQL as source of truth with continuous aggregates for analytics,
OpenSearch for full-text wildcard search. 21-task implementation plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:05:16 +01:00
hsiegeln
5ee78f7673 Reduce chart grid noise: subtle dashed Y-grid only, no X-grid or ticks
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 52s
CI / deploy (push) Successful in 30s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 16:17:38 +01:00
hsiegeln
8c605d7523 Fix missing avg duration comparison on route Performance tab
All checks were successful
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 50s
CI / deploy (push) Successful in 29s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 16:14:11 +01:00
hsiegeln
4ea6814bb3 Fix unused import in AppScopedView
All checks were successful
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 53s
CI / deploy (push) Successful in 34s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 15:49:35 +01:00
hsiegeln
7fd8a787d0 UI overhaul: unified sidebar layout with app-scoped views
Some checks failed
CI / build (push) Failing after 48s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
Replace disconnected Transactions/Applications pages with a persistent
collapsible sidebar listing apps by health status. Add app-scoped view
(/apps/:group) with filtered stats, route chips, and scoped table.
Merge Processor Tree into diagram detail panel with Inspector/Tree
toggle and resizable divider. Remove max-width constraint for full
viewport usage. All view states are deep-linkable via URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 15:47:33 +01:00
hsiegeln
0b56590e3f Fix Swagger UI CORS: add /api/v1 server URL to OpenAPI spec
All checks were successful
CI / build (push) Successful in 1m15s
CI / docker (push) Successful in 44s
CI / deploy (push) Successful in 29s
The empty servers list caused Swagger UI to construct request URLs
without the /api/v1 prefix, resulting in CORS/fetch failures.
Adding a relative server entry makes paths resolve correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 14:59:12 +01:00
hsiegeln
7dec8fbaff Add embedded Swagger UI page with auto-injected JWT auth
All checks were successful
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m9s
CI / deploy (push) Successful in 31s
- New /swagger route with lazy-loaded SwaggerPage that initializes
  swagger-ui-dist and injects the session JWT via requestInterceptor
- Move API link from primary nav to navRight utility area (pill style)
- Code-split swagger chunk (~1.4 MB) so main bundle stays lean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 14:51:15 +01:00
hsiegeln
e466dc5861 Add API link in nav bar pointing to Swagger UI
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 51s
CI / deploy (push) Successful in 35s
Opens /api/v1/swagger-ui in a new tab for manual endpoint testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 14:35:23 +01:00
hsiegeln
cc39ca3084 Fix stats query storm: stabilize time params to 10s granularity
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 51s
CI / deploy (push) Successful in 31s
PerformanceTab and RouteHeader computed new Date().toISOString() on every
render, producing unique millisecond timestamps that busted the React Query
cache key — causing continuous refetches (every few ms instead of 10s).
Round timestamps to 10-second boundaries with useMemo so the query key
stays stable between renders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 14:25:07 +01:00
hsiegeln
48d944354a Fix ClickHouse OOM: PREWHERE on active-count query + per-query memory limits
All checks were successful
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 33s
The active-count query scanned all wide rows on the base table, exceeding
the 3.6 GiB memory limit. Use PREWHERE status = 'RUNNING' so ClickHouse
reads only the status column first. Add SETTINGS max_memory_usage = 1 GiB
to all queries so concurrent requests degrade gracefully instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:55:26 +01:00
hsiegeln
61a9549853 UX overhaul: 1-click row navigation, Exchange tab, Applications page (#69)
All checks were successful
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 49s
CI / deploy (push) Successful in 32s
Row click in ExecutionExplorer now navigates directly to RoutePage with
View Transition instead of expanding an inline panel. Route column is a
clickable link for context-free navigation. Search state syncs to URL
params for back-nav preservation, and previously-visited rows flash on
return. RoutePage gains an Exchange tab showing execution metadata/body/
errors. New /apps page lists application groups with status and route
links, accessible from TopNav.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:40:03 +01:00
hsiegeln
5ad0c75da8 Truncate rollup params to second precision for DateTime column
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 42s
CI / deploy (push) Successful in 29s
The JDBC driver sends java.sql.Timestamp with nanoseconds as a string
(e.g. '2026-03-15 10:13:58.105931162') which DateTime('UTC') rejects.
Add bucketTimestamp() helper that truncates to seconds for all rollup
query parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:15:42 +01:00
hsiegeln
8e6f8e2693 Fix bucket alignment: compute 5-min floor in Java, not ClickHouse SQL
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 31s
JDBC sends Timestamp params as strings, causing toStartOfFiveMinutes()
to fail with 'Illegal type String'. Floor to 5-minute boundaries in
Java instead and pass plain bucket >= ? comparisons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:10:40 +01:00
hsiegeln
f660e88a17 Fix rollup queries: alias shadowed AggregateFunction column name
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 31s
countMerge(total_count) in the avg expression resolved to the UInt64
alias 'total_count' instead of the AggregateFunction column. Rename
SELECT aliases (cnt, failed, avg_ms, p99_ms) to avoid shadowing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:05:10 +01:00
hsiegeln
035356288f Fix stats rollup: AggregateFunction(count) takes no type argument
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 31s
ClickHouse count() accepts no arguments, so the column type must be
AggregateFunction(count) not AggregateFunction(count, UInt64). The
latter causes countMerge() to fail with ILLEGAL_TYPE_OF_ARGUMENT.
Drop and recreate the table/MV to apply the corrected schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:52:29 +01:00
hsiegeln
adf13f0430 Add 5-minute AggregatingMergeTree stats rollup for dashboard queries
All checks were successful
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 42s
CI / deploy (push) Successful in 30s
Pre-aggregate route execution stats into 5-minute buckets using a
materialized view with -State/-Merge combinators. Rewrite stats() and
timeseries() to query the rollup table instead of scanning the wide
base table. Active count remains a real-time query since RUNNING is
transient. Includes idempotent backfill migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:46:26 +01:00
6f7c92f793 cameleer3-server-app/src/main/java/com/cameleer3/server/app/diagram/ElkDiagramRenderer.java aktualisiert
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 30s
increased node spacing to 90
2026-03-15 10:37:56 +01:00
hsiegeln
520590fbf4 Increase ELK node spacing and revert frontend node height to 40
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 51s
CI / deploy (push) Successful in 33s
NODE_SPACING 40→60 gives edges more vertical room between nodes.
FIXED_H reverted to 40 to match backend NODE_HEIGHT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:31:06 +01:00
hsiegeln
7b9dc32d6a Increase node height, add styled tooltips, make legend collapsible
All checks were successful
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 53s
CI / deploy (push) Successful in 30s
- #68: Increase FIXED_H from 40→52 for better edge visibility
- #67: Replace native <title> tooltips with styled HTML overlay
  showing node type, label, execution status and duration
- #66: Legend starts collapsed as small pill, expands on click
  with close button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:18:28 +01:00
hsiegeln
8961a5a63c Remove unused Sparkline.tsx (replaced by MiniChart)
All checks were successful
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 30s
Closes #52

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 22:16:27 +01:00
hsiegeln
a108b57591 Fix route diagram open issues: bugs, visual polish, interactive features
Some checks failed
CI / build (push) Successful in 1m12s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
Batch 1 — Bug fixes:
- #51: Pass group+routeId to stats/timeseries API for route-scoped data
- #55: Propagate processor FAILED status to diagram error node highlighting

Batch 2 — Visual polish:
- #56: Brighter canvas background with amber/cyan radial gradients
- #57: Stronger glow filters (stdDeviation 3→6, opacity 0.4→0.6)
- #58: Uniform 200×40px leaf nodes with label truncation at 22 chars
- #59: Diagram legend (node types, edge types, overlay indicators)
- #64: SVG <title> tooltips on all nodes showing type, status, duration

Batch 3 — Interactive features:
- #60: Draggable minimap viewport (click-to-center, drag-to-pan)
- #62: CSS View Transitions slide animation, back arrow, Backspace key

Batch 4 — Advanced features:
- #50: Execution picker dropdown scoped to group+routeId
- #49: Iteration count badge (×N) on compound nodes
- #63: Route header stats (Executions Today, Success Rate, Avg, P99)

Closes #49 #50 #51 #55 #56 #57 #58 #59 #60 #62 #63 #64

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 22:14:23 +01:00
hsiegeln
7553139cf2 Add visible View Route Diagram button in execution detail row
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 31s
Replace hidden Ctrl+Click navigation with an explicit button in the
expanded detail sidebar so users can discover the route diagram page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 21:44:47 +01:00
hsiegeln
7778793e7b Add route diagram page with execution overlay and group-aware APIs
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m3s
CI / deploy (push) Successful in 31s
Backend: Add group filtering to agent list, search, stats, and timeseries
endpoints. Add diagram lookup by group+routeId. Resolve application group
to agent IDs server-side for ClickHouse IN-clause queries.

Frontend: New route detail page at /apps/{group}/routes/{routeId} with
three tabs (Diagram, Performance, Processor Tree). SVG diagram rendering
with panzoom, execution overlay (glow effects, duration/sequence badges,
flow particles, minimap), and processor detail panel. uPlot charts for
performance tab replacing old SVG sparklines. Ctrl+Click from
ExecutionExplorer navigates to route diagram with overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 21:35:42 +01:00
hsiegeln
b64edaa16f Server-side sorting for execution search results
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 50s
CI / deploy (push) Successful in 33s
Sorting now applies to the entire result set via ClickHouse ORDER BY
instead of only sorting the current page client-side. Default sort
order is timestamp descending. Supported sort columns: startTime,
status, agentId, routeId, correlationId, durationMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 19:34:22 +01:00
hsiegeln
31b8695420 Skip user upsert when nothing changed to avoid row accumulation
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 45s
CI / deploy (push) Successful in 30s
ReplacingMergeTree only deduplicates during background merges, so
every login was inserting a new row even when all fields were identical.
Now compares the existing record and skips the write if nothing changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:37:06 +01:00
hsiegeln
dbf53aa8e8 Auto-discover ClickHouse migration files instead of hardcoded list
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 29s
Replace the static SCHEMA_FILES array with classpath pattern matching
(classpath:clickhouse/*.sql). Migration files are discovered and sorted
by filename, so adding a new numbered .sql file is all that's needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:26:33 +01:00
hsiegeln
9f9e677103 Add display_name_claim migration to schema init list
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 42s
CI / deploy (push) Has been cancelled
The 06-oidc-display-name-claim.sql migration was not registered in
ClickHouseConfig.SCHEMA_FILES, so the ALTER TABLE never ran on
existing deployments, causing startup failure when the repository
tried to SELECT the missing column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:24:31 +01:00
hsiegeln
86f905e672 Preserve created_at on user upsert to avoid accumulating un-merged rows
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 42s
CI / deploy (push) Has been cancelled
On re-login the upsert was inserting a new row with created_at=now(),
causing ClickHouse ReplacingMergeTree to accumulate rows until
background compaction. Now preserves the original created_at via
INSERT...SELECT from the existing record.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:18:12 +01:00