Commit Graph

71 Commits

Author SHA1 Message Date
hsiegeln
7dbfaf0932 feat: wire new storage beans, add MetricsFlushScheduler and RetentionScheduler
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:27:58 +01:00
hsiegeln
f7d7302694 feat: implement OpenSearchIndex with full-text and wildcard search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:25:55 +01:00
hsiegeln
5932b5d969 feat: implement PostgresDiagramStore, PostgresUserRepository, PostgresOidcConfigRepository, PostgresMetricsStore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:23:21 +01:00
hsiegeln
527e2cf017 feat: implement PostgresStatsStore querying continuous aggregates 2026-03-16 18:22:44 +01:00
hsiegeln
9fd02c4edb feat: implement PostgresExecutionStore with upsert and dedup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:20:57 +01:00
hsiegeln
41a9a975fd config: switch datasource to PostgreSQL, add OpenSearch and Flyway config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:15:33 +01:00
hsiegeln
8a637df65c feat: add Flyway migrations for PostgreSQL/TimescaleDB schema
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:13:53 +01:00
hsiegeln
0b56590e3f Fix Swagger UI CORS: add /api/v1 server URL to OpenAPI spec
All checks were successful
CI / build (push) Successful in 1m15s
CI / docker (push) Successful in 44s
CI / deploy (push) Successful in 29s
The empty servers list caused Swagger UI to construct request URLs
without the /api/v1 prefix, resulting in CORS/fetch failures.
Adding a relative server entry makes paths resolve correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 14:59:12 +01:00
hsiegeln
48d944354a Fix ClickHouse OOM: PREWHERE on active-count query + per-query memory limits
All checks were successful
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 33s
The active-count query scanned all wide rows on the base table, exceeding
the 3.6 GiB memory limit. Use PREWHERE status = 'RUNNING' so ClickHouse
reads only the status column first. Add SETTINGS max_memory_usage = 1 GiB
to all queries so concurrent requests degrade gracefully instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:55:26 +01:00
hsiegeln
5ad0c75da8 Truncate rollup params to second precision for DateTime column
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 42s
CI / deploy (push) Successful in 29s
The JDBC driver sends java.sql.Timestamp with nanoseconds as a string
(e.g. '2026-03-15 10:13:58.105931162') which DateTime('UTC') rejects.
Add bucketTimestamp() helper that truncates to seconds for all rollup
query parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:15:42 +01:00
hsiegeln
8e6f8e2693 Fix bucket alignment: compute 5-min floor in Java, not ClickHouse SQL
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 31s
JDBC sends Timestamp params as strings, causing toStartOfFiveMinutes()
to fail with 'Illegal type String'. Floor to 5-minute boundaries in
Java instead and pass plain bucket >= ? comparisons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:10:40 +01:00
hsiegeln
f660e88a17 Fix rollup queries: alias shadowed AggregateFunction column name
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 31s
countMerge(total_count) in the avg expression resolved to the UInt64
alias 'total_count' instead of the AggregateFunction column. Rename
SELECT aliases (cnt, failed, avg_ms, p99_ms) to avoid shadowing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 11:05:10 +01:00
hsiegeln
035356288f Fix stats rollup: AggregateFunction(count) takes no type argument
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 31s
ClickHouse count() accepts no arguments, so the column type must be
AggregateFunction(count) not AggregateFunction(count, UInt64). The
latter causes countMerge() to fail with ILLEGAL_TYPE_OF_ARGUMENT.
Drop and recreate the table/MV to apply the corrected schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:52:29 +01:00
hsiegeln
adf13f0430 Add 5-minute AggregatingMergeTree stats rollup for dashboard queries
All checks were successful
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 42s
CI / deploy (push) Successful in 30s
Pre-aggregate route execution stats into 5-minute buckets using a
materialized view with -State/-Merge combinators. Rewrite stats() and
timeseries() to query the rollup table instead of scanning the wide
base table. Active count remains a real-time query since RUNNING is
transient. Includes idempotent backfill migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:46:26 +01:00
6f7c92f793 cameleer3-server-app/src/main/java/com/cameleer3/server/app/diagram/ElkDiagramRenderer.java aktualisiert
All checks were successful
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 30s
increased node spacing to 90
2026-03-15 10:37:56 +01:00
hsiegeln
520590fbf4 Increase ELK node spacing and revert frontend node height to 40
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 51s
CI / deploy (push) Successful in 33s
NODE_SPACING 40→60 gives edges more vertical room between nodes.
FIXED_H reverted to 40 to match backend NODE_HEIGHT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:31:06 +01:00
hsiegeln
7778793e7b Add route diagram page with execution overlay and group-aware APIs
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m3s
CI / deploy (push) Successful in 31s
Backend: Add group filtering to agent list, search, stats, and timeseries
endpoints. Add diagram lookup by group+routeId. Resolve application group
to agent IDs server-side for ClickHouse IN-clause queries.

Frontend: New route detail page at /apps/{group}/routes/{routeId} with
three tabs (Diagram, Performance, Processor Tree). SVG diagram rendering
with panzoom, execution overlay (glow effects, duration/sequence badges,
flow particles, minimap), and processor detail panel. uPlot charts for
performance tab replacing old SVG sparklines. Ctrl+Click from
ExecutionExplorer navigates to route diagram with overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 21:35:42 +01:00
hsiegeln
b64edaa16f Server-side sorting for execution search results
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 50s
CI / deploy (push) Successful in 33s
Sorting now applies to the entire result set via ClickHouse ORDER BY
instead of only sorting the current page client-side. Default sort
order is timestamp descending. Supported sort columns: startTime,
status, agentId, routeId, correlationId, durationMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 19:34:22 +01:00
hsiegeln
31b8695420 Skip user upsert when nothing changed to avoid row accumulation
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 45s
CI / deploy (push) Successful in 30s
ReplacingMergeTree only deduplicates during background merges, so
every login was inserting a new row even when all fields were identical.
Now compares the existing record and skips the write if nothing changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:37:06 +01:00
hsiegeln
dbf53aa8e8 Auto-discover ClickHouse migration files instead of hardcoded list
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 29s
Replace the static SCHEMA_FILES array with classpath pattern matching
(classpath:clickhouse/*.sql). Migration files are discovered and sorted
by filename, so adding a new numbered .sql file is all that's needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:26:33 +01:00
hsiegeln
9f9e677103 Add display_name_claim migration to schema init list
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 42s
CI / deploy (push) Has been cancelled
The 06-oidc-display-name-claim.sql migration was not registered in
ClickHouseConfig.SCHEMA_FILES, so the ALTER TABLE never ran on
existing deployments, causing startup failure when the repository
tried to SELECT the missing column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:24:31 +01:00
hsiegeln
86f905e672 Preserve created_at on user upsert to avoid accumulating un-merged rows
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 42s
CI / deploy (push) Has been cancelled
On re-login the upsert was inserting a new row with created_at=now(),
causing ClickHouse ReplacingMergeTree to accumulate rows until
background compaction. Now preserves the original created_at via
INSERT...SELECT from the existing record.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:18:12 +01:00
hsiegeln
a6f94e8a70 Full OIDC logout with id_token_hint for provider session termination
Some checks failed
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 48s
CI / deploy (push) Has been cancelled
Return the OIDC id_token in the callback response so the frontend can
store it and pass it as id_token_hint to the provider's end-session
endpoint on logout. This lets Authentik (or any OIDC provider) honor
the post_logout_redirect_uri and redirect back to the Cameleer login
page instead of showing the provider's own logout page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:14:07 +01:00
hsiegeln
463cab1196 Add displayName to auth response and configurable display name claim for OIDC
Some checks failed
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 49s
CI / deploy (push) Failing after 2m9s
- Add displayName field to AuthTokenResponse so the UI shows human-readable
  names instead of internal JWT subjects (e.g. user:oidc:<hash>)
- Add displayNameClaim to OIDC config (default: "name") allowing admins to
  configure which ID token claim contains the user's display name
- Support dot-separated claim paths (e.g. profile.display_name) like rolesClaim
- Add admin UI field for Display Name Claim on the OIDC config page
- ClickHouse migration: ALTER TABLE adds display_name_claim column

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:09:24 +01:00
hsiegeln
6676e209c7 Fix OIDC login immediate logout — rename JWT subject prefix ui: → user:
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 30s
OIDC tokens had subject "oidc:<sub>" which didn't match the "ui:" prefix
check in JwtAuthenticationFilter, causing every post-login API call to
return 401 and trigger automatic logout. Renamed the prefix from "ui:"
to "user:" across all auth code for clarity (it covers both browser and
API clients, not just UI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 15:55:10 +01:00
hsiegeln
465f210aee Contract-first API with DTOs, validation, and server-side OpenAPI post-processing
All checks were successful
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 2m6s
CI / deploy (push) Successful in 30s
Add dedicated request/response DTOs for all controllers, replacing raw
JsonNode parameters with validated types. Move OpenAPI path-prefix stripping
and ProcessorNode children injection into OpenApiCustomizer beans so the
spec served at /api/v1/api-docs is already clean — eliminating the need for
the ui/scripts/process-openapi.mjs post-processing script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 15:33:37 +01:00
hsiegeln
50bb22d6f6 Add OIDC logout, fix OpenAPI schema types, expose end_session_endpoint
All checks were successful
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 51s
CI / deploy (push) Successful in 29s
Backend:
- Expose end_session_endpoint from OIDC provider metadata in /auth/oidc/config
- Add getEndSessionEndpoint() to OidcTokenExchanger

Frontend:
- On OIDC logout, redirect to provider's end_session_endpoint to clear SSO session
- Strip /api/v1 prefix from OpenAPI paths to match client baseUrl convention
- Add schema-types.ts with convenience type re-exports from generated schema
- Fix all type imports to use schema-types instead of raw generated schema
- Fix optional field access (processors, children, duration) with proper typing
- Fix AgentInstance.state → status field name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 14:43:18 +01:00
hsiegeln
0d82304cf0 Fix SPA forward for /oidc/callback and /admin/* routes
Some checks failed
CI / build (push) Failing after 47s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
The SPA catch-all was missing these paths, causing 404 when Authentik
redirected back to /oidc/callback after authentication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 14:33:14 +01:00
hsiegeln
84f4c505a2 Add OIDC login flow to UI and fix dark mode datetime picker icons
All checks were successful
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 50s
CI / deploy (push) Successful in 31s
- Add "Sign in with SSO" button on login page (shown when OIDC is configured)
- Add /oidc/callback route to exchange authorization code for JWT tokens
- Add loginWithOidcCode action to auth store
- Treat issuer URI as complete discovery URL (no auto-append of .well-known)
- Update admin page placeholder to show full discovery URL format
- Fix datetime picker calendar icon visibility in dark mode (color-scheme)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 14:19:06 +01:00
hsiegeln
b024f83c26 Add ALTER TABLE migration for auto_signup column on existing ClickHouse
All checks were successful
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 43s
CI / deploy (push) Successful in 31s
The CREATE TABLE IF NOT EXISTS won't add new columns to an existing table.
Add 05-oidc-auto-signup.sql with ALTER TABLE ADD COLUMN IF NOT EXISTS and
register it in ClickHouseConfig startup schema + test init.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 14:01:45 +01:00
hsiegeln
0c47ac9b1a Add OIDC admin config page with auto-signup toggle
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 50s
CI / deploy (push) Failing after 2m10s
Backend: add autoSignup field to OidcConfig, ClickHouse schema, repository,
and admin controller. Gate OIDC login when auto-signup is disabled and user
is not pre-created (returns 403).

Frontend: add OIDC admin page with full CRUD (save/test/delete), role-gated
Admin nav link parsed from JWT, and matching design system styles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 13:56:02 +01:00
hsiegeln
9d2e6f30a7 Move OIDC config from env vars to database with admin API
All checks were successful
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 2m11s
OIDC provider settings (issuer, client ID/secret, roles claim) are
now stored in ClickHouse and managed via admin REST API at
/api/v1/admin/oidc. This allows runtime configuration from the UI
without server restarts.

- New oidc_config table (ReplacingMergeTree, singleton row)
- OidcConfig record + OidcConfigRepository interface in core
- ClickHouseOidcConfigRepository implementation
- OidcConfigAdminController: GET/PUT/DELETE config, POST test
  connectivity, client_secret masked in responses
- OidcTokenExchanger: reads config from DB, invalidateCache()
  on config change
- OidcAuthController: always registered (no @ConditionalOnProperty),
  returns 404 when OIDC not configured
- Startup seeder: env vars seed DB on first boot only, then admin
  API takes over
- HOWTO.md updated with admin OIDC config API examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 13:01:05 +01:00
hsiegeln
a4de2a7b79 Add RBAC with role-based endpoint authorization and OIDC support
Some checks failed
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m38s
CI / deploy (push) Has been cancelled
Implement three-phase security upgrade:

Phase 1 - RBAC: Extend JWT with roles claim, populate Spring
GrantedAuthority in filter, enforce role-based access (AGENT for
data/heartbeat/SSE, VIEWER+ for search/diagrams, OPERATOR+ for
commands, ADMIN for user management). Configurable JWT secret via
CAMELEER_JWT_SECRET env var for token persistence across restarts.

Phase 2 - User persistence: ClickHouse users table with
ReplacingMergeTree, UserRepository interface + ClickHouse impl,
UserAdminController for CRUD at /api/v1/admin/users. Local login
upserts user on each authentication.

Phase 3 - OIDC: Token exchange flow where SPA sends auth code,
server exchanges it server-side (keeping client_secret secure),
validates id_token via JWKS, resolves roles (DB override > OIDC
claim > default), issues internal JWT. Conditional on
CAMELEER_OIDC_ENABLED=true. Uses oauth2-oidc-sdk for standards
compliance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 12:35:45 +01:00
hsiegeln
cb600be1f1 Fix nan-to-int64 crash when avg/quantile runs on empty result set
All checks were successful
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 39s
CI / deploy (push) Successful in 29s
ClickHouse avg() and quantile() return nan/inf on zero rows, which
toInt64() cannot convert. Wrap with ifNotFinite(..., 0) to default to
zero. Applied to both stats and timeseries queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 09:37:19 +01:00
hsiegeln
3641dffecc Add comparison stats: failure rate %, vs-yesterday change, today total
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 37s
Stats endpoint now returns current + previous period (24h shift) values
plus today's total count. UI shows:
- Total Matches: "of 12.3K today"
- Avg Duration: arrow + % vs yesterday
- Failure Rate: percentage of errors vs total, arrow + % vs yesterday
- P99 Latency: arrow + % vs yesterday
- In-Flight: unchanged (running executions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 09:29:14 +01:00
hsiegeln
393d19e3f4 Move failed count and avg duration from page-derived to backend stats
Some checks failed
CI / build (push) Successful in 1m11s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
All stat card values now come from the /search/stats endpoint which
queries the full time window, not just the current page of results.
Consolidated into a single ClickHouse query for efficiency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:51:43 +01:00
hsiegeln
4cdf2ac012 Fix ClickHouse OOM on batch insert: reduce batch size, increase memory
All checks were successful
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 46s
Execution rows are wide (29 cols with serialized arrays/JSON), so 500
rows can exceed ClickHouse's memory limit. Reduce default batch size
from 500 to 100 and bump ClickHouse memory limit from 2Gi to 4Gi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:44:03 +01:00
hsiegeln
f156a2aab0 Fix quantile overflow: wrap p99 query with toInt64() for JDBC compat
Some checks failed
CI / build (push) Successful in 1m11s
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
quantile(0.99) returns Float64 which ClickHouse JDBC cannot cast to
Long directly. Same toInt64() pattern already used in timeseries query.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:42:52 +01:00
hsiegeln
cdf4c93630 Make stats endpoint respect selected time window instead of hardcoded last hour
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 28s
P99 latency and active count now use the same from/to parameters as the
timeseries sparklines, so all stat cards are consistent with the user's
selected time range.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:19:59 +01:00
hsiegeln
d1940b98e5 Fix timeseries query: use epoch-based bucketing for DateTime64 compatibility
All checks were successful
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 29s
Replace toStartOfInterval with intDiv on epoch seconds, and cast
avg/quantile results to Int64 to avoid Float64 JDBC mapping issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:33:14 +01:00
hsiegeln
9e6e1b350a Add stat card sparkline graphs with timeseries backend endpoint
All checks were successful
CI / build (push) Successful in 1m0s
CI / docker (push) Successful in 45s
CI / deploy (push) Successful in 23s
New /search/stats/timeseries endpoint returns bucketed counts/metrics
over a time window using ClickHouse toStartOfInterval(). Frontend
Sparkline component renders SVG polyline + gradient fill on each
stat card, driven by a useStatsTimeseries query hook.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:20:08 +01:00
hsiegeln
4253751ef1 Redirect to login on expired/invalid auth
All checks were successful
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 46s
CI / deploy (push) Successful in 29s
Backend now returns 401 instead of 403 for unauthenticated requests
via HttpStatusEntryPoint. UI middleware handles both 401 and 403,
triggering token refresh and redirecting to /login on failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 17:39:29 +01:00
hsiegeln
3f98467ba5 Fix status filter OR logic and add P99/active stats endpoint
All checks were successful
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 47s
CI / deploy (push) Successful in 29s
Status filter now parses comma-separated values into SQL IN clause
instead of exact match, so filtering by multiple statuses works.

Added GET /api/v1/search/stats returning P99 latency (last hour) and
active execution count, wired into the UI stat cards with 10s polling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 17:34:11 +01:00
hsiegeln
86e016874a Fix command palette: agent ID propagation, result selection, and scope tabs
All checks were successful
CI / build (push) Successful in 59s
CI / docker (push) Successful in 46s
CI / deploy (push) Successful in 25s
- Propagate authenticated agent identity through write buffers via
  TaggedExecution/TaggedDiagram wrappers so ClickHouse rows get real
  agent IDs instead of empty strings
- Add execution_id to text search LIKE clause so selecting an execution
  by ID in the palette actually finds it
- Clear status filter to all three statuses on palette selection so the
  chosen execution/agent isn't filtered out
- Add disabled Routes and Exchanges scope tabs with "coming soon" state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 17:13:14 +01:00
hsiegeln
64b03a4e2f Add Cmd+K command palette for searching executions and agents
All checks were successful
CI / build (push) Successful in 59s
CI / docker (push) Successful in 56s
CI / deploy (push) Successful in 26s
Backend: add routeId, agentId, processorType filter fields to SearchRequest
and ClickHouseSearchEngine. Expand global text search to match route_id and
agent_id columns.

Frontend: new command palette component (portal overlay, Zustand store,
TanStack Query search hook with 300ms debounce, filter chip parsing,
keyboard navigation, scope tabs). Search bar in SearchFilters and TopNav
now open the palette. Selecting a result writes filters to the execution
search store to drive the results table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 16:28:16 +01:00
hsiegeln
3eb83f97d3 Add React UI with Execution Explorer, auth, and standalone deployment
Some checks failed
CI / build (push) Failing after 1m53s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
- Scaffold Vite + React + TypeScript frontend in ui/ with full design
  system (dark/light themes) matching the HTML mockups
- Implement Execution Explorer page: search filters, results table with
  expandable processor tree and exchange detail sidebar, pagination
- Add UI authentication: UiAuthController (login/refresh endpoints),
  JWT filter handles ui: subject prefix, CORS configuration
- Shared components: StatusPill, DurationBar, StatCard, AppBadge,
  FilterChip, Pagination — all using CSS Modules with design tokens
- API client layer: openapi-fetch with auth middleware, TanStack Query
  hooks for search/detail/snapshot queries, Zustand for state
- Standalone deployment: Nginx Dockerfile, K8s Deployment + ConfigMap +
  NodePort (30080), runtime config.js for API base URL
- Embedded mode: maven-resources-plugin copies ui/dist into JAR static
  resources, SPA forward controller for client-side routing
- CI/CD: UI build step, Docker build/push for server-ui image, K8s
  deploy step for UI, UI credential secrets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:59:22 +01:00
hsiegeln
88da1a0dd8 Fix ClickHouse OOM during Proxmox nightly backups
All checks were successful
CI / build (push) Successful in 48s
CI / docker (push) Successful in 1m36s
CI / deploy (push) Successful in 20s
Increase ClickHouse memory limit from 1Gi to 2Gi and reduce default
batch size from 5000 to 500. During VM backup snapshots, I/O contention
prevents ClickHouse from flushing writes fast enough, causing buffer
accumulation that exceeds the 1Gi container limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:55:46 +01:00
hsiegeln
a44a0c970b Revert to JdbcTemplate for schema init, keep comment-stripping fix
All checks were successful
CI / build (push) Successful in 48s
CI / docker (push) Successful in 36s
CI / deploy (push) Successful in 9s
The DriverManager-based approach likely failed because the ClickHouse
JDBC driver wasn't registered with DriverManager. The original
JdbcTemplate approach worked for route_diagrams and agent_metrics —
only route_executions was skipped due to the comment-parsing bug.

Reverts to simple JdbcTemplate-based init with unqualified table names
(DataSource targets cameleer3 database). The CLICKHOUSE_DB env var on
the ClickHouse container handles database creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:05:12 +01:00
hsiegeln
a2cbd115ee Fix SQL parser skipping statements that follow comment lines
All checks were successful
CI / build (push) Successful in 47s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 10s
split(';') produced chunks starting with '--' comment lines, causing
the startsWith('--') check to skip the entire CREATE TABLE statement
for route_executions. Now strips comment lines before splitting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:55:36 +01:00
hsiegeln
ce0eb58b0c Fix schema init: bypass DataSource, use direct JDBC with qualified table names
All checks were successful
CI / build (push) Successful in 48s
CI / docker (push) Successful in 40s
CI / deploy (push) Successful in 7s
The auto-configured DataSource targets jdbc:ch://.../cameleer3 which fails
if the database doesn't exist yet. Schema init now uses a direct JDBC
connection to the root URL, creates the database first, then applies all
schema SQL with fully qualified cameleer3.* table names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:50:47 +01:00