Commit Graph

35 Commits

Author SHA1 Message Date
hsiegeln
796be06a09 fix: resolve all integration test failures after storage layer refactor
- Use singleton container pattern for PostgreSQL + OpenSearch testcontainers
  (fixes container lifecycle issues with @TestInstance(PER_CLASS))
- Fix table name route_executions → executions in DetailControllerIT and
  ExecutionControllerIT
- Serialize processor headers as JSON (ObjectMapper) instead of Map.toString()
  for JSONB column compatibility
- Add nested mapping for processors field in OpenSearch index template
- Use .keyword sub-field for term queries on dynamically mapped text fields
- Add wildcard fallback queries for all text searches (substring matching)
- Isolate stats tests with unique route names to prevent data contamination
- Wait for OpenSearch indexing in SearchControllerIT with targeted Awaitility
- Reduce OpenSearch debounce to 100ms in test profile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 00:02:19 +01:00
hsiegeln
26f5a2ce3b fix: update remaining ITs for synchronous ingestion and PostgreSQL storage
- SearchControllerIT: remove @TestInstance(PER_CLASS), use @BeforeEach with
  static guard, fix table name (route_executions -> executions), remove
  Awaitility polling
- OpenSearchIndexIT: replace Thread.sleep with explicit index refresh via
  OpenSearchClient
- DiagramLinkingIT: fix table name, remove Awaitility awaits (writes are
  synchronous)
- IngestionSchemaIT: rewrite queries for PostgreSQL relational model
  (processor_executions table instead of ClickHouse array columns)
- PostgresStatsStoreIT: use explicit time bounds in
  refresh_continuous_aggregate calls
- IngestionService: populate diagramContentHash during execution ingestion
  by looking up the latest diagram for the route+agent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 22:03:29 +01:00
hsiegeln
565b548ac1 refactor: remove all ClickHouse code, old interfaces, and SQL migrations
- Delete all ClickHouse storage implementations and config
- Delete old core interfaces (ExecutionRepository, DiagramRepository, MetricsRepository, SearchEngine, RawExecutionRow)
- Delete ClickHouse SQL migration files
- Delete AbstractClickHouseIT
- Update controllers to use new store interfaces (DiagramStore, ExecutionStore)
- Fix IngestionService calls in controllers for new synchronous API
- Migrate all ITs from AbstractClickHouseIT to AbstractPostgresIT
- Fix count() syntax and remove ClickHouse-specific test assertions
- Update TreeReconstructionTest for new buildTree() method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:56:13 +01:00
hsiegeln
c48e0bdfde feat: implement debounced SearchIndexer for async OpenSearch indexing 2026-03-16 18:25:54 +01:00
hsiegeln
85ebe76111 refactor: IngestionService uses synchronous ExecutionStore writes with event publishing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:18:54 +01:00
hsiegeln
adf4b44d78 refactor: DetailService uses ExecutionStore, tree built from parentProcessorId 2026-03-16 18:18:53 +01:00
hsiegeln
84b93d74c7 refactor: SearchService uses SearchIndex + StatsStore instead of SearchEngine 2026-03-16 18:18:52 +01:00
hsiegeln
a55fc3c10d feat: add new storage interfaces for PostgreSQL/OpenSearch backends
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:16:53 +01:00
hsiegeln
55ed3be71a feat: add ExecutionDocument model and ExecutionUpdatedEvent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:16:42 +01:00
hsiegeln
7778793e7b Add route diagram page with execution overlay and group-aware APIs
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m3s
CI / deploy (push) Successful in 31s
Backend: Add group filtering to agent list, search, stats, and timeseries
endpoints. Add diagram lookup by group+routeId. Resolve application group
to agent IDs server-side for ClickHouse IN-clause queries.

Frontend: New route detail page at /apps/{group}/routes/{routeId} with
three tabs (Diagram, Performance, Processor Tree). SVG diagram rendering
with panzoom, execution overlay (glow effects, duration/sequence badges,
flow particles, minimap), and processor detail panel. uPlot charts for
performance tab replacing old SVG sparklines. Ctrl+Click from
ExecutionExplorer navigates to route diagram with overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 21:35:42 +01:00
hsiegeln
b64edaa16f Server-side sorting for execution search results
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 50s
CI / deploy (push) Successful in 33s
Sorting now applies to the entire result set via ClickHouse ORDER BY
instead of only sorting the current page client-side. Default sort
order is timestamp descending. Supported sort columns: startTime,
status, agentId, routeId, correlationId, durationMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 19:34:22 +01:00
hsiegeln
463cab1196 Add displayName to auth response and configurable display name claim for OIDC
Some checks failed
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 49s
CI / deploy (push) Failing after 2m9s
- Add displayName field to AuthTokenResponse so the UI shows human-readable
  names instead of internal JWT subjects (e.g. user:oidc:<hash>)
- Add displayNameClaim to OIDC config (default: "name") allowing admins to
  configure which ID token claim contains the user's display name
- Support dot-separated claim paths (e.g. profile.display_name) like rolesClaim
- Add admin UI field for Display Name Claim on the OIDC config page
- ClickHouse migration: ALTER TABLE adds display_name_claim column

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:09:24 +01:00
hsiegeln
6676e209c7 Fix OIDC login immediate logout — rename JWT subject prefix ui: → user:
All checks were successful
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 30s
OIDC tokens had subject "oidc:<sub>" which didn't match the "ui:" prefix
check in JwtAuthenticationFilter, causing every post-login API call to
return 401 and trigger automatic logout. Renamed the prefix from "ui:"
to "user:" across all auth code for clarity (it covers both browser and
API clients, not just UI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 15:55:10 +01:00
hsiegeln
0c47ac9b1a Add OIDC admin config page with auto-signup toggle
Some checks failed
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 50s
CI / deploy (push) Failing after 2m10s
Backend: add autoSignup field to OidcConfig, ClickHouse schema, repository,
and admin controller. Gate OIDC login when auto-signup is disabled and user
is not pre-created (returns 403).

Frontend: add OIDC admin page with full CRUD (save/test/delete), role-gated
Admin nav link parsed from JWT, and matching design system styles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 13:56:02 +01:00
hsiegeln
9d2e6f30a7 Move OIDC config from env vars to database with admin API
All checks were successful
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 2m11s
OIDC provider settings (issuer, client ID/secret, roles claim) are
now stored in ClickHouse and managed via admin REST API at
/api/v1/admin/oidc. This allows runtime configuration from the UI
without server restarts.

- New oidc_config table (ReplacingMergeTree, singleton row)
- OidcConfig record + OidcConfigRepository interface in core
- ClickHouseOidcConfigRepository implementation
- OidcConfigAdminController: GET/PUT/DELETE config, POST test
  connectivity, client_secret masked in responses
- OidcTokenExchanger: reads config from DB, invalidateCache()
  on config change
- OidcAuthController: always registered (no @ConditionalOnProperty),
  returns 404 when OIDC not configured
- Startup seeder: env vars seed DB on first boot only, then admin
  API takes over
- HOWTO.md updated with admin OIDC config API examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 13:01:05 +01:00
hsiegeln
a4de2a7b79 Add RBAC with role-based endpoint authorization and OIDC support
Some checks failed
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m38s
CI / deploy (push) Has been cancelled
Implement three-phase security upgrade:

Phase 1 - RBAC: Extend JWT with roles claim, populate Spring
GrantedAuthority in filter, enforce role-based access (AGENT for
data/heartbeat/SSE, VIEWER+ for search/diagrams, OPERATOR+ for
commands, ADMIN for user management). Configurable JWT secret via
CAMELEER_JWT_SECRET env var for token persistence across restarts.

Phase 2 - User persistence: ClickHouse users table with
ReplacingMergeTree, UserRepository interface + ClickHouse impl,
UserAdminController for CRUD at /api/v1/admin/users. Local login
upserts user on each authentication.

Phase 3 - OIDC: Token exchange flow where SPA sends auth code,
server exchanges it server-side (keeping client_secret secure),
validates id_token via JWKS, resolves roles (DB override > OIDC
claim > default), issues internal JWT. Conditional on
CAMELEER_OIDC_ENABLED=true. Uses oauth2-oidc-sdk for standards
compliance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 12:35:45 +01:00
hsiegeln
3641dffecc Add comparison stats: failure rate %, vs-yesterday change, today total
All checks were successful
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 37s
Stats endpoint now returns current + previous period (24h shift) values
plus today's total count. UI shows:
- Total Matches: "of 12.3K today"
- Avg Duration: arrow + % vs yesterday
- Failure Rate: percentage of errors vs total, arrow + % vs yesterday
- P99 Latency: arrow + % vs yesterday
- In-Flight: unchanged (running executions)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 09:29:14 +01:00
hsiegeln
393d19e3f4 Move failed count and avg duration from page-derived to backend stats
Some checks failed
CI / build (push) Successful in 1m11s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
All stat card values now come from the /search/stats endpoint which
queries the full time window, not just the current page of results.
Consolidated into a single ClickHouse query for efficiency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:51:43 +01:00
hsiegeln
cdf4c93630 Make stats endpoint respect selected time window instead of hardcoded last hour
All checks were successful
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 48s
CI / deploy (push) Successful in 28s
P99 latency and active count now use the same from/to parameters as the
timeseries sparklines, so all stat cards are consistent with the user's
selected time range.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:19:59 +01:00
hsiegeln
9e6e1b350a Add stat card sparkline graphs with timeseries backend endpoint
All checks were successful
CI / build (push) Successful in 1m0s
CI / docker (push) Successful in 45s
CI / deploy (push) Successful in 23s
New /search/stats/timeseries endpoint returns bucketed counts/metrics
over a time window using ClickHouse toStartOfInterval(). Frontend
Sparkline component renders SVG polyline + gradient fill on each
stat card, driven by a useStatsTimeseries query hook.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:20:08 +01:00
hsiegeln
3f98467ba5 Fix status filter OR logic and add P99/active stats endpoint
All checks were successful
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 47s
CI / deploy (push) Successful in 29s
Status filter now parses comma-separated values into SQL IN clause
instead of exact match, so filtering by multiple statuses works.

Added GET /api/v1/search/stats returning P99 latency (last hour) and
active execution count, wired into the UI stat cards with 10s polling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 17:34:11 +01:00
hsiegeln
86e016874a Fix command palette: agent ID propagation, result selection, and scope tabs
All checks were successful
CI / build (push) Successful in 59s
CI / docker (push) Successful in 46s
CI / deploy (push) Successful in 25s
- Propagate authenticated agent identity through write buffers via
  TaggedExecution/TaggedDiagram wrappers so ClickHouse rows get real
  agent IDs instead of empty strings
- Add execution_id to text search LIKE clause so selecting an execution
  by ID in the palette actually finds it
- Clear status filter to all three statuses on palette selection so the
  chosen execution/agent isn't filtered out
- Add disabled Routes and Exchanges scope tabs with "coming soon" state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 17:13:14 +01:00
hsiegeln
64b03a4e2f Add Cmd+K command palette for searching executions and agents
All checks were successful
CI / build (push) Successful in 59s
CI / docker (push) Successful in 56s
CI / deploy (push) Successful in 26s
Backend: add routeId, agentId, processorType filter fields to SearchRequest
and ClickHouseSearchEngine. Expand global text search to match route_id and
agent_id columns.

Frontend: new command palette component (portal overlay, Zustand store,
TanStack Query search hook with 300ms debounce, filter chip parsing,
keyboard navigation, scope tabs). Search bar in SearchFilters and TopNav
now open the palette. Selecting a result writes filters to the execution
search store to drive the results table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 16:28:16 +01:00
hsiegeln
51a02700dd test(04-01): add failing tests for security services
- JwtService: 7 tests for access/refresh token creation and validation
- Ed25519SigningService: 5 tests for keypair, signing, verification
- BootstrapTokenValidator: 6 tests for token matching and rotation
- Core interfaces and stub implementations (all throw UnsupportedOperationException)
- Added nimbus-jose-jwt and spring-boot-starter-security dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:58:59 +01:00
hsiegeln
61f39021b3 feat(03-01): implement agent registry service and domain types
- AgentRegistryService: register, heartbeat, lifecycle, commands
- ConcurrentHashMap with atomic record-swapping for thread safety
- LIVE->STALE->DEAD lifecycle transitions via checkLifecycle()
- Heartbeat revives STALE agents back to LIVE
- Command queue with PENDING/DELIVERED/ACKNOWLEDGED/EXPIRED states
- AgentEventListener callback for SSE bridge integration
- All 23 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:30:02 +01:00
hsiegeln
4cd7ed9e9a test(03-01): add failing tests for agent registry service
- 23 unit tests covering registration, heartbeat, lifecycle, queries, commands
- Domain types: AgentInfo, AgentState, AgentCommand, CommandStatus, CommandType
- AgentEventListener interface for SSE bridge
- AgentRegistryService stub with UnsupportedOperationException

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 18:28:28 +01:00
hsiegeln
0615a9851d feat(02-03): detail controller, tree reconstruction, processor snapshot endpoint
- Implement findRawById and findProcessorSnapshot in ClickHouseExecutionRepository
- DetailController with GET /executions/{id} returning nested processor tree
- GET /executions/{id}/processors/{index}/snapshot for per-processor exchange data
- 5 unit tests for tree reconstruction (linear, branching, multiple roots, empty)
- 6 integration tests for detail endpoint, snapshot, and 404 handling
- Added assertj and mockito test dependencies to core module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:29:53 +01:00
hsiegeln
c0922430c4 test(02-01): add failing IngestionSchemaIT for new column population
- Tests processor tree metadata (depths, parent indexes)
- Tests exchange body concatenation for search
- Tests null snapshot graceful handling
- AbstractClickHouseIT loads 02-search-columns.sql
- DiagramRenderer/DiagramLayout stubs to fix pre-existing compilation error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:09:45 +01:00
hsiegeln
044259535a feat(02-01): add schema extension and core search/detail domain types
- ClickHouse schema migration SQL with 12 new columns and skip indexes
- SearchRequest, SearchResult, ExecutionSummary records in core search package
- SearchEngine interface for swappable search backend (ClickHouse/OpenSearch)
- SearchService orchestration layer
- ProcessorNode, ExecutionDetail, RawExecutionRow, DetailService in core detail package
- DetailService reconstructs nested processor tree from flat parallel arrays
- ExecutionRepository extended with findRawById query method

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:05:14 +01:00
hsiegeln
6df74505be feat(02-02): add ELK/JFreeSVG dependencies and core diagram rendering interfaces
- Add Eclipse ELK core + layered algorithm and JFreeSVG dependencies to app module
- Create DiagramRenderer interface in core with renderSvg and layoutJson methods
- Create DiagramLayout, PositionedNode, PositionedEdge records for layout data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:04:32 +01:00
hsiegeln
17a18cf6da feat(01-02): add IngestionService, ClickHouse repositories, and flush scheduler
- IngestionService routes data to WriteBuffer instances (core module, plain class)
- ClickHouseExecutionRepository: batch insert with parallel processor arrays
- ClickHouseDiagramRepository: JSON storage with SHA-256 content-hash dedup
- ClickHouseMetricsRepository: batch insert for agent_metrics table
- ClickHouseFlushScheduler: scheduled drain with SmartLifecycle shutdown flush
- IngestionBeanConfig: wires WriteBuffer and IngestionService beans

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 12:08:36 +01:00
hsiegeln
cc1c082adb feat(01-01): add WriteBuffer, repository interfaces, and config classes
- WriteBuffer<T> with offer/offerBatch/drain and backpressure (all tests green)
- ExecutionRepository, DiagramRepository, MetricsRepository interfaces
- MetricsSnapshot record for agent metrics data
- IngestionConfig for buffer-capacity/batch-size/flush-interval-ms properties
- ClickHouseConfig exposing JdbcTemplate bean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 11:49:25 +01:00
hsiegeln
f37009e380 test(01-01): add failing WriteBuffer unit tests
- Test offer/offerBatch/drain/isFull/size/capacity/remainingCapacity
- Tests fail because WriteBuffer class does not exist yet (TDD RED)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 11:47:50 +01:00
hsiegeln
96c52b88d1 feat(01-01): add ClickHouse dependencies, Docker Compose, schema, and app config
- Add clickhouse-jdbc, springdoc-openapi, actuator, testcontainers deps
- Add slf4j-api to core module
- Create Docker Compose with ClickHouse service on ports 8123/9000
- Create ClickHouse DDL: route_executions, route_diagrams, agent_metrics
- Configure application.yml with datasource, ingestion buffer, springdoc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 11:47:20 +01:00
hsiegeln
db17f02fcc Scaffold cameleer3-server project structure
Some checks failed
CI / build (push) Failing after 3s
Multi-module Maven project (server-core + server-app) with Spring Boot 3.4.3,
Gitea CI workflow, and dependency on cameleer3-common from Gitea Maven registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 10:06:17 +01:00