Commit Graph

77 Commits

Author SHA1 Message Date
hsiegeln
ca92b3ce7d feat: add CAMELEER_OIDC_TLS_SKIP_VERIFY to bypass cert verification for OIDC
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Self-signed CA certs on the OIDC provider (e.g. Logto behind a reverse
proxy) cause the login flow to fail because Java's truststore rejects
the connection. This adds an opt-in env var that creates a trust-all
SSLContext scoped to OIDC HTTP calls only (discovery, token exchange,
JWKS fetch) without affecting system-wide TLS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:26:40 +02:00
hsiegeln
3c70313d78 feat: add CAMELEER_OIDC_JWK_SET_URI for direct JWKS fetching
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
When set, fetches JWKs from this URL directly instead of discovering
from the OIDC well-known endpoint. Needed when the public issuer URL
(e.g., https://domain.com/oidc) isn't reachable from inside containers
but the internal URL (http://logto:3001/oidc/jwks) is.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:02:51 +02:00
hsiegeln
a5c4e0cead feat: add spring-boot-starter-oauth2-resource-server and OIDC properties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:06:53 +02:00
hsiegeln
de85cdf5a2 fix: let SPRING_DATASOURCE_URL fully control datasource connection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m26s
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:24:22 +02:00
hsiegeln
ac87aa6eb2 fix: derive PG schema from tenant ID instead of defaulting to public
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m17s
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:46:57 +02:00
hsiegeln
a188308ec5 feat: implement multitenancy with tenant isolation + environment support
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m25s
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.

Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
  "default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
  updated ORDER BY (tenant→time→env→app), added tenant_id to
  usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures

Closes #123

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:00:18 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
d4327af6a4 refactor: consolidate ClickHouse schema into single init.sql, cache diagrams
All checks were successful
CI / build (push) Successful in 2m2s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Merge all V1-V11 migration scripts into one idempotent init.sql
- Simplify ClickHouseSchemaInitializer to load single file
- Replace route_diagrams projection with in-memory caches:
  hashCache (routeId+instanceId → contentHash) warm-loaded on startup,
  graphCache (contentHash → RouteGraph) lazy-populated on access
- Eliminates 9M+ row scans on diagram lookups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:24:53 +02:00
hsiegeln
bb3e1e2bc3 fix: set deduplicate_merge_projection_mode for ReplacingMergeTree projection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse 24.12 requires this setting before adding projections to
ReplacingMergeTree tables. Using 'drop' mode which discards the projection
during deduplication merges and rebuilds it afterward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:14:56 +02:00
hsiegeln
6f00ff2e28 fix: reduce ClickHouse log noise, admin query spam, and diagram scan perf
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
  eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
  instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
  refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
  refresh so sidebar clicks use current time window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:48:30 +02:00
hsiegeln
e495b80432 fix: increase ClickHouse pool size and reduce flush interval
All checks were successful
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 2m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Pool was hardcoded to 10 connections serving 7 concurrent write
streams + UI reads, causing "too many simultaneous queries" and
WriteBuffer overflow. Pool now defaults to 50 (configurable via
clickhouse.pool-size), flush interval reduced from 1000ms to 500ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:11:15 +02:00
hsiegeln
805e6d51cb fix: add processor_type to stats_1m_processor_detail MV
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The table and materialized view were missing the processor_type column,
causing the RouteMetricsController query to fail and the dashboard
processor metrics table to render empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:00:23 +02:00
hsiegeln
9781fe0d7c fix: include execution/correlation/exchange IDs in full-text search
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The _search_text materialized column only contained error messages,
bodies, and headers — not execution_id, correlation_id, exchange_id,
or route_id. Searching by ID via cmd-k returned no results.

- Add ID fields to _search_text in ClickHouse DDL (covered by ngram
  bloom filter index)
- Add direct LIKE matches on execution_id, correlation_id, exchange_id
  in the text search WHERE clause for faster exact ID lookups

Requires ClickHouse table recreation (fresh install).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:12:15 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00
hsiegeln
aa2d203f4e feat: add UI usage analytics tracking
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m14s
CI / deploy (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:53:32 +02:00
hsiegeln
d739094a56 fix: update ClickHouse DDL files with new column names instead of ALTER RENAME
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
ClickHouse can't rename columns that are part of ORDER BY keys.
Updated V1-V8 DDL files directly with new column names (instance_id,
application_id) and removed V9 migration. Wipe ClickHouse and restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:40:54 +02:00
hsiegeln
91400defe9 fix: add missing V9 (ClickHouse) and V14 (PostgreSQL) identity column rename migrations
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
Migration files were lost during worktree merge — recreated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:33:02 +02:00
hsiegeln
95b9dea5c4 feat(clickhouse): wire ClickHouseExecutionStore as active ExecutionStore
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:09:14 +02:00
hsiegeln
968117c41a feat(clickhouse): wire Phase 4 stores with feature flags
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:44:10 +02:00
hsiegeln
f7daadaaa9 feat(clickhouse): add DDL for route_diagrams, agent_events, and logs tables
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:30:38 +02:00
hsiegeln
606f81a970 fix: align server with protocol v2 chunked transport spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m45s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 46s
- ChunkIngestionController: /data/chunks → /data/executions (matches
  PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
  avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
  envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:18:35 +02:00
hsiegeln
1a00eed389 fix: schema initializer skips comment-only SQL segments
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:06:31 +02:00
hsiegeln
9df00fdde0 feat(clickhouse): wire ClickHouseStatsStore with cameleer.storage.stats feature flag (default: clickhouse)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:51:45 +02:00
hsiegeln
052990bb59 feat(clickhouse): add ClickHouseStatsStore with -Merge aggregate queries
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.

Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:49:22 +02:00
hsiegeln
eb0d26814f feat(clickhouse): add stats materialized views DDL (5 tables + 5 MVs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 20:11:38 +02:00
hsiegeln
31f7113b3f feat(clickhouse): wire ChunkAccumulator, flush scheduler, and search feature flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:21:19 +02:00
hsiegeln
b30dfa39f4 feat(clickhouse): add executions and processor_executions DDL for chunked transport
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:04:19 +02:00
hsiegeln
b1c5cc0616 fix: cast DateTime64 to DateTime in ClickHouse TTL expression
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m46s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 1m8s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 2m19s
2026-03-31 18:10:20 +02:00
hsiegeln
8eeaecf6f3 fix: remove unsupported async_insert params from ClickHouse JDBC URL
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 55s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m39s
CI / deploy (push) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (push) Successful in 51s
CI / deploy-feature (pull_request) Has been skipped
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:02:53 +02:00
hsiegeln
08934376df feat: add ClickHouse schema initializer with agent_metrics DDL
Adds ClickHouseSchemaInitializer that runs on ApplicationReadyEvent,
scanning classpath:clickhouse/*.sql in filename order and executing each
statement. Adds V1__agent_metrics.sql with MergeTree table, tenant/agent
partitioning, and 365-day TTL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:51:21 +02:00
hsiegeln
23f901279a feat: add ClickHouse DataSource and JdbcTemplate configuration
Adds ClickHouseProperties (bound to clickhouse.*), ClickHouseConfig
(conditional HikariDataSource + JdbcTemplate beans), and extends
application.yml with clickhouse.enabled/url/username/password and
cameleer.storage.metrics properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:51:14 +02:00
hsiegeln
ab7031e6ed feat: add is_replay flag to execution pipeline and UI
Detect replayed exchanges via X-Cameleer-Replay header during ingestion,
persist the flag through PostgreSQL and OpenSearch, and surface it in
the dashboard (amber replay icon) and exchange detail chain view.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:39:40 +02:00
hsiegeln
213aa86c47 feat: progressive drill-down dashboard with RED metrics and SLA compliance (#94)
Three-level dashboard driven by sidebar selection:
- L1 (no selection): all-apps overview with health table, per-app charts
- L2 (app selected): route performance table, error velocity, top errors
- L3 (route selected): processor table, latency heatmap data, bottleneck KPI

Backend: 3 new endpoints (timeseries/by-app, timeseries/by-route, errors/top),
per-app SLA settings (app_settings table, V12 migration), exact SLA compliance
from executions hypertable, error velocity with acceleration detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:29:20 +02:00
hsiegeln
3d71345181 feat: trace data indicators, inline tap config, and detail tab gating
All checks were successful
CI / build (push) Successful in 1m46s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m57s
Trace data visibility:
- ProcessorNode now includes hasTraceData flag computed from captured
  body/headers during tree conversion
- ConfigBadge shows teal for tracing configured, green when data captured
- Search results show green footprints icon for exchanges with trace data
- New has_trace_data column on executions table (V11 migration with backfill)
- OpenSearch documents and ExecutionSummary include the flag

Inline tap configuration:
- Extracted reusable TapConfigModal component from RouteDetail
- Diagram context menu opens tap modal inline instead of navigating away
- Toggle-trace action works immediately with toast feedback
- Modal closes only on ESC, Cancel, Save, or Delete (not backdrop click)

Detail panel tab gating:
- Headers, Input, Output tabs disabled when no data is available
- Works at both exchange and processor level
- Falls back to Info tab when active tab becomes empty

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 13:08:58 +02:00
hsiegeln
30344d29b1 feat: store raw processor tree JSON and add error categorization fields
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m0s
CI / docker (push) Successful in 53s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Fixes iteration overlay corruption caused by flat storage collapsing
duplicate processorIds across loop iterations.

Server:
- Store raw processor tree as processors_json JSONB on executions table
- Detail endpoint serves from processors_json (faithful tree), falls back
  to flat record reconstruction for older executions
- V10 migration: processors_json, error categorization (errorType,
  errorCategory, rootCauseType, rootCauseMessage), OTel (traceId, spanId),
  circuit breaker (circuitBreakerState, fallbackTriggered), drops
  erroneous splitDepth/loopDepth columns
- Add all new fields through full ingestion/storage/API chain

UI:
- Fix overlay wrapper filtering: check wrapper type before status filter
- Add new fields to schema.d.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:44:54 +01:00
hsiegeln
faf5d505f4 feat: support iteration wrapper nodes and filter overlay by selected iteration
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 38s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Server:
- Add split_depth and loop_depth columns (V9 migration)
- Persist splitDepth/loopDepth with reflection fallback for older agent versions

UI:
- Detect iterations via wrapper processorTypes (loopIteration, splitIteration, multicastBranch)
- Filter overlay by selected iteration at the wrapper level
- Skip non-selected iteration wrappers entirely (wrapper + children)
- Don't add synthetic wrappers to overlay (no diagram node correspondence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:57:27 +01:00
hsiegeln
c4b396e618 feat: persist and expose resolvedEndpointUri for execution-level drill-down
Wire resolvedEndpointUri through the full chain:
- V9 migration adds resolved_endpoint_uri column
- IngestionService extracts from ProcessorExecution
- PostgresExecutionStore persists and reads the column
- ProcessorNode includes field in detail API response
- UI schema updated for ProcessorNode and PositionedNode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:37:11 +01:00
hsiegeln
edd841ffeb feat: add iteration fields to processor execution storage
Add loop_index, loop_size, split_index, split_size, multicast_index
columns to processor_executions table and thread them through the
full storage → ingestion → detail pipeline. These fields enable
execution overlay to display iteration context for loop, split,
and multicast EIPs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:32:47 +01:00
hsiegeln
d6c1f2c25b refactor: derive processor-route mapping from diagrams instead of executions
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 37s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Store application_name in route_diagrams at ingestion time (V7 migration),
resolve from agent registry same as ExecutionController. Move
findProcessorRouteMapping from ExecutionStore to DiagramStore using a
JSONB query that extracts node IDs directly from stored RouteGraph
definitions. This makes the mapping available as soon as diagrams are
sent, before any executions are recorded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:00:10 +01:00
hsiegeln
100b780b47 refactor: remove diagramNodeId indirection, use processorId directly
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 37s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Agent now uses Camel processorId as RouteNode.id, eliminating the
nodeId mapping layer. Drop diagram_node_id column (V6 migration),
remove from ProcessorRecord/ProcessorNode/IngestionService/DetailService,
add /processor-routes endpoint for processorId→routeId lookup,
simplify frontend diagram-mapping and ExchangeDetail overlays,
replace N diagram fetches in AppConfigPage with single hook.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:44:07 +01:00
hsiegeln
f08461cf35 feat(db): add attributes JSONB columns to executions and processor_executions (Task 1)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 18:23:26 +01:00
hsiegeln
7423e2ca14 feat: add application log ingestion with OpenSearch storage
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 59s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Agents can now send application log entries in batches via POST /api/v1/data/logs.
Logs are indexed directly into OpenSearch daily indices (logs-{yyyy-MM-dd}) using
the bulk API. Index template defines explicit mappings for full-text search readiness.

New DTOs (LogEntry, LogBatch) added to cameleer3-common in the agent repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 11:53:27 +01:00
hsiegeln
69a3eb192f feat: persistent per-application config with GET/PUT endpoints
Some checks failed
CI / build (push) Failing after 1m10s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add application_config table (V4 migration), repository, and REST
controller. GET /api/v1/config/{app} returns config, PUT saves and
pushes CONFIG_UPDATE to all LIVE agents via SSE. UI tracing toggle
now uses config API instead of direct SET_TRACED_PROCESSORS command.
Tracing store syncs with server config on load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 07:42:55 +01:00
2887fe9599 feat: add V3 migration for engine_level and route-level snapshot columns
Some checks failed
CI / build (push) Failing after 51s
CI / cleanup-branch (push) Has been skipped
CI / build (pull_request) Failing after 52s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (push) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
2026-03-24 16:13:11 +01:00
hsiegeln
ea56bcf2d7 fix: split Flyway migration — DDL in V1, policies in V2
All checks were successful
CI / build (push) Successful in 1m20s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 43s
CI / deploy (push) Successful in 1m16s
CI / deploy-feature (push) Has been skipped
TimescaleDB add_continuous_aggregate_policy and add_compression_policy
cannot run inside a transaction block. Move all policy calls to V2
with flyway:executeInTransaction=false directive.

Also fix stats_1m_processor_detail: add WITH NO DATA and
materialized_only = false.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 21:34:35 +01:00
hsiegeln
6a5dba4eba refactor: rename group_name→application_name in DB, OpenSearch, SQL
Some checks failed
CI / build (push) Failing after 41s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Consolidate V1-V7 Flyway migrations into single V1__init.sql with
all columns renamed from group_name to application_name. Requires
fresh database (wipe flyway_schema_history, all data).

- DB columns: executions.group_name → application_name,
  processor_executions.group_name → application_name
- Continuous aggregates: all views updated to use application_name
- OpenSearch field: group_name → application_name in index/query
- All Java SQL strings updated to match new column names
- Delete V2-V7 migration files (folded into V1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 21:24:19 +01:00
hsiegeln
31b60c4e24 feat: add V7 migration for per-processor-id continuous aggregate 2026-03-23 18:09:24 +01:00
hsiegeln
2b111c603c feat: migrate UI to @cameleer/design-system, add backend endpoints
Some checks failed
CI / build (push) Failing after 47s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Backend:
- Add agent_events table (V5) and lifecycle event recording
- Add route catalog endpoint (GET /routes/catalog)
- Add route metrics endpoint (GET /routes/metrics)
- Add agent events endpoint (GET /agents/events-log)
- Enrich AgentInstanceResponse with tps, errorRate, activeRoutes, uptimeSeconds
- Add TimescaleDB retention/compression policies (V6)

Frontend:
- Replace custom Mission Control UI with @cameleer/design-system components
- Rebuild all pages: Dashboard, ExchangeDetail, RoutesMetrics, AgentHealth,
  AgentInstance, RBAC, AuditLog, OIDC, DatabaseAdmin, OpenSearchAdmin, Swagger
- New LayoutShell with design system AppShell, Sidebar, TopBar, CommandPalette
- Consume design system from Gitea npm registry (@cameleer/design-system@0.0.1)
- Add .npmrc for scoped registry, update Dockerfile with REGISTRY_TOKEN arg

CI:
- Pass REGISTRY_TOKEN build-arg to UI Docker build step

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 17:38:39 +01:00
hsiegeln
0fcbe83cc2 refactor: consolidate oidc_config and admin_thresholds into generic server_config table
All checks were successful
CI / build (push) Successful in 1m19s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 42s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 34s
CI / build (pull_request) Successful in 1m23s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
Single JSONB key-value table replaces two singleton config tables, making
future config types trivial to add. Also fixes pre-existing IT failures:
Flyway URL not overridden by Testcontainers, threshold test ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 11:16:31 +01:00