375 Commits

Author SHA1 Message Date
0ac84a10e8 Merge pull request 'UX polish: bug fixes, design consistency, contrast, formatting' (#124) from feature/ux-polish into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Reviewed-on: cameleer/cameleer3-server#124
2026-04-09 19:03:53 +02:00
hsiegeln
191d4f39c1 fix: resolve 4 TypeScript compilation errors from CI
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m56s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m58s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 1m12s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 37s
- AuditLogPage: e.details -> e.detail (correct property name)
- AgentInstance: BarChart x: number -> x: String(i) (BarSeries requires string)
- AppsTab: add missing CatalogRoute import
- Dashboard: wrap MonoText in span for title attribute (MonoText lacks title prop)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:57:42 +02:00
hsiegeln
4bc38453fe fix: nice-to-have polish — breadcrumbs, close button, status badges
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 40s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Failing after 35s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
- 7.1: Add deployment status badge (StatusDot + Badge) to AppsTab app
  list, sourced from catalog.deployment.status via slug lookup
- 7.3: Add X close button to top-right of exchange detail right panel
  in ExchangesPage (position:absolute, triggers handleClearSelection)
- 7.5: PunchcardHeatmap shows "Requires at least 2 days of data"
  when timeRangeMs < 2 days; DashboardL1 passes the range down
- 7.6: Command palette exchange results truncate IDs to ...{last8}
  matching the exchanges table display

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:51:49 +02:00
hsiegeln
9466551044 fix: add unsaved changes banners to edit mode forms
Adds amber edit-mode banners to AppConfigDetailPage and both
DefaultResourcesSection/JarRetentionSection in EnvironmentsPage,
matching the existing ConfigSubTab pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:47:55 +02:00
hsiegeln
39687bc8a9 fix: fix unicode in roles, add password confirmation field
- RolesTab: wrap \u00b7 in JS expression {'\u00b7'} so JSX renders the middle dot correctly instead of literal backslash-u sequence
- UsersTab: add confirm password field with mismatch validation, hint text for password policy, and reset on cancel/success
- UserManagement.module.css: add .hintText style for password policy hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:46:30 +02:00
hsiegeln
7ec56f3bd0 fix: add shared number formatting utilities (formatMetric, formatCount, formatPercent)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:43:52 +02:00
hsiegeln
605c8ad270 feat: add CSV export to audit log
2026-04-09 18:43:46 +02:00
hsiegeln
2ede06f32a fix: chart Y-axis auto-scaling, error rate unit, memory reference line, pointer events
- Throughput chart: divide totalCount by bucket duration (seconds) so Y-axis shows true msg/s instead of raw bucket counts; fixes flat-line appearance when TPS is low but totalCount is large
- Error Rate chart: convert failedCount/totalCount to percentage; change yLabel from "err/h" to "%" to match KPI stat card unit
- Memory chart: add threshold line at jvm.memory.heap.max so chart Y-axis extends to max heap and shows the reference line (spec 5.3)
- Agent state: suppress containerStatus badge when value is "UNKNOWN"; only render it with "Container: <state>" label when a non-UNKNOWN secondary state is present (spec 5.4)
- DashboardTab chartGrid: add pointer-events:none with pointer-events:auto on children so the chart grid overlay does not intercept clicks on the Application Health table rows below (spec 5.5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:42:10 +02:00
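The unit conversions in 2ede06f32a (bucket counts to msg/s, failed/total to a percentage) can be sketched as below. The `Bucket` shape and both function names are assumptions made for illustration; the real chart code is not shown in this log.

```typescript
// Hypothetical bucket shape; only totalCount and failedCount are named in the commit.
interface Bucket {
  totalCount: number;
  failedCount: number;
}

// Messages per second: raw bucket count divided by the bucket duration in seconds.
function throughputPerSecond(bucket: Bucket, bucketSeconds: number): number {
  return bucketSeconds > 0 ? bucket.totalCount / bucketSeconds : 0;
}

// Error rate as a percentage of all exchanges in the bucket.
function errorRatePercent(bucket: Bucket): number {
  return bucket.totalCount > 0 ? (bucket.failedCount / bucket.totalCount) * 100 : 0;
}
```

Guarding against a zero divisor keeps empty buckets at 0 instead of producing NaN or Infinity on the Y-axis.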
hsiegeln
fb53dc6dfc fix: standardize button order, add confirmation dialogs for destructive actions
- Fix Cancel|Save order and add primary/loading props (AppConfigDetailPage)
- Add AlertDialog before stopping deployments (AppsTab)
- Add ConfirmDialog before deleting taps (TapConfigModal)
- Add AlertDialog before killing queries with toast feedback (DatabaseAdminPage)
- Add AlertDialog before removing roles from users (UsersTab)
- Standardize Cancel button to variant="ghost" (TapConfigModal, RouteDetail)
- Add loading prop to ConfirmDialogs (OidcConfigPage, RouteDetail)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:39:22 +02:00
hsiegeln
3d910af491 fix: hide empty attributes column, standardize status labels, truncate agent names
- Attributes column is now hidden when no exchanges in the current view
  have attributes; shown conditionally via hasAttributes check on rows
- Status labels already standardized via statusLabel() in ExchangeHeader
- Agent names truncated to last two hyphen-separated segments via
  shortAgentName(); full name preserved as tooltip title

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:36:06 +02:00
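A minimal sketch of the two helpers named in 3d910af491, assuming agent names are plain hyphen-separated strings and rows carry an optional `attributes` map. The `ExchangeRow` shape is hypothetical and the real implementations may differ.

```typescript
// Hypothetical row shape; only the presence of attributes matters here.
interface ExchangeRow {
  attributes?: Record<string, string>;
}

// Keep only the last two hyphen-separated segments of an agent name.
function shortAgentName(name: string): string {
  const parts = name.split("-");
  return parts.length <= 2 ? name : parts.slice(-2).join("-");
}

// True when at least one row in the current view has a non-empty attribute map.
function hasAttributes(rows: ExchangeRow[]): boolean {
  return rows.some(
    (row) => row.attributes !== undefined && Object.keys(row.attributes).length > 0,
  );
}
```

The full agent name would still be passed to the element's `title` so the tooltip shows it untruncated.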
hsiegeln
eadcd160a3 fix: improve duration formatting (Xm Ys) and truncate exchange IDs
- formatDuration and formatDurationShort now show Xm Ys for durations >= 60s (e.g. "5m 21s" instead of "321s") and 1 decimal for 1-60s range ("6.7s" instead of "6.70s")
- Exchange ID column shows last 8 chars with ellipsis prefix; full ID on hover, copies to clipboard on click

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:34:04 +02:00
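The formatting rules from eadcd160a3 can be sketched as follows. This assumes the input is in milliseconds, that sub-second values stay in milliseconds, and that the ellipsis prefix is the single character U+2026; none of these details are confirmed by the commit message.

```typescript
// Assumption: durations arrive in milliseconds.
function formatDuration(ms: number): string {
  const rounded = Math.round(ms);
  if (rounded < 1000) return `${rounded}ms`; // assumption: sub-second values stay in ms

  // Round to tenths first so 59.96s becomes "1m 0s" rather than "60.0s".
  const tenths = Math.round(ms / 100) / 10;
  if (tenths < 60) return `${tenths.toFixed(1)}s`;

  const wholeSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(wholeSeconds / 60);
  const seconds = wholeSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// Show only the last 8 characters of an exchange ID, prefixed with an ellipsis.
function truncateId(id: string): string {
  return id.length <= 8 ? id : "\u2026" + id.slice(-8);
}
```

For the examples quoted in the commit, `formatDuration(321000)` yields "5m 21s" and `formatDuration(6700)` yields "6.7s".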
hsiegeln
ba0a1850a9 fix: WCAG AA contrast compliance for --text-muted/--text-faint, 12px font floor
Override design system tokens in app root CSS: --text-muted raised to 4.5:1
contrast in both light (#766A5E) and dark (#9A9088) modes; --text-faint dark
mode raised from catastrophic 1.4:1 to 3:1 (#6A6058). Migrate --text-faint
usages on readable text (empty states, italic notes, buttons) to --text-muted.
Raise all 10px and 11px font-size declarations to 12px floor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:31:51 +02:00
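The ratios quoted in ba0a1850a9 follow the WCAG contrast definition, which can be sketched as below. The function names are illustrative, and the commit does not state which background colors the tokens were measured against, so no background values are assumed here.

```typescript
// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#RRGGBB" color.
function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors; ranges from 1 (identical) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}
```

WCAG AA asks for at least 4.5:1 on normal-size text, which is the target named for `--text-muted`.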
hsiegeln
b6b93dc3cc fix: prevent admin page redirect during token refresh
adminFetch called logout() directly on 401/403 responses, which cleared
roles and caused RequireAdmin to redirect to /exchanges while users were
editing forms. Now adminFetch attempts a token refresh before failing,
and RequireAdmin tolerates a transient empty-roles state during refresh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:28:45 +02:00
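The behavior described in b6b93dc3cc might look roughly like the sketch below. The request, refresh, and logout functions are injected as parameters, and `fetchWithRefresh`, `adminGate`, and the `refreshing` flag are hypothetical names; the real adminFetch and RequireAdmin are not shown in this log.

```typescript
interface SimpleResponse {
  status: number;
}

function isAuthFailure(status: number): boolean {
  return status === 401 || status === 403;
}

// Try the request, refresh the token once on an auth failure, and only then log out.
async function fetchWithRefresh(
  doRequest: () => Promise<SimpleResponse>,
  refreshToken: () => Promise<boolean>,
  logout: () => void,
): Promise<SimpleResponse> {
  const first = await doRequest();
  if (!isAuthFailure(first.status)) return first;

  if (await refreshToken()) {
    const second = await doRequest();
    if (!isAuthFailure(second.status)) return second;
  }
  logout();
  return first;
}

type GateDecision = "render" | "wait" | "redirect";

// Route guard: empty roles during a refresh mean "wait", not "redirect".
function adminGate(roles: string[], refreshing: boolean): GateDecision {
  if (roles.indexOf("ADMIN") !== -1) return "render";
  return refreshing ? "wait" : "redirect";
}
```

Separating the guard decision from the fetch wrapper keeps the transient empty-roles case testable without a network call.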
hsiegeln
3f9fd44ea5 fix: wrap app config in section cards, replace manual table with DataTable
- Add sectionStyles and tableStyles imports to AppsTab.tsx
- Wrap CreateAppView identity section and each config tab (Monitoring,
  Resources, Variables) in sectionStyles.section cards
- Wrap ConfigSubTab config tabs (Monitoring, Resources, Variables,
  Traces & Taps, Route Recording) in sectionStyles.section cards
- Replace manual <table> in OverviewSubTab with DataTable inside a
  tableStyles.tableSection card wrapper; pre-compute enriched row data
  via useMemo; handle muted non-selected-env rows via inline opacity
- Remove unused .table, .table th, .table td, .table tr:hover td, and
  .mutedRow CSS rules from AppsTab.module.css

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:28:11 +02:00
hsiegeln
ba53f91f4a fix: standardize table containment and container padding across pages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:21:58 +02:00
hsiegeln
be585934b9 fix: show descriptive error when creating local user with OIDC enabled
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:19:10 +02:00
hsiegeln
2771dffb78 fix: add /deployments redirect and fix GC Pauses chart X-axis
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 18:16:53 +02:00
hsiegeln
80bc092ec1 Add UX polish implementation plan (19 tasks across 8 batches)
Detailed step-by-step plan covering critical bug fixes, layout/interaction
consistency, WCAG contrast compliance, data formatting, chart fixes, and
admin polish. Each task includes exact file paths, code snippets, and
verification steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:13:41 +02:00
hsiegeln
4ea8bb368a Add UX polish design spec with comprehensive audit findings
Playwright-driven audit of the live UI (build 69dcce2, 60+ screenshots)
covering all pages, CRUD lifecycles, design consistency, and interaction
patterns. Spec defines 8 batches of work: critical bugs, layout
consistency, interaction consistency, contrast/readability, data
formatting, chart fixes, admin polish, and nice-to-have items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:00:50 +02:00
hsiegeln
f24a5e5ff0 docs: update CLAUDE.md, audit, and spec for today's changes
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 27s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- CLAUDE.md: security (last-admin guard, password policy, brute-force,
  token revocation), environment filtering (queries + commands), Docker
  reconciliation, UI shared patterns, V8/V9 migrations
- UI-CONSISTENCY-AUDIT.md: marked RESOLVED
- UI consistency design spec: marked COMPLETED

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:54:54 +02:00
hsiegeln
1971c70638 fix: commands respect selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.

Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.

Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:28:09 +02:00
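The backward-compatible filter from 1971c70638 (null means all environments) can be illustrated with an in-memory sketch. The backend is Java; this TypeScript version and the `Agent` shape are assumptions used only to show the rule.

```typescript
// Hypothetical agent shape for illustration.
interface Agent {
  id: string;
  application: string;
  environment: string;
}

// A null environment keeps the old behavior and matches agents in every environment.
function findByApplicationAndEnvironment(
  agents: Agent[],
  application: string,
  environment: string | null,
): Agent[] {
  return agents.filter(
    (agent) =>
      agent.application === application &&
      (environment === null || agent.environment === environment),
  );
}
```

With this rule, a config update sent with `environment = "prod"` never reaches an agent registered under "dev", while callers that omit the parameter see no change.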
hsiegeln
69dcce2a8f fix: Runtime tab respects selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Add environment parameter to AgentEventsController, AgentEventService,
  and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
  and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:12:33 +02:00
hsiegeln
cb36d7936f fix: auto-compute environment slug + respect environment filter globally
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.

Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
  detail, correlation chains, and processor metrics all filter by
  selected environment.
- Backend RouteMetricsController now accepts environment parameter for
  both route and processor metrics endpoints.

Closes #XYZ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:01:50 +02:00
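Part A of cb36d7936f derives the slug from the display name. The exact rules are not given in the commit, so the sketch below assumes a common convention: lowercase, runs of non-alphanumeric characters collapsed to a single hyphen, and no leading or trailing hyphens.

```typescript
// Assumed slug rules; the project's actual derivation may differ.
function slugify(displayName: string): string {
  return displayName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into one hyphen
    .replace(/^-+|-+$/g, ""); // strip hyphens at both ends
}
```

Because the field is shown read-only, the UI can recompute the value on every keystroke in the display-name input.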
hsiegeln
f95a78a380 fix: add periodic deployment status reconciliation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The DockerEventMonitor only reacted to Docker events. If an event was
missed (e.g., during reconnect or startup race), a DEGRADED deployment
with all replicas healthy would never promote back to RUNNING.

Add a @Scheduled reconciliation (every 30s) that inspects actual
container state and corrects deployment status mismatches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:40:18 +02:00
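The reconciliation pass in f95a78a380 compares the recorded status with what the containers actually report. A minimal sketch of that comparison follows, assuming partially healthy replicas map to DEGRADED and zero healthy replicas map to FAILED; only the DEGRADED-to-RUNNING promotion is confirmed by the commit message, and the backend implements this in Java.

```typescript
type DeploymentStatus = "RUNNING" | "DEGRADED" | "FAILED";

// Status implied by the observed replica health (mapping is an assumption).
function expectedStatus(replicaHealthy: boolean[]): DeploymentStatus {
  const healthy = replicaHealthy.filter((h) => h).length;
  if (healthy > 0 && healthy === replicaHealthy.length) return "RUNNING";
  return healthy > 0 ? "DEGRADED" : "FAILED";
}

// Returns the corrected status, or null when the record already matches reality.
function reconcile(
  recorded: DeploymentStatus,
  replicaHealthy: boolean[],
): DeploymentStatus | null {
  const actual = expectedStatus(replicaHealthy);
  return actual === recorded ? null : actual;
}
```

Run on a fixed schedule, this check repairs any status that a missed Docker event left stale.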
hsiegeln
3f94c98c5b refactor: replace native HTML with design system components (Phase 5)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- EnvironmentSelector: bare <select> -> DS Select
- LogTab: raw <table> + <input> + <button> -> DS LogViewer + Input + Button
- AppsTab: 3 homegrown sub-tab bars -> DS Tabs, remove unused CSS
- AppConfigDetailPage: 4x <select> -> DS Select, 2x <input checkbox> ->
  DS Toggle, 7x <label> -> DS Label, 4x <button> -> DS Button
- AgentHealth: 4x <select> -> DS Select, 7x <button> -> DS Button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:22:14 +02:00
hsiegeln
ff62a34d89 refactor: UI consistency — shared CSS, design system colors, no inline styles
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m22s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Phase 1: Extract 6 shared CSS modules (table-section, log-panel,
rate-colors, refresh-indicator, chart-card, section-card) eliminating
~135 duplicate class definitions across 11 files.

Phase 2: Replace all hardcoded hex colors in CSS modules with design
system variables. Strip ~55 hex fallbacks from var() patterns. Fix 4
undefined variable names (--accent, --bg-base, --surface, --bg-surface-raised).

Phase 3: Replace ~45 hardcoded hex values in ProcessDiagram SVG
components with var() CSS custom properties. Fix Dashboard.tsx color prop.

Phase 4: Create CSS modules for AdminLayout, DatabaseAdminPage,
OidcCallback (previously 100% inline). Extract shared PageLoader
component (replaces 3 copy-pasted spinner patterns). Move AppsTab
static inline styles to CSS classes. Extract LayoutShell StarredList styles.

58 files changed, net -219 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:55:54 +02:00
hsiegeln
bfed8174ca docs: UI consistency audit and fix design spec
Full audit of design system adoption, color consistency, inline styles,
layout patterns, and CSS module duplication across the server UI.
Includes 6-phase fix plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:45:32 +02:00
hsiegeln
827ba3c798 feat: last-ADMIN guard and password hardening (#87, #89)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m57s
CI / docker (push) Successful in 1m48s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
- Prevent removal of last ADMIN role via role unassign, user delete,
  or group role removal (returns 409 Conflict)
- Add password policy: min 12 chars, 3/4 character classes, no username
- Add brute-force protection: 5 attempts then 15min lockout, IP rate limit
- Add token revocation on password change via token_revoked_before column
- V9 migration adds failed_login_attempts, locked_until, token_revoked_before

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:58:03 +02:00
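The password policy in 827ba3c798 (min 12 chars, 3 of 4 character classes, no username) can be sketched as a validator. The server-side check is in Java; this TypeScript version, its messages, and the case-insensitive username comparison are assumptions.

```typescript
// Returns a list of policy violations; an empty list means the password is accepted.
function checkPassword(password: string, username: string): string[] {
  const problems: string[] = [];

  if (password.length < 12) problems.push("must be at least 12 characters");

  // Four classes: lowercase, uppercase, digit, anything else.
  const classPatterns = [/[a-z]/, /[A-Z]/, /[0-9]/, /[^A-Za-z0-9]/];
  const classesUsed = classPatterns.filter((re) => re.test(password)).length;
  if (classesUsed < 3) problems.push("must use at least 3 of 4 character classes");

  // Assumption: the username check ignores case.
  if (username.length > 0 && password.toLowerCase().indexOf(username.toLowerCase()) !== -1) {
    problems.push("must not contain the username");
  }
  return problems;
}
```

Returning every violation at once lets a form show all hints together rather than one per submit.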
hsiegeln
3bf470f83f fix: narrow DEPLOY_STATUS_DOT type to match StatusDotVariant
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Fixes pre-existing TS2322 where Record<string, string> was not
assignable to the StatusDotVariant union type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:33:38 +02:00
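The TS2322 fix in 3bf470f83f comes down to typing the lookup table with the union instead of `string`. The variant names and the fallback below are hypothetical; only `DEPLOY_STATUS_DOT` and `StatusDotVariant` appear in the commit.

```typescript
// Hypothetical union; the design system's real variants are not listed in this log.
type StatusDotVariant = "success" | "warning" | "error" | "muted";

// Typing the values as StatusDotVariant (not string) keeps lookups assignable
// to props that expect the union.
const DEPLOY_STATUS_DOT: Record<string, StatusDotVariant> = {
  RUNNING: "success",
  DEGRADED: "warning",
  FAILED: "error",
};

function dotFor(status: string): StatusDotVariant {
  const variant = DEPLOY_STATUS_DOT[status];
  return variant !== undefined ? variant : "muted"; // assumed fallback for unknown statuses
}
```

With `Record<string, string>`, the same lookup would widen to `string` and fail wherever a `StatusDotVariant` is required.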
hsiegeln
de46cee440 chore: add GitNexus config to .gitignore and CLAUDE.md
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 50s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:30:53 +02:00
hsiegeln
04c90bde06 refactor: extract duplicated utility functions into shared modules
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 41s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Consolidate 20+ duplicate function definitions across UI components into
three shared util files (format-utils, agent-utils, config-draft-utils).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:28:31 +02:00
hsiegeln
2df5e0d7ba feat: active config snapshot, composite StatusDot with tooltip
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability

Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
  agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
  Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:00:54 +02:00
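Part 2 of 2df5e0d7ba combines deployment status and agent health into one indicator. A sketch of the rule and the tooltip format follows; the function name, parameter names, and return shape are assumptions, and the real computation lives in the Java CatalogController.

```typescript
interface CompositeHealth {
  green: boolean;
  tooltip: string;
}

// Green only when the deployment is RUNNING and the agents report live.
function compositeHealth(
  deploymentStatus: string,
  agentState: string,
  connectedAgents: number,
): CompositeHealth {
  return {
    green: deploymentStatus === "RUNNING" && agentState === "live",
    tooltip: `Deployment: ${deploymentStatus}, Agents: ${agentState} (${connectedAgents} connected)`,
  };
}
```

Carrying the tooltip alongside the flag lets the sidebar, header, and table rows explain the color without recomputing it.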
hsiegeln
7b822a787a feat: show Redeploy button when config changed after deployment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Compare app.updatedAt with deployment.deployedAt — if config was
modified after the deployment started, show a primary "Redeploy" button
in the Actions column. Also show a toast hint after saving config:
"Redeploy to apply changes to running deployments."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 07:41:11 +02:00
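The check behind the Redeploy button in 7b822a787a is a timestamp comparison. The sketch assumes both fields arrive as ISO 8601 strings; `needsRedeploy` is a hypothetical name.

```typescript
// True when the app config was modified after the deployment started.
function needsRedeploy(appUpdatedAt: string, deployedAt: string): boolean {
  const updated = Date.parse(appUpdatedAt);
  const deployed = Date.parse(deployedAt);
  // Unparseable timestamps produce NaN, which compares false and hides the button.
  return updated > deployed;
}
```

Comparing parsed epoch values instead of raw strings avoids surprises when the two timestamps carry different UTC offsets.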
hsiegeln
e88db56f79 refactor: CPU config to millicores, fix replica health, reorder tabs
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
- Rename cpuShares to cpuRequest (millicores), cpuLimit from cores to
  millicores. ResolvedContainerConfig translates to Docker-native units
  via dockerCpuShares() and dockerCpuQuota() helpers. Future K8s
  orchestrator can pass millicores through directly.
- Fix waitForAnyHealthy to wait for ALL replicas instead of returning
  on first healthy one. Prevents false DEGRADED status with 2+ replicas.
- Default app detail to Configuration tab (was Overview)
- Reorder config sub-tabs: Monitoring, Resources, Variables, Traces &
  Taps, Route Recording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 07:38:23 +02:00
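The millicore translation in e88db56f79 can be illustrated as below. The helper names come from the commit, but the arithmetic is an assumption: it follows the common convention of 1024 CPU shares per core and a 100000 microsecond CFS period, and the real helpers are Java methods on ResolvedContainerConfig.

```typescript
const SHARES_PER_CORE = 1024; // Docker's default --cpu-shares weight for one container
const CPU_PERIOD_MICROS = 100000; // Docker's default --cpu-period

// Relative weight: 1000 millicores maps to 1024 shares (assumed mapping).
function dockerCpuShares(cpuRequestMillicores: number): number {
  return Math.round((cpuRequestMillicores * SHARES_PER_CORE) / 1000);
}

// Hard cap: microseconds of CPU time allowed per period (assumed mapping).
function dockerCpuQuota(cpuLimitMillicores: number): number {
  return Math.round((cpuLimitMillicores * CPU_PERIOD_MICROS) / 1000);
}
```

Storing millicores and translating only at the Docker boundary is what allows a later Kubernetes orchestrator to pass the same values through unchanged.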
hsiegeln
eb7cd9ba62 fix: keep sidebar selection when switching tabs
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Normalize the sidebar selectedPath so the app highlight persists across
tab switches (Dashboard, Runtime, Deployments). Also make sidebar clicks
tab-aware: clicking an app navigates to the current tab's path instead
of always going to /exchanges/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 07:13:04 +02:00
hsiegeln
b86e95f08e feat: unified catalog endpoint and slug-based app navigation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m47s
Consolidate route catalog (agent-driven) and apps table (deployment-
driven) into a single GET /api/v1/catalog?environment={slug} endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.

- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:43:14 +02:00
hsiegeln
0720053523 docs: add catalog consolidation design spec
Unify route catalog (agent-driven) and apps table (deployment-driven)
into a single catalog endpoint. Apps table becomes authoritative,
agent data enriches with live health/routes. Slug-based URLs replace
UUIDs for navigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:32:18 +02:00
hsiegeln
a4a569a253 fix: improve deployment progress UI and prevent duplicate deployment rows
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m55s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m6s
- Redesign DeploymentProgress component: track-based layout with amber
  brand color, checkmarks for completed steps, user-friendly labels
  (Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
  for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
  stages, database migration reference, and REST endpoint summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:10:59 +02:00
hsiegeln
6288084daf docs: update documentation for Docker orchestration and env var rename
All checks were successful
CI / build (push) Successful in 2m9s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m41s
CI / deploy (push) Successful in 56s
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 22:09:18 +02:00
hsiegeln
64ebf19ad3 refactor: use CAMELEER_SERVER_URL for agent export endpoint
Some checks failed
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
CI / build (push) Has been cancelled
The runtime-base image and all agent Dockerfiles now read
CAMELEER_SERVER_URL instead of CAMELEER_EXPORT_ENDPOINT.
Updated the volume-mode entrypoint override to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:07:07 +02:00
hsiegeln
20f3dfe59d feat: support Docker volume-based JAR mounting for Docker-in-Docker
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
When CAMELEER_JAR_DOCKER_VOLUME is set, the orchestrator mounts the
named volume at the jar storage path instead of using a host bind mount.
This solves the path translation issue in Docker-in-Docker setups where
the server runs inside a container and manages sibling containers.

The entrypoint is overridden to use the volume-mounted JAR path via
the CAMELEER_APP_JAR env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:38:34 +02:00
hsiegeln
c923d8233b fix: move network attachment from orchestrator to executor
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Docker's connectToNetworkCmd needs the network ID (not name) and the
container's network sandbox must be ready. Moving network connection
to DeploymentExecutor where DockerNetworkManager handles ID resolution
and the container is already started.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:29:13 +02:00
hsiegeln
c72424543e fix: add client_max_body_size 200m to nginx API proxy
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m22s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Nginx defaults to 1MB body size, causing 413 on JAR uploads through
the UI proxy. Matches the Spring Boot multipart limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:25:02 +02:00
hsiegeln
18ffbea9db fix: use visually-hidden clip pattern for file inputs
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m25s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The opacity:0 approach caused the native "Choose File" button to
appear in the accessibility tree and compete for clicks. The clip
pattern properly hides the input while keeping it functional for
programmatic .click().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:23:05 +02:00
hsiegeln
19da9b9f9f fix: use opacity-based hidden input for file upload instead of display:none
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m25s
CI / docker (push) Successful in 1m14s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Some browsers block programmatic .click() on display:none inputs.
Using position:absolute + opacity:0 keeps the input in the render tree.
Also added type="button" to prevent any form-submission interference.
Applied to both create page and detail view file inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:17:50 +02:00
hsiegeln
8b3c4ba2fe feat: routing mode, domain, server URL, SSL offloading on Environments page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:35:23 +02:00
hsiegeln
96fbca1b35 feat: replicas column, deploy progress, and new config fields in Deployments UI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:33:41 +02:00
hsiegeln
977bfc1c6b feat: DeploymentProgress step indicator component
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:29:57 +02:00
hsiegeln
7e0536b5b3 feat: update Deployment interface with replicas, stages, new statuses
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:29:33 +02:00
hsiegeln
6e444a414d feat: add CAMELEER_SERVER_URL config property
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:28:44 +02:00
hsiegeln
f8d42026da feat: rewrite DeploymentExecutor with staged deploy, config merge, replicas
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:27:37 +02:00
hsiegeln
fef3ef6184 feat: DockerEventMonitor — persistent event stream for container lifecycle
Listens to Docker daemon events (die, oom, start, stop) for containers
labeled managed-by=cameleer3-server, updates replica states in Postgres,
and recomputes aggregate deployment status (RUNNING/DEGRADED/FAILED).
Bean is wired in RuntimeOrchestratorAutoConfig via instanceof guard so it
only activates when Docker is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:24:03 +02:00
hsiegeln
76eacb17e6 feat: DockerNetworkManager with lazy network creation and container attachment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:21:39 +02:00
hsiegeln
3f2fec2815 feat: TraefikLabelBuilder with path-based and subdomain routing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:21:16 +02:00
hsiegeln
55bdab472b feat: expand ContainerRequest with cpuLimit, ports, restart policy, additional networks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:20:13 +02:00
hsiegeln
b7d00548c5 feat: ResolvedContainerConfig record and three-layer ConfigMerger
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:18:25 +02:00
hsiegeln
fef0239b1d feat: update PostgresDeploymentRepository for orchestration columns
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:16:57 +02:00
hsiegeln
6eff271238 feat(core): add orchestration fields to Deployment record
Extends Deployment with targetState, deploymentStrategy, replicaStates
(List<Map<String,Object>>), and deployStage. Updates withStatus() to
carry the new fields through.
2026-04-08 20:15:11 +02:00
hsiegeln
01e0062767 feat(core): expand DeploymentStatus and add DeployStage enum
Adds DEGRADED and STOPPING to DeploymentStatus (reordered for lifecycle
clarity). Introduces DeployStage enum for tracking orchestration progress
through PRE_FLIGHT → COMPLETE.
2026-04-08 20:15:07 +02:00
hsiegeln
0fccdb636f feat(db): add V7 deployment orchestration migration
Adds target_state, deployment_strategy, replica_states (JSONB), and
deploy_stage columns to the deployments table with backfill logic.
2026-04-08 20:15:01 +02:00
hsiegeln
123e66e44d docs: Docker container orchestration implementation plan
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m25s
CI / docker (push) Successful in 25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
17 tasks covering: migration, domain models, config merger, Traefik
labels, network manager, Docker event monitor, DeploymentExecutor
rewrite, controller updates, and UI changes (progress indicator,
replicas, new config fields).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:11:12 +02:00
hsiegeln
b196918e70 docs: revert ICC-disabled, use shared traefik network with app-level auth
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 26s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
ICC=false breaks Traefik routing and agent-server communication.
Switched to shared traefik network (ICC enabled) with app-level
security boundaries. Per-env Traefik networks noted as future option.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:00:12 +02:00
hsiegeln
dd4442329c docs: add ICC-disabled traefik network isolation to orchestration spec
The cameleer-traefik network disables inter-container communication
so app containers cannot reach each other directly — only through
Traefik. Environment networks keep ICC enabled for intra-env comms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:53:51 +02:00
hsiegeln
da6bf694f8 docs: Docker container orchestration design spec
Covers: config merging (3-layer), Traefik label generation (path +
subdomain routing), network topology (infra/traefik/env isolation),
replica management, blue/green and rolling deployment strategies,
Docker event stream monitoring, deployment status state machine
(DEGRADED/STOPPING states), pre-flight checks, and UI changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:48:34 +02:00
hsiegeln
7e47f1628d feat: JAR retention policy with nightly cleanup job
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Per-environment "keep last N versions" setting (default 5, null for
unlimited). Nightly scheduled job at 03:00 deletes old versions from
both database and disk, skipping any version that is currently deployed.

Full stack:
- V6 migration: adds jar_retention_count column to environments
- Environment record, repository, service, admin controller endpoint
- JarRetentionJob: @Scheduled nightly, iterates environments and apps
- UI: retention policy editor on admin Environments page with
  toggle between limited/unlimited and version count input
- AppVersionRepository.delete() for version cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:06:28 +02:00
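Note on commit 7e47f1628d: the "keep last N versions" rule it describes can be sketched roughly as below. This is a hedged illustration, not the actual JarRetentionJob: the class name, the use of plain version numbers, and the choice that deployed versions inside the newest N still count toward N are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class JarRetentionSketch {
    /**
     * Returns the versions a cleanup run would delete.
     *
     * @param newestFirst versions ordered newest first
     * @param keepLast    number of newest versions to keep; null means unlimited
     * @param deployed    versions that are currently deployed and must be skipped
     */
    static List<Integer> versionsToDelete(List<Integer> newestFirst,
                                          Integer keepLast,
                                          Set<Integer> deployed) {
        List<Integer> toDelete = new ArrayList<>();
        if (keepLast == null) {
            return toDelete; // unlimited retention: nothing is deleted
        }
        // Everything past the first keepLast entries is a deletion candidate.
        for (int i = Math.max(0, keepLast); i < newestFirst.size(); i++) {
            Integer version = newestFirst.get(i);
            if (!deployed.contains(version)) {
                toDelete.add(version);
            }
        }
        return toDelete;
    }
}
```

With versions 7 down to 1, a limit of 5, and version 1 still deployed, only version 2 would be deleted under these assumptions.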
hsiegeln
863a992cc4 feat: add default container config editor to Environments admin page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
New "Default Resource Limits" section in environment detail view with
memory limit/reserve, CPU shares/limit. These defaults apply to new
apps unless overridden per-app.

Added useUpdateDefaultContainerConfig hook for the PUT endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:52:39 +02:00
hsiegeln
0ccb8bc68d feat: extract Variables as first config tab in create and detail views
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Environment Variables moved from Resources into a dedicated "Variables"
tab, placed first in the tab order since it's the most commonly needed
config when creating new apps.

Tab order:
- Create page: Variables | Monitoring | Resources
- Detail page: Variables | Monitoring | Traces & Taps | Route Recording | Resources

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:47:58 +02:00
hsiegeln
0a3733f9ba feat: show live external URL preview instead of slug on create page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
As the user types the app name, the URL builds in real-time:
  /{envSlug}/{appSlug}/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:45:02 +02:00
hsiegeln
056b747c3f feat: replace create-app modal with full creation page at /apps/new
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m6s
Full-page creation flow with:
- Identity section: name, auto-slug, environment, JAR upload, deploy toggle
- Monitoring tab: engine level, payload capture, log levels, metrics,
  sampling, compress success, replay, route control
- Resources tab: memory, CPU, ports, environment variables

Environment variables are configurable before first deploy, addressing
the need to set app-specific config upfront.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:31:34 +02:00
hsiegeln
0b2d231b6b feat: split config into 4 tabs and fix JAR upload 413
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Config sub-tabs are now: Monitoring | Traces & Taps | Route Recording | Resources
(renamed from Agent/Infrastructure, with traces and recording as their own tabs).

Also increase Spring multipart max-file-size and max-request-size to 200MB
to fix HTTP 413 on JAR uploads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:22:39 +02:00
hsiegeln
7503641afe chore: remove dead LogsTab and AppConfigPage files
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Both replaced by consolidated Deployments tab. ~1300 lines removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:11:05 +02:00
hsiegeln
967156d41b feat: migrate traces/taps and route recording into Deployments config
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
ConfigSubTab now uses inner tabs (Agent / Infrastructure):
- Agent: observability settings, compress success, traces & taps table,
  route recording toggles
- Infrastructure: container resources, exposed ports, environment variables

This completes the Config tab consolidation — all features from the
standalone Config page now live in the Deployments tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:09:12 +02:00
hsiegeln
0a0733def7 refactor: consolidate tabs — remove standalone Logs and Config tabs
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Logs functionality already exists in Runtime tab (AgentHealth/AgentInstance).
Config functionality moved to Deployments tab ConfigSubTab.
Old routes redirect to /runtime and /apps respectively.
Navigation links updated throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:02:29 +02:00
hsiegeln
b7f215e90c feat: add delete confirmation dialog for apps
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Prevents accidental app deletion by requiring the user to type the app
slug before confirming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:55:37 +02:00
hsiegeln
6a32b83326 feat: single-step app creation with auto-slug, JAR upload, and deploy
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Replace inline create form with a modal that handles the full flow:
- Name → auto-computed slug (editable if needed)
- Environment picker
- JAR file upload
- "Deploy immediately" toggle (on by default)
- Single "Create & Deploy" button runs all three API calls sequentially
  with step indicator

After creation, navigates directly to the new app's detail view.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:48:20 +02:00
hsiegeln
c4fe992179 feat: redesign Deployments tab with Overview + Configuration sub-tabs
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Overview sub-tab:
- Deployments table with env badge, version, status, URL, deployed time
- Actions (Start/Stop) scoped to selected environment; other envs show
  "switch env to manage" hint with muted rows
- Versions list with per-env deploy target picker

Configuration sub-tab:
- Read-only by default with Edit mode gate (Cancel/Save banner)
- Agent observability: engine level, payload capture with size unit
  selector, log levels, metrics toggle, sampling, replay and route
  control (default enabled)
- Container resources: memory/CPU limits, exposed ports as deletable
  pills with inline add input
- Environment variables: key-value editor with add/remove
- Reuses existing ApplicationConfig API for agent config push via SSE

Tab renamed from "Apps" to "Deployments" in the tab bar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:36:09 +02:00
hsiegeln
01ac47eeb4 chore: update @cameleer/design-system to stable v0.1.39
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Replaces snapshot dependency with tagged release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:59:20 +02:00
hsiegeln
1c5ecb02e3 fix: make environment list accessible to all authenticated users
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m25s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The list endpoint on EnvironmentAdminController now overrides the
class-level ADMIN guard with isAuthenticated(), so VIEWERs can see
the environment selector. The LayoutShell merges environments from
both the environments table and agent heartbeats, so the selector always
shows configured environments even when no agents are connected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:50:31 +02:00
hsiegeln
b1b7e142bb fix: remove duplicate updated_at column from V5 migration
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
apps.updated_at already exists from V3. The duplicate ALTER caused
Flyway to fail on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:31:06 +02:00
hsiegeln
de4ca10fa5 feat: move Apps from admin to main tab bar with container config
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m16s
- Apps tab visible to OPERATOR+ (hidden for VIEWER), scoped by
  sidebar app selection and environment filter
- List view: DataTable with name, environment, updated, created columns
- Detail view: deployments across all envs, version upload with
  per-env deploy target, container config form (resources, ports,
  custom env vars) with explicit Save
- Memory reserve field disabled for non-production environments
  with info hint
- Admin sidebar sorted alphabetically, Applications entry removed
- Old admin AppsPage deleted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:23:30 +02:00
hsiegeln
875062e59a feat: add container config to apps and default config to environments
- V5 migration: container_config JSONB + updated_at on apps,
  default_container_config JSONB on environments
- App/Environment records updated with new fields
- PUT /apps/{id}/container-config endpoint for per-app config
- PUT /admin/environments/{id}/default-container-config for env defaults
- GET /apps now supports optional environmentId (lists all when omitted)
- AppRepository.findAll() for cross-environment app listing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:18:08 +02:00
hsiegeln
e04dca55aa feat: add Applications admin page with version upload and deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- SplitPane layout with environment selector, app list, and detail pane
- Create/delete apps with slug uniqueness validation
- Upload JAR versions with file size display
- Deploy versions and stop running deployments with status badges
- Deployment list auto-refreshes every 5s for live status updates
- Registered at /admin/apps with sidebar entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:24:22 +02:00
hsiegeln
448a63adc9 feat: add About Me dialog showing user info, roles, and groups
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Add GET /api/v1/auth/me endpoint returning current user's UserDetail
- Add AboutMeDialog component with role badges and group memberships
- Add userMenuItems prop to TopBar via design-system update
- Wire "About Me" menu item into user dropdown above Logout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:12:29 +02:00
hsiegeln
a8b977a2db fix: include managed role assignments in direct roles query
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
getDirectRolesForUser filtered on origin='direct', which excluded
roles assigned via claim mapping (origin='managed'). This caused
OIDC users to appear roleless even when claim mappings matched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:52:50 +02:00
hsiegeln
529e2c727c fix: apply defaultRoles fallback when no claim mapping rules match
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
When no claim mapping rules are configured or none match the JWT
claims, fall back to assigning the OidcConfig.defaultRoles (e.g.
VIEWER). This restores the behavior that was lost when syncOidcRoles
was replaced with claim mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:33:24 +02:00
hsiegeln
9af0043915 feat: add Environment admin UI page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SplitPane with create/edit/delete, production flag toggle,
enabled/disabled toggle. Follows existing admin page patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:19:05 +02:00
hsiegeln
2e006051bc feat: add production/enabled flags to environments, drop status enum
Environments now have:
- production (bool): prod vs non-prod resource allocation
- enabled (bool): disabled blocks new deployments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:16:09 +02:00
hsiegeln
d9160b7d0e fix: allow local login to coexist with OIDC
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m44s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Local login was blocked when OIDC env vars were present, causing
bootstrap to fail (chicken-and-egg: bootstrap needs local auth to
configure OIDC). The backend now accepts both auth paths; the
frontend/UI decides which login flow to present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:09:24 +02:00
hsiegeln
36e8b2d8ff test: add integration tests for runtime management API
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m40s
CI / docker (push) Successful in 4m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- EnvironmentAdminControllerIT: CRUD, access control, default env protection
- AppControllerIT: create, list, JAR upload, viewer access denied
- DeploymentControllerIT: deploy, list, not-found handling
- Fix bean name conflict: rename executor bean to deploymentTaskExecutor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:52:07 +02:00
hsiegeln
3d20d7a0cb feat: add runtime management configuration properties
- JAR storage path, base image, Docker network
- Container memory/CPU limits, health check timeout
- Routing mode and domain for Traefik integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:47:43 +02:00
hsiegeln
8f2aafadc1 feat: add REST controllers for environment, app, and deployment management
- EnvironmentAdminController: CRUD under /api/v1/admin/environments (ADMIN)
- AppController: CRUD + JAR upload under /api/v1/apps (OPERATOR+)
- DeploymentController: deploy, stop, promote, logs under /api/v1/apps/{appId}/deployments
- Security rule for /api/v1/apps/** requiring OPERATOR or ADMIN role

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:47:05 +02:00
hsiegeln
248b716cb9 feat: implement async DeploymentExecutor pipeline
- Async container deployment with health check polling
- Stops previous deployment before starting new one
- Configurable memory, CPU, health timeout via application properties
- @EnableAsync on application class for Spring async proxy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:45:38 +02:00
hsiegeln
b05b7e5597 feat: implement DockerRuntimeOrchestrator with volume-mount JAR deployment
- DockerRuntimeOrchestrator: docker-java based container lifecycle
- DisabledRuntimeOrchestrator: no-op for observability-only mode
- RuntimeOrchestratorAutoConfig: auto-detects Docker socket availability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:44:32 +02:00
hsiegeln
585e078667 feat: implement PostgreSQL repositories for runtime management
- PostgresEnvironmentRepository, PostgresAppRepository
- PostgresAppVersionRepository, PostgresDeploymentRepository
- RuntimeBeanConfig wiring repositories, services, and async executor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:43:35 +02:00
hsiegeln
55068ff625 feat: add EnvironmentService, AppService, DeploymentService
- EnvironmentService: CRUD with slug uniqueness, default env protection
- AppService: CRUD, JAR upload with SHA-256 checksumming
- DeploymentService: create, promote, status transitions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:41:48 +02:00
hsiegeln
17f45645ff feat: add runtime repository interfaces and RuntimeOrchestrator
- EnvironmentRepository, AppRepository, AppVersionRepository, DeploymentRepository
- RuntimeOrchestrator interface with ContainerRequest and ContainerStatus

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:41:05 +02:00
hsiegeln
fd2e52e155 feat: add runtime management domain records
- Environment, EnvironmentStatus, App, AppVersion
- Deployment, DeploymentStatus, RoutingMode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:40:39 +02:00
hsiegeln
85530d5ea3 feat: add runtime management database schema (environments, apps, versions, deployments)
- environments, apps, app_versions, deployments tables
- Default environment seeded on migration
- Foreign keys with CASCADE delete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:40:18 +02:00
hsiegeln
32ae642fab chore: add docker-java dependency for runtime orchestration
- docker-java-core 3.4.1
- docker-java-transport-zerodep 3.4.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:39:57 +02:00
hsiegeln
ec9856d8a2 fix: Ed25519SigningService falls back to ephemeral key when jwt-secret is absent
- SecurityBeanConfig uses Ed25519SigningServiceImpl.ephemeral() when no jwt-secret
- Fixes pre-existing application context failure in integration tests
- Removes the test jwt-secret from application-test.yml (no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:34:55 +02:00
hsiegeln
847c1f792b test: add integration tests for claim mapping admin API
- ClaimMappingAdminControllerIT with create+list and delete tests
- Add adminHeaders() convenience method to TestSecurityHelper
- Add jwt-secret to test profile (fixes pre-existing Ed25519 init failure)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:20:58 +02:00
hsiegeln
ac9ce4f2e7 feat: add ClaimMappingAdminController for CRUD on mapping rules
- ADMIN-only REST endpoints at /api/v1/admin/claim-mappings
- Full CRUD: list, get by ID, create, update, delete
- OpenAPI annotations for Swagger documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:16:23 +02:00
hsiegeln
7657081b78 feat: disable local auth when OIDC is configured (resource server mode)
- UiAuthController.login returns 404 when OIDC issuer is configured
- JwtAuthenticationFilter skips internal user tokens in OIDC mode (agents still work)
- UserAdminController.createUser and resetPassword return 400 in OIDC mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:15:47 +02:00
hsiegeln
b5e85162f8 feat: replace syncOidcRoles with claim mapping evaluation on OIDC login
- OidcUserInfo now includes allClaims map from id_token + access_token
- OidcAuthController.callback() calls applyClaimMappings instead of syncOidcRoles
- applyClaimMappings evaluates rules, clears managed assignments, applies new ones
- Supports both assignRole and addToGroup actions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:13:52 +02:00
hsiegeln
7904a18f67 feat: add origin-aware managed/direct assignment methods to RbacService
- Add clearManagedAssignments, assignManagedRole, addUserToManagedGroup to interface
- Update assignRoleToUser and addUserToGroup to explicitly set origin='direct'
- Update getDirectRolesForUser to filter by origin='direct'
- Implement managed assignment methods with ON CONFLICT upsert

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:12:07 +02:00
hsiegeln
67ca1e726f feat: add license admin API for runtime license updates
- GET /api/v1/admin/license returns current license info
- POST /api/v1/admin/license validates and loads new license token
- Requires ADMIN role, validates Ed25519 signature before applying
- OpenAPI annotations for Swagger documentation
2026-04-07 23:12:03 +02:00
hsiegeln
b969075007 feat: add license loading at startup from env var or file
- LicenseBeanConfig wires LicenseGate bean with startup validation
- Supports token from CAMELEER_LICENSE_TOKEN env var or CAMELEER_LICENSE_FILE path
- Falls back to open mode when no license or no public key configured
- Add license config properties to application.yml
2026-04-07 23:11:02 +02:00
hsiegeln
d734597ec3 feat: implement PostgresClaimMappingRepository and wire beans
- JdbcTemplate-based CRUD for claim_mapping_rules table
- RbacBeanConfig wires ClaimMappingRepository and ClaimMappingService beans

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:10:38 +02:00
hsiegeln
dd5cf1b38c feat: implement LicenseGate for feature checking
- Thread-safe AtomicReference-based license holder
- Defaults to open mode (all features enabled) when no license loaded
- Runtime license loading with feature/limit queries
- Unit tests for open mode and licensed mode
2026-04-07 23:10:14 +02:00
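Note on commit dd5cf1b38c: an AtomicReference-based holder that defaults to open mode could look something like the sketch below. The nested License record and the method names are hypothetical stand-ins; only the AtomicReference approach and the open-mode default come from the commit message.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

final class LicenseGateSketch {
    /** Hypothetical stand-in for the real license record. */
    record License(Set<String> features) {}

    // null means "no license loaded", which is treated as open mode.
    private final AtomicReference<License> current = new AtomicReference<>(null);

    /** Swaps in a new license at runtime; safe to call from any thread. */
    void load(License license) {
        current.set(license);
    }

    /** In open mode every feature is enabled; otherwise the license decides. */
    boolean isEnabled(String feature) {
        License license = current.get();
        return license == null || license.features().contains(feature);
    }
}
```

Reading the reference once per call means a concurrent load() cannot produce a half-updated view of the license.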
hsiegeln
e1cb17707b feat: implement ClaimMappingService with equals/contains/regex matching
- Evaluates JWT claims against mapping rules
- Supports equals, contains (list + space-separated), regex match types
- Results sorted by priority
- 7 unit tests covering all match types and edge cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:09:50 +02:00
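Note on commit e1cb17707b: the three match types it lists (equals, contains over lists and space-separated strings, regex) and the priority ordering might be sketched as follows. The Rule shape, the ascending priority order, and the full-string regex match are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class ClaimMatchSketch {
    enum MatchType { EQUALS, CONTAINS, REGEX }

    /** Hypothetical rule shape: claim name, match type, expected value, priority, role. */
    record Rule(String claim, MatchType type, String value, int priority, String role) {}

    static boolean matches(Rule rule, Map<String, Object> claims) {
        Object actual = claims.get(rule.claim());
        if (actual == null) {
            return false;
        }
        switch (rule.type()) {
            case EQUALS:
                return rule.value().equals(actual.toString());
            case CONTAINS:
                // List-valued claim: any element equal to the expected value.
                if (actual instanceof Collection<?> items) {
                    return items.stream().anyMatch(e -> rule.value().equals(String.valueOf(e)));
                }
                // String claim: treat it as space-separated tokens (e.g. "scope").
                return Arrays.asList(actual.toString().trim().split("\\s+")).contains(rule.value());
            case REGEX:
                // Assumption: the whole claim value must match the pattern.
                return Pattern.compile(rule.value()).matcher(actual.toString()).matches();
            default:
                return false;
        }
    }

    /** Roles of all matching rules, ordered by ascending priority (assumed order). */
    static List<String> rolesFor(List<Rule> rules, Map<String, Object> claims) {
        return rules.stream()
                .filter(r -> matches(r, claims))
                .sorted(Comparator.comparingInt(Rule::priority))
                .map(Rule::role)
                .toList();
    }
}
```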
hsiegeln
b5cf35ef9a feat: implement LicenseValidator with Ed25519 signature verification
- Validates payload.signature license tokens using Ed25519 public key
- Parses tier, features, limits, timestamps from JSON payload
- Rejects expired and tampered tokens
- Unit tests for valid, expired, and tampered license scenarios
2026-04-07 23:08:04 +02:00
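Note on commit b5cf35ef9a: verifying a payload.signature token with Ed25519 can be sketched with the JDK's built-in provider (Java 15+). The base64url encoding of both parts and all names here are assumptions, since the commit message does not specify the wire format; expiry checking is left out because it depends on the payload's JSON layout.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

final class LicenseTokenSketch {
    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }

    /** Builds base64url(payload) + "." + base64url(signature); the format is assumed. */
    static String sign(String payloadJson, PrivateKey key) {
        try {
            byte[] payload = payloadJson.getBytes(StandardCharsets.UTF_8);
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(key);
            signer.update(payload);
            Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
            return enc.encodeToString(payload) + "." + enc.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    /** Returns true only if the token is well-formed and the signature matches. */
    static boolean verify(String token, PublicKey key) {
        String[] parts = token.split("\\.");
        if (parts.length != 2) {
            return false;
        }
        try {
            byte[] payload = Base64.getUrlDecoder().decode(parts[0]);
            byte[] signature = Base64.getUrlDecoder().decode(parts[1]);
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(key);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (IllegalArgumentException | GeneralSecurityException e) {
            return false; // malformed base64 or malformed signature
        }
    }
}
```

Because the signature covers the raw payload bytes, swapping in a different payload while keeping the old signature fails verification.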
hsiegeln
2f8fcb866e feat: add ClaimMappingRule domain model and repository interface
- AssignmentOrigin enum (direct/managed)
- ClaimMappingRule record with match type and action enums
- ClaimMappingRepository interface for CRUD operations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:07:57 +02:00
hsiegeln
bd78207060 feat: add claim mapping rules table and origin tracking to RBAC assignments
- Add origin and mapping_id columns to user_roles and user_groups
- Create claim_mapping_rules table with match_type and action constraints
- Update primary keys to include origin column
- Add indexes for fast managed assignment cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:07:30 +02:00
hsiegeln
96ba7cd711 feat: add LicenseInfo and Feature domain model
- Feature enum with topology, lineage, correlation, debugger, replay
- LicenseInfo record with tier, features, limits, issuedAt, expiresAt
- Open mode factory method for standalone/dev usage
2026-04-07 23:06:17 +02:00
hsiegeln
c6682c4c9c fix: update package-lock.json for DS v0.1.38
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m33s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
SonarQube / sonarqube (push) Successful in 2m4s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:47:54 +02:00
hsiegeln
6a1d3bb129 refactor: move inline styles to CSS modules
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 13s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Extract inline fontSize/color styles from LogTab, LayoutShell,
UsersTab, GroupsTab, RolesTab, and LevelFilterBar into CSS modules.
Follows project convention of CSS modules over inline styles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:45:02 +02:00
hsiegeln
9cbf647203 chore: update DS to v0.1.38, enforce 12px font size floor
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 22s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Update @cameleer/design-system to v0.1.38 (12px minimum font size).
Replace all 10px and 11px font sizes with 12px across 25 CSS modules
and 5 TSX inline styles to match the new DS floor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:41:51 +02:00
hsiegeln
07f3c2584c fix: syncOidcRoles uses direct roles only, always overwrites
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
- Expose getDirectRolesForUser on RbacService interface so syncOidcRoles
  compares against directly-assigned roles only, not group-inherited ones
- Remove early-return that preserved existing roles when OIDC returned
  none — now always applies defaultRoles as fallback
- Update CLAUDE.md and SERVER-CAPABILITIES.md to reflect changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:56:40 +02:00
hsiegeln
ca1b549f10 docs: document OIDC access_token role extraction and audience config
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:55:01 +02:00
hsiegeln
7d5866bca8 chore: remove debug logging from OidcTokenExchanger
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:50:27 +02:00
hsiegeln
f601074e78 fix: include resource parameter in OIDC token exchange request
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Logto returns opaque access tokens unless the resource parameter is
included in both the authorization request AND the token exchange.
Append resource to the token endpoint POST body per RFC 8707 so Logto
returns a JWT access token with Custom JWT claims.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:45:44 +02:00
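Note on commit f601074e78: appending the RFC 8707 resource parameter to the token endpoint POST body might look like the sketch below. The class and method names are hypothetical, and client authentication (for example HTTP Basic with the client secret) is deliberately left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class TokenRequestSketch {
    /** Builds an application/x-www-form-urlencoded body for the code exchange. */
    static String formBody(String code, String redirectUri, String resource) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("grant_type", "authorization_code");
        params.put("code", code);
        params.put("redirect_uri", redirectUri);
        // RFC 8707 resource indicator: only sent when one is configured.
        if (resource != null && !resource.isBlank()) {
            params.put("resource", resource);
        }
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

As the commit message notes, the same resource value also has to appear in the authorization request, otherwise the provider may still issue an opaque access token.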
hsiegeln
725f826513 debug: log access_token format to diagnose opaque vs JWT
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:39:53 +02:00
hsiegeln
52f5a0414e debug: temporarily log access_token decode failures at WARN level
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:34:15 +02:00
hsiegeln
11fc85e2b9 fix: log access_token claims and audience mismatch during OIDC exchange
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Helps diagnose whether rolesClaim path matches the actual token structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:32:34 +02:00
hsiegeln
d4b530ff8a refactor: remove PKCE from OIDC flow (confidential client)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Backend holds client_secret and does the token exchange server-side,
making PKCE redundant. Removes code_verifier/code_challenge from all
frontend auth paths and backend exchange method. Eliminates the source
of "grant request is invalid" errors from verifier mismatches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:22:13 +02:00
hsiegeln
03ff9a3813 feat: generic OIDC role extraction from access token
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The OIDC login flow now reads roles from the access_token (JWT) in
addition to the id_token. This fixes role extraction with providers
like Logto that put scopes/roles in access tokens rather than id_tokens.

- Add audience and additionalScopes to OidcConfig for RFC 8707 resource
  indicator support and configurable extra scopes
- OidcTokenExchanger decodes access_token with at+jwt-compatible processor,
  falls back to id_token if access_token is opaque or has no roles
- syncOidcRoles preserves existing local roles when OIDC returns none
- SPA includes resource and additionalScopes in authorization requests
- Admin UI exposes new config fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:16:52 +02:00
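Note on 03ff9a3813: a minimal sketch of the access-token-first lookup with id_token fallback. It assumes both tokens have already been verified and decoded into claim maps; `Claims`, `rolesFrom`, and `resolveRoles` are hypothetical names, and only array-valued claims are handled here for brevity.

```typescript
type Claims = Record<string, unknown>;

// Read a roles claim as a list of strings; anything else yields no roles.
function rolesFrom(claims: Claims | null, rolesClaim: string): string[] {
  const value = claims ? claims[rolesClaim] : undefined;
  return Array.isArray(value)
    ? value.filter((v): v is string => typeof v === "string")
    : [];
}

// accessClaims is null when the access token is opaque (not a decodable JWT).
function resolveRoles(
  accessClaims: Claims | null,
  idClaims: Claims,
  rolesClaim: string
): string[] {
  const fromAccess = rolesFrom(accessClaims, rolesClaim);
  // Fall back to the id_token when the access token is opaque or carries no roles.
  return fromAccess.length > 0 ? fromAccess : rolesFrom(idClaims, rolesClaim);
}
```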
hsiegeln
95eb388283 fix: handle space-delimited scope string in OIDC role extraction
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
extractRoles() only handled List claims (JSON arrays). When rolesClaim
is configured as "scope", the JWT value is a space-delimited string,
which was silently returning [] and falling back to defaultRoles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:20:37 +02:00
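Note on 95eb388283: the fix can be illustrated with a small sketch that accepts both claim shapes. The name `extractRoles` comes from the commit message; the signature and this TypeScript rendering are assumptions, not the project's backend code.

```typescript
// Hypothetical sketch: accept JSON-array claims and space-delimited scope strings.
function extractRoles(claimValue: unknown): string[] {
  if (Array.isArray(claimValue)) {
    // List claim, e.g. "roles": ["admin", "viewer"]
    return claimValue.filter(
      (v): v is string => typeof v === "string" && v.length > 0
    );
  }
  if (typeof claimValue === "string") {
    // Scope-style claim, e.g. "scope": "openid server:admin"
    return claimValue.split(/\s+/).filter((v) => v.length > 0);
  }
  // Missing or unsupported claim type: no roles, so the caller applies its fallback.
  return [];
}
```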
hsiegeln
8852ec1483 feat: add diagnostic logging for OIDC scope and role extraction
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Has started running
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
Logs received scopes, rolesClaim path, extracted roles, and all claim
keys at each stage of the OIDC auth flow to help debug Logto integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 09:16:42 +02:00
hsiegeln
23e90d6afb fix: postinstall creates public/ dir before copying favicon
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 1m20s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
SonarQube / sonarqube (push) Successful in 3m31s
Docker build copies package.json before source, so public/ doesn't
exist when npm ci runs postinstall. Use mkdirSync(recursive:true).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:38:43 +02:00
hsiegeln
d19551f8aa chore: auto-sync favicon from DS via postinstall script
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Failing after 52s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
favicon.svg is now copied from @cameleer/design-system/assets on
npm install via postinstall hook. Removed from git tracking
(.gitignore). Updates automatically when DS version changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:25:44 +02:00
hsiegeln
b2e4b91d94 chore: update design system to v0.1.37 (improved SVG logo)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:24:12 +02:00
hsiegeln
95b35f6203 fix: make OIDC logout resilient to end-session endpoint failures
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m32s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Fire end-session via fetch(no-cors) instead of window.location redirect.
Always navigate to /login?local regardless of whether end-session
succeeds, preventing broken JSON responses from blocking logout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:06:56 +02:00
hsiegeln
a443abe6ae refactor: unify all brand icons to single SVG from DS v0.1.36
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m0s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Replace PNG favicons and brand logos with cameleer3-logo.svg from
@cameleer/design-system/assets. Favicon, login dialog, and sidebar
all use the same SVG. Remove PNG favicon files from public/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 23:03:30 +02:00
hsiegeln
a5340059d7 refactor: import brand assets directly from DS v0.1.34
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
DS now exports ./assets/* — import PNGs directly via Vite instead of
copying to public/. Removes duplicated brand files from public/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:47:31 +02:00
hsiegeln
45cccdbd8a fix: revert to public/ brand assets — DS exports field blocks imports
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 2m7s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
The @cameleer/design-system package.json exports field doesn't include
assets/, causing production build failures. Copy PNGs to public/ and
reference via basePath until DS adds asset exports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:41:20 +02:00
hsiegeln
281e168790 fix: pass commit short hash as version to UI sidebar
Some checks failed
CI / build (push) Failing after 38s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add VITE_APP_VERSION build arg to UI Dockerfile, pass short SHA from
CI docker build step. vite.config.ts truncates to 7 chars so both
CI build and Docker build produce consistent short hashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:37:46 +02:00
hsiegeln
1386e80670 refactor: import brand icons directly from design system
Some checks failed
CI / build (push) Failing after 36s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Import PNGs via Vite from @cameleer/design-system/assets instead of
copying to public/. Only favicons remain in public/ (needed by HTML).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:20:07 +02:00
hsiegeln
f372d0d63c chore: update design system to v0.1.33 (transparent brand icons)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:18:26 +02:00
hsiegeln
6ef66a14ec fix: use full-color brand PNGs for login dialog and sidebar
All checks were successful
CI / build (push) Successful in 1m32s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m44s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The SVG uses fill=currentColor (inherits text color). Switch to the
full-color PNG brand icons: 192px for login dialog, 48px for sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:10:48 +02:00
hsiegeln
0761d0dbee feat: use design system brand icons for favicon, login, sidebar
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
Replace hand-crafted favicon.svg with official brand assets from
@cameleer/design-system v0.1.32: PNG favicons (16/32px) and
camel-logo.svg for login dialog and sidebar. Update SecurityConfig
public endpoints accordingly. Update documentation for architecture
cleanup (PKCE, OidcProviderHelper, role normalization, K8s hardening,
Dockerfile credential removal, CI deduplication, sidebar path fix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:08:58 +02:00
hsiegeln
0de392ff6e fix: remove securityContext from UI pod — nginx needs root for setup
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
The standard nginx image requires root to modify /etc/nginx/conf.d
and create /var/cache/nginx directories during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:06:07 +02:00
hsiegeln
c502a42f17 refactor: architecture cleanup — OIDC dedup, PKCE, K8s hardening
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m59s
- Extract OidcProviderHelper for shared discovery + JWK source construction
- Add SystemRole.normalizeScope() to centralize role normalization
- Merge duplicate claim extraction in OidcTokenExchanger
- Add PKCE (S256) to OIDC authorization flow (frontend + backend)
- Add SecurityContext (runAsNonRoot) to all K8s deployments
- Fix postgres probe to use $POSTGRES_USER instead of hardcoded username
- Remove default credentials from Dockerfile
- Extract sanitize_branch() to shared .gitea/sanitize-branch.sh
- Fix sidebar to use /exchanges/ paths directly, remove legacy redirects
- Centralize basePath computation in router.tsx via config module

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:57:29 +02:00
hsiegeln
07ff576eb6 fix: prevent SSO re-login loop on OIDC logout
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Logout now always redirects to /login?local, either via OIDC
end_session or as a direct fallback, preventing prompt=none
auto-redirect from logging the user back in immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:37:35 +02:00
hsiegeln
c249c6f3e0 docs: update Config tab navigation behavior and role gating
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 13s
CI / deploy (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:29:20 +02:00
hsiegeln
bb6a9c9269 fix: Config tab sidebar navigation stays on config for app and route clicks
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 58s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
When on Config tab: clicking an app navigates to /config/:appId (shows
that app's config with detail panel). Clicking a route navigates to
/config/:appId (same app config, since config is per-app not per-route).
Clicking Applications header navigates to /config (all apps table).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:13:39 +02:00
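Note on bb6a9c9269: the three navigation cases map to paths roughly as below. `SidebarTarget` and `configTabPath` are hypothetical names for the example; only the resulting paths are taken from the commit message.

```typescript
// Hypothetical model of what was clicked in the sidebar.
type SidebarTarget =
  | { kind: "applications" }
  | { kind: "app"; appId: string }
  | { kind: "route"; appId: string; routeId: string };

// Resolve the destination while the Config tab is active.
function configTabPath(target: SidebarTarget): string {
  switch (target.kind) {
    case "applications":
      return "/config"; // all-apps table
    case "app":
      return `/config/${target.appId}`;
    case "route":
      // Config is per-app, so a route click lands on its app's config.
      return `/config/${target.appId}`;
  }
}
```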
hsiegeln
c6a8a4471f fix: always show Config tab and fix 404 on sidebar navigation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Config tab now always visible (not just when app selected). Shows the
all-app config table at /config, single-app detail at /config/:appId.

Fixed 404 when clicking sidebar nodes while on Config tab — the sidebar
navigation built /config/appId/routeId which had no route. Now falls
back to exchanges tab for route-level navigation from config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:10:02 +02:00
hsiegeln
640a48114d docs: document UI role gating for VIEWER/OPERATOR/ADMIN
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m37s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:52:25 +02:00
hsiegeln
b1655b366e feat: role-based UI access control
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
- Hide Admin sidebar section for non-ADMIN users
- Add RequireAdmin route guard — /admin/* redirects to / for non-admin
- Move App Config from admin section to main Config tab (per-app,
  visible when app selected). VIEWER sees read-only, OPERATOR+ can edit
- Hide diagram node toolbar for VIEWER (onNodeAction conditional)
- Add useIsAdmin/useCanControl helpers to centralize role checks
- Remove App Config from admin sidebar tree

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:51:15 +02:00
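Note on b1655b366e: the role checks can be sketched as pure functions. The commit names `useIsAdmin`/`useCanControl` helpers; the functions below are hypothetical stand-ins that take the role list as an argument so the logic can be shown without any UI framework.

```typescript
type Role = "VIEWER" | "OPERATOR" | "ADMIN";

// Only ADMIN sees the admin section.
function isAdmin(roles: Role[]): boolean {
  return roles.includes("ADMIN");
}

// OPERATOR and above may edit config and use control actions.
function canControl(roles: Role[]): boolean {
  return roles.includes("OPERATOR") || roles.includes("ADMIN");
}

// Route guard: non-admins requesting /admin/* are sent to "/".
function guardAdminRoute(path: string, roles: Role[]): string {
  const isAdminPath = path === "/admin" || path.startsWith("/admin/");
  return isAdminPath && !isAdmin(roles) ? "/" : path;
}
```

Client-side gating like this only hides UI; the server still has to enforce the same roles on its endpoints.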
hsiegeln
e54f308607 docs: add role-based UI access control design spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 10s
CI / deploy (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 15:35:59 +02:00
hsiegeln
e69b44f566 docs: document configurable userIdClaim for OIDC
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:20:50 +02:00
hsiegeln
0c77f8d594 feat: add User ID Claim field to OIDC admin config UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
New input in the Claim Mapping section lets admins configure which
id_token claim is used as the unique user identifier (default: sub).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:19:38 +02:00
hsiegeln
a96cf2afed feat: add configurable userIdClaim for OIDC user identification
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
The OIDC user login ID is now configurable via the admin OIDC setup
dialog (userIdClaim field). Supports dot-separated claim paths (e.g.
'email', 'preferred_username', 'custom.user_id'). Defaults to 'sub'
for backwards compatibility. Throws if the configured claim is missing
from the id_token.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:18:03 +02:00
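Note on a96cf2afed: a minimal sketch of dot-separated claim resolution with the 'sub' default and a hard failure when the claim is absent. `resolveUserId` is a hypothetical name, and the error text is illustrative.

```typescript
// Walk a dot-separated path through decoded id_token claims.
function resolveUserId(
  claims: Record<string, unknown>,
  userIdClaim: string = "sub"
): string {
  let current: unknown = claims;
  for (const segment of userIdClaim.split(".")) {
    if (typeof current !== "object" || current === null) {
      current = undefined;
      break;
    }
    current = (current as Record<string, unknown>)[segment];
  }
  if (typeof current !== "string" || current.length === 0) {
    // Mirrors the commit's behaviour: a missing claim is an error, not a silent default.
    throw new Error(`Claim '${userIdClaim}' is missing from the id_token`);
  }
  return current;
}
```

One limitation of this sketch: a claim whose own name contains a dot (for example a URL-namespaced claim) cannot be addressed by a purely dot-split path.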
hsiegeln
549dbaa322 docs: document OIDC role sync on every login
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:11:49 +02:00
hsiegeln
f4eafd9a0f feat: sync OIDC roles on every login, not just first
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
Roles from the id_token's rolesClaim are now diffed against stored
system roles on each OIDC login. Missing roles are added, revoked
roles are removed. Group memberships (manually assigned) are never
touched. This propagates scope revocations from the OIDC provider
on next user login.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:11:06 +02:00
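Note on f4eafd9a0f: the diff step can be sketched as two set differences. `RoleDiff` and `diffRoles` are hypothetical names; group memberships are deliberately not part of the input, matching the commit's statement that they are never touched.

```typescript
interface RoleDiff {
  toAdd: string[];    // in the token, not yet stored
  toRemove: string[]; // stored, but no longer in the token
}

// Compare stored system roles with the roles from the current login.
function diffRoles(stored: string[], fromToken: string[]): RoleDiff {
  return {
    toAdd: fromToken.filter((role) => !stored.includes(role)),
    toRemove: stored.filter((role) => !fromToken.includes(role)),
  };
}
```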
hsiegeln
4e12fcbe7a docs: document server:-prefixed scopes and case-insensitive role mapping
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:06:11 +02:00
hsiegeln
9c2e6aacad feat: support server:-prefixed scopes and case-insensitive role mapping
Some checks failed
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
CI / build (push) Has been cancelled
M2M scope mapping now accepts both 'server:admin' and 'admin' (case-
insensitive). OIDC user login role assignment strips the 'server:'
prefix before looking up SystemRole, so 'server:viewer' from the
id_token maps to VIEWER correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:05:13 +02:00
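Note on 9c2e6aacad: a small sketch of the prefix stripping and case-insensitive lookup. `toSystemRole` is a hypothetical name; the three role names are the ones documented elsewhere in this log (VIEWER/OPERATOR/ADMIN).

```typescript
type SystemRole = "ADMIN" | "OPERATOR" | "VIEWER";
const SYSTEM_ROLES: SystemRole[] = ["ADMIN", "OPERATOR", "VIEWER"];
const PREFIX = "server:";

// Map a scope such as "server:viewer" or "Admin" to a system role, or null.
function toSystemRole(scope: string): SystemRole | null {
  const lowered = scope.trim().toLowerCase();
  const bare = lowered.startsWith(PREFIX) ? lowered.slice(PREFIX.length) : lowered;
  const candidate = bare.toUpperCase();
  const match = SYSTEM_ROLES.find((role) => role === candidate);
  return match ?? null;
}
```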
hsiegeln
c757a0ea51 fix: replace last hardcoded paths with BASE_PATH-aware alternatives
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
- index.html: change /src/main.tsx to ./src/main.tsx (relative, respects
  <base> tag)
- AgentRegistrationController: derive SSE endpoint URL from request
  context via ServletUriComponentsBuilder instead of hardcoding /api/v1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:53:00 +02:00
hsiegeln
9a40626a27 fix: include BASE_PATH and ?local in OIDC post-logout redirect URI
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Without BASE_PATH the redirect fails behind a reverse proxy. Adding
?local prevents the SSO auto-redirect from immediately signing the
user back in after logout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:45:46 +02:00
hsiegeln
4496be08bd docs: document SSO auto-redirect, consent handling, and auto-signup
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 34s
SonarQube / sonarqube (push) Successful in 3m36s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:45:45 +02:00
hsiegeln
e8bcc39ca9 fix: add ES384 to OidcTokenExchanger JWT algorithm list
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Logto signs id_tokens with ES384 by default. SecurityConfig already
included it but OidcTokenExchanger only had RS256 and ES256.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:37:22 +02:00
hsiegeln
94bfb8fc4a fix: Back to Login button navigates to /login?local to prevent auto-redirect
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:31:57 +02:00
hsiegeln
c628c25081 fix: handle consent_required by retrying OIDC without prompt=none
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m36s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
When prompt=none fails with consent_required (scopes not yet granted),
retry the OIDC flow without prompt=none so the user can grant consent
once. Uses sessionStorage flag to prevent infinite loops — falls back
to local login if the retry also fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:29:31 +02:00
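Note on c628c25081: the retry-once logic can be sketched as a decision function over the provider's error code. `FlagStore` is a hypothetical interface with the same three methods as `sessionStorage`, and the key `oidc-consent-retry` is made up for the example.

```typescript
type NextStep = "retry-interactive" | "local-login";

// Minimal subset of the Web Storage API, so sessionStorage can be passed in.
interface FlagStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const RETRY_FLAG = "oidc-consent-retry"; // hypothetical key name

// Decide what to do after a failed prompt=none attempt.
function handleOidcError(error: string, store: FlagStore): NextStep {
  if (error === "consent_required" && store.getItem(RETRY_FLAG) === null) {
    // First consent failure: remember it and retry without prompt=none.
    store.setItem(RETRY_FLAG, "1");
    return "retry-interactive";
  }
  // Any other error, or a second failure: clear the flag and show local login.
  store.removeItem(RETRY_FLAG);
  return "local-login";
}
```

In the browser the call would be `handleOidcError(errorParam, sessionStorage)`; a successful login should also clear the flag.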
hsiegeln
3cea306e17 feat: auto-redirect to OIDC provider for true SSO
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m51s
CI / docker (push) Successful in 2m37s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 54s
When OIDC is configured, the login page automatically redirects to the
provider with prompt=none. If the user has an active OIDC session, they
are signed in without seeing a login page. If the provider returns
login_required (no session), falls back to the login form via ?local.
Users can bypass auto-redirect with /login?local.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:20:55 +02:00
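Note on 3cea306e17: a sketch of the silent authorization request and the ?local bypass. `AuthRequest`, `buildAuthorizationUrl`, and `shouldAutoRedirect` are hypothetical names; `state` and `nonce` handling is omitted here for brevity and would be needed in a real flow.

```typescript
interface AuthRequest {
  authorizationEndpoint: string;
  clientId: string;
  redirectUri: string;
  scope: string;
  silent: boolean; // true adds prompt=none
}

// Build the authorization URL; prompt=none asks the provider not to show any UI.
function buildAuthorizationUrl(req: AuthRequest): string {
  const params: Array<[string, string]> = [
    ["response_type", "code"],
    ["client_id", req.clientId],
    ["redirect_uri", req.redirectUri],
    ["scope", req.scope],
  ];
  if (req.silent) {
    params.push(["prompt", "none"]);
  }
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${req.authorizationEndpoint}?${query}`;
}

// Auto-redirect only when OIDC is configured and the URL does not carry ?local.
function shouldAutoRedirect(oidcConfigured: boolean, search: string): boolean {
  const hasLocal = search
    .replace(/^\?/, "")
    .split("&")
    .some((part) => part === "local" || part.startsWith("local="));
  return oidcConfigured && !hasLocal;
}
```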
hsiegeln
4244dd82e9 fix: use BASE_PATH for favicon references in subpath deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Hardcoded /favicon.svg paths skip the <base> tag and fail when served
from a subpath like /server/. Now uses config.basePath in TSX and a
relative href in index.html so the <base> tag resolves correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:17:17 +02:00
hsiegeln
d7001804f7 fix: permit branding endpoints without authentication
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The login page loads the branding logo before the user is signed in.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:15:21 +02:00
hsiegeln
5c4c7ad321 fix: include BASE_PATH in OIDC redirect_uri for subpath deployments
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Has started running
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
Behind a reverse proxy with strip-prefix (e.g., Traefik at /server/),
the OIDC redirect_uri must include the prefix so the callback routes
back through the proxy. Now uses config.basePath (from <base href>)
instead of hardcoding '/'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:14:34 +02:00
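Note on 5c4c7ad321: a sketch of joining origin, base path, and callback path without doubled or missing slashes. `buildRedirectUri` and the default callback path `oidc/callback` are assumptions for the example; the commit does not state the real callback route.

```typescript
// Join origin + basePath + callbackPath into an absolute redirect_uri.
function buildRedirectUri(
  origin: string,
  basePath: string,
  callbackPath: string = "oidc/callback" // hypothetical route
): string {
  const trimmedOrigin = origin.replace(/\/+$/, "");
  const segments = [basePath, callbackPath]
    .map((segment) => segment.replace(/^\/+|\/+$/g, ""))
    .filter((segment) => segment.length > 0);
  return `${trimmedOrigin}/${segments.join("/")}`;
}
```

The resulting URI must also be registered with the OIDC provider, since providers compare redirect URIs against an allow-list.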
hsiegeln
0fab20e67a fix: append .well-known/openid-configuration to issuerUri in token exchanger
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
OidcTokenExchanger fetched the discovery document from the issuerUri
as-is, but the database stores the issuer URI (e.g. /oidc), not the
full discovery URL. Logto returns 404 for the bare issuer path.
SecurityConfig already appended the well-known suffix — now the token
exchanger does the same.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:04:57 +02:00
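Note on 0fab20e67a: the suffix handling can be sketched as below. `discoveryUrl` is a hypothetical name; leaving an already-complete discovery URL unchanged is an extra safeguard added for the example and is not described in the commit.

```typescript
const WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration";

// Turn a stored issuer URI into the discovery document URL.
function discoveryUrl(issuerUri: string): string {
  const trimmed = issuerUri.replace(/\/+$/, "");
  // Assumption: tolerate configs that already contain the full discovery URL.
  return trimmed.endsWith(WELL_KNOWN_SUFFIX) ? trimmed : trimmed + WELL_KNOWN_SUFFIX;
}
```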
hsiegeln
d7563902a7 fix: read oidcTlsSkipVerify at call time instead of caching in constructor
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
OidcTokenExchanger cached securityProperties.isOidcTlsSkipVerify() in
the constructor as a boolean field. If Spring constructed the bean
before property binding completed, the cached value was false even when
the env var was set. SecurityConfig worked because it read the property
at call time. Now OidcTokenExchanger stores the SecurityProperties
reference and reads the flag on each call, matching SecurityConfig's
pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 01:02:36 +02:00
hsiegeln
99e2a8354f fix: handle HTTPS redirects in InsecureTlsHelper for OIDC discovery
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Java's automatic redirect following creates new connections that do NOT
inherit custom SSLSocketFactory/HostnameVerifier. This caused the OIDC
discovery fetch to fail on redirect even with TLS_SKIP_VERIFY=true.
Now disables auto-redirect and follows manually with SSL on each hop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:51:49 +02:00
hsiegeln
083cb8b9ec feat: add CAMELEER_CORS_ALLOWED_ORIGINS for multi-origin CORS support
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Behind a reverse proxy the browser sends Origin matching the proxy's
public URL, which the single-origin CAMELEER_UI_ORIGIN rejects.
New env var accepts comma-separated origins and takes priority over
UI_ORIGIN, which remains as a backwards-compatible fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:41:00 +02:00
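Note on 083cb8b9ec: the precedence between the two variables can be sketched as follows. The environment variable names are from the commit; `resolveAllowedOrigins` and the plain-object `env` parameter are assumptions for the example.

```typescript
// Resolve the CORS origin list: the multi-origin variable wins, the single one is the fallback.
function resolveAllowedOrigins(env: { [key: string]: string | undefined }): string[] {
  const multi = (env["CAMELEER_CORS_ALLOWED_ORIGINS"] ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  if (multi.length > 0) {
    return multi;
  }
  const single = (env["CAMELEER_UI_ORIGIN"] ?? "").trim();
  return single.length > 0 ? [single] : [];
}
```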
hsiegeln
0609220cdf docs: add CAMELEER_OIDC_TLS_SKIP_VERIFY to all documentation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:30:18 +02:00
hsiegeln
ca92b3ce7d feat: add CAMELEER_OIDC_TLS_SKIP_VERIFY to bypass cert verification for OIDC
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Self-signed CA certs on the OIDC provider (e.g. Logto behind a reverse
proxy) cause the login flow to fail because Java's truststore rejects
the connection. This adds an opt-in env var that creates a trust-all
SSLContext scoped to OIDC HTTP calls only (discovery, token exchange,
JWKS fetch) without affecting system-wide TLS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:26:40 +02:00
hsiegeln
7ebbc18b31 fix: make API calls respect BASE_PATH for subpath deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
config.apiBaseUrl now derives from <base> tag when no explicit config
is set (e.g., /server/api/v1 instead of /api/v1). commands.ts authFetch
prepends apiBaseUrl and uses relative paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:04:52 +02:00
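Note on 7ebbc18b31: a sketch of deriving the API base from the `<base>` href when no explicit value is configured. `deriveApiBaseUrl` is a hypothetical name; the `/api/v1` suffix and the `/server/` example are taken from the commit message.

```typescript
// Prefer an explicit API base; otherwise derive it from the <base> href.
function deriveApiBaseUrl(baseHref: string, explicit?: string): string {
  if (explicit && explicit.length > 0) {
    return explicit;
  }
  const base = baseHref.replace(/\/+$/, ""); // "/server/" becomes "/server", "/" becomes ""
  return `${base}/api/v1`;
}
```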
hsiegeln
5b7c92848d fix: remove path-rewriting sed that doubled BASE_PATH in <base> tag
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m43s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
The second sed matched the just-injected <base href="/server/"> and
rewrote it to <base href="/server/server/">. Since Vite builds with
base: './' (relative paths), the <base> tag alone is sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:52:30 +02:00
hsiegeln
44f3821df4 docs: add CAMELEER_OIDC_JWK_SET_URI to all documentation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m40s
CI / docker (push) Successful in 12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:58:05 +02:00
hsiegeln
51abe45fba feat: add BASE_PATH env var for serving UI from a subpath
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
When BASE_PATH is set (e.g., /server/), the entrypoint script injects
a <base> tag and rewrites asset paths in index.html. React Router reads
the basename from the <base> tag. Vite builds with relative paths.
Defaults to / for standalone mode (no behavior change).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:04:28 +02:00
hsiegeln
3c70313d78 feat: add CAMELEER_OIDC_JWK_SET_URI for direct JWKS fetching
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / build (push) Has been cancelled
When set, fetches JWKs from this URL directly instead of discovering
from the OIDC well-known endpoint. Needed when the public issuer URL
(e.g., https://domain.com/oidc) isn't reachable from inside containers
but the internal URL (http://logto:3001/oidc/jwks) is.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 21:02:51 +02:00
hsiegeln
12bb734c2d fix: use tcpSocket probe for logto-postgresql instead of pg_isready
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
pg_isready without -U defaults to OS user "root" which doesn't exist
as a PostgreSQL role, causing noisy log entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:44:59 +02:00
hsiegeln
cbeaf30bc7 fix: move PG_USER/PG_PASSWORD before DB_URL in logto.yaml
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m9s
K8s $(VAR) substitution only resolves env vars defined earlier in the
list. PG_USER and PG_PASSWORD must come before DB_URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:39:50 +02:00
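The ordering rule behind cbeaf30bc7 can be illustrated with a small simulation. This is a rough sketch of the documented `$(VAR)` behavior, not Kubernetes code: the class name, the variable values, and the URL are hypothetical, and the `$$(VAR)` escape form is not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical simulation of in-order $(VAR) expansion for a container env list. */
final class EnvExpansion {
    private static final Pattern REF = Pattern.compile("\\$\\(([A-Za-z_][A-Za-z0-9_]*)\\)");

    private EnvExpansion() {}

    /** Expands each value using only the variables declared before it. */
    static Map<String, String> expand(Map<String, String> declaredInOrder) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : declaredInOrder.entrySet()) {
            Matcher m = REF.matcher(entry.getValue());
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                String known = resolved.get(m.group(1));
                // A reference to a variable that is not defined yet is left as-is.
                String replacement = known != null ? known : m.group();
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            resolved.put(entry.getKey(), out.toString());
        }
        return resolved;
    }
}
```

With `DB_URL` declared before `PG_USER`, the reference stays literal; swapping the two entries lets it resolve, which is the reordering the commit applies to logto.yaml.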
hsiegeln
c4d2fa90ab docs: clarify Logto proxy setup and ENDPOINT/ADMIN_ENDPOINT semantics
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 3m15s
LOGTO_ENDPOINT and LOGTO_ADMIN_ENDPOINT are public-facing URLs that
Logto uses for OIDC discovery, issuer URI, and redirects. When behind
a reverse proxy (e.g., Traefik), set these to the external URLs.
Logto requires its own subdomain (not a path prefix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:31:17 +02:00
hsiegeln
e9ef97bc20 docs: add Logto OIDC resource server spec and implementation plan
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m33s
CI / docker (push) Successful in 3m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:25:24 +02:00
hsiegeln
eecb0adf93 docs: replace Authentik with Logto, document OIDC resource server
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:15:09 +02:00
hsiegeln
c47b8b9998 ci: replace Authentik with Logto in deployment pipeline
2026-04-05 13:12:38 +02:00
hsiegeln
22d812d832 feat: replace Authentik with Logto K8s deployment
2026-04-05 13:12:01 +02:00
hsiegeln
fec6717a85 feat: update default rolesClaim to 'roles' for Logto compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:10:53 +02:00
hsiegeln
3bd07c9b07 feat: add OIDC resource server support with JWKS discovery and scope-based roles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:10:08 +02:00
hsiegeln
a5c4e0cead feat: add spring-boot-starter-oauth2-resource-server and OIDC properties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:06:53 +02:00
hsiegeln
de85cdf5a2 fix: let SPRING_DATASOURCE_URL fully control datasource connection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m26s
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:24:22 +02:00
hsiegeln
2277a0498f fix: set CAMELEER_DB_SCHEMA=public for existing main deployment
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:21:17 +02:00
hsiegeln
ac87aa6eb2 fix: derive PG schema from tenant ID instead of defaulting to public
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m17s
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:46:57 +02:00
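The schema default described in ac87aa6eb2 amounts to a two-step lookup. A minimal sketch, assuming the override arrives as a plain string; the class and method names are hypothetical and the real wiring presumably lives in Spring property placeholders rather than in code like this.

```java
/** Hypothetical helper mirroring the tenant-derived schema default. */
final class SchemaName {
    private SchemaName() {}

    /**
     * Returns the explicit override (e.g. the CAMELEER_DB_SCHEMA value) when one
     * is set, otherwise the tenant-derived default "tenant_" + tenantId.
     */
    static String resolve(String tenantId, String override) {
        if (override != null && !override.isBlank()) {
            return override;
        }
        return "tenant_" + tenantId;
    }
}
```

Under this sketch `resolve("acme", null)` gives `tenant_acme`, and passing `public` as the override reproduces the backward-compatibility setting from 2277a0498f.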
hsiegeln
f16d331621 docs: add SERVER-CAPABILITIES.md for SaaS integration reference
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Comprehensive standalone document covering API surface, agent protocol,
security, storage, multi-tenancy, deployment, and configuration — designed
for external systems (like the SaaS orchestration layer) that need to
understand and manage Cameleer3 Server instances.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 20:30:42 +02:00
hsiegeln
69055f7d74 fix: persist environment selection in Zustand store instead of URL params
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Environment selector was losing its value on navigation because URL search
params were silently dropped by navigate() calls. Moved to a Zustand store
with localStorage persistence so the selection survives navigation, page
refresh, and new tabs. Switching environment now resets all filters, clears
URL params, invalidates queries, and remounts pages via Outlet key. Also
syncs openapi.json schema with running backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:12:16 +02:00
hsiegeln
37eb56332a fix: use environmentId from heartbeat body for auto-heal
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
HeartbeatRequest now carries environmentId (cameleer3-common update).
Auto-heal prefers the heartbeat value (most current) over the JWT
claim, ensuring agents recover their correct environment immediately
on the first heartbeat after server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:21:55 +02:00
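The precedence introduced by 37eb56332a on top of 72ec87a3ba can be sketched as a simple fallback chain. The names below are hypothetical, and the final fallback to "default" is an assumption based on the value that was previously hardcoded.

```java
/** Hypothetical sketch of the environment lookup used during auto-heal. */
final class EnvironmentResolver {
    /** Assumed last-resort value, taken from the previously hardcoded default. */
    static final String DEFAULT_ENVIRONMENT = "default";

    private EnvironmentResolver() {}

    /** Prefers the heartbeat value, then the JWT 'env' claim, then the default. */
    static String resolve(String heartbeatEnvironmentId, String jwtEnvClaim) {
        if (heartbeatEnvironmentId != null && !heartbeatEnvironmentId.isBlank()) {
            return heartbeatEnvironmentId;
        }
        if (jwtEnvClaim != null && !jwtEnvClaim.isBlank()) {
            return jwtEnvClaim;
        }
        return DEFAULT_ENVIRONMENT;
    }
}
```

The heartbeat comes first because it reflects the agent's current configuration, while the JWT claim was fixed at registration time.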
hsiegeln
72ec87a3ba fix: persist environment in JWT claims for auto-heal recovery
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m7s
CI / deploy (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
Add 'env' claim to agent JWTs (set at registration, carried through
refresh). Auto-heal on heartbeat/SSE now reads environment from the
JWT instead of hardcoding 'default', so agents retain their correct
environment after server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:12:25 +02:00
hsiegeln
346e38ee1d fix: update DS to v0.1.31, simplify env selector styles
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m23s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
DS v0.1.31 changes .env wrapper to neutral button style matching
other TopBar controls. Simplified selector CSS to inherit all
font/color properties from the wrapper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:01:58 +02:00
hsiegeln
39d9ec9cd6 fix: restyle environment selector to match DS TopBar pill
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Make the select transparent (no border, no background) so it
inherits the DS .env pill styling (success-colored badge with
mono font). Negative margins compensate for the pill padding.
Dropdown chevron uses currentColor to match the pill text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:53:09 +02:00
hsiegeln
08f2a01057 fix: always show environment selector in TopBar
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m12s
CI / deploy (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
Use unfiltered agent query to discover environments (avoids circular
filter). Always show selector even with single environment so it's
visible as a label. Default to ['default'] when no agents connected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:47:48 +02:00
hsiegeln
574f82b731 docs: add historical implementation plans
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 37s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:45:49 +02:00
hsiegeln
c2d4d38bfb feat: move environment selector into TopBar (DS v0.1.30)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Update @cameleer/design-system to v0.1.30 which accepts ReactNode
for the environment prop. Move EnvironmentSelector from standalone
div into TopBar, rendering between theme toggle and user menu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:43:43 +02:00
hsiegeln
694d0eef59 feat: add environment filtering across all APIs and UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.

UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:42:26 +02:00
hsiegeln
babdc1d7a4 docs: update CLAUDE.md with multitenancy architecture
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:14:38 +02:00
hsiegeln
a188308ec5 feat: implement multitenancy with tenant isolation + environment support
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m25s
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.

Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
  "default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
  updated ORDER BY (tenant→time→env→app), added tenant_id to
  usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures

Closes #123

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:00:18 +02:00
hsiegeln
ee7226cf1c docs: multitenancy architecture design spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Covers tenant isolation (1 tenant = 1 server instance), environment
support (first-class agent property), ClickHouse partitioning
(tenant → time → environment → application), PostgreSQL schema-per-
tenant via JDBC currentSchema, and agent protocol changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:37:00 +02:00
hsiegeln
7429b85964 feat: show route control bar on topology diagram
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
When no exchange is selected, the topology-only diagram now shows
the RouteControlBar above it (if the agent supports routeControl
or replay and the user has OPERATOR/ADMIN role). This fixes a gap
where suspended routes with no recent exchanges had no way to be
resumed from the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:49:28 +02:00
hsiegeln
a5c07b8585 docs: update CLAUDE.md with heartbeat capabilities restoration
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:31:33 +02:00
hsiegeln
45a74075a1 feat: restore agent capabilities from heartbeat after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The heartbeat now carries capabilities (per protocol v2 update).
On each heartbeat, capabilities are updated in the agent registry.
On auto-heal (server restart), capabilities from the heartbeat
are used instead of empty Map.of(), so the agent's feature flags
(replay, routeControl, logForwarding, etc.) are restored
immediately on the first heartbeat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:19:15 +02:00
hsiegeln
abed4dc96f security: fix SQL injection in ClickHouse query escaping
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m6s
CI / deploy (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
Convert ClickHouseUsageTracker and ClickHouseMetricsQueryStore to
use JDBC parameterized queries (? binds) — these query raw tables
without AggregateFunction columns.

Fix lit(String) in RouteMetricsController and ClickHouseStatsStore
to escape backslashes before single quotes. Without this, an input
like \' breaks out of the string literal in ClickHouse (where \\
is an escaped backslash). These must remain as literal SQL because
the ClickHouse JDBC 0.9.x driver wraps PreparedStatement in
sub-queries that strip AggregateFunction types, breaking -Merge
combinators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 12:17:12 +02:00
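The escaping order described in abed4dc96f can be sketched as follows. This is an illustrative stand-in for the project's `lit(String)`, not its actual source; the class name is hypothetical.

```java
/** Hypothetical stand-in for a lit(String) helper that builds SQL string literals. */
final class SqlLiteral {
    private SqlLiteral() {}

    /** Renders a value as a single-quoted string literal with backslash escapes. */
    static String lit(String value) {
        // Backslashes are doubled first. Doing it after the quote step would
        // also double the backslashes that the quote step just inserted.
        String escaped = value
                .replace("\\", "\\\\")
                .replace("'", "\\'");
        return "'" + escaped + "'";
    }
}
```

If only the quote were escaped, the two-character input `\'` would become `\\'`, which reads as an escaped backslash followed by a bare quote that ends the literal. With both steps the same input becomes `\\\'` and stays inside the literal.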
hsiegeln
170b2c4a02 fix: run sonar:sonar in same reactor as verify
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Running mvn sonar:sonar as a separate invocation skips child
modules. Combining verify and sonar:sonar in a single mvn
command ensures the reactor processes all modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:57:05 +02:00
hsiegeln
66e91ba18c fix: remove explicit sonar.sources/tests from mvn sonar:sonar
All checks were successful
CI / build (push) Successful in 2m0s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 14s
CI / deploy (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
Maven sonar plugin auto-detects sources and tests from the POM
module structure. Passing sonar.sources as CLI args caused path
doubling (module-dir/module-dir/src) in multi-module projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:13:47 +02:00
hsiegeln
e30b561dfe fix: use mvn sonar:sonar instead of standalone sonar-scanner
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 14s
CI / deploy (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
The standalone sonar-scanner CLI has Java discovery issues in the
build container. Switch to the Maven sonar plugin (same approach
as cameleer3 agent repo), which uses Maven's own JDK. This also
removes the sonar-scanner download/install step entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:07:49 +02:00
hsiegeln
5ae94e1e2c fix: set SONAR_SCANNER_JAVA_HOME for sonar-scanner 6.x
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m42s
CI / docker (push) Successful in 15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 48s
sonar-scanner 6.x checks SONAR_SCANNER_JAVA_HOME, not JAVA_HOME.
Despite JAVA_HOME being correct and java being on PATH, the scanner
uses its own env var for Java discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:04:03 +02:00
hsiegeln
7dca8f2609 fix: derive JAVA_HOME from jar binary and add to PATH
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 13s
CI / deploy (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
java binary may not be on PATH directly in the build container.
Derive JAVA_HOME from the jar binary location (which we know works)
and prepend JAVA_HOME/bin to PATH so sonar-scanner can find java.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 10:59:45 +02:00
hsiegeln
2589c681c5 fix: derive JAVA_HOME for sonar-scanner in CI workflow
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m53s
CI / docker (push) Successful in 14s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
sonar-scanner 6.x requires JAVA_HOME or java on PATH. The build
container has Java installed but doesn't export JAVA_HOME, so
derive it from the java binary location.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 10:05:18 +02:00
hsiegeln
352fa43ef8 fix: add chmod +x for sonar-scanner binary after jar extraction
All checks were successful
CI / build (push) Successful in 2m5s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 10s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
jar xf doesn't preserve Unix file permissions from zip entries,
so the sonar-scanner binary lacks the execute bit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:57:48 +02:00
hsiegeln
b04b12220b fix: resolve 25 SonarQube code smells across 21 files
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m2s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Remove unused fields (log, rbacService, roleRepository, jwt),
unused variables (agentTps, routeKeys, updated), unused imports
(HttpHeaders, JdbcTemplate). Rename restricted identifier 'record'
to 'auditRecord'/'event'. Return empty collections instead of null.
Replace .collect(Collectors.toList()) with .toList(). Simplify
conditional return in BootstrapTokenValidator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:36:13 +02:00
hsiegeln
633a61d89d perf: batch processor and log inserts to reduce ClickHouse part creation
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m2s
SonarQube / sonarqube (push) Failing after 1m58s
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler

Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.

Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.

Also reverts async_insert config (doesn't work with JDBC inline VALUES).

Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:48:04 +02:00
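The buffering idea in 633a61d89d can be sketched as a queue that is drained into one batch per flush. This is a simplified illustration and not the project's `WriteBuffer`; the class name and the `Consumer`-based writer are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical buffer: many small add() calls, one batch write per flush(). */
final class WriteBufferSketch<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final Consumer<List<T>> batchWriter;

    WriteBufferSketch(Consumer<List<T>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    /** Called from request handlers; never touches the database. */
    void add(T item) {
        queue.add(item);
    }

    /** Drains everything queued so far and writes it as a single batch. */
    int flush() {
        List<T> batch = new ArrayList<>();
        T item;
        while ((item = queue.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            batchWriter.accept(batch); // one INSERT for the whole batch
        }
        return batch.size();
    }
}
```

A scheduler would call `flush()` on the 5s interval mentioned above, so the number of inserts tracks the number of flush cycles rather than the number of incoming chunks or log requests.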
hsiegeln
e0aac4bf0a perf: enable ClickHouse async_insert to batch small inserts server-side
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m CPU usage at 3 tx/s.

async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:33:48 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
e1cb9d7872 fix: extract snapshot data from chunks, reduce ClickHouse log noise
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
  from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:58:54 +02:00
hsiegeln
a9ec424d52 fix: derive Ed25519 signing key from JWT secret, no DB storage
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Replace DB-persisted keypair with deterministic derivation from
CAMELEER_JWT_SECRET via HMAC-SHA256 seed + seeded SHA1PRNG KeyPairGenerator.
Same secret = same key pair across restarts, no private key in the database.

Closes #121

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:18:43 +02:00
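The derivation described in a9ec424d52 might look roughly like the sketch below. The HMAC message label is an assumption (the commit does not say what is fed to the HMAC), the class name is hypothetical, and the code needs Java 15 or later for Ed25519 support. SHA1PRNG output is provider-specific, so the result is only expected to be stable on the same JDK provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.spec.NamedParameterSpec;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of deriving an Ed25519 key pair from a shared secret. */
final class DerivedSigningKey {
    // Assumed label; the actual HMAC input is not stated in the commit message.
    private static final String LABEL = "ed25519-signing-key";

    private DerivedSigningKey() {}

    static KeyPair derive(String jwtSecret) {
        try {
            // Step 1: HMAC-SHA256 keyed with the secret yields a 32-byte seed.
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(jwtSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] seed = mac.doFinal(LABEL.getBytes(StandardCharsets.UTF_8));

            // Step 2: seeding SHA1PRNG before first use makes its output repeatable.
            SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
            random.setSeed(seed);

            // Step 3: the generator draws the private key bytes from that PRNG.
            KeyPairGenerator generator = KeyPairGenerator.getInstance("Ed25519");
            generator.initialize(NamedParameterSpec.ED25519, random);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 key derivation failed", e);
        }
    }
}
```

Calling `derive` twice with the same secret should return the same public key, which is what lets agents keep verifying commands after a restart without a stored keypair.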
hsiegeln
81f13396a0 fix: persist Ed25519 signing key to survive server restarts
All checks were successful
CI / build (push) Successful in 2m8s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 54s
The keypair was generated ephemerally on each startup, causing agents
to reject all commands after a server restart (signature mismatch).
Now persisted to PostgreSQL server_config table and restored on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:13:40 +02:00
hsiegeln
670e458376 fix: update ITs to use consolidated init.sql, remove dead code
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m29s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 50s
- All 7 ClickHouse integration tests now load init.sql via shared
  ClickHouseTestHelper instead of deleted V1-V11 migration files
- Remove unused useScope exports (setApp, setRoute, setExchange, clearScope)
- Remove unused CSS classes (monoCell, punchcardStack)
- Update ui/README.md DS version to v0.1.28

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:03:54 +02:00
hsiegeln
d4327af6a4 refactor: consolidate ClickHouse schema into single init.sql, cache diagrams
All checks were successful
CI / build (push) Successful in 2m2s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Merge all V1-V11 migration scripts into one idempotent init.sql
- Simplify ClickHouseSchemaInitializer to load single file
- Replace route_diagrams projection with in-memory caches:
  hashCache (routeId+instanceId → contentHash) warm-loaded on startup,
  graphCache (contentHash → RouteGraph) lazy-populated on access
- Eliminates 9M+ row scans on diagram lookups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:24:53 +02:00
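The two caches from d4327af6a4 can be sketched as a warm-loaded map in front of a lazily filled one. This is a simplified illustration: the class name, the `|` key separator, and the loader function are assumptions, and the graph type is left generic in place of `RouteGraph`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical two-level diagram cache: key -> contentHash -> graph. */
final class DiagramCacheSketch<G> {
    private final Map<String, String> hashCache = new ConcurrentHashMap<>();
    private final Map<String, G> graphCache = new ConcurrentHashMap<>();
    private final Function<String, G> loadGraphByHash;

    /** warmHashes stands in for the startup load of (routeId+instanceId -> hash). */
    DiagramCacheSketch(Map<String, String> warmHashes, Function<String, G> loadGraphByHash) {
        this.hashCache.putAll(warmHashes);
        this.loadGraphByHash = loadGraphByHash;
    }

    /** Assumed composite key format. */
    static String key(String routeId, String instanceId) {
        return routeId + "|" + instanceId;
    }

    Optional<G> find(String routeId, String instanceId) {
        String contentHash = hashCache.get(key(routeId, instanceId));
        if (contentHash == null) {
            return Optional.empty();
        }
        // The graph is loaded from storage on first access only, then reused.
        return Optional.ofNullable(graphCache.computeIfAbsent(contentHash, loadGraphByHash));
    }
}
```

Keying the second map by content hash means instances that report an identical diagram share one cached graph.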
hsiegeln
bb3e1e2bc3 fix: set deduplicate_merge_projection_mode for ReplacingMergeTree projection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse 24.12 requires this setting before adding projections to
ReplacingMergeTree tables. Using 'drop' mode which discards the projection
during deduplication merges and rebuilds it afterward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:14:56 +02:00
hsiegeln
984bb2d40f fix: sort ClickHouse migration scripts by numeric version prefix
All checks were successful
CI / build (push) Successful in 2m32s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 55s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Alphabetical sort put V10/V11 before V2-V9 ("V11" < "V1_" in ASCII),
causing the route_diagrams projection to run before the table existed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:06:56 +02:00
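The fix in 984bb2d40f comes down to comparing the parsed version number instead of the file name. A minimal sketch, assuming file names start with `V<number>`; the class name and the sample file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that orders migration scripts by numeric version prefix. */
final class MigrationOrder {
    private static final Pattern VERSION_PREFIX = Pattern.compile("^V(\\d+)");

    private MigrationOrder() {}

    /** Extracts the integer after the leading 'V', e.g. 11 from "V11__x.sql". */
    static int version(String fileName) {
        Matcher m = VERSION_PREFIX.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No version prefix in: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Returns a copy sorted by numeric version rather than by string order. */
    static List<String> sorted(List<String> fileNames) {
        List<String> copy = new ArrayList<>(fileNames);
        copy.sort(Comparator.comparingInt(MigrationOrder::version));
        return copy;
    }
}
```

A plain string sort would place `V10` and `V11` ahead of `V2`, whereas the numeric comparator yields V1, V2, V10, V11.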
hsiegeln
6f00ff2e28 fix: reduce ClickHouse log noise, admin query spam, and diagram scan perf
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
  eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
  instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
  refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
  refresh so sidebar clicks use current time window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:48:30 +02:00
hsiegeln
2708bcec17 fix: first exchange click doesn't highlight selected row
All checks were successful
CI / build (push) Successful in 1m47s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 53s
On first click, Dashboard was in non-split mode. The click set
selectedId locally then triggered split view, which remounted
Dashboard — losing the selectedId state.

Added activeExchangeId prop passed from ExchangesPage so the
selection survives the remount. Also syncs via useEffect when
parent changes selection (e.g. correlated exchange navigation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:28:26 +02:00
hsiegeln
901dfd1eb8 fix: PAUSED mode disabled queries entirely instead of just polling
Some checks failed
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:25:04 +02:00
hsiegeln
726e77bb91 docs: update all documentation for session changes
Some checks failed
CI / build (push) Successful in 2m2s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CLAUDE.md:
- Agent registry auto-heal note (in-memory, JWT fallback)
- Usage analytics (ClickHouse usage_events table)

HOWTO.md:
- Architecture diagram: added deploy-demo (NodePort 30092) and cameleer-demo namespace
- Access URLs: added Deploy Demo
- Agent registry: server restart resilience documentation
- Route control: CommandGroupResponse note

ui/README.md:
- Fixed outdated generate-api command
- Added DS version (v0.1.26)
- Fixed VITE_API_TARGET (30081 not 30090)
- Added key features section (cmd-k, LIVE mode, route control, event icons)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:22:44 +02:00
hsiegeln
d30c267292 fix: route catalog missing routes after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 52s
CI / deploy (push) Successful in 54s
CI / deploy-feature (push) Has been skipped
After server restart, auto-healed agents register with empty
routeIds. The catalog only looked at agent registry for routes,
so routes and counts disappeared.

Now merges route IDs from ClickHouse stats_1m_route into the
catalog. Also includes apps that only exist in ClickHouse data
(no agent currently registered). Routes and exchange counts
survive server restarts.
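
The merge can be sketched roughly as follows. The `RoutesByApp` shape and `mergeCatalog` name are assumptions for illustration; the actual server-side code is not shown in this commit message:

```typescript
// Hypothetical shape: application id -> list of route ids.
type RoutesByApp = Record<string, string[]>;

// Union of routes known to the agent registry and routes seen in stats.
// Apps that appear in only one source are kept.
function mergeCatalog(fromRegistry: RoutesByApp, fromStats: RoutesByApp): RoutesByApp {
  const merged: Record<string, Set<string>> = {};
  for (const source of [fromRegistry, fromStats]) {
    for (const [app, routes] of Object.entries(source)) {
      const set = merged[app] ?? (merged[app] = new Set<string>());
      for (const route of routes) set.add(route);
    }
  }
  const result: RoutesByApp = {};
  for (const [app, set] of Object.entries(merged)) {
    result[app] = Array.from(set).sort();
  }
  return result;
}
```

An auto-healed agent with an empty route list then no longer hides routes that the stats data still knows about.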

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:14:27 +02:00
hsiegeln
37c10ae0a6 feat: manual refresh on sidebar navigation when LIVE mode is off
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
When autoRefresh is disabled, sidebar clicks now invalidate all
queries (queryClient.invalidateQueries()), triggering a re-fetch.
This gives users "click to refresh" behavior instead of stale data.

When LIVE mode is on, queries already poll at intervals, so no
invalidation is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:01:29 +02:00
hsiegeln
c16f0e62ed fix: clicking Applications header navigates back to all apps
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 1m22s
CI / deploy (push) Failing after 2m26s
CI / deploy-feature (push) Has been skipped
When the Applications section is already expanded, clicking the
header now navigates to /{tab} (all applications) instead of
collapsing. When collapsed, clicking expands as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:49:54 +02:00
hsiegeln
2bc3efad7f fix: agent auth, heartbeat, and SSE all break after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Three related issues caused by in-memory agent registry being empty
after server restart:

1. JwtAuthenticationFilter rejected valid agent JWTs if agent wasn't
   in registry — now authenticates any valid JWT regardless

2. Heartbeat returned 404 for unknown agents — now auto-registers
   the agent from JWT claims (subject, application)

3. SSE endpoint returned 404 — same auto-registration fix

JWT validation result is stored as a request attribute so downstream
controllers can extract the application claim for auto-registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:41:23 +02:00
hsiegeln
0632f1c6a8 fix: agent token refresh returns 404 after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m23s
The refresh endpoint required the agent to exist in the in-memory
registry. After server restart the registry is empty, so all refresh
attempts got 404. The refresh token itself is self-contained with
subject, application, and roles — the registry lookup is optional.

Now uses application from the JWT, falling back to registry only
if the agent happens to be registered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:37:57 +02:00
hsiegeln
bdac363e40 fix: active queries list always showed itself
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
The system.processes query was returning its own row. Added
filter: query NOT LIKE '%system.processes%'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:33:47 +02:00
hsiegeln
d9615204bf fix: admin pages not scrollable (content clipped by overflow:hidden)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
AdminLayout was a plain div with padding but no scroll. The parent
<main> has overflow:hidden, so admin page content beyond viewport
height was clipped. Added flex:1, overflow:auto, minHeight:0 to
make AdminLayout a proper scroll container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:31:04 +02:00
hsiegeln
2896bb90a9 fix: usage events never flushed to ClickHouse
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m7s
UsageFlushScheduler was a @Component with @ConditionalOnBean, but
ClickHouseUsageTracker is created via @Bean. Component scanning
registers the scheduler before the tracker's bean definition exists,
so the condition always evaluated to false. Events accumulated
in the WriteBuffer but flush() was never called.

Moved scheduler to @Bean in StorageBeanConfig with the same
@ConditionalOnProperty guard as the tracker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:07:13 +02:00
hsiegeln
a036d8a027 docs: spec for cameleer-deploy-demo prototype
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 00:13:04 +02:00
hsiegeln
44a37317d1 fix: cmd-k context key for tab reset and Enter-to-navigate on admin pages
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 1m27s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 51s
SonarQube / sonarqube (push) Failing after 2m22s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 23:53:09 +02:00
hsiegeln
146398b183 feat: RBAC page reads cmd-k navigation state for tab switch and highlight
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:42:18 +02:00
hsiegeln
69ca52b25e feat: handle admin cmd-k selection with tab navigation state
2026-04-02 23:38:06 +02:00
hsiegeln
111bcc302d feat: build admin search data for cmd-k on admin pages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 23:34:52 +02:00
hsiegeln
cf36f81ef1 chore: bump @cameleer/design-system to v0.1.26
2026-04-02 23:33:00 +02:00
hsiegeln
28f38331cc docs: implementation plan for context-aware cmd-k search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 23:27:07 +02:00
hsiegeln
394fde30c7 docs: spec for context-aware cmd-k search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 23:21:53 +02:00
hsiegeln
62b5c56c56 feat: event-type icons for agent event feeds
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 53s
Icons now reflect event type (UserPlus for registration, Skull
for dead, HeartPulse for recovery, Route for state changes, etc.)
while severity still drives the color. Updated in both
AgentInstance and AgentHealth pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 23:06:01 +02:00
hsiegeln
9b401558a5 fix: make disabled route control buttons visually distinct
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Disabled buttons now show reduced opacity (0.35) and muted icon
color instead of just changing the cursor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:58:46 +02:00
hsiegeln
38b76513c7 feat: route control buttons reflect current route state
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Buttons are disabled based on route state: Started disables
Start/Resume, Stopped disables Stop/Suspend/Resume, Suspended
disables Start/Suspend. State looked up from catalog API.
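
The state-to-button rules above can be written as a lookup table. This is a sketch; the type names and `isDisabled` helper are hypothetical rather than taken from the UI code:

```typescript
type RouteState = "started" | "stopped" | "suspended";
type RouteAction = "start" | "stop" | "suspend" | "resume";

// Which actions make no sense in each state, as listed in the commit message.
const DISABLED_ACTIONS: Record<RouteState, RouteAction[]> = {
  started: ["start", "resume"],
  stopped: ["stop", "suspend", "resume"],
  suspended: ["start", "suspend"],
};

function isDisabled(state: RouteState, action: RouteAction): boolean {
  return DISABLED_ACTIONS[state].indexOf(action) !== -1;
}
```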

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:56:49 +02:00
hsiegeln
2265ebf801 chore: bump @cameleer/design-system to v0.1.25
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m23s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m12s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:47:53 +02:00
hsiegeln
20af81a5dc feat: show server version in sidebar header
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m39s
Version injected at build time via VITE_APP_VERSION env var.
CI sets it to branch@sha. Falls back to 'dev' in local dev.
Displayed next to "Cameleer" in the sidebar header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:42:06 +02:00
hsiegeln
d819f88ae4 fix: starred routes not showing — starKey prefix mismatch
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 53s
collectStarredItems used 'app:' prefix for route keys but
buildAppTreeNodes uses 'route:' prefix. Routes were starred
but never matched in the starred section.
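
One way to avoid this class of mismatch is to build the key in a single shared helper. The sketch below assumes a `route:<appId>/<routeId>` layout, which is hypothetical; the commit only states the prefixes:

```typescript
// Hypothetical key layout; only the 'route:' prefix comes from the commit.
const routeKey = (appId: string, routeId: string): string =>
  `route:${appId}/${routeId}`;

interface RouteNode {
  appId: string;
  routeId: string;
}

// Both the tree builder and the starred collector call routeKey(),
// so the prefix cannot drift between the two call sites.
function collectStarredRoutes(starred: Set<string>, routes: RouteNode[]): RouteNode[] {
  return routes.filter((r) => starred.has(routeKey(r.appId, r.routeId)));
}
```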

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:36:28 +02:00
hsiegeln
5880abdd93 fix: keep admin section in place, don't move to top
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
Admin section stays in its fixed position (after Starred, before
Footer). Entering admin mode collapses Applications and Starred
but does not reorder sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:32:53 +02:00
hsiegeln
b676450995 fix: simplify sidebar to Applications + Starred + Admin footer
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Remove Agents and Routes sections from sidebar. Layout is now:
Header (camel logo + Cameleer) → Search → Applications section →
Starred section (when items exist) → Footer (Admin + API Docs).

Admin accordion: clicking Admin navigates to /admin/rbac and
expands Admin section at top while collapsing Applications and
Starred. Clicking Applications exits admin mode.

Removed buildAgentTreeNodes and buildRouteTreeNodes from
sidebar-utils (no longer needed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:29:44 +02:00
hsiegeln
e495b80432 fix: increase ClickHouse pool size and reduce flush interval
All checks were successful
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 2m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Pool was hardcoded to 10 connections serving 7 concurrent write
streams + UI reads, causing "too many simultaneous queries" and
WriteBuffer overflow. Pool now defaults to 50 (configurable via
clickhouse.pool-size), flush interval reduced from 1000ms to 500ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:11:15 +02:00
hsiegeln
45eab761b7 chore: bump @cameleer/design-system to v0.1.24
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m3s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:06:13 +02:00
hsiegeln
8d899cc70c refactor: use HeartbeatRequest from cameleer3-common
Some checks failed
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
CI / build (push) Has been cancelled
Replace local HeartbeatRequest DTO with the shared model from
cameleer3-common. Message types exchanged between server and agent
belong in the common module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:05:26 +02:00
hsiegeln
520b80444a feat(#119): accept route states in heartbeat and state-change events
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 34s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Replace ACK-based route state inference with agent-reported state.
Heartbeats now carry optional routeStates map, and ROUTE_STATE_CHANGED
events update the registry immediately.
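
A rough TypeScript sketch of the two update paths, for illustration only. The server-side registry is not shown here, so the class below, its method names, and keying by route id alone are all assumptions:

```typescript
type RouteState = "started" | "stopped" | "suspended";

// Hypothetical payload: only the optional `routeStates` map is named
// in the commit message.
interface Heartbeat {
  routeStates?: Record<string, RouteState>;
}

class RouteStateRegistry {
  private readonly states = new Map<string, RouteState>();

  // Heartbeats without routeStates (for example from older agents)
  // leave the registry unchanged.
  applyHeartbeat(hb: Heartbeat): void {
    if (!hb.routeStates) return;
    for (const [routeId, state] of Object.entries(hb.routeStates)) {
      this.states.set(routeId, state);
    }
  }

  // A ROUTE_STATE_CHANGED event updates a single route immediately.
  applyStateChanged(routeId: string, state: RouteState): void {
    this.states.set(routeId, state);
  }

  get(routeId: string): RouteState | undefined {
    return this.states.get(routeId);
  }
}
```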

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 21:45:13 +02:00
hsiegeln
17aff5ef9d docs: route state protocol extension spec
Defines two backward-compatible mechanisms for accurate route state
tracking: heartbeat extension (routeStates map in heartbeat body)
and ROUTE_STATE_CHANGED events for real-time updates. Covers
agent-side detection via Camel EventNotifier, server-side handling,
multi-agent conflict resolution, and migration path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:26:38 +02:00
hsiegeln
b714d3363f feat(#119): expose route state in catalog API and sidebar/dashboard
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 29s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add routeState field to RouteSummary DTO (null for started, 'stopped'
or 'suspended' for non-default states). Sidebar shows stop/pause icons
and state badge for affected routes in both Apps and Routes sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:15:46 +02:00
hsiegeln
0acceaf1a9 feat(#119): add RouteStateRegistry for tracking route operational state
In-memory registry that infers route state (started/stopped/suspended)
from successful route-control command ACKs. Updates state only when all
agents in a group confirm success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:15:35 +02:00
hsiegeln
ca1d472b78 feat(#117): agent-count toasts and persistent error toast dismiss
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 30s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:08:00 +02:00
hsiegeln
c3b4f70913 feat(#116): update command hooks for synchronous group response
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 30s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add CommandGroupResponse and ConfigUpdateResponse types. Switch
useSendGroupCommand and useSendRouteCommand from openapi-fetch to authFetch
returning CommandGroupResponse. Update useUpdateApplicationConfig to return
ConfigUpdateResponse and fix all consumer onSuccess callbacks to access
saved.config.version instead of saved.version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:01:06 +02:00
hsiegeln
027e45aadf feat(#116): synchronous group command dispatch with multi-agent response collection
Add addGroupCommandWithReplies() to AgentRegistryService that sends commands
to all LIVE agents in a group and returns CompletableFuture per agent for
collecting replies. Update sendGroupCommand() and pushConfigToAgents() to
wait with a shared 10-second deadline, returning CommandGroupResponse with
per-agent status, timeouts, and overall success. Config update endpoint now
returns ConfigUpdateResponse wrapping both the saved config and push result.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:00:56 +02:00
hsiegeln
f39f07e7bf feat(#118): add confirmation dialog for stop and suspend commands
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 35s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Stop and suspend route commands now show a ConfirmDialog requiring
typed confirmation before dispatch. Start and resume execute
immediately without confirmation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:54:23 +02:00
hsiegeln
d21d8b2c48 fix(#112): initialize sidebar accordion state from initial route
Some checks failed
CI / build (push) Failing after 43s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Direct navigation to /admin/* now correctly opens Admin section
and collapses operational sections on first render. Previously
the accordion effect only triggered on route transitions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:36:43 +02:00
hsiegeln
d5f5601554 fix(#112): add missing Routes section, fix admin double padding
Review feedback: buildRouteTreeNodes was defined but never rendered.
Added Routes section between Agents and Admin. Removed duplicate
padding on admin pages (AdminLayout handles its own padding).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:32:26 +02:00
hsiegeln
00042b1d14 feat(#112): remove admin tabs, sidebar handles navigation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:29:29 +02:00
hsiegeln
fe49eb5aba feat(#112): migrate to composable sidebar with accordion and collapse
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:29:25 +02:00
hsiegeln
bc913eef6e feat(#112): extract sidebar tree builders and types from DS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 18:29:22 +02:00
hsiegeln
d70ad91b33 docs: clarify search ownership and icon-rail click behavior
Search: DS renders dumb input, app owns filterQuery state and
passes it to each SidebarTree. Icon-rail click: fires both
onCollapseToggle and onToggle simultaneously, no navigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:41:31 +02:00
hsiegeln
ba361af2d7 docs: composable sidebar design spec for #112
Replaces the previous "hide sidebar on admin" approach with a
composable compound component design. DS provides shell + building
blocks (Sidebar, Section, Footer, SidebarTree); consuming app
controls all content, section ordering, accordion behavior, and
icon-rail collapse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:38:01 +02:00
hsiegeln
78777d2ba6 Revert "feat(#112): hide sidebar, topbar, cmd palette on admin pages"
This reverts commit d95e518622.
2026-04-02 17:22:06 +02:00
hsiegeln
3f8a9715a4 Revert "feat(#112): add admin header bar with back button and logout"
This reverts commit a484364029.
2026-04-02 17:22:06 +02:00
hsiegeln
f00a3e8b97 Revert "fix(#112): remove dead admin breadcrumb code, add logout aria-label"
This reverts commit d5028193c0.
2026-04-02 17:22:06 +02:00
hsiegeln
d5028193c0 fix(#112): remove dead admin breadcrumb code, add logout aria-label
Review feedback: breadcrumb memo had an unused isAdminPage branch
(TopBar no longer renders on admin pages). Added aria-label to
icon-only logout button for screen readers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:16:01 +02:00
hsiegeln
a484364029 feat(#112): add admin header bar with back button and logout
AdminLayout gains a self-contained header (Back / Admin / user+logout)
with CSS module styles, replacing the inline padding wrapper. Admin
pages now render fully without the main app chrome.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 17:12:50 +02:00
hsiegeln
d95e518622 feat(#112): hide sidebar, topbar, cmd palette on admin pages
Pass null as sidebar prop, guard TopBar and CommandPalette with
!isAdminPage, and remove conditional admin padding from main element.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 17:12:44 +02:00
hsiegeln
56297701e6 fix: use ILIKE for case-insensitive log search in ClickHouse
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m29s
LIKE is case-sensitive in ClickHouse. Switch to ILIKE for message,
stack_trace, and logger_name searches so queries match regardless
of casing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:35:34 +02:00
hsiegeln
8c7c9911c4 feat: highlight search matches in log results
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
Recursive case-insensitive highlighting of the search query in
collapsed message, expanded full message, and stack trace. Uses the
project's amber accent color for the highlight mark.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:34:15 +02:00
hsiegeln
4d66d6ab23 fix: use deterministic badge color for app names in Logs tab
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Use attributeBadgeColor() (hash-based) instead of "auto" so the same
application name gets the same badge color across all pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:31:04 +02:00
hsiegeln
b73f5e6dd4 feat: add Logs tab with cursor-paginated search, level filters, and live tail
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
- Extend GET /api/v1/logs with cursor pagination, multi-level filtering,
  optional application scoping, and level count aggregation
- Add exchangeId, instanceId, application, mdc fields to log responses
- Refactor ClickHouseLogStore with keyset pagination (N+1 pattern)
- Add LogSearchRequest/LogSearchResponse core domain records
- Create LogSearchPageResponse wrapper DTO
- Add Logs as 4th content tab (Exchanges | Dashboard | Runtime | Logs)
- Implement LogSearch component with debounced search, level filter bar,
  expandable log entries, cursor pagination, and live tail mode
- Add cross-navigation: exchange header → logs, log tab → logs tab
- Update ClickHouseLogStoreIT with cursor, multi-level, cross-app tests
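
The keyset pagination mentioned above can be sketched in memory as follows, assuming "N+1 pattern" means fetching one row more than the page size to detect whether another page exists. The row shape, cursor encoding, and `searchPage` name are hypothetical:

```typescript
interface LogRow {
  timestamp: number;
  id: string;
  message: string;
}
type Key = Pick<LogRow, "timestamp" | "id">;

interface Page {
  rows: LogRow[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Newest first; id breaks ties so the ordering is total.
function compareDesc(a: Key, b: Key): number {
  if (a.timestamp !== b.timestamp) return b.timestamp - a.timestamp;
  return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
}

function encodeCursor(row: Key): string {
  return `${row.timestamp}:${row.id}`;
}

function decodeCursor(cursor: string): Key {
  const sep = cursor.indexOf(":");
  return { timestamp: Number(cursor.slice(0, sep)), id: cursor.slice(sep + 1) };
}

// In-memory stand-in for a store query with ORDER BY and LIMIT limit + 1.
function searchPage(all: LogRow[], limit: number, cursor: string | null): Page {
  const sorted = all.slice().sort(compareDesc);
  const key = cursor === null ? null : decodeCursor(cursor);
  // Keyset step: keep only rows strictly after the cursor position.
  const after = key === null ? sorted : sorted.filter((r) => compareDesc(key, r) < 0);
  const fetched = after.slice(0, limit + 1); // one extra row to detect more data
  const hasMore = fetched.length > limit;
  const rows = fetched.slice(0, limit);
  const last = rows[rows.length - 1];
  return { rows, hasMore, nextCursor: hasMore && last ? encodeCursor(last) : null };
}
```

Unlike offset paging, each request filters on the last seen key, so rows arriving between requests do not shift later pages.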

Closes: #104

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:47:16 +02:00
hsiegeln
a52751da1b fix: avoid alias shadowing in processor metrics -Merge query
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Failing after 1m52s
ClickHouse 24.12's new query analyzer resolves countMerge(total_count)
in the CASE WHEN to the SELECT alias (UInt64) instead of the original
AggregateFunction column when the alias has the same name. Renamed
the aliases to tc/fc to avoid the collision.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:24:50 +02:00
hsiegeln
51780031ea fix: use alias in ORDER BY for processor metrics query
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
ClickHouse rejects countMerge() in ORDER BY after GROUP BY because the
column is already finalized to UInt64. Use the SELECT alias instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:11:54 +02:00
hsiegeln
eb2cafc7fa fix: use jar instead of unzip in sonarqube workflow
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 47s
The build container lacks unzip. The JDK jar command handles zip
extraction natively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:02:09 +02:00
hsiegeln
805e6d51cb fix: add processor_type to stats_1m_processor_detail MV
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The table and materialized view were missing the processor_type column,
causing the RouteMetricsController query to fail and the dashboard
processor metrics table to render empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 22:00:23 +02:00
hsiegeln
f3feaddbfe feat: show distinct attribute keys in cmd-k Attributes tab
All checks were successful
CI / build (push) Successful in 1m58s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m46s
CI / deploy (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
Add GET /search/attributes/keys endpoint that queries distinct
attribute key names from ClickHouse using JSONExtractKeys. Attribute
keys appear in the cmd-k Attributes tab alongside attribute value
matches from exchange results.

- SearchIndex.distinctAttributeKeys() interface method
- ClickHouseSearchIndex implementation using arrayJoin(JSONExtractKeys)
- SearchController /attributes/keys endpoint
- useAttributeKeys() React Query hook
- buildSearchData includes attribute keys as 'attribute' category items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:39:27 +02:00
hsiegeln
9057981cf7 fix: use composite ID for routes in command palette search data
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 55s
Routes with the same name across different applications (e.g., "route1"
in both QUARKUS-APP and BACKEND-APP) were deduplicated because they
shared the same id (routeId). Use appId/routeId as the id so all
routes appear in cmd-k results.
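
A small sketch of why the composite id matters. The item shape and helper names are hypothetical; only the `appId/routeId` format comes from the commit message:

```typescript
interface RouteEntry {
  appId: string;
  routeId: string;
}
interface SearchItem {
  id: string;
  title: string;
}

// Keeps the first item for each id, as a palette that deduplicates by id would.
function dedupeById(items: SearchItem[]): SearchItem[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    if (seen.has(item.id)) return false;
    seen.add(item.id);
    return true;
  });
}

function toSearchItems(routes: RouteEntry[]): SearchItem[] {
  // appId/routeId stays unique even when two apps share a route name.
  return dedupeById(
    routes.map((r) => ({ id: `${r.appId}/${r.routeId}`, title: r.routeId })),
  );
}
```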

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:33:23 +02:00
hsiegeln
b30a5b5760 fix: prevent cmd-k scroll reset on catalog poll refresh
All checks were successful
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 2m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m0s
The searchData useMemo recomputed on every catalog poll cycle because
catalogData got a new array reference even when content was unchanged.
This caused the CommandPalette list to re-render and reset scroll.

Use a ref with deep equality check to keep a stable catalog reference,
only updating when the actual data changes.
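
The idea can be sketched outside React as a small closure; in a component the stored value would live in a ref. `makeStable` is a hypothetical name, and JSON comparison stands in for whatever deep-equality check the code actually uses:

```typescript
// Returns a function that hands back the previous reference whenever the
// new value is deeply equal to it, so downstream memoization is not busted.
function makeStable<T>(): (next: T) => T {
  let current: { value: T } | null = null;
  return (next: T): T => {
    // JSON comparison is a simple deep-equality stand-in for plain data.
    if (current !== null && JSON.stringify(current.value) === JSON.stringify(next)) {
      return current.value;
    }
    current = { value: next };
    return next;
  };
}
```

A memo keyed on the returned reference then recomputes only when the catalog content changes, not on every poll.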

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:22:50 +02:00
hsiegeln
910230cbf8 fix: add <mark> highlighting to search match context snippets
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m14s
CI / docker (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The command palette renders matchContext via dangerouslySetInnerHTML
expecting HTML with <mark> tags, but extractSnippet() returned plain
text. Wrap the matched term in <mark> tags and escape surrounding
text to prevent XSS.
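
A minimal sketch of the escape-then-wrap approach. The signature, the `radius` parameter, and the escaping helper are assumptions; the real extractSnippet() is not shown in this commit message:

```typescript
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical signature: returns an HTML snippet around the first
// case-insensitive match, or null when the term does not occur.
function extractSnippet(text: string, term: string, radius = 20): string | null {
  if (term.length === 0) return null;
  const idx = text.toLowerCase().indexOf(term.toLowerCase());
  if (idx < 0) return null;
  const start = Math.max(0, idx - radius);
  const end = Math.min(text.length, idx + term.length + radius);
  const before = text.slice(start, idx);
  const match = text.slice(idx, idx + term.length);
  const after = text.slice(idx + term.length, end);
  // Each piece is escaped separately; only the <mark> tags are raw HTML.
  return `${escapeHtml(before)}<mark>${escapeHtml(match)}</mark>${escapeHtml(after)}`;
}
```

Escaping the three pieces separately keeps any markup in the source text inert while the inserted mark tags stay live.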

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:18:04 +02:00
hsiegeln
1d791bb329 fix: use exact match for ID fields in full-text search
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
ID fields (execution_id, correlation_id, exchange_id) should use
exact equality, not LIKE with wildcards. LIKE is only needed for
the _search_text full-text columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:13:54 +02:00
hsiegeln
9781fe0d7c fix: include execution/correlation/exchange IDs in full-text search
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
The _search_text materialized column only contained error messages,
bodies, and headers — not execution_id, correlation_id, exchange_id,
or route_id. Searching by ID via cmd-k returned no results.

- Add ID fields to _search_text in ClickHouse DDL (covered by ngram
  bloom filter index)
- Add direct LIKE matches on execution_id, correlation_id, exchange_id
  in the text search WHERE clause for faster exact ID lookups

Requires ClickHouse table recreation (fresh install).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:12:15 +02:00
hsiegeln
92951f1dcf chore: update @cameleer/design-system to v0.1.22
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m27s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Sidebar selectedPath now uses sidebarReveal on all tabs, not just
exchanges. This fixes sidebar highlighting on dashboard and runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:09:20 +02:00
hsiegeln
a7d256b38a fix: compute hasTraceData from processor records in chunk accumulator
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The chunked ingestion path hardcoded hasTraceData=false because the
execution envelope doesn't carry processor bodies. But the processor
records DO have inputBody/outputBody — we just need to check them.

Track hasTraceData across chunks in PendingExchange and pass it to
MergedExecution when the final chunk arrives or on stale sweep.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:04:34 +02:00
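The per-exchange tracking described in a7d256b38a can be illustrated with a small sketch. The `PendingExchange` name and the `inputBody`/`outputBody` fields come from the commit message; the nested `ProcessorRecord` record, the `addChunk` method, and the "non-null body counts as trace data" rule are assumptions made for illustration.

```java
import java.util.List;

final class PendingExchange {

    /** Hypothetical stand-in for the flat processor record carried by each chunk. */
    record ProcessorRecord(String processorId, String inputBody, String outputBody) {}

    private boolean hasTraceData;

    /** Folds one chunk's processors into the running flag; once true, it stays true. */
    void addChunk(List<ProcessorRecord> processors) {
        for (ProcessorRecord p : processors) {
            if (p.inputBody() != null || p.outputBody() != null) {
                hasTraceData = true;
            }
        }
    }

    /** Value to pass along when the final chunk arrives or the exchange is swept as stale. */
    boolean hasTraceData() {
        return hasTraceData;
    }
}
```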
hsiegeln
e26266532a fix: regenerate OpenAPI types, fix search scoping by applicationId
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The identity rename (application→applicationId) broke search filtering
because the stale schema.d.ts still had 'application' as the field name.
The backend silently ignored the unknown field, returning unfiltered results.

- Regenerate openapi.json and schema.d.ts from live backend
- Fix Dashboard: application→applicationId in search request
- Fix RouteDetail: application→applicationId in search request (2 places)
- LayoutShell: scope command palette search by appId/routeId
- LayoutShell: pass sidebarReveal state on sidebar click navigation

Note for DS team: the Sidebar selectedPath logic (line 5451 in dist)
has a hardcoded pathname.startsWith("/exchanges/") guard. This should
be broadened to simply `S ? S : $.pathname` so sidebarReveal works on
all tabs (dashboard, runtime), not just exchanges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:55:19 +02:00
hsiegeln
178bc40706 Revert "fix: sidebar selection highlight and scoped command palette search"
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
This reverts commit 4168a6d45b.
2026-04-01 20:43:27 +02:00
hsiegeln
4168a6d45b fix: sidebar selection highlight and scoped command palette search
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Two fixes:
- Pass sidebarReveal state on sidebar navigation so the design system
  can highlight the selected entry (it compares internal /apps/... paths
  against this state value, not the browser URL)
- Command palette search now includes scope.appId and scope.routeId
  so results are filtered to the current sidebar selection

Note: sidebar highlighting works on the exchanges tab. The design
system's selectedPath logic only checks pathname.startsWith("/exchanges/")
for sidebarReveal — a DS update is needed to support /dashboard/ and
/runtime/ tabs too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:41:42 +02:00
hsiegeln
a028905e41 fix: update agent field names in frontend to match backend DTO
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The AgentInstanceResponse backend DTO uses instanceId, displayName,
applicationId, status — but the stale schema.d.ts still had id, name,
application, state. This caused the runtime table to show no data.

- Update schema.d.ts AgentInstanceResponse fields
- Fix AgentHealth: row.id→instanceId, row.name→displayName,
  row.application→applicationId, inst.id→instanceId
- Fix AgentInstance: agent.id→instanceId, agent.name→displayName
- Fix ExchangeHeader: agent.id→instanceId, agent.state→status
- Fix LayoutShell search: agent.state→status, agentTps→tps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:36:31 +02:00
hsiegeln
f82aa26371 fix: improve ClickHouse admin page, fix AgentHealth type error
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 3m46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Rewrite ClickHouse admin to show useful storage metrics instead of
often-empty system.events data. Add active queries section.

- Replace performance endpoint: query system.parts for disk size,
  uncompressed size, compression ratio, total rows, part count
- Add /queries endpoint querying system.processes for active queries
- Frontend: storage overview strip, tables with total size, active
  queries DataTable
- Fix AgentHealth.tsx type: agentId → instanceId in inline type cast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:18:06 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00
hsiegeln
5ed7d38bf7 fix: sort sidebar entries alphanumerically
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 29s
CI / docker (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been skipped
Applications, routes within each app, and agents within each app
are now sorted by name using localeCompare.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:24:39 +02:00
hsiegeln
4cdbcdaeea fix: update frontend field names for identity rename (applicationId, instanceId)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
The backend identity rename (applicationName → applicationId,
agentId → instanceId) was not reflected in the frontend. This caused
drilldown to fail (detail.applicationName was undefined, disabling
the diagram fetch) and various display issues.

Updated schema.d.ts, ExchangeHeader, ExecutionDiagram, Dashboard,
AgentHealth, AgentInstance, LayoutShell, LogTab, InfoTab, DetailPanel,
ExchangesPage, and tracing-store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:22:16 +02:00
hsiegeln
aa2d203f4e feat: add UI usage analytics tracking
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 1m14s
CI / deploy (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:53:32 +02:00
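The path normalization mentioned in aa2d203f4e (grouping dynamic segments as `{id}` or `{hash}`) could be sketched as below. The `PathNormalizer` class and its three patterns are assumptions; the commit does not say which segment shapes the real interceptor recognizes, and the paths in the usage note are hypothetical.

```java
import java.util.regex.Pattern;

final class PathNormalizer {
    // Each pattern matches one whole path segment (preceded by '/' and followed by '/' or end).
    private static final Pattern UUID = Pattern.compile(
            "(?<=/)[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)");
    private static final Pattern HASH = Pattern.compile("(?<=/)[0-9a-fA-F]{32,64}(?=/|$)");
    private static final Pattern NUMBER = Pattern.compile("(?<=/)\\d+(?=/|$)");

    /** Replaces dynamic segments with placeholders so requests group by endpoint. */
    static String normalize(String path) {
        String p = UUID.matcher(path).replaceAll("{id}");
        p = HASH.matcher(p).replaceAll("{hash}");
        return NUMBER.matcher(p).replaceAll("{id}");
    }
}
```

With these rules a hypothetical `/api/v1/executions/42/diagram` becomes `/api/v1/executions/{id}/diagram`, while the `v1` segment is left alone because it is not purely numeric.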
hsiegeln
ce4abaf862 fix: infer compound node color from descendants when no own overlay state
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
Path containers (EIP_WHEN, EIP_OTHERWISE, etc.) don't have their own
processor records, so they never get an overlay entry. Now inferred
from descendants: green if any descendant executed, red if any failed.
Gated (amber) only when no descendants executed at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:37:47 +02:00
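The inference rule from ce4abaf862 can be written down as a small decision function. This is a sketch in Java for illustration only (the real logic lives in the frontend); the `CompoundOverlay` class and both enums are hypothetical, and the red-over-green precedence is taken from the "failed (red) > completed (green)" ordering stated in b44ffd08be.

```java
import java.util.List;

final class CompoundOverlay {
    enum ChildStatus { FAILED, COMPLETED, NOT_EXECUTED }
    enum Color { RED, GREEN, AMBER }

    /** Infers a container's header color from its descendants when it has no overlay entry of its own. */
    static Color infer(List<ChildStatus> descendants) {
        boolean anyExecuted = false;
        for (ChildStatus status : descendants) {
            if (status == ChildStatus.FAILED) {
                return Color.RED;        // any failure wins
            }
            if (status == ChildStatus.COMPLETED) {
                anyExecuted = true;
            }
        }
        // Amber (gated) only when nothing underneath executed at all.
        return anyExecuted ? Color.GREEN : Color.AMBER;
    }
}
```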
hsiegeln
40ce4a57b4 fix: only show amber on containers where gate blocked all children
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
A container is gated (amber) only when (filterMatched=false or
duplicateMessage=true) AND no descendants were executed. Containers
with executed children (split, choice, idempotent that passed) now
correctly show green/red based on their execution status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:32:39 +02:00
hsiegeln
b44ffd08be fix: color compound nodes by execution status in overlay
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 51s
CompoundNode now uses execution overlay status to color its header:
failed (red) > completed (green) > default. Previously only used
static type-based color regardless of execution state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:20:59 +02:00
hsiegeln
cf439248b5 feat: expose iteration/iterationSize fields for diagram overlay
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Replace synthetic wrapper node approach with direct iteration fields:
- ProcessorNode gains iteration (child's index) and iterationSize
  (container's total) fields, populated from ClickHouse flat records
- Frontend hooks detect iteration containers from iterationSize != null
  instead of scanning for wrapper processorTypes
- useExecutionOverlay filters children by iteration field instead of
  wrapper nodes, eliminating ITERATION_WRAPPER_TYPES entirely
- Cleaner data contract: API returns exactly what the DB stores

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:14:36 +02:00
hsiegeln
e8f9ada1d1 fix: inject ClickHouse JdbcTemplate into stats-querying controllers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 49s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
RouteCatalogController, RouteMetricsController, and AgentRegistrationController
had unqualified JdbcTemplate injection, receiving the PostgreSQL template
instead of ClickHouse. The stats queries silently failed (caught exception)
returning 0 counts. Added @Qualifier("clickHouseJdbcTemplate") to all three.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:34:56 +02:00
hsiegeln
bc70797e31 fix: force UTC timezone in Docker runtime
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Sets TZ=UTC and -Duser.timezone=UTC to guarantee all JVM time operations
use UTC regardless of the container's base image or host configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:24:23 +02:00
hsiegeln
f6123b8a7c fix: use explicit UTC formatting in ClickHouse DateTime literals
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 57s
Timestamp.toString() formats in the JVM's local timezone, which can differ
from ClickHouse's UTC timezone and cause time-filtered queries to return empty
results. Replaced with DateTimeFormatter.withZone(UTC) in all lit() methods.

Also added warn logging to RouteCatalogController catch blocks to surface
query errors instead of silently swallowing them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:13:52 +02:00
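The explicit-UTC formatting described in f6123b8a7c can be sketched with the JDK's `java.time` API. The class name `ClickHouseLiterals` and the exact pattern are assumptions; the commit only states that the `lit()` methods now use `DateTimeFormatter.withZone(UTC)`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class ClickHouseLiterals {
    // Pinning the zone on the formatter makes the output independent of the JVM default timezone.
    private static final DateTimeFormatter UTC_SECONDS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Renders an instant as a quoted DateTime literal in UTC, with second precision. */
    static String lit(Instant instant) {
        return "'" + UTC_SECONDS.format(instant) + "'";
    }
}
```

A `java.sql.Timestamp` can be passed through `toInstant()` first, which sidesteps `Timestamp.toString()` and its dependence on the local timezone.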
hsiegeln
d739094a56 fix: update ClickHouse DDL files with new column names instead of ALTER RENAME
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
ClickHouse can't rename columns that are part of ORDER BY keys.
Updated V1-V8 DDL files directly with new column names (instance_id,
application_id) and removed V9 migration. Wipe ClickHouse and restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:40:54 +02:00
hsiegeln
91400defe9 fix: add missing V9 (ClickHouse) and V14 (PostgreSQL) identity column rename migrations
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 39s
The migration files were lost during the worktree merge; this commit recreates them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:33:02 +02:00
hsiegeln
909d713837 feat: rename agent identity fields for protocol v2 + add SHUTDOWN lifecycle state
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 22s
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId

Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)

Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:22:42 +02:00
hsiegeln
ad8dd73596 fix: update ChunkAccumulator tests for DiagramStore constructor param
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:58:27 +02:00
hsiegeln
e50c9fa60d fix: address SonarQube reliability issues
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 39s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
- ElkDiagramRenderer.getElkRoot(): add null guard to prevent NPE
  when node is null (SQ java:S2259)
- WriteBuffer: add offerOrWarn() that logs when buffer is full instead
  of silently dropping data. ChunkAccumulator now uses this method
  so ingestion backpressure is visible in logs (SQ java:S899)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:55:31 +02:00
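The `offerOrWarn()` idea from e50c9fa60d (log instead of silently dropping when the buffer is full) might look like the following. It is a sketch, not the project's `WriteBuffer`: the class is named `BoundedWriteBuffer` to keep it distinct, and the backing queue type, logger, and message text are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.logging.Logger;

final class BoundedWriteBuffer<T> {
    private static final Logger LOG = Logger.getLogger(BoundedWriteBuffer.class.getName());

    private final BlockingQueue<T> queue;

    BoundedWriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking insert; logs a warning when the item is rejected because the buffer is full. */
    boolean offerOrWarn(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            LOG.warning("Write buffer full (" + queue.size() + " items), dropping item");
        }
        return accepted;
    }

    int size() {
        return queue.size();
    }
}
```

Returning the boolean keeps the call site free to count or retry rejected items, while the warning makes ingestion backpressure visible in the logs.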
hsiegeln
d4dbfa7ae6 fix: populate diagramContentHash in chunked ingestion pipeline
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:50:34 +02:00
hsiegeln
59374482bc fix: replace PostgreSQL aggregate functions with ClickHouse -Merge combinators
RouteCatalogController, RouteMetricsController, AgentRegistrationController
all had inline SQL using SUM() on AggregateFunction columns from stats_1m_*
AggregatingMergeTree tables. Replace with countMerge/countIfMerge/sumMerge.
Also fix time_bucket() → toStartOfInterval() and ::double → toFloat64().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:49:06 +02:00
hsiegeln
43e187a023 fix: ChunkIngestionController ObjectMapper missing FAIL_ON_UNKNOWN_PROPERTIES
Adds DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES=false (required
by PROTOCOL.md) and explicit TypeReference<List<ExecutionChunk>> for
array parsing. Without this, batched chunks from ChunkedExporter
(2+ chunks in a JSON array) were silently rejected, causing final:true
chunks to be lost and all exchanges to go stale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 10:45:12 +02:00
hsiegeln
bc1c71277c fix: resolve duplicate ExecutionStore bean conflict
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 49s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 55s
ClickHouseExecutionStore implements ExecutionStore, so the concrete bean
already satisfies the interface — remove redundant wrapper bean. Align
ChunkAccumulator and ExecutionFlushScheduler conditions to
cameleer.storage.executions flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:44:02 +02:00
hsiegeln
520181d241 test(clickhouse): add integration tests for execution read path and tree reconstruction
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m16s
SonarQube / sonarqube (push) Failing after 2m21s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:11:44 +02:00
hsiegeln
95b9dea5c4 feat(clickhouse): wire ClickHouseExecutionStore as active ExecutionStore
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:09:14 +02:00
hsiegeln
151b96a680 feat: seq-based tree reconstruction for ClickHouse flat processor model
Dual-mode buildTree: detects seq presence and uses seq/parentSeq linkage
instead of processorId map. Handles duplicate processorIds across
iterations correctly. Old processorId-based mode kept for PG compat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:07:20 +02:00
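The seq/parentSeq linkage described in 151b96a680 can be illustrated with a short tree builder. This is a sketch under assumptions: `SeqTreeBuilder`, `FlatRecord`, and `Node` are hypothetical types, and records are assumed to carry a unique `seq` and a nullable `parentSeq`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SeqTreeBuilder {

    /** Hypothetical flat record: seq is unique per execution, processorId may repeat across iterations. */
    record FlatRecord(int seq, Integer parentSeq, String processorId) {}

    static final class Node {
        final FlatRecord record;
        final List<Node> children = new ArrayList<>();

        Node(FlatRecord record) {
            this.record = record;
        }
    }

    /** Links nodes by seq/parentSeq; records without a known parent become roots. */
    static List<Node> buildTree(List<FlatRecord> records) {
        Map<Integer, Node> bySeq = new LinkedHashMap<>();
        for (FlatRecord r : records) {
            bySeq.put(r.seq(), new Node(r));
        }
        List<Node> roots = new ArrayList<>();
        for (Node node : bySeq.values()) {
            Integer parentSeq = node.record.parentSeq();
            Node parent = parentSeq == null ? null : bySeq.get(parentSeq);
            if (parent == null) {
                roots.add(node);
            } else {
                parent.children.add(node);
            }
        }
        return roots;
    }
}
```

Keying the map by `seq` rather than `processorId` is what lets two iterations of the same processor coexist as separate nodes.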
hsiegeln
0661fd995f feat(clickhouse): add read methods to ClickHouseExecutionStore
Implements ExecutionStore interface with findById (FINAL for
ReplacingMergeTree), findProcessors (ORDER BY seq), findProcessorById,
and findProcessorBySeq. Write methods unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:04:03 +02:00
hsiegeln
190ae2797d refactor: extend ProcessorRecord with seq/iteration fields for ClickHouse model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:02:03 +02:00
hsiegeln
968117c41a feat(clickhouse): wire Phase 4 stores with feature flags
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:44:10 +02:00
hsiegeln
7d7eb52afb feat(clickhouse): add ClickHouseLogStore with LogIndex interface
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:42:07 +02:00
hsiegeln
c73e4abf68 feat(clickhouse): add ClickHouseAgentEventRepository with integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:37:51 +02:00
hsiegeln
cd63d300b3 feat(clickhouse): add ClickHouseDiagramStore with integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:35:32 +02:00
hsiegeln
f7daadaaa9 feat(clickhouse): add DDL for route_diagrams, agent_events, and logs tables
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:30:38 +02:00
hsiegeln
af080337f5 feat: comprehensive ClickHouse low-memory tuning and switch all storage to ClickHouse
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.

Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:27:10 +02:00
hsiegeln
606f81a970 fix: align server with protocol v2 chunked transport spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m45s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 46s
- ChunkIngestionController: /data/chunks → /data/executions (matches
  PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
  avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
  envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:18:35 +02:00
hsiegeln
154bce366a fix: remove references to deleted ProcessorExecution tree fields
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m15s
CI / docker (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m0s
cameleer3-common removed children, loopIndex, splitIndex,
multicastIndex from ProcessorExecution (flat model only now).
Iteration context lives on synthetic wrapper nodes via processorType.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:00:11 +02:00
hsiegeln
a669df08bd fix(clickhouse): tune memory settings to prevent OOM on insert
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 40s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:54:43 +02:00
hsiegeln
af18fc4142 Merge branch 'worktree-clickhouse-phase2'
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
2026-03-31 22:06:35 +02:00
hsiegeln
1a00eed389 fix: schema initializer skips comment-only SQL segments
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:06:31 +02:00
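The comment-skipping behaviour from 1a00eed389 can be sketched as follows. The `SqlScriptSplitter` class is hypothetical, and the sketch assumes `--` line comments only; like the split-on-semicolon logic the commit describes, it still splits on a semicolon that appears inside a comment, which is why the commit also fixed the comment itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class SqlScriptSplitter {

    /** Splits a script on ';' and drops segments that are blank once '--' comment lines are removed. */
    static List<String> statements(String script) {
        List<String> result = new ArrayList<>();
        for (String segment : script.split(";")) {
            String sql = segment.lines()
                    .filter(line -> !line.strip().startsWith("--"))
                    .collect(Collectors.joining("\n"))
                    .strip();
            if (!sql.isEmpty()) {
                result.add(sql);   // only segments with real SQL are executed
            }
        }
        return result;
    }
}
```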
hsiegeln
0423518f72 feat: ClickHouse Phase 3 — Stats & Analytics (materialized views)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
- DDL for 5 AggregatingMergeTree tables + 5 materialized views
- ClickHouseStatsStore: all 15 StatsStore methods using -Merge combinators
- Stats/timeseries read from pre-aggregated MVs (countMerge, sumMerge, quantileMerge)
- SLA/topErrors/punchcard query raw executions FINAL table
- Feature flag: cameleer.storage.stats (default: clickhouse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:52:13 +02:00
hsiegeln
9df00fdde0 feat(clickhouse): wire ClickHouseStatsStore with cameleer.storage.stats feature flag (default: clickhouse)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:51:45 +02:00
hsiegeln
052990bb59 feat(clickhouse): add ClickHouseStatsStore with -Merge aggregate queries
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.

Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:49:22 +02:00
hsiegeln
eb0d26814f feat(clickhouse): add stats materialized views DDL (5 tables + 5 MVs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 20:11:38 +02:00
hsiegeln
c8e6bbe059 Merge branch 'worktree-clickhouse-phase2'
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m2s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
2026-03-31 20:02:49 +02:00
hsiegeln
a9eabe97f7 fix: wire @Primary JdbcTemplate to the @Primary DataSource bean
The jdbcTemplate() method was calling dataSource(properties) directly,
creating a new DataSource instance instead of using the Spring-managed
@Primary bean. This caused some repositories to receive the ClickHouse
connection instead of PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 20:02:44 +02:00
hsiegeln
e724607a66 feat: ClickHouse Phase 2 — Executions + Search (chunked transport)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m36s
CI / docker (push) Successful in 3m21s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- DDL for executions (ReplacingMergeTree) and processor_executions (MergeTree with seq/parentSeq/iteration)
- ClickHouseExecutionStore with batch INSERT for both tables
- ChunkAccumulator: buffers exchange envelope across chunks, inserts processors immediately, writes execution on final chunk
- ExecutionFlushScheduler drains WriteBuffers to ClickHouse
- ChunkIngestionController: POST /api/v1/data/chunks endpoint
- ClickHouseSearchIndex: ngram-accelerated SQL search implementing SearchIndex interface
- Feature flags: cameleer.storage.search=opensearch|clickhouse
- Uses cameleer3-common ExecutionChunk and FlatProcessorRecord models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:37:21 +02:00
hsiegeln
07f215b0fd refactor: replace server-side DTOs with cameleer3-common ExecutionChunk and FlatProcessorRecord
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:33:49 +02:00
hsiegeln
38551eac9d test(clickhouse): add end-to-end chunk pipeline integration test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:24:55 +02:00
hsiegeln
31f7113b3f feat(clickhouse): wire ChunkAccumulator, flush scheduler, and search feature flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:21:19 +02:00
hsiegeln
6052407c82 feat(clickhouse): add ClickHouseSearchIndex with ngram-accelerated SQL search
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:18:01 +02:00
hsiegeln
776f2ce90d feat(clickhouse): add ExecutionFlushScheduler and ChunkIngestionController
ExecutionFlushScheduler drains MergedExecution and ProcessorBatch write
buffers on a fixed interval and delegates batch inserts to
ClickHouseExecutionStore. Also sweeps stale exchanges every 60s.

ChunkIngestionController exposes POST /api/v1/data/chunks, accepts
single or array ExecutionChunk payloads, and feeds them into the
ChunkAccumulator. Conditional on ChunkAccumulator bean (clickhouse.enabled).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:12:38 +02:00
hsiegeln
62420cf0c2 feat(clickhouse): add ChunkAccumulator for chunked execution ingestion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:10:21 +02:00
hsiegeln
81f7f8afe1 feat(clickhouse): add ClickHouseExecutionStore with batch insert for chunked format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:07:33 +02:00
hsiegeln
b30dfa39f4 feat(clickhouse): add executions and processor_executions DDL for chunked transport
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:04:19 +02:00
hsiegeln
20c8e17843 feat: add server-side ExecutionChunk and FlatProcessorRecord DTOs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:02:47 +02:00
a96fe59840 Merge pull request 'fix: add @Primary PG DataSource/JdbcTemplate to prevent CH bean conflict' (#99) from feature/clickhouse-phase1 into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m46s
CI / docker (push) Successful in 11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Reviewed-on: cameleer/cameleer3-server#99
2026-03-31 18:21:00 +02:00
hsiegeln
7cf849269f fix: add @Primary PG DataSource/JdbcTemplate to prevent CH bean conflict
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 41s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 38s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m51s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
When clickhouse.enabled=true, the ClickHouse JdbcTemplate bean prevents
Spring Boot auto-config from creating the default PG JdbcTemplate.
All PG repositories then get the CH JdbcTemplate and fail with
"Table cameleer.audit_log does not exist".

Fix: explicitly create @Primary DataSource and JdbcTemplate from
DataSourceProperties so PG remains the default for unqualified injections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:18:09 +02:00
76afcaa637 Merge pull request 'fix: cast DateTime64 to DateTime in ClickHouse TTL expression' (#98) from feature/clickhouse-phase1 into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m55s
CI / docker (push) Successful in 14s
CI / deploy (push) Successful in 30s
CI / deploy-feature (push) Has been skipped
Reviewed-on: cameleer/cameleer3-server#98
2026-03-31 18:10:58 +02:00
hsiegeln
b1c5cc0616 fix: cast DateTime64 to DateTime in ClickHouse TTL expression
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m46s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 1m8s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 2m19s
2026-03-31 18:10:20 +02:00
8838077eff Merge pull request 'fix: remove unsupported async_insert params from ClickHouse JDBC URL' (#97) from feature/clickhouse-phase1 into main
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m39s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 34s
Reviewed-on: cameleer/cameleer3-server#97
2026-03-31 18:04:22 +02:00
hsiegeln
8eeaecf6f3 fix: remove unsupported async_insert params from ClickHouse JDBC URL
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 55s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m39s
CI / deploy (push) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (push) Successful in 51s
CI / deploy-feature (pull_request) Has been skipped
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:02:53 +02:00
b54bef302d Merge pull request 'fix: ClickHouse auth credentials and non-fatal schema init' (#96) from feature/clickhouse-phase1 into main
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m17s
Reviewed-on: cameleer/cameleer3-server#96
2026-03-31 17:57:27 +02:00
hsiegeln
f8505401d7 fix: ClickHouse auth credentials and non-fatal schema init
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 43s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 13s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m47s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
- Set CLICKHOUSE_USER/PASSWORD via k8s secret (fixes "disabling network
  access for user 'default'" when no password is set)
- Add clickhouse-credentials secret to CI deploy + feature branch copy
- Pass CLICKHOUSE_USERNAME/PASSWORD env vars to server pod
- Make schema initializer non-fatal so server starts even if CH is
  temporarily unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:54:44 +02:00
a0f1a4aba4 Merge pull request 'feature/clickhouse-phase1' (#95) from feature/clickhouse-phase1 into main
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m41s
Reviewed-on: cameleer/cameleer3-server#95
2026-03-31 17:48:41 +02:00
hsiegeln
aa5fc1b830 ci: retrigger after transient GitHub actions/cache 500 error
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m44s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m44s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 11s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 2m15s
2026-03-31 17:43:40 +02:00
hsiegeln
c42e13932b ci: deploy ClickHouse StatefulSet in main deploy job
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (pull_request) Failing after 45s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / build (push) Failing after 1m6s
CI / docker (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been skipped
The deploy/clickhouse.yaml manifest was created but not referenced
in the CI workflow. Add kubectl apply between OpenSearch and Authentik.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:41:15 +02:00
hsiegeln
59dd629b0e fix: create cameleer database on ClickHouse startup
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (pull_request) Successful in 1m49s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 10s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been cancelled
ClickHouse only has the 'default' database out of the box. The JDBC URL
connects to 'cameleer', so the database must exist before the server starts.
Uses /docker-entrypoint-initdb.d/ init script via ConfigMap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:31:17 +02:00
hsiegeln
697c689192 fix: rename ClickHouse tests to *IT pattern for CI compatibility
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m28s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 2m27s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / docker (push) Successful in 3m32s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 2m17s
Testcontainers tests need Docker which isn't available in CI.
Rename to *IT so Surefire skips them (Failsafe runs them with -DskipITs=false).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:19:33 +02:00
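The rename in 697c689192 works because the two Maven test plugins select classes by name. As a rough sketch of the default include patterns (Surefire: `Test*`, `*Test`, `*Tests`, `*TestCase`; Failsafe: `IT*`, `*IT`, `*ITCase`), the function below classifies a class name. The class names are hypothetical examples, not taken from the repository, and a name that matches both pattern sets is reported as `failsafe` only, which is a simplification.

```shell
#!/bin/sh
# Classify a test class name by the default include patterns of the two Maven plugins.
# Surefire:  Test*, *Test, *Tests, *TestCase
# Failsafe:  IT*, *IT, *ITCase
classify_test_class() {
  case "$1" in
    IT*|*IT|*ITCase)              echo "failsafe" ;;
    Test*|*Test|*Tests|*TestCase) echo "surefire" ;;
    *)                            echo "neither" ;;
  esac
}

# Hypothetical class names, used only for illustration.
classify_test_class "ClickHouseMetricsStoreTest"   # name before the rename
classify_test_class "ClickHouseMetricsStoreIT"     # name after the rename
```

With the CI build running `mvn clean verify -DskipITs`, classes in the `failsafe` group are skipped, which matches the intent described in the commit message.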
hsiegeln
7a2a0ee649 test: add ClickHouse testcontainer to integration test base
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 2m29s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Failing after 2m28s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:09:09 +02:00
hsiegeln
1b991f99a3 deploy: add ClickHouse StatefulSet and server env vars
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:08:42 +02:00
hsiegeln
21991b6cf8 feat: wire MetricsStore and MetricsQueryStore with feature flag
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:07:35 +02:00
hsiegeln
53766aeb56 feat: add ClickHouseMetricsQueryStore with time-bucketed queries
Implements MetricsQueryStore using ClickHouse toStartOfInterval() for
time-bucketed aggregation queries; verified with 4 Testcontainers tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:05:45 +02:00
hsiegeln
bf0e9ea418 refactor: extract MetricsQueryStore interface from AgentMetricsController
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:00:57 +02:00
hsiegeln
6e30b7ec65 feat: add ClickHouseMetricsStore with batch insert
TDD implementation of MetricsStore backed by ClickHouse. Uses native
Map(String,String) column type (no JSON cast), relies on ClickHouse
DEFAULT for server_received_at, and handles null tags by substituting
an empty HashMap. All 4 Testcontainers tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:58:20 +02:00
hsiegeln
08934376df feat: add ClickHouse schema initializer with agent_metrics DDL
Adds ClickHouseSchemaInitializer that runs on ApplicationReadyEvent,
scanning classpath:clickhouse/*.sql in filename order and executing each
statement. Adds V1__agent_metrics.sql with MergeTree table, tenant/agent
partitioning, and 365-day TTL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:51:21 +02:00
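Commit 08934376df runs the schema files "in filename order". The sketch below shows what plain byte-wise filename ordering does once version numbers reach two digits. It assumes the initializer compares names as ordinary strings, which the commit message does not confirm. A temporary directory stands in for `classpath:clickhouse/`, and the `V2__` and `V10__` file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for classpath:clickhouse/.
# The V2__ and V10__ file names are hypothetical; only V1__agent_metrics.sql comes from the commit.
list_migrations() {
  for f in "$1"/*.sql; do
    [ -e "$f" ] || continue      # an unmatched glob stays literal, so skip it
    basename "$f"
  done | LC_ALL=C sort           # plain byte-wise comparison of file names
}

dir=$(mktemp -d)
: > "$dir/V1__agent_metrics.sql"
: > "$dir/V2__example.sql"
: > "$dir/V10__example.sql"
list_migrations "$dir"
rm -rf "$dir"
```

Under byte-wise ordering `V10__example.sql` sorts before `V1__agent_metrics.sql`, because `0` compares lower than `_`. If the real initializer behaves the same way, zero-padded prefixes such as `V01__`, `V02__`, `V10__` would keep lexical and numeric order aligned.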
hsiegeln
23f901279a feat: add ClickHouse DataSource and JdbcTemplate configuration
Adds ClickHouseProperties (bound to clickhouse.*), ClickHouseConfig
(conditional HikariDataSource + JdbcTemplate beans), and extends
application.yml with clickhouse.enabled/url/username/password and
cameleer.storage.metrics properties.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:51:14 +02:00
hsiegeln
6171827243 build: add clickhouse-jdbc and testcontainers-clickhouse dependencies
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 16:49:04 +02:00
hsiegeln
c77d8a7af0 docs: add Phase 1 implementation plan for ClickHouse migration
10-task TDD plan covering: CH dependency, config, schema init,
ClickHouseMetricsStore, MetricsQueryStore interface extraction,
ClickHouseMetricsQueryStore, feature flag wiring, k8s deployment,
integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:43:14 +02:00
hsiegeln
e7eda7a7b3 docs: add ClickHouse migration design and append-only protocol spec
Design for replacing PostgreSQL/TimescaleDB + OpenSearch with ClickHouse
OSS. Covers table schemas, ingestion pipeline (ExecutionAccumulator),
ngram search indexes, materialized views, multitenancy, and retention.

Companion doc proposes append-only execution protocol for the agent repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:36:22 +02:00
453 changed files with 37136 additions and 7803 deletions

11
.gitea/sanitize-branch.sh Normal file

@@ -0,0 +1,11 @@
+#!/bin/sh
+# Shared branch slug sanitization for CI jobs.
+# Strips prefix (feature/, fix/, etc.), lowercases, replaces non-alphanum, truncates to 20 chars.
+sanitize_branch() {
+  echo "$1" | sed -E 's#^(feature|fix|feat|hotfix)/##' \
+    | tr '[:upper:]' '[:lower:]' \
+    | sed 's/[^a-z0-9-]/-/g' \
+    | sed 's/--*/-/g; s/^-//; s/-$//' \
+    | cut -c1-20 \
+    | sed 's/-$//'
+}
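As a minimal sketch of how the shared function behaves, the snippet below inlines a copy of `sanitize_branch` (so it runs without the repository checked out) and derives the namespace and schema names the same way the `deploy-feature` job does. The branch name is an assumed example input.

```shell
#!/bin/sh
# Inlined copy of sanitize_branch from .gitea/sanitize-branch.sh so the sketch is self-contained;
# the workflow loads it with `. .gitea/sanitize-branch.sh` instead.
sanitize_branch() {
  echo "$1" | sed -E 's#^(feature|fix|feat|hotfix)/##' \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9-]/-/g' \
    | sed 's/--*/-/g; s/^-//; s/-$//' \
    | cut -c1-20 \
    | sed 's/-$//'
}

# Example input (assumed): a mixed-case branch name containing an underscore.
SLUG=$(sanitize_branch "feature/ClickHouse_Phase1")
NS="cam-${SLUG}"                              # Kubernetes namespace for the branch
SCHEMA="cam_$(echo "$SLUG" | tr '-' '_')"     # PostgreSQL schema name for the branch
echo "$SLUG $NS $SCHEMA"
```

The final `sed 's/-$//'` matters only after truncation: cutting at 20 characters can leave a trailing dash, which would not be a valid end for a namespace name.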

View File

@@ -53,6 +53,7 @@ jobs:
 npm run build
 env:
 REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
+VITE_APP_VERSION: ${{ github.sha }}
 - name: Build and Test
 run: mvn clean verify -DskipITs -U --batch-mode
@@ -78,14 +79,7 @@ jobs:
 REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}
 - name: Compute branch slug
 run: |
-sanitize_branch() {
-echo "$1" | sed -E 's#^(feature|fix|feat|hotfix)/##' \
-| tr '[:upper:]' '[:lower:]' \
-| sed 's/[^a-z0-9-]/-/g' \
-| sed 's/--*/-/g; s/^-//; s/-$//' \
-| cut -c1-20 \
-| sed 's/-$//'
-}
+. .gitea/sanitize-branch.sh
 if [ "$GITHUB_REF_NAME" = "main" ]; then
 echo "BRANCH_SLUG=main" >> "$GITHUB_ENV"
 echo "IMAGE_TAGS=latest" >> "$GITHUB_ENV"
@@ -118,9 +112,11 @@ jobs:
 for TAG in $IMAGE_TAGS; do
 TAGS="$TAGS -t gitea.siegeln.net/cameleer/cameleer3-server-ui:$TAG"
 done
+SHORT_SHA=$(echo "${{ github.sha }}" | cut -c1-7)
 docker buildx build --platform linux/amd64 \
 -f ui/Dockerfile \
 --build-arg REGISTRY_TOKEN="$REGISTRY_TOKEN" \
+--build-arg VITE_APP_VERSION="$SHORT_SHA" \
 $TAGS \
 --cache-from type=registry,ref=gitea.siegeln.net/cameleer/cameleer3-server-ui:buildcache \
 --cache-to type=registry,ref=gitea.siegeln.net/cameleer/cameleer3-server-ui:buildcache,mode=max \
@@ -209,27 +205,28 @@ jobs:
 --from-literal=POSTGRES_DB="${POSTGRES_DB:-cameleer}" \
 --dry-run=client -o yaml | kubectl apply -f -
-kubectl create secret generic opensearch-credentials \
+kubectl create secret generic logto-credentials \
 --namespace=cameleer \
---from-literal=OPENSEARCH_USER="${OPENSEARCH_USER:-admin}" \
---from-literal=OPENSEARCH_PASSWORD="$OPENSEARCH_PASSWORD" \
+--from-literal=PG_USER="${LOGTO_PG_USER:-logto}" \
+--from-literal=PG_PASSWORD="${LOGTO_PG_PASSWORD}" \
+--from-literal=ENDPOINT="${LOGTO_ENDPOINT}" \
+--from-literal=ADMIN_ENDPOINT="${LOGTO_ADMIN_ENDPOINT}" \
 --dry-run=client -o yaml | kubectl apply -f -
-kubectl create secret generic authentik-credentials \
+kubectl create secret generic clickhouse-credentials \
 --namespace=cameleer \
---from-literal=PG_USER="${AUTHENTIK_PG_USER:-authentik}" \
---from-literal=PG_PASSWORD="${AUTHENTIK_PG_PASSWORD}" \
---from-literal=AUTHENTIK_SECRET_KEY="${AUTHENTIK_SECRET_KEY}" \
+--from-literal=CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}" \
+--from-literal=CLICKHOUSE_PASSWORD="$CLICKHOUSE_PASSWORD" \
 --dry-run=client -o yaml | kubectl apply -f -
 kubectl apply -f deploy/postgres.yaml
 kubectl -n cameleer rollout status statefulset/postgres --timeout=120s
-kubectl apply -f deploy/opensearch.yaml
-kubectl -n cameleer rollout status statefulset/opensearch --timeout=180s
+kubectl apply -f deploy/clickhouse.yaml
+kubectl -n cameleer rollout status statefulset/clickhouse --timeout=180s
-kubectl apply -f deploy/authentik.yaml
-kubectl -n cameleer rollout status deployment/authentik-server --timeout=180s
+kubectl apply -f deploy/logto.yaml
+kubectl -n cameleer rollout status deployment/logto --timeout=180s
 kubectl apply -k deploy/overlays/main
 kubectl -n cameleer set image deployment/cameleer3-server \
@@ -248,11 +245,12 @@ jobs:
 POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
 POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
 POSTGRES_DB: ${{ secrets.POSTGRES_DB }}
-OPENSEARCH_USER: ${{ secrets.OPENSEARCH_USER }}
-OPENSEARCH_PASSWORD: ${{ secrets.OPENSEARCH_PASSWORD }}
-AUTHENTIK_PG_USER: ${{ secrets.AUTHENTIK_PG_USER }}
-AUTHENTIK_PG_PASSWORD: ${{ secrets.AUTHENTIK_PG_PASSWORD }}
-AUTHENTIK_SECRET_KEY: ${{ secrets.AUTHENTIK_SECRET_KEY }}
+LOGTO_PG_USER: ${{ secrets.LOGTO_PG_USER }}
+LOGTO_PG_PASSWORD: ${{ secrets.LOGTO_PG_PASSWORD }}
+LOGTO_ENDPOINT: ${{ secrets.LOGTO_ENDPOINT }}
+LOGTO_ADMIN_ENDPOINT: ${{ secrets.LOGTO_ADMIN_ENDPOINT }}
+CLICKHOUSE_USER: ${{ secrets.CLICKHOUSE_USER }}
+CLICKHOUSE_PASSWORD: ${{ secrets.CLICKHOUSE_PASSWORD }}
 deploy-feature:
 needs: docker
@@ -274,14 +272,7 @@ jobs:
 KUBECONFIG_B64: ${{ secrets.KUBECONFIG_BASE64 }}
 - name: Compute branch variables
 run: |
-sanitize_branch() {
-echo "$1" | sed -E 's#^(feature|fix|feat|hotfix)/##' \
-| tr '[:upper:]' '[:lower:]' \
-| sed 's/[^a-z0-9-]/-/g' \
-| sed 's/--*/-/g; s/^-//; s/-$//' \
-| cut -c1-20 \
-| sed 's/-$//'
-}
+. .gitea/sanitize-branch.sh
 SLUG=$(sanitize_branch "$GITHUB_REF_NAME")
 NS="cam-${SLUG}"
 SCHEMA="cam_$(echo $SLUG | tr '-' '_')"
@@ -292,7 +283,7 @@ jobs:
 run: kubectl create namespace "$BRANCH_NS" --dry-run=client -o yaml | kubectl apply -f -
 - name: Copy secrets from cameleer namespace
 run: |
-for SECRET in gitea-registry postgres-credentials opensearch-credentials cameleer-auth; do
+for SECRET in gitea-registry postgres-credentials clickhouse-credentials cameleer-auth; do
 kubectl get secret "$SECRET" -n cameleer -o json \
 | jq 'del(.metadata.namespace, .metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .metadata.managedFields)' \
 | kubectl apply -n "$BRANCH_NS" -f -
@@ -372,15 +363,6 @@ jobs:
 kubectl wait --for=condition=Ready pod/cleanup-schema-${BRANCH_SLUG} -n cameleer --timeout=30s || true
 kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cleanup-schema-${BRANCH_SLUG} -n cameleer --timeout=60s || true
 kubectl delete pod cleanup-schema-${BRANCH_SLUG} -n cameleer --ignore-not-found
-- name: Delete OpenSearch indices
-run: |
-kubectl run cleanup-indices-${BRANCH_SLUG} \
---namespace=cameleer \
---image=curlimages/curl:latest \
---restart=Never \
---command -- curl -sf -X DELETE "http://opensearch:9200/cam-${BRANCH_SLUG}-*"
-kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cleanup-indices-${BRANCH_SLUG} -n cameleer --timeout=60s || true
-kubectl delete pod cleanup-indices-${BRANCH_SLUG} -n cameleer --ignore-not-found
 - name: Cleanup Docker images
 run: |
 API="https://gitea.siegeln.net/api/v1"

View File

@@ -42,9 +42,6 @@ jobs:
 key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: ${{ runner.os }}-maven-
-- name: Build and Test Java
-run: mvn clean verify -DskipITs -U --batch-mode
 - name: Install UI dependencies
 working-directory: ui
 run: |
@@ -57,33 +54,10 @@ jobs:
 working-directory: ui
 run: npm run lint -- --format json --output-file eslint-report.json || true
-- name: Install sonar-scanner
+- name: Build, Test and Analyze
 run: |
-SONAR_SCANNER_VERSION=6.2.1.4610
-ARCH=$(uname -m)
-case "$ARCH" in
-aarch64|arm64) PLATFORM="linux-aarch64" ;;
-*) PLATFORM="linux-x64" ;;
-esac
-curl -sSLo sonar-scanner.zip "https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-${SONAR_SCANNER_VERSION}-${PLATFORM}.zip"
-unzip -q sonar-scanner.zip
-ln -s "$(pwd)/sonar-scanner-${SONAR_SCANNER_VERSION}-${PLATFORM}/bin/sonar-scanner" /usr/local/bin/sonar-scanner
-- name: SonarQube Analysis
-run: |
-sonar-scanner \
--Dsonar.host.url="$SONAR_HOST_URL" \
--Dsonar.token="$SONAR_TOKEN" \
+mvn clean verify sonar:sonar -DskipITs -U --batch-mode \
+-Dsonar.host.url=${{ secrets.SONAR_HOST_URL }} \
+-Dsonar.token=${{ secrets.SONAR_TOKEN }} \
 -Dsonar.projectKey=cameleer3-server \
--Dsonar.projectName="Cameleer3 Server" \
--Dsonar.sources=cameleer3-server-core/src/main/java,cameleer3-server-app/src/main/java,ui/src \
--Dsonar.tests=cameleer3-server-core/src/test/java,cameleer3-server-app/src/test/java \
--Dsonar.java.binaries=cameleer3-server-core/target/classes,cameleer3-server-app/target/classes \
--Dsonar.java.test.binaries=cameleer3-server-core/target/test-classes,cameleer3-server-app/target/test-classes \
--Dsonar.java.libraries="$HOME/.m2/repository/**/*.jar" \
--Dsonar.typescript.eslint.reportPaths=ui/eslint-report.json \
--Dsonar.eslint.reportPaths=ui/eslint-report.json \
--Dsonar.exclusions="ui/node_modules/**,ui/dist/**,**/target/**"
-env:
-SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
-SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
+-Dsonar.projectName="Cameleer3 Server"

1
.gitignore vendored

@@ -40,3 +40,4 @@ logs/
 # Claude
 .claude/
 .worktrees/
+.gitnexus

Binary file not shown (before: 142 KiB).
Binary file not shown (before: 141 KiB).
Binary file not shown (before: 6.7 KiB).
Binary file not shown (before: 92 KiB).
Binary file not shown (before: 115 KiB).

334
CLAUDE.md

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project
-Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE.
+Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.
 ## Related Project
@@ -14,8 +14,8 @@ Cameleer3 Server — observability server that receives, stores, and serves Came
 ## Modules
-- `cameleer3-server-core` — domain logic, storage, agent registry
-- `cameleer3-server-app` — Spring Boot web app, REST controllers, SSE, static resources
+- `cameleer3-server-core` — domain logic, storage interfaces, services (no Spring dependencies)
+- `cameleer3-server-app` — Spring Boot web app, REST controllers, SSE, persistence, Docker orchestration
 ## Build Commands
@@ -30,6 +30,116 @@ mvn clean verify # Full build with tests
java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
```
## Key Classes by Package
### Core Module (`cameleer3-server-core/src/main/java/com/cameleer3/server/core/`)
**agent/** — Agent lifecycle and commands
- `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
- `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
- `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
- `AgentEventService` — records agent state changes, heartbeats
**runtime/** — App/Environment/Deployment domain
- `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
- `AppVersion` — record: id, appId, version, jarPath
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `ContainerRequest` — record: 17 fields for Docker container creation
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, cpuShares, cpuLimit, appPort, replicas, routingMode, etc.
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs
**search/** — Execution search
- `SearchService` — search, topErrors, punchcard, distinctAttributeKeys
- `SearchRequest` / `SearchResult` — search DTOs
**storage/** — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
**rbac/** — Role-based access control
- `RbacService` — getDirectRolesForUser, syncOidcRoles, assignRole
- `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
- `UserDetail`, `RoleDetail`, `GroupDetail` — records
**security/** — Auth
- `JwtService` — interface: createAccessToken, validateAccessToken
- `Ed25519SigningService` — interface: sign, verify (config signing)
- `OidcConfig` — record: issuerUri, clientId, audience, rolesClaim, additionalScopes
**ingestion/** — Buffered data pipeline
- `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
- `ChunkAccumulator` — batches data for efficient flush
### App Module (`cameleer3-server-app/src/main/java/com/cameleer3/server/app/`)
**controller/** — REST endpoints
- `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
- `AgentSseController` — GET /sse (Server-Sent Events connection)
- `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
- `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
- `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
- `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
- `ExecutionController` — GET /api/v1/executions (search + detail)
- `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
- `LogQueryController` — GET /api/v1/logs, GET /tail
- `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
- `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
- `RoleAdminController` — CRUD /api/v1/admin/roles
- `GroupAdminController` — CRUD /api/v1/admin/groups
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `LicenseAdminController` — GET/POST /api/v1/admin/license
**runtime/** — Docker orchestration
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
**storage/** — PostgreSQL repositories (JdbcTemplate)
- `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
- `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
- `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`
**storage/** — ClickHouse stores
- `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseLogStore`
- `ClickHouseStatsStore` — pre-aggregated stats, punchcard
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
- `ClickHouseSearchIndex` — full-text search
- `ClickHouseUsageTracker` — usage_events for billing
**security/** — Spring Security
- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
- `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
- `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
- `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
- `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
- `OidcProviderHelper` — OIDC discovery, JWK source cache
**agent/** — Agent lifecycle
- `SseConnectionManager` — manages per-agent SSE connections, delivers commands
- `AgentLifecycleMonitor` — @Scheduled 10s, LIVE->STALE->DEAD transitions
**retention/** — JAR cleanup
- `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions
**config/** — Spring beans
- `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
- `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
- `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
- `StorageBeanConfig` — all repositories
- `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer
## Key Conventions
- Java 17+ required
@@ -37,30 +147,228 @@ java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
- Depends on `com.cameleer3:cameleer3-common` from Gitea Maven registry
- Jackson `JavaTimeModule` for `Instant` deserialization
- Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
- Maintains agent instance registry with states: LIVE → STALE → DEAD
- Storage: PostgreSQL (TimescaleDB) for structured data, OpenSearch for full-text search and application log storage
- Security: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing, bootstrap token for registration
- OIDC: Optional external identity provider support (token exchange pattern). Configured via admin API, stored in database (`server_config` table)
- Environment filtering: all data queries (exchanges, dashboard stats, route metrics, agent events, correlation) filter by the selected environment. All commands (config-update, route-control, set-traced-processors, replay) target only agents in the selected environment when one is selected. `AgentRegistryService.findByApplicationAndEnvironment()` for environment-scoped command dispatch. Backend endpoints accept optional `environment` query parameter; null = all environments (backward compatible).
- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim > `"default"`). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when registry has incomplete data.
- Multi-tenancy: each server instance serves one tenant (configured via `CAMELEER_TENANT_ID`, default: `"default"`). Environments (dev/staging/prod) are first-class — agents send `environmentId` at registration and in heartbeats. JWT carries `env` claim for environment persistence across token refresh. PostgreSQL isolated via schema-per-tenant (`?currentSchema=tenant_{id}`). ClickHouse shared DB with `tenant_id` + `environment` columns, partitioned by `(tenant_id, toYYYYMM(timestamp))`.
- Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in `clickhouse/*.sql`, run idempotently on startup by `ClickHouseSchemaInitializer`. Use `IF NOT EXISTS` for CREATE and ADD PROJECTION.
- Logging: ClickHouse JDBC set to INFO (`com.clickhouse`), HTTP client to WARN (`org.apache.hc.client5`) in application.yml
- Security:
  - Auth: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from JWT secret via HMAC-SHA256), bootstrap token for registration.
  - CORS: `CAMELEER_CORS_ALLOWED_ORIGINS` (comma-separated) overrides `CAMELEER_UI_ORIGIN` for multi-origin setups (e.g., reverse proxy).
  - UI role gating: Admin sidebar/routes hidden for non-ADMIN; diagram toolbar and route control hidden for VIEWER. Read-only for VIEWER, editable for OPERATOR+. Role helpers: `useIsAdmin()`, `useCanControl()` in `auth-store.ts`. Route guard: `RequireAdmin` in `auth/RequireAdmin.tsx`.
  - Last-ADMIN guard: system prevents removal of the last ADMIN role (409 Conflict on role removal, user deletion, group role removal).
  - Password policy: min 12 chars, 3-of-4 character classes, no username match (enforced on user creation and admin password reset).
  - Brute-force protection: 5 failed attempts -> 15 min lockout (tracked via `failed_login_attempts` / `locked_until` on users table).
  - Token revocation: `token_revoked_before` column on users, checked in `JwtAuthenticationFilter`, set on password change.
- OIDC: Optional external identity provider support (token exchange pattern).
  - Configuration: configured via admin API/UI, stored in database (`server_config` table). Configurable `userIdClaim` (default `sub`) determines which id_token claim is used as the user identifier.
  - Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when `CAMELEER_OIDC_ISSUER_URI` is set. `CAMELEER_OIDC_JWK_SET_URI` overrides JWKS discovery for container networking. `CAMELEER_OIDC_TLS_SKIP_VERIFY=true` disables TLS cert verification for OIDC calls (self-signed CAs).
  - Scope-based role mapping via `SystemRole.normalizeScope()` (case-insensitive, strips `server:` prefix): `admin`/`server:admin` -> ADMIN, `operator`/`server:operator` -> OPERATOR, `viewer`/`server:viewer` -> VIEWER.
  - SSO: when OIDC enabled, UI auto-redirects to provider with `prompt=none` for silent sign-in; falls back to `/login?local` on `login_required`, retries without `prompt=none` on `consent_required`.
  - Logout: always redirects to `/login?local` (via OIDC end_session or direct fallback) to prevent SSO re-login loops.
  - Auto-signup: provisions new OIDC users with default roles.
  - Role sync: system roles synced on every OIDC login via `syncOidcRoles`, which always overwrites directly-assigned roles (falls back to `defaultRoles` when OIDC returns none) and uses `getDirectRolesForUser` to avoid touching group-inherited roles. Group memberships are never touched.
  - Algorithms: supports ES384, ES256, RS256.
  - Shared OIDC logic in `OidcProviderHelper` (discovery, JWK source, algorithm set).
- OIDC role extraction: `OidcTokenExchanger` reads roles from the **access_token** first (JWT with `at+jwt` type, decoded by a separate processor), then falls back to id_token. `OidcConfig` includes `audience` (RFC 8707 resource indicator — included in both authorization request and token exchange POST body to trigger JWT access tokens) and `additionalScopes` (extra scopes for the SPA to request). The `rolesClaim` config points to the claim name in the token (e.g., `"roles"` for Custom JWT claims, `"realm_access.roles"` for Keycloak). All provider-specific configuration is external — no provider-specific code in the server.
- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`
- Usage analytics: ClickHouse `usage_events` table tracks authenticated UI requests, flushed every 5s
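The password policy listed under Security above (min 12 chars, 3-of-4 character classes, no username match) can be illustrated with a minimal sketch. The class and method names are hypothetical and not taken from the codebase, and the reading of "no username match" as a case-insensitive equality check is an assumption.

```java
/** Hypothetical validator; names are illustrative, not the server's actual classes. */
final class PasswordPolicySketch {
    static final int MIN_LENGTH = 12;

    /** Returns true when the password satisfies the three documented rules. */
    static boolean isAcceptable(String username, String password) {
        if (password == null || password.length() < MIN_LENGTH) {
            return false;
        }
        boolean lower = false, upper = false, digit = false, other = false;
        for (char c : password.toCharArray()) {
            if (Character.isLowerCase(c)) lower = true;
            else if (Character.isUpperCase(c)) upper = true;
            else if (Character.isDigit(c)) digit = true;
            else other = true;
        }
        int classes = (lower ? 1 : 0) + (upper ? 1 : 0) + (digit ? 1 : 0) + (other ? 1 : 0);
        if (classes < 3) {
            return false;
        }
        // Assumption: "no username match" means the password must not equal
        // the username, ignoring case. The real rule may be stricter.
        return username == null || !password.equalsIgnoreCase(username);
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("alice", "Correct-Horse-7"));
        System.out.println(isAcceptable("alice", "shortPw1!"));
    }
}
```

Keeping the check a pure function makes it easy to call from both enforcement points named above (user creation and admin password reset).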
## Database Migrations
PostgreSQL (Flyway): `cameleer3-server-app/src/main/resources/db/migration/`
- V1 — RBAC (users, roles, groups, audit_log)
- V2 — Claim mappings (OIDC)
- V3 — Runtime management (apps, environments, deployments, app_versions)
- V4 — Environment config (default_container_config JSONB)
- V5 — App container config (container_config JSONB on apps)
- V6 — JAR retention policy (jar_retention_count on environments)
- V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
- V8 — Deployment active config (resolved_config JSONB on deployments)
- V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
ClickHouse: `cameleer3-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)
## CI/CD & Deployment
- CI workflow: `.gitea/workflows/ci.yml` — build → docker → deploy on push to main or feature branches
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
- Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime
- `REGISTRY_TOKEN` build arg required for `cameleer3-common` dependency resolution
- Registry: `gitea.siegeln.net/cameleer/cameleer3-server` (container images)
- K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, OpenSearch, Authentik) as top-level manifests
- K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
- Deployment target: k3s at 192.168.50.86, namespace `cameleer` (main), `cam-<slug>` (feature branches)
- Feature branches: isolated namespace, PG schema, OpenSearch index prefix; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
- Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `postgres-credentials`, `opensearch-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready`, OpenSearch uses `/_cluster/health`
- Feature branches: isolated namespace, PG schema; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
- Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `postgres-credentials`, `clickhouse-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
- K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without securityContext (needs root for entrypoint setup).
- Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
- Docker build uses buildx registry cache + `--provenance=false` for Gitea compatibility
- CI: branch slug sanitization extracted to `.gitea/sanitize-branch.sh`, sourced by docker and deploy-feature jobs
## UI Structure
The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments**.
- **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
- **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
- **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
- **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
- Config sub-tabs: **Variables | Monitoring | Traces & Taps | Route Recording | Resources**
- Create app: full page at `/apps/new` (not a modal)
- Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
### Key UI Files
- `ui/src/router.tsx` — React Router v6 routes
- `ui/src/config.ts` — apiBaseUrl, basePath
- `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
- `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
- `ui/src/components/ContentTabs.tsx` — main tab switcher
- `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
- `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
- `ui/src/hooks/useScope.ts` — TabKey type, scope inference
## UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly.
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly. All colors use CSS variables (no hardcoded hex).
- Shared CSS modules in `ui/src/styles/` (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
- Shared `PageLoader` component replaces copy-pasted spinner patterns.
- Design system components used consistently: `Select`, `Tabs`, `Toggle`, `Button`, `LogViewer`, `Label` — prefer DS components over raw HTML elements.
- Environment slugs are auto-computed from display name (read-only in UI).
- Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer3-{16,32,48,192,512}.png`, and `cameleer3-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
- Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). basePath is centralized in `ui/src/config.ts`; router.tsx imports it instead of re-reading `<base>` tag.
- Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.
## Docker Orchestration
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
- `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer3-server` DNS alias.
- `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer3-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
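The three-layer merge described for `ConfigMerger` (global -> environment -> app) can be sketched as a pure function over maps. This is an illustration only: the real `ConfigMerger` returns a `ResolvedContainerConfig`, whereas this sketch uses plain `Map` values, and the class name and the "null means not set" rule are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a three-layer config merge; not the actual ConfigMerger. */
final class ConfigMergeSketch {

    /** Later layers win: app overrides environment, environment overrides global. */
    static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                       Map<String, Object> envConfig,
                                       Map<String, Object> appConfig) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        overlay(resolved, globalDefaults);
        overlay(resolved, envConfig);
        overlay(resolved, appConfig);
        return resolved;
    }

    private static void overlay(Map<String, Object> target, Map<String, Object> layer) {
        if (layer == null) {
            return; // a missing layer contributes nothing
        }
        layer.forEach((key, value) -> {
            // Assumption: a null value means "not set at this layer",
            // so the value from the lower layer is kept.
            if (value != null) {
                target.put(key, value);
            }
        });
    }
}
```

For example, with hypothetical keys, global `{memoryMb=512, replicas=1}`, environment `{memoryMb=1024}` and app `{replicas=3}` resolve to `memoryMb=1024` and `replicas=3`.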
### Deployment Status Model
Deployments move through these statuses:
| Status | Meaning |
|--------|---------|
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Some replicas healthy, some dead |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
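The relationship between replica health and the `RUNNING` / `DEGRADED` statuses in the table above can be sketched as follows. The enum mirrors the documented status names; the class and method names are hypothetical, and mapping "no healthy replicas" to `FAILED` is an assumption the text does not state.

```java
/** Hypothetical sketch; only the status names come from the documentation. */
final class DeploymentStatusSketch {
    enum Status { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    /** Derives a steady-state status from the number of healthy replicas. */
    static Status fromReplicaHealth(int healthy, int total) {
        if (total <= 0 || healthy < 0 || healthy > total) {
            throw new IllegalArgumentException("invalid replica counts: " + healthy + "/" + total);
        }
        if (healthy == total) {
            return Status.RUNNING;   // all replicas healthy and serving
        }
        if (healthy > 0) {
            return Status.DEGRADED;  // at least one, but not all, replicas healthy
        }
        // Assumption: zero healthy replicas is treated as FAILED.
        return Status.FAILED;
    }
}
```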
### JAR Management
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_JAR_DOCKER_VOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
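The retention rule above (keep the newest N versions, never purge a version that is currently deployed) can be sketched as a selection function. This is not the actual `JarRetentionJob`; the class name, the use of integer version numbers, and the choice that deployed versions outside the window do not count toward the limit are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the retention selection; not the actual JarRetentionJob. */
final class JarRetentionSketch {

    /**
     * Returns the versions that would be deleted, newest first.
     * The newest retentionCount versions are always kept, and versions
     * that are currently deployed are skipped.
     */
    static List<Integer> selectForDeletion(List<Integer> versions,
                                           int retentionCount,
                                           Set<Integer> deployedVersions) {
        List<Integer> newestFirst = new ArrayList<>(versions);
        newestFirst.sort(Collections.reverseOrder());

        List<Integer> toDelete = new ArrayList<>();
        for (int i = Math.max(0, retentionCount); i < newestFirst.size(); i++) {
            Integer candidate = newestFirst.get(i);
            if (!deployedVersions.contains(candidate)) {
                toDelete.add(candidate);
            }
        }
        return toDelete;
    }
}
```

With versions 1 to 5, a retention count of 2 and version 2 deployed, the sketch keeps 5 and 4 (newest), keeps 2 (deployed), and selects 3 and 1 for deletion.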
### nginx / Reverse Proxy
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
## Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.
<!-- gitnexus:start -->
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer3-server** (5509 symbols, 13919 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer3-server/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
## Tools Quick Reference
| Tool | When to use | Command |
|------|-------------|---------|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
## Impact Risk Levels
| Depth | Meaning | Action |
|-------|---------|--------|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |
## Resources
| Resource | Use for |
|----------|---------|
| `gitnexus://repo/cameleer3-server/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer3-server/clusters` | All functional areas |
| `gitnexus://repo/cameleer3-server/processes` | All execution flows |
| `gitnexus://repo/cameleer3-server/process/{name}` | Step-by-step execution trace |
## Self-Check Before Finishing
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
<!-- gitnexus:end -->


@@ -18,10 +18,6 @@ FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=build /build/cameleer3-server-app/target/cameleer3-server-app-*.jar /app/server.jar
ENV SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/cameleer3
ENV SPRING_DATASOURCE_USERNAME=cameleer
ENV SPRING_DATASOURCE_PASSWORD=cameleer_dev
ENV OPENSEARCH_URL=http://opensearch:9200
EXPOSE 8081
ENTRYPOINT exec java -jar /app/server.jar
ENV TZ=UTC
ENTRYPOINT exec java -Duser.timezone=UTC -jar /app/server.jar

HOWTO.md

@@ -21,18 +21,17 @@ mvn clean verify # compile + run all tests (needs Docker for integrati
## Infrastructure Setup
Start PostgreSQL and OpenSearch:
Start PostgreSQL:
```bash
docker compose up -d
```
This starts TimescaleDB (PostgreSQL 16) and OpenSearch 2.19. The database schema is applied automatically via Flyway migrations on server startup.
This starts PostgreSQL 16. The database schema is applied automatically via Flyway migrations on server startup. ClickHouse tables are created by the schema initializer on startup.
| Service | Port | Purpose |
|------------|------|----------------------|
| PostgreSQL | 5432 | JDBC (Spring JDBC) |
| OpenSearch | 9200 | REST API (full-text) |
PostgreSQL credentials: `cameleer` / `cameleer_dev`, database `cameleer3`.
@@ -40,9 +39,15 @@ PostgreSQL credentials: `cameleer` / `cameleer_dev`, database `cameleer3`.
```bash
mvn clean package -DskipTests
CAMELEER_AUTH_TOKEN=my-secret-token java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/cameleer3 \
SPRING_DATASOURCE_USERNAME=cameleer \
SPRING_DATASOURCE_PASSWORD=cameleer_dev \
CAMELEER_AUTH_TOKEN=my-secret-token \
java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
```
> **Note:** The Docker image no longer includes default database credentials. When running via `docker run`, pass `-e SPRING_DATASOURCE_URL=...` etc. The docker-compose setup provides these automatically.
The server starts on **port 8081**. The `CAMELEER_AUTH_TOKEN` environment variable is **required** — the server fails fast on startup if it is not set.
For token rotation without downtime, set `CAMELEER_AUTH_TOKEN_PREVIOUS` to the old token while rolling out the new one. The server accepts both during the overlap window.
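The overlap window described above, in which both the current and the previous token are accepted, might look roughly like this. The class and method names are hypothetical, and the constant-time comparison via `MessageDigest.isEqual` is a design choice of this sketch, not something the document states about the server.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical sketch of dual-token validation during rotation. */
final class TokenRotationSketch {

    /**
     * Accepts the presented token if it matches the current token
     * (CAMELEER_AUTH_TOKEN) or, when configured, the previous one
     * (CAMELEER_AUTH_TOKEN_PREVIOUS).
     */
    static boolean isAccepted(String presented, String current, String previous) {
        if (presented == null) {
            return false;
        }
        // Evaluate both comparisons before combining them.
        boolean matchesCurrent = constantTimeEquals(presented, current);
        boolean matchesPrevious = constantTimeEquals(presented, previous);
        return matchesCurrent || matchesPrevious;
    }

    private static boolean constantTimeEquals(String presented, String expected) {
        if (expected == null || expected.isEmpty()) {
            return false; // an unset token never matches
        }
        return MessageDigest.isEqual(
                presented.getBytes(StandardCharsets.UTF_8),
                expected.getBytes(StandardCharsets.UTF_8));
    }
}
```

Once all agents use the new token, unsetting `CAMELEER_AUTH_TOKEN_PREVIOUS` ends the overlap window.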
@@ -101,12 +106,14 @@ JWTs carry a `roles` claim. Endpoints are restricted by role:
| Role | Access |
|------|--------|
| `AGENT` | Data ingestion (`/data/**` — executions, diagrams, metrics, logs), heartbeat, SSE events, command ack |
| `VIEWER` | Search, execution detail, diagrams, agent list |
| `OPERATOR` | VIEWER + send commands to agents |
| `ADMIN` | OPERATOR + user management (`/admin/**`) |
| `VIEWER` | Search, execution detail, diagrams, agent list, app config (read-only) |
| `OPERATOR` | VIEWER + send commands to agents, route control, replay, edit app config |
| `ADMIN` | OPERATOR + user management, audit log, OIDC config, database admin (`/admin/**`) |
The env-var local user gets `ADMIN` role. Agents get `AGENT` role at registration.
**UI role gating:** The sidebar hides the Admin section for non-ADMIN users, and admin routes (`/admin/*`) redirect them to `/`. The diagram node toolbar and route control bar are hidden for VIEWER. Config is a main tab: `/config` shows all apps, and `/config/:appId` filters to one app with a detail panel. Sidebar clicks stay on the config tab, and route clicks resolve to the parent app. VIEWER sees config read-only; OPERATOR and above can edit.
### OIDC Login (Optional)
OIDC configuration is stored in PostgreSQL and managed via the admin API or UI. The SPA checks if OIDC is available:
@@ -139,7 +146,7 @@ curl -s -X PUT http://localhost:8081/api/v1/admin/oidc \
-H "Authorization: Bearer $TOKEN" \
-d '{
"enabled": true,
"issuerUri": "http://authentik:9000/application/o/cameleer/",
"issuerUri": "http://logto:3001/oidc",
"clientId": "your-client-id",
"clientSecret": "your-client-secret",
"rolesClaim": "realm_access.roles",
@@ -157,28 +164,48 @@ curl -s -X DELETE http://localhost:8081/api/v1/admin/oidc \
**Initial provisioning**: OIDC can also be seeded from `CAMELEER_OIDC_*` env vars on first startup (when DB is empty). After that, the admin API takes over.
### Authentik Setup (OIDC Provider)
### Logto Setup (OIDC Provider)
Authentik is deployed alongside the Cameleer stack. After first deployment:
Logto is deployed alongside the Cameleer stack. After first deployment:
1. **Initial setup**: Open `http://192.168.50.86:30950/if/flow/initial-setup/` and create the admin account
2. **Create provider**: Admin Interface → Providers → Create → OAuth2/OpenID Provider
- Name: `Cameleer`
- Authorization flow: `default-provider-authorization-explicit-consent`
- Client type: `Confidential`
- Redirect URIs: `http://192.168.50.86:30090/callback` (or your UI URL)
Logto is proxy-aware via `TRUST_PROXY_HEADER=1`. The `LOGTO_ENDPOINT` and `LOGTO_ADMIN_ENDPOINT` secrets define the public-facing URLs that Logto uses for OIDC discovery, issuer URI, and redirect URLs. When behind a reverse proxy (e.g., Traefik), set these to the external URLs (e.g., `https://auth.cameleer.my.domain`). Logto needs its own subdomain — it cannot be path-prefixed under another app.
1. **Initial setup**: Open the Logto admin console (the `LOGTO_ADMIN_ENDPOINT` URL) and create the admin account
2. **Create SPA application**: Applications → Create → Single Page App
- Name: `Cameleer UI`
- Redirect URI: your UI URL + `/oidc/callback`
- Note the **Client ID**
3. **Create API Resource**: API Resources → Create
- Name: `Cameleer Server API`
- Indicator: your API URL (e.g., `https://cameleer.siegeln.net/api`)
- Add permissions: `server:admin`, `server:operator`, `server:viewer`
4. **Create M2M application** (for SaaS platform): Applications → Create → Machine-to-Machine
- Name: `Cameleer SaaS`
- Assign the API Resource created above with `server:admin` scope
- Note the **Client ID** and **Client Secret**
3. **Create application**: Admin Interface → Applications → Create
- Name: `Cameleer`
- Provider: select `Cameleer` (created above)
4. **Configure roles** (optional): Create groups in Authentik and map them to Cameleer roles via the `roles-claim` config. Default claim path is `realm_access.roles`. For Authentik, you may need to customize the OIDC scope to include group claims.
5. **Configure Cameleer**: Use the admin API (`PUT /api/v1/admin/oidc`) or set env vars for initial seeding:
5. **Configure Cameleer OIDC login**: Use the admin API (`PUT /api/v1/admin/oidc`) or set env vars for initial seeding:
```
CAMELEER_OIDC_ENABLED=true
CAMELEER_OIDC_ISSUER=http://authentik:9000/application/o/cameleer/
CAMELEER_OIDC_ISSUER=<LOGTO_ENDPOINT>/oidc
CAMELEER_OIDC_CLIENT_ID=<client-id-from-step-2>
CAMELEER_OIDC_CLIENT_SECRET=<client-secret-from-step-2>
CAMELEER_OIDC_CLIENT_SECRET=<not-needed-for-public-spa>
```
6. **Configure resource server** (for M2M token validation):
```
CAMELEER_OIDC_ISSUER_URI=<LOGTO_ENDPOINT>/oidc
CAMELEER_OIDC_JWK_SET_URI=http://logto:3001/oidc/jwks
CAMELEER_OIDC_AUDIENCE=<api-resource-indicator-from-step-3>
CAMELEER_OIDC_TLS_SKIP_VERIFY=true # optional — skip cert verification for self-signed CAs
```
`JWK_SET_URI` is needed when the public issuer URL isn't reachable from inside containers — it fetches JWKS directly from the internal Logto service. `TLS_SKIP_VERIFY` disables certificate verification for all OIDC HTTP calls (discovery, token exchange, JWKS); use only when the provider has a self-signed CA.
### SSO Behavior
When OIDC is configured and enabled, the UI automatically redirects to the OIDC provider for silent SSO (`prompt=none`). Users with an active provider session are signed in without seeing a login form. On first login, the provider may show a consent screen (scopes), after which subsequent logins are seamless. If auto-signup is enabled, new users are automatically provisioned with the configured default roles.
- **Bypass SSO**: Navigate to `/login?local` to see the local login form
- **Subpath deployments**: The OIDC redirect_uri respects `BASE_PATH` (e.g., `https://host/server/oidc/callback`)
- **Role sync**: System roles (ADMIN/OPERATOR/VIEWER) are synced from OIDC scopes on every login — revoking a scope in the provider takes effect on next login. Manually assigned group memberships are preserved.
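The scope-to-role sync above relies on mapping scopes such as `server:admin`, `server:operator`, and `server:viewer` to the system roles. A minimal sketch of such a mapping follows, assuming case-insensitive matching and an optional `server:` prefix; the class and method names are hypothetical and this is not the server's own normalization code.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch of scope-to-role mapping. */
final class ScopeMappingSketch {

    /** Maps an OIDC scope to a system role name, or empty if the scope is unrelated. */
    static Optional<String> roleForScope(String scope) {
        if (scope == null) {
            return Optional.empty();
        }
        String normalized = scope.trim().toLowerCase(Locale.ROOT);
        if (normalized.startsWith("server:")) {
            normalized = normalized.substring("server:".length());
        }
        switch (normalized) {
            case "admin":
                return Optional.of("ADMIN");
            case "operator":
                return Optional.of("OPERATOR");
            case "viewer":
                return Optional.of("VIEWER");
            default:
                return Optional.empty(); // e.g. "openid" or "profile"
        }
    }
}
```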
### User Management (ADMIN only)
@@ -344,10 +371,14 @@ curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/commands/{commandId}
**Agent lifecycle:** LIVE (heartbeat within 90s) → STALE (missed 3 heartbeats) → DEAD (5min after STALE). DEAD agents kept indefinitely.
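The lifecycle thresholds above can be expressed as a function of the time since the last heartbeat. This is a sketch under assumptions: the class and method names are hypothetical, the boundaries are treated as inclusive, and the real server may apply the transitions on a schedule rather than computing them on demand.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the LIVE -> STALE -> DEAD thresholds. */
final class AgentStateSketch {
    enum State { LIVE, STALE, DEAD }

    static final Duration STALE_AFTER = Duration.ofSeconds(90);      // heartbeat within 90s keeps LIVE
    static final Duration DEAD_AFTER_STALE = Duration.ofMinutes(5);  // DEAD 5 min after becoming STALE

    /** Classifies an agent by how long it has been silent at the given moment. */
    static State stateAt(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        if (silence.compareTo(STALE_AFTER) <= 0) {
            return State.LIVE;
        }
        if (silence.compareTo(STALE_AFTER.plus(DEAD_AFTER_STALE)) <= 0) {
            return State.STALE;
        }
        return State.DEAD;
    }
}
```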
**Server restart resilience:** The agent registry is in-memory and lost on server restart. Agents auto-re-register on their next heartbeat or SSE connection — the server reconstructs registry entries from JWT claims (subject, application). Route catalog uses ClickHouse execution data as fallback until agents re-register with full route IDs. Agents should also handle 404 on heartbeat by triggering a full re-registration.
**SSE events:** `config-update`, `deep-trace`, `replay`, `route-control` commands pushed in real time. Server sends ping keepalive every 15s.
**Command expiry:** Unacknowledged commands expire after 60 seconds.
**Route control responses:** Route control commands return `CommandGroupResponse` with per-agent status, response count, and timed-out agent IDs.
### Backpressure
When the write buffer is full (default capacity: 50,000), ingestion endpoints return **503 Service Unavailable**. Already-buffered data is not lost.
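The backpressure behavior above can be illustrated with a bounded queue whose non-blocking `offer` decides the HTTP status. The class name is hypothetical, the use of `ArrayBlockingQueue` is this sketch's choice, and the 202 status for accepted items is an assumption; only the 503 on a full buffer comes from the text.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a bounded write buffer with backpressure. */
final class WriteBufferSketch<T> {
    private final BlockingQueue<T> queue;

    WriteBufferSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns the HTTP status an ingestion endpoint would answer with. */
    int accept(T item) {
        // offer() never blocks: it returns false when the buffer is full,
        // and items that are already buffered stay in place.
        return queue.offer(item) ? 202 : 503; // 202 is an assumed success status
    }

    int size() {
        return queue.size();
    }
}
```

Rejecting at the edge rather than blocking keeps request threads free, and a rejected agent can retry later without affecting buffered data.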
@@ -374,6 +405,7 @@ Key settings in `cameleer3-server-app/src/main/resources/application.yml`:
| `security.ui-user` | `admin` | UI login username (`CAMELEER_UI_USER` env var) |
| `security.ui-password` | `admin` | UI login password (`CAMELEER_UI_PASSWORD` env var) |
| `security.ui-origin` | `http://localhost:5173` | CORS allowed origin for UI (`CAMELEER_UI_ORIGIN` env var) |
| `security.cors-allowed-origins` | *(empty)* | Comma-separated CORS origins (`CAMELEER_CORS_ALLOWED_ORIGINS`) — overrides `ui-origin` when set |
| `security.jwt-secret` | *(random)* | HMAC secret for JWT signing (`CAMELEER_JWT_SECRET`). If set, tokens survive restarts |
| `security.oidc.enabled` | `false` | Enable OIDC login (`CAMELEER_OIDC_ENABLED`) |
| `security.oidc.issuer-uri` | | OIDC provider issuer URL (`CAMELEER_OIDC_ISSUER`) |
@@ -381,8 +413,8 @@ Key settings in `cameleer3-server-app/src/main/resources/application.yml`:
| `security.oidc.client-secret` | | OAuth2 client secret (`CAMELEER_OIDC_CLIENT_SECRET`) |
| `security.oidc.roles-claim` | `realm_access.roles` | JSONPath to roles in OIDC id_token (`CAMELEER_OIDC_ROLES_CLAIM`) |
| `security.oidc.default-roles` | `VIEWER` | Default roles for new OIDC users (`CAMELEER_OIDC_DEFAULT_ROLES`) |
| `opensearch.log-index-prefix` | `logs-` | OpenSearch index prefix for application logs (`CAMELEER_LOG_INDEX_PREFIX`) |
| `opensearch.log-retention-days` | `7` | Days before log indices are deleted (`CAMELEER_LOG_RETENTION_DAYS`) |
| `cameleer.indexer.debounce-ms` | `2000` | Search indexer debounce delay (`CAMELEER_INDEXER_DEBOUNCE_MS`) |
| `cameleer.indexer.queue-size` | `10000` | Search indexer queue capacity (`CAMELEER_INDEXER_QUEUE_SIZE`) |
## Web UI Development
@@ -407,7 +439,7 @@ npm run generate-api # Requires backend running on :8081
## Running Tests
Integration tests use Testcontainers (starts PostgreSQL and OpenSearch automatically — requires Docker):
Integration tests use Testcontainers (starts PostgreSQL automatically — requires Docker):
```bash
# All tests
@@ -438,13 +470,15 @@ The full stack is deployed to k3s via CI/CD on push to `main`. K8s manifests are
```
cameleer namespace:
PostgreSQL (StatefulSet, 10Gi PVC) ← postgres:5432 (ClusterIP)
OpenSearch (StatefulSet, 10Gi PVC) ← opensearch:9200 (ClusterIP)
ClickHouse (StatefulSet, 10Gi PVC) ← clickhouse:8123 (ClusterIP)
cameleer3-server (Deployment) ← NodePort 30081
cameleer3-ui (Deployment, Nginx) ← NodePort 30090
Authentik Server (Deployment) ← NodePort 30950
Authentik Worker (Deployment)
Authentik PostgreSQL (StatefulSet, 1Gi) ← ClusterIP
Authentik Redis (Deployment) ← ClusterIP
cameleer-deploy-demo (Deployment) ← NodePort 30092
Logto Server (Deployment) ← NodePort 30951/30952
Logto PostgreSQL (StatefulSet, 1Gi) ← ClusterIP
cameleer-demo namespace:
(deployed Camel applications — managed by cameleer-deploy-demo)
```
### Access (from your network)
@@ -454,13 +488,15 @@ cameleer namespace:
| Web UI | `http://192.168.50.86:30090` |
| Server API | `http://192.168.50.86:30081/api/v1/health` |
| Swagger UI | `http://192.168.50.86:30081/api/v1/swagger-ui.html` |
| Authentik | `http://192.168.50.86:30950` |
| Deploy Demo | `http://192.168.50.86:30092` |
| Logto API | `LOGTO_ENDPOINT` secret (NodePort 30951 direct, or behind reverse proxy) |
| Logto Admin | `LOGTO_ADMIN_ENDPOINT` secret (NodePort 30952 direct, or behind reverse proxy) |
### CI/CD Pipeline
Push to `main` triggers: **build** (UI npm + Maven, unit tests) → **docker** (buildx amd64 for server + UI, push to Gitea registry) → **deploy** (kubectl apply + rolling update).
Required Gitea org secrets: `REGISTRY_TOKEN`, `KUBECONFIG_BASE64`, `CAMELEER_AUTH_TOKEN`, `CAMELEER_JWT_SECRET`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`, `OPENSEARCH_USER`, `OPENSEARCH_PASSWORD`, `CAMELEER_UI_USER` (optional), `CAMELEER_UI_PASSWORD` (optional), `AUTHENTIK_PG_USER`, `AUTHENTIK_PG_PASSWORD`, `AUTHENTIK_SECRET_KEY`, `CAMELEER_OIDC_ENABLED`, `CAMELEER_OIDC_ISSUER`, `CAMELEER_OIDC_CLIENT_ID`, `CAMELEER_OIDC_CLIENT_SECRET`.
Required Gitea org secrets: `REGISTRY_TOKEN`, `KUBECONFIG_BASE64`, `CAMELEER_AUTH_TOKEN`, `CAMELEER_JWT_SECRET`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CAMELEER_UI_USER` (optional), `CAMELEER_UI_PASSWORD` (optional), `LOGTO_PG_USER`, `LOGTO_PG_PASSWORD`, `LOGTO_ENDPOINT` (public-facing Logto URL, e.g., `https://auth.cameleer.my.domain`), `LOGTO_ADMIN_ENDPOINT` (admin console URL), `CAMELEER_OIDC_ISSUER_URI` (optional, for resource server M2M token validation), `CAMELEER_OIDC_AUDIENCE` (optional, API resource indicator), `CAMELEER_OIDC_TLS_SKIP_VERIFY` (optional, skip TLS cert verification for self-signed CAs).
### Manual K8s Commands
@@ -474,8 +510,8 @@ kubectl -n cameleer logs -f deploy/cameleer3-server
# View PostgreSQL logs
kubectl -n cameleer logs -f statefulset/postgres
# View OpenSearch logs
kubectl -n cameleer logs -f statefulset/opensearch
# View ClickHouse logs
kubectl -n cameleer logs -f statefulset/clickhouse
# Restart server
kubectl -n cameleer rollout restart deployment/cameleer3-server

259
UI-CONSISTENCY-AUDIT.md Normal file

@@ -0,0 +1,259 @@
> **Status: RESOLVED** — All phases (1-5) executed on 2026-04-09. Remaining: responsive design (separate initiative).
# UI Consistency Audit — cameleer3-server
**Date:** 2026-04-09
**Scope:** All files under `ui/src/` (26 CSS modules, ~45 TSX components, ~15 pages)
**Verdict:** ~55% design system adoption for interactive UI. Significant duplication and inline style debt.
---
## Executive Summary
| Dimension | Score | Key Issue |
|-----------|-------|-----------|
| Design system component adoption | 55% | 32 raw `<button>`, 12 raw `<select>`, 8 raw `<input>` should use DS |
| Color consistency | Poor | ~140 violations: 45 hardcoded hex in TSX, 13 naked hex in CSS, ~55 fallback hex in `var()` |
| Inline styles | Poor | 55 RED (static inline styles), 8 YELLOW, 14 GREEN (justified) |
| Layout consistency | Mixed | 3 different page padding values, mixed gap/margin approaches |
| CSS module duplication | 22% | ~135 of 618 classes are copy-pasted across files |
| Responsive design | None | Zero `@media` queries in entire UI |
---
## 1. Critical: Hardcoded Colors (CLAUDE.md violation)
The project rule states: *"Always use `@cameleer/design-system` CSS variables for colors — never hardcode hex values."*
### Worst offenders
| File | Violations | Severity |
|------|-----------|----------|
| `ProcessDiagram/DiagramNode.tsx` | ~20 hex values in SVG fill/stroke | Critical |
| `ExecutionDiagram/ExecutionDiagram.module.css` | 17 naked hex + ~40 hex fallbacks in `var()` | Critical |
| `ProcessDiagram/CompoundNode.tsx` | 8 hex values | Critical |
| `ProcessDiagram/DiagramEdge.tsx` | 3 hex values | High |
| `ProcessDiagram/ConfigBadge.tsx` | 3 hex values | High |
| `ProcessDiagram/ErrorSection.tsx` | 2 hex values | High |
| `ProcessDiagram/NodeToolbar.tsx` | 2 hex values | High |
| `ProcessDiagram/Minimap.tsx` | 3 hex values | High |
| `Dashboard/Dashboard.module.css` | `#5db866` (not even a DS color) | High |
| `AppsTab/AppsTab.module.css` | `var(--accent, #6c7aff)` (undefined DS variable) | Medium |
### Undefined CSS variables (not in design system)
| Variable | Files | Should be |
|----------|-------|-----------|
| `--accent` | EnvironmentSelector, AppsTab | `--amber` (or define in DS) |
| `--bg-base` | LoginPage | `--bg-body` |
| `--surface` | ContentTabs, ExchangeHeader | `--bg-surface` |
| `--bg-surface-raised` | AgentHealth | `--bg-raised` |
### Missing DS tokens needed
Several tint/background colors are used repeatedly but have no DS variable:
- `--error-bg` (used as `#FDF2F0`, `#F9E0DC`)
- `--success-bg` (used as `#F0F9F1`)
- `--amber-bg` / `--warning-bg` (used as `#FFF8F0`)
- `--bg-inverse` / `--text-inverse` (used as `#1A1612` / `#E4DFD8`)
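The hex findings above could be tracked with a small scanner. The following is a minimal sketch, not the project's actual lint rule: the names `findHexColors` and `stripHexFallbacks` are hypothetical, and the regexes are deliberately simple (an ID selector such as `#add` would be reported as a false positive).

```typescript
// Hypothetical helpers for auditing hardcoded colors; not part of the codebase.

// Matches 3-, 4-, 6- and 8-digit hex colors such as #fff or #5db866.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;

// Matches var(--token, #hex) so that the hex fallback can be dropped.
const VAR_WITH_HEX_FALLBACK = /var\(\s*(--[\w-]+)\s*,\s*#[0-9a-fA-F]{3,8}\s*\)/g;

// Returns every hex color literal found in a CSS or TSX source string.
function findHexColors(source: string): string[] {
  return source.match(HEX_COLOR) ?? [];
}

// Rewrites var(--token, #hex) to var(--token), leaving other text untouched.
function stripHexFallbacks(css: string): string {
  return css.replace(VAR_WITH_HEX_FALLBACK, "var($1)");
}
```

Stripping a fallback is only safe once the referenced variable is defined in the design system, so the undefined variables listed above would need to be renamed or defined first.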
---
## 2. Critical: CSS Module Duplication (~22%)
~135 of 618 class definitions are copy-pasted across files.
### Table section pattern — 5 files, ~35 duplicate classes
`.tableSection`, `.tableHeader`, `.tableTitle`, `.tableMeta`, `.tableRight` are **identical** in:
- `DashboardTab.module.css`
- `AuditLogPage.module.css`
- `ClickHouseAdminPage.module.css`
- `RoutesMetrics.module.css`
- `RouteDetail.module.css`
### Log viewer panel — 2 files, ~50 lines identical
`.logCard`, `.logHeader`, `.logToolbar`, `.logSearchWrap`, `.logSearchInput`, `.logSearchClear`, `.logClearFilters`, `.logEmpty`, `.sortBtn`, `.refreshBtn`, `.headerActions` — byte-for-byte identical in `AgentHealth.module.css` and `AgentInstance.module.css`.
### Tap modal form — 2 files, ~40 lines identical
`.typeSelector`, `.typeOption`, `.typeOptionActive`, `.testSection`, `.testTabs`, `.testTabBtn`, `.testTabBtnActive`, `.testBody`, `.testResult`, `.testSuccess`, `.testError` — identical in `TapConfigModal.module.css` and `RouteDetail.module.css`.
### Other duplicates
| Pattern | Files | Lines |
|---------|-------|-------|
| Rate color classes (`.rateGood/.rateWarn/.rateBad/.rateNeutral`) | DashboardTab, RouteDetail, RoutesMetrics | ~12 each |
| Refresh indicator + `@keyframes pulse` | DashboardTab, RoutesMetrics | ~15 each |
| Chart card (`.chartCard`) | AgentInstance, RouteDetail | ~6 each |
| Section card (`.section`) | AppConfigDetailPage, OidcConfigPage | ~7 each |
| Meta grid (`.metaGrid/.metaLabel/.metaValue`) | AboutMeDialog, UserManagement | ~9 each |
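A first pass at locating duplication candidates like the ones above might look as follows. This is a rough sketch under stated assumptions: `findSharedClassNames` is a hypothetical helper, it only compares class names (not declaration bodies), and it only sees selectors where the class name directly precedes the opening brace.

```typescript
// Hypothetical helper: maps each class name to the CSS module files that define it,
// keeping only names that occur in more than one file.
function findSharedClassNames(modules: Record<string, string>): Record<string, string[]> {
  // Null-prototype objects avoid collisions with names like "constructor".
  const filesByClass: Record<string, string[]> = Object.create(null);
  for (const fileName of Object.keys(modules)) {
    const selector = /\.([A-Za-z_][\w-]*)\s*\{/g;
    let match = selector.exec(modules[fileName]);
    while (match !== null) {
      const className = match[1];
      if (filesByClass[className] === undefined) filesByClass[className] = [];
      if (filesByClass[className].indexOf(fileName) === -1) filesByClass[className].push(fileName);
      match = selector.exec(modules[fileName]);
    }
  }
  const shared: Record<string, string[]> = Object.create(null);
  for (const className of Object.keys(filesByClass)) {
    if (filesByClass[className].length > 1) shared[className] = filesByClass[className];
  }
  return shared;
}
```

A shared name is only a candidate: the declarations still have to be compared before a class is moved into a shared module.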
---
## 3. High: Inline Styles (55 RED violations)
### Files with zero CSS modules (all inline)
| File | Issue |
|------|-------|
| `pages/Admin/AdminLayout.tsx` | Entire layout wrapper is inline styled |
| `pages/Admin/DatabaseAdminPage.tsx` | All layout, typography, spacing inline — no CSS module |
| `auth/OidcCallback.tsx` | Full-page layout inline — no CSS module |
### Most inline violations
| File | RED count | Primary patterns |
|------|-----------|-----------------|
| `pages/AppsTab/AppsTab.tsx` | ~25 | Fixed-width inputs (`width: 50-90px` x18), visually-hidden pattern x2, table cell layouts |
| `components/LayoutShell.tsx` | 6 | StarredList sub-component, sidebar layout |
| `pages/Admin/EnvironmentsPage.tsx` | 8 | Raw `<select>` fully styled inline, save/cancel button rows |
| `pages/Routes/RouteDetail.tsx` | 5 | Heading styles, tab panel margins |
### Repeated inline patterns that need extraction
| Pattern | Occurrences | Fix |
|---------|-------------|-----|
| `style={{ display: 'flex', justifyContent: 'center', padding: '4rem' }}` (loading fallback) | 3 files | Create shared `<PageLoader>` |
| `style={{ position: 'absolute', width: 1, height: 1, clip: 'rect(0,0,0,0)' }}` (visually hidden) | 2 in AppsTab | Create `.visuallyHidden` utility class |
| `style={{ width: N }}` on `<Input>`/`<Select>` (fixed widths) | 18+ in AppsTab | Size classes or CSS module rules |
| `style={{ marginTop: 8, display: 'flex', gap: 8, justifyContent: 'flex-end' }}` (action row) | 3+ in EnvironmentsPage | Shared `.editActions` class |
---
## 4. High: Design System Component Adoption Gaps
### Native HTML that should use DS components
| Element | Instances | Files | DS Replacement |
|---------|-----------|-------|---------------|
| `<button>` | 32 | 8 files | `Button`, `SegmentedTabs` |
| `<select>` | 12 | 4 files | `Select` |
| `<input>` | 8 | 4 files | `Input`, `Toggle`, `Checkbox` |
| `<label>` | 9 | 2 files | `FormField`, `Label` |
| `<table>` (data) | 2 | 2 files | `DataTable`, `LogViewer` |
### Highest-priority replacements
1. **`EnvironmentSelector.tsx`** — zero DS imports, entire component is a bare `<select>`. Used globally in sidebar.
2. **`ExecutionDiagram/tabs/LogTab.tsx`** — reimplements LogViewer from scratch (raw table + input + button). AgentInstance and AgentHealth already use DS `LogViewer` correctly.
3. **`AppsTab.tsx` sub-tabs** — 3 instances of homegrown `<button>` tab bars. DS provides `SegmentedTabs` and `Tabs`.
4. **`AppConfigDetailPage.tsx`** — 4x `<select>`, 4x `<label>`, 2x `<input type="checkbox">`, 4x `<button>` — all have DS equivalents already used elsewhere.
5. **`AgentHealth.tsx`** — config bar uses `Toggle` (correct) alongside raw `<select>` and `<button>` (incorrect).
### Cross-page inconsistencies
| Pattern | Correct usage | Incorrect usage |
|---------|--------------|-----------------|
| Log viewer | AgentInstance, AgentHealth use DS `LogViewer` | LogTab rebuilds from scratch |
| Config edit form | Both pages render same 4 fields | AgentHealth uses `Toggle`, AppConfigDetail uses `<input type="checkbox">` |
| Sub-tabs | RbacPage uses DS `Tabs` | AppsTab uses homegrown `<button>` tabs with non-DS `--accent` color |
| Select dropdowns | AppsTab uses DS `Select` for some fields | Same file uses raw `<select>` for other fields |
---
## 5. Medium: Layout Inconsistencies
### Page padding (3 different values)
| Pages | Padding |
|-------|---------|
| AgentHealth, AgentInstance, AdminLayout | `20px 24px 40px` |
| AppsTab | `16px` (all sides) |
| DashboardTab, Dashboard | No padding (full-bleed) |
### Section gap spacing (mixed approaches)
| Approach | Pages |
|----------|-------|
| CSS `gap: 20px` on flex container | DashboardTab, RoutesMetrics |
| `margin-bottom: 20px` | AgentInstance |
| Mixed `margin-bottom: 16px` and `20px` on same page | AgentHealth, ClickHouseAdminPage |
### Typography inconsistencies
| Issue | Details |
|-------|---------|
| Card title weight | Most use `font-weight: 600`, RouteDetail `.paneTitle` uses `700` |
| Chart title style | RouteDetail: `12px/700/uppercase`, AgentHealth: `12px/600/uppercase` |
| Font units | ExchangeHeader + TabKpis use `rem`, everything else uses `px` |
| Raw headings | DatabaseAdminPage uses `<h2>`/`<h3>` with inline styles; all others use DS `SectionHeader` or CSS classes |
| Table header padding | Most: `12px 16px`, Dashboard: `8px 12px`, AgentHealth eventCard: `10px 16px` |
### Stat strip layouts
| Page | Layout | Gap |
|------|--------|-----|
| AgentHealth, AgentInstance, RbacPage | CSS grid `repeat(N, 1fr)` | `10px` |
| ClickHouseAdminPage | Flexbox (unequal widths) | `10px` |
| DatabaseAdminPage | Inline flex | `1rem` (16px) |
### Empty state patterns (5 different approaches)
1. DS `<EmptyState>` component (AgentInstance — correct)
2. `EntityList emptyMessage` prop (EnvironmentsPage, RbacPage)
3. `.logEmpty` CSS class, `12px`, `var(--text-faint)` (AgentHealth, AgentInstance)
4. `.emptyNote` CSS class, `12px`, `italic` (AppsTab)
5. Inline `0.875rem`, `var(--text-muted)` (ExchangesPage)
### Loading state patterns (3 different approaches)
1. `<Spinner size="lg">` in flex div with inline `padding: 4rem` — copy-pasted 3 times
2. `<Spinner size="md">` returned directly, no centering (EnvironmentsPage)
3. No loading UI, data simply absent (DashboardL1/L2/L3)
---
## 6. Low: Other Findings
- **`!important`**: 1 use in `RouteControlBar.module.css` — works around specificity conflict
- **Zero responsive design**: no `@media` queries anywhere
- **Z-index**: only 4 uses, all in diagram components (5 and 10), consistent
- **Naming convention**: all camelCase — consistent, no issues
- **Unused CSS classes**: ~11 likely unused in AppsTab (old create-modal classes) and TapConfigModal
---
## Recommended Fix Order
### Phase 1: Design system tokens (unblocks everything else)
1. Add missing DS variables: `--error-bg`, `--success-bg`, `--amber-bg`, `--bg-inverse`, `--text-inverse`
2. Fix undefined variables: `--accent` -> `--amber`, `--bg-base` -> `--bg-body`, `--surface` -> `--bg-surface`
### Phase 2: Eliminate CSS duplication (~22% of all classes)
3. Extract shared `tableSection` pattern to shared CSS module (saves ~140 duplicate lines across 5 files)
4. Extract shared log viewer CSS to shared module (saves ~50 lines across 2 files)
5. Remove duplicate tap modal CSS from RouteDetail (saves ~40 lines)
6. Extract shared rate/refresh/chart patterns
### Phase 3: Fix hardcoded colors
7. Replace all hex in `ProcessDiagram/*.tsx` SVG components (~45 values)
8. Replace all hex in `ExecutionDiagram.module.css` (~17 naked + strip ~40 fallbacks)
9. Fix remaining CSS hex violations (Dashboard, AppsTab, AgentHealth)
### Phase 4: Replace native HTML with DS components
10. `EnvironmentSelector` -> DS `Select`
11. `LogTab` -> DS `LogViewer`
12. `AppsTab` sub-tabs -> DS `SegmentedTabs`
13. `AppConfigDetailPage` form elements -> DS `Select`/`Toggle`/`FormField`/`Button`
14. Remaining `<button>` -> DS `Button`
### Phase 5: Eliminate inline styles
15. Create CSS modules for AdminLayout, DatabaseAdminPage, OidcCallback
16. Extract shared `<PageLoader>` component
17. Move AppsTab fixed-width inputs to CSS module size classes
18. Move remaining inline margins/flex patterns to CSS classes
### Phase 6: Standardize layout patterns
19. Unify page padding to `20px 24px 40px`
20. Standardize section gaps to `gap: 20px` on flex containers
21. Normalize font units to `px` throughout
22. Standardize empty state to DS `<EmptyState>`
23. Standardize loading state to shared `<PageLoader>`


@@ -0,0 +1,303 @@
# Cameleer3 Admin UI UX Audit
**Date:** 2026-04-09
**Auditor:** Claude (automated)
**URL:** https://desktop-fb5vgj9.siegeln.internal/
**Login:** admin/admin (OIDC-authenticated)
---
## Executive Summary
The Cameleer3 UI is generally well-built with consistent styling, good information density, and a clear layout. However, there are several **Critical** bugs that prevent core CRUD operations from working, and a few **Important** UX issues that reduce clarity and usability.
**Critical issues:** 3
**Important issues:** 7
**Nice-to-have improvements:** 8
---
## 1. Users & Roles (`/server/admin/rbac`)
### What Works Well
- Clean master-detail layout: user list on the left, detail panel on the right
- Summary cards at top (Users: 2, Groups: 1, Roles: 4) provide quick overview
- Tab structure (Users / Groups / Roles) is intuitive
- User detail shows all relevant info: status, ID, created date, provider, password, group membership, effective roles
- Inline role/group management with "+ Add" dropdown and "x" remove buttons
- Search bar for filtering users/groups/roles
- Delete button correctly disabled for the admin user (last-admin guard)
- Group detail shows Top-level, children count, member count, and assigned roles
- Local/OIDC toggle on the user creation form
### Issues Found
#### CRITICAL: User creation fails silently in OIDC mode
- **Location:** "+ Add user" button and create user form
- **Details:** When OIDC is enabled, the backend returns HTTP 400 with an **empty response body** when attempting to create a local user. The UI shows a generic "Failed to create user" toast with no explanation.
- **Root Cause:** `UserAdminController.createUser()` lines 92-93 return `ResponseEntity.badRequest().build()` (no body) when `oidcEnabled` is true.
- **Impact:** The UI still shows the "+ Add user" button and the full creation form even though the operation will always fail. Users fill out the form, click Create, and get a useless error.
- **Fix:** Either (a) hide the "+ Add user" button when OIDC is enabled, or (b) show a clear inline message like "Local user creation is disabled when OIDC is enabled", or (c) return a proper error body from the API.
- **Screenshots:** `09-user-create-filled.png`, `10-user-create-result.png`
#### IMPORTANT: Unicode escape shown literally in role descriptions
- **Location:** Roles tab, role description text
- **Details:** Role descriptions display `\u00b7` literally instead of rendering the middle dot character (·).
- **Example:** "Full administrative access \u00b7 0 assignments" should be "Full administrative access · 0 assignments"
- **Screenshot:** `14-roles-tab.png`
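The audit does not identify where the literal escape originates, so the cleanest fix is probably to correct the string at its source. As a fallback, a display-side decoder could look like this sketch; `decodeUnicodeEscapes` is a hypothetical name and the function only handles four-digit `\uXXXX` sequences.

```typescript
// Hypothetical helper: turns literal "\uXXXX" sequences into the characters they name.
function decodeUnicodeEscapes(text: string): string {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)),
  );
}
```

Text without escape sequences passes through unchanged, so the helper would be safe to apply to every role description.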
#### IMPORTANT: No "Confirm password" field in user creation
- **Location:** "+ Add user" form
- **Details:** The form has Username*, Display name, Email, Password* but no password confirmation field. This increases the risk of typos in passwords.
#### NICE-TO-HAVE: Create button disabled until valid with no inline validation messages
- **Location:** User creation form
- **Details:** The "Create" button is disabled until form is valid, but there are no visible inline error messages explaining what is required. The asterisks on "Username *" and "Password *" help, but there's no indication of password policy requirements (min 12 chars, 3-of-4 character classes).
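Inline validation for the policy mentioned above could be driven by a check like the following sketch. The four character classes (lowercase, uppercase, digit, other) are an assumption, since the audit only states "3-of-4 character classes"; `checkPasswordPolicy` and the message wording are hypothetical.

```typescript
interface PasswordCheck {
  valid: boolean;
  problems: string[];
}

// Hypothetical validator: minimum 12 characters and at least 3 of 4 character classes.
function checkPasswordPolicy(password: string): PasswordCheck {
  const problems: string[] = [];
  if (password.length < 12) {
    problems.push("At least 12 characters required");
  }
  // Assumed classes: lowercase, uppercase, digits, anything else.
  const classes = [/[a-z]/, /[A-Z]/, /[0-9]/, /[^a-zA-Z0-9]/];
  const matched = classes.filter((re) => re.test(password)).length;
  if (matched < 3) {
    problems.push(`Use at least 3 of 4 character classes (found ${matched})`);
  }
  return { valid: problems.length === 0, problems };
}
```

The `problems` array can be rendered under the password field, so the disabled "Create" button is always accompanied by a reason.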
#### NICE-TO-HAVE: "Select a user to view details" placeholder
- **Location:** Right panel when no user selected
- **Details:** The placeholder text is fine but could be more visually styled (e.g., centered, with an icon).
---
## 2. Audit Log (`/server/admin/audit`)
### What Works Well
- Comprehensive filter system: date range (1h/6h/Today/24h/7d/Custom), user filter, category dropdown, action/target search
- Category dropdown includes all relevant categories: INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT
- Custom date range with From/To date pickers
- Table columns: Timestamp, User, Category, Action, Target, Result
- Color-coded result badges (SUCCESS in green, FAILURE in red)
- Failed user creation attempts made during this audit are correctly logged as FAILURE
- Row count indicator ("179 events") with AUTO/MANUAL refresh
- Pagination with configurable rows per page
### Issues Found
#### IMPORTANT: No export functionality
- **Location:** Audit log page
- **Details:** There is no Export/Download button for audit log data. Compliance requirements typically mandate the ability to export audit logs as CSV or JSON.
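A client-side CSV export could be sketched as below. The `AuditEvent` field names are assumptions derived from the table columns listed above (Timestamp, User, Category, Action, Target, Result); the real API model may differ.

```typescript
// Assumed shape, mirroring the audit table columns; not the actual API type.
interface AuditEvent {
  timestamp: string;
  user: string;
  category: string;
  action: string;
  target: string;
  result: string;
}

const AUDIT_COLUMNS: (keyof AuditEvent)[] = [
  "timestamp", "user", "category", "action", "target", "result",
];

// Quotes a field when it contains a comma, quote, or line break (RFC 4180 style).
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serializes events to CSV with a header row and CRLF line endings.
function auditEventsToCsv(events: AuditEvent[]): string {
  const header = AUDIT_COLUMNS.join(",");
  const rows = events.map((event) =>
    AUDIT_COLUMNS.map((column) => csvEscape(event[column])).join(","),
  );
  return [header, ...rows].join("\r\n");
}
```

For large result sets a server-side export endpoint would likely be preferable, since the UI only holds the current page of events.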
#### NICE-TO-HAVE: Audit detail row expansion
- **Location:** Audit log table rows
- **Details:** Rows are clickable (cursor: pointer), but clicking does not reveal additional details. For entries like "HTTP POST /api/v1/admin/users FAILURE", it would be helpful to see the error response body or request details in an expanded row.
#### NICE-TO-HAVE: Date range filter is independent of the global time selector
- **Location:** Top bar time selector vs. audit log's own time filter
- **Details:** The audit log has its own "Last 1h / 6h / Today / 24h / 7d / Custom" filter, which is separate from the global time range in the header bar. While this provides independence, it could confuse users who expect the global time selector to affect the audit log.
---
## 3. OIDC Config (`/server/admin/oidc`)
### What Works Well
- Well-organized sections: Behavior, Provider Settings, Claim Mapping, Default Roles, Danger Zone
- Each field has a descriptive label and help text (e.g., "RFC 8707 resource indicator sent in the authorization request")
- "Test Connection" button at the top for verification
- "Save" button is clearly visible
- **Excellent** delete protection: "Confirm Deletion" dialog requires typing "delete oidc" to confirm, warns that "All users signed in via OIDC will lose access"
- Enabled/Auto Sign-Up checkboxes with clear descriptions
- Default Roles management with add/remove
### Issues Found
#### IMPORTANT: No unsaved changes indicator
- **Location:** Form fields
- **Details:** If a user modifies a field but navigates away without saving, there is no "You have unsaved changes" warning. This is particularly dangerous for the OIDC configuration since changes could lock users out.
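An unsaved-changes warning needs a dirty check first. The sketch below compares the last saved form state with the current one; `FormSnapshot`, `hasUnsavedChanges`, and the field names in the usage note are hypothetical, not the page's actual state shape.

```typescript
// Hypothetical form state: flat map of field name to value.
type FormSnapshot = Record<string, string | boolean | string[]>;

// Returns true when any field differs between the saved and the current state.
function hasUnsavedChanges(saved: FormSnapshot, current: FormSnapshot): boolean {
  const keys = Object.keys(saved).concat(Object.keys(current));
  for (const key of keys) {
    // JSON.stringify gives a cheap structural comparison for strings, booleans and arrays.
    if (JSON.stringify(saved[key]) !== JSON.stringify(current[key])) {
      return true;
    }
  }
  return false;
}
```

For example, `hasUnsavedChanges({ clientId: "a" }, { clientId: "b" })` returns `true`. The result could gate a confirmation dialog on in-app navigation and a `beforeunload` handler for tab closes.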
#### NICE-TO-HAVE: Client Secret field is plain text
- **Location:** Client Secret textbox
- **Details:** The Client Secret is a regular text input, not a password/masked field. Since it's sensitive, it should be masked by default with a "show/hide" toggle.
---
## 4. Environments (`/server/admin/environments`)
### What Works Well
- Clean list with search and "+ Add environment" button
- Master-detail layout consistent with Users & Roles
- Environment detail shows: ID, Tier badge (NON-PROD), slug, created date
- Sub-tabs for "Production environment" and "Docker Containers"
- Default Resource Limits section with configurable values
- JAR Retention section with "Edit Policy" button
- "Edit Defaults" button for container defaults
### Issues Found
#### NICE-TO-HAVE: Slug is shown but not labeled clearly
- **Location:** Environment detail panel
- **Details:** The slug "default" appears below the display name "Default" but could benefit from a "Slug:" label for clarity.
---
## 5. Database (`/server/admin/database`)
### What Works Well
- Clear "Connected" status at the top with green styling
- Shows PostgreSQL version string: "PostgreSQL 16.13 on x86_64-pc-linux-musl, compiled by gcc (Alpine 15.2.0) 15.2.0, 64-bit"
- Connection Pool section with Active/Idle/Max counts
- Tables section listing all database tables with rows and sizes
- Consistent styling with the rest of the admin section
### Issues Found
No significant issues found. The page is read-only and informational, which is appropriate.
---
## 6. ClickHouse (`/server/admin/clickhouse`)
### What Works Well
- Clear "Connected" status with version number (26.3.5.12)
- Uptime display: "1 hour, 44 minutes and 29 seconds"
- Key metrics: Disk Usage (156.33 MiB), Memory (1.47 GiB), Compression Ratio (0.104x), Rows (4,875,598), Parts (55), Uncompressed Size (424.02 MiB)
- Tables section listing all ClickHouse tables with engine, rows, and sizes
- Consistent card-based layout
### Issues Found
No significant issues found. Well-presented status page.
---
## 7. Deployments Tab (`/server/apps`)
### What Works Well
- Table layout showing app name, environment, status, and created date
- "+ Create App" button clearly visible
- Clicking an app navigates to a detail page with Configuration and Overrides tabs
- Configuration has sub-tabs: Monitoring, Variables, Traces & Taps, Route Recording
- App detail shows environment (DEFAULT), tier (ORACLE), status
- "Create App" full page form with clear Identity & Security, Configuration sections
### Issues Found
#### CRITICAL: Direct URL /server/deployments returns 404 error
- **Location:** `/server/deployments` URL
- **Details:** Navigating directly to `/server/deployments` shows "Unexpected Application Error! 404 Not Found" with a React Router development error ("Hey developer -- You can provide a way better UX than this..."). The Deployments tab is actually at `/server/apps`.
- **Impact:** Users who bookmark or share the URL will see an unhandled error page instead of a redirect to the correct URL.
- **Screenshot:** `20-deployments-tab.png` (first attempt)
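One way to handle the stale URL is a small legacy-path table consulted before routing. This is only a sketch: `LEGACY_REDIRECTS` and `resolveLegacyPath` are hypothetical, and in React Router the same effect is usually achieved with a route that renders `<Navigate>` plus an `errorElement` for anything still unmatched.

```typescript
// Hypothetical table of retired paths and their current equivalents.
const LEGACY_REDIRECTS: Record<string, string> = {
  "/server/deployments": "/server/apps",
};

// Returns the current path for a retired one, or the input unchanged.
function resolveLegacyPath(pathname: string): string {
  // Ignore trailing slashes so "/server/deployments/" also redirects.
  const normalized = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  const target = LEGACY_REDIRECTS[normalized];
  return typeof target === "string" ? target : pathname;
}
```

Keeping the mapping in one table makes it easy to add further retired paths if more tabs are renamed.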
#### IMPORTANT: Create App page shows full configuration before app exists
- **Location:** `/server/apps/new`
- **Details:** The Create Application page shows Monitoring configuration, Variables, Traces & Taps, and Route Recording sub-tabs with values already populated. This is overwhelming for initial creation -- a simpler wizard-style flow (name + environment first, then configure) would be more intuitive.
#### NICE-TO-HAVE: App deletion flow not easily discoverable
- **Location:** App detail page
- **Details:** There is no visible "Delete App" button on the app detail page. The deletion mechanism is not apparent.
---
## 8. SaaS Platform Pages
### Platform Dashboard (`/platform`)
#### What Works Well
- Clean tenant overview: "Example Tenant" with LOW tier badge
- Three summary cards: Tier (LOW), Status (ACTIVE), License (Active, expires 8.4.2027)
- Tenant Information section with Slug, Status, Created date
- Server Management section with "Open Server Dashboard" button
- Sidebar navigation: Dashboard, License, Open Server Dashboard
#### Issues Found
##### IMPORTANT: "Slug" label missing space
- **Location:** Tenant Information section
- **Details:** Shows "Slugdefault" instead of "Slug: default" -- the label and value run together without separation.
##### NICE-TO-HAVE: "Open Server Dashboard" button appears 3 times
- **Location:** Page header, Server Management section, sidebar bottom
- **Details:** The same action appears in three places on a single page view. One prominent button would suffice.
### Platform License (`/platform/license`)
#### What Works Well
- Clear Validity section: Issued, Expires, Days remaining (365 days badge)
- Features section with Enabled/Disabled badges for each feature
- Limits section: Max Agents, Retention Days, Max Environments
- License Token section with "Show token" button for security
#### Issues Found
##### IMPORTANT: Labels and values lack spacing
- **Location:** Validity section, Limits section
- **Details:** "Issued8. April 2026" and "Max Agents3" -- labels and values run together without separators. Should be "Issued: 8. April 2026" and "Max Agents: 3".
- **Screenshot:** `02-platform-license.png`
---
## 9. Cross-Cutting UX Issues
### CRITICAL: Sporadic auto-navigation to /server/exchanges
- **Location:** Occurs across all admin pages
- **Details:** While interacting with admin pages (Users & Roles, Environments, etc.), the browser occasionally auto-navigates back to `/server/exchanges`. This appears to be triggered by the real-time exchange data stream (SSE). Even when auto-refresh is set to MANUAL, the exchange list continues updating and can cause route changes.
- **Impact:** Users actively editing admin forms can lose their work mid-interaction. This was observed repeatedly during the audit.
- **Root Cause:** Likely a React state update from the SSE exchange stream that triggers a route navigation when the exchange list data changes.
### IMPORTANT: Error toast messages lack detail
- **Location:** Global toast system
- **Details:** Error toasts show generic messages like "Failed to create user" without the specific API error reason. The server returns empty 400 bodies in some cases, and even when it returns error details, they may not be surfaced in the toast.
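On the UI side, the toast text could be built from whatever the server returns. The sketch below assumes error bodies, when present, are either plain text or JSON with a `message` field; that field name and the helper `describeApiError` are assumptions, not the actual API contract.

```typescript
// Hypothetical helper: combines a generic toast message with details from the response.
function describeApiError(fallback: string, httpStatus: number, bodyText: string): string {
  const trimmed = bodyText.trim();
  if (trimmed.length === 0) {
    // Empty body (as with the 400 described above): at least surface the status code.
    return `${fallback} (HTTP ${httpStatus})`;
  }
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null) {
      const message = (parsed as Record<string, unknown>)["message"];
      if (typeof message === "string" && message.length > 0) {
        return `${fallback}: ${message}`;
      }
    }
  } catch {
    // Body was not JSON; fall through and show it as plain text.
  }
  return `${fallback}: ${trimmed}`;
}
```

With an empty 400 response this yields "Failed to create user (HTTP 400)", which is still terse but tells the user the request was rejected rather than lost.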
### NICE-TO-HAVE: Global time range selector persists on admin pages
- **Location:** Top header bar on admin pages (Audit Log, ClickHouse, Database, OIDC, etc.)
- **Details:** The global time range selector (1h/3h/6h/Today/24h/7d) and the status filter buttons (OK/Warn/Error/Running) appear on every page including admin pages where they are not relevant. This adds visual clutter.
### NICE-TO-HAVE: Environment dropdown in header on admin pages
- **Location:** Top header bar, "All Envs" dropdown
- **Details:** The environment selector appears on admin pages where it has no effect (e.g., Users & Roles, OIDC config). It should be hidden or grayed out on pages where it's not applicable.
---
## Summary Table
| # | Severity | Page | Issue |
|---|----------|------|-------|
| 1 | **CRITICAL** | Users & Roles | User creation fails silently in OIDC mode -- form shown but always returns 400 with empty body |
| 2 | **CRITICAL** | Deployments | Direct URL `/server/deployments` returns unhandled 404 error page |
| 3 | **CRITICAL** | Cross-cutting | Sporadic auto-navigation to `/server/exchanges` interrupts admin page interactions |
| 4 | **IMPORTANT** | Users & Roles | Unicode escape `\u00b7` shown literally in role descriptions |
| 5 | **IMPORTANT** | Users & Roles | No password confirmation field in user creation form |
| 6 | **IMPORTANT** | Audit Log | No export/download functionality for compliance |
| 7 | **IMPORTANT** | OIDC | No unsaved changes warning on form navigation |
| 8 | **IMPORTANT** | Deployments | Create App page shows all config options before app exists (overwhelming) |
| 9 | **IMPORTANT** | Platform Dashboard | Label-value spacing missing ("Slugdefault", "Issued8. April 2026", "Max Agents3") |
| 10 | **IMPORTANT** | Cross-cutting | Error toasts lack specific error details from API responses |
| 11 | Nice-to-have | Users & Roles | No inline validation messages on creation form (just disabled button) |
| 12 | Nice-to-have | Users & Roles | "Select a user to view details" placeholder could be more visual |
| 13 | Nice-to-have | Audit Log | Clickable rows don't expand to show additional event detail |
| 14 | Nice-to-have | Audit Log | Separate time filter from global time selector could confuse users |
| 15 | Nice-to-have | OIDC | Client Secret field should be masked by default |
| 16 | Nice-to-have | Environments | Slug display could use explicit label |
| 17 | Nice-to-have | Deployments | Delete app flow not easily discoverable |
| 18 | Nice-to-have | Cross-cutting | Global time range and status filter buttons shown on irrelevant admin pages |
---
## Screenshots Index
| File | Description |
|------|-------------|
| `01-platform-dashboard.png` | SaaS Platform dashboard |
| `02-platform-license.png` | License page with features and limits |
| `03-server-exchanges-overview.png` | Server exchanges main view |
| `05-users-roles-page.png` | Users & Roles list view |
| `06-user-detail-admin.png` | Admin user detail panel |
| `07-add-user-dialog.png` | Add user form (showing along with detail) |
| `09-user-create-filled.png` | User creation form filled out |
| `10-user-create-result.png` | Error toast after failed user creation |
| `11-rbac-after-create.png` | RBAC page after failed creation (still 2 users) |
| `13-groups-tab.png` | Groups tab with Admins group |
| `14-roles-tab.png` | Roles tab showing unicode escape bug |
| `15-audit-log.png` | Audit log with failed user creation events |
| `16-clickhouse.png` | ClickHouse status page |
| `17-database.png` | Database status page |
| `18-environments.png` | Environments list |
| `19-oidc.png` | OIDC configuration page |
| `19-oidc-full.png` | OIDC full page (scrolled) |
| `20-deployments-tab.png` | Deployments tab (via tab click) |
| `21-environment-detail.png` | Default environment detail |
| `22-create-app.png` | Create Application form |
| `23-app-detail.png` | Sample app detail page |
| `24-runtime-tab.png` | Runtime tab with agents |
| `25-dashboard-tab.png` | Dashboard with metrics and charts |
| `26-oidc-delete-confirm.png` | OIDC delete confirmation dialog (well done) |


@@ -0,0 +1,354 @@
# Design Consistency Audit — Cameleer3 UI
**Audited**: 2026-04-09
**Scope**: All pages under `ui/src/pages/`
**Base path**: `C:/Users/Hendrik/Documents/projects/cameleer3-server/ui/src/`
## Shared Layout Infrastructure
### LayoutShell (`components/LayoutShell.tsx`)
All pages render inside `<main className={css.mainContent}>` which applies:
```css
.mainContent {
flex: 1;
display: flex;
flex-direction: column;
overflow: hidden;
min-height: 0;
}
```
This is a flex column container with **no padding/margin**. Each page is responsible for its own content spacing.
### Shared CSS Modules (`styles/`)
| Module | Class | Pattern |
|--------|-------|---------|
| `section-card.module.css` | `.section` | Card with `padding: 16px 20px`, border, shadow, `margin-bottom: 16px` |
| `table-section.module.css` | `.tableSection` | Card wrapper for tables, no padding (overflow hidden), with `.tableHeader` (12px 16px padding) |
| `chart-card.module.css` | `.chartCard` | Card with `padding: 16px` |
| `log-panel.module.css` | `.logCard` | Card for log viewers, max-height 420px |
| `refresh-indicator.module.css` | `.refreshIndicator` | Auto-refresh dot indicator |
| `rate-colors.module.css` | `.rateGood/.rateWarn/.rateBad` | Semantic color helpers |
## Per-Page Findings
---
### 1. Exchanges Page (`pages/Exchanges/`)
**Files**: `ExchangesPage.tsx`, `ExchangesPage.module.css`, `ExchangeHeader.tsx`, `ExchangeHeader.module.css`, `RouteControlBar.tsx`, `RouteControlBar.module.css`
**Container pattern**: NO wrapper padding. Uses `height: 100%` split-view layout that fills the entire `mainContent` area.
**Content wrapper**:
```css
.splitView { display: flex; height: 100%; overflow: hidden; }
```
**Table**: The exchange list is rendered by `Dashboard.tsx` (in `pages/Dashboard/`), which uses:
```css
.content { display: flex; flex-direction: column; flex: 1; min-height: 0; overflow: hidden; background: var(--bg-body); }
```
- Custom `.tableHeader` with `padding: 8px 12px` (slightly tighter than shared `tableStyles.tableHeader` which uses `12px 16px`)
- `DataTable` rendered with `flush` and `fillHeight` props
- **NO card wrapper** around the table — it's full-bleed against the background
- **Does NOT import shared `table-section.module.css`** — rolls its own `.tableHeader`, `.tableTitle`, `.tableRight`, `.tableMeta`
**Shared modules used**: NONE. All custom.
**INCONSISTENCY**: Full-bleed table with no card, no container padding. Custom table header styling duplicates shared module patterns with slightly different padding values (8px 12px vs 12px 16px).
---
### 2. Dashboard Tab (`pages/DashboardTab/`)
**Files**: `DashboardPage.tsx`, `DashboardL1.tsx`, `DashboardL2.tsx`, `DashboardL3.tsx`, `DashboardTab.module.css`
**Container pattern**:
```css
.content { display: flex; flex-direction: column; gap: 20px; flex: 1; min-height: 0; overflow-y: auto; padding-bottom: 20px; }
```
- **No top/left/right padding** — content is full-width inside `mainContent`
- Only `padding-bottom: 20px` and `gap: 20px` between sections
**Tables**: Wrapped in shared `tableStyles.tableSection` (card with border, shadow, border-radius). Imports `table-section.module.css`.
**Charts**: Wrapped in design-system `<Card>` component.
**Custom sections**: `errorsSection` and `diagramSection` duplicate the card pattern:
```css
.errorsSection {
  background: var(--bg-surface);
  border: 1px solid var(--border-subtle);
  border-radius: var(--radius-lg);
  box-shadow: var(--shadow-card);
  overflow: hidden;
}
```
This is identical to `tableStyles.tableSection` but defined separately in `DashboardTab.module.css`.
**Shared modules used**: `table-section.module.css`, `refresh-indicator.module.css`, `rate-colors.module.css`
**INCONSISTENCY**: No container padding means KPI strip and tables sit flush against the sidebar/edge. The `.errorsSection` duplicates `tableStyles.tableSection` exactly — should import the shared module instead of copy-pasting.
---
### 3. Runtime Tab — Agent Health (`pages/AgentHealth/`)
**Files**: `AgentHealth.tsx`, `AgentHealth.module.css`
**Container pattern**:
```css
.content { flex: 1; overflow-y: auto; padding: 20px 24px 40px; min-width: 0; background: var(--bg-body); }
```
- **Has explicit padding**: `20px 24px 40px` (top, sides, bottom)
**Tables**: Uses design-system `DataTable` inside a DS `Card` component for agent group cards. The group cards use custom `.groupGrid` grid layout. No `tableStyles.tableSection` wrapper.
**Cards/sections**: Custom card patterns like `.configBar`, `.eventCard`:
```css
.configBar {
  background: var(--bg-surface);
  border: 1px solid var(--border-subtle);
  border-radius: var(--radius-lg);
  box-shadow: var(--shadow-card);
  padding: 12px 16px;
  margin-bottom: 16px;
}
```
**Shared modules used**: `log-panel.module.css`
**INCONSISTENCY**: Uses `padding: 20px 24px 40px` — different from DashboardTab (no padding) and Exchanges (no padding). Custom card patterns duplicate the standard card styling. Does not use `table-section.module.css` or `section-card.module.css`.
---
### 4. Runtime Tab — Agent Instance (`pages/AgentInstance/`)
**Files**: `AgentInstance.tsx`, `AgentInstance.module.css`
**Container pattern**:
```css
.content { flex: 1; overflow-y: auto; padding: 20px 24px 40px; min-width: 0; background: var(--bg-body); }
```
- Matches AgentHealth padding exactly (consistent within Runtime tab)
**Cards/sections**: Custom `.processCard`, `.timelineCard` duplicate the card pattern. Uses `chart-card.module.css` for chart wrappers.
**Shared modules used**: `log-panel.module.css`, `chart-card.module.css`
**INCONSISTENCY**: Consistent with AgentHealth but inconsistent with DashboardTab and Exchanges. Custom card patterns (processCard, timelineCard) duplicate shared module patterns.
---
### 5. Apps Tab (`pages/AppsTab/`)
**Files**: `AppsTab.tsx`, `AppsTab.module.css`
**Container pattern**:
```css
.container { padding: 16px; overflow-y: auto; flex: 1; }
```
- **Has padding**: `16px` all around
**Content structure**: Three sub-views (`AppListView`, `AppDetailView`, `CreateAppView`) all wrapped in `.container`.
**Tables**: App list uses `DataTable` directly — no `tableStyles.tableSection` wrapper. Deployment table uses custom `.table` with manual `<table>` HTML (not DataTable).
**Form controls**: Directly on page background with custom grid layout (`.configGrid`). Uses `SectionHeader` from design-system for visual grouping, but forms are not in cards/sections — they sit flat against the `.container` background.
**Custom elements**:
- `.editBanner` / `.editBannerActive` — custom banner pattern
- `.configGrid` — 2-column label/input grid
- `.table` — fully custom `<table>` styling (not DataTable)
**Shared modules used**: NONE. All custom.
**INCONSISTENCY (user-reported)**: The report that controls are "meshed into background" is accurate. Form controls use `SectionHeader` for labels but no `section-card` wrapper. The Tabs component provides visual grouping but the content below tabs is flat. Config grids, toggles, and inputs sit directly on `var(--bg-body)` background via the 16px-padded container. No card/section separation between different config groups. Also uses a manual `<table>` element instead of DataTable for deployments.
---
### 6. Admin — RBAC Page (`pages/Admin/RbacPage.tsx`, `UsersTab.tsx`, `GroupsTab.tsx`, `RolesTab.tsx`)
**Container pattern**: AdminLayout provides `padding: 20px 24px 40px`. RbacPage renders a bare `<div>` (no extra wrapper class).
**Content**: Uses `StatCard` strip, `Tabs`, then tab content. Detail views use `SplitPane` (from design-system). User/Group/Role detail sections use `SectionHeader` without card wrappers.
**Stat strip**: Custom grid — `grid-template-columns: repeat(3, 1fr)` with `gap: 10px; margin-bottom: 16px`
**Shared modules used**: NONE. Uses `UserManagement.module.css` (custom).
**INCONSISTENCY**: Detail sections use `SectionHeader` labels but content is flat (no `section-card` wrapper). Similar to AppsTab pattern.
---
### 7. Admin — Audit Log (`pages/Admin/AuditLogPage.tsx`)
**Container pattern**: Inherits AdminLayout padding (`20px 24px 40px`). Renders a bare `<div>`.
**Table**: Properly uses shared `tableStyles.tableSection` with `.tableHeader`, `.tableTitle`, `.tableRight`, `.tableMeta`.
**Shared modules used**: `table-section.module.css`
**STATUS**: CONSISTENT with shared patterns for the table section. Good.
---
### 8. Admin — OIDC Config (`pages/Admin/OidcConfigPage.tsx`)
**Container pattern**: Inherits AdminLayout padding. Adds `.page { max-width: 640px; margin: 0 auto; }` — centered narrow layout.
**Sections**: Uses shared `sectionStyles.section` from `section-card.module.css` for every form group. Uses `SectionHeader` inside each section card.
**Shared modules used**: `section-card.module.css`
**STATUS**: GOOD. This is the correct pattern — form groups wrapped in section cards. Should be the model for other form pages.
---
### 9. Admin — Database (`pages/Admin/DatabaseAdminPage.tsx`)
**Container pattern**: Inherits AdminLayout padding. Renders bare `<div>`.
**Tables**: Uses `DataTable` directly with NO `tableStyles.tableSection` wrapper. Tables under custom `.section` divs with `.sectionHeading` text labels.
**Cards**: Uses DS `<Card>` for connection pool. Stat strip is a flex layout.
**Shared modules used**: NONE. All custom.
**INCONSISTENCY**: Tables not wrapped in `tableStyles.tableSection`. Uses custom section headings instead of `SectionHeader`. Missing card wrappers around tables. Stat strip uses `flex` layout while other pages use `grid`.
---
### 10. Admin — ClickHouse (`pages/Admin/ClickHouseAdminPage.tsx`)
**Container pattern**: Inherits AdminLayout padding. Renders bare `<div>`.
**Tables**: Uses shared `tableStyles.tableSection` combined with custom `.tableSection` for margin: ``className={`${tableStyles.tableSection} ${styles.tableSection}`}``.
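The class combination described above can be sketched with a small joiner function. This is an illustration only: the `cx` helper and the hashed class-name values are hypothetical stand-ins, not taken from the codebase, and the two objects merely imitate what CSS-module imports would provide.

```typescript
// Hypothetical stand-ins for the two CSS-module imports.
// In the real page these would come from `table-section.module.css`
// and the page's own module; the hashed names below are invented.
const tableStyles: Record<string, string> = { tableSection: "tableSection_a1b2" };
const styles: Record<string, string> = { tableSection: "tableSection_c3d4" };

// Joins class names with a space, skipping missing or empty entries.
function cx(...names: Array<string | undefined | false>): string {
  return names
    .filter((n): n is string => typeof n === "string" && n.length > 0)
    .join(" ");
}

// Shared card styling first, page-specific margin override second.
const className = cx(tableStyles.tableSection, styles.tableSection);
```

A helper like this avoids hand-written template literals when one of the class names is conditional.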
**Custom elements**: `.pipelineCard` duplicates card pattern (bg-surface, border, radius, shadow, padding).
**Shared modules used**: `table-section.module.css`
**PARTIAL**: Tables correctly use shared module. Pipeline card duplicates shared card pattern.
---
### 11. Admin — Environments (`pages/Admin/EnvironmentsPage.tsx`)
**Container pattern**: Inherits AdminLayout padding. Renders via `SplitPane` (design-system).
**Content**: Uses `SectionHeader`, `SplitPane`, custom meta grids from `UserManagement.module.css`.
**Shared modules used**: Uses `UserManagement.module.css` (shared with RBAC pages)
**INCONSISTENCY**: Does not use `section-card.module.css` for form sections. Config sections use `SectionHeader` without card wrappers. `SplitPane` provides some structure but detail content is flat.
---
### 12. Admin — App Config Detail (`pages/Admin/AppConfigDetailPage.tsx`)
**Container pattern**: Adds `.page { max-width: 720px; margin: 0 auto; }` — centered layout.
**Sections**: Uses shared `sectionStyles.section` from `section-card.module.css`. Uses `SectionHeader` inside section cards. Custom header card duplicates the card pattern.
**Shared modules used**: `section-card.module.css`
**STATUS**: GOOD. Follows same pattern as OIDC page.
---
### 13. Routes pages (`pages/Routes/`) — NOT ROUTED
These pages (`RoutesMetrics.tsx`, `RouteDetail.tsx`) exist but are NOT in `router.tsx`. They may be deprecated or used as sub-components. `RoutesMetrics` correctly uses shared `tableStyles.tableSection`. `RouteDetail` has many custom card patterns (`.headerCard`, `.diagramPane`, `.statsPane`, `.executionsTable`, `.routeFlowSection`) that duplicate the shared card pattern.
---
## Summary: Inconsistency Matrix
### Container Padding
| Page | Padding | Pattern |
|------|---------|---------|
| **Exchanges** | NONE (full-bleed) | `height: 100%`, fills container |
| **Dashboard Tab** | NONE (gap only) | `gap: 20px`, `padding-bottom: 20px` only |
| **Runtime (AgentHealth)** | `20px 24px 40px` | Explicit padding |
| **Runtime (AgentInstance)** | `20px 24px 40px` | Explicit padding |
| **Apps Tab** | `16px` | Uniform padding |
| **Admin pages** | `20px 24px 40px` | Via AdminLayout |
**Finding**: Three different padding strategies. Exchanges and Dashboard have no padding; Runtime and Admin use 20px/24px; Apps uses 16px.
### Table Wrapper Pattern
| Page | Uses `tableStyles.tableSection`? | Card wrapper? |
|------|----------------------------------|---------------|
| **Exchanges (Dashboard.tsx)** | NO — custom `.tableHeader` | NO — full-bleed |
| **Dashboard L1/L2/L3** | YES | YES (shared) |
| **Runtime AgentHealth** | NO | YES (via DS `Card`) |
| **Apps Tab** | NO | NO — bare `<table>` |
| **Admin — Audit** | YES | YES (shared) |
| **Admin — ClickHouse** | YES | YES (shared) |
| **Admin — Database** | NO | NO |
**Finding**: 4 of 7 table-using pages do NOT use the shared `table-section.module.css`. The Exchanges page custom header has padding `8px 12px` vs shared `12px 16px`.
### Form/Control Wrapper Pattern
| Page | Form controls in cards? | Uses `section-card`? |
|------|------------------------|---------------------|
| **Apps Tab (detail)** | NO — flat against background | NO |
| **Apps Tab (create)** | NO — flat against background | NO |
| **Admin — OIDC** | YES | YES |
| **Admin — App Config** | YES | YES |
| **Admin — RBAC detail** | NO — flat against background | NO |
| **Admin — Environments** | NO — flat against background | NO |
| **Admin — Database** | PARTIAL (Card for pool) | NO |
| **Runtime — AgentHealth** | YES (custom `.configBar`) | NO (custom) |
**Finding**: Only OIDC and AppConfigDetail use `section-card.module.css` for form grouping. Most form pages render controls flat against the page background.
### Duplicated Card Pattern
The following CSS pattern appears in 8+ custom locations instead of importing `section-card.module.css` or `table-section.module.css`:
```css
background: var(--bg-surface);
border: 1px solid var(--border-subtle);
border-radius: var(--radius-lg);
box-shadow: var(--shadow-card);
```
**Duplicated in**:
- `DashboardTab.module.css`: `.errorsSection`, `.diagramSection`
- `AgentHealth.module.css`: `.configBar`, `.eventCard`
- `AgentInstance.module.css`: `.processCard`, `.timelineCard`
- `ClickHouseAdminPage.module.css`: `.pipelineCard`
- `AppConfigDetailPage.module.css`: `.header`
- `RouteDetail.module.css`: `.headerCard`, `.diagramPane`, `.statsPane`, `.executionsTable`, `.routeFlowSection`
## Prioritized Fixes
### P0 — User-reported issues
1. **Exchanges table full-bleed**: `Dashboard.tsx` should wrap its table in `tableStyles.tableSection` and use the shared table header classes instead of custom ones. Custom `.tableHeader` padding (8px 12px) should match shared (12px 16px).
2. **Apps detail flat controls**: `AppsTab.tsx` config sections should wrap form groups in `sectionStyles.section` (from `section-card.module.css`), matching the OIDC page pattern.
3. **Apps deployment table**: Replace manual `<table>` with `DataTable` inside `tableStyles.tableSection`.
### P1 — Padding normalization
4. **Standardize container padding**: Choose ONE pattern for scrollable content areas. Recommended: `padding: 20px 24px 40px` (currently used by Runtime + Admin). Apply to DashboardTab's `.content`. Exchanges is an exception due to its split-view height-filling layout.
5. **DashboardTab.module.css**: Add side padding to `.content`.
### P2 — Shared module adoption
6. **Replace duplicated card patterns**: Import `section-card.module.css` or `table-section.module.css` instead of duplicating the card CSS in:
   - `DashboardTab.module.css` (`.errorsSection` -> use `tableStyles.tableSection`)
   - `AgentHealth.module.css` (`.configBar`, `.eventCard`)
   - `AgentInstance.module.css` (`.processCard`, `.timelineCard`)
   - `ClickHouseAdminPage.module.css` (`.pipelineCard`)
7. **Database admin**: Wrap tables in `tableStyles.tableSection`.
8. **Admin detail pages** (RBAC, Environments): Wrap form sections in `sectionStyles.section`.


@@ -0,0 +1,599 @@
# Cameleer3 UI Interaction Patterns Audit
Audit date: 2026-04-09
Scope: All `.tsx` files under `ui/src/pages/` and `ui/src/components/`
---
## 1. Delete / Destructive Operations
### 1.1 Delete User
- **File**: `ui/src/pages/Admin/UsersTab.tsx` (lines 155-172, 358-365, 580-587)
- **Button location**: Detail pane header, top-right, inline with avatar and name
- **Button**: `<Button size="sm" variant="danger">Delete</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete user "${name}"? This cannot be undone.`
  - Confirm text: user's `displayName`
  - Has `loading` prop bound to mutation
- **Self-delete guard**: Button is `disabled={isSelf}` (cannot delete yourself)
- **Toast on success**: `variant: 'warning'`, title: "User deleted"
- **Toast on error**: `variant: 'error'`, `duration: 86_400_000`
### 1.2 Remove User From Group (via User detail)
- **File**: `ui/src/pages/Admin/UsersTab.tsx` (lines 588-613)
- **Button location**: Tag `onRemove` handler on group tags in detail pane
- **Confirmation**: `AlertDialog` (simple confirm, no type-to-confirm)
  - Title: "Remove group membership"
  - Description: "Removing this group may also revoke inherited roles. Continue?"
  - Confirm label: "Remove"
  - Variant: `warning`
- **Toast on success**: `variant: 'success'`, title: "Group removed"
### 1.3 Remove Role From User (via User detail)
- **File**: `ui/src/pages/Admin/UsersTab.tsx` (lines 504-528)
- **Button location**: Tag `onRemove` handler on role tags in detail pane
- **Confirmation**: NONE -- immediate mutation on tag remove click
- **Toast on success**: `variant: 'success'`, title: "Role removed"
**INCONSISTENCY**: Removing a group shows an AlertDialog confirmation but removing a role does not, even though both can have cascading effects.
### 1.4 Delete Group
- **File**: `ui/src/pages/Admin/GroupsTab.tsx` (lines 140-155, 340-347, 434-441)
- **Button location**: Detail pane header, top-right
- **Button**: `<Button size="sm" variant="danger">Delete</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete group "${name}"? This cannot be undone.`
  - Confirm text: group's `name`
  - Has `loading` prop
- **Built-in guard**: Button is `disabled={isBuiltinAdmins}`
- **Toast on success**: `variant: 'warning'`, title: "Group deleted"
### 1.5 Remove Role From Group
- **File**: `ui/src/pages/Admin/GroupsTab.tsx` (lines 404-427, 442-455)
- **Button location**: Tag `onRemove` handler on role tags in group detail
- **Confirmation**: `AlertDialog` shown ONLY when the group has members (conditional)
  - Title: "Remove role from group"
  - Description: `Removing this role will affect ${members.length} member(s) who inherit it. Continue?`
  - Confirm label: "Remove"
  - Variant: `warning`
- **If group has no members**: Immediate mutation, no confirmation
- **Toast on success**: `variant: 'success'`, title: "Role removed"
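The conditional confirmation described in 1.5 can be sketched as a pure function that either returns the dialog content or signals that no dialog is needed. The function name and the `RemoveRolePrompt` type are hypothetical; only the strings and the "members present" condition come from the audit.

```typescript
// Hypothetical shape for the AlertDialog content listed above.
interface RemoveRolePrompt {
  title: string;
  description: string;
  confirmLabel: string;
  variant: "warning";
}

// Returns the dialog content when the group has members, or null when the
// role can be removed immediately (no members inherit it).
function removeRolePrompt(memberCount: number): RemoveRolePrompt | null {
  if (memberCount <= 0) {
    return null;
  }
  return {
    title: "Remove role from group",
    description: `Removing this role will affect ${memberCount} member(s) who inherit it. Continue?`,
    confirmLabel: "Remove",
    variant: "warning",
  };
}
```

Keeping the decision in one function makes the "confirm only when members exist" rule easy to test separately from the component.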
### 1.6 Remove Member From Group
- **File**: `ui/src/pages/Admin/GroupsTab.tsx` (lines 366-372)
- **Button location**: Tag `onRemove` handler on member tags in group detail
- **Confirmation**: NONE -- immediate mutation on tag remove click
- **Toast on success**: `variant: 'success'`, title: "Member removed"
### 1.7 Delete Role
- **File**: `ui/src/pages/Admin/RolesTab.tsx` (lines 93-110, 261-265, 223-231)
- **Button location**: Detail pane header, top-right
- **Button**: `<Button size="sm" variant="danger">Delete</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete role "${name}"? This cannot be undone.`
  - Confirm text: role's `name`
  - Has `loading` prop
- **System role guard**: Button hidden for system roles (`!role.system`)
- **Toast on success**: `variant: 'warning'`, title: "Role deleted"
### 1.8 Delete Environment
- **File**: `ui/src/pages/Admin/EnvironmentsPage.tsx` (lines 101-112, 245-252, 319-327)
- **Button location**: Detail pane header, top-right
- **Button**: `<Button size="sm" variant="danger">Delete</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete environment "${displayName}"? All apps and deployments in this environment will be removed. This cannot be undone.`
  - Confirm text: environment's `slug` (NOT the display name)
  - Has `loading` prop
- **Default guard**: Button is `disabled={isDefault}` (cannot delete default environment)
- **Toast on success**: `variant: 'warning'`, title: "Environment deleted"
**NOTE**: The confirm text requires the slug but the message shows the display name. This is intentional (slug is the unique identifier) but differs from Users, Groups, and Roles, which confirm with the entity's `displayName` or `name`.
### 1.9 Delete OIDC Configuration
- **File**: `ui/src/pages/Admin/OidcConfigPage.tsx` (lines 113-124, 253-264)
- **Button location**: Bottom of page in a "Danger Zone" section
- **Button**: `<Button size="sm" variant="danger">Delete OIDC Configuration</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete OIDC configuration? All users signed in via OIDC will lose access.`
  - Confirm text: `"delete oidc"` (static string)
  - NO `loading` prop
- **Toast on success**: `variant: 'warning'`, title: "Configuration deleted"
**INCONSISTENCY**: No `loading` prop on this ConfirmDialog, unlike the entity delete confirmations (user, group, role, environment, app).
### 1.10 Delete App
- **File**: `ui/src/pages/AppsTab/AppsTab.tsx` (lines 533-539, 565, 589-596)
- **Button location**: App detail header, top-right, in `detailActions` div alongside "Upload JAR"
- **Button**: `<Button size="sm" variant="danger">Delete App</Button>`
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Message: `Delete app "${displayName}"? All versions and deployments will be removed. This cannot be undone.`
  - Confirm text: app's `slug`
  - Has `loading` prop
- **Toast on success**: `variant: 'warning'`, title: "App deleted"
- **Post-delete**: Navigates to `/apps`
### 1.11 Stop Deployment
- **File**: `ui/src/pages/AppsTab/AppsTab.tsx` (lines 526-531, 672)
- **Button location**: Inline in deployments table, right-aligned actions column
- **Button**: `<Button size="sm" variant="danger">Stop</Button>`
- **Confirmation**: NONE -- immediate mutation on click
- **Toast on success**: `variant: 'warning'`, title: "Deployment stopped"
**INCONSISTENCY**: Stopping a deployment is a destructive operation that affects live services but has NO confirmation dialog. Route stop/suspend in RouteControlBar uses a ConfirmDialog, but deployment stop does not.
### 1.12 Stop/Suspend Route
- **File**: `ui/src/pages/Exchanges/RouteControlBar.tsx` (lines 43-154)
- **Button location**: Route control bar (segmented button group)
- **Button**: Custom segmented `<button>` elements (not design system Button)
- **Confirmation**: `ConfirmDialog` (type-to-confirm) -- only for `stop` and `suspend` actions
  - Title: `"Stop route?"` or `"Suspend route?"`
  - Message: `This will ${action} route "${routeId}" on ${application}. This affects all live agents.`
  - Confirm text: the action name (e.g., `"stop"` or `"suspend"`)
  - Confirm label: `"Stop Route"` or `"Suspend Route"`
  - Variant: `danger` for stop, `warning` for suspend
  - Has `loading` prop
- **Start and Resume**: No confirmation (immediate action)
- **Toast patterns match others**
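The per-action rules listed for the route control bar can be sketched as a lookup function. The type and function names are hypothetical; the titles, message template, confirm text, labels, and variants are the ones recorded above, and the real `RouteControlBar.tsx` may structure this differently.

```typescript
type RouteAction = "start" | "stop" | "suspend" | "resume";

// Hypothetical shape for the ConfirmDialog content listed above.
interface RouteConfirm {
  title: string;
  message: string;
  confirmText: string;
  confirmLabel: string;
  variant: "danger" | "warning";
}

// Returns dialog content for stop/suspend, or null for start/resume,
// which run immediately without confirmation.
function routeConfirm(
  action: RouteAction,
  routeId: string,
  application: string
): RouteConfirm | null {
  if (action !== "stop" && action !== "suspend") {
    return null;
  }
  const verb = action === "stop" ? "Stop" : "Suspend";
  return {
    title: `${verb} route?`,
    message: `This will ${action} route "${routeId}" on ${application}. This affects all live agents.`,
    confirmText: action,
    confirmLabel: `${verb} Route`,
    variant: action === "stop" ? "danger" : "warning",
  };
}
```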
### 1.13 Delete Tap (Route Detail page)
- **File**: `ui/src/pages/Routes/RouteDetail.tsx` (lines 991-1001)
- **Button location**: Inline delete icon button in taps table row
- **Confirmation**: `ConfirmDialog` (type-to-confirm)
  - Title: "Delete Tap"
  - Message: `This will remove the tap "${attributeName}" from the configuration.`
  - Confirm text: tap's `attributeName`
  - Confirm label: "Delete"
  - Variant: `danger`
- **No `loading` prop on this dialog**
**INCONSISTENCY**: No `loading` prop, unlike entity delete confirmations.
### 1.14 Delete Tap (TapConfigModal)
- **File**: `ui/src/components/TapConfigModal.tsx` (lines 117-122, 249-253)
- **Button location**: Inside the modal footer, left-aligned (only shown when editing)
- **Button**: `<Button variant="danger">Delete</Button>`
- **Confirmation**: NONE -- immediate call to `onDelete` then `onClose`
- **Toast**: Handled by parent component (ExchangesPage)
**INCONSISTENCY**: Deleting a tap from the TapConfigModal has no confirmation, but deleting from the RouteDetail table shows a ConfirmDialog.
### 1.15 Kill Database Query
- **File**: `ui/src/pages/Admin/DatabaseAdminPage.tsx` (line 30)
- **Button location**: Inline in active queries table
- **Button**: `<Button variant="danger" size="sm">Kill</Button>`
- **Confirmation**: NONE -- immediate mutation
- **Toast**: None visible
**INCONSISTENCY**: Killing a database query is a destructive action with no confirmation and no toast feedback.
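The findings in sections 1.1 through 1.15 can be condensed into a small lookup that makes the unconfirmed operations easy to list. The data structure is only a restatement of the audit entries above; the type name and key labels are chosen here for illustration.

```typescript
type Confirmation =
  | "ConfirmDialog"
  | "AlertDialog"
  | "AlertDialog (conditional)"
  | "none";

// One entry per subsection of section 1, as recorded in the audit.
const observed: Record<string, Confirmation> = {
  "1.1 Delete user": "ConfirmDialog",
  "1.2 Remove user from group": "AlertDialog",
  "1.3 Remove role from user": "none",
  "1.4 Delete group": "ConfirmDialog",
  "1.5 Remove role from group": "AlertDialog (conditional)",
  "1.6 Remove member from group": "none",
  "1.7 Delete role": "ConfirmDialog",
  "1.8 Delete environment": "ConfirmDialog",
  "1.9 Delete OIDC configuration": "ConfirmDialog",
  "1.10 Delete app": "ConfirmDialog",
  "1.11 Stop deployment": "none",
  "1.12 Stop/suspend route": "ConfirmDialog",
  "1.13 Delete tap (RouteDetail)": "ConfirmDialog",
  "1.14 Delete tap (TapConfigModal)": "none",
  "1.15 Kill database query": "none",
};

// Operations that mutate immediately with no confirmation step.
const unconfirmed: string[] = Object.keys(observed).filter(
  (action) => observed[action] === "none"
);
```

Five of the fifteen operations fall into the unconfirmed group: 1.3, 1.6, 1.11, 1.14, and 1.15.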
---
## 2. Button Placement & Order
### 2.1 Create Forms (Users, Groups, Roles, Environments)
All four entity create forms use an identical pattern:
| Page | File | Line | Left Button | Right Button |
|------|------|------|-------------|--------------|
| Users | `UsersTab.tsx` | 254-274 | Cancel (ghost) | Create (primary) |
| Groups | `GroupsTab.tsx` | 251-268 | Cancel (ghost) | Create (primary) |
| Roles | `RolesTab.tsx` | 142-159 | Cancel (ghost) | Create (primary) |
| Environments | `EnvironmentsPage.tsx` | 181-194 | Cancel (ghost) | Create (primary) |
- **Position**: Bottom of inline create form in the list pane
- **Container class**: `styles.createFormActions`
- **Order**: Cancel (left) | Create (right) -- **CONSISTENT**
- **Variants**: Cancel = `ghost`, Create = `primary` -- **CONSISTENT**
- **Size**: Both `sm` -- **CONSISTENT**
### 2.2 App Creation Page
- **File**: `ui/src/pages/AppsTab/AppsTab.tsx` (lines 282-287)
- **Position**: Top of page in `detailActions` header area
- **Order**: Cancel (ghost, left) | Create & Deploy / Create (primary, right)
- **Size**: Both `sm`
- **CONSISTENT** with the pattern (Cancel left, Submit right)
### 2.3 OIDC Config Page (Toolbar)
- **File**: `ui/src/pages/Admin/OidcConfigPage.tsx` (lines 130-137)
- **Position**: Top toolbar
- **Order**: Test Connection (secondary, left) | Save (primary, right)
- **No Cancel button** -- form is always editable
**NOTE**: This is the only admin page without a Cancel button or Edit mode toggle.
### 2.4 App Detail Header
- **File**: `ui/src/pages/AppsTab/AppsTab.tsx` (lines 560-566)
- **Position**: Top-right header area in `detailActions`
- **Order**: Upload JAR (primary) | Delete App (danger)
**NOTE**: The primary action (Upload) is on the LEFT and the destructive action (Delete) is on the RIGHT.
### 2.5 App Config Detail Page (AppConfigDetailPage)
- **File**: `ui/src/pages/Admin/AppConfigDetailPage.tsx` (lines 308-319)
- **Position**: Top toolbar
- **Read mode**: Back (ghost) ... Edit (secondary)
- **Edit mode**: Back (ghost) ... Save (default/no variant specified!) | Cancel (secondary)
- **Order when editing**: Save (left) | Cancel (right)
**INCONSISTENCY #1**: Save button has NO `variant` prop set -- it renders as default, not `primary`. Every other Save button uses `variant="primary"`.
**INCONSISTENCY #2**: Button order is REVERSED from every other form. Here it is Save (left) | Cancel (right). Everywhere else it is Cancel (left) | Save (right).
### 2.6 App Config Sub-Tab (AppsTab ConfigSubTab)
- **File**: `ui/src/pages/AppsTab/AppsTab.tsx` (lines 922-936)
- **Position**: Top banner bar (editBanner)
- **Read mode**: Banner text + Edit (secondary)
- **Edit mode**: Banner text + Cancel (ghost) | Save Configuration (primary)
- **Order when editing**: Cancel (left) | Save (right) -- **CONSISTENT**
### 2.7 Environment Default Resources / JAR Retention Sections
- **File**: `ui/src/pages/Admin/EnvironmentsPage.tsx` (lines 437-446, 505-514)
- **Position**: Bottom of section, right-aligned (`justifyContent: 'flex-end'`)
- **Read mode**: Edit Defaults / Edit Policy (secondary)
- **Edit mode**: Cancel (ghost) | Save (primary) -- **CONSISTENT**
- **Size**: Both `sm`
### 2.8 User Password Reset
- **File**: `ui/src/pages/Admin/UsersTab.tsx` (lines 407-431)
- **Position**: Inline in Security section
- **Order**: Cancel (ghost) | Set (primary)
- **CONSISTENT** pattern (Cancel left, Submit right)
### 2.9 Tap Modal (TapConfigModal)
- **File**: `ui/src/components/TapConfigModal.tsx` (lines 249-257)
- **Position**: Modal footer
- **Order (edit mode)**: Delete (danger, left, in `footerLeft`) | Cancel (secondary) | Save (primary)
- **Order (create mode)**: Cancel (secondary) | Save (primary)
- **No `size` prop specified** -- renders at default size
**NOTE**: Uses `variant="secondary"` for Cancel, not `variant="ghost"` like create forms.
### 2.10 Tap Modal (RouteDetail inline version)
- **File**: `ui/src/pages/Routes/RouteDetail.tsx` (lines 984-986)
- **Position**: Modal footer (`tapModalFooter`)
- **Order**: Cancel (secondary) | Save (primary)
- **No `size` prop specified**
- **CONSISTENT** with TapConfigModal
### 2.11 About Me Dialog
- **File**: `ui/src/components/AboutMeDialog.tsx` (lines 14, 72)
- **Uses `Modal` with built-in close button** (no explicit action buttons)
- **Close via**: Modal `onClose` handler (X button and backdrop click)
### 2.12 Login Page
- **File**: `ui/src/auth/LoginPage.tsx` (lines 176-184)
- **Single button**: Sign in (primary, full width, submit type)
- **Optional SSO button above**: Sign in with SSO (secondary)
### Summary of Button Order Patterns
| Location | Cancel Side | Submit Side | Consistent? |
|----------|------------|-------------|-------------|
| User create form | Left (ghost) | Right (primary) | YES |
| Group create form | Left (ghost) | Right (primary) | YES |
| Role create form | Left (ghost) | Right (primary) | YES |
| Env create form | Left (ghost) | Right (primary) | YES |
| App create page | Left (ghost) | Right (primary) | YES |
| Env Default Resources edit | Left (ghost) | Right (primary) | YES |
| Env JAR Retention edit | Left (ghost) | Right (primary) | YES |
| AppsTab config sub-tab edit | Left (ghost) | Right (primary) | YES |
| User password reset | Left (ghost) | Right (primary) | YES |
| TapConfigModal | Left (secondary) | Right (primary) | Variant mismatch |
| RouteDetail tap modal | Left (secondary) | Right (primary) | Variant mismatch |
| **AppConfigDetailPage** | **Right (secondary)** | **Left (NO variant)** | **REVERSED** |
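One way to keep the dominant order (Cancel left, submit right, with a destructive action leftmost as in TapConfigModal) is to sort footer actions by role before rendering. This is a hypothetical sketch, not code from the repository: the `FooterAction` type, the role names, and `orderActions` are all assumptions.

```typescript
type ActionRole = "destructive" | "cancel" | "submit";

interface FooterAction {
  label: string;
  role: ActionRole;
}

// Left-to-right position of each role, following the order most forms use.
const ROLE_ORDER: Record<ActionRole, number> = {
  destructive: 0,
  cancel: 1,
  submit: 2,
};

// Returns a new array in display order; the input is left untouched.
function orderActions(actions: FooterAction[]): FooterAction[] {
  return actions.slice().sort((a, b) => ROLE_ORDER[a.role] - ROLE_ORDER[b.role]);
}

// The reversed AppConfigDetailPage order (Save, Cancel) comes out normalized.
const normalized = orderActions([
  { label: "Save", role: "submit" },
  { label: "Cancel", role: "cancel" },
]).map((a) => a.label);
```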
---
## 3. Edit / Save Patterns
### 3.1 Users (UsersTab)
- **Edit mode**: No explicit toggle. Display name uses `InlineEdit` (click-to-edit). Everything else is managed via tag add/remove.
- **No Save/Cancel for the detail view** -- all changes are immediate mutations.
- **Unsaved changes indicator**: N/A (no batched editing)
- **On success**: Toast with `variant: 'success'`
- **On error**: Toast with `variant: 'error'`, `duration: 86_400_000` (effectively permanent)
### 3.2 Groups (GroupsTab)
- **Edit mode**: Name uses `InlineEdit`. All other changes (members, roles) are immediate mutations.
- **Pattern**: Same as Users -- no batched edit mode.
### 3.3 Roles (RolesTab)
- **Edit mode**: Read-only detail panel. No editing of role fields.
- **Only action**: Delete
### 3.4 Environments (EnvironmentsPage)
- **Edit mode (name)**: `InlineEdit`
- **Edit mode (production/enabled toggles)**: Immediate mutations per toggle change
- **Edit mode (Default Resources)**: Explicit Edit toggle (`setEditing(true)`)
  - Cancel/Save buttons appear at bottom-right
  - Resets form on cancel
  - No unsaved changes indicator
  - On success: Toast `variant: 'success'`
- **Edit mode (JAR Retention)**: Same pattern as Default Resources
- **On environment switch**: Both sub-sections auto-reset to read mode
### 3.5 OIDC Config (OidcConfigPage)
- **Edit mode**: ALWAYS editable (no toggle)
- **Save button**: Always visible in top toolbar
- **No Cancel button** -- cannot discard changes
- **No unsaved changes indicator**
- **On success**: Toast `variant: 'success'`
- **On error**: Toast `variant: 'error'` + inline `<Alert variant="error">` both shown
**INCONSISTENCY**: Only page that is always editable with no way to discard changes. Also the only page that shows BOTH a toast AND an inline alert on error.
### 3.6 App Config Detail (AppConfigDetailPage)
- **Edit mode**: Explicit toggle via `Edit` button (Pencil icon) in toolbar
- **Toolbar in edit mode**: Save (unstyled!) | Cancel (secondary)
- **Save button text**: Shows "Saving..." while pending
- **No unsaved changes indicator**
- **On success**: Toast `variant: 'success'`, exits edit mode
- **On error**: Toast `variant: 'error'`, stays in edit mode
### 3.7 App Config Sub-Tab (AppsTab ConfigSubTab)
- **Edit mode**: Explicit toggle via banner + Edit button
- **Banner in read mode**: "Configuration is read-only. Enter edit mode to make changes."
- **Banner in edit mode**: "Editing configuration. Changes are not saved until you click Save." (styled differently with `editBannerActive`)
- **This IS an unsaved changes indicator** (the banner text changes)
- **Cancel/Save in edit banner**: Cancel (ghost) | Save Configuration (primary)
- **On success**: Toast `variant: 'success'`, exits edit mode, shows redeploy notice
- **On error**: Toast `variant: 'error'`, stays in edit mode
### 3.8 App Create Page
- **Edit mode**: N/A (always a creation form)
- **Multi-step indicator**: Shows step text like "Creating app...", "Uploading JAR..." during submission
- **On success**: Toast `variant: 'success'`, navigates to app detail page
- **On error**: Toast `variant: 'error'` with step context
### 3.9 Tap Editing (TapConfigModal + RouteDetail inline)
- **Edit mode**: Modal opens for edit or create
- **Save/Cancel**: In modal footer
- **On success**: Modal closes, parent handles toast
- **On error**: Parent handles toast
### Summary of Edit Patterns
| Page | Explicit Edit Toggle? | Unsaved Changes Indicator? | Consistent? |
|------|----------------------|---------------------------|-------------|
| Users | No (inline edits) | N/A | N/A |
| Groups | No (inline edits) | N/A | N/A |
| Roles | No (read-only) | N/A | N/A |
| Environments - name | No (InlineEdit) | N/A | OK |
| Environments - resources | YES | No | Missing |
| Environments - JAR retention | YES | No | Missing |
| OIDC Config | No (always editable) | No | Deviation |
| AppConfigDetailPage | YES | No | Missing |
| AppsTab ConfigSubTab | YES (banner) | YES (banner text) | Best pattern |
**INCONSISTENCY**: The AppsTab ConfigSubTab is the only one with a proper unsaved-changes indicator. AppConfigDetailPage (which edits the same data from a different entry point) has no such indicator.
---
## 4. Toast / Notification Patterns
### 4.1 Toast Provider
- **File**: `ui/src/components/LayoutShell.tsx` (line 783)
- **Provider**: `<ToastProvider>` from `@cameleer/design-system` wraps the entire app layout
- **Hook**: `useToast()` returns `{ toast }` function
### 4.2 Toast Call Signature
All toast calls use the same shape:
```typescript
toast({
title: string,
description?: string,
variant: 'success' | 'error' | 'warning',
duration?: number
})
```
### 4.3 Toast Variants Used
| Variant | Used For | Duration |
|---------|----------|----------|
| `success` | Successful operations | Default (auto-dismiss) |
| `error` | Failed operations | `86_400_000` (24 hours = effectively permanent) |
| `warning` | Destructive successes (delete, stop) AND partial failures | Mixed (see below) |
### 4.4 Duration Patterns
- **Success toasts**: No explicit duration (uses design system default) -- **CONSISTENT**
- **Error toasts**: Always `duration: 86_400_000` -- **CONSISTENT** (49 occurrences across 10 files)
- **Warning toasts for deletion success** (user/group/role/env/OIDC/app deleted): No explicit duration (auto-dismiss) -- **CONSISTENT**
- **Warning toasts for partial push failures**: `duration: 86_400_000` -- **CONSISTENT**
### 4.5 Naming Conventions for Toast Titles
**Success pattern**: Noun-action format ("[Noun] [past participle]")
- "User created", "Group created", "Role created", "Environment created"
- "Display name updated", "Password updated", "Group renamed"
- "Config saved", "Configuration saved", "Tap configuration saved"
**Error pattern**: "Failed to [action]" format
- "Failed to create user", "Failed to delete group", "Failed to update password"
- "Save failed", "Upload failed", "Deploy failed" (shorter form)
**INCONSISTENCY**: Error messages mix two patterns:
1. "Failed to [verb] [noun]" (e.g., "Failed to create user") -- used in RBAC pages
2. "[Noun] failed" (e.g., "Save failed", "Upload failed") -- used in AppsTab, AppConfigDetailPage
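One way to converge on a single title format is a small helper that builds the error toast options in one place. The sketch below is hypothetical: `errorToast` and `ERROR_TOAST_DURATION_MS` are illustrative names, not part of `@cameleer/design-system`, and it assumes the "Failed to [verb] [noun]" form is the one that is kept.

```typescript
// Shape of the options object described in section 4.2.
type ToastOptions = {
  title: string;
  description?: string;
  variant: 'success' | 'error' | 'warning';
  duration?: number;
};

// 24 hours in milliseconds, the value error toasts already use.
const ERROR_TOAST_DURATION_MS = 86_400_000;

// Hypothetical helper: builds an error toast with the "Failed to ..." title form.
function errorToast(action: string, description?: string): ToastOptions {
  return {
    title: `Failed to ${action}`,
    description,
    variant: 'error',
    duration: ERROR_TOAST_DURATION_MS,
  };
}
```

A call site could then pass the result straight to `toast(...)`, for example `toast(errorToast('save configuration'))`, instead of spelling out the title and duration each time.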
### 4.6 Warning Variant for Deletions
Successful deletions use `variant: 'warning'` consistently:
- "User deleted" (UsersTab:162)
- "Group deleted" (GroupsTab:147)
- "Role deleted" (RolesTab:100)
- "Environment deleted" (EnvironmentsPage:105)
- "Configuration deleted" (OidcConfigPage:119)
- "App deleted" (AppsTab:536)
- "Deployment stopped" (AppsTab:529)
**CONSISTENT** -- all destructive-but-successful operations use warning.
---
## 5. Loading / Empty States
### 5.1 Full-Page Loading States
| Page | Component | Size | Wrapper |
|------|-----------|------|---------|
| UsersTab | `<Spinner size="md" />` | md | Bare return |
| GroupsTab | `<Spinner size="md" />` | md | Bare return |
| RolesTab | `<Spinner size="md" />` | md | Bare return |
| EnvironmentsPage | `<Spinner size="md" />` | md | Bare return |
| AppListView | `<Spinner size="md" />` | md | Bare return |
| AppDetailView | `<Spinner size="md" />` | md | Bare return |
| AgentInstance | `<Spinner size="lg" />` | **lg** | Bare return |
| AppConfigDetailPage | `<Spinner size="lg" />` | **lg** | Wrapped in `div.loading` |
| DashboardPage | `<PageLoader />` | lg | Centered container |
| RuntimePage | `<PageLoader />` | lg | Centered container |
| OidcConfigPage | `return null` | N/A | Returns nothing |
**INCONSISTENCY #1**: Most admin pages use `<Spinner size="md" />` as a bare return. AgentInstance and AppConfigDetailPage use `size="lg"`. DashboardPage and RuntimePage use the `<PageLoader />` component which wraps `<Spinner size="lg" />` in a centered container.
**INCONSISTENCY #2**: OidcConfigPage returns `null` while loading (shows a blank page), unlike every other page.
**NOTE**: SplitPane detail loading (GroupsTab line 317, RolesTab line 212) uses `<Spinner size="md" />` -- consistent within that context, so no change is needed there.
### 5.2 Section Loading States
- **RouteDetail charts**: `<Spinner size="sm" />` inline in chart containers (lines 713, 804)
- **AboutMeDialog**: `<Spinner size="md" />` in a `div.loading` wrapper
### 5.3 Empty States
| Context | Pattern | Component Used |
|---------|---------|----------------|
| SplitPane list (no search match) | `emptyMessage="No X match your search"` | EntityList built-in |
| SplitPane detail (nothing selected) | `emptyMessage="Select a X to view details"` | SplitPane built-in |
| Deployments table (none) | `<p className={styles.emptyNote}>No deployments yet.</p>` | Plain `<p>` |
| Versions list (none) | `<p className={styles.emptyNote}>No versions uploaded yet.</p>` | Plain `<p>` |
| Env vars (none, not editing) | `<p className={styles.emptyNote}>No environment variables configured.</p>` | Plain `<p>` |
| Traces/Taps (none) | `<p className={styles.emptyNote}>No processor traces or taps configured.</p>` | Plain `<p>` |
| Route recording (none) | `<p className={styles.emptyNote}>No routes found for this application.</p>` | Plain `<p>` |
| AgentInstance metrics | `<EmptyState title="No data" description="No X available" />` | EmptyState (DS component) |
| Log/Event panels | `<div className={logStyles.logEmpty}>No events...</div>` | Styled `<div>` |
| OIDC default roles | `<span className={styles.noRoles}>No default roles configured</span>` | `<span>` |
| Group members (none) | `<span className={styles.inheritedNote}>(no members)</span>` | `<span>` |
| AppConfigDetailPage (not found) | `<div>No configuration found for "{appId}".</div>` | Plain `<div>` |
| RouteDetail error patterns | `<div className={styles.emptyText}>No error patterns found...</div>` | Styled `<div>` |
| RouteDetail taps (none) | `<div className={styles.emptyState}>No taps configured...</div>` | Styled `<div>` |
**INCONSISTENCY**: Empty states use at least 5 different approaches:
1. Design system `EmptyState` component (only in AgentInstance)
2. `<p className={styles.emptyNote}>` (AppsTab)
3. `<span className={styles.inheritedNote}>` with parenthetical format "(none)" (RBAC pages)
4. `<div className={styles.emptyText}>` (RouteDetail)
5. Unstyled inline text (AppConfigDetailPage)
The design system provides an `EmptyState` component but it is only used in one place (AgentInstance).
---
## 6. Inconsistency Summary
### HIGH Priority (User-facing confusion)
1. **AppConfigDetailPage button order is reversed** (Save|Cancel instead of Cancel|Save) and Save button has no `variant="primary"`. File: `ui/src/pages/Admin/AppConfigDetailPage.tsx`, lines 311-315.
2. **Deployment Stop has no confirmation dialog**. Stopping a running deployment immediately executes with no confirmation, while stopping/suspending a route shows a ConfirmDialog. File: `ui/src/pages/AppsTab/AppsTab.tsx`, line 672.
3. **Tap deletion is inconsistent**. Deleting from TapConfigModal: no confirmation. Deleting from RouteDetail table: ConfirmDialog. File: `ui/src/components/TapConfigModal.tsx` line 117 vs `ui/src/pages/Routes/RouteDetail.tsx` line 992.
4. **Kill Query has no confirmation and no feedback**. File: `ui/src/pages/Admin/DatabaseAdminPage.tsx`, line 30.
### MEDIUM Priority (Pattern deviations)
5. **Cancel button variant inconsistency**. Create forms use `variant="ghost"` for Cancel. Modal dialogs (TapConfigModal, RouteDetail tap modal) use `variant="secondary"`. File: `ui/src/components/TapConfigModal.tsx` line 255, vs `ui/src/pages/Admin/UsersTab.tsx` line 258.
6. **Removing a role from a user has no confirmation** but removing a group from a user shows an AlertDialog. Both can cascade. File: `ui/src/pages/Admin/UsersTab.tsx`, lines 504-528 vs 588-613.
7. **OIDC Config is always editable with no Cancel/discard**. Every other editable form either has inline-edit (immediate save) or explicit edit mode with Cancel. File: `ui/src/pages/Admin/OidcConfigPage.tsx`.
8. **OIDC Config delete ConfirmDialog missing `loading` prop**. All other delete ConfirmDialogs pass `loading={mutation.isPending}`. File: `ui/src/pages/Admin/OidcConfigPage.tsx`, line 258.
9. **Loading state size inconsistency**. Most pages use `Spinner size="md"`, some use `size="lg"`, some use `PageLoader`, and OidcConfigPage returns `null`. No single standard.
10. **Error toast title format inconsistency**. RBAC pages use "Failed to [verb] [noun]" while AppsTab/AppConfigDetailPage use "[Noun] failed". Should pick one.
### LOW Priority (Minor deviations)
11. **Empty state presentation varies widely**. Five different approaches used. Should standardize on the design system `EmptyState` component or at least a consistent CSS class.
12. **ConfirmDialog confirmText varies between display name and slug**. Users/Groups/Roles use display name; Environments and Apps use slug. This is arguably intentional (slug is the technical identifier) but may confuse users.
13. **OIDC Config shows both toast and inline Alert on error**. No other page shows both simultaneously. File: `ui/src/pages/Admin/OidcConfigPage.tsx`, line 92 (toast) + line 139 (inline Alert).
14. **AppConfigDetailPage Save button text changes to "Saving..."** using string interpolation, while every other page uses the `loading` prop on Button (which shows a spinner). File: `ui/src/pages/Admin/AppConfigDetailPage.tsx`, line 313.
15. **Unsaved changes indicator** only present on AppsTab ConfigSubTab (banner text). AppConfigDetailPage, Environment resource sections, and JAR retention section have no indicator even though they use explicit edit mode.
---
## 7. ConfirmDialog Usage Matrix
| Object | File | Line | confirmText Source | Has `loading`? | Has `variant`? | Has `confirmLabel`? |
|--------|------|------|-------------------|----------------|----------------|---------------------|
| User | UsersTab.tsx | 580 | displayName | YES | No (default) | No (default) |
| Group | GroupsTab.tsx | 434 | name | YES | No (default) | No (default) |
| Role | RolesTab.tsx | 223 | name | YES | No (default) | No (default) |
| Environment | EnvironmentsPage.tsx | 319 | slug | YES | No (default) | No (default) |
| OIDC Config | OidcConfigPage.tsx | 258 | "delete oidc" | **NO** | No (default) | No (default) |
| App | AppsTab.tsx | 589 | slug | YES | No (default) | No (default) |
| Tap (RouteDetail) | RouteDetail.tsx | 992 | attributeName | **NO** | `danger` | `"Delete"` |
| Route Stop | RouteControlBar.tsx | 139 | action name | YES | `danger`/`warning` | `"Stop Route"` / `"Suspend Route"` |
**NOTE**: RouteControlBar and RouteDetail set explicit `variant` and `confirmLabel` on ConfirmDialog while all RBAC/admin pages use defaults. This creates visual differences in the confirmation dialogs.
---
## 8. AlertDialog Usage Matrix
| Context | File | Line | Title | Confirm Label | Variant |
|---------|------|------|-------|---------------|---------|
| Remove group from user | UsersTab.tsx | 588 | "Remove group membership" | "Remove" | `warning` |
| Remove role from group | GroupsTab.tsx | 442 | "Remove role from group" | "Remove" | `warning` |
AlertDialog is used consistently where present (both use `warning` variant and "Remove" label).
---
## 9. Files Examined
`.tsx` files examined under `ui/src/pages/`, `ui/src/components/`, and `ui/src/auth/`:
- `ui/src/pages/Admin/UsersTab.tsx`
- `ui/src/pages/Admin/GroupsTab.tsx`
- `ui/src/pages/Admin/RolesTab.tsx`
- `ui/src/pages/Admin/EnvironmentsPage.tsx`
- `ui/src/pages/Admin/OidcConfigPage.tsx`
- `ui/src/pages/Admin/AppConfigDetailPage.tsx`
- `ui/src/pages/Admin/DatabaseAdminPage.tsx`
- `ui/src/pages/Admin/ClickHouseAdminPage.tsx`
- `ui/src/pages/Admin/AuditLogPage.tsx`
- `ui/src/pages/AppsTab/AppsTab.tsx`
- `ui/src/pages/Routes/RouteDetail.tsx`
- `ui/src/pages/Exchanges/ExchangesPage.tsx`
- `ui/src/pages/Exchanges/RouteControlBar.tsx`
- `ui/src/pages/AgentHealth/AgentHealth.tsx`
- `ui/src/pages/AgentInstance/AgentInstance.tsx`
- `ui/src/pages/DashboardTab/DashboardPage.tsx`
- `ui/src/pages/RuntimeTab/RuntimePage.tsx`
- `ui/src/components/TapConfigModal.tsx`
- `ui/src/components/AboutMeDialog.tsx`
- `ui/src/components/PageLoader.tsx`
- `ui/src/components/LayoutShell.tsx`
- `ui/src/auth/LoginPage.tsx`

@@ -0,0 +1,267 @@
# Cameleer3 Web UI - UX Audit Findings
**Date:** 2026-04-09
**URL:** https://desktop-fb5vgj9.siegeln.internal/server/
**Build:** 69dcce2
**Auditor:** Claude (automated browser audit)
---
## 1. Exchange Detail (Split View)
**Screenshots:** `04-exchange-detail-ok.png`, `05-exchange-detail-err.png`, `27-exchange-err-error-tab.png`
### What Works Well
- Split view layout (50/50) is clean and efficient -- table on left, detail on right
- Processor timeline visualization is excellent -- clear step sequence with color-coded status (green OK, red/amber error)
- Exchange detail tabs (Info, Headers, Input, Output, Error, Config, Timeline, Log) are comprehensive
- Error tab shows full Java stack trace with Copy button and exception message prominently displayed
- ERR rows in table have clear red status badge with icon
- Correlated exchanges section present (even when none found)
- JSON download button available on the detail view
### Issues Found
**Important:**
- **Exchange ID is raw hex, hard to scan.** The IDs like `96E395B0088AA6D-000000000001ED46` are 33+ characters wide. They push the table columns apart and are hard for humans to parse. Consider truncating with copy-on-click or showing a short hash.
- **Attributes column always shows "--".** Every single exchange row displays "--" in the Attributes column. If no attributes are captured, this column wastes horizontal space. Consider hiding it when empty or showing it only when relevant data exists.
- **Status shows "OK" but detail shows "COMPLETED".** The table status column shows "OK" / "ERR" but the detail panel shows "COMPLETED" / "FAILED". This terminology mismatch is confusing -- pick one convention.
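The truncation suggested for exchange IDs could look like the sketch below. It is only an illustration: `shortenExchangeId` is a hypothetical helper, and keeping six characters from each end is an assumed choice, not something the audit prescribes. The full ID would still need to be available via copy-on-click or a tooltip.

```typescript
// Hypothetical helper: keeps the start and end of a long ID and elides the middle.
function shortenExchangeId(id: string, head = 6, tail = 6): string {
  // Leave IDs alone when truncating would not save any space.
  if (id.length <= head + tail + 1) return id;
  return `${id.slice(0, head)}…${id.slice(-tail)}`;
}
```

Keeping the tail matters here because, in the sample IDs, the part after the dash is what differs from one exchange to the next.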
**Nice-to-have:**
- **No breadcrumb update when exchange selected.** The breadcrumb still shows "All Applications" even when viewing a specific exchange detail. Should show: All Applications > sample-app > Exchange 96E39...
- **No action buttons on exchange detail.** No "Replay", "Trace", or "View Route" buttons in the detail view. Users would benefit from contextual actions.
- **Back navigation relies on de-selecting the row.** There is no explicit "Close" or "Back" button on the detail panel.
---
## 2. Dashboard Tab
**Screenshots:** `07-dashboard-full.png`, `08-dashboard-drilldown.png`
### What Works Well
- KPI strip is clean and scannable: Throughput (7/s), Success Rate (98.0%), P99 Latency (6695ms), SLA Compliance (38.0%), Active Errors (3)
- L1 (applications) -> L2 (routes) drill-down works via table row click
- L2 view shows comprehensive route performance table with throughput, success %, avg/P99, SLA %, sparkline
- Top Errors table with error velocity and "last seen" is very useful
- Charts: Throughput by Application, Error Rate, Volume vs SLA Compliance, 7-Day Pattern heatmap
- Color coding is consistent (amber for primary metrics, red for errors)
- Auto-refresh indicator shows "Auto-refresh: 30s"
### Issues Found
**Important:**
- **Application Health table row click is blocked by overlapping elements.** Playwright detected `_tableSection` and `_chartGrid` divs intercepting pointer events on the table row. While JavaScript `.click()` works, this means CSS `pointer-events` or `z-index` is wrong -- real mouse clicks may be unreliable depending on scroll position.
- **SLA Compliance 0.0% shows a "BREACH" label** in the L2 view, but the SLA threshold is not explained unless you look closely at the latency chart. The SLA threshold (300ms) should be shown next to the KPI, not just in the chart.
- **7-Day Pattern heatmap is flat/empty.** The heatmap shows data only for the current day, making it look broken for a fresh deployment. Consider showing "Insufficient data" when less than 2 days of data exist.
- **"Application Volume vs SLA Compliance" bubble chart** truncates long application names (e.g., "complex-fulfil..." in L2). The chart has limited space for labels.
**Nice-to-have:**
- **No trend arrows on KPI values in L2.** The L1 dashboard shows up/down arrows (all "up"), but L2 KPIs show percentage change text instead. The two levels should be consistent.
- **P99 latency 6695ms is not formatted as seconds.** Values over 1000ms should display as "6.7s" for readability. The L2 view also shows raw milliseconds (1345ms), so both dashboard levels are inconsistent with the exchange list, which does format durations.
- **Throughput numbers use locale-specific formatting.** In the route table: `1.050` (German locale?) vs `14.377` -- these look like decimal numbers rather than thousands. Consider using explicit thousands separator or always using K suffix.
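The two formatting suggestions above can be sketched as a pair of small helpers. Both names (`formatDuration`, `formatCount`) are hypothetical, and the one-decimal precision is an assumption chosen to match the "6.7s" example; the dashboard's real formatters are not shown in this audit.

```typescript
// Hypothetical helper: milliseconds below one second stay as "ms", otherwise seconds with one decimal.
function formatDuration(ms: number): string {
  if (ms < 1000) return `${Math.round(ms)}ms`;
  return `${(ms / 1000).toFixed(1)}s`;
}

// Hypothetical helper: locale-independent count with a K suffix from 1000 upwards.
function formatCount(n: number): string {
  if (n < 1000) return String(n);
  return `${(n / 1000).toFixed(1)}K`;
}
```

Using a K suffix avoids the ambiguity of a locale-specific thousands separator, since `14.4K` cannot be misread as a decimal fraction.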
---
## 3. Runtime Tab
**Screenshots:** `09-runtime-tab.png`, `09-runtime-full.png`, `10-runtime-agent-detail.png`, `24-runtime-agent-detail-full.png`
### What Works Well
- KPI strip: Total Agents (3), Applications (1), Active Routes (30/0), Total TPS (4.8), Dead (0) -- clear at a glance
- Agent state indicators are clear: green "LIVE" badges, "3/3 LIVE" summary
- Instance table shows key metrics: State, Uptime, TPS, Errors, Heartbeat
- Clicking an agent row navigates to a rich detail view with 6 charts (CPU, Memory, Throughput, Error Rate, Thread Count, GC Pauses)
- Agent capabilities displayed as badges (LOGFORWARDING, DIAGRAMS, TRACING, METRICS)
- Application Log viewer with level filtering (Error/Warn/Info/Debug/Trace) and auto-scroll
- Timeline shows agent events (CONFIG_APPLIED, COMMAND_SUCCESS) with relative timestamps
### Issues Found
**Critical:**
- **GC Pauses chart X-axis is unreadable.** The chart renders ~60 full ISO-8601 timestamps (`2026-04-09T14:16:00Z` through `2026-04-09T15:15:00Z`) as X-axis labels. These overlap completely and form an unreadable block of text. All other charts use concise numeric labels (e.g., "12", "24"). The GC Pauses chart should use the same time formatting.
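A minimal sketch of how the GC Pauses labels could be shortened and thinned out is shown below. The helper names are hypothetical, the "HH:mm" UTC format is an assumption (the other charts may use a different label style), and how the labels are handed to the chart component is not covered.

```typescript
// Formats an ISO-8601 timestamp as a short UTC "HH:mm" label.
function shortTimeLabel(iso: string): string {
  const d = new Date(iso);
  const hh = ('0' + d.getUTCHours()).slice(-2);
  const mm = ('0' + d.getUTCMinutes()).slice(-2);
  return `${hh}:${mm}`;
}

// Keeps at most maxTicks evenly spaced labels and blanks out the rest.
// Assumes maxTicks is a positive integer.
function thinLabels(labels: string[], maxTicks: number): string[] {
  const step = Math.max(1, Math.ceil(labels.length / maxTicks));
  return labels.map((label, i) => (i % step === 0 ? label : ''));
}
```

With roughly 60 one-minute buckets and `maxTicks` set to 6, only every tenth label would be drawn.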
**Important:**
- **Agent state shows "UNKNOWN" alongside "LIVE".** The detail view shows both "LIVE" and "UNKNOWN" state indicators. The "UNKNOWN" appears to be a secondary state field (perhaps container state?) but it is confusing to show two conflicting states without explanation.
- **Memory chart shows absolute MB values but no percentage on Y-axis.** The KPI shows "46% / 57 MB / 124 MB" which is great, but the chart Y-axis goes from 0-68 MB which doesn't match the 124 MB limit. The max heap should be indicated on the chart (e.g., as a reference line).
- **Throughput chart Y-axis scale is wildly mismatched.** The KPI shows 2.0 msg/s but the Y-axis goes to 1.2k msg/s, making the actual data appear as a flat line near zero. The Y-axis should auto-scale to the actual data range.
- **Error Rate chart Y-axis shows "err/h"** while the KPI shows a percentage ("1.7%"). The unit mismatch between chart and KPI is confusing.
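For the throughput Y-axis, auto-scaling usually means deriving the axis maximum from the data rather than from a fixed value. The sketch below is one common approach (rounding up to a 1/2/5 step), not the chart library's actual behavior; `niceAxisMax` and the 10% headroom are assumptions.

```typescript
// Hypothetical helper: rounds the data maximum (plus headroom) up to 1, 2, 5 or 10 times a power of ten.
function niceAxisMax(dataMax: number, headroom = 1.1): number {
  if (!(dataMax > 0)) return 1; // empty or all-zero series: fall back to a unit axis
  const target = dataMax * headroom;
  // Find the largest power of ten that does not exceed the target.
  let magnitude = 1;
  while (magnitude * 10 <= target) magnitude *= 10;
  while (magnitude > target) magnitude /= 10;
  const fraction = target / magnitude; // in the range [1, 10)
  const nice = fraction <= 1 ? 1 : fraction <= 2 ? 2 : fraction <= 5 ? 5 : 10;
  return nice * magnitude;
}
```

Under these assumptions a series peaking at 2.0 msg/s gets an axis maximum of 5 instead of 1.2k, so the line is no longer flattened against zero.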
**Nice-to-have:**
- **"DEAD 0" KPI in the overview is redundant** when "all healthy" text is already shown below it. Consider combining or removing the redundant label.
- **Application Log shows "0 entries"** in the overview but "100 entries" in the agent detail. The overview log may not aggregate across agents, which is misleading.
---
## 4. Deployments Tab
**Screenshots:** `12-deployments-list.png`, `25-app-detail.png`, `11-deployments-tab.png`
### What Works Well
- App list is clean: Name, Environment (with colored badges DEFAULT/DEVELOPMENT), Updated, Created columns
- App detail page shows configuration tabs: Monitoring, Resources, Variables, Traces & Taps, Route Recording
- Read-only mode with explicit "Edit" button prevents accidental changes
- "Upload JAR" and "Delete App" action buttons are visible
- Create Application form (`/apps/new`) is comprehensive with Identity & Artifact section, deploy toggle, and monitoring sub-tabs
### Issues Found
**Important:**
- **Navigating to `/server/apps` redirected to `/server/apps/new`** on the initial visit, bypassing the apps list. This happened once but not consistently. The default route for the Deployments tab should always be the list view, not the create form.
- **No deployment status/progress visible in the list.** The "RUNNING" status is shown only in the detail view, not in the apps list. The list should show the deployment status directly (RUNNING/STOPPED/FAILED badge per row).
- **"Updated: 59m ago" is relative time** which becomes stale if the page is left open. Consider showing absolute timestamp on hover.
**Nice-to-have:**
- **Configuration form select dropdowns** (Engine Level, Payload Capture, App Log Level, etc.) all use native HTML selects with a custom `"triangle"` indicator -- this is inconsistent with the design system's `Select` component used elsewhere.
- **"External URL" field shows `/default/.../`** placeholder which is cryptic. Should show the full resolved URL or explain the pattern.
---
## 5. Command Palette (Ctrl+K)
**Screenshots:** `14-command-palette.png`, `15-command-palette-search.png`, `16-command-palette-keyboard.png`
### What Works Well
- Opens instantly with Ctrl+K
- Shows categorized results: All (24), Applications (1), Exchanges (10), Routes (10), Agents (3)
- Search is fast and filters results in real-time (typed "error" -> filtered to 11 results)
- Search term highlighting (yellow background on matched text)
- Keyboard navigation works (ArrowDown moves selection)
- Rich result items: exchange IDs with status, routes with app name and exchange count, applications with agent count
- Escape closes the palette
- Category tabs allow filtering by type
### Issues Found
**Nice-to-have:**
- **Exchange IDs in search results are full hex strings.** The same issue as the exchanges table -- `5EF55FC31352A9A-000000000001F07C` is hard to scan. Show a shorter preview.
- **No keyboard shortcut hints in results.** Results don't show "Enter to open" or "Tab to switch category" -- users must discover these by trial.
- **Category counts don't update when filtering.** When I typed "error", the category tabs still show the original counts (Applications, Exchanges 10, Routes 1, Agents) but some categories become empty. The empty categories should hide or dim.
---
## 6. Dark Mode
**Screenshots:** `17-dark-mode-exchanges.png`, `18-dark-mode-dashboard.png`, `19-dark-mode-runtime.png`
### What Works Well
- Dark mode applies cleanly across all pages
- Table rows have good contrast (light text on dark background)
- Status badges (OK green, ERR red) remain clearly visible
- Chart lines and data points are visible against dark backgrounds
- KPI cards have distinct dark card backgrounds with readable text
- The dark mode toggle is easy to find (moon icon in header)
- Theme preference persists in localStorage (`cameleer-theme`)
### Issues Found
**Important:**
- **Chart backgrounds appear as opaque dark cards but chart lines may be harder to see.** The throughput and error rate charts use amber/orange lines on dark gray backgrounds -- this is acceptable but not ideal. Consider slightly brighter chart colors in dark mode.
- **Application Volume vs SLA chart** in dashboard: the bubble/bar labels may have low contrast in dark mode (hard to verify at screenshot resolution).
**Nice-to-have:**
- **Sidebar border/separator** between the sidebar and main content area is very subtle in dark mode. A slightly more visible divider would help.
- **Environment badges** (DEFAULT in gold, DEVELOPMENT in orange) are designed for light mode and may look less distinct against the dark background.
---
## 7. Cross-Cutting Interaction Issues
### Status Filter Buttons (OK/Warn/Error/Running)
**Screenshots:** `03-exchanges-error-filtered.png`
**Important:**
- **Error filter works correctly** -- clicking the Error button filters to show only ERR exchanges (447 in the test). The button shows active/pressed state.
- **Filter state is not preserved in URL.** Navigating away and back loses the filter. Consider encoding active filters in the URL query string.
- **KPI strip does not update when filter is active.** When Error filter is active, the KPI strip still shows overall stats (Total 23.4K, Err% 1.9%). It should either update to show filtered stats or clearly indicate it shows overall stats.
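Encoding the active filters in the query string could be sketched as follows. The `status` parameter name, the lowercase filter values, and both function names are assumptions for illustration; wiring this into the router is left out.

```typescript
type StatusFilter = 'ok' | 'warn' | 'error' | 'running';
const ALL_FILTERS: StatusFilter[] = ['ok', 'warn', 'error', 'running'];

// Serializes the active filters into a query string (empty when no filter is active).
function filtersToQuery(active: StatusFilter[]): string {
  const params = new URLSearchParams();
  if (active.length > 0) params.set('status', active.join(','));
  return params.toString();
}

// Parses the query string back into known filters, ignoring unknown values.
function filtersFromQuery(search: string): StatusFilter[] {
  const raw = new URLSearchParams(search).get('status') ?? '';
  const requested = raw.split(',');
  return ALL_FILTERS.filter((f) => requested.indexOf(f) !== -1);
}
```

Parsing against a fixed list means a hand-edited or stale URL degrades to "no filter" rather than to an invalid state.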
### Column Sorting
**Screenshot:** `23-sorting-route.png`
- Sorting works correctly (Route column sorted alphabetically, "audit-log" rows grouped)
- Sort indicator arrow is visible on the column header
- **Sorting is client-side only (within the 50-row page).** With 23K+ exchanges, sorting only the visible page is misleading. Consider either fetching sorted data from the server or clearly labeling "sorted within current page."
### Pagination
- Pagination works: "1-25 of 50", page 1/2, rows per page selector (10/25/50/100)
- Next/Previous page buttons work
- **"50 of 23,485 exchanges" label is confusing.** The "50" refers to the server-side limit (max fetched), not the page size (25). This should read "Showing 1-25 of 23,485" or similar.
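The suggested "Showing 1-25 of 23,485" wording can be derived from the page number, page size, and server-side total. This is a sketch with hypothetical helper names; it assumes 1-based page numbers and that the real total, not the fetch limit, is available to the table.

```typescript
// Inserts a comma every three digits, independent of the browser locale.
function withThousands(n: number): string {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

// Hypothetical helper: builds the range label for a 1-based page number.
function pageRangeLabel(page: number, pageSize: number, total: number): string {
  if (total === 0) return 'No exchanges';
  const first = (page - 1) * pageSize + 1;
  const last = Math.min(page * pageSize, total); // clamp on the final page
  return `Showing ${withThousands(first)}-${withThousands(last)} of ${withThousands(total)}`;
}
```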
### Sidebar App Tree
**Screenshot:** `20-sidebar-expanded.png`
- Expand/collapse works for "sample app"
- Shows all 10 routes with exchange counts (audit-log 5.3k, file-processing 114.2k, etc.)
- Exchange counts use K-suffix formatting which is good
- **Add to starred button is present** (star icon on the app)
### Environment Selector
- Dropdown works: All Envs / default / development
- Switching environment correctly filters data (65K -> 3.5K exchanges)
- Selection persists in localStorage
### Time Range Pills
**Screenshot:** `21-time-range-3h.png`
- Time range pills work (1h, 3h, 6h, Today, 24h, 7d)
- Switching updates data and KPI strip correctly
- Custom date range is shown: "9. Apr. 16:14 -- now" with clickable start/end timestamps
- **Date formatting uses European style** ("9. Apr. 16:14") which is fine but inconsistent with ISO timestamps elsewhere.
---
## 8. Systematic Navigation Bug
**Critical:**
During the audit, the browser consistently auto-redirected from any page to `/server/admin/rbac` (Users & Roles) after interactions involving the Playwright accessibility snapshot tool. This happened:
- After taking snapshots of the exchanges page
- After clicking exchange detail rows
- After interacting with filter buttons
- After attempting to click table rows
The redirect does **not** happen when using only JavaScript-based interactions (`page.evaluate`) without the Playwright snapshot/click methods. The root cause appears to be that the Playwright MCP accessibility snapshot tool triggers focus/click events on sidebar items (specifically "Users & Roles"), causing unintended navigation.
**While this is likely a tool interaction artifact rather than a real user-facing bug**, it reveals that:
1. The sidebar tree items may have overly aggressive focus/activation behavior (activating on focus rather than explicit click)
2. There may be no route guard preventing unexpected navigation when the user hasn't explicitly clicked a sidebar item
Recommend investigating whether keyboard focus on sidebar tree items triggers navigation (it should require Enter/click, not just focus).
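The intended rule (navigate on explicit activation, never on focus alone) can be stated as a small predicate. This is a sketch only: `SidebarEvent` and `shouldNavigate` are hypothetical names, and the actual sidebar component's event handling has not been inspected.

```typescript
// Simplified view of the events a sidebar tree item can receive.
type SidebarEvent =
  | { type: 'focus' }
  | { type: 'click' }
  | { type: 'keydown'; key: string };

// Hypothetical predicate: only a click or the Enter key activates navigation.
function shouldNavigate(event: SidebarEvent): boolean {
  if (event.type === 'click') return true;
  if (event.type === 'keydown') return event.key === 'Enter';
  return false; // focus alone never navigates
}
```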
---
## Summary of Issues by Severity
### Critical (1)
1. **GC Pauses chart X-axis renders ~60 full ISO timestamps** -- completely unreadable (Runtime > Agent Detail)
### Important (10)
1. **Exchange ID columns are too wide** -- 33-char hex strings push table layout (Exchanges)
2. **Attributes column always shows "--"** -- wastes space (Exchanges)
3. **Status terminology mismatch** -- "OK/ERR" in table vs "COMPLETED/FAILED" in detail (Exchange Detail)
4. **Dashboard table row clicks intercepted by overlapping divs** -- z-index/pointer-events issue (Dashboard)
5. **SLA threshold not shown on KPI** -- have to find it in the chart (Dashboard L2)
6. **Agent state shows "UNKNOWN" alongside "LIVE"** -- confusing dual state (Runtime Agent Detail)
7. **Throughput chart Y-axis scale mismatch** -- 2 msg/s data on 1.2k scale, appears flat (Runtime Agent Detail)
8. **Error Rate chart unit mismatch** -- "err/h" on chart vs "%" on KPI (Runtime Agent Detail)
9. **Filter state not preserved in URL** (Exchanges)
10. **"50 of 23,485 exchanges" pagination label is confusing** (Exchanges)
### Nice-to-have (12)
1. No breadcrumb update when exchange selected
2. No action buttons (Replay/Trace) on exchange detail
3. No explicit Close/Back button on detail panel
4. P99 latency not formatted as seconds when >1000ms
5. Throughput numbers use locale-specific decimal formatting
6. 7-Day Pattern heatmap appears empty with limited data
7. Exchange IDs in command palette are full hex strings
8. No keyboard shortcut hints in command palette results
9. Sidebar border subtle in dark mode
10. Deployment list doesn't show status badges
11. "Updated: 59m ago" relative time goes stale
12. Category counts in command palette don't update when filtering

@@ -48,14 +48,10 @@
<artifactId>flyway-database-postgresql</artifactId>
</dependency>
<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-java</artifactId>
<version>2.19.0</version>
</dependency>
<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-rest-client</artifactId>
<version>2.19.0</version>
<groupId>com.clickhouse</groupId>
<artifactId>clickhouse-jdbc</artifactId>
<version>0.9.7</version>
<classifier>all</classifier>
</dependency>
<dependency>
<groupId>org.springdoc</groupId>
@@ -90,6 +86,10 @@
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-security</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-oauth2-resource-server</artifactId>
</dependency>
<dependency>
<groupId>com.nimbusds</groupId>
<artifactId>nimbus-jose-jwt</artifactId>
@@ -121,11 +121,20 @@
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.opensearch</groupId>
<artifactId>opensearch-testcontainers</artifactId>
<version>2.1.1</version>
<groupId>org.testcontainers</groupId>
<artifactId>testcontainers-clickhouse</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.docker-java</groupId>
<artifactId>docker-java-core</artifactId>
<version>3.4.1</version>
</dependency>
<dependency>
<groupId>com.github.docker-java</groupId>
<artifactId>docker-java-transport-zerodep</artifactId>
<version>3.4.1</version>
</dependency>
<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>

@@ -5,6 +5,7 @@ import com.cameleer3.server.app.config.IngestionConfig;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.annotation.EnableScheduling;
/**
@@ -16,6 +17,7 @@ import org.springframework.scheduling.annotation.EnableScheduling;
"com.cameleer3.server.app",
"com.cameleer3.server.core"
})
@EnableAsync
@EnableScheduling
@EnableConfigurationProperties({IngestionConfig.class, AgentRegistryConfig.class})
public class Cameleer3ServerApplication {

@@ -39,7 +39,7 @@ public class AgentLifecycleMonitor {
// Snapshot states before lifecycle check
Map<String, AgentState> statesBefore = new HashMap<>();
for (AgentInfo agent : registryService.findAll()) {
statesBefore.put(agent.id(), agent.state());
statesBefore.put(agent.instanceId(), agent.state());
}
registryService.checkLifecycle();
@@ -47,12 +47,12 @@ public class AgentLifecycleMonitor {
// Detect transitions and record events
for (AgentInfo agent : registryService.findAll()) {
AgentState before = statesBefore.get(agent.id());
AgentState before = statesBefore.get(agent.instanceId());
if (before != null && before != agent.state()) {
String eventType = mapTransitionEvent(before, agent.state());
if (eventType != null) {
agentEventService.recordEvent(agent.id(), agent.application(), eventType,
agent.name() + " " + before + " -> " + agent.state());
agentEventService.recordEvent(agent.instanceId(), agent.applicationId(), eventType,
agent.displayName() + " " + before + " -> " + agent.state());
}
}
}

@@ -0,0 +1,26 @@
package com.cameleer3.server.app.analytics;
import com.cameleer3.server.app.storage.ClickHouseUsageTracker;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
public class UsageFlushScheduler {
private static final Logger log = LoggerFactory.getLogger(UsageFlushScheduler.class);
private final ClickHouseUsageTracker tracker;
public UsageFlushScheduler(ClickHouseUsageTracker tracker) {
this.tracker = tracker;
}
@Scheduled(fixedDelayString = "${cameleer.usage.flush-interval-ms:5000}")
public void flush() {
try {
tracker.flush();
} catch (Exception e) {
log.warn("Usage event flush failed: {}", e.getMessage());
}
}
}

@@ -0,0 +1,88 @@
package com.cameleer3.server.app.analytics;
import com.cameleer3.server.core.analytics.UsageEvent;
import com.cameleer3.server.core.analytics.UsageTracker;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.web.servlet.HandlerInterceptor;
import java.time.Instant;
import java.util.regex.Pattern;
/**
* Tracks authenticated UI user requests for usage analytics.
* Skips agent requests, health checks, data ingestion, and static assets.
*/
public class UsageTrackingInterceptor implements HandlerInterceptor {
private static final String START_ATTR = "usage.startNanos";
// Patterns for normalizing dynamic path segments
private static final Pattern EXCHANGE_ID = Pattern.compile(
"/[A-F0-9]{15,}-[A-F0-9]{16}(?=/|$)", Pattern.CASE_INSENSITIVE);
private static final Pattern UUID = Pattern.compile(
"/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(?=/|$)", Pattern.CASE_INSENSITIVE);
private static final Pattern HEX_HASH = Pattern.compile(
"/[0-9a-f]{32,64}(?=/|$)", Pattern.CASE_INSENSITIVE);
private static final Pattern NUMERIC_ID = Pattern.compile(
"(?<=/)(\\d{2,})(?=/|$)");
// Agent instance IDs like "cameleer3-sample-598867949d-g7nt4-1"
private static final Pattern INSTANCE_ID = Pattern.compile(
"(?<=/agents/)[^/]+(?=/)", Pattern.CASE_INSENSITIVE);
private final UsageTracker usageTracker;
public UsageTrackingInterceptor(UsageTracker usageTracker) {
this.usageTracker = usageTracker;
}
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
request.setAttribute(START_ATTR, System.nanoTime());
return true;
}
@Override
public void afterCompletion(HttpServletRequest request, HttpServletResponse response,
Object handler, Exception ex) {
String username = extractUsername();
if (username == null) return; // unauthenticated or agent request
Long startNanos = (Long) request.getAttribute(START_ATTR);
long durationMs = startNanos != null ? (System.nanoTime() - startNanos) / 1_000_000 : 0;
String path = request.getRequestURI();
String queryString = request.getQueryString();
usageTracker.track(new UsageEvent(
Instant.now(),
username,
request.getMethod(),
path,
normalizePath(path),
response.getStatus(),
durationMs,
queryString
));
}
private String extractUsername() {
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth == null || auth.getName() == null) return null;
String name = auth.getName();
// Only track UI users (user:admin), not agents
if (!name.startsWith("user:")) return null;
return name;
}
static String normalizePath(String path) {
String normalized = EXCHANGE_ID.matcher(path).replaceAll("/{id}");
normalized = UUID.matcher(normalized).replaceAll("/{id}");
normalized = HEX_HASH.matcher(normalized).replaceAll("/{hash}");
normalized = INSTANCE_ID.matcher(normalized).replaceAll("{id}");
normalized = NUMERIC_ID.matcher(normalized).replaceAll("{id}");
return normalized;
}
}
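
Note on normalizePath: the sketch below reproduces three of the patterns from the interceptor above (UUID, agent instance ID, numeric ID) in a standalone class so the replacement order can be tried in isolation. The class name PathNormalizerSketch and the sample paths are illustrative assumptions, not part of this change.

```java
import java.util.regex.Pattern;

// Hypothetical standalone copy of part of the normalization logic, for experimentation only.
final class PathNormalizerSketch {
    // Matches "/<uuid>" including the leading slash, so the replacement re-adds the slash.
    private static final Pattern UUID = Pattern.compile(
            "/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(?=/|$)",
            Pattern.CASE_INSENSITIVE);
    // Matches only the segment after "/agents/" and requires a following slash.
    private static final Pattern INSTANCE_ID = Pattern.compile("(?<=/agents/)[^/]+(?=/)");
    // Matches a purely numeric segment of at least two digits.
    private static final Pattern NUMERIC_ID = Pattern.compile("(?<=/)(\\d{2,})(?=/|$)");

    static String normalize(String path) {
        String n = UUID.matcher(path).replaceAll("/{id}");
        n = INSTANCE_ID.matcher(n).replaceAll("{id}");
        n = NUMERIC_ID.matcher(n).replaceAll("{id}");
        return n;
    }
}
```

Two properties follow directly from the patterns as written: a single-digit segment such as /pages/7 is left unchanged, and any segment directly after /agents/ that is followed by another segment is replaced, whether or not it is an instance ID.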

@@ -3,11 +3,13 @@ package com.cameleer3.server.app.config;
import com.cameleer3.server.core.agent.AgentEventRepository;
import com.cameleer3.server.core.agent.AgentEventService;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.RouteStateRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
/**
* Creates the {@link AgentRegistryService} and {@link AgentEventService} beans.
* Creates the {@link AgentRegistryService}, {@link AgentEventService},
* and {@link RouteStateRegistry} beans.
* <p>
* Follows the established pattern: core module plain class, app module bean config.
*/
@@ -27,4 +29,9 @@ public class AgentRegistryBeanConfig {
public AgentEventService agentEventService(AgentEventRepository repository) {
return new AgentEventService(repository);
}
@Bean
public RouteStateRegistry routeStateRegistry() {
return new RouteStateRegistry();
}
}

@@ -0,0 +1,54 @@
package com.cameleer3.server.app.config;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.jdbc.core.JdbcTemplate;
import javax.sql.DataSource;
@Configuration
@EnableConfigurationProperties(ClickHouseProperties.class)
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public class ClickHouseConfig {
/**
* Explicit primary PG DataSource. Required because adding a second DataSource
* (ClickHouse) prevents Spring Boot auto-configuration from creating the default one.
*/
@Bean
@Primary
public DataSource dataSource(DataSourceProperties properties) {
return properties.initializeDataSourceBuilder().build();
}
@Bean
@Primary
public JdbcTemplate jdbcTemplate(@Qualifier("dataSource") DataSource dataSource) {
return new JdbcTemplate(dataSource);
}
@Bean(name = "clickHouseDataSource")
public DataSource clickHouseDataSource(ClickHouseProperties props) {
HikariDataSource ds = new HikariDataSource();
ds.setJdbcUrl(props.getUrl());
ds.setUsername(props.getUsername());
ds.setPassword(props.getPassword());
ds.setMaximumPoolSize(props.getPoolSize());
ds.setMinimumIdle(5);
ds.setConnectionTimeout(5000);
ds.setPoolName("clickhouse-pool");
return ds;
}
@Bean(name = "clickHouseJdbcTemplate")
public JdbcTemplate clickHouseJdbcTemplate(
@Qualifier("clickHouseDataSource") DataSource ds) {
return new JdbcTemplate(ds);
}
}

@@ -0,0 +1,24 @@
package com.cameleer3.server.app.config;
import org.springframework.boot.context.properties.ConfigurationProperties;
@ConfigurationProperties(prefix = "clickhouse")
public class ClickHouseProperties {
private String url = "jdbc:clickhouse://localhost:8123/cameleer";
private String username = "default";
private String password = "";
private int poolSize = 50;
public String getUrl() { return url; }
public void setUrl(String url) { this.url = url; }
public String getUsername() { return username; }
public void setUsername(String username) { this.username = username; }
public String getPassword() { return password; }
public void setPassword(String password) { this.password = password; }
public int getPoolSize() { return poolSize; }
public void setPoolSize(int poolSize) { this.poolSize = poolSize; }
}

@@ -0,0 +1,55 @@
package com.cameleer3.server.app.config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import java.nio.charset.StandardCharsets;
@Component
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public class ClickHouseSchemaInitializer {
private static final Logger log = LoggerFactory.getLogger(ClickHouseSchemaInitializer.class);
private final JdbcTemplate clickHouseJdbc;
public ClickHouseSchemaInitializer(
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
this.clickHouseJdbc = clickHouseJdbc;
}
@EventListener(ApplicationReadyEvent.class)
public void initializeSchema() {
try {
PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
Resource script = resolver.getResource("classpath:clickhouse/init.sql");
String sql = script.getContentAsString(StandardCharsets.UTF_8);
log.info("Executing ClickHouse schema: {}", script.getFilename());
for (String statement : sql.split(";")) {
String trimmed = statement.trim();
// Skip empty segments and comment-only segments
String withoutComments = trimmed.lines()
.filter(line -> !line.stripLeading().startsWith("--"))
.map(String::trim)
.filter(line -> !line.isEmpty())
.reduce("", (a, b) -> a + b);
if (!withoutComments.isEmpty()) {
clickHouseJdbc.execute(trimmed);
}
}
log.info("ClickHouse schema initialization complete");
} catch (Exception e) {
log.error("ClickHouse schema initialization failed — server will continue but ClickHouse features may not work", e);
}
}
}
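
Note on the statement splitting in initializeSchema: a minimal sketch of the same idea, assuming a script whose statements are separated by semicolons and whose comments are whole lines starting with "--". The class and method names are hypothetical; the real initializer executes each surviving segment through the ClickHouse JdbcTemplate instead of collecting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the split-and-filter step without any JDBC dependency.
final class SqlScriptSplitterSketch {
    static List<String> executableStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String segment : script.split(";")) {
            String trimmed = segment.trim();
            // Keep a segment only if at least one line is neither blank nor a "--" comment.
            boolean hasSql = trimmed.lines()
                    .map(String::strip)
                    .anyMatch(line -> !line.isEmpty() && !line.startsWith("--"));
            if (hasSql) {
                // As in the initializer, the full trimmed segment (comments included) is kept.
                statements.add(trimmed);
            }
        }
        return statements;
    }
}
```

Splitting on every semicolon is simple, but it also splits on a semicolon that appears inside a string literal or a comment, so the approach suits scripts that avoid both.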

@@ -1,7 +1,11 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.core.ingestion.BufferedLogEntry;
import com.cameleer3.server.core.ingestion.ChunkAccumulator;
import com.cameleer3.server.core.ingestion.MergedExecution;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@@ -19,4 +23,22 @@ public class IngestionBeanConfig {
public WriteBuffer<MetricsSnapshot> metricsBuffer(IngestionConfig config) {
return new WriteBuffer<>(config.getBufferCapacity());
}
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public WriteBuffer<MergedExecution> executionBuffer(IngestionConfig config) {
return new WriteBuffer<>(config.getBufferCapacity());
}
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public WriteBuffer<ChunkAccumulator.ProcessorBatch> processorBatchBuffer(IngestionConfig config) {
return new WriteBuffer<>(config.getBufferCapacity());
}
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public WriteBuffer<BufferedLogEntry> logBuffer(IngestionConfig config) {
return new WriteBuffer<>(config.getBufferCapacity());
}
}

@@ -0,0 +1,68 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.core.license.LicenseGate;
import com.cameleer3.server.core.license.LicenseInfo;
import com.cameleer3.server.core.license.LicenseValidator;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.nio.file.Files;
import java.nio.file.Path;
@Configuration
public class LicenseBeanConfig {
private static final Logger log = LoggerFactory.getLogger(LicenseBeanConfig.class);
@Value("${license.token:}")
private String licenseToken;
@Value("${license.file:}")
private String licenseFile;
@Value("${license.public-key:}")
private String licensePublicKey;
@Bean
public LicenseGate licenseGate() {
LicenseGate gate = new LicenseGate();
String token = resolveLicenseToken();
if (token == null || token.isBlank()) {
log.info("No license configured — running in open mode (all features enabled)");
return gate;
}
if (licensePublicKey == null || licensePublicKey.isBlank()) {
log.warn("License token provided but no public key configured (CAMELEER_LICENSE_PUBLIC_KEY). Running in open mode.");
return gate;
}
try {
LicenseValidator validator = new LicenseValidator(licensePublicKey);
LicenseInfo info = validator.validate(token);
gate.load(info);
} catch (Exception e) {
log.error("Failed to validate license: {}. Running in open mode.", e.getMessage());
}
return gate;
}
private String resolveLicenseToken() {
if (licenseToken != null && !licenseToken.isBlank()) {
return licenseToken;
}
if (licenseFile != null && !licenseFile.isBlank()) {
try {
return Files.readString(Path.of(licenseFile)).trim();
} catch (Exception e) {
log.warn("Failed to read license file {}: {}", licenseFile, e.getMessage());
}
}
return null;
}
}
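
Note on resolveLicenseToken: the precedence above (inline token first, then file, otherwise none) can be sketched as a pure function. LicenseTokenResolverSketch and its Optional return type are assumptions made for illustration; the bean config itself returns null when nothing is configured.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical stand-in for the token lookup, without Spring property injection.
final class LicenseTokenResolverSketch {
    static Optional<String> resolve(String inlineToken, String filePath) {
        // 1. An inline token wins whenever it is non-blank.
        if (inlineToken != null && !inlineToken.isBlank()) {
            return Optional.of(inlineToken);
        }
        // 2. Otherwise try the file, treating any read failure as "no license".
        if (filePath != null && !filePath.isBlank()) {
            try {
                String fromFile = Files.readString(Path.of(filePath)).trim();
                if (!fromFile.isEmpty()) {
                    return Optional.of(fromFile);
                }
            } catch (IOException | RuntimeException e) {
                // Unreadable or invalid path: fall through to the empty result.
            }
        }
        return Optional.empty();
    }
}
```

An empty result corresponds to the open mode logged by licenseGate(), in which no license is loaded into the gate.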

@@ -1,28 +0,0 @@
package com.cameleer3.server.app.config;
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;
import org.opensearch.client.json.jackson.JacksonJsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.transport.rest_client.RestClientTransport;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class OpenSearchConfig {
@Value("${opensearch.url:http://localhost:9200}")
private String opensearchUrl;
@Bean(destroyMethod = "close")
public RestClient opensearchRestClient() {
return RestClient.builder(HttpHost.create(opensearchUrl)).build();
}
@Bean
public OpenSearchClient openSearchClient(RestClient restClient) {
var transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
return new OpenSearchClient(transport);
}
}

@@ -0,0 +1,27 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.app.storage.PostgresClaimMappingRepository;
import com.cameleer3.server.core.rbac.ClaimMappingRepository;
import com.cameleer3.server.core.rbac.ClaimMappingService;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
/**
* Creates the {@link ClaimMappingRepository} and {@link ClaimMappingService} beans.
* <p>
* Follows the established pattern: core module plain class, app module bean config.
*/
@Configuration
public class RbacBeanConfig {
@Bean
public ClaimMappingRepository claimMappingRepository(JdbcTemplate jdbcTemplate) {
return new PostgresClaimMappingRepository(jdbcTemplate);
}
@Bean
public ClaimMappingService claimMappingService() {
return new ClaimMappingService();
}
}

@@ -0,0 +1,77 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.app.storage.PostgresAppRepository;
import com.cameleer3.server.app.storage.PostgresAppVersionRepository;
import com.cameleer3.server.app.storage.PostgresDeploymentRepository;
import com.cameleer3.server.app.storage.PostgresEnvironmentRepository;
import com.cameleer3.server.core.runtime.AppRepository;
import com.cameleer3.server.core.runtime.AppService;
import com.cameleer3.server.core.runtime.AppVersionRepository;
import com.cameleer3.server.core.runtime.DeploymentRepository;
import com.cameleer3.server.core.runtime.DeploymentService;
import com.cameleer3.server.core.runtime.EnvironmentRepository;
import com.cameleer3.server.core.runtime.EnvironmentService;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import java.util.concurrent.Executor;
/**
* Creates runtime management beans: repositories, services, and async executor.
* <p>
* Follows the established pattern: core module plain class, app module bean config.
*/
@Configuration
public class RuntimeBeanConfig {
@Bean
public EnvironmentRepository environmentRepository(JdbcTemplate jdbc, ObjectMapper objectMapper) {
return new PostgresEnvironmentRepository(jdbc, objectMapper);
}
@Bean
public AppRepository appRepository(JdbcTemplate jdbc, ObjectMapper objectMapper) {
return new PostgresAppRepository(jdbc, objectMapper);
}
@Bean
public AppVersionRepository appVersionRepository(JdbcTemplate jdbc) {
return new PostgresAppVersionRepository(jdbc);
}
@Bean
public DeploymentRepository deploymentRepository(JdbcTemplate jdbc, ObjectMapper objectMapper) {
return new PostgresDeploymentRepository(jdbc, objectMapper);
}
@Bean
public EnvironmentService environmentService(EnvironmentRepository repo) {
return new EnvironmentService(repo);
}
@Bean
public AppService appService(AppRepository appRepo, AppVersionRepository versionRepo,
@Value("${cameleer.runtime.jar-storage-path:/data/jars}") String jarStoragePath) {
return new AppService(appRepo, versionRepo, jarStoragePath);
}
@Bean
public DeploymentService deploymentService(DeploymentRepository deployRepo, AppService appService, EnvironmentService envService) {
return new DeploymentService(deployRepo, appService, envService);
}
@Bean(name = "deploymentTaskExecutor")
public Executor deploymentTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(4);
executor.setMaxPoolSize(4);
executor.setQueueCapacity(25);
executor.setThreadNamePrefix("deploy-");
executor.initialize();
return executor;
}
}

@@ -1,16 +1,37 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.app.search.ClickHouseLogStore;
import com.cameleer3.server.app.storage.ClickHouseAgentEventRepository;
import com.cameleer3.server.app.storage.ClickHouseUsageTracker;
import com.cameleer3.server.app.storage.ClickHouseDiagramStore;
import com.cameleer3.server.app.storage.ClickHouseMetricsQueryStore;
import com.cameleer3.server.app.storage.ClickHouseMetricsStore;
import com.cameleer3.server.app.storage.ClickHouseStatsStore;
import com.cameleer3.server.core.admin.AuditRepository;
import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.agent.AgentEventRepository;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.detail.DetailService;
import com.cameleer3.server.core.indexing.SearchIndexer;
import com.cameleer3.server.app.ingestion.ExecutionFlushScheduler;
import com.cameleer3.server.app.search.ClickHouseSearchIndex;
import com.cameleer3.server.app.storage.ClickHouseExecutionStore;
import com.cameleer3.server.core.ingestion.BufferedLogEntry;
import com.cameleer3.server.core.ingestion.ChunkAccumulator;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.cameleer3.server.core.ingestion.MergedExecution;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import com.cameleer3.server.core.storage.*;
import com.cameleer3.server.core.storage.LogIndex;
import com.cameleer3.server.core.storage.StatsStore;
import com.cameleer3.server.core.storage.model.MetricsSnapshot;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
@Configuration
public class StorageBeanConfig {
@@ -22,8 +43,8 @@ public class StorageBeanConfig {
@Bean(destroyMethod = "shutdown")
public SearchIndexer searchIndexer(ExecutionStore executionStore, SearchIndex searchIndex,
@Value("${opensearch.debounce-ms:2000}") long debounceMs,
@Value("${opensearch.queue-size:10000}") int queueSize) {
@Value("${cameleer.indexer.debounce-ms:2000}") long debounceMs,
@Value("${cameleer.indexer.queue-size:10000}") int queueSize) {
return new SearchIndexer(executionStore, searchIndex, debounceMs, queueSize);
}
@@ -41,4 +62,128 @@ public class StorageBeanConfig {
return new IngestionService(executionStore, diagramStore, metricsBuffer,
searchIndexer::onExecutionUpdated, bodySizeLimit);
}
@Bean
public MetricsStore clickHouseMetricsStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseMetricsStore(tenantProperties.getId(), clickHouseJdbc);
}
@Bean
public MetricsQueryStore clickHouseMetricsQueryStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseMetricsQueryStore(tenantProperties.getId(), clickHouseJdbc);
}
// ── Execution Store ──────────────────────────────────────────────────
@Bean
public ClickHouseExecutionStore clickHouseExecutionStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseExecutionStore(tenantProperties.getId(), clickHouseJdbc);
}
@Bean
public ChunkAccumulator chunkAccumulator(
TenantProperties tenantProperties,
WriteBuffer<MergedExecution> executionBuffer,
WriteBuffer<ChunkAccumulator.ProcessorBatch> processorBatchBuffer,
DiagramStore diagramStore,
AgentRegistryService registryService) {
return new ChunkAccumulator(
tenantProperties.getId(),
executionBuffer::offerOrWarn,
processorBatchBuffer::offerOrWarn,
diagramStore,
java.time.Duration.ofMinutes(5),
instanceId -> {
AgentInfo agent = registryService.findById(instanceId);
return agent != null && agent.environmentId() != null
? agent.environmentId() : "default";
});
}
@Bean
public ExecutionFlushScheduler executionFlushScheduler(
WriteBuffer<MergedExecution> executionBuffer,
WriteBuffer<ChunkAccumulator.ProcessorBatch> processorBatchBuffer,
WriteBuffer<BufferedLogEntry> logBuffer,
ClickHouseExecutionStore executionStore,
ClickHouseLogStore logStore,
ChunkAccumulator accumulator,
IngestionConfig config) {
return new ExecutionFlushScheduler(executionBuffer, processorBatchBuffer,
logBuffer, executionStore, logStore, accumulator, config);
}
@Bean
public SearchIndex clickHouseSearchIndex(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseSearchIndex(tenantProperties.getId(), clickHouseJdbc);
}
// ── ClickHouse Stats Store ─────────────────────────────────────────
@Bean
public StatsStore clickHouseStatsStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseStatsStore(tenantProperties.getId(), clickHouseJdbc);
}
// ── ClickHouse Diagram Store ──────────────────────────────────────
@Bean
public DiagramStore clickHouseDiagramStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseDiagramStore(tenantProperties.getId(), clickHouseJdbc);
}
// ── ClickHouse Agent Event Repository ─────────────────────────────
@Bean
public AgentEventRepository clickHouseAgentEventRepository(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseAgentEventRepository(tenantProperties.getId(), clickHouseJdbc);
}
// ── ClickHouse Log Store ──────────────────────────────────────────
@Bean
public ClickHouseLogStore clickHouseLogStore(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseLogStore(tenantProperties.getId(), clickHouseJdbc);
}
// ── Usage Analytics ──────────────────────────────────────────────
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public ClickHouseUsageTracker clickHouseUsageTracker(
TenantProperties tenantProperties,
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc) {
return new ClickHouseUsageTracker(tenantProperties.getId(), clickHouseJdbc,
new com.cameleer3.server.core.ingestion.WriteBuffer<>(5000));
}
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public com.cameleer3.server.app.analytics.UsageTrackingInterceptor usageTrackingInterceptor(
ClickHouseUsageTracker usageTracker) {
return new com.cameleer3.server.app.analytics.UsageTrackingInterceptor(usageTracker);
}
@Bean
@ConditionalOnProperty(name = "clickhouse.enabled", havingValue = "true")
public com.cameleer3.server.app.analytics.UsageFlushScheduler usageFlushScheduler(
ClickHouseUsageTracker usageTracker) {
return new com.cameleer3.server.app.analytics.UsageFlushScheduler(usageTracker);
}
}

@@ -0,0 +1,19 @@
package com.cameleer3.server.app.config;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
@Component
@ConfigurationProperties(prefix = "cameleer.tenant")
public class TenantProperties {
private String id = "default";
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
}

@@ -1,5 +1,6 @@
package com.cameleer3.server.app.config;
import com.cameleer3.server.app.analytics.UsageTrackingInterceptor;
import com.cameleer3.server.app.interceptor.AuditInterceptor;
import com.cameleer3.server.app.interceptor.ProtocolVersionInterceptor;
import org.springframework.context.annotation.Configuration;
@@ -14,11 +15,14 @@ public class WebConfig implements WebMvcConfigurer {
private final ProtocolVersionInterceptor protocolVersionInterceptor;
private final AuditInterceptor auditInterceptor;
private final UsageTrackingInterceptor usageTrackingInterceptor;
public WebConfig(ProtocolVersionInterceptor protocolVersionInterceptor,
AuditInterceptor auditInterceptor) {
AuditInterceptor auditInterceptor,
@org.springframework.lang.Nullable UsageTrackingInterceptor usageTrackingInterceptor) {
this.protocolVersionInterceptor = protocolVersionInterceptor;
this.auditInterceptor = auditInterceptor;
this.usageTrackingInterceptor = usageTrackingInterceptor;
}
@Override
@@ -35,6 +39,18 @@ public class WebConfig implements WebMvcConfigurer {
"/api/v1/agents/*/refresh"
);
// Usage analytics: tracks authenticated UI user requests
if (usageTrackingInterceptor != null) {
registry.addInterceptor(usageTrackingInterceptor)
.addPathPatterns("/api/v1/**")
.excludePathPatterns(
"/api/v1/data/**",
"/api/v1/agents/*/heartbeat",
"/api/v1/agents/*/events",
"/api/v1/health"
);
}
// Safety-net audit: catches any unaudited POST/PUT/DELETE
registry.addInterceptor(auditInterceptor)
.addPathPatterns("/api/v1/**")

@@ -3,6 +3,7 @@ package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.agent.SseConnectionManager;
import com.cameleer3.server.app.dto.CommandAckRequest;
import com.cameleer3.server.app.dto.CommandBroadcastResponse;
import com.cameleer3.server.app.dto.CommandGroupResponse;
import com.cameleer3.server.app.dto.CommandRequest;
import com.cameleer3.server.app.dto.CommandSingleResponse;
import com.cameleer3.server.app.dto.ReplayRequest;
@@ -31,6 +32,7 @@ import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
@@ -109,32 +111,61 @@ public class AgentCommandController {
@PostMapping("/groups/{group}/commands")
@Operation(summary = "Send command to all agents in a group",
description = "Sends a command to all LIVE agents in the specified group")
@ApiResponse(responseCode = "202", description = "Commands accepted")
description = "Sends a command to all LIVE agents in the specified group and waits for responses")
@ApiResponse(responseCode = "200", description = "Commands dispatched and responses collected")
@ApiResponse(responseCode = "400", description = "Invalid command payload")
public ResponseEntity<CommandBroadcastResponse> sendGroupCommand(@PathVariable String group,
@RequestBody CommandRequest request,
HttpServletRequest httpRequest) throws JsonProcessingException {
public ResponseEntity<CommandGroupResponse> sendGroupCommand(@PathVariable String group,
@RequestParam(required = false) String environment,
@RequestBody CommandRequest request,
HttpServletRequest httpRequest) throws JsonProcessingException {
CommandType type = mapCommandType(request.type());
String payloadJson = request.payload() != null ? objectMapper.writeValueAsString(request.payload()) : "{}";
List<AgentInfo> agents = registryService.findAll().stream()
.filter(a -> a.state() == AgentState.LIVE)
.filter(a -> group.equals(a.application()))
.toList();
Map<String, CompletableFuture<CommandReply>> futures =
registryService.addGroupCommandWithReplies(group, environment, type, payloadJson);
List<String> commandIds = new ArrayList<>();
for (AgentInfo agent : agents) {
AgentCommand command = registryService.addCommand(agent.id(), type, payloadJson);
commandIds.add(command.id());
if (futures.isEmpty()) {
auditService.log("broadcast_group_command", AuditCategory.AGENT, group,
java.util.Map.of("type", request.type(), "agentCount", 0),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.ok(new CommandGroupResponse(true, 0, 0, List.of(), List.of()));
}
// Wait with shared 10-second deadline
long deadline = System.currentTimeMillis() + 10_000;
List<CommandGroupResponse.AgentResponse> responses = new ArrayList<>();
List<String> timedOut = new ArrayList<>();
for (var entry : futures.entrySet()) {
long remaining = deadline - System.currentTimeMillis();
if (remaining <= 0) {
timedOut.add(entry.getKey());
entry.getValue().cancel(false);
continue;
}
try {
CommandReply reply = entry.getValue().get(remaining, TimeUnit.MILLISECONDS);
responses.add(new CommandGroupResponse.AgentResponse(
entry.getKey(), reply.status(), reply.message()));
} catch (TimeoutException e) {
timedOut.add(entry.getKey());
entry.getValue().cancel(false);
} catch (Exception e) {
responses.add(new CommandGroupResponse.AgentResponse(
entry.getKey(), "ERROR", e.getMessage()));
}
}
boolean allSuccess = timedOut.isEmpty() &&
responses.stream().allMatch(r -> "SUCCESS".equals(r.status()));
auditService.log("broadcast_group_command", AuditCategory.AGENT, group,
java.util.Map.of("type", request.type(), "agentCount", agents.size()),
java.util.Map.of("type", request.type(), "agentCount", futures.size(),
"responded", responses.size(), "timedOut", timedOut.size()),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.status(HttpStatus.ACCEPTED)
.body(new CommandBroadcastResponse(commandIds, agents.size()));
return ResponseEntity.ok(new CommandGroupResponse(
allSuccess, futures.size(), responses.size(), responses, timedOut));
}
@PostMapping("/commands")
@@ -142,16 +173,22 @@ public class AgentCommandController {
description = "Sends a command to all agents currently in LIVE state")
@ApiResponse(responseCode = "202", description = "Commands accepted")
@ApiResponse(responseCode = "400", description = "Invalid command payload")
public ResponseEntity<CommandBroadcastResponse> broadcastCommand(@RequestBody CommandRequest request,
public ResponseEntity<CommandBroadcastResponse> broadcastCommand(@RequestParam(required = false) String environment,
@RequestBody CommandRequest request,
HttpServletRequest httpRequest) throws JsonProcessingException {
CommandType type = mapCommandType(request.type());
String payloadJson = request.payload() != null ? objectMapper.writeValueAsString(request.payload()) : "{}";
List<AgentInfo> liveAgents = registryService.findByState(AgentState.LIVE);
if (environment != null) {
liveAgents = liveAgents.stream()
.filter(a -> environment.equals(a.environmentId()))
.toList();
}
List<String> commandIds = new ArrayList<>();
for (AgentInfo agent : liveAgents) {
AgentCommand command = registryService.addCommand(agent.id(), type, payloadJson);
AgentCommand command = registryService.addCommand(agent.instanceId(), type, payloadJson);
commandIds.add(command.id());
}
@@ -185,7 +222,7 @@ public class AgentCommandController {
// Record command result in agent event log
if (body != null && body.status() != null) {
AgentInfo agent = registryService.findById(id);
String application = agent != null ? agent.application() : "unknown";
String application = agent != null ? agent.applicationId() : "unknown";
agentEventService.recordEvent(id, application, "COMMAND_" + body.status(),
"Command " + commandId + ": " + body.message());
log.debug("Command {} ack from agent {}: {} - {}", commandId, id, body.status(), body.message());


@@ -32,6 +32,7 @@ public class AgentEventsController {
public ResponseEntity<List<AgentEventResponse>> getEvents(
@RequestParam(required = false) String appId,
@RequestParam(required = false) String agentId,
@RequestParam(required = false) String environment,
@RequestParam(required = false) String from,
@RequestParam(required = false) String to,
@RequestParam(defaultValue = "50") int limit) {
@@ -39,7 +40,7 @@ public class AgentEventsController {
Instant fromInstant = from != null ? Instant.parse(from) : null;
Instant toInstant = to != null ? Instant.parse(to) : null;
var events = agentEventService.queryEvents(appId, agentId, fromInstant, toInstant, limit)
var events = agentEventService.queryEvents(appId, agentId, environment, fromInstant, toInstant, limit)
.stream()
.map(AgentEventResponse::from)
.toList();


@@ -2,22 +2,23 @@ package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.dto.AgentMetricsResponse;
import com.cameleer3.server.app.dto.MetricBucket;
import org.springframework.jdbc.core.JdbcTemplate;
import com.cameleer3.server.core.storage.MetricsQueryStore;
import com.cameleer3.server.core.storage.model.MetricTimeSeries;
import org.springframework.web.bind.annotation.*;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.*;
import java.util.stream.Collectors;
@RestController
@RequestMapping("/api/v1/agents/{agentId}/metrics")
public class AgentMetricsController {
private final JdbcTemplate jdbc;
private final MetricsQueryStore metricsQueryStore;
public AgentMetricsController(JdbcTemplate jdbc) {
this.jdbc = jdbc;
public AgentMetricsController(MetricsQueryStore metricsQueryStore) {
this.metricsQueryStore = metricsQueryStore;
}
@GetMapping
@@ -32,34 +33,18 @@ public class AgentMetricsController {
if (to == null) to = Instant.now();
List<String> metricNames = Arrays.asList(names.split(","));
long intervalMs = (to.toEpochMilli() - from.toEpochMilli()) / Math.max(buckets, 1);
String intervalStr = intervalMs + " milliseconds";
Map<String, List<MetricBucket>> result = new LinkedHashMap<>();
for (String name : metricNames) {
result.put(name.trim(), new ArrayList<>());
}
Map<String, List<MetricTimeSeries.Bucket>> raw =
metricsQueryStore.queryTimeSeries(agentId, metricNames, from, to, buckets);
String sql = """
SELECT time_bucket(CAST(? AS interval), collected_at) AS bucket,
metric_name,
AVG(metric_value) AS avg_value
FROM agent_metrics
WHERE agent_id = ?
AND collected_at >= ? AND collected_at < ?
AND metric_name = ANY(?)
GROUP BY bucket, metric_name
ORDER BY bucket
""";
String[] namesArray = metricNames.stream().map(String::trim).toArray(String[]::new);
jdbc.query(sql, rs -> {
String metricName = rs.getString("metric_name");
Instant bucket = rs.getTimestamp("bucket").toInstant();
double value = rs.getDouble("avg_value");
result.computeIfAbsent(metricName, k -> new ArrayList<>())
.add(new MetricBucket(bucket, value));
}, intervalStr, agentId, Timestamp.from(from), Timestamp.from(to), namesArray);
Map<String, List<MetricBucket>> result = raw.entrySet().stream()
.collect(Collectors.toMap(
Map.Entry::getKey,
e -> e.getValue().stream()
.map(b -> new MetricBucket(b.time(), b.value()))
.toList(),
(a, b) -> a,
LinkedHashMap::new));
return new AgentMetricsResponse(result);
}
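Note on the mapping above: the controller now converts the store's buckets into DTOs with a four-argument `Collectors.toMap` that collects into a `LinkedHashMap`, so the response keeps whatever key order the store returns. The sketch below isolates that step. `StoreBucket` and `MetricMappingSketch` are hypothetical stand-ins (the real `MetricTimeSeries.Bucket` and `MetricsQueryStore` are not shown in this diff). Two things seem worth checking in review: the removed code trimmed each metric name and pre-seeded an empty list per requested name, while the new code passes `names.split(",")` through untrimmed and only returns keys the store produces, so behaviour for padded or data-less names now depends on the store.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified version of the store-to-DTO mapping.
final class MetricMappingSketch {

    /** Stand-in for the store-side bucket type (assumed shape: time + value). */
    record StoreBucket(Instant time, double value) {}

    /** Stand-in for the DTO returned to the client. */
    record MetricBucket(Instant time, double value) {}

    static Map<String, List<MetricBucket>> toDto(Map<String, List<StoreBucket>> raw) {
        return raw.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream()
                                .map(b -> new MetricBucket(b.time(), b.value()))
                                .toList(),
                        (a, b) -> a,          // keys are unique, so the merge function is never used
                        LinkedHashMap::new)); // keep the iteration order of the input map
    }

    public static void main(String[] args) {
        Map<String, List<StoreBucket>> raw = new LinkedHashMap<>();
        raw.put("jvm.heap", List.of(new StoreBucket(Instant.EPOCH, 1.5)));
        raw.put("cpu.load", List.of());
        System.out.println(toDto(raw).keySet());
    }
}
```

Order is only preserved if the input map itself iterates in a stable order, which is an assumption about the store's return type.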


@@ -7,7 +7,9 @@ import com.cameleer3.server.app.dto.AgentRefreshResponse;
import com.cameleer3.server.app.dto.AgentRegistrationRequest;
import com.cameleer3.server.app.dto.AgentRegistrationResponse;
import com.cameleer3.server.app.dto.ErrorResponse;
import com.cameleer3.common.model.HeartbeatRequest;
import com.cameleer3.server.app.security.BootstrapTokenValidator;
import com.cameleer3.server.app.security.JwtAuthenticationFilter;
import com.cameleer3.server.core.admin.AuditCategory;
import com.cameleer3.server.core.admin.AuditResult;
import com.cameleer3.server.core.admin.AuditService;
@@ -15,6 +17,7 @@ import com.cameleer3.server.core.agent.AgentEventService;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.AgentState;
import com.cameleer3.server.core.agent.RouteStateRegistry;
import com.cameleer3.server.core.security.Ed25519SigningService;
import com.cameleer3.server.core.security.InvalidTokenException;
import com.cameleer3.server.core.security.JwtService;
@@ -25,6 +28,7 @@ import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import org.slf4j.Logger;
import org.springframework.web.servlet.support.ServletUriComponentsBuilder;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.jdbc.core.JdbcTemplate;
@@ -63,6 +67,7 @@ public class AgentRegistrationController {
private final AgentEventService agentEventService;
private final AuditService auditService;
private final JdbcTemplate jdbc;
private final RouteStateRegistry routeStateRegistry;
public AgentRegistrationController(AgentRegistryService registryService,
AgentRegistryConfig config,
@@ -71,7 +76,8 @@ public class AgentRegistrationController {
Ed25519SigningService ed25519SigningService,
AgentEventService agentEventService,
AuditService auditService,
JdbcTemplate jdbc) {
@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
RouteStateRegistry routeStateRegistry) {
this.registryService = registryService;
this.config = config;
this.bootstrapTokenValidator = bootstrapTokenValidator;
@@ -80,6 +86,7 @@ public class AgentRegistrationController {
this.agentEventService = agentEventService;
this.auditService = auditService;
this.jdbc = jdbc;
this.routeStateRegistry = routeStateRegistry;
}
@PostMapping("/register")
@@ -103,34 +110,41 @@ public class AgentRegistrationController {
return ResponseEntity.status(401).build();
}
if (request.agentId() == null || request.agentId().isBlank()
|| request.name() == null || request.name().isBlank()) {
if (request.instanceId() == null || request.instanceId().isBlank()
|| request.displayName() == null || request.displayName().isBlank()) {
return ResponseEntity.badRequest().build();
}
String application = request.application() != null ? request.application() : "default";
String application = request.applicationId() != null ? request.applicationId() : "default";
String environmentId = request.environmentId() != null ? request.environmentId() : "default";
List<String> routeIds = request.routeIds() != null ? request.routeIds() : List.of();
var capabilities = request.capabilities() != null ? request.capabilities() : Collections.<String, Object>emptyMap();
AgentInfo agent = registryService.register(
request.agentId(), request.name(), application, request.version(), routeIds, capabilities);
log.info("Agent registered: {} (name={}, application={})", request.agentId(), request.name(), application);
request.instanceId(), request.displayName(), application, environmentId,
request.version(), routeIds, capabilities);
log.info("Agent registered: {} (name={}, application={})", request.instanceId(), request.displayName(), application);
agentEventService.recordEvent(request.agentId(), application, "REGISTERED",
"Agent registered: " + request.name());
agentEventService.recordEvent(request.instanceId(), application, "REGISTERED",
"Agent registered: " + request.displayName());
auditService.log(request.agentId(), "agent_register", AuditCategory.AGENT, request.agentId(),
Map.of("application", application, "name", request.name()),
auditService.log(request.instanceId(), "agent_register", AuditCategory.AGENT, request.instanceId(),
Map.of("application", application, "name", request.displayName()),
AuditResult.SUCCESS, httpRequest);
// Issue JWT tokens with AGENT role
// Issue JWT tokens with AGENT role + environment
List<String> roles = List.of("AGENT");
String accessToken = jwtService.createAccessToken(request.agentId(), application, roles);
String refreshToken = jwtService.createRefreshToken(request.agentId(), application, roles);
String accessToken = jwtService.createAccessToken(request.instanceId(), application, environmentId, roles);
String refreshToken = jwtService.createRefreshToken(request.instanceId(), application, environmentId, roles);
String sseEndpoint = ServletUriComponentsBuilder.fromCurrentContextPath()
.path("/api/v1/agents/{id}/events")
.buildAndExpand(agent.instanceId())
.toUriString();
return ResponseEntity.ok(new AgentRegistrationResponse(
agent.id(),
"/api/v1/agents/" + agent.id() + "/events",
agent.instanceId(),
sseEndpoint,
config.getHeartbeatIntervalMs(),
ed25519SigningService.getPublicKeyBase64(),
accessToken,
@@ -168,17 +182,21 @@ public class AgentRegistrationController {
return ResponseEntity.status(401).build();
}
// Verify agent exists
AgentInfo agent = registryService.findById(agentId);
if (agent == null) {
return ResponseEntity.notFound().build();
}
// Preserve roles from refresh token
// Preserve roles and application from refresh token
List<String> roles = result.roles().isEmpty()
? List.of("AGENT") : result.roles();
String newAccessToken = jwtService.createAccessToken(agentId, agent.application(), roles);
String newRefreshToken = jwtService.createRefreshToken(agentId, agent.application(), roles);
String application = result.application() != null ? result.application() : "default";
// Try to get application + environment from registry (agent may not be registered after server restart)
String environment = result.environment() != null ? result.environment() : "default";
AgentInfo agent = registryService.findById(agentId);
if (agent != null) {
application = agent.applicationId();
environment = agent.environmentId();
}
String newAccessToken = jwtService.createAccessToken(agentId, application, environment, roles);
String newRefreshToken = jwtService.createRefreshToken(agentId, application, environment, roles);
auditService.log(agentId, "agent_token_refresh", AuditCategory.AUTH, agentId,
null, AuditResult.SUCCESS, httpRequest);
@@ -188,14 +206,72 @@ public class AgentRegistrationController {
@PostMapping("/{id}/heartbeat")
@Operation(summary = "Agent heartbeat ping",
description = "Updates the agent's last heartbeat timestamp")
description = "Updates the agent's last heartbeat timestamp. Auto-registers the agent if not in registry (e.g. after server restart).")
@ApiResponse(responseCode = "200", description = "Heartbeat accepted")
@ApiResponse(responseCode = "404", description = "Agent not registered")
public ResponseEntity<Void> heartbeat(@PathVariable String id) {
boolean found = registryService.heartbeat(id);
public ResponseEntity<Void> heartbeat(@PathVariable String id,
@RequestBody(required = false) HeartbeatRequest request,
HttpServletRequest httpRequest) {
Map<String, Object> capabilities = request != null ? request.getCapabilities() : null;
String heartbeatEnv = request != null ? request.getEnvironmentId() : null;
boolean found = registryService.heartbeat(id, capabilities);
if (!found) {
// Auto-heal: re-register agent from heartbeat body + JWT claims after server restart
var jwtResult = (JwtService.JwtValidationResult) httpRequest.getAttribute(
JwtAuthenticationFilter.JWT_RESULT_ATTR);
if (jwtResult != null) {
String application = jwtResult.application() != null ? jwtResult.application() : "default";
// Prefer environment from heartbeat body (most current), fall back to JWT claim
String env = heartbeatEnv != null ? heartbeatEnv
: jwtResult.environment() != null ? jwtResult.environment() : "default";
Map<String, Object> caps = capabilities != null ? capabilities : Map.of();
registryService.register(id, id, application, env, "unknown",
List.of(), caps);
registryService.heartbeat(id);
log.info("Auto-registered agent {} (app={}, env={}) from heartbeat after server restart", id, application, env);
} else {
return ResponseEntity.notFound().build();
}
}
if (request != null && request.getRouteStates() != null && !request.getRouteStates().isEmpty()) {
AgentInfo agent = registryService.findById(id);
if (agent != null) {
for (var entry : request.getRouteStates().entrySet()) {
RouteStateRegistry.RouteState state = parseRouteState(entry.getValue());
if (state != null) {
routeStateRegistry.setState(agent.applicationId(), entry.getKey(), state);
}
}
}
}
return ResponseEntity.ok().build();
}
private RouteStateRegistry.RouteState parseRouteState(String state) {
if (state == null) return null;
return switch (state) {
case "Started" -> RouteStateRegistry.RouteState.STARTED;
case "Stopped" -> RouteStateRegistry.RouteState.STOPPED;
case "Suspended" -> RouteStateRegistry.RouteState.SUSPENDED;
default -> null;
};
}
@PostMapping("/{id}/deregister")
@Operation(summary = "Deregister agent",
description = "Removes the agent from the registry. Called by agents during graceful shutdown.")
@ApiResponse(responseCode = "200", description = "Agent deregistered")
@ApiResponse(responseCode = "404", description = "Agent not registered")
public ResponseEntity<Void> deregister(@PathVariable String id, HttpServletRequest httpRequest) {
AgentInfo agent = registryService.findById(id);
if (agent == null) {
return ResponseEntity.notFound().build();
}
String applicationId = agent.applicationId();
registryService.deregister(id);
agentEventService.recordEvent(id, applicationId, "DEREGISTERED", "Agent deregistered");
auditService.log(id, "agent_deregister", AuditCategory.AGENT, id, null, AuditResult.SUCCESS, httpRequest);
return ResponseEntity.ok().build();
}
@@ -207,7 +283,8 @@ public class AgentRegistrationController {
content = @Content(schema = @Schema(implementation = ErrorResponse.class)))
public ResponseEntity<List<AgentInstanceResponse>> listAgents(
@RequestParam(required = false) String status,
@RequestParam(required = false) String application) {
@RequestParam(required = false) String application,
@RequestParam(required = false) String environment) {
List<AgentInfo> agents;
if (status != null) {
@@ -224,7 +301,14 @@ public class AgentRegistrationController {
// Apply application filter if specified
if (application != null && !application.isBlank()) {
agents = agents.stream()
.filter(a -> application.equals(a.application()))
.filter(a -> application.equals(a.applicationId()))
.toList();
}
// Apply environment filter if specified
if (environment != null && !environment.isBlank()) {
agents = agents.stream()
.filter(a -> environment.equals(a.environmentId()))
.toList();
}
@@ -235,10 +319,10 @@ public class AgentRegistrationController {
List<AgentInstanceResponse> response = finalAgents.stream()
.map(a -> {
AgentInstanceResponse dto = AgentInstanceResponse.from(a);
double[] m = agentMetrics.get(a.application());
double[] m = agentMetrics.get(a.applicationId());
if (m != null) {
long appAgentCount = finalAgents.stream()
.filter(ag -> ag.application().equals(a.application())).count();
.filter(ag -> ag.applicationId().equals(a.applicationId())).count();
double agentTps = appAgentCount > 0 ? m[0] / appAgentCount : 0;
double errorRate = m[1];
int activeRoutes = (int) m[2];
@@ -255,25 +339,33 @@ public class AgentRegistrationController {
Instant now = Instant.now();
Instant from1m = now.minus(1, ChronoUnit.MINUTES);
try {
// Literal SQL — ClickHouse JDBC driver wraps prepared statements in sub-queries
// that strip AggregateFunction column types, breaking -Merge combinators
jdbc.query(
"SELECT application_name, " +
"SUM(total_count) AS total, " +
"SUM(failed_count) AS failed, " +
"SELECT application_id, " +
"countMerge(total_count) AS total, " +
"countIfMerge(failed_count) AS failed, " +
"COUNT(DISTINCT route_id) AS active_routes " +
"FROM stats_1m_route WHERE bucket >= ? AND bucket < ? " +
"GROUP BY application_name",
"FROM stats_1m_route WHERE bucket >= " + lit(from1m) + " AND bucket < " + lit(now) +
" GROUP BY application_id",
rs -> {
long total = rs.getLong("total");
long failed = rs.getLong("failed");
double tps = total / 60.0;
double errorRate = total > 0 ? (double) failed / total : 0.0;
int activeRoutes = rs.getInt("active_routes");
result.put(rs.getString("application_name"), new double[]{tps, errorRate, activeRoutes});
},
Timestamp.from(from1m), Timestamp.from(now));
result.put(rs.getString("application_id"), new double[]{tps, errorRate, activeRoutes});
});
} catch (Exception e) {
log.debug("Could not query agent metrics: {}", e.getMessage());
}
return result;
}
/** Format an Instant as a ClickHouse DateTime literal. */
private static String lit(Instant instant) {
return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
.withZone(java.time.ZoneOffset.UTC)
.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
}
}
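Note on `lit(Instant)`: the helper inlines timestamps as quoted UTC `yyyy-MM-dd HH:mm:ss` strings instead of binding them as parameters. The sketch below reproduces that formatting in isolation so it can be checked without a database. `ClickHouseLiteralSketch` and `statsQuery` are hypothetical names, and the query text is abbreviated from the one in the diff. Inlining is reasonable here only because the values come from `Instant` objects created on the server; the same approach would be unsafe for user-supplied strings.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical standalone copy of the literal-formatting helper.
final class ClickHouseLiteralSketch {

    // DateTimeFormatter is immutable and thread-safe, so one shared instance is enough.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Formats an Instant as a quoted, second-precision UTC timestamp. */
    static String lit(Instant instant) {
        return "'" + FORMAT.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

    /** Builds a half-open [from, to) range query with inlined literals. */
    static String statsQuery(Instant from, Instant to) {
        return "SELECT application_id, countMerge(total_count) AS total "
                + "FROM stats_1m_route WHERE bucket >= " + lit(from)
                + " AND bucket < " + lit(to)
                + " GROUP BY application_id";
    }

    public static void main(String[] args) {
        Instant to = Instant.parse("2026-04-09T17:03:53.987Z");
        System.out.println(lit(to)); // prints '2026-04-09 17:03:53'
        System.out.println(statsQuery(to.minus(1, ChronoUnit.MINUTES), to));
    }
}
```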


@@ -1,12 +1,15 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.agent.SseConnectionManager;
import com.cameleer3.server.app.security.JwtAuthenticationFilter;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.security.JwtService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
@@ -19,6 +22,9 @@ import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;
import java.util.List;
import java.util.Map;
/**
* SSE endpoint for real-time event streaming to agents.
* <p>
@@ -47,15 +53,26 @@ public class AgentSseController {
+ "Commands (config-update, deep-trace, replay) are pushed as events. "
+ "Ping keepalive comments sent every 15 seconds.")
@ApiResponse(responseCode = "200", description = "SSE stream opened")
@ApiResponse(responseCode = "404", description = "Agent not registered")
@ApiResponse(responseCode = "404", description = "Agent not registered and cannot be auto-registered")
public SseEmitter events(
@PathVariable String id,
@Parameter(description = "Last received event ID (no replay, acknowledged only)")
@RequestHeader(value = "Last-Event-ID", required = false) String lastEventId) {
@RequestHeader(value = "Last-Event-ID", required = false) String lastEventId,
HttpServletRequest httpRequest) {
AgentInfo agent = registryService.findById(id);
if (agent == null) {
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Agent not found: " + id);
// Auto-heal: re-register agent from JWT claims after server restart
var jwtResult = (JwtService.JwtValidationResult) httpRequest.getAttribute(
JwtAuthenticationFilter.JWT_RESULT_ATTR);
if (jwtResult != null) {
String application = jwtResult.application() != null ? jwtResult.application() : "default";
String env = jwtResult.environment() != null ? jwtResult.environment() : "default";
registryService.register(id, id, application, env, "unknown", List.of(), Map.of());
log.info("Auto-registered agent {} (app={}, env={}) from SSE connect after server restart", id, application, env);
} else {
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Agent not found: " + id);
}
}
if (lastEventId != null) {
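Note on the auto-heal path: both the heartbeat and the SSE endpoint now re-register an unknown agent from its JWT claims, defaulting missing claims to `"default"`, and only answer 404 when no validated JWT is attached to the request. The sketch below models that decision with an in-memory map. `Claims`, `Agent`, and `findOrReRegister` are hypothetical stand-ins for `JwtService.JwtValidationResult`, `AgentInfo`, and the registry calls, whose real signatures are only partly visible in this diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the "re-register from JWT claims" fallback.
final class AutoHealSketch {

    /** Stand-in for the validated JWT claims; either field may be null. */
    record Claims(String application, String environment) {}

    /** Stand-in for a registry entry. */
    record Agent(String instanceId, String application, String environment) {}

    private static String orDefault(String value) {
        return value != null ? value : "default";
    }

    /**
     * Returns the registered agent, re-registering it from the claims if it is unknown.
     * An empty result means the caller should respond with 404.
     */
    static Optional<Agent> findOrReRegister(Map<String, Agent> registry, String id, Claims claims) {
        Agent existing = registry.get(id);
        if (existing != null) {
            return Optional.of(existing);
        }
        if (claims == null) {
            return Optional.empty(); // nothing to rebuild the entry from
        }
        Agent healed = new Agent(id, orDefault(claims.application()), orDefault(claims.environment()));
        registry.put(id, healed);
        return Optional.of(healed);
    }

    public static void main(String[] args) {
        Map<String, Agent> registry = new HashMap<>();
        System.out.println(findOrReRegister(registry, "agent-1", new Claims("orders", null)));
        System.out.println(findOrReRegister(registry, "agent-2", null));
    }
}
```

As in the diff, a healed entry carries no route IDs, version, or capabilities until the agent sends them again, so downstream code should tolerate those being empty.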


@@ -0,0 +1,136 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.runtime.App;
import com.cameleer3.server.core.runtime.AppService;
import com.cameleer3.server.core.runtime.AppVersion;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.UUID;
/**
* App CRUD and JAR upload endpoints.
* All app-scoped endpoints accept the app slug (not UUID) as path variable.
* Protected by {@code ROLE_OPERATOR} or {@code ROLE_ADMIN}.
*/
@RestController
@RequestMapping("/api/v1/apps")
@Tag(name = "App Management", description = "Application lifecycle and JAR uploads")
@PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")
public class AppController {
private final AppService appService;
public AppController(AppService appService) {
this.appService = appService;
}
@GetMapping
@Operation(summary = "List apps by environment")
@ApiResponse(responseCode = "200", description = "App list returned")
public ResponseEntity<List<App>> listApps(@RequestParam(required = false) UUID environmentId) {
if (environmentId != null) {
return ResponseEntity.ok(appService.listByEnvironment(environmentId));
}
return ResponseEntity.ok(appService.listAll());
}
@GetMapping("/{appSlug}")
@Operation(summary = "Get app by slug")
@ApiResponse(responseCode = "200", description = "App found")
@ApiResponse(responseCode = "404", description = "App not found")
public ResponseEntity<App> getApp(@PathVariable String appSlug) {
try {
return ResponseEntity.ok(appService.getBySlug(appSlug));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping
@Operation(summary = "Create a new app")
@ApiResponse(responseCode = "201", description = "App created")
@ApiResponse(responseCode = "400", description = "Slug already exists in environment")
public ResponseEntity<App> createApp(@RequestBody CreateAppRequest request) {
try {
UUID id = appService.createApp(request.environmentId(), request.slug(), request.displayName());
return ResponseEntity.status(201).body(appService.getById(id));
} catch (IllegalArgumentException e) {
return ResponseEntity.badRequest().build();
}
}
@GetMapping("/{appSlug}/versions")
@Operation(summary = "List app versions")
@ApiResponse(responseCode = "200", description = "Version list returned")
public ResponseEntity<List<AppVersion>> listVersions(@PathVariable String appSlug) {
try {
App app = appService.getBySlug(appSlug);
return ResponseEntity.ok(appService.listVersions(app.id()));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping(value = "/{appSlug}/versions", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
@Operation(summary = "Upload a JAR for a new app version")
@ApiResponse(responseCode = "201", description = "JAR uploaded and version created")
@ApiResponse(responseCode = "404", description = "App not found")
public ResponseEntity<AppVersion> uploadJar(@PathVariable String appSlug,
@RequestParam("file") MultipartFile file) throws IOException {
try {
App app = appService.getBySlug(appSlug);
AppVersion version = appService.uploadJar(app.id(), file.getOriginalFilename(), file.getInputStream(), file.getSize());
return ResponseEntity.status(201).body(version);
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@DeleteMapping("/{appSlug}")
@Operation(summary = "Delete an app")
@ApiResponse(responseCode = "204", description = "App deleted")
public ResponseEntity<Void> deleteApp(@PathVariable String appSlug) {
try {
App app = appService.getBySlug(appSlug);
appService.deleteApp(app.id());
return ResponseEntity.noContent().build();
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PutMapping("/{appSlug}/container-config")
@Operation(summary = "Update container config for an app")
@ApiResponse(responseCode = "200", description = "Container config updated")
@ApiResponse(responseCode = "404", description = "App not found")
public ResponseEntity<App> updateContainerConfig(@PathVariable String appSlug,
@RequestBody Map<String, Object> containerConfig) {
try {
App app = appService.getBySlug(appSlug);
appService.updateContainerConfig(app.id(), containerConfig);
return ResponseEntity.ok(appService.getById(app.id()));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
public record CreateAppRequest(UUID environmentId, String slug, String displayName) {}
}


@@ -48,7 +48,7 @@ public class AppSettingsController {
@GetMapping("/{appId}")
@Operation(summary = "Get settings for a specific application (returns defaults if not configured)")
public ResponseEntity<AppSettings> getByAppId(@PathVariable String appId) {
AppSettings settings = repository.findByAppId(appId).orElse(AppSettings.defaults(appId));
AppSettings settings = repository.findByApplicationId(appId).orElse(AppSettings.defaults(appId));
return ResponseEntity.ok(settings);
}


@@ -1,13 +1,14 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.common.model.ApplicationConfig;
import com.cameleer3.server.app.dto.CommandGroupResponse;
import com.cameleer3.server.app.dto.ConfigUpdateResponse;
import com.cameleer3.server.app.dto.TestExpressionRequest;
import com.cameleer3.server.app.dto.TestExpressionResponse;
import com.cameleer3.server.app.storage.PostgresApplicationConfigRepository;
import com.cameleer3.server.core.admin.AuditCategory;
import com.cameleer3.server.core.admin.AuditResult;
import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.agent.AgentCommand;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.AgentState;
@@ -27,6 +28,7 @@ import org.springframework.http.ResponseEntity;
import org.springframework.security.core.Authentication;
import org.springframework.web.bind.annotation.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
@@ -88,23 +90,26 @@ public class ApplicationConfigController {
@Operation(summary = "Update application config",
description = "Saves config and pushes CONFIG_UPDATE to all LIVE agents of this application")
@ApiResponse(responseCode = "200", description = "Config saved and pushed")
public ResponseEntity<ApplicationConfig> updateConfig(@PathVariable String application,
@RequestBody ApplicationConfig config,
Authentication auth,
HttpServletRequest httpRequest) {
public ResponseEntity<ConfigUpdateResponse> updateConfig(@PathVariable String application,
@RequestParam(required = false) String environment,
@RequestBody ApplicationConfig config,
Authentication auth,
HttpServletRequest httpRequest) {
String updatedBy = auth != null ? auth.getName() : "system";
config.setApplication(application);
ApplicationConfig saved = configRepository.save(application, config, updatedBy);
int pushed = pushConfigToAgents(application, saved);
log.info("Config v{} saved for '{}', pushed to {} agent(s)", saved.getVersion(), application, pushed);
CommandGroupResponse pushResult = pushConfigToAgents(application, environment, saved);
log.info("Config v{} saved for '{}', pushed to {} agent(s), {} responded",
saved.getVersion(), application, pushResult.total(), pushResult.responded());
auditService.log("update_app_config", AuditCategory.CONFIG, application,
Map.of("version", saved.getVersion(), "agentsPushed", pushed),
Map.of("version", saved.getVersion(), "agentsPushed", pushResult.total(),
"responded", pushResult.responded(), "timedOut", pushResult.timedOut().size()),
AuditResult.SUCCESS, httpRequest);
return ResponseEntity.ok(saved);
return ResponseEntity.ok(new ConfigUpdateResponse(saved, pushResult));
}
@GetMapping("/{application}/processor-routes")
@@ -122,13 +127,16 @@ public class ApplicationConfigController {
@ApiResponse(responseCode = "504", description = "Agent did not respond in time")
public ResponseEntity<TestExpressionResponse> testExpression(
@PathVariable String application,
@RequestParam(required = false) String environment,
@RequestBody TestExpressionRequest request) {
// Find a LIVE agent for this application
AgentInfo agent = registryService.findAll().stream()
.filter(a -> application.equals(a.application()))
.filter(a -> a.state() == AgentState.LIVE)
.findFirst()
.orElse(null);
// Find a LIVE agent for this application, optionally filtered by environment
var candidates = registryService.findAll().stream()
.filter(a -> application.equals(a.applicationId()))
.filter(a -> a.state() == AgentState.LIVE);
if (environment != null) {
candidates = candidates.filter(a -> environment.equals(a.environmentId()));
}
AgentInfo agent = candidates.findFirst().orElse(null);
if (agent == null) {
return ResponseEntity.status(HttpStatus.NOT_FOUND)
@@ -152,7 +160,7 @@ public class ApplicationConfigController {
// Send command and await reply
CompletableFuture<CommandReply> future = registryService.addCommandWithReply(
agent.id(), CommandType.TEST_EXPRESSION, payloadJson);
agent.instanceId(), CommandType.TEST_EXPRESSION, payloadJson);
try {
CommandReply reply = future.orTimeout(5, TimeUnit.SECONDS).join();
@@ -166,30 +174,56 @@ public class ApplicationConfigController {
return ResponseEntity.status(HttpStatus.GATEWAY_TIMEOUT)
.body(new TestExpressionResponse(null, "Agent did not respond within 5 seconds"));
}
log.error("Error awaiting test-expression reply from agent {}", agent.id(), e);
log.error("Error awaiting test-expression reply from agent {}", agent.instanceId(), e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new TestExpressionResponse(null, "Internal error: " + e.getCause().getMessage()));
}
}
private int pushConfigToAgents(String application, ApplicationConfig config) {
private CommandGroupResponse pushConfigToAgents(String application, String environment, ApplicationConfig config) {
String payloadJson;
try {
payloadJson = objectMapper.writeValueAsString(config);
} catch (JsonProcessingException e) {
log.error("Failed to serialize config for push", e);
return 0;
return new CommandGroupResponse(false, 0, 0, List.of(), List.of());
}
List<AgentInfo> agents = registryService.findAll().stream()
.filter(a -> a.state() == AgentState.LIVE)
.filter(a -> application.equals(a.application()))
.toList();
Map<String, CompletableFuture<CommandReply>> futures =
registryService.addGroupCommandWithReplies(application, environment, CommandType.CONFIG_UPDATE, payloadJson);
for (AgentInfo agent : agents) {
registryService.addCommand(agent.id(), CommandType.CONFIG_UPDATE, payloadJson);
if (futures.isEmpty()) {
return new CommandGroupResponse(true, 0, 0, List.of(), List.of());
}
return agents.size();
// Wait with shared 10-second deadline
long deadline = System.currentTimeMillis() + 10_000;
List<CommandGroupResponse.AgentResponse> responses = new ArrayList<>();
List<String> timedOut = new ArrayList<>();
for (var entry : futures.entrySet()) {
long remaining = deadline - System.currentTimeMillis();
if (remaining <= 0) {
timedOut.add(entry.getKey());
entry.getValue().cancel(false);
continue;
}
try {
CommandReply reply = entry.getValue().get(remaining, TimeUnit.MILLISECONDS);
responses.add(new CommandGroupResponse.AgentResponse(
entry.getKey(), reply.status(), reply.message()));
} catch (TimeoutException e) {
timedOut.add(entry.getKey());
entry.getValue().cancel(false);
} catch (Exception e) {
responses.add(new CommandGroupResponse.AgentResponse(
entry.getKey(), "ERROR", e.getMessage()));
}
}
boolean allSuccess = timedOut.isEmpty() &&
responses.stream().allMatch(r -> "SUCCESS".equals(r.status()));
return new CommandGroupResponse(allSuccess, futures.size(), responses.size(), responses, timedOut);
}
private static ApplicationConfig defaultConfig(String application) {
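
Note on the shared-deadline wait in pushConfigToAgents: the loop gives every agent reply the same overall budget rather than a fresh timeout per future. The following is a minimal standalone sketch of that pattern, not the project's code. `SharedDeadlineDemo`, `GroupResult`, and `awaitAll` are hypothetical names, replies are plain strings, and the sketch assumes `System.nanoTime()` as a monotonic clock where the diff uses `System.currentTimeMillis()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SharedDeadlineDemo {

    /** Hypothetical result type: replies that arrived, plus keys that timed out. */
    public record GroupResult(Map<String, String> replies, List<String> timedOut) {}

    /** Waits for all futures, but never longer than timeoutMillis in total. */
    public static GroupResult awaitAll(Map<String, CompletableFuture<String>> futures,
                                       long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Map<String, String> replies = new LinkedHashMap<>();
        List<String> timedOut = new ArrayList<>();
        for (var entry : futures.entrySet()) {
            long remaining = Math.max(deadline - System.nanoTime(), 0);
            try {
                // A zero timeout still returns the value if the future is already complete.
                replies.put(entry.getKey(), entry.getValue().get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException e) {
                timedOut.add(entry.getKey());
                entry.getValue().cancel(false);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                timedOut.add(entry.getKey());
            } catch (ExecutionException | CancellationException e) {
                replies.put(entry.getKey(), "ERROR: " + e.getMessage());
            }
        }
        return new GroupResult(replies, timedOut);
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        futures.put("agent-1", CompletableFuture.completedFuture("SUCCESS"));
        futures.put("agent-2", new CompletableFuture<>()); // never completes
        GroupResult result = awaitAll(futures, 200);
        System.out.println(result.replies() + " timedOut=" + result.timedOut());
        // prints: {agent-1=SUCCESS} timedOut=[agent-2]
    }
}
```

One difference from the diff is deliberate: once the deadline has passed, the sketch still collects replies from futures that have already completed, whereas the loop in the diff marks every remaining entry as timed out without checking it.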

@@ -0,0 +1,296 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.dto.AgentSummary;
import com.cameleer3.server.app.dto.CatalogApp;
import com.cameleer3.server.app.dto.RouteSummary;
import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.AgentState;
import com.cameleer3.server.core.agent.RouteStateRegistry;
import com.cameleer3.server.core.runtime.*;
import com.cameleer3.server.core.storage.DiagramStore;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.*;
import java.util.stream.Collectors;
/**
* Unified catalog endpoint that merges App records (PostgreSQL) with live agent data
* and ClickHouse stats. Replaces the separate RouteCatalogController.
*/
@RestController
@RequestMapping("/api/v1/catalog")
@Tag(name = "Catalog", description = "Unified application catalog")
public class CatalogController {
private static final Logger log = LoggerFactory.getLogger(CatalogController.class);
private final AgentRegistryService registryService;
private final DiagramStore diagramStore;
private final JdbcTemplate jdbc;
private final RouteStateRegistry routeStateRegistry;
private final AppService appService;
private final EnvironmentService envService;
private final DeploymentRepository deploymentRepo;
public CatalogController(AgentRegistryService registryService,
DiagramStore diagramStore,
@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
RouteStateRegistry routeStateRegistry,
AppService appService,
EnvironmentService envService,
DeploymentRepository deploymentRepo) {
this.registryService = registryService;
this.diagramStore = diagramStore;
this.jdbc = jdbc;
this.routeStateRegistry = routeStateRegistry;
this.appService = appService;
this.envService = envService;
this.deploymentRepo = deploymentRepo;
}
@GetMapping
@Operation(summary = "Get unified catalog",
description = "Returns all applications (managed + unmanaged) with live agent data, routes, and deployment status")
@ApiResponse(responseCode = "200", description = "Catalog returned")
public ResponseEntity<List<CatalogApp>> getCatalog(
@RequestParam(required = false) String environment,
@RequestParam(required = false) String from,
@RequestParam(required = false) String to) {
// 1. Resolve environment
Environment env = null;
if (environment != null && !environment.isBlank()) {
try {
env = envService.getBySlug(environment);
} catch (IllegalArgumentException e) {
return ResponseEntity.ok(List.of());
}
}
// 2. Get managed apps from PostgreSQL
List<App> managedApps = env != null
? appService.listByEnvironment(env.id())
: appService.listAll();
Map<String, App> appsBySlug = managedApps.stream()
.collect(Collectors.toMap(App::slug, a -> a, (a, b) -> a));
// 3. Get active deployments for managed apps
Map<UUID, Deployment> activeDeployments = new HashMap<>();
for (App app : managedApps) {
UUID envId = env != null ? env.id() : app.environmentId();
deploymentRepo.findActiveByAppIdAndEnvironmentId(app.id(), envId)
.ifPresent(d -> activeDeployments.put(app.id(), d));
}
// 4. Get agents, filter by environment
List<AgentInfo> allAgents = registryService.findAll();
if (environment != null && !environment.isBlank()) {
allAgents = allAgents.stream()
.filter(a -> environment.equals(a.environmentId()))
.toList();
}
Map<String, List<AgentInfo>> agentsByApp = allAgents.stream()
.collect(Collectors.groupingBy(AgentInfo::applicationId, LinkedHashMap::new, Collectors.toList()));
// 5. Collect routes per app from agents
Map<String, Set<String>> routesByApp = new LinkedHashMap<>();
for (var entry : agentsByApp.entrySet()) {
Set<String> routes = new LinkedHashSet<>();
for (AgentInfo agent : entry.getValue()) {
if (agent.routeIds() != null) routes.addAll(agent.routeIds());
}
routesByApp.put(entry.getKey(), routes);
}
// 6. ClickHouse exchange counts
Instant now = Instant.now();
Instant rangeFrom = from != null ? Instant.parse(from) : now.minus(24, ChronoUnit.HOURS);
Instant rangeTo = to != null ? Instant.parse(to) : now;
Map<String, Long> routeExchangeCounts = new LinkedHashMap<>();
Map<String, Instant> routeLastSeen = new LinkedHashMap<>();
try {
String envFilter = (environment != null && !environment.isBlank())
? " AND environment = " + lit(environment) : "";
jdbc.query(
"SELECT application_id, route_id, countMerge(total_count) AS cnt, MAX(bucket) AS last_seen " +
"FROM stats_1m_route WHERE bucket >= " + lit(rangeFrom) + " AND bucket < " + lit(rangeTo) +
envFilter + " GROUP BY application_id, route_id",
rs -> {
String key = rs.getString("application_id") + "/" + rs.getString("route_id");
routeExchangeCounts.put(key, rs.getLong("cnt"));
Timestamp ts = rs.getTimestamp("last_seen");
if (ts != null) routeLastSeen.put(key, ts.toInstant());
});
} catch (Exception e) {
log.warn("Failed to query route exchange counts: {}", e.getMessage());
}
// Merge ClickHouse routes into routesByApp
for (var countEntry : routeExchangeCounts.entrySet()) {
String[] parts = countEntry.getKey().split("/", 2);
if (parts.length == 2) {
routesByApp.computeIfAbsent(parts[0], k -> new LinkedHashSet<>()).add(parts[1]);
}
}
// 7. Build unified catalog
Set<String> allSlugs = new LinkedHashSet<>(appsBySlug.keySet());
allSlugs.addAll(agentsByApp.keySet());
allSlugs.addAll(routesByApp.keySet());
String envSlug = env != null ? env.slug() : "";
List<CatalogApp> catalog = new ArrayList<>();
for (String slug : allSlugs) {
App app = appsBySlug.get(slug);
List<AgentInfo> agents = agentsByApp.getOrDefault(slug, List.of());
Set<String> routeIds = routesByApp.getOrDefault(slug, Set.of());
List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
// Routes
List<RouteSummary> routeSummaries = routeIds.stream()
.map(routeId -> {
String key = slug + "/" + routeId;
long count = routeExchangeCounts.getOrDefault(key, 0L);
Instant lastSeen = routeLastSeen.get(key);
String fromUri = resolveFromEndpointUri(routeId, agentIds);
String state = routeStateRegistry.getState(slug, routeId).name().toLowerCase();
String routeState = "started".equals(state) ? null : state;
return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
})
.toList();
// Agent summaries
List<AgentSummary> agentSummaries = agents.stream()
.map(a -> new AgentSummary(a.instanceId(), a.displayName(), a.state().name().toLowerCase(), 0.0))
.toList();
// Agent health
String agentHealth = agents.isEmpty() ? "offline" : computeWorstHealth(agents);
// Total exchanges
long totalExchanges = routeSummaries.stream().mapToLong(RouteSummary::exchangeCount).sum();
// Deployment summary (managed apps only)
CatalogApp.DeploymentSummary deploymentSummary = null;
DeploymentStatus deployStatus = null;
if (app != null) {
Deployment dep = activeDeployments.get(app.id());
if (dep != null) {
deployStatus = dep.status();
int healthy = 0, total = 0;
if (dep.replicaStates() != null) {
total = dep.replicaStates().size();
healthy = (int) dep.replicaStates().stream()
.filter(r -> "RUNNING".equals(r.get("status")))
.count();
}
int version = 0;
try {
var versions = appService.listVersions(app.id());
version = versions.stream()
.filter(v -> v.id().equals(dep.appVersionId()))
.map(AppVersion::version)
.findFirst().orElse(0);
} catch (Exception ignored) {}
deploymentSummary = new CatalogApp.DeploymentSummary(
dep.status().name(),
healthy + "/" + total,
version
);
}
}
// Composite health + tooltip
String health = compositeHealth(app != null ? deployStatus : null, agentHealth);
String healthTooltip = buildHealthTooltip(app != null, deployStatus, agentHealth, agents.size());
String displayName = app != null ? app.displayName() : slug;
String appEnvSlug = envSlug;
if (app != null && appEnvSlug.isEmpty()) {
try {
appEnvSlug = envService.getById(app.environmentId()).slug();
} catch (Exception ignored) {}
}
catalog.add(new CatalogApp(
slug, displayName, app != null, appEnvSlug,
health, healthTooltip, agents.size(), routeSummaries, agentSummaries,
totalExchanges, deploymentSummary
));
}
return ResponseEntity.ok(catalog);
}
private String resolveFromEndpointUri(String routeId, List<String> agentIds) {
return diagramStore.findContentHashForRouteByAgents(routeId, agentIds)
.flatMap(diagramStore::findByContentHash)
.map(RouteGraph::getRoot)
.map(root -> root.getEndpointUri())
.orElse(null);
}
private static String lit(Instant instant) {
return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
.withZone(java.time.ZoneOffset.UTC)
.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
}
private static String lit(String value) {
return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
}
private String computeWorstHealth(List<AgentInfo> agents) {
boolean hasDead = false;
boolean hasStale = false;
for (AgentInfo a : agents) {
if (a.state() == AgentState.DEAD) hasDead = true;
if (a.state() == AgentState.STALE) hasStale = true;
}
if (hasDead) return "dead";
if (hasStale) return "stale";
return "live";
}
private String compositeHealth(DeploymentStatus deployStatus, String agentHealth) {
if (deployStatus == null) return agentHealth; // unmanaged or no deployment
return switch (deployStatus) {
case STARTING -> "running";
case STOPPING, DEGRADED -> "stale";
case STOPPED -> "dead";
case FAILED -> "error";
case RUNNING -> "offline".equals(agentHealth) ? "stale" : agentHealth;
};
}
private String buildHealthTooltip(boolean managed, DeploymentStatus deployStatus, String agentHealth, int agentCount) {
if (!managed) {
return "Agents: " + agentHealth + " (" + agentCount + " connected)";
}
if (deployStatus == null) {
return "No deployment";
}
String depPart = "Deployment: " + deployStatus.name();
if (deployStatus == DeploymentStatus.RUNNING || deployStatus == DeploymentStatus.DEGRADED) {
return depPart + ", Agents: " + agentHealth + " (" + agentCount + " connected)";
}
return depPart;
}
}
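
Note on the two lit(...) helpers: the catalog query is assembled by string concatenation, so the helpers are the only thing standing between request parameters and the SQL text. The sketch below copies the two helpers into a standalone class to show what they emit for a timestamp with sub-second precision and for a value containing a quote. `SqlLiteralDemo` and the sample inputs are hypothetical; the helper bodies follow the diff.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class SqlLiteralDemo {

    private static final DateTimeFormatter DATETIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Renders an Instant as a quoted UTC datetime literal, truncated to whole seconds. */
    public static String lit(Instant instant) {
        return "'" + DATETIME.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

    /** Renders a string as a quoted literal, escaping backslashes first and then quotes. */
    public static String lit(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        System.out.println(lit(Instant.parse("2026-04-09T17:03:53.900Z")));
        // prints: '2026-04-09 17:03:53'
        System.out.println(lit("prod' OR 1=1 --"));
        // prints: 'prod\' OR 1=1 --'
    }
}
```

The order of the two replace calls matters: escaping quotes before backslashes would double the backslash that the quote escape just added. Where the driver supports them, `?` placeholders passed through JdbcTemplate would avoid hand-built literals altogether.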

@@ -0,0 +1,70 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.ingestion.ChunkAccumulator;
import com.cameleer3.common.model.ExecutionChunk;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.autoconfigure.condition.ConditionalOnBean;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
/**
* Ingestion endpoint for execution chunk data (ClickHouse pipeline).
* <p>
* Accepts single or array {@link ExecutionChunk} payloads and feeds them
* into the {@link ChunkAccumulator}. Only active when
* {@code clickhouse.enabled=true} (conditional on the accumulator bean).
*/
@RestController
@RequestMapping("/api/v1/data")
@ConditionalOnBean(ChunkAccumulator.class)
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class ChunkIngestionController {
private static final Logger log = LoggerFactory.getLogger(ChunkIngestionController.class);
private final ChunkAccumulator accumulator;
private final ObjectMapper objectMapper;
public ChunkIngestionController(ChunkAccumulator accumulator) {
this.accumulator = accumulator;
this.objectMapper = new ObjectMapper();
this.objectMapper.registerModule(new JavaTimeModule());
this.objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
}
@PostMapping("/executions")
@Operation(summary = "Ingest one or more execution chunks")
public ResponseEntity<Void> ingestChunks(@RequestBody String body) {
try {
String trimmed = body.strip();
List<ExecutionChunk> chunks;
if (trimmed.startsWith("[")) {
chunks = objectMapper.readValue(trimmed, new TypeReference<List<ExecutionChunk>>() {});
} else {
ExecutionChunk single = objectMapper.readValue(trimmed, ExecutionChunk.class);
chunks = List.of(single);
}
for (ExecutionChunk chunk : chunks) {
accumulator.onChunk(chunk);
}
return ResponseEntity.accepted().build();
} catch (Exception e) {
log.warn("Failed to parse execution chunk payload: {}", e.getMessage());
return ResponseEntity.badRequest().build();
}
}
}
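
Note on the single-or-array handling in ingestChunks: the controller branches on the first non-whitespace character and then runs one of two parse calls. An alternative, sketched here with the standard library only, is to normalize the body to an array first so that a single parse path remains. `PayloadShapeDemo` and `asJsonArray` are hypothetical names and do not appear in the diff.

```java
public class PayloadShapeDemo {

    /** Wraps a single JSON object in brackets so callers can always parse an array. */
    public static String asJsonArray(String body) {
        String trimmed = body.strip();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty payload");
        }
        return trimmed.startsWith("[") ? trimmed : "[" + trimmed + "]";
    }

    public static void main(String[] args) {
        System.out.println(asJsonArray("  {\"exchangeId\":\"e1\"} "));
        // prints: [{"exchangeId":"e1"}]
        System.out.println(asJsonArray("[{\"exchangeId\":\"e1\"}]"));
        // prints: [{"exchangeId":"e1"}]
    }
}
```

Jackson also offers `DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY`, which lets a list-typed read accept a lone object; whether that fits here depends on how strict the endpoint is meant to be.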

@@ -0,0 +1,77 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.rbac.ClaimMappingRepository;
import com.cameleer3.server.core.rbac.ClaimMappingRule;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.*;
import java.net.URI;
import java.util.List;
import java.util.UUID;
@RestController
@RequestMapping("/api/v1/admin/claim-mappings")
@PreAuthorize("hasRole('ADMIN')")
@Tag(name = "Claim Mapping Admin", description = "Manage OIDC claim-to-role/group mapping rules")
public class ClaimMappingAdminController {
private final ClaimMappingRepository repository;
public ClaimMappingAdminController(ClaimMappingRepository repository) {
this.repository = repository;
}
@GetMapping
@Operation(summary = "List all claim mapping rules")
public List<ClaimMappingRule> list() {
return repository.findAll();
}
@GetMapping("/{id}")
@Operation(summary = "Get a claim mapping rule by ID")
public ResponseEntity<ClaimMappingRule> get(@PathVariable UUID id) {
return repository.findById(id)
.map(ResponseEntity::ok)
.orElse(ResponseEntity.notFound().build());
}
record CreateRuleRequest(String claim, String matchType, String matchValue,
String action, String target, int priority) {}
@PostMapping
@Operation(summary = "Create a claim mapping rule")
public ResponseEntity<ClaimMappingRule> create(@RequestBody CreateRuleRequest request) {
UUID id = repository.create(
request.claim(), request.matchType(), request.matchValue(),
request.action(), request.target(), request.priority());
return repository.findById(id)
.map(rule -> ResponseEntity.created(URI.create("/api/v1/admin/claim-mappings/" + id)).body(rule))
.orElse(ResponseEntity.internalServerError().build());
}
@PutMapping("/{id}")
@Operation(summary = "Update a claim mapping rule")
public ResponseEntity<ClaimMappingRule> update(@PathVariable UUID id, @RequestBody CreateRuleRequest request) {
if (repository.findById(id).isEmpty()) {
return ResponseEntity.notFound().build();
}
repository.update(id, request.claim(), request.matchType(), request.matchValue(),
request.action(), request.target(), request.priority());
return repository.findById(id)
.map(ResponseEntity::ok)
.orElse(ResponseEntity.internalServerError().build());
}
@DeleteMapping("/{id}")
@Operation(summary = "Delete a claim mapping rule")
public ResponseEntity<Void> delete(@PathVariable UUID id) {
if (repository.findById(id).isEmpty()) {
return ResponseEntity.notFound().build();
}
repository.delete(id);
return ResponseEntity.noContent().build();
}
}

@@ -0,0 +1,166 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.dto.ClickHousePerformanceResponse;
import com.cameleer3.server.app.dto.ClickHouseQueryInfo;
import com.cameleer3.server.app.dto.ClickHouseStatusResponse;
import com.cameleer3.server.app.dto.ClickHouseTableInfo;
import com.cameleer3.server.app.dto.IndexerPipelineResponse;
import com.cameleer3.server.core.indexing.SearchIndexerStats;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
@RestController
@RequestMapping("/api/v1/admin/clickhouse")
@PreAuthorize("hasRole('ADMIN')")
@Tag(name = "ClickHouse Admin", description = "ClickHouse monitoring and diagnostics (ADMIN only)")
public class ClickHouseAdminController {
private final JdbcTemplate clickHouseJdbc;
private final SearchIndexerStats indexerStats;
private final String clickHouseUrl;
public ClickHouseAdminController(
@Qualifier("clickHouseJdbcTemplate") JdbcTemplate clickHouseJdbc,
SearchIndexerStats indexerStats,
@Value("${clickhouse.url:}") String clickHouseUrl) {
this.clickHouseJdbc = clickHouseJdbc;
this.indexerStats = indexerStats;
this.clickHouseUrl = clickHouseUrl;
}
@GetMapping("/status")
@Operation(summary = "ClickHouse cluster status")
public ClickHouseStatusResponse getStatus() {
try {
var row = clickHouseJdbc.queryForMap(
"SELECT version() AS version, formatReadableTimeDelta(uptime()) AS uptime");
return new ClickHouseStatusResponse(true,
(String) row.get("version"),
(String) row.get("uptime"),
clickHouseUrl);
} catch (Exception e) {
return new ClickHouseStatusResponse(false, null, null, clickHouseUrl);
}
}
@GetMapping("/tables")
@Operation(summary = "List ClickHouse tables with sizes")
public List<ClickHouseTableInfo> getTables() {
return clickHouseJdbc.query("""
SELECT t.name, t.engine,
t.total_rows AS row_count,
formatReadableSize(t.total_bytes) AS data_size,
t.total_bytes AS data_size_bytes,
ifNull(p.partition_count, 0) AS partition_count
FROM system.tables t
LEFT JOIN (
SELECT table, countDistinct(partition) AS partition_count
FROM system.parts
WHERE database = currentDatabase() AND active
GROUP BY table
) p ON t.name = p.table
WHERE t.database = currentDatabase()
ORDER BY t.total_bytes DESC NULLS LAST
""",
(rs, rowNum) -> new ClickHouseTableInfo(
rs.getString("name"),
rs.getString("engine"),
rs.getLong("row_count"),
rs.getString("data_size"),
rs.getLong("data_size_bytes"),
rs.getInt("partition_count")));
}
@GetMapping("/performance")
@Operation(summary = "ClickHouse storage and performance metrics")
public ClickHousePerformanceResponse getPerformance() {
try {
var row = clickHouseJdbc.queryForMap("""
SELECT
formatReadableSize(sum(bytes_on_disk)) AS disk_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
if(sum(data_uncompressed_bytes) > 0,
round(sum(bytes_on_disk) / sum(data_uncompressed_bytes), 3), 0) AS compression_ratio,
sum(rows) AS total_rows,
count() AS part_count
FROM system.parts
WHERE database = currentDatabase() AND active
""");
String memory = "N/A";
try {
memory = clickHouseJdbc.queryForObject(
"SELECT formatReadableSize(value) FROM system.metrics WHERE metric = 'MemoryTracking'",
String.class);
} catch (Exception ignored) {}
int currentQueries = 0;
try {
Integer q = clickHouseJdbc.queryForObject(
"SELECT toInt32(value) FROM system.metrics WHERE metric = 'Query'",
Integer.class);
if (q != null) currentQueries = q;
} catch (Exception ignored) {}
return new ClickHousePerformanceResponse(
(String) row.get("disk_size"),
(String) row.get("uncompressed_size"),
((Number) row.get("compression_ratio")).doubleValue(),
((Number) row.get("total_rows")).longValue(),
((Number) row.get("part_count")).intValue(),
memory != null ? memory : "N/A",
currentQueries);
} catch (Exception e) {
return new ClickHousePerformanceResponse("N/A", "N/A", 0, 0, 0, "N/A", 0);
}
}
@GetMapping("/queries")
@Operation(summary = "Active ClickHouse queries")
public List<ClickHouseQueryInfo> getQueries() {
try {
return clickHouseJdbc.query("""
SELECT
query_id,
round(elapsed, 2) AS elapsed_seconds,
formatReadableSize(memory_usage) AS memory,
read_rows,
substring(query, 1, 200) AS query
FROM system.processes
WHERE is_initial_query = 1
AND query NOT LIKE '%system.processes%'
ORDER BY elapsed DESC
""",
(rs, rowNum) -> new ClickHouseQueryInfo(
rs.getString("query_id"),
rs.getDouble("elapsed_seconds"),
rs.getString("memory"),
rs.getLong("read_rows"),
rs.getString("query")));
} catch (Exception e) {
return List.of();
}
}
@GetMapping("/pipeline")
@Operation(summary = "Search indexer pipeline statistics")
public IndexerPipelineResponse getPipeline() {
return new IndexerPipelineResponse(
indexerStats.getQueueDepth(),
indexerStats.getMaxQueueSize(),
indexerStats.getFailedCount(),
indexerStats.getIndexedCount(),
indexerStats.getDebounceMs(),
indexerStats.getIndexingRate(),
indexerStats.getLastIndexedAt());
}
}
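
Note on compression_ratio in getPerformance: the query divides bytes on disk by uncompressed bytes, so a smaller value means better compression, and the `if(...)` guard returns 0 when there is no data. The same arithmetic as a small Java sketch follows. `CompressionRatioDemo` is a hypothetical name, and `Math.round` may treat exact ties differently from ClickHouse's `round`.

```java
public class CompressionRatioDemo {

    /** Ratio of on-disk to uncompressed bytes, rounded to 3 decimals; 0 when there is no data. */
    public static double compressionRatio(long bytesOnDisk, long uncompressedBytes) {
        if (uncompressedBytes <= 0) {
            return 0.0; // mirrors the guard in the SQL, avoids division by zero
        }
        double ratio = (double) bytesOnDisk / uncompressedBytes;
        return Math.round(ratio * 1000.0) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(compressionRatio(250, 1000)); // prints: 0.25
        System.out.println(compressionRatio(0, 0));      // prints: 0.0
    }
}
```

With 250 bytes on disk for 1000 uncompressed bytes the ratio is 0.25, meaning the stored data occupies a quarter of its raw size.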

@@ -7,7 +7,6 @@ import com.cameleer3.server.app.dto.TableSizeResponse;
import com.cameleer3.server.core.admin.AuditCategory;
import com.cameleer3.server.core.admin.AuditResult;
import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import io.swagger.v3.oas.annotations.Operation;
@@ -25,9 +24,7 @@ import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import javax.sql.DataSource;
import java.time.Instant;
import java.util.List;
import java.util.Map;
@RestController
@RequestMapping("/api/v1/admin/database")
@@ -38,14 +35,12 @@ public class DatabaseAdminController {
private final JdbcTemplate jdbc;
private final DataSource dataSource;
private final AuditService auditService;
private final IngestionService ingestionService;
public DatabaseAdminController(JdbcTemplate jdbc, DataSource dataSource,
AuditService auditService, IngestionService ingestionService) {
AuditService auditService) {
this.jdbc = jdbc;
this.dataSource = dataSource;
this.auditService = auditService;
this.ingestionService = ingestionService;
}
@GetMapping("/status")
@@ -53,14 +48,12 @@ public class DatabaseAdminController {
public ResponseEntity<DatabaseStatusResponse> getStatus() {
try {
String version = jdbc.queryForObject("SELECT version()", String.class);
boolean timescaleDb = Boolean.TRUE.equals(
jdbc.queryForObject("SELECT EXISTS(SELECT 1 FROM pg_extension WHERE extname = 'timescaledb')", Boolean.class));
String schema = jdbc.queryForObject("SELECT current_schema()", String.class);
String host = extractHost(dataSource);
return ResponseEntity.ok(new DatabaseStatusResponse(true, version, host, schema, timescaleDb));
return ResponseEntity.ok(new DatabaseStatusResponse(true, version, host, schema));
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new DatabaseStatusResponse(false, null, null, null, false));
.body(new DatabaseStatusResponse(false, null, null, null));
}
}
@@ -124,29 +117,6 @@ public class DatabaseAdminController {
return ResponseEntity.ok().build();
}
@GetMapping("/metrics-pipeline")
@Operation(summary = "Get metrics ingestion pipeline diagnostics")
public ResponseEntity<Map<String, Object>> getMetricsPipeline() {
int bufferDepth = ingestionService.getMetricsBufferDepth();
Long totalRows = jdbc.queryForObject(
"SELECT count(*) FROM agent_metrics", Long.class);
List<String> agentIds = jdbc.queryForList(
"SELECT DISTINCT agent_id FROM agent_metrics ORDER BY agent_id", String.class);
Instant latestCollected = jdbc.queryForObject(
"SELECT max(collected_at) FROM agent_metrics", Instant.class);
List<String> metricNames = jdbc.queryForList(
"SELECT DISTINCT metric_name FROM agent_metrics ORDER BY metric_name", String.class);
return ResponseEntity.ok(Map.of(
"bufferDepth", bufferDepth,
"totalRows", totalRows != null ? totalRows : 0,
"distinctAgents", agentIds,
"distinctMetrics", metricNames,
"latestCollectedAt", latestCollected != null ? latestCollected.toString() : "none"
));
}
private String extractHost(DataSource ds) {
try {
if (ds instanceof HikariDataSource hds) {

@@ -0,0 +1,135 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.runtime.DeploymentExecutor;
import com.cameleer3.server.core.runtime.*;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
/**
* Deployment management: deploy, stop, promote, and view logs.
* All app-scoped endpoints accept the app slug (not UUID) as path variable.
* Protected by {@code ROLE_OPERATOR} or {@code ROLE_ADMIN}.
*/
@RestController
@RequestMapping("/api/v1/apps/{appSlug}/deployments")
@Tag(name = "Deployment Management", description = "Deploy, stop, promote, and view logs")
@PreAuthorize("hasAnyRole('OPERATOR', 'ADMIN')")
public class DeploymentController {
private final DeploymentService deploymentService;
private final DeploymentExecutor deploymentExecutor;
private final RuntimeOrchestrator orchestrator;
private final AppService appService;
public DeploymentController(DeploymentService deploymentService,
DeploymentExecutor deploymentExecutor,
RuntimeOrchestrator orchestrator,
AppService appService) {
this.deploymentService = deploymentService;
this.deploymentExecutor = deploymentExecutor;
this.orchestrator = orchestrator;
this.appService = appService;
}
@GetMapping
@Operation(summary = "List deployments for an app")
@ApiResponse(responseCode = "200", description = "Deployment list returned")
public ResponseEntity<List<Deployment>> listDeployments(@PathVariable String appSlug) {
try {
App app = appService.getBySlug(appSlug);
return ResponseEntity.ok(deploymentService.listByApp(app.id()));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@GetMapping("/{deploymentId}")
@Operation(summary = "Get deployment by ID")
@ApiResponse(responseCode = "200", description = "Deployment found")
@ApiResponse(responseCode = "404", description = "Deployment not found")
public ResponseEntity<Deployment> getDeployment(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
try {
return ResponseEntity.ok(deploymentService.getById(deploymentId));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping
@Operation(summary = "Create and start a new deployment")
@ApiResponse(responseCode = "202", description = "Deployment accepted and starting")
public ResponseEntity<Deployment> deploy(@PathVariable String appSlug, @RequestBody DeployRequest request) {
try {
App app = appService.getBySlug(appSlug);
Deployment deployment = deploymentService.createDeployment(app.id(), request.appVersionId(), request.environmentId());
deploymentExecutor.executeAsync(deployment);
return ResponseEntity.accepted().body(deployment);
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping("/{deploymentId}/stop")
@Operation(summary = "Stop a running deployment")
@ApiResponse(responseCode = "200", description = "Deployment stopped")
@ApiResponse(responseCode = "404", description = "Deployment not found")
public ResponseEntity<Deployment> stop(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
try {
Deployment deployment = deploymentService.getById(deploymentId);
deploymentExecutor.stopDeployment(deployment);
return ResponseEntity.ok(deploymentService.getById(deploymentId));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping("/{deploymentId}/promote")
@Operation(summary = "Promote deployment to a different environment")
@ApiResponse(responseCode = "202", description = "Promotion accepted and starting")
@ApiResponse(responseCode = "404", description = "Deployment not found")
public ResponseEntity<Deployment> promote(@PathVariable String appSlug, @PathVariable UUID deploymentId,
@RequestBody PromoteRequest request) {
try {
App app = appService.getBySlug(appSlug);
Deployment source = deploymentService.getById(deploymentId);
Deployment promoted = deploymentService.promote(app.id(), source.appVersionId(), request.targetEnvironmentId());
deploymentExecutor.executeAsync(promoted);
return ResponseEntity.accepted().body(promoted);
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@GetMapping("/{deploymentId}/logs")
@Operation(summary = "Get container logs for a deployment")
@ApiResponse(responseCode = "200", description = "Logs returned")
@ApiResponse(responseCode = "404", description = "Deployment not found or no container")
public ResponseEntity<List<String>> getLogs(@PathVariable String appSlug, @PathVariable UUID deploymentId) {
try {
Deployment deployment = deploymentService.getById(deploymentId);
if (deployment.containerId() == null) {
return ResponseEntity.notFound().build();
}
List<String> logs = orchestrator.getLogs(deployment.containerId(), 200).collect(Collectors.toList());
return ResponseEntity.ok(logs);
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
public record DeployRequest(UUID appVersionId, UUID environmentId) {}
public record PromoteRequest(UUID targetEnvironmentId) {}
}

@@ -81,4 +81,16 @@ public class DetailController {
.map(ResponseEntity::ok)
.orElse(ResponseEntity.notFound().build());
}
@GetMapping("/{executionId}/processors/by-seq/{seq}/snapshot")
@Operation(summary = "Get exchange snapshot for a processor by seq number")
@ApiResponse(responseCode = "200", description = "Snapshot data")
@ApiResponse(responseCode = "404", description = "Snapshot not found")
public ResponseEntity<Map<String, String>> processorSnapshotBySeq(
@PathVariable String executionId,
@PathVariable int seq) {
return detailService.getProcessorSnapshotBySeq(executionId, seq)
.map(ResponseEntity::ok)
.orElse(ResponseEntity.notFound().build());
}
}

@@ -11,8 +11,6 @@ import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
@@ -34,8 +32,6 @@ import java.util.List;
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class DiagramController {
private static final Logger log = LoggerFactory.getLogger(DiagramController.class);
private final IngestionService ingestionService;
private final AgentRegistryService registryService;
private final ObjectMapper objectMapper;
@@ -53,12 +49,12 @@ public class DiagramController {
description = "Accepts a single RouteGraph or an array of RouteGraphs")
@ApiResponse(responseCode = "202", description = "Data accepted for processing")
public ResponseEntity<Void> ingestDiagrams(@RequestBody String body) throws JsonProcessingException {
String agentId = extractAgentId();
String applicationName = resolveApplicationName(agentId);
String instanceId = extractAgentId();
String applicationId = resolveApplicationId(instanceId);
List<RouteGraph> graphs = parsePayload(body);
for (RouteGraph graph : graphs) {
ingestionService.ingestDiagram(new TaggedDiagram(agentId, applicationName, graph));
ingestionService.ingestDiagram(new TaggedDiagram(instanceId, applicationId, graph));
}
return ResponseEntity.accepted().build();
@@ -69,9 +65,9 @@ public class DiagramController {
return auth != null ? auth.getName() : "";
}
private String resolveApplicationName(String agentId) {
AgentInfo agent = registryService.findById(agentId);
return agent != null ? agent.application() : "";
private String resolveApplicationId(String instanceId) {
AgentInfo agent = registryService.findById(instanceId);
return agent != null ? agent.applicationId() : "";
}
private List<RouteGraph> parsePayload(String body) throws JsonProcessingException {

@@ -100,7 +100,7 @@ public class DiagramRenderController {
@RequestParam String routeId,
@RequestParam(defaultValue = "LR") String direction) {
List<String> agentIds = registryService.findByApplication(application).stream()
.map(AgentInfo::id)
.map(AgentInfo::instanceId)
.toList();
if (agentIds.isEmpty()) {

@@ -0,0 +1,127 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.runtime.Environment;
import com.cameleer3.server.core.runtime.EnvironmentService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Map;
import java.util.UUID;
@RestController
@RequestMapping("/api/v1/admin/environments")
@Tag(name = "Environment Admin", description = "Environment management (ADMIN only)")
@PreAuthorize("hasRole('ADMIN')")
public class EnvironmentAdminController {
private final EnvironmentService environmentService;
public EnvironmentAdminController(EnvironmentService environmentService) {
this.environmentService = environmentService;
}
@GetMapping
@Operation(summary = "List all environments")
@PreAuthorize("isAuthenticated()")
public ResponseEntity<List<Environment>> listEnvironments() {
return ResponseEntity.ok(environmentService.listAll());
}
@GetMapping("/{id}")
@Operation(summary = "Get environment by ID")
@ApiResponse(responseCode = "200", description = "Environment found")
@ApiResponse(responseCode = "404", description = "Environment not found")
public ResponseEntity<Environment> getEnvironment(@PathVariable UUID id) {
try {
return ResponseEntity.ok(environmentService.getById(id));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PostMapping
@Operation(summary = "Create a new environment")
@ApiResponse(responseCode = "201", description = "Environment created")
@ApiResponse(responseCode = "400", description = "Slug already exists")
public ResponseEntity<?> createEnvironment(@RequestBody CreateEnvironmentRequest request) {
try {
UUID id = environmentService.create(request.slug(), request.displayName(), request.production());
return ResponseEntity.status(201).body(environmentService.getById(id));
} catch (IllegalArgumentException e) {
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
}
@PutMapping("/{id}")
@Operation(summary = "Update an environment")
@ApiResponse(responseCode = "200", description = "Environment updated")
@ApiResponse(responseCode = "404", description = "Environment not found")
public ResponseEntity<?> updateEnvironment(@PathVariable UUID id, @RequestBody UpdateEnvironmentRequest request) {
try {
environmentService.update(id, request.displayName(), request.production(), request.enabled());
return ResponseEntity.ok(environmentService.getById(id));
} catch (IllegalArgumentException e) {
if (e.getMessage().contains("not found")) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
}
@DeleteMapping("/{id}")
@Operation(summary = "Delete an environment")
@ApiResponse(responseCode = "204", description = "Environment deleted")
@ApiResponse(responseCode = "400", description = "Cannot delete default environment")
@ApiResponse(responseCode = "404", description = "Environment not found")
public ResponseEntity<?> deleteEnvironment(@PathVariable UUID id) {
try {
environmentService.delete(id);
return ResponseEntity.noContent().build();
} catch (IllegalArgumentException e) {
if (e.getMessage().contains("not found")) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
}
@PutMapping("/{id}/default-container-config")
@Operation(summary = "Update default container config for an environment")
@ApiResponse(responseCode = "200", description = "Default container config updated")
@ApiResponse(responseCode = "404", description = "Environment not found")
public ResponseEntity<?> updateDefaultContainerConfig(@PathVariable UUID id,
@RequestBody Map<String, Object> defaultContainerConfig) {
try {
environmentService.updateDefaultContainerConfig(id, defaultContainerConfig);
return ResponseEntity.ok(environmentService.getById(id));
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
}
}
@PutMapping("/{id}/jar-retention")
@Operation(summary = "Update JAR retention policy for an environment")
@ApiResponse(responseCode = "200", description = "Retention policy updated")
@ApiResponse(responseCode = "404", description = "Environment not found")
public ResponseEntity<?> updateJarRetention(@PathVariable UUID id,
@RequestBody JarRetentionRequest request) {
try {
environmentService.updateJarRetentionCount(id, request.jarRetentionCount());
return ResponseEntity.ok(environmentService.getById(id));
} catch (IllegalArgumentException e) {
if (e.getMessage().contains("not found")) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
}
public record CreateEnvironmentRequest(String slug, String displayName, boolean production) {}
public record UpdateEnvironmentRequest(String displayName, boolean production, boolean enabled) {}
public record JarRetentionRequest(Integer jarRetentionCount) {}
}
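
Several handlers in `EnvironmentAdminController` choose between 404 and 400 by checking whether the exception message contains `"not found"`. The sketch below isolates that decision so it can be tested without Spring. `ErrorStatusSketch` and `statusFor` are hypothetical names, and the null guard is an assumption added here (a message-less `IllegalArgumentException` would otherwise cause a `NullPointerException`); the messages actually thrown by `EnvironmentService` are not shown in this diff.

```java
// Hypothetical helper, not part of the diff: maps an IllegalArgumentException
// to an HTTP status code the way the controller's catch blocks appear to.
final class ErrorStatusSketch {

    private ErrorStatusSketch() {}

    // Returns 404 when the message mentions "not found", otherwise 400.
    // A null message is treated as a generic bad request (assumption).
    static int statusFor(IllegalArgumentException e) {
        String message = e.getMessage();
        if (message != null && message.contains("not found")) {
            return 404;
        }
        return 400;
    }
}
```

Matching on message text is brittle; a dedicated exception type per outcome would avoid the substring check, but that is a design change beyond what this diff shows.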

@@ -0,0 +1,119 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.common.model.AgentEvent;
import com.cameleer3.server.core.agent.AgentEventService;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.RouteStateRegistry;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
import java.util.Map;
/**
* Ingestion endpoint for agent lifecycle events.
* <p>
* Agents emit events (AGENT_STARTED, AGENT_STOPPED, etc.) which are
* stored in the event log. AGENT_STOPPED triggers a graceful shutdown
* transition in the registry.
*/
@RestController
@RequestMapping("/api/v1/data")
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class EventIngestionController {
private static final Logger log = LoggerFactory.getLogger(EventIngestionController.class);
private final AgentEventService agentEventService;
private final AgentRegistryService registryService;
private final ObjectMapper objectMapper;
private final RouteStateRegistry routeStateRegistry;
public EventIngestionController(AgentEventService agentEventService,
AgentRegistryService registryService,
ObjectMapper objectMapper,
RouteStateRegistry routeStateRegistry) {
this.agentEventService = agentEventService;
this.registryService = registryService;
this.objectMapper = objectMapper;
this.routeStateRegistry = routeStateRegistry;
}
@PostMapping("/events")
@Operation(summary = "Ingest agent events")
public ResponseEntity<Void> ingestEvents(@RequestBody String body) {
String instanceId = extractInstanceId();
List<AgentEvent> events;
try {
String trimmed = body.strip();
if (trimmed.startsWith("[")) {
events = objectMapper.readValue(trimmed, new TypeReference<List<AgentEvent>>() {});
} else {
events = List.of(objectMapper.readValue(trimmed, AgentEvent.class));
}
} catch (Exception e) {
log.warn("Failed to parse event payload: {}", e.getMessage());
return ResponseEntity.badRequest().build();
}
AgentInfo agent = registryService.findById(instanceId);
String applicationId = agent != null ? agent.applicationId() : "";
for (AgentEvent event : events) {
agentEventService.recordEvent(instanceId, applicationId,
event.getEventType(),
event.getDetails() != null ? event.getDetails().toString() : null);
if ("AGENT_STOPPED".equals(event.getEventType())) {
log.info("Agent {} reported graceful shutdown", instanceId);
registryService.shutdown(instanceId);
}
if ("ROUTE_STATE_CHANGED".equals(event.getEventType())) {
Map<String, String> details = event.getDetails();
if (details != null) {
String routeId = details.get("routeId");
String newState = details.get("newState");
if (routeId != null && newState != null) {
RouteStateRegistry.RouteState state = parseRouteState(newState);
if (state != null) {
routeStateRegistry.setState(applicationId, routeId, state);
log.debug("Route state changed: {}/{} -> {} (reason: {})",
applicationId, routeId, newState, details.get("reason"));
}
}
}
}
}
return ResponseEntity.accepted().build();
}
private RouteStateRegistry.RouteState parseRouteState(String state) {
if (state == null) return null;
return switch (state) {
case "Started" -> RouteStateRegistry.RouteState.STARTED;
case "Stopped" -> RouteStateRegistry.RouteState.STOPPED;
case "Suspended" -> RouteStateRegistry.RouteState.SUSPENDED;
default -> null;
};
}
private String extractInstanceId() {
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
return auth != null ? auth.getName() : "";
}
}
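
The state-name mapping in `parseRouteState` can be exercised in isolation. The following is a minimal standalone sketch: the `RouteState` enum and the `RouteStateSketch` class are hypothetical stand-ins for `RouteStateRegistry.RouteState`, whose definition is not part of this diff. It assumes state names arrive as the capitalized strings `Started`, `Stopped`, and `Suspended`, as the switch in the controller suggests, and it returns an `Optional` instead of `null` purely for illustration.

```java
import java.util.Optional;

// Hypothetical stand-in for RouteStateRegistry.RouteState (not shown in this diff).
enum RouteState { STARTED, STOPPED, SUSPENDED }

final class RouteStateSketch {

    private RouteStateSketch() {}

    // Maps a reported state name to the enum.
    // Null or unrecognized names (including different casing) yield an empty Optional.
    static Optional<RouteState> parse(String state) {
        if (state == null) {
            return Optional.empty();
        }
        return switch (state) {
            case "Started" -> Optional.of(RouteState.STARTED);
            case "Stopped" -> Optional.of(RouteState.STOPPED);
            case "Suspended" -> Optional.of(RouteState.SUSPENDED);
            default -> Optional.empty();
        };
    }
}
```

Because the match is case-sensitive, an agent reporting `started` would be ignored by this mapping, just as the controller's `default -> null` branch skips the registry update.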

@@ -3,6 +3,7 @@ package com.cameleer3.server.app.controller;
import com.cameleer3.common.model.RouteExecution;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.ingestion.ChunkAccumulator;
import com.cameleer3.server.core.ingestion.IngestionService;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
@@ -10,8 +11,7 @@ import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
@@ -23,18 +23,20 @@ import org.springframework.web.bind.annotation.RestController;
import java.util.List;
/**
* Ingestion endpoint for route execution data.
* Legacy ingestion endpoint for route execution data (PostgreSQL path).
* <p>
* Accepts both single {@link RouteExecution} and arrays. Data is written
* synchronously to PostgreSQL via {@link IngestionService}.
* <p>
* Only active when ClickHouse is disabled — when ClickHouse is enabled,
* {@link ChunkIngestionController} takes over the {@code /executions} mapping.
*/
@RestController
@RequestMapping("/api/v1/data")
@ConditionalOnMissingBean(ChunkAccumulator.class)
@Tag(name = "Ingestion", description = "Data ingestion endpoints")
public class ExecutionController {
private static final Logger log = LoggerFactory.getLogger(ExecutionController.class);
private final IngestionService ingestionService;
private final AgentRegistryService registryService;
private final ObjectMapper objectMapper;
@@ -52,12 +54,12 @@ public class ExecutionController {
description = "Accepts a single RouteExecution or an array of RouteExecutions")
@ApiResponse(responseCode = "202", description = "Data accepted for processing")
public ResponseEntity<Void> ingestExecutions(@RequestBody String body) throws JsonProcessingException {
String agentId = extractAgentId();
String applicationName = resolveApplicationName(agentId);
String instanceId = extractAgentId();
String applicationId = resolveApplicationId(instanceId);
List<RouteExecution> executions = parsePayload(body);
for (RouteExecution execution : executions) {
ingestionService.ingestExecution(agentId, applicationName, execution);
ingestionService.ingestExecution(instanceId, applicationId, execution);
}
return ResponseEntity.accepted().build();
@@ -68,9 +70,9 @@ public class ExecutionController {
return auth != null ? auth.getName() : "";
}
private String resolveApplicationName(String agentId) {
AgentInfo agent = registryService.findById(agentId);
return agent != null ? agent.application() : "";
private String resolveApplicationId(String instanceId) {
AgentInfo agent = registryService.findById(instanceId);
return agent != null ? agent.applicationId() : "";
}
private List<RouteExecution> parsePayload(String body) throws JsonProcessingException {

@@ -7,10 +7,12 @@ import com.cameleer3.server.core.rbac.GroupDetail;
import com.cameleer3.server.core.rbac.GroupRepository;
import com.cameleer3.server.core.rbac.GroupSummary;
import com.cameleer3.server.core.rbac.RbacService;
import com.cameleer3.server.core.rbac.SystemRole;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.DeleteMapping;
@@ -21,6 +23,7 @@ import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import java.util.ArrayList;
import java.util.List;
@@ -39,14 +42,14 @@ import java.util.UUID;
public class GroupAdminController {
private final GroupRepository groupRepository;
private final RbacService rbacService;
private final AuditService auditService;
private final RbacService rbacService;
public GroupAdminController(GroupRepository groupRepository, RbacService rbacService,
AuditService auditService) {
public GroupAdminController(GroupRepository groupRepository, AuditService auditService,
RbacService rbacService) {
this.groupRepository = groupRepository;
this.rbacService = rbacService;
this.auditService = auditService;
this.rbacService = rbacService;
}
@GetMapping
@@ -156,6 +159,10 @@ public class GroupAdminController {
if (groupRepository.findById(id).isEmpty()) {
return ResponseEntity.notFound().build();
}
if (SystemRole.ADMIN_ID.equals(roleId) && rbacService.getEffectivePrincipalsForRole(SystemRole.ADMIN_ID).size() <= 1) {
throw new ResponseStatusException(HttpStatus.CONFLICT,
"Cannot remove the ADMIN role: at least one admin user must exist");
}
groupRepository.removeRole(id, roleId);
auditService.log("remove_role_from_group", AuditCategory.RBAC, id.toString(),
Map.of("roleId", roleId), AuditResult.SUCCESS, httpRequest);

@@ -0,0 +1,53 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.license.LicenseGate;
import com.cameleer3.server.core.license.LicenseInfo;
import com.cameleer3.server.core.license.LicenseValidator;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
@RestController
@RequestMapping("/api/v1/admin/license")
@PreAuthorize("hasRole('ADMIN')")
@Tag(name = "License Admin", description = "License management")
public class LicenseAdminController {
private final LicenseGate licenseGate;
private final String licensePublicKey;
public LicenseAdminController(LicenseGate licenseGate,
@Value("${license.public-key:}") String licensePublicKey) {
this.licenseGate = licenseGate;
this.licensePublicKey = licensePublicKey;
}
@GetMapping
@Operation(summary = "Get current license info")
public ResponseEntity<LicenseInfo> getCurrent() {
return ResponseEntity.ok(licenseGate.getCurrent());
}
record UpdateLicenseRequest(String token) {}
@PostMapping
@Operation(summary = "Update license token at runtime")
public ResponseEntity<?> update(@RequestBody UpdateLicenseRequest request) {
if (licensePublicKey == null || licensePublicKey.isBlank()) {
return ResponseEntity.badRequest().body(Map.of("error", "No license public key configured"));
}
try {
LicenseValidator validator = new LicenseValidator(licensePublicKey);
LicenseInfo info = validator.validate(request.token());
licenseGate.load(info);
return ResponseEntity.ok(info);
} catch (Exception e) {
return ResponseEntity.badRequest().body(Map.of("error", e.getMessage()));
}
}
}

@@ -1,12 +1,14 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.common.model.LogBatch;
import com.cameleer3.server.app.search.OpenSearchLogIndex;
import com.cameleer3.server.core.ingestion.BufferedLogEntry;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import com.cameleer3.server.app.config.TenantProperties;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
@@ -24,26 +26,33 @@ public class LogIngestionController {
private static final Logger log = LoggerFactory.getLogger(LogIngestionController.class);
private final OpenSearchLogIndex logIndex;
private final WriteBuffer<BufferedLogEntry> logBuffer;
private final AgentRegistryService registryService;
private final TenantProperties tenantProperties;
public LogIngestionController(OpenSearchLogIndex logIndex,
AgentRegistryService registryService) {
this.logIndex = logIndex;
public LogIngestionController(WriteBuffer<BufferedLogEntry> logBuffer,
AgentRegistryService registryService,
TenantProperties tenantProperties) {
this.logBuffer = logBuffer;
this.registryService = registryService;
this.tenantProperties = tenantProperties;
}
@PostMapping("/logs")
@Operation(summary = "Ingest application log entries",
description = "Accepts a batch of log entries from an agent. Entries are indexed in OpenSearch.")
description = "Accepts a batch of log entries from an agent. Entries are buffered and flushed periodically.")
@ApiResponse(responseCode = "202", description = "Logs accepted for indexing")
public ResponseEntity<Void> ingestLogs(@RequestBody LogBatch batch) {
String agentId = extractAgentId();
String application = resolveApplicationName(agentId);
String instanceId = extractAgentId();
String applicationId = resolveApplicationId(instanceId);
if (batch.getEntries() != null && !batch.getEntries().isEmpty()) {
log.debug("Received {} log entries from agent={}, app={}", batch.getEntries().size(), agentId, application);
logIndex.indexBatch(agentId, application, batch.getEntries());
log.debug("Received {} log entries from instance={}, app={}", batch.getEntries().size(), instanceId, applicationId);
String environment = resolveEnvironment(instanceId);
for (var entry : batch.getEntries()) {
logBuffer.offerOrWarn(new BufferedLogEntry(
tenantProperties.getId(), environment, instanceId, applicationId, entry));
}
}
return ResponseEntity.accepted().build();
@@ -54,8 +63,13 @@ public class LogIngestionController {
return auth != null ? auth.getName() : "";
}
private String resolveApplicationName(String agentId) {
AgentInfo agent = registryService.findById(agentId);
return agent != null ? agent.application() : "";
private String resolveApplicationId(String instanceId) {
AgentInfo agent = registryService.findById(instanceId);
return agent != null ? agent.applicationId() : "";
}
private String resolveEnvironment(String instanceId) {
AgentInfo agent = registryService.findById(instanceId);
return agent != null && agent.environmentId() != null ? agent.environmentId() : "default";
}
}

@@ -1,7 +1,10 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.dto.LogEntryResponse;
import com.cameleer3.server.app.search.OpenSearchLogIndex;
import com.cameleer3.server.app.dto.LogSearchPageResponse;
import com.cameleer3.server.core.search.LogSearchRequest;
import com.cameleer3.server.core.search.LogSearchResponse;
import com.cameleer3.server.core.storage.LogIndex;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.http.ResponseEntity;
@@ -11,40 +14,69 @@ import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
@RestController
@RequestMapping("/api/v1/logs")
@Tag(name = "Application Logs", description = "Query application logs stored in OpenSearch")
@Tag(name = "Application Logs", description = "Query application logs")
public class LogQueryController {
private final OpenSearchLogIndex logIndex;
private final LogIndex logIndex;
public LogQueryController(OpenSearchLogIndex logIndex) {
public LogQueryController(LogIndex logIndex) {
this.logIndex = logIndex;
}
@GetMapping
@Operation(summary = "Search application log entries",
description = "Returns log entries for a given application, optionally filtered by agent, level, time range, and text query")
public ResponseEntity<List<LogEntryResponse>> searchLogs(
@RequestParam String application,
@RequestParam(required = false) String agentId,
@RequestParam(required = false) String level,
description = "Returns log entries with cursor-based pagination and level count aggregation. " +
"Supports free-text search, multi-level filtering, and optional application scoping.")
public ResponseEntity<LogSearchPageResponse> searchLogs(
@RequestParam(required = false) String q,
@RequestParam(required = false) String query,
@RequestParam(required = false) String level,
@RequestParam(required = false) String application,
@RequestParam(name = "agentId", required = false) String instanceId,
@RequestParam(required = false) String exchangeId,
@RequestParam(required = false) String logger,
@RequestParam(required = false) String environment,
@RequestParam(required = false) String from,
@RequestParam(required = false) String to,
@RequestParam(defaultValue = "200") int limit) {
@RequestParam(required = false) String cursor,
@RequestParam(defaultValue = "100") int limit,
@RequestParam(defaultValue = "desc") String sort) {
limit = Math.min(limit, 1000);
// q takes precedence over deprecated query param
String searchText = q != null ? q : query;
// Parse CSV levels
List<String> levels = List.of();
if (level != null && !level.isEmpty()) {
levels = Arrays.stream(level.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
.toList();
}
Instant fromInstant = from != null ? Instant.parse(from) : null;
Instant toInstant = to != null ? Instant.parse(to) : null;
List<LogEntryResponse> entries = logIndex.search(
application, agentId, level, query, exchangeId, fromInstant, toInstant, limit);
LogSearchRequest request = new LogSearchRequest(
searchText, levels, application, instanceId, exchangeId,
logger, environment, fromInstant, toInstant, cursor, limit, sort);
return ResponseEntity.ok(entries);
LogSearchResponse result = logIndex.search(request);
List<LogEntryResponse> entries = result.data().stream()
.map(r -> new LogEntryResponse(
r.timestamp(), r.level(), r.loggerName(),
r.message(), r.threadName(), r.stackTrace(),
r.exchangeId(), r.instanceId(), r.application(),
r.mdc()))
.toList();
return ResponseEntity.ok(new LogSearchPageResponse(
entries, result.nextCursor(), result.hasMore(), result.levelCounts()));
}
}
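
The request-parameter normalization in `searchLogs` (limit cap, `q` taking precedence over `query`, comma-separated levels) can be sketched without Spring as follows. `LogQueryParamsSketch` and its method names are hypothetical and exist only for illustration; the logic mirrors what the handler above appears to do, including the fact that only an upper bound is applied to `limit`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the diff: standalone version of the
// parameter handling in LogQueryController.searchLogs.
final class LogQueryParamsSketch {

    static final int MAX_LIMIT = 1000;

    private LogQueryParamsSketch() {}

    // Splits a CSV level filter such as "INFO, WARN,,ERROR",
    // trimming whitespace and dropping empty segments.
    static List<String> parseLevels(String level) {
        if (level == null || level.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(level.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
    }

    // Caps the page size at MAX_LIMIT; smaller values pass through unchanged.
    static int clampLimit(int limit) {
        return Math.min(limit, MAX_LIMIT);
    }

    // "q" wins over the deprecated "query" parameter when both are present.
    static String effectiveQuery(String q, String query) {
        return q != null ? q : query;
    }
}
```

For example, `parseLevels("INFO, WARN,,ERROR")` yields the three levels `INFO`, `WARN`, and `ERROR`, and `clampLimit(5000)` yields `1000`.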

@@ -98,10 +98,13 @@ public class OidcConfigAdminController {
request.issuerUri() != null ? request.issuerUri() : "",
request.clientId() != null ? request.clientId() : "",
clientSecret,
request.rolesClaim() != null ? request.rolesClaim() : "realm_access.roles",
request.rolesClaim() != null ? request.rolesClaim() : "roles",
request.defaultRoles() != null ? request.defaultRoles() : List.of("VIEWER"),
request.autoSignup(),
request.displayNameClaim() != null ? request.displayNameClaim() : "name"
request.displayNameClaim() != null ? request.displayNameClaim() : "name",
request.userIdClaim() != null ? request.userIdClaim() : "sub",
request.audience() != null ? request.audience() : "",
request.additionalScopes() != null ? request.additionalScopes() : List.of()
);
configRepository.save(config);

@@ -1,266 +0,0 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.dto.IndexInfoResponse;
import com.cameleer3.server.app.dto.IndicesPageResponse;
import com.cameleer3.server.app.dto.OpenSearchStatusResponse;
import com.cameleer3.server.app.dto.PerformanceResponse;
import com.cameleer3.server.app.dto.PipelineStatsResponse;
import com.cameleer3.server.core.admin.AuditCategory;
import com.cameleer3.server.core.admin.AuditResult;
import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.indexing.SearchIndexerStats;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.cluster.HealthResponse;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
@RestController
@RequestMapping("/api/v1/admin/opensearch")
@PreAuthorize("hasRole('ADMIN')")
@Tag(name = "OpenSearch Admin", description = "OpenSearch monitoring and management (ADMIN only)")
public class OpenSearchAdminController {
private final OpenSearchClient client;
private final RestClient restClient;
private final SearchIndexerStats indexerStats;
private final AuditService auditService;
private final ObjectMapper objectMapper;
private final String opensearchUrl;
private final String indexPrefix;
private final String logIndexPrefix;
public OpenSearchAdminController(OpenSearchClient client, RestClient restClient,
SearchIndexerStats indexerStats, AuditService auditService,
ObjectMapper objectMapper,
@Value("${opensearch.url:http://localhost:9200}") String opensearchUrl,
@Value("${opensearch.index-prefix:executions-}") String indexPrefix,
@Value("${opensearch.log-index-prefix:logs-}") String logIndexPrefix) {
this.client = client;
this.restClient = restClient;
this.indexerStats = indexerStats;
this.auditService = auditService;
this.objectMapper = objectMapper;
this.opensearchUrl = opensearchUrl;
this.indexPrefix = indexPrefix;
this.logIndexPrefix = logIndexPrefix;
}
@GetMapping("/status")
@Operation(summary = "Get OpenSearch cluster status and version")
public ResponseEntity<OpenSearchStatusResponse> getStatus() {
try {
HealthResponse health = client.cluster().health();
String version = client.info().version().number();
return ResponseEntity.ok(new OpenSearchStatusResponse(
true,
health.status().name(),
version,
health.numberOfNodes(),
opensearchUrl));
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new OpenSearchStatusResponse(
false, "UNREACHABLE", null, 0, opensearchUrl));
}
}
@GetMapping("/pipeline")
@Operation(summary = "Get indexing pipeline statistics")
public ResponseEntity<PipelineStatsResponse> getPipeline() {
return ResponseEntity.ok(new PipelineStatsResponse(
indexerStats.getQueueDepth(),
indexerStats.getMaxQueueSize(),
indexerStats.getFailedCount(),
indexerStats.getIndexedCount(),
indexerStats.getDebounceMs(),
indexerStats.getIndexingRate(),
indexerStats.getLastIndexedAt()));
}
@GetMapping("/indices")
@Operation(summary = "Get OpenSearch indices with pagination")
public ResponseEntity<IndicesPageResponse> getIndices(
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "20") int size,
@RequestParam(defaultValue = "") String search,
@RequestParam(defaultValue = "executions") String prefix) {
try {
Response response = restClient.performRequest(
new Request("GET", "/_cat/indices?format=json&h=index,health,docs.count,store.size,pri,rep&bytes=b"));
JsonNode indices;
try (InputStream is = response.getEntity().getContent()) {
indices = objectMapper.readTree(is);
}
String filterPrefix = "logs".equals(prefix) ? logIndexPrefix : indexPrefix;
List<IndexInfoResponse> allIndices = new ArrayList<>();
for (JsonNode idx : indices) {
String name = idx.path("index").asText("");
if (!name.startsWith(filterPrefix)) {
continue;
}
if (!search.isEmpty() && !name.contains(search)) {
continue;
}
allIndices.add(new IndexInfoResponse(
name,
parseLong(idx.path("docs.count").asText("0")),
humanSize(parseLong(idx.path("store.size").asText("0"))),
parseLong(idx.path("store.size").asText("0")),
idx.path("health").asText("unknown"),
parseInt(idx.path("pri").asText("0")),
parseInt(idx.path("rep").asText("0"))));
}
allIndices.sort(Comparator.comparing(IndexInfoResponse::name));
long totalDocs = allIndices.stream().mapToLong(IndexInfoResponse::docCount).sum();
long totalBytes = allIndices.stream().mapToLong(IndexInfoResponse::sizeBytes).sum();
int totalIndices = allIndices.size();
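// Clamp paging inputs so a negative page or a size below 1 cannot break subList.
int safeSize = Math.max(1, size);
int safePage = Math.max(0, page);
int totalPages = Math.max(1, (int) Math.ceil((double) totalIndices / safeSize));
int fromIndex = (int) Math.min((long) safePage * safeSize, totalIndices);
int toIndex = (int) Math.min((long) fromIndex + safeSize, totalIndices);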
List<IndexInfoResponse> pageItems = allIndices.subList(fromIndex, toIndex);
return ResponseEntity.ok(new IndicesPageResponse(
pageItems, totalIndices, totalDocs,
humanSize(totalBytes), page, size, totalPages));
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.BAD_GATEWAY)
.body(new IndicesPageResponse(
List.of(), 0, 0, "0 B", page, size, 0));
}
}
@DeleteMapping("/indices/{name}")
@Operation(summary = "Delete an OpenSearch index")
public ResponseEntity<Void> deleteIndex(@PathVariable String name, HttpServletRequest request) {
try {
if (!name.startsWith(indexPrefix) && !name.startsWith(logIndexPrefix)) {
throw new ResponseStatusException(HttpStatus.FORBIDDEN, "Cannot delete index outside application scope");
}
boolean exists = client.indices().exists(r -> r.index(name)).value();
if (!exists) {
throw new ResponseStatusException(HttpStatus.NOT_FOUND, "Index not found: " + name);
}
client.indices().delete(r -> r.index(name));
auditService.log("delete_index", AuditCategory.INFRA, name, null, AuditResult.SUCCESS, request);
return ResponseEntity.ok().build();
} catch (ResponseStatusException e) {
throw e;
} catch (Exception e) {
throw new ResponseStatusException(HttpStatus.INTERNAL_SERVER_ERROR, "Failed to delete index: " + e.getMessage());
}
}
@GetMapping("/performance")
@Operation(summary = "Get OpenSearch performance metrics")
public ResponseEntity<PerformanceResponse> getPerformance() {
try {
Response response = restClient.performRequest(
new Request("GET", "/_nodes/stats/jvm,indices"));
JsonNode root;
try (InputStream is = response.getEntity().getContent()) {
root = objectMapper.readTree(is);
}
JsonNode nodes = root.path("nodes");
long heapUsed = 0, heapMax = 0;
long queryCacheHits = 0, queryCacheMisses = 0;
long requestCacheHits = 0, requestCacheMisses = 0;
long searchQueryTotal = 0, searchQueryTimeMs = 0;
long indexTotal = 0, indexTimeMs = 0;
var it = nodes.fields();
while (it.hasNext()) {
var entry = it.next();
JsonNode node = entry.getValue();
JsonNode jvm = node.path("jvm").path("mem");
heapUsed += jvm.path("heap_used_in_bytes").asLong(0);
heapMax += jvm.path("heap_max_in_bytes").asLong(0);
JsonNode indicesNode = node.path("indices");
JsonNode queryCache = indicesNode.path("query_cache");
queryCacheHits += queryCache.path("hit_count").asLong(0);
queryCacheMisses += queryCache.path("miss_count").asLong(0);
JsonNode requestCache = indicesNode.path("request_cache");
requestCacheHits += requestCache.path("hit_count").asLong(0);
requestCacheMisses += requestCache.path("miss_count").asLong(0);
JsonNode searchNode = indicesNode.path("search");
searchQueryTotal += searchNode.path("query_total").asLong(0);
searchQueryTimeMs += searchNode.path("query_time_in_millis").asLong(0);
JsonNode indexing = indicesNode.path("indexing");
indexTotal += indexing.path("index_total").asLong(0);
indexTimeMs += indexing.path("index_time_in_millis").asLong(0);
}
double queryCacheHitRate = (queryCacheHits + queryCacheMisses) > 0
? (double) queryCacheHits / (queryCacheHits + queryCacheMisses) : 0.0;
double requestCacheHitRate = (requestCacheHits + requestCacheMisses) > 0
? (double) requestCacheHits / (requestCacheHits + requestCacheMisses) : 0.0;
double searchLatency = searchQueryTotal > 0
? (double) searchQueryTimeMs / searchQueryTotal : 0.0;
double indexingLatency = indexTotal > 0
? (double) indexTimeMs / indexTotal : 0.0;
return ResponseEntity.ok(new PerformanceResponse(
queryCacheHitRate, requestCacheHitRate,
searchLatency, indexingLatency,
heapUsed, heapMax));
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.BAD_GATEWAY)
.body(new PerformanceResponse(0, 0, 0, 0, 0, 0));
}
}
private static long parseLong(String s) {
try {
return Long.parseLong(s);
} catch (NumberFormatException e) {
return 0;
}
}
private static int parseInt(String s) {
try {
return Integer.parseInt(s);
} catch (NumberFormatException e) {
return 0;
}
}
private static String humanSize(long bytes) {
if (bytes < 1024) return bytes + " B";
if (bytes < 1024 * 1024) return String.format("%.1f KB", bytes / 1024.0);
if (bytes < 1024 * 1024 * 1024) return String.format("%.1f MB", bytes / (1024.0 * 1024));
return String.format("%.1f GB", bytes / (1024.0 * 1024 * 1024));
}
}
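
Review note: `getPerformance` repeats the same guarded division four times (two cache hit rates, two latencies). A minimal sketch of how that could be factored out, assuming a hypothetical `Ratios` helper that is not part of this change set:

```java
/** Hypothetical helper (not in the diff): division that yields 0.0 for an empty denominator. */
final class Ratios {

    /** Returns part / (part + rest), or 0.0 when both counters are zero. */
    static double share(long part, long rest) {
        long total = part + rest;
        return total > 0 ? (double) part / total : 0.0;
    }

    /** Returns sum / count, or 0.0 when count is zero (for example, average latency in ms). */
    static double average(long sum, long count) {
        return count > 0 ? (double) sum / count : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(share(3, 1));     // cache hit rate for 3 hits and 1 miss
        System.out.println(average(500, 4)); // average ms for 4 operations taking 500 ms in total
    }
}
```

With such a helper, `queryCacheHitRate` would read `Ratios.share(queryCacheHits, queryCacheMisses)` and `searchLatency` would read `Ratios.average(searchQueryTimeMs, searchQueryTotal)`.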


@@ -3,7 +3,6 @@ package com.cameleer3.server.app.controller;
import com.cameleer3.server.core.admin.AuditCategory;
import com.cameleer3.server.core.admin.AuditResult;
import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.rbac.RbacService;
import com.cameleer3.server.core.rbac.RoleDetail;
import com.cameleer3.server.core.rbac.RoleRepository;
import com.cameleer3.server.core.rbac.SystemRole;
@@ -37,13 +36,10 @@ import java.util.UUID;
public class RoleAdminController {
private final RoleRepository roleRepository;
private final RbacService rbacService;
private final AuditService auditService;
public RoleAdminController(RoleRepository roleRepository, RbacService rbacService,
AuditService auditService) {
public RoleAdminController(RoleRepository roleRepository, AuditService auditService) {
this.roleRepository = roleRepository;
this.rbacService = rbacService;
this.auditService = auditService;
}


@@ -7,6 +7,7 @@ import com.cameleer3.common.graph.RouteGraph;
import com.cameleer3.server.core.agent.AgentInfo;
import com.cameleer3.server.core.agent.AgentRegistryService;
import com.cameleer3.server.core.agent.AgentState;
import com.cameleer3.server.core.agent.RouteStateRegistry;
import com.cameleer3.server.core.storage.DiagramStore;
import com.cameleer3.server.core.storage.StatsStore;
import io.swagger.v3.oas.annotations.Operation;
@@ -35,16 +36,21 @@ import java.util.stream.Collectors;
@Tag(name = "Route Catalog", description = "Route catalog and discovery")
public class RouteCatalogController {
private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(RouteCatalogController.class);
private final AgentRegistryService registryService;
private final DiagramStore diagramStore;
private final JdbcTemplate jdbc;
private final RouteStateRegistry routeStateRegistry;
public RouteCatalogController(AgentRegistryService registryService,
DiagramStore diagramStore,
JdbcTemplate jdbc) {
@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc,
RouteStateRegistry routeStateRegistry) {
this.registryService = registryService;
this.diagramStore = diagramStore;
this.jdbc = jdbc;
this.routeStateRegistry = routeStateRegistry;
}
@GetMapping("/catalog")
@@ -53,12 +59,20 @@ public class RouteCatalogController {
@ApiResponse(responseCode = "200", description = "Catalog returned")
public ResponseEntity<List<AppCatalogEntry>> getCatalog(
@RequestParam(required = false) String from,
@RequestParam(required = false) String to) {
@RequestParam(required = false) String to,
@RequestParam(required = false) String environment) {
List<AgentInfo> allAgents = registryService.findAll();
// Filter agents by environment if specified
if (environment != null && !environment.isBlank()) {
allAgents = allAgents.stream()
.filter(a -> environment.equals(a.environmentId()))
.toList();
}
// Group agents by application name
Map<String, List<AgentInfo>> agentsByApp = allAgents.stream()
.collect(Collectors.groupingBy(AgentInfo::application, LinkedHashMap::new, Collectors.toList()));
.collect(Collectors.groupingBy(AgentInfo::applicationId, LinkedHashMap::new, Collectors.toList()));
// Collect all distinct routes per app
Map<String, Set<String>> routesByApp = new LinkedHashMap<>();
@@ -76,64 +90,65 @@ public class RouteCatalogController {
Instant now = Instant.now();
Instant rangeFrom = from != null ? Instant.parse(from) : now.minus(24, ChronoUnit.HOURS);
Instant rangeTo = to != null ? Instant.parse(to) : now;
Instant from1m = now.minus(1, ChronoUnit.MINUTES);
// Route exchange counts from continuous aggregate
// Route exchange counts from AggregatingMergeTree (literal SQL — ClickHouse JDBC driver
// wraps prepared statements in sub-queries that strip AggregateFunction column types)
Map<String, Long> routeExchangeCounts = new LinkedHashMap<>();
Map<String, Instant> routeLastSeen = new LinkedHashMap<>();
try {
String envFilter = (environment != null && !environment.isBlank())
? " AND environment = " + lit(environment) : "";
jdbc.query(
"SELECT application_name, route_id, SUM(total_count) AS cnt, MAX(bucket) AS last_seen " +
"FROM stats_1m_route WHERE bucket >= ? AND bucket < ? " +
"GROUP BY application_name, route_id",
"SELECT application_id, route_id, countMerge(total_count) AS cnt, MAX(bucket) AS last_seen " +
"FROM stats_1m_route WHERE bucket >= " + lit(rangeFrom) + " AND bucket < " + lit(rangeTo) +
envFilter +
" GROUP BY application_id, route_id",
rs -> {
String key = rs.getString("application_name") + "/" + rs.getString("route_id");
String key = rs.getString("application_id") + "/" + rs.getString("route_id");
routeExchangeCounts.put(key, rs.getLong("cnt"));
Timestamp ts = rs.getTimestamp("last_seen");
if (ts != null) routeLastSeen.put(key, ts.toInstant());
},
Timestamp.from(rangeFrom), Timestamp.from(rangeTo));
});
} catch (Exception e) {
// Continuous aggregate may not exist yet
log.warn("Failed to query route exchange counts: {}", e.getMessage());
}
// Per-agent TPS from the last minute
Map<String, Double> agentTps = new LinkedHashMap<>();
try {
jdbc.query(
"SELECT application_name, SUM(total_count) AS cnt " +
"FROM stats_1m_route WHERE bucket >= ? AND bucket < ? " +
"GROUP BY application_name",
rs -> {
// This gives per-app TPS; we'll distribute among agents below
},
Timestamp.from(from1m), Timestamp.from(now));
} catch (Exception e) {
// Continuous aggregate may not exist yet
// Merge route IDs from ClickHouse stats into routesByApp.
// After server restart, auto-healed agents have empty routeIds, but
// ClickHouse still has execution data with the correct route IDs.
for (var countEntry : routeExchangeCounts.entrySet()) {
String[] parts = countEntry.getKey().split("/", 2);
if (parts.length == 2) {
routesByApp.computeIfAbsent(parts[0], k -> new LinkedHashSet<>()).add(parts[1]);
}
}
// Build catalog entries
// Build catalog entries — merge apps from agent registry + ClickHouse data
Set<String> allAppIds = new LinkedHashSet<>(agentsByApp.keySet());
allAppIds.addAll(routesByApp.keySet());
List<AppCatalogEntry> catalog = new ArrayList<>();
for (var entry : agentsByApp.entrySet()) {
String appId = entry.getKey();
List<AgentInfo> agents = entry.getValue();
for (String appId : allAppIds) {
List<AgentInfo> agents = agentsByApp.getOrDefault(appId, List.of());
// Routes
Set<String> routeIds = routesByApp.getOrDefault(appId, Set.of());
List<String> agentIds = agents.stream().map(AgentInfo::id).toList();
List<String> agentIds = agents.stream().map(AgentInfo::instanceId).toList();
List<RouteSummary> routeSummaries = routeIds.stream()
.map(routeId -> {
String key = appId + "/" + routeId;
long count = routeExchangeCounts.getOrDefault(key, 0L);
Instant lastSeen = routeLastSeen.get(key);
String fromUri = resolveFromEndpointUri(routeId, agentIds);
return new RouteSummary(routeId, count, lastSeen, fromUri);
String state = routeStateRegistry.getState(appId, routeId).name().toLowerCase();
// Only include non-default states (stopped/suspended); null means started
String routeState = "started".equals(state) ? null : state;
return new RouteSummary(routeId, count, lastSeen, fromUri, routeState);
})
.toList();
// Agent summaries
List<AgentSummary> agentSummaries = agents.stream()
.map(a -> new AgentSummary(a.id(), a.name(), a.state().name().toLowerCase(), 0.0))
.map(a -> new AgentSummary(a.instanceId(), a.displayName(), a.state().name().toLowerCase(), 0.0))
.toList();
// Health = worst state among agents
@@ -158,6 +173,18 @@ public class RouteCatalogController {
.orElse(null);
}
/** Format an Instant as a ClickHouse DateTime literal in UTC. */
private static String lit(Instant instant) {
return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
.withZone(java.time.ZoneOffset.UTC)
.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
}
/** Format a string as a ClickHouse SQL literal with backslash + quote escaping. */
private static String lit(String value) {
return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
}
private String computeWorstHealth(List<AgentInfo> agents) {
boolean hasDead = false;
boolean hasStale = false;
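
Review note: the catalog code above joins `application_id` and `route_id` into one `appId/routeId` string and later recovers the parts with `split("/", 2)`. That works while application IDs never contain a slash. A sketch of a typed key that avoids the round trip through a string, assuming a hypothetical record named `RouteKey` here and a hypothetical `routesByApp` method:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical typed map key (not in the diff); avoids parsing "appId/routeId" strings. */
record RouteKey(String appId, String routeId) {

    /** Groups route IDs by application ID, preserving insertion order. */
    static Map<String, Set<String>> routesByApp(Map<RouteKey, Long> counts) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (RouteKey key : counts.keySet()) {
            result.computeIfAbsent(key.appId(), k -> new LinkedHashSet<>()).add(key.routeId());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<RouteKey, Long> counts = new LinkedHashMap<>();
        counts.put(new RouteKey("team/orders", "route1"), 42L);
        // The slash in the application ID is kept intact because nothing is split.
        System.out.println(routesByApp(counts));
    }
}
```

Records get value-based `equals` and `hashCode`, so they can serve directly as keys in the `routeExchangeCounts` and `routeLastSeen` maps.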


@@ -32,7 +32,7 @@ public class RouteMetricsController {
private final StatsStore statsStore;
private final AppSettingsRepository appSettingsRepository;
public RouteMetricsController(JdbcTemplate jdbc, StatsStore statsStore,
public RouteMetricsController(@org.springframework.beans.factory.annotation.Qualifier("clickHouseJdbcTemplate") JdbcTemplate jdbc, StatsStore statsStore,
AppSettingsRepository appSettingsRepository) {
this.jdbc = jdbc;
this.statsStore = statsStore;
@@ -46,35 +46,33 @@ public class RouteMetricsController {
public ResponseEntity<List<RouteMetrics>> getMetrics(
@RequestParam(required = false) String from,
@RequestParam(required = false) String to,
@RequestParam(required = false) String appId) {
@RequestParam(required = false) String appId,
@RequestParam(required = false) String environment) {
Instant toInstant = to != null ? Instant.parse(to) : Instant.now();
Instant fromInstant = from != null ? Instant.parse(from) : toInstant.minus(24, ChronoUnit.HOURS);
long windowSeconds = Duration.between(fromInstant, toInstant).toSeconds();
// Literal SQL — ClickHouse JDBC driver wraps prepared statements in sub-queries
// that strip AggregateFunction column types, breaking -Merge combinators
var sql = new StringBuilder(
"SELECT application_name, route_id, " +
"SUM(total_count) AS total, " +
"SUM(failed_count) AS failed, " +
"CASE WHEN SUM(total_count) > 0 THEN SUM(duration_sum) / SUM(total_count) ELSE 0 END AS avg_dur, " +
"COALESCE(MAX(p99_duration), 0) AS p99_dur " +
"FROM stats_1m_route WHERE bucket >= ? AND bucket < ?");
var params = new ArrayList<Object>();
params.add(Timestamp.from(fromInstant));
params.add(Timestamp.from(toInstant));
"SELECT application_id, route_id, " +
"countMerge(total_count) AS total, " +
"countIfMerge(failed_count) AS failed, " +
"CASE WHEN countMerge(total_count) > 0 THEN toFloat64(sumMerge(duration_sum)) / countMerge(total_count) ELSE 0 END AS avg_dur, " +
"COALESCE(quantileMerge(0.99)(p99_duration), 0) AS p99_dur " +
"FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant));
if (appId != null) {
sql.append(" AND application_name = ?");
params.add(appId);
sql.append(" AND application_id = " + lit(appId));
}
sql.append(" GROUP BY application_name, route_id ORDER BY application_name, route_id");
// Key struct for sparkline lookup
record RouteKey(String appId, String routeId) {}
List<RouteKey> routeKeys = new ArrayList<>();
if (environment != null) {
sql.append(" AND environment = " + lit(environment));
}
sql.append(" GROUP BY application_id, route_id ORDER BY application_id, route_id");
List<RouteMetrics> metrics = jdbc.query(sql.toString(), (rs, rowNum) -> {
String applicationName = rs.getString("application_name");
String applicationId = rs.getString("application_id");
String routeId = rs.getString("route_id");
long total = rs.getLong("total");
long failed = rs.getLong("failed");
@@ -85,10 +83,9 @@ public class RouteMetricsController {
double errorRate = total > 0 ? (double) failed / total : 0.0;
double tps = windowSeconds > 0 ? (double) total / windowSeconds : 0.0;
routeKeys.add(new RouteKey(applicationName, routeId));
return new RouteMetrics(routeId, applicationName, total, successRate,
return new RouteMetrics(routeId, applicationId, total, successRate,
avgDur, p99Dur, errorRate, tps, List.of(), -1.0);
}, params.toArray());
});
// Fetch sparklines (12 buckets over the time window)
if (!metrics.isEmpty()) {
@@ -98,15 +95,17 @@ public class RouteMetricsController {
for (int i = 0; i < metrics.size(); i++) {
RouteMetrics m = metrics.get(i);
try {
List<Double> sparkline = jdbc.query(
"SELECT time_bucket(? * INTERVAL '1 second', bucket) AS period, " +
"COALESCE(SUM(total_count), 0) AS cnt " +
"FROM stats_1m_route WHERE bucket >= ? AND bucket < ? " +
"AND application_name = ? AND route_id = ? " +
"GROUP BY period ORDER BY period",
(rs, rowNum) -> rs.getDouble("cnt"),
bucketSeconds, Timestamp.from(fromInstant), Timestamp.from(toInstant),
m.appId(), m.routeId());
var sparkWhere = new StringBuilder(
"FROM stats_1m_route WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
" AND application_id = " + lit(m.appId()) + " AND route_id = " + lit(m.routeId()));
if (environment != null) {
sparkWhere.append(" AND environment = " + lit(environment));
}
String sparkSql = "SELECT toStartOfInterval(bucket, toIntervalSecond(" + bucketSeconds + ")) AS period, " +
"COALESCE(countMerge(total_count), 0) AS cnt " +
sparkWhere + " GROUP BY period ORDER BY period";
List<Double> sparkline = jdbc.query(sparkSql,
(rs, rowNum) -> rs.getDouble("cnt"));
metrics.set(i, new RouteMetrics(m.routeId(), m.appId(), m.exchangeCount(),
m.successRate(), m.avgDurationMs(), m.p99DurationMs(),
m.errorRate(), m.throughputPerSec(), sparkline, m.slaCompliance()));
@@ -120,11 +119,11 @@ public class RouteMetricsController {
if (!metrics.isEmpty()) {
// Determine SLA threshold (per-app or default)
String effectiveAppId = appId != null ? appId : (metrics.isEmpty() ? null : metrics.get(0).appId());
int threshold = appSettingsRepository.findByAppId(effectiveAppId != null ? effectiveAppId : "")
int threshold = appSettingsRepository.findByApplicationId(effectiveAppId != null ? effectiveAppId : "")
.map(AppSettings::slaThresholdMs).orElse(300);
Map<String, long[]> slaCounts = statsStore.slaCountsByRoute(fromInstant, toInstant,
effectiveAppId, threshold);
effectiveAppId, threshold, environment);
for (int i = 0; i < metrics.size(); i++) {
RouteMetrics m = metrics.get(i);
@@ -148,47 +147,63 @@ public class RouteMetricsController {
@RequestParam String routeId,
@RequestParam(required = false) String appId,
@RequestParam(required = false) Instant from,
@RequestParam(required = false) Instant to) {
@RequestParam(required = false) Instant to,
@RequestParam(required = false) String environment) {
Instant toInstant = to != null ? to : Instant.now();
Instant fromInstant = from != null ? from : toInstant.minus(24, ChronoUnit.HOURS);
// Literal SQL for AggregatingMergeTree -Merge combinators.
// Aliases (tc, fc) must NOT shadow column names (total_count, failed_count) —
// ClickHouse 24.12 new analyzer resolves subsequent countMerge(total_count)
// to the alias (UInt64) instead of the AggregateFunction column.
var sql = new StringBuilder(
"SELECT processor_id, processor_type, route_id, application_name, " +
"SUM(total_count) AS total_count, " +
"SUM(failed_count) AS failed_count, " +
"CASE WHEN SUM(total_count) > 0 THEN SUM(duration_sum)::double precision / SUM(total_count) ELSE 0 END AS avg_duration_ms, " +
"MAX(p99_duration) AS p99_duration_ms " +
"SELECT processor_id, processor_type, route_id, application_id, " +
"countMerge(total_count) AS tc, " +
"countIfMerge(failed_count) AS fc, " +
"CASE WHEN countMerge(total_count) > 0 THEN toFloat64(sumMerge(duration_sum)) / countMerge(total_count) ELSE 0 END AS avg_duration_ms, " +
"quantileMerge(0.99)(p99_duration) AS p99_duration_ms " +
"FROM stats_1m_processor_detail " +
"WHERE bucket >= ? AND bucket < ? AND route_id = ?");
var params = new ArrayList<Object>();
params.add(Timestamp.from(fromInstant));
params.add(Timestamp.from(toInstant));
params.add(routeId);
"WHERE bucket >= " + lit(fromInstant) + " AND bucket < " + lit(toInstant) +
" AND route_id = " + lit(routeId));
if (appId != null) {
sql.append(" AND application_name = ?");
params.add(appId);
sql.append(" AND application_id = " + lit(appId));
}
sql.append(" GROUP BY processor_id, processor_type, route_id, application_name");
sql.append(" ORDER BY SUM(total_count) DESC");
if (environment != null) {
sql.append(" AND environment = " + lit(environment));
}
sql.append(" GROUP BY processor_id, processor_type, route_id, application_id");
sql.append(" ORDER BY tc DESC");
List<ProcessorMetrics> metrics = jdbc.query(sql.toString(), (rs, rowNum) -> {
long totalCount = rs.getLong("total_count");
long failedCount = rs.getLong("failed_count");
long totalCount = rs.getLong("tc");
long failedCount = rs.getLong("fc");
double errorRate = totalCount > 0 ? (double) failedCount / totalCount : 0.0;
return new ProcessorMetrics(
rs.getString("processor_id"),
rs.getString("processor_type"),
rs.getString("route_id"),
rs.getString("application_name"),
rs.getString("application_id"),
totalCount,
failedCount,
rs.getDouble("avg_duration_ms"),
rs.getDouble("p99_duration_ms"),
errorRate);
}, params.toArray());
});
return ResponseEntity.ok(metrics);
}
/** Format an Instant as a ClickHouse DateTime literal. */
private static String lit(Instant instant) {
return "'" + java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
.withZone(java.time.ZoneOffset.UTC)
.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
}
/** Format a string as a ClickHouse SQL literal with backslash + quote escaping. */
private static String lit(String value) {
return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
}
}
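
Review note: because the queries are now built as literal SQL, every value passes through one of the two `lit(...)` overloads. The sketch below is a standalone copy of those helpers, showing what they emit. The class name `ClickHouseLiterals` and the hoisted formatter constant are assumptions for illustration. It does not claim that this escaping is sufficient for every ClickHouse configuration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Standalone copy of the two lit(...) helpers from the diff, for illustration only. */
final class ClickHouseLiterals {

    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Formats an Instant as a quoted DateTime literal in UTC, dropping sub-second precision. */
    static String lit(Instant instant) {
        return "'" + DATE_TIME.format(instant.truncatedTo(ChronoUnit.SECONDS)) + "'";
    }

    /** Quotes a string, escaping backslashes first and then single quotes. */
    static String lit(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        System.out.println(lit(Instant.parse("2026-04-09T17:03:53.900Z"))); // '2026-04-09 17:03:53'
        System.out.println(lit("o'brien"));                                // 'o\'brien'
        System.out.println(lit("a\\b"));                                   // 'a\\b'
    }
}
```

The order of the two `replace` calls matters: escaping quotes first would add backslashes that the second call then doubles.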


@@ -57,9 +57,10 @@ public class SearchController {
@RequestParam(required = false) String correlationId,
@RequestParam(required = false) String text,
@RequestParam(required = false) String routeId,
@RequestParam(required = false) String agentId,
@RequestParam(name = "agentId", required = false) String instanceId,
@RequestParam(required = false) String processorType,
@RequestParam(required = false) String application,
@RequestParam(required = false) String environment,
@RequestParam(defaultValue = "0") int offset,
@RequestParam(defaultValue = "50") int limit,
@RequestParam(required = false) String sortField,
@@ -72,10 +73,11 @@ public class SearchController {
null, null,
correlationId,
text, null, null, null,
routeId, agentId, processorType,
routeId, instanceId, processorType,
application, agentIds,
offset, limit,
sortField, sortDir
sortField, sortDir,
environment
);
return ResponseEntity.ok(searchService.search(request));
@@ -87,9 +89,9 @@ public class SearchController {
@RequestBody SearchRequest request) {
// Resolve application to agentIds if application is specified but agentIds is not
SearchRequest resolved = request;
if (request.application() != null && !request.application().isBlank()
&& (request.agentIds() == null || request.agentIds().isEmpty())) {
resolved = request.withAgentIds(resolveApplicationToAgentIds(request.application()));
if (request.applicationId() != null && !request.applicationId().isBlank()
&& (request.instanceIds() == null || request.instanceIds().isEmpty())) {
resolved = request.withInstanceIds(resolveApplicationToAgentIds(request.applicationId()));
}
return ResponseEntity.ok(searchService.search(resolved));
}
@@ -100,23 +102,24 @@ public class SearchController {
@RequestParam Instant from,
@RequestParam(required = false) Instant to,
@RequestParam(required = false) String routeId,
@RequestParam(required = false) String application) {
@RequestParam(required = false) String application,
@RequestParam(required = false) String environment) {
Instant end = to != null ? to : Instant.now();
ExecutionStats stats;
if (routeId == null && application == null) {
stats = searchService.stats(from, end);
stats = searchService.stats(from, end, environment);
} else if (routeId == null) {
stats = searchService.statsForApp(from, end, application);
stats = searchService.statsForApp(from, end, application, environment);
} else {
List<String> agentIds = resolveApplicationToAgentIds(application);
stats = searchService.stats(from, end, routeId, agentIds);
stats = searchService.stats(from, end, routeId, agentIds, environment);
}
// Enrich with SLA compliance
int threshold = appSettingsRepository
.findByAppId(application != null ? application : "")
.findByApplicationId(application != null ? application : "")
.map(AppSettings::slaThresholdMs).orElse(300);
double sla = searchService.slaCompliance(from, end, threshold, application, routeId);
double sla = searchService.slaCompliance(from, end, threshold, application, routeId, environment);
return ResponseEntity.ok(stats.withSlaCompliance(sla));
}
@@ -127,19 +130,20 @@ public class SearchController {
@RequestParam(required = false) Instant to,
@RequestParam(defaultValue = "24") int buckets,
@RequestParam(required = false) String routeId,
@RequestParam(required = false) String application) {
@RequestParam(required = false) String application,
@RequestParam(required = false) String environment) {
Instant end = to != null ? to : Instant.now();
if (routeId == null && application == null) {
return ResponseEntity.ok(searchService.timeseries(from, end, buckets));
return ResponseEntity.ok(searchService.timeseries(from, end, buckets, environment));
}
if (routeId == null) {
return ResponseEntity.ok(searchService.timeseriesForApp(from, end, buckets, application));
return ResponseEntity.ok(searchService.timeseriesForApp(from, end, buckets, application, environment));
}
List<String> agentIds = resolveApplicationToAgentIds(application);
if (routeId == null && agentIds == null) {
return ResponseEntity.ok(searchService.timeseries(from, end, buckets));
if (routeId == null && agentIds.isEmpty()) {
return ResponseEntity.ok(searchService.timeseries(from, end, buckets, environment));
}
return ResponseEntity.ok(searchService.timeseries(from, end, buckets, routeId, agentIds));
return ResponseEntity.ok(searchService.timeseries(from, end, buckets, routeId, agentIds, environment));
}
@GetMapping("/stats/timeseries/by-app")
@@ -147,9 +151,10 @@ public class SearchController {
public ResponseEntity<Map<String, StatsTimeseries>> timeseriesByApp(
@RequestParam Instant from,
@RequestParam(required = false) Instant to,
@RequestParam(defaultValue = "24") int buckets) {
@RequestParam(defaultValue = "24") int buckets,
@RequestParam(required = false) String environment) {
Instant end = to != null ? to : Instant.now();
return ResponseEntity.ok(searchService.timeseriesGroupedByApp(from, end, buckets));
return ResponseEntity.ok(searchService.timeseriesGroupedByApp(from, end, buckets, environment));
}
@GetMapping("/stats/timeseries/by-route")
@@ -158,18 +163,26 @@ public class SearchController {
@RequestParam Instant from,
@RequestParam(required = false) Instant to,
@RequestParam(defaultValue = "24") int buckets,
@RequestParam String application) {
@RequestParam String application,
@RequestParam(required = false) String environment) {
Instant end = to != null ? to : Instant.now();
return ResponseEntity.ok(searchService.timeseriesGroupedByRoute(from, end, buckets, application));
return ResponseEntity.ok(searchService.timeseriesGroupedByRoute(from, end, buckets, application, environment));
}
@GetMapping("/stats/punchcard")
@Operation(summary = "Transaction punchcard: weekday x hour grid (rolling 7 days)")
public ResponseEntity<List<StatsStore.PunchcardCell>> punchcard(
@RequestParam(required = false) String application) {
@RequestParam(required = false) String application,
@RequestParam(required = false) String environment) {
Instant to = Instant.now();
Instant from = to.minus(java.time.Duration.ofDays(7));
return ResponseEntity.ok(searchService.punchcard(from, to, application));
return ResponseEntity.ok(searchService.punchcard(from, to, application, environment));
}
@GetMapping("/attributes/keys")
@Operation(summary = "Distinct attribute key names across all executions")
public ResponseEntity<List<String>> attributeKeys() {
return ResponseEntity.ok(searchService.distinctAttributeKeys());
}
@GetMapping("/errors/top")
@@ -179,21 +192,22 @@ public class SearchController {
@RequestParam(required = false) Instant to,
@RequestParam(required = false) String application,
@RequestParam(required = false) String routeId,
@RequestParam(required = false) String environment,
@RequestParam(defaultValue = "5") int limit) {
Instant end = to != null ? to : Instant.now();
return ResponseEntity.ok(searchService.topErrors(from, end, application, routeId, limit));
return ResponseEntity.ok(searchService.topErrors(from, end, application, routeId, limit, environment));
}
/**
* Resolve an application name to agent IDs.
* Returns null if application is null/blank (no filtering).
* Returns empty list if application is null/blank (no filtering).
*/
private List<String> resolveApplicationToAgentIds(String application) {
if (application == null || application.isBlank()) {
return null;
return List.of();
}
return registryService.findByApplication(application).stream()
.map(AgentInfo::id)
.map(AgentInfo::instanceId)
.toList();
}
}


@@ -0,0 +1,50 @@
package com.cameleer3.server.app.controller;
import com.cameleer3.server.app.storage.ClickHouseUsageTracker;
import com.cameleer3.server.core.analytics.UsageStats;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import org.springframework.boot.autoconfigure.condition.ConditionalOnBean;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
@RestController
@RequestMapping("/api/v1/admin/usage")
@ConditionalOnBean(ClickHouseUsageTracker.class)
@Tag(name = "Usage Analytics", description = "UI usage pattern analytics")
public class UsageAnalyticsController {
private final ClickHouseUsageTracker tracker;
public UsageAnalyticsController(ClickHouseUsageTracker tracker) {
this.tracker = tracker;
}
@GetMapping
@Operation(summary = "Query usage statistics",
description = "Returns aggregated API usage stats grouped by endpoint, user, or hour")
public ResponseEntity<List<UsageStats>> getUsage(
@RequestParam(required = false) String from,
@RequestParam(required = false) String to,
@RequestParam(required = false) String username,
@RequestParam(defaultValue = "endpoint") String groupBy) {
Instant fromInstant = from != null ? Instant.parse(from) : Instant.now().minus(7, ChronoUnit.DAYS);
Instant toInstant = to != null ? Instant.parse(to) : Instant.now();
List<UsageStats> stats = switch (groupBy) {
case "user" -> tracker.queryByUser(fromInstant, toInstant);
case "hour" -> tracker.queryByHour(fromInstant, toInstant, username);
default -> tracker.queryByEndpoint(fromInstant, toInstant, username);
};
return ResponseEntity.ok(stats);
}
}
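
Review note: `getUsage` calls `Instant.parse` directly on the raw `from` and `to` strings, so a malformed value throws `DateTimeParseException`, which would typically surface as a server error unless it is handled elsewhere. A minimal sketch of a tolerant parser, assuming a hypothetical `InstantParams` helper that is not part of this change set:

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

/** Hypothetical helper (not in the diff) for optional ISO-8601 query parameters. */
final class InstantParams {

    /** Returns fallback when raw is null or blank; rejects malformed input with a clear message. */
    static Instant parseOrDefault(String raw, Instant fallback, String paramName) {
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        try {
            return Instant.parse(raw.trim());
        } catch (DateTimeParseException e) {
            // A controller could translate this into an HTTP 400 response.
            throw new IllegalArgumentException(
                    "Parameter '" + paramName + "' must be an ISO-8601 instant, got: " + raw, e);
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(parseOrDefault(null, now, "from").equals(now));
        System.out.println(parseOrDefault("2026-04-09T17:03:53Z", now, "from"));
    }
}
```

Capturing `Instant.now()` once and passing it as the fallback for both bounds would also keep `from` and `to` anchored to the same moment.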


@@ -7,6 +7,7 @@ import com.cameleer3.server.core.admin.AuditService;
import com.cameleer3.server.core.rbac.RbacService;
import com.cameleer3.server.core.rbac.SystemRole;
import com.cameleer3.server.core.rbac.UserDetail;
import com.cameleer3.server.core.security.PasswordPolicyValidator;
import com.cameleer3.server.core.security.UserInfo;
import com.cameleer3.server.core.security.UserRepository;
import io.swagger.v3.oas.annotations.Operation;
@@ -14,6 +15,7 @@ import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.validation.Valid;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.web.bind.annotation.DeleteMapping;
@@ -24,7 +26,9 @@ import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;
import com.cameleer3.server.app.security.SecurityProperties;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import java.time.Instant;
@@ -32,6 +36,7 @@ import java.util.List;
import java.util.Map;
import java.util.UUID;
/**
* Admin endpoints for user management.
* Protected by {@code ROLE_ADMIN}.
@@ -47,12 +52,15 @@ public class UserAdminController {
private final RbacService rbacService;
private final UserRepository userRepository;
private final AuditService auditService;
private final boolean oidcEnabled;
public UserAdminController(RbacService rbacService, UserRepository userRepository,
AuditService auditService) {
AuditService auditService, SecurityProperties securityProperties) {
this.rbacService = rbacService;
this.userRepository = userRepository;
this.auditService = auditService;
String issuer = securityProperties.getOidcIssuerUri();
this.oidcEnabled = issuer != null && !issuer.isBlank();
}
@GetMapping
@@ -78,8 +86,13 @@ public class UserAdminController {
@PostMapping
@Operation(summary = "Create a local user")
@ApiResponse(responseCode = "200", description = "User created")
public ResponseEntity<UserDetail> createUser(@RequestBody CreateUserRequest request,
@ApiResponse(responseCode = "400", description = "Disabled in OIDC mode")
public ResponseEntity<?> createUser(@RequestBody CreateUserRequest request,
HttpServletRequest httpRequest) {
if (oidcEnabled) {
return ResponseEntity.badRequest()
.body(Map.of("error", "Local user creation is disabled when OIDC is enabled. Users are provisioned automatically via SSO."));
}
String userId = "user:" + request.username();
UserInfo user = new UserInfo(userId, "local",
request.email() != null ? request.email() : "",
@@ -87,6 +100,11 @@ public class UserAdminController {
Instant.now());
userRepository.upsert(user);
if (request.password() != null && !request.password().isBlank()) {
List<String> violations = PasswordPolicyValidator.validate(request.password(), request.username());
if (!violations.isEmpty()) {
throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
"Password policy violation: " + String.join("; ", violations));
}
userRepository.setPassword(userId, passwordEncoder.encode(request.password()));
}
rbacService.assignRoleToUser(userId, SystemRole.VIEWER_ID);
@@ -167,8 +185,14 @@ public class UserAdminController {
@DeleteMapping("/{userId}")
@Operation(summary = "Delete user")
@ApiResponse(responseCode = "204", description = "User deleted")
@ApiResponse(responseCode = "409", description = "Cannot delete the last admin user")
public ResponseEntity<Void> deleteUser(@PathVariable String userId,
HttpServletRequest httpRequest) {
boolean isAdmin = rbacService.getEffectiveRolesForUser(userId).stream()
.anyMatch(r -> r.id().equals(SystemRole.ADMIN_ID));
if (isAdmin && rbacService.getEffectivePrincipalsForRole(SystemRole.ADMIN_ID).size() <= 1) {
throw new ResponseStatusException(HttpStatus.CONFLICT, "Cannot delete the last admin user");
}
userRepository.delete(userId);
auditService.log("delete_user", AuditCategory.USER_MGMT, userId,
null, AuditResult.SUCCESS, httpRequest);
@@ -178,11 +202,24 @@ public class UserAdminController {
@PostMapping("/{userId}/password")
@Operation(summary = "Reset user password")
@ApiResponse(responseCode = "204", description = "Password reset")
@ApiResponse(responseCode = "400", description = "Disabled in OIDC mode or policy violation")
public ResponseEntity<Void> resetPassword(
@PathVariable String userId,
@Valid @RequestBody SetPasswordRequest request,
HttpServletRequest httpRequest) {
if (oidcEnabled) {
return ResponseEntity.badRequest().build();
}
// Extract bare username from "user:username" format for policy check
String username = userId.startsWith("user:") ? userId.substring(5) : userId;
List<String> violations = PasswordPolicyValidator.validate(request.password(), username);
if (!violations.isEmpty()) {
throw new ResponseStatusException(HttpStatus.BAD_REQUEST,
"Password policy violation: " + String.join("; ", violations));
}
userRepository.setPassword(userId, passwordEncoder.encode(request.password()));
// Revoke all existing tokens so the user must re-authenticate with the new password
userRepository.revokeTokensBefore(userId, Instant.now());
auditService.log("reset_password", AuditCategory.USER_MGMT, userId, null, AuditResult.SUCCESS, httpRequest);
return ResponseEntity.noContent().build();
}
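
The two small rules introduced in `UserAdminController` above (the last-admin guard in `deleteUser` and the `user:` prefix stripping in `resetPassword`) can be expressed as pure functions, which makes them easy to test without Spring or a database. The sketch below assumes the set of admin principals has already been loaded; `UserAdminRules` and both method names are hypothetical and do not appear in the diff.

```java
import java.util.Set;

/** Hypothetical, framework-free restatement of two checks from the diff. */
final class UserAdminRules {
    private static final String USER_PREFIX = "user:";

    private UserAdminRules() {}

    /** Strips the "user:" prefix used for local user IDs; other IDs pass through unchanged. */
    static String bareUsername(String userId) {
        return userId.startsWith(USER_PREFIX) ? userId.substring(USER_PREFIX.length()) : userId;
    }

    /**
     * True when deleting userId would leave the system without an admin:
     * the user is an admin and the admin set has at most one member.
     */
    static boolean wouldRemoveLastAdmin(String userId, Set<String> adminPrincipals) {
        return adminPrincipals.contains(userId) && adminPrincipals.size() <= 1;
    }
}
```

With this shape, the controller would only decide how to report the result (HTTP 409 in the diff), while the rule itself stays independent of `RbacService`.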


@@ -884,6 +884,7 @@ public class ElkDiagramRenderer implements DiagramRenderer {
}
private ElkNode getElkRoot(ElkNode node) {
if (node == null) return null;
ElkNode current = node;
while (current.getParent() != null) {
current = current.getParent();


@@ -9,16 +9,16 @@ import java.time.Instant;
@Schema(description = "Agent lifecycle event")
public record AgentEventResponse(
@NotNull long id,
@NotNull String agentId,
@NotNull String appId,
@NotNull String instanceId,
@NotNull String applicationId,
@NotNull String eventType,
String detail,
@NotNull Instant timestamp
) {
public static AgentEventResponse from(AgentEventRecord record) {
public static AgentEventResponse from(AgentEventRecord event) {
return new AgentEventResponse(
record.id(), record.agentId(), record.appId(),
record.eventType(), record.detail(), record.timestamp()
event.id(), event.instanceId(), event.applicationId(),
event.eventType(), event.detail(), event.timestamp()
);
}
}


@@ -11,9 +11,10 @@ import java.util.Map;
@Schema(description = "Agent instance summary with runtime metrics")
public record AgentInstanceResponse(
@NotNull String id,
@NotNull String name,
@NotNull String application,
@NotNull String instanceId,
@NotNull String displayName,
@NotNull String applicationId,
String environmentId,
@NotNull String status,
@NotNull List<String> routeIds,
@NotNull Instant registeredAt,
@@ -29,7 +30,8 @@ public record AgentInstanceResponse(
public static AgentInstanceResponse from(AgentInfo info) {
long uptime = Duration.between(info.registeredAt(), Instant.now()).toSeconds();
return new AgentInstanceResponse(
info.id(), info.name(), info.application(),
info.instanceId(), info.displayName(), info.applicationId(),
info.environmentId(),
info.state().name(), info.routeIds(),
info.registeredAt(), info.lastHeartbeat(),
info.version(), info.capabilities(),
@@ -41,7 +43,8 @@ public record AgentInstanceResponse(
public AgentInstanceResponse withMetrics(double tps, double errorRate, int activeRoutes) {
return new AgentInstanceResponse(
id, name, application, status, routeIds, registeredAt, lastHeartbeat,
instanceId, displayName, applicationId, environmentId,
status, routeIds, registeredAt, lastHeartbeat,
version, capabilities,
tps, errorRate, activeRoutes, totalRoutes, uptimeSeconds
);


@@ -8,9 +8,10 @@ import java.util.Map;
@Schema(description = "Agent registration payload")
public record AgentRegistrationRequest(
@NotNull String agentId,
@NotNull String name,
@Schema(defaultValue = "default") String application,
@NotNull String instanceId,
@NotNull String displayName,
@Schema(defaultValue = "default") String applicationId,
@Schema(defaultValue = "default") String environmentId,
String version,
List<String> routeIds,
Map<String, Object> capabilities


@@ -5,7 +5,7 @@ import jakarta.validation.constraints.NotNull;
@Schema(description = "Agent registration result with JWT tokens and SSE endpoint")
public record AgentRegistrationResponse(
@NotNull String agentId,
@NotNull String instanceId,
@NotNull String sseEndpoint,
long heartbeatIntervalMs,
@NotNull String serverPublicKey,


@@ -0,0 +1,26 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
import java.util.List;
@Schema(description = "Unified catalog entry combining app records with live agent data")
public record CatalogApp(
@Schema(description = "Application slug (universal identifier)") String slug,
@Schema(description = "Display name") String displayName,
@Schema(description = "True if a managed App record exists in the database") boolean managed,
@Schema(description = "Environment slug") String environmentSlug,
@Schema(description = "Composite health: deployment status + agent health") String health,
@Schema(description = "Human-readable tooltip explaining the health state") String healthTooltip,
@Schema(description = "Number of connected agents") int agentCount,
@Schema(description = "Live routes from agents") List<RouteSummary> routes,
@Schema(description = "Connected agent summaries") List<AgentSummary> agents,
@Schema(description = "Total exchange count from ClickHouse") long exchangeCount,
@Schema(description = "Active deployment info, null if no deployment") DeploymentSummary deployment
) {
public record DeploymentSummary(
String status,
String replicas,
int version
) {}
}


@@ -0,0 +1,14 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "ClickHouse storage and performance metrics")
public record ClickHousePerformanceResponse(
String diskSize,
String uncompressedSize,
double compressionRatio,
long totalRows,
int partCount,
String memoryUsage,
int currentQueries
) {}


@@ -0,0 +1,12 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "Active ClickHouse query information")
public record ClickHouseQueryInfo(
String queryId,
double elapsedSeconds,
String memory,
long readRows,
String query
) {}


@@ -0,0 +1,11 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "ClickHouse cluster status")
public record ClickHouseStatusResponse(
boolean reachable,
String version,
String uptime,
String host
) {}


@@ -0,0 +1,13 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "ClickHouse table information")
public record ClickHouseTableInfo(
String name,
String engine,
long rowCount,
String dataSize,
long dataSizeBytes,
int partitionCount
) {}


@@ -0,0 +1,13 @@
package com.cameleer3.server.app.dto;
import java.util.List;
public record CommandGroupResponse(
boolean success,
int total,
int responded,
List<AgentResponse> responses,
List<String> timedOut
) {
public record AgentResponse(String agentId, String status, String message) {}
}


@@ -0,0 +1,8 @@
package com.cameleer3.server.app.dto;
import com.cameleer3.common.model.ApplicationConfig;
public record ConfigUpdateResponse(
ApplicationConfig config,
CommandGroupResponse pushResult
) {}


@@ -7,6 +7,5 @@ public record DatabaseStatusResponse(
@Schema(description = "Whether the database is reachable") boolean connected,
@Schema(description = "PostgreSQL version string") String version,
@Schema(description = "Database host") String host,
@Schema(description = "Current schema search path") String schema,
@Schema(description = "Whether TimescaleDB extension is available") boolean timescaleDb
@Schema(description = "Current schema") String schema
) {}


@@ -1,14 +0,0 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "OpenSearch index information")
public record IndexInfoResponse(
@Schema(description = "Index name") String name,
@Schema(description = "Document count") long docCount,
@Schema(description = "Human-readable index size") String size,
@Schema(description = "Index size in bytes") long sizeBytes,
@Schema(description = "Index health status") String health,
@Schema(description = "Number of primary shards") int primaryShards,
@Schema(description = "Number of replica shards") int replicaShards
) {}


@@ -0,0 +1,16 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
import java.time.Instant;
@Schema(description = "Search indexer pipeline statistics")
public record IndexerPipelineResponse(
int queueDepth,
int maxQueueSize,
long failedCount,
long indexedCount,
long debounceMs,
double indexingRate,
Instant lastIndexedAt
) {}


@@ -1,16 +0,0 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
import java.util.List;
@Schema(description = "Paginated list of OpenSearch indices")
public record IndicesPageResponse(
@Schema(description = "Index list for current page") List<IndexInfoResponse> indices,
@Schema(description = "Total number of indices") long totalIndices,
@Schema(description = "Total document count across all indices") long totalDocs,
@Schema(description = "Human-readable total size") String totalSize,
@Schema(description = "Current page number (0-based)") int page,
@Schema(description = "Page size") int pageSize,
@Schema(description = "Total number of pages") int totalPages
) {}


@@ -2,12 +2,18 @@ package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "Application log entry from OpenSearch")
import java.util.Map;
@Schema(description = "Application log entry")
public record LogEntryResponse(
@Schema(description = "Log timestamp (ISO-8601)") String timestamp,
@Schema(description = "Log level (INFO, WARN, ERROR, DEBUG)") String level,
@Schema(description = "Log level (INFO, WARN, ERROR, DEBUG, TRACE)") String level,
@Schema(description = "Logger name") String loggerName,
@Schema(description = "Log message") String message,
@Schema(description = "Thread name") String threadName,
@Schema(description = "Stack trace (if present)") String stackTrace
@Schema(description = "Stack trace (if present)") String stackTrace,
@Schema(description = "Camel exchange ID (if present)") String exchangeId,
@Schema(description = "Agent instance ID") String instanceId,
@Schema(description = "Application ID") String application,
@Schema(description = "MDC context map") Map<String, String> mdc
) {}


@@ -0,0 +1,14 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
import java.util.List;
import java.util.Map;
@Schema(description = "Log search response with cursor pagination and level counts")
public record LogSearchPageResponse(
@Schema(description = "Log entries for the current page") List<LogEntryResponse> data,
@Schema(description = "Cursor for next page (null if no more results)") String nextCursor,
@Schema(description = "Whether more results exist beyond this page") boolean hasMore,
@Schema(description = "Count of logs per level (unaffected by level filter)") Map<String, Long> levelCounts
) {}


@@ -13,5 +13,8 @@ public record OidcAdminConfigRequest(
String rolesClaim,
List<String> defaultRoles,
boolean autoSignup,
String displayNameClaim
String displayNameClaim,
String userIdClaim,
String audience,
List<String> additionalScopes
) {}


@@ -16,17 +16,21 @@ public record OidcAdminConfigResponse(
String rolesClaim,
List<String> defaultRoles,
boolean autoSignup,
String displayNameClaim
String displayNameClaim,
String userIdClaim,
String audience,
List<String> additionalScopes
) {
public static OidcAdminConfigResponse unconfigured() {
return new OidcAdminConfigResponse(false, false, null, null, false, null, null, false, null);
return new OidcAdminConfigResponse(false, false, null, null, false, null, null, false, null, null, null, null);
}
public static OidcAdminConfigResponse from(OidcConfig config) {
return new OidcAdminConfigResponse(
true, config.enabled(), config.issuerUri(), config.clientId(),
!config.clientSecret().isBlank(), config.rolesClaim(),
config.defaultRoles(), config.autoSignup(), config.displayNameClaim()
config.defaultRoles(), config.autoSignup(), config.displayNameClaim(),
config.userIdClaim(), config.audience(), config.additionalScopes()
);
}
}


@@ -9,5 +9,9 @@ public record OidcPublicConfigResponse(
@NotNull String clientId,
@NotNull String authorizationEndpoint,
@Schema(description = "Present if the provider supports RP-initiated logout")
String endSessionEndpoint
String endSessionEndpoint,
@Schema(description = "RFC 8707 resource indicator for the authorization request")
String resource,
@Schema(description = "Additional scopes to request beyond openid email profile")
java.util.List<String> additionalScopes
) {}


@@ -1,12 +0,0 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "OpenSearch cluster status")
public record OpenSearchStatusResponse(
@Schema(description = "Whether the cluster is reachable") boolean reachable,
@Schema(description = "Cluster health status (GREEN, YELLOW, RED)") String clusterHealth,
@Schema(description = "OpenSearch version") String version,
@Schema(description = "Number of nodes in the cluster") int nodeCount,
@Schema(description = "OpenSearch host") String host
) {}


@@ -1,13 +0,0 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "OpenSearch performance metrics")
public record PerformanceResponse(
@Schema(description = "Query cache hit rate (0.0-1.0)") double queryCacheHitRate,
@Schema(description = "Request cache hit rate (0.0-1.0)") double requestCacheHitRate,
@Schema(description = "Average search latency in milliseconds") double searchLatencyMs,
@Schema(description = "Average indexing latency in milliseconds") double indexingLatencyMs,
@Schema(description = "JVM heap used in bytes") long jvmHeapUsedBytes,
@Schema(description = "JVM heap max in bytes") long jvmHeapMaxBytes
) {}


@@ -1,16 +0,0 @@
package com.cameleer3.server.app.dto;
import io.swagger.v3.oas.annotations.media.Schema;
import java.time.Instant;
@Schema(description = "Search indexing pipeline statistics")
public record PipelineStatsResponse(
@Schema(description = "Current queue depth") int queueDepth,
@Schema(description = "Maximum queue size") int maxQueueSize,
@Schema(description = "Number of failed indexing operations") long failedCount,
@Schema(description = "Number of successfully indexed documents") long indexedCount,
@Schema(description = "Debounce interval in milliseconds") long debounceMs,
@Schema(description = "Current indexing rate (docs/sec)") double indexingRate,
@Schema(description = "Timestamp of last indexed document") Instant lastIndexedAt
) {}


@@ -11,5 +11,8 @@ public record RouteSummary(
@NotNull long exchangeCount,
Instant lastSeen,
@Schema(description = "The from() endpoint URI, e.g. 'direct:processOrder'")
String fromEndpointUri
String fromEndpointUri,
@Schema(description = "Operational state of the route: stopped, suspended, or null (started/default)")
String routeState
) {}


@@ -5,18 +5,15 @@ import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.Valid;
import jakarta.validation.constraints.Max;
import jakarta.validation.constraints.Min;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.NotNull;
import jakarta.validation.constraints.Positive;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
@Schema(description = "Threshold configuration for admin monitoring")
public record ThresholdConfigRequest(
@Valid @NotNull DatabaseThresholdsRequest database,
@Valid @NotNull OpenSearchThresholdsRequest opensearch
@Valid @NotNull DatabaseThresholdsRequest database
) {
@Schema(description = "Database monitoring thresholds")
@@ -38,41 +35,6 @@ public record ThresholdConfigRequest(
double queryDurationCritical
) {}
@Schema(description = "OpenSearch monitoring thresholds")
public record OpenSearchThresholdsRequest(
@NotBlank
@Schema(description = "Cluster health warning threshold (GREEN, YELLOW, RED)")
String clusterHealthWarning,
@NotBlank
@Schema(description = "Cluster health critical threshold (GREEN, YELLOW, RED)")
String clusterHealthCritical,
@Min(0)
@Schema(description = "Queue depth warning threshold")
int queueDepthWarning,
@Min(0)
@Schema(description = "Queue depth critical threshold")
int queueDepthCritical,
@Min(0) @Max(100)
@Schema(description = "JVM heap usage warning threshold (percentage)")
int jvmHeapWarning,
@Min(0) @Max(100)
@Schema(description = "JVM heap usage critical threshold (percentage)")
int jvmHeapCritical,
@Min(0)
@Schema(description = "Failed document count warning threshold")
int failedDocsWarning,
@Min(0)
@Schema(description = "Failed document count critical threshold")
int failedDocsCritical
) {}
/** Convert to core domain model */
public ThresholdConfig toConfig() {
return new ThresholdConfig(
@@ -81,16 +43,6 @@ public record ThresholdConfigRequest(
database.connectionPoolCritical(),
database.queryDurationWarning(),
database.queryDurationCritical()
),
new ThresholdConfig.OpenSearchThresholds(
opensearch.clusterHealthWarning(),
opensearch.clusterHealthCritical(),
opensearch.queueDepthWarning(),
opensearch.queueDepthCritical(),
opensearch.jvmHeapWarning(),
opensearch.jvmHeapCritical(),
opensearch.failedDocsWarning(),
opensearch.failedDocsCritical()
)
);
}
@@ -108,37 +60,6 @@ public record ThresholdConfigRequest(
}
}
if (opensearch != null) {
if (opensearch.queueDepthWarning() > opensearch.queueDepthCritical()) {
errors.add("opensearch.queueDepthWarning must be <= queueDepthCritical");
}
if (opensearch.jvmHeapWarning() > opensearch.jvmHeapCritical()) {
errors.add("opensearch.jvmHeapWarning must be <= jvmHeapCritical");
}
if (opensearch.failedDocsWarning() > opensearch.failedDocsCritical()) {
errors.add("opensearch.failedDocsWarning must be <= failedDocsCritical");
}
// Validate health severity ordering: GREEN < YELLOW < RED
int warningSeverity = healthSeverity(opensearch.clusterHealthWarning());
int criticalSeverity = healthSeverity(opensearch.clusterHealthCritical());
if (warningSeverity < 0) {
errors.add("opensearch.clusterHealthWarning must be GREEN, YELLOW, or RED");
}
if (criticalSeverity < 0) {
errors.add("opensearch.clusterHealthCritical must be GREEN, YELLOW, or RED");
}
if (warningSeverity >= 0 && criticalSeverity >= 0 && warningSeverity > criticalSeverity) {
errors.add("opensearch.clusterHealthWarning severity must be <= clusterHealthCritical (GREEN < YELLOW < RED)");
}
}
return errors;
}
private static final Map<String, Integer> HEALTH_SEVERITY =
Map.of("GREEN", 0, "YELLOW", 1, "RED", 2);
private static int healthSeverity(String health) {
return HEALTH_SEVERITY.getOrDefault(health != null ? health.toUpperCase() : "", -1);
}
}


@@ -0,0 +1,149 @@
package com.cameleer3.server.app.ingestion;
import com.cameleer3.server.app.config.IngestionConfig;
import com.cameleer3.server.app.search.ClickHouseLogStore;
import com.cameleer3.server.app.storage.ClickHouseExecutionStore;
import com.cameleer3.server.core.ingestion.BufferedLogEntry;
import com.cameleer3.server.core.ingestion.ChunkAccumulator;
import com.cameleer3.server.core.ingestion.MergedExecution;
import com.cameleer3.server.core.ingestion.WriteBuffer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.SmartLifecycle;
import org.springframework.scheduling.annotation.Scheduled;
import java.util.List;
/**
* Scheduled flush task for the ClickHouse execution, processor, and log write buffers.
* <p>
* Drains all three buffers on a fixed interval and delegates batch inserts to
* {@link ClickHouseExecutionStore} and {@link ClickHouseLogStore}. Also periodically
* sweeps stale exchanges from the {@link ChunkAccumulator}.
* <p>
* Not a {@code @Component} — instantiated as a {@code @Bean} in StorageBeanConfig.
*/
public class ExecutionFlushScheduler implements SmartLifecycle {
private static final Logger log = LoggerFactory.getLogger(ExecutionFlushScheduler.class);
private final WriteBuffer<MergedExecution> executionBuffer;
private final WriteBuffer<ChunkAccumulator.ProcessorBatch> processorBuffer;
private final WriteBuffer<BufferedLogEntry> logBuffer;
private final ClickHouseExecutionStore executionStore;
private final ClickHouseLogStore logStore;
private final ChunkAccumulator accumulator;
private final int batchSize;
private volatile boolean running = false;
public ExecutionFlushScheduler(WriteBuffer<MergedExecution> executionBuffer,
WriteBuffer<ChunkAccumulator.ProcessorBatch> processorBuffer,
WriteBuffer<BufferedLogEntry> logBuffer,
ClickHouseExecutionStore executionStore,
ClickHouseLogStore logStore,
ChunkAccumulator accumulator,
IngestionConfig config) {
this.executionBuffer = executionBuffer;
this.processorBuffer = processorBuffer;
this.logBuffer = logBuffer;
this.executionStore = executionStore;
this.logStore = logStore;
this.accumulator = accumulator;
this.batchSize = config.getBatchSize();
}
@Scheduled(fixedDelayString = "${ingestion.flush-interval-ms:1000}")
public void flush() {
try {
List<MergedExecution> executions = executionBuffer.drain(batchSize);
if (!executions.isEmpty()) {
executionStore.insertExecutionBatch(executions);
log.debug("Flushed {} executions to ClickHouse", executions.size());
}
} catch (Exception e) {
log.error("Failed to flush executions", e);
}
try {
List<ChunkAccumulator.ProcessorBatch> batches = processorBuffer.drain(batchSize);
if (!batches.isEmpty()) {
executionStore.insertProcessorBatches(batches);
log.debug("Flushed {} processor batches to ClickHouse", batches.size());
}
} catch (Exception e) {
log.error("Failed to flush processor batches", e);
}
try {
List<BufferedLogEntry> logEntries = logBuffer.drain(batchSize);
if (!logEntries.isEmpty()) {
logStore.insertBufferedBatch(logEntries);
log.debug("Flushed {} log entries to ClickHouse", logEntries.size());
}
} catch (Exception e) {
log.error("Failed to flush log entries", e);
}
}
@Scheduled(fixedDelay = 60_000)
public void sweepStale() {
try {
accumulator.sweepStale();
} catch (Exception e) {
log.error("Failed to sweep stale exchanges", e);
}
}
@Override
public void start() {
running = true;
}
@Override
public void stop() {
// Drain remaining executions on shutdown
while (executionBuffer.size() > 0) {
List<MergedExecution> batch = executionBuffer.drain(batchSize);
if (batch.isEmpty()) break;
try {
executionStore.insertExecutionBatch(batch);
} catch (Exception e) {
log.error("Failed to flush executions during shutdown", e);
break;
}
}
// Drain remaining processor batches on shutdown
while (processorBuffer.size() > 0) {
List<ChunkAccumulator.ProcessorBatch> batches = processorBuffer.drain(batchSize);
if (batches.isEmpty()) break;
try {
executionStore.insertProcessorBatches(batches);
} catch (Exception e) {
log.error("Failed to flush processor batches during shutdown", e);
break;
}
}
// Drain remaining log entries on shutdown
while (logBuffer.size() > 0) {
List<BufferedLogEntry> entries = logBuffer.drain(batchSize);
if (entries.isEmpty()) break;
try {
logStore.insertBufferedBatch(entries);
} catch (Exception e) {
log.error("Failed to flush log entries during shutdown", e);
break;
}
}
running = false;
}
@Override
public boolean isRunning() {
return running;
}
@Override
public int getPhase() {
return Integer.MAX_VALUE - 1;
}
}
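
The three shutdown loops in `stop()` above share one shape: drain a batch, write it, and give up on the first failure. A minimal sketch of that pattern as a single generic helper follows. `SimpleBuffer` is a stand-in written for this example; the real `WriteBuffer` is only known here through its `drain(int)` and `size()` calls, and `drainAll` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

final class DrainOnShutdown {
    private DrainOnShutdown() {}

    /** Illustrative stand-in for WriteBuffer; only drain/size mirror the diff. */
    static final class SimpleBuffer<T> {
        private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

        void offer(T item) {
            queue.add(item);
        }

        int size() {
            return queue.size();
        }

        /** Removes and returns up to max items in insertion order. */
        List<T> drain(int max) {
            List<T> out = new ArrayList<>();
            T item;
            while (out.size() < max && (item = queue.poll()) != null) {
                out.add(item);
            }
            return out;
        }
    }

    /**
     * Drains the buffer batch by batch until it is empty or the sink fails.
     * Returns the number of items the sink accepted.
     */
    static <T> int drainAll(SimpleBuffer<T> buffer, int batchSize, Consumer<List<T>> sink) {
        int written = 0;
        while (buffer.size() > 0) {
            List<T> batch = buffer.drain(batchSize);
            if (batch.isEmpty()) {
                break;
            }
            try {
                sink.accept(batch);
                written += batch.size();
            } catch (RuntimeException e) {
                // Same policy as stop(): stop on the first failed insert.
                // The batch that was already drained is not put back.
                break;
            }
        }
        return written;
    }
}
```

One consequence of this policy, visible in the sketch, is that a batch drained just before a failed insert is dropped, while everything still in the buffer stays there.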


@@ -1,10 +1,19 @@
package com.cameleer3.server.app.rbac;
import com.cameleer3.server.core.rbac.*;
import com.cameleer3.server.core.rbac.GroupRepository;
import com.cameleer3.server.core.rbac.GroupSummary;
import com.cameleer3.server.core.rbac.RbacService;
import com.cameleer3.server.core.rbac.RbacStats;
import com.cameleer3.server.core.rbac.RoleSummary;
import com.cameleer3.server.core.rbac.SystemRole;
import com.cameleer3.server.core.rbac.UserDetail;
import com.cameleer3.server.core.rbac.UserSummary;
import com.cameleer3.server.core.security.UserInfo;
import com.cameleer3.server.core.security.UserRepository;
import org.springframework.http.HttpStatus;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.web.server.ResponseStatusException;
import java.util.*;
@@ -14,14 +23,12 @@ public class RbacServiceImpl implements RbacService {
private final JdbcTemplate jdbc;
private final UserRepository userRepository;
private final GroupRepository groupRepository;
private final RoleRepository roleRepository;
public RbacServiceImpl(JdbcTemplate jdbc, UserRepository userRepository,
GroupRepository groupRepository, RoleRepository roleRepository) {
GroupRepository groupRepository) {
this.jdbc = jdbc;
this.userRepository = userRepository;
this.groupRepository = groupRepository;
this.roleRepository = roleRepository;
}
@Override
@@ -50,19 +57,29 @@ public class RbacServiceImpl implements RbacService {
@Override
public void assignRoleToUser(String userId, UUID roleId) {
jdbc.update("INSERT INTO user_roles (user_id, role_id) VALUES (?, ?) ON CONFLICT DO NOTHING",
userId, roleId);
jdbc.update("""
INSERT INTO user_roles (user_id, role_id, origin)
VALUES (?, ?, 'direct')
ON CONFLICT (user_id, role_id, origin) DO NOTHING
""", userId, roleId);
}
@Override
public void removeRoleFromUser(String userId, UUID roleId) {
if (SystemRole.ADMIN_ID.equals(roleId) && getEffectivePrincipalsForRole(SystemRole.ADMIN_ID).size() <= 1) {
throw new ResponseStatusException(HttpStatus.CONFLICT,
"Cannot remove the ADMIN role: at least one admin user must exist");
}
jdbc.update("DELETE FROM user_roles WHERE user_id = ? AND role_id = ?", userId, roleId);
}
@Override
public void addUserToGroup(String userId, UUID groupId) {
jdbc.update("INSERT INTO user_groups (user_id, group_id) VALUES (?, ?) ON CONFLICT DO NOTHING",
userId, groupId);
jdbc.update("""
INSERT INTO user_groups (user_id, group_id, origin)
VALUES (?, ?, 'direct')
ON CONFLICT (user_id, group_id, origin) DO NOTHING
""", userId, groupId);
}
@Override
@@ -235,12 +252,14 @@ public class RbacServiceImpl implements RbacService {
return max;
}
private List<RoleSummary> getDirectRolesForUser(String userId) {
@Override
public List<RoleSummary> getDirectRolesForUser(String userId) {
return jdbc.query("""
SELECT r.id, r.name, r.system FROM user_roles ur
JOIN roles r ON r.id = ur.role_id WHERE ur.user_id = ?
SELECT r.id, r.name, r.system, ur.origin FROM user_roles ur
JOIN roles r ON r.id = ur.role_id
WHERE ur.user_id = ?
""", (rs, rowNum) -> new RoleSummary(rs.getObject("id", UUID.class),
rs.getString("name"), rs.getBoolean("system"), "direct"), userId);
rs.getString("name"), rs.getBoolean("system"), rs.getString("origin")), userId);
}
private List<GroupSummary> getDirectGroupsForUser(String userId) {
@@ -250,4 +269,28 @@ public class RbacServiceImpl implements RbacService {
""", (rs, rowNum) -> new GroupSummary(rs.getObject("id", UUID.class),
rs.getString("name")), userId);
}
@Override
public void clearManagedAssignments(String userId) {
jdbc.update("DELETE FROM user_roles WHERE user_id = ? AND origin = 'managed'", userId);
jdbc.update("DELETE FROM user_groups WHERE user_id = ? AND origin = 'managed'", userId);
}
@Override
public void assignManagedRole(String userId, UUID roleId, UUID mappingId) {
jdbc.update("""
INSERT INTO user_roles (user_id, role_id, origin, mapping_id)
VALUES (?, ?, 'managed', ?)
ON CONFLICT (user_id, role_id, origin) DO UPDATE SET mapping_id = EXCLUDED.mapping_id
""", userId, roleId, mappingId);
}
@Override
public void addUserToManagedGroup(String userId, UUID groupId, UUID mappingId) {
jdbc.update("""
INSERT INTO user_groups (user_id, group_id, origin, mapping_id)
VALUES (?, ?, 'managed', ?)
ON CONFLICT (user_id, group_id, origin) DO UPDATE SET mapping_id = EXCLUDED.mapping_id
""", userId, groupId, mappingId);
}
}
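
The SQL in `RbacServiceImpl` above treats `(user_id, role_id, origin)` as the conflict target, so the same role can be held once as `direct` and once as `managed`. The in-memory sketch below illustrates those semantics: direct inserts are no-ops on conflict, managed inserts overwrite the mapping ID, and clearing managed assignments leaves direct ones alone. `RoleAssignments` and its methods are hypothetical, and the unique constraint itself is inferred from the `ON CONFLICT` clauses because the migration is not part of this excerpt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical in-memory model of the user_roles rows touched by the diff. */
final class RoleAssignments {
    /** Mirrors the assumed unique key (user_id, role_id, origin). */
    record Key(String userId, UUID roleId, String origin) {}

    // Value is the mapping ID; empty for direct assignments.
    private final Map<Key, Optional<UUID>> rows = new HashMap<>();

    /** Like INSERT ... ON CONFLICT DO NOTHING with origin 'direct'. */
    void assignDirect(String userId, UUID roleId) {
        rows.putIfAbsent(new Key(userId, roleId, "direct"), Optional.empty());
    }

    /** Like INSERT ... ON CONFLICT DO UPDATE SET mapping_id with origin 'managed'. */
    void assignManaged(String userId, UUID roleId, UUID mappingId) {
        rows.put(new Key(userId, roleId, "managed"), Optional.of(mappingId));
    }

    /** Like DELETE ... WHERE user_id = ? AND origin = 'managed'. */
    void clearManaged(String userId) {
        rows.keySet().removeIf(k -> k.userId().equals(userId) && k.origin().equals("managed"));
    }

    boolean has(String userId, UUID roleId, String origin) {
        return rows.containsKey(new Key(userId, roleId, origin));
    }

    Optional<UUID> mappingId(String userId, UUID roleId) {
        return rows.getOrDefault(new Key(userId, roleId, "managed"), Optional.empty());
    }

    int size() {
        return rows.size();
    }
}
```

A typical flow would call `clearManaged` and then re-apply `assignManaged` for each mapping that currently matches, so managed roles follow the identity provider while manually assigned roles persist.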


@@ -0,0 +1,112 @@
package com.cameleer3.server.app.retention;
import com.cameleer3.server.core.runtime.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;
/**
* Nightly job that enforces JAR retention policies per environment.
* For each app, keeps the N most recent versions (configured per environment)
* and deletes older ones — unless they are currently deployed.
*/
@Component
public class JarRetentionJob {
private static final Logger log = LoggerFactory.getLogger(JarRetentionJob.class);
private final EnvironmentService environmentService;
private final AppService appService;
private final AppVersionRepository versionRepo;
private final DeploymentRepository deploymentRepo;
public JarRetentionJob(EnvironmentService environmentService,
AppService appService,
AppVersionRepository versionRepo,
DeploymentRepository deploymentRepo) {
this.environmentService = environmentService;
this.appService = appService;
this.versionRepo = versionRepo;
this.deploymentRepo = deploymentRepo;
}
@Scheduled(cron = "0 0 3 * * *") // 03:00 every day
public void cleanupOldVersions() {
log.info("JAR retention job started");
int totalDeleted = 0;
for (Environment env : environmentService.listAll()) {
Integer retentionCount = env.jarRetentionCount();
if (retentionCount == null) {
log.debug("Environment {} has unlimited retention, skipping", env.slug());
continue;
}
for (App app : appService.listByEnvironment(env.id())) {
totalDeleted += cleanupApp(app, retentionCount);
}
}
log.info("JAR retention job completed — deleted {} versions", totalDeleted);
}
private int cleanupApp(App app, int retentionCount) {
List<AppVersion> versions = versionRepo.findByAppId(app.id()); // ordered DESC by version
if (versions.size() <= retentionCount) return 0;
// Find version IDs that are currently deployed (any status)
Set<UUID> deployedVersionIds = deploymentRepo.findByAppId(app.id()).stream()
.map(Deployment::appVersionId)
.collect(Collectors.toSet());
int deleted = 0;
// versions is sorted DESC — skip the first retentionCount, delete the rest
for (int i = retentionCount; i < versions.size(); i++) {
AppVersion version = versions.get(i);
if (deployedVersionIds.contains(version.id())) {
log.debug("Skipping deployed version v{} of app {} ({})", version.version(), app.slug(), version.id());
continue;
}
// Delete JAR from disk
deleteJarFile(version);
// Delete DB record
versionRepo.delete(version.id());
deleted++;
log.info("Deleted version v{} of app {} ({}) — JAR: {}", version.version(), app.slug(), version.id(), version.jarPath());
}
return deleted;
}
private void deleteJarFile(AppVersion version) {
try {
Path jarPath = Path.of(version.jarPath());
if (Files.exists(jarPath)) {
Files.delete(jarPath);
// Try to remove the empty version directory
Path versionDir = jarPath.getParent();
if (versionDir != null && Files.isDirectory(versionDir)) {
try (var entries = Files.list(versionDir)) {
if (entries.findFirst().isEmpty()) {
Files.delete(versionDir);
}
}
}
}
} catch (IOException e) {
log.warn("Failed to delete JAR file for version {}: {}", version.id(), e.getMessage());
}
}
}
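
Note on JarRetentionJob above: the selection rule in cleanupApp can be isolated as a pure function. This is a sketch under simplifying assumptions: versions are plain integers already sorted newest first, and `RetentionSelectionSketch` and `versionsToDelete` are hypothetical names that do not appear in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the "keep N newest, never delete deployed" rule.
final class RetentionSelectionSketch {
    // versionsDesc must be sorted newest first, like the list returned by findByAppId in the diff.
    static List<Integer> versionsToDelete(List<Integer> versionsDesc,
                                          int retentionCount,
                                          Set<Integer> deployed) {
        List<Integer> toDelete = new ArrayList<>();
        // Skip the first retentionCount entries; a negative count is treated as zero here.
        for (int i = Math.max(0, retentionCount); i < versionsDesc.size(); i++) {
            Integer version = versionsDesc.get(i);
            if (!deployed.contains(version)) {
                toDelete.add(version);
            }
        }
        return toDelete;
    }
}
```

With versions 5, 4, 3, 2, 1, a retention count of 2 and version 2 still deployed, the function returns 3 and 1. A deployed version outside the retention window is kept in addition to the N newest, so an app can end up holding more than N JARs.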


@@ -1,48 +0,0 @@
package com.cameleer3.server.app.retention;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
@Component
public class RetentionScheduler {
private static final Logger log = LoggerFactory.getLogger(RetentionScheduler.class);
private final JdbcTemplate jdbc;
private final int retentionDays;
public RetentionScheduler(JdbcTemplate jdbc,
@Value("${cameleer.retention-days:30}") int retentionDays) {
this.jdbc = jdbc;
this.retentionDays = retentionDays;
}
@Scheduled(cron = "0 0 2 * * *") // Daily at 2 AM UTC
public void dropExpiredChunks() {
String interval = retentionDays + " days";
try {
// Raw data
jdbc.execute("SELECT drop_chunks('executions', INTERVAL '" + interval + "')");
jdbc.execute("SELECT drop_chunks('processor_executions', INTERVAL '" + interval + "')");
jdbc.execute("SELECT drop_chunks('agent_metrics', INTERVAL '" + interval + "')");
// Continuous aggregates (keep 3x longer)
String caggInterval = (retentionDays * 3) + " days";
jdbc.execute("SELECT drop_chunks('stats_1m_all', INTERVAL '" + caggInterval + "')");
jdbc.execute("SELECT drop_chunks('stats_1m_app', INTERVAL '" + caggInterval + "')");
jdbc.execute("SELECT drop_chunks('stats_1m_route', INTERVAL '" + caggInterval + "')");
jdbc.execute("SELECT drop_chunks('stats_1m_processor', INTERVAL '" + caggInterval + "')");
log.info("Retention: dropped chunks older than {} days (aggregates: {} days)",
retentionDays, retentionDays * 3);
} catch (Exception e) {
log.error("Retention job failed", e);
}
}
// Note: OpenSearch daily index deletion should be handled via ILM policy
// configured at deployment time, not in application code.
}


@@ -0,0 +1,335 @@
package com.cameleer3.server.app.runtime;
import com.cameleer3.server.app.storage.PostgresDeploymentRepository;
import com.cameleer3.server.core.runtime.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
@Service
public class DeploymentExecutor {
private static final Logger log = LoggerFactory.getLogger(DeploymentExecutor.class);
private final RuntimeOrchestrator orchestrator;
private final DeploymentService deploymentService;
private final AppService appService;
private final EnvironmentService envService;
private final DeploymentRepository deploymentRepository;
private final PostgresDeploymentRepository pgDeployRepo;
@Autowired(required = false)
private DockerNetworkManager networkManager;
@Value("${cameleer.runtime.base-image:cameleer-runtime-base:latest}")
private String baseImage;
@Value("${cameleer.runtime.docker-network:cameleer}")
private String dockerNetwork;
@Value("${cameleer.runtime.container-memory-limit:512m}")
private String globalMemoryLimit;
@Value("${cameleer.runtime.container-cpu-request:500}")
private int globalCpuRequest;
@Value("${cameleer.runtime.health-check-timeout:60}")
private int healthCheckTimeout;
@Value("${cameleer.runtime.agent-health-port:9464}")
private int agentHealthPort;
@Value("${security.bootstrap-token:}")
private String bootstrapToken;
@Value("${cameleer.runtime.routing-mode:path}")
private String globalRoutingMode;
@Value("${cameleer.runtime.routing-domain:localhost}")
private String globalRoutingDomain;
@Value("${cameleer.runtime.server-url:}")
private String globalServerUrl;
@Value("${cameleer.runtime.jar-docker-volume:}")
private String jarDockerVolume;
@Value("${cameleer.runtime.jar-storage-path:/data/jars}")
private String jarStoragePath;
public DeploymentExecutor(RuntimeOrchestrator orchestrator,
DeploymentService deploymentService,
AppService appService,
EnvironmentService envService,
DeploymentRepository deploymentRepository) {
this.orchestrator = orchestrator;
this.deploymentService = deploymentService;
this.appService = appService;
this.envService = envService;
this.deploymentRepository = deploymentRepository;
this.pgDeployRepo = (PostgresDeploymentRepository) deploymentRepository;
}
@Async("deploymentTaskExecutor")
public void executeAsync(Deployment deployment) {
try {
App app = appService.getById(deployment.appId());
Environment env = envService.getById(deployment.environmentId());
String jarPath = appService.resolveJarPath(deployment.appVersionId());
var globalDefaults = new ConfigMerger.GlobalRuntimeDefaults(
parseMemoryLimitMb(globalMemoryLimit),
globalCpuRequest,
globalRoutingMode,
globalRoutingDomain,
globalServerUrl.isBlank() ? "http://cameleer3-server:8081" : globalServerUrl
);
ResolvedContainerConfig config = ConfigMerger.resolve(
globalDefaults, env.defaultContainerConfig(), app.containerConfig());
pgDeployRepo.updateDeploymentStrategy(deployment.id(), config.deploymentStrategy());
pgDeployRepo.updateResolvedConfig(deployment.id(), resolvedConfigToMap(config));
// === PRE-FLIGHT ===
updateStage(deployment.id(), DeployStage.PRE_FLIGHT);
preFlightChecks(jarPath, config);
// === PULL IMAGE ===
updateStage(deployment.id(), DeployStage.PULL_IMAGE);
// Docker pulls on create if not present locally
// === CREATE NETWORKS ===
updateStage(deployment.id(), DeployStage.CREATE_NETWORK);
String primaryNetwork = dockerNetwork;
String envNet = null;
if (networkManager != null) {
primaryNetwork = DockerNetworkManager.TRAEFIK_NETWORK;
networkManager.ensureNetwork(primaryNetwork);
envNet = DockerNetworkManager.envNetworkName(env.slug());
networkManager.ensureNetwork(envNet);
}
// === START REPLICAS ===
updateStage(deployment.id(), DeployStage.START_REPLICAS);
Map<String, String> baseEnvVars = buildEnvVars(app, env, config);
Map<String, String> labels = TraefikLabelBuilder.build(app.slug(), env.slug(), config);
List<Map<String, Object>> replicaStates = new ArrayList<>();
List<String> newContainerIds = new ArrayList<>();
for (int i = 0; i < config.replicas(); i++) {
String containerName = env.slug() + "-" + app.slug() + "-" + i;
String volumeName = jarDockerVolume != null && !jarDockerVolume.isBlank() ? jarDockerVolume : null;
ContainerRequest request = new ContainerRequest(
containerName, baseImage, jarPath,
volumeName, jarStoragePath,
primaryNetwork,
envNet != null ? List.of(envNet) : List.of(),
baseEnvVars, labels,
config.memoryLimitBytes(), config.memoryReserveBytes(),
config.dockerCpuShares(), config.dockerCpuQuota(),
config.exposedPorts(), agentHealthPort,
"on-failure", 3
);
String containerId = orchestrator.startContainer(request);
newContainerIds.add(containerId);
// Connect to environment network after container is started
if (networkManager != null && envNet != null) {
networkManager.connectContainer(containerId, envNet);
}
replicaStates.add(Map.of(
"index", i,
"containerId", containerId,
"containerName", containerName,
"status", "STARTING"
));
}
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === HEALTH CHECK ===
updateStage(deployment.id(), DeployStage.HEALTH_CHECK);
int healthyCount = waitForAnyHealthy(newContainerIds, healthCheckTimeout);
if (healthyCount == 0) {
for (String cid : newContainerIds) {
try { orchestrator.stopContainer(cid); orchestrator.removeContainer(cid); }
catch (Exception e) { log.warn("Cleanup failed for {}: {}", cid, e.getMessage()); }
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
deploymentService.markFailed(deployment.id(), "No replicas passed health check within " + healthCheckTimeout + "s");
return;
}
replicaStates = updateReplicaHealth(replicaStates, newContainerIds);
pgDeployRepo.updateReplicaStates(deployment.id(), replicaStates);
// === SWAP TRAFFIC ===
updateStage(deployment.id(), DeployStage.SWAP_TRAFFIC);
Optional<Deployment> existing = deploymentRepository.findActiveByAppIdAndEnvironmentId(
deployment.appId(), deployment.environmentId());
if (existing.isPresent() && !existing.get().id().equals(deployment.id())) {
stopDeploymentContainers(existing.get());
deploymentService.markStopped(existing.get().id());
log.info("Stopped previous deployment {} for replacement", existing.get().id());
}
// === COMPLETE ===
updateStage(deployment.id(), DeployStage.COMPLETE);
String primaryContainerId = newContainerIds.get(0);
DeploymentStatus finalStatus = healthyCount == config.replicas()
? DeploymentStatus.RUNNING : DeploymentStatus.DEGRADED;
deploymentService.markRunning(deployment.id(), primaryContainerId);
if (finalStatus == DeploymentStatus.DEGRADED) {
deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.DEGRADED,
primaryContainerId, null);
}
pgDeployRepo.updateDeployStage(deployment.id(), null);
log.info("Deployment {} is {} ({}/{} replicas healthy)",
deployment.id(), finalStatus, healthyCount, config.replicas());
} catch (Exception e) {
log.error("Deployment {} FAILED: {}", deployment.id(), e.getMessage(), e);
pgDeployRepo.updateDeployStage(deployment.id(), null);
deploymentService.markFailed(deployment.id(), e.getMessage());
}
}
public void stopDeployment(Deployment deployment) {
pgDeployRepo.updateTargetState(deployment.id(), "STOPPED");
deploymentRepository.updateStatus(deployment.id(), DeploymentStatus.STOPPING,
deployment.containerId(), null);
stopDeploymentContainers(deployment);
deploymentService.markStopped(deployment.id());
}
private void stopDeploymentContainers(Deployment deployment) {
List<Map<String, Object>> replicas = deployment.replicaStates() != null
? deployment.replicaStates() : List.of();
for (Map<String, Object> replica : replicas) {
String cid = (String) replica.get("containerId");
if (cid != null) {
try {
orchestrator.stopContainer(cid);
orchestrator.removeContainer(cid);
} catch (Exception e) {
log.warn("Failed to stop replica container {}: {}", cid, e.getMessage());
}
}
}
if (deployment.containerId() != null && replicas.isEmpty()) {
try {
orchestrator.stopContainer(deployment.containerId());
orchestrator.removeContainer(deployment.containerId());
} catch (Exception e) {
log.warn("Failed to stop container {}: {}", deployment.containerId(), e.getMessage());
}
}
}
private void preFlightChecks(String jarPath, ResolvedContainerConfig config) {
if (!Files.exists(Path.of(jarPath))) {
throw new IllegalStateException("JAR file not found: " + jarPath);
}
if (config.memoryLimitMb() <= 0) {
throw new IllegalStateException("Memory limit must be positive, got: " + config.memoryLimitMb());
}
if (config.appPort() <= 0 || config.appPort() > 65535) {
throw new IllegalStateException("Invalid app port: " + config.appPort());
}
if (config.replicas() < 1) {
throw new IllegalStateException("Replicas must be >= 1, got: " + config.replicas());
}
}
private Map<String, String> buildEnvVars(App app, Environment env, ResolvedContainerConfig config) {
Map<String, String> envVars = new LinkedHashMap<>();
envVars.put("CAMELEER_EXPORT_TYPE", "HTTP");
envVars.put("CAMELEER_APPLICATION_ID", app.slug());
envVars.put("CAMELEER_ENVIRONMENT_ID", env.slug());
envVars.put("CAMELEER_SERVER_URL", config.serverUrl());
if (bootstrapToken != null && !bootstrapToken.isBlank()) {
envVars.put("CAMELEER_AUTH_TOKEN", bootstrapToken);
}
envVars.putAll(config.customEnvVars());
return envVars;
}
private int waitForAnyHealthy(List<String> containerIds, int timeoutSeconds) {
long deadline = System.currentTimeMillis() + (timeoutSeconds * 1000L);
int lastHealthy = 0;
while (System.currentTimeMillis() < deadline) {
int healthy = 0;
for (String cid : containerIds) {
ContainerStatus status = orchestrator.getContainerStatus(cid);
if ("healthy".equals(status.state())) healthy++;
}
lastHealthy = healthy;
if (healthy == containerIds.size()) return healthy;
try { Thread.sleep(2000); } catch (InterruptedException e) {
Thread.currentThread().interrupt();
return lastHealthy;
}
}
return lastHealthy;
}
private List<Map<String, Object>> updateReplicaHealth(List<Map<String, Object>> replicas,
List<String> containerIds) {
List<Map<String, Object>> updated = new ArrayList<>();
for (Map<String, Object> replica : replicas) {
String cid = (String) replica.get("containerId");
ContainerStatus status = orchestrator.getContainerStatus(cid);
Map<String, Object> copy = new HashMap<>(replica);
copy.put("status", status.running() ? "RUNNING" : "DEAD");
updated.add(copy);
}
return updated;
}
private void updateStage(UUID deploymentId, DeployStage stage) {
pgDeployRepo.updateDeployStage(deploymentId, stage.name());
}
private int parseMemoryLimitMb(String limit) {
limit = limit.trim().toLowerCase();
if (limit.endsWith("g")) return (int) (Double.parseDouble(limit.replace("g", "")) * 1024);
if (limit.endsWith("m")) return (int) Double.parseDouble(limit.replace("m", ""));
return Integer.parseInt(limit);
}
private Map<String, Object> resolvedConfigToMap(ResolvedContainerConfig config) {
Map<String, Object> map = new LinkedHashMap<>();
map.put("memoryLimitMb", config.memoryLimitMb());
if (config.memoryReserveMb() != null) map.put("memoryReserveMb", config.memoryReserveMb());
map.put("cpuRequest", config.cpuRequest());
if (config.cpuLimit() != null) map.put("cpuLimit", config.cpuLimit());
map.put("appPort", config.appPort());
map.put("exposedPorts", config.exposedPorts());
map.put("customEnvVars", config.customEnvVars());
map.put("stripPathPrefix", config.stripPathPrefix());
map.put("sslOffloading", config.sslOffloading());
map.put("routingMode", config.routingMode());
map.put("routingDomain", config.routingDomain());
map.put("serverUrl", config.serverUrl());
map.put("replicas", config.replicas());
map.put("deploymentStrategy", config.deploymentStrategy());
return map;
}
}
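
Note on parseMemoryLimitMb above: the accepted formats are easiest to see in isolation. The sketch below mirrors the visible logic but strips only the trailing unit character instead of calling replace; `MemoryLimitSketch` is a hypothetical class name used for illustration.

```java
import java.util.Locale;

// Hypothetical standalone version of the memory-limit parsing shown in DeploymentExecutor.
final class MemoryLimitSketch {
    // Accepts "512m", "1g", "1.5G" or a bare number, and returns megabytes.
    static int parseMemoryLimitMb(String limit) {
        String s = limit.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("g")) {
            // Gigabytes may be fractional; the result is truncated to whole megabytes.
            return (int) (Double.parseDouble(s.substring(0, s.length() - 1)) * 1024);
        }
        if (s.endsWith("m")) {
            return (int) Double.parseDouble(s.substring(0, s.length() - 1));
        }
        // No suffix: the value is taken as megabytes, as in the diff.
        return Integer.parseInt(s);
    }
}
```

Under these rules "512m" yields 512, "1g" yields 1024, " 1.5G " yields 1536 and "2048" yields 2048. Any other suffix, such as "512k", falls through to Integer.parseInt and throws NumberFormatException, which executeAsync would report as a failed deployment.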


@@ -0,0 +1,16 @@
package com.cameleer3.server.app.runtime;
import com.cameleer3.server.core.runtime.ContainerRequest;
import com.cameleer3.server.core.runtime.ContainerStatus;
import com.cameleer3.server.core.runtime.RuntimeOrchestrator;
import java.util.stream.Stream;
public class DisabledRuntimeOrchestrator implements RuntimeOrchestrator {
@Override public boolean isEnabled() { return false; }
@Override public String startContainer(ContainerRequest r) { throw new UnsupportedOperationException("Runtime management disabled"); }
@Override public void stopContainer(String id) { throw new UnsupportedOperationException("Runtime management disabled"); }
@Override public void removeContainer(String id) { throw new UnsupportedOperationException("Runtime management disabled"); }
@Override public ContainerStatus getContainerStatus(String id) { return ContainerStatus.notFound(); }
@Override public Stream<String> getLogs(String id, int tail) { return Stream.empty(); }
}


@@ -0,0 +1,193 @@
package com.cameleer3.server.app.runtime;
import com.cameleer3.server.app.storage.PostgresDeploymentRepository;
import com.cameleer3.server.core.runtime.ContainerStatus;
import com.cameleer3.server.core.runtime.Deployment;
import com.cameleer3.server.core.runtime.DeploymentStatus;
import com.cameleer3.server.core.runtime.RuntimeOrchestrator;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.async.ResultCallback;
import com.github.dockerjava.api.model.Event;
import com.github.dockerjava.api.model.EventType;
import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import java.io.Closeable;
import java.io.IOException;
import java.util.*;
public class DockerEventMonitor {
private static final Logger log = LoggerFactory.getLogger(DockerEventMonitor.class);
private final DockerClient dockerClient;
private final RuntimeOrchestrator runtimeOrchestrator;
private final PostgresDeploymentRepository deploymentRepository;
private Closeable eventStream;
public DockerEventMonitor(DockerRuntimeOrchestrator orchestrator,
PostgresDeploymentRepository deploymentRepository) {
this.dockerClient = orchestrator.getDockerClient();
this.runtimeOrchestrator = orchestrator;
this.deploymentRepository = deploymentRepository;
}
@PostConstruct
public void startListening() {
eventStream = dockerClient.eventsCmd()
.withEventTypeFilter(EventType.CONTAINER)
.withEventFilter("die", "oom", "start", "stop")
.exec(new ResultCallback.Adapter<Event>() {
@Override
public void onNext(Event event) {
handleEvent(event);
}
@Override
public void onError(Throwable throwable) {
log.warn("Docker event stream error, reconnecting: {}", throwable.getMessage());
reconnect();
}
});
log.info("Docker event monitor started");
}
@PreDestroy
public void stop() {
if (eventStream != null) {
try { eventStream.close(); } catch (IOException e) { /* ignore */ }
}
}
private void handleEvent(Event event) {
String containerId = event.getId();
if (containerId == null) return;
Map<String, String> labels = event.getActor() != null ? event.getActor().getAttributes() : null;
if (labels == null || !"cameleer3-server".equals(labels.get("managed-by"))) return;
String action = event.getAction();
log.debug("Docker event: {} for container {} ({})", action, containerId.substring(0, 12),
labels.get("cameleer.app"));
Optional<Deployment> deploymentOpt = deploymentRepository.findByContainerId(containerId);
if (deploymentOpt.isEmpty()) return;
Deployment deployment = deploymentOpt.get();
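// replicaStates can be null (reconcile() and stopDeploymentContainers() guard for this as well)
if (deployment.replicaStates() == null) return;
List<Map<String, Object>> replicas = new ArrayList<>(deployment.replicaStates());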
boolean changed = false;
for (int i = 0; i < replicas.size(); i++) {
Map<String, Object> replica = replicas.get(i);
if (containerId.equals(replica.get("containerId"))) {
Map<String, Object> updated = new HashMap<>(replica);
switch (action) {
case "die", "oom", "stop" -> {
updated.put("status", "DEAD");
if ("oom".equals(action)) {
updated.put("oomKilled", true);
log.warn("Container {} OOM-killed (app={}, env={})", containerId.substring(0, 12),
labels.get("cameleer.app"), labels.get("cameleer.environment"));
}
}
case "start" -> updated.put("status", "RUNNING");
}
replicas.set(i, updated);
changed = true;
break;
}
}
if (!changed) return;
deploymentRepository.updateReplicaStates(deployment.id(), replicas);
long running = replicas.stream().filter(r -> "RUNNING".equals(r.get("status"))).count();
DeploymentStatus newStatus;
if (running == replicas.size()) {
newStatus = DeploymentStatus.RUNNING;
} else if (running > 0) {
newStatus = DeploymentStatus.DEGRADED;
} else {
newStatus = DeploymentStatus.FAILED;
}
if (deployment.status() != newStatus) {
deploymentRepository.updateStatus(deployment.id(), newStatus, deployment.containerId(), deployment.errorMessage());
log.info("Deployment {} status: {} -> {} ({}/{} replicas running)",
deployment.id(), deployment.status(), newStatus, running, replicas.size());
}
}
/**
* Periodic reconciliation: inspects actual container state for active deployments
* and corrects status mismatches caused by missed Docker events.
*/
@Scheduled(fixedDelay = 30_000, initialDelay = 60_000)
public void reconcile() {
List<Deployment> active = deploymentRepository.findByStatus(
List.of(DeploymentStatus.RUNNING, DeploymentStatus.DEGRADED, DeploymentStatus.STARTING));
for (Deployment deployment : active) {
if (deployment.replicaStates() == null || deployment.replicaStates().isEmpty()) continue;
List<Map<String, Object>> replicas = new ArrayList<>(deployment.replicaStates());
boolean changed = false;
for (int i = 0; i < replicas.size(); i++) {
Map<String, Object> replica = replicas.get(i);
String containerId = (String) replica.get("containerId");
if (containerId == null) continue;
ContainerStatus actual = runtimeOrchestrator.getContainerStatus(containerId);
String currentStatus = (String) replica.get("status");
String actualStatus = actual.running() ? "RUNNING" : "DEAD";
if (!actualStatus.equals(currentStatus)) {
Map<String, Object> updated = new HashMap<>(replica);
updated.put("status", actualStatus);
replicas.set(i, updated);
changed = true;
}
}
if (!changed) {
// Even if replica states haven't changed, check if deployment status is correct
long running = replicas.stream().filter(r -> "RUNNING".equals(r.get("status"))).count();
DeploymentStatus expected = running == replicas.size() ? DeploymentStatus.RUNNING
: running > 0 ? DeploymentStatus.DEGRADED : DeploymentStatus.FAILED;
if (deployment.status() != expected) {
deploymentRepository.updateStatus(deployment.id(), expected, deployment.containerId(), deployment.errorMessage());
log.info("Reconcile: deployment {} status corrected {} -> {} ({}/{} running)",
deployment.id(), deployment.status(), expected, running, replicas.size());
}
continue;
}
deploymentRepository.updateReplicaStates(deployment.id(), replicas);
long running = replicas.stream().filter(r -> "RUNNING".equals(r.get("status"))).count();
DeploymentStatus newStatus = running == replicas.size() ? DeploymentStatus.RUNNING
: running > 0 ? DeploymentStatus.DEGRADED : DeploymentStatus.FAILED;
if (deployment.status() != newStatus) {
deploymentRepository.updateStatus(deployment.id(), newStatus, deployment.containerId(), deployment.errorMessage());
log.info("Reconcile: deployment {} status {} -> {} ({}/{} replicas running)",
deployment.id(), deployment.status(), newStatus, running, replicas.size());
}
}
}
private void reconnect() {
try {
Thread.sleep(5000);
startListening();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
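
Note on DockerEventMonitor above: handleEvent and reconcile both derive the deployment status from the replica list using the same rule. A minimal sketch of that rule as one function follows; the `ReplicaStatusSketch` class and its local `Status` enum are hypothetical stand-ins for the project's DeploymentStatus.

```java
import java.util.List;
import java.util.Map;

// Hypothetical extraction of the status rule repeated in handleEvent() and reconcile().
final class ReplicaStatusSketch {
    enum Status { RUNNING, DEGRADED, FAILED }

    // All replicas running -> RUNNING, some -> DEGRADED, none -> FAILED.
    static Status derive(List<? extends Map<String, ?>> replicas) {
        long running = replicas.stream()
                .filter(r -> "RUNNING".equals(r.get("status")))
                .count();
        // An empty list compares 0 == 0 and yields RUNNING; reconcile() in the diff
        // skips deployments without replicas before reaching this rule.
        if (running == replicas.size()) {
            return Status.RUNNING;
        }
        return running > 0 ? Status.DEGRADED : Status.FAILED;
    }
}
```

Keeping the rule in one place would avoid the three near-identical ternaries in the monitor, though whether that refactoring suits the codebase is a judgment for its maintainers.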


@@ -0,0 +1,62 @@
package com.cameleer3.server.app.runtime;
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Network;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.List;
public class DockerNetworkManager {
private static final Logger log = LoggerFactory.getLogger(DockerNetworkManager.class);
public static final String TRAEFIK_NETWORK = "cameleer-traefik";
public static final String ENV_NETWORK_PREFIX = "cameleer-env-";
private final DockerClient dockerClient;
public DockerNetworkManager(DockerClient dockerClient) {
this.dockerClient = dockerClient;
}
public String ensureNetwork(String networkName) {
List<Network> existing = dockerClient.listNetworksCmd()
.withNameFilter(networkName)
.exec();
for (Network net : existing) {
if (net.getName().equals(networkName)) {
return net.getId();
}
}
String id = dockerClient.createNetworkCmd()
.withName(networkName)
.withDriver("bridge")
.withCheckDuplicate(true)
.exec()
.getId();
log.info("Created Docker network: {} ({})", networkName, id);
return id;
}
public void connectContainer(String containerId, String networkName) {
String networkId = ensureNetwork(networkName);
try {
dockerClient.connectToNetworkCmd()
.withContainerId(containerId)
.withNetworkId(networkId)
.exec();
log.debug("Connected container {} to network {}", containerId, networkName);
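} catch (Exception e) {
// Ignore "already connected" errors only; getMessage() can be null for some exceptions
String message = e.getMessage();
if (message == null || !message.contains("already exists")) {
throw e;
}
}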
}
public static String envNetworkName(String envSlug) {
return ENV_NETWORK_PREFIX + envSlug;
}
}
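
Note on ensureNetwork above: the loop re-checks each returned network for an exact name match, presumably because the name filter can also return networks whose names merely contain the requested string. The sketch below shows that exact-match step on plain strings, without a Docker client; `NetworkNameSketch` and `findExact` are hypothetical names.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, Docker-free illustration of the exact-name check in ensureNetwork().
final class NetworkNameSketch {
    static final String ENV_NETWORK_PREFIX = "cameleer-env-";

    // Same naming scheme as DockerNetworkManager.envNetworkName in the diff.
    static String envNetworkName(String envSlug) {
        return ENV_NETWORK_PREFIX + envSlug;
    }

    // Picks the candidate whose name equals the wanted name, ignoring partial matches.
    static Optional<String> findExact(List<String> candidateNames, String wanted) {
        return candidateNames.stream().filter(wanted::equals).findFirst();
    }
}
```

For the slug "prod" and candidates "cameleer-env-prod-eu" and "cameleer-env-prod", only the second is selected. If only "cameleer-env-prod-eu" were returned, the result would be empty, which corresponds to the branch where ensureNetwork creates the network.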

Some files were not shown because too many files have changed in this diff.