Commit Graph

131 Commits

Author SHA1 Message Date
hsiegeln
9b1ef51d77 feat!: scope per-app config and settings by environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m40s
SonarQube / sonarqube (push) Successful in 4m29s
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
hsiegeln
4b264b3308 feat: add CPU usage to agent response and compact cards
Backend:
- Add cpuUsage field to AgentInstanceResponse (-1 if unavailable)
- Add queryAgentCpuUsage() to AgentRegistrationController — queries
  avg CPU per instance from agent_metrics over last 2 minutes
- Wire CPU into agent list response via withCpuUsage()

Frontend:
- Add cpuUsage to schema.d.ts
- Compute maxCpu per AppGroup (max across all instances)
- Show "X% cpu" on compact cards when available (hidden when -1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:12:23 +02:00
hsiegeln
cb3ebfea7c chore: rename cameleer3 to cameleer
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 18s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:28:42 +02:00
hsiegeln
fd806b9f2d fix: unify container/agent log identity and fix multi-replica log capture
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m18s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Four logging pipeline fixes:

1. Multi-replica startup logs: remove stopLogCaptureByApp from
   SseConnectionManager — container log capture now expires naturally
   after 60s instead of being killed when the first agent connects SSE.
   This ensures all replicas' bootstrap output is captured.

2. Unified instance_id: container logs and agent logs now share the same
   instance identity ({envSlug}-{appSlug}-{replicaIndex}). DeploymentExecutor
   sets CAMELEER_AGENT_INSTANCEID per replica so the agent uses the same
   ID as ContainerLogForwarder. Instance-level log views now show both
   container and agent logs.

3. Labels-first container identity: TraefikLabelBuilder emits cameleer.replica
   and cameleer.instance-id labels. Container names are tenant-prefixed
   ({tenantId}-{envSlug}-{appSlug}-{idx}) for global Docker daemon uniqueness.

4. Environment filter on log queries: useApplicationLogs now passes the
   selected environment to the API, preventing log leakage across environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:54:05 +02:00
hsiegeln
119cf912b8 feat: add useStartupLogs hook for container startup log polling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:21:23 +02:00
hsiegeln
9ac8e3604c fix: allow testing claim mapping rules before saving and keep rows editable after test
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
The test endpoint now accepts inline rules from the client instead of reading
from the database, so unsaved rules can be tested. Matched rows show the
checkmark alongside action buttons instead of replacing them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:52:18 +02:00
hsiegeln
7b73b5c9c5 feat: add per-app sensitive keys section to AppConfigDetailPage
Adds sensitiveKeys/globalSensitiveKeys/mergedSensitiveKeys fields to
ApplicationConfig, unwraps the new AppConfigResponse envelope in
useApplicationConfig, and renders an editable Sensitive Keys section
with read-only global pills and add/remove app-specific key tags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 18:26:05 +02:00
hsiegeln
06c719f0dd feat: add sensitive keys API query hooks 2026-04-14 18:22:28 +02:00
hsiegeln
344700e29e feat: add React Query hooks for claim mapping rules API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 16:46:40 +02:00
hsiegeln
0827fd21e3 feat: persist and display exchange properties from agent
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / docker (push) Successful in 2m13s
CI / deploy (push) Successful in 58s
CI / deploy-feature (push) Has been skipped
Add support for exchange properties sent by the agent alongside headers.
Properties flow through the same pipeline as headers: ClickHouse columns
(input_properties, output_properties) on both executions and
processor_executions tables, MergedExecution record, ChunkAccumulator
extraction, DetailService snapshot, and REST API response.

UI adds a Properties tab next to Headers in the process diagram detail
panel, with the same input/output split table layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:23:53 +02:00
hsiegeln
e57343e3df feat: add delta mode for counter metrics using ClickHouse lag()
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.

The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:56:06 +02:00
hsiegeln
27f2503640 fix: clean up runtime UI and harden session expiry handling
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Remove redundant "X/X LIVE" badge from runtime page, breadcrumb trail
and routes section from agent detail page (pills moved into Process
Information card). Fix session expiry: guard against concurrent 401
refresh races and skip re-entrant triggers on auth endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:33:44 +02:00
hsiegeln
00c9a0006e feat: rework runtime charts and fix time range propagation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Runtime page (AgentInstance):
- Rearrange charts: CPU, Memory, GC (top); Threads, Chunks Exported,
  Chunks Dropped (bottom). Removes throughput/error charts (belong on
  Dashboard, not Runtime).
- Pass global time range (from/to) to useAgentMetrics — charts now
  respect the time filter instead of always showing last 60 minutes.
- Bottom row (logs + timeline) fills remaining vertical space.

Dashboard L3:
- Processor metrics section fills remaining vertical space.
- Chart x-axis uses timestamps instead of bucket indices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:59:38 +02:00
hsiegeln
51a4317440 fix: optimistically remove dismissed app from sidebar cache
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m14s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Sets query cache immediately on dismiss success so the sidebar updates
without waiting for the catalog refetch to complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:05:04 +02:00
hsiegeln
90c82238a0 feat: add orphaned app cleanup — auto-filter stale discovered apps, manual dismiss with data purge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:19:59 +02:00
hsiegeln
ee435985a9 feat: add Runtime Type and Custom Arguments fields to deployment Resources tab
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:11:59 +02:00
hsiegeln
6b00bf81e3 feat: add log source filter (app/agent) to runtime log viewers
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 1m7s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 10:27:59 +02:00
hsiegeln
c8cdd846c0 feat: fetch server capabilities and hide infra tabs when disabled
Adds a useServerCapabilities hook that fetches /api/v1/health once per
session (staleTime: Infinity) and extracts the infrastructureEndpoints
flag. buildAdminTreeNodes now accepts an opts parameter so ClickHouse
and Database tabs are hidden when the server reports infra endpoints as
disabled. LayoutShell wires the hook result into the admin tree memo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 23:12:30 +02:00
hsiegeln
b6b93dc3cc fix: prevent admin page redirect during token refresh
adminFetch called logout() directly on 401/403 responses, which cleared
roles and caused RequireAdmin to redirect to /exchanges while users were
editing forms. Now adminFetch attempts a token refresh before failing,
and RequireAdmin tolerates a transient empty-roles state during refresh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:28:45 +02:00
hsiegeln
be585934b9 fix: show descriptive error when creating local user with OIDC enabled
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:19:10 +02:00
hsiegeln
1971c70638 fix: commands respect selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.

Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.

Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:28:09 +02:00
hsiegeln
69dcce2a8f fix: Runtime tab respects selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Add environment parameter to AgentEventsController, AgentEventService,
  and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
  and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:12:33 +02:00
hsiegeln
cb36d7936f fix: auto-compute environment slug + respect environment filter globally
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.

Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
  detail, correlation chains, and processor metrics all filter by
  selected environment.
- Backend RouteMetricsController now accepts environment parameter for
  both route and processor metrics endpoints.

Closes #XYZ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:01:50 +02:00
hsiegeln
2df5e0d7ba feat: active config snapshot, composite StatusDot with tooltip
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability

Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
  agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
  Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:00:54 +02:00
hsiegeln
b86e95f08e feat: unified catalog endpoint and slug-based app navigation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m47s
Consolidate route catalog (agent-driven) and apps table (deployment-
driven) into a single GET /api/v1/catalog?environment={slug} endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.

- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:43:14 +02:00
hsiegeln
7e0536b5b3 feat: update Deployment interface with replicas, stages, new statuses
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:29:33 +02:00
hsiegeln
7e47f1628d feat: JAR retention policy with nightly cleanup job
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Per-environment "keep last N versions" setting (default 5, null for
unlimited). Nightly scheduled job at 03:00 deletes old versions from
both database and disk, skipping any version that is currently deployed.

Full stack:
- V6 migration: adds jar_retention_count column to environments
- Environment record, repository, service, admin controller endpoint
- JarRetentionJob: @Scheduled nightly, iterates environments and apps
- UI: retention policy editor on admin Environments page with
  toggle between limited/unlimited and version count input
- AppVersionRepository.delete() for version cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:06:28 +02:00
hsiegeln
863a992cc4 feat: add default container config editor to Environments admin page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
New "Default Resource Limits" section in environment detail view with
memory limit/reserve, CPU shares/limit. These defaults apply to new
apps unless overridden per-app.

Added useUpdateDefaultContainerConfig hook for the PUT endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:52:39 +02:00
hsiegeln
de4ca10fa5 feat: move Apps from admin to main tab bar with container config
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m16s
- Apps tab visible to OPERATOR+ (hidden for VIEWER), scoped by
  sidebar app selection and environment filter
- List view: DataTable with name, environment, updated, created columns
- Detail view: deployments across all envs, version upload with
  per-env deploy target, container config form (resources, ports,
  custom env vars) with explicit Save
- Memory reserve field disabled for non-production environments
  with info hint
- Admin sidebar sorted alphabetically, Applications entry removed
- Old admin AppsPage deleted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:23:30 +02:00
hsiegeln
e04dca55aa feat: add Applications admin page with version upload and deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- SplitPane layout with environment selector, app list, and detail pane
- Create/delete apps with slug uniqueness validation
- Upload JAR versions with file size display
- Deploy versions and stop running deployments with status badges
- Deployment list auto-refreshes every 5s for live status updates
- Registered at /admin/apps with sidebar entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:24:22 +02:00
hsiegeln
448a63adc9 feat: add About Me dialog showing user info, roles, and groups
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Add GET /api/v1/auth/me endpoint returning current user's UserDetail
- Add AboutMeDialog component with role badges and group memberships
- Add userMenuItems prop to TopBar via design-system update
- Wire "About Me" menu item into user dropdown above Logout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:12:29 +02:00
hsiegeln
9af0043915 feat: add Environment admin UI page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SplitPane with create/edit/delete, production flag toggle,
enabled/disabled toggle. Follows existing admin page patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:19:05 +02:00
hsiegeln
d4b530ff8a refactor: remove PKCE from OIDC flow (confidential client)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Backend holds client_secret and does the token exchange server-side,
making PKCE redundant. Removes code_verifier/code_challenge from all
frontend auth paths and backend exchange method. Eliminates the source
of "grant request is invalid" errors from verifier mismatches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:22:13 +02:00
hsiegeln
03ff9a3813 feat: generic OIDC role extraction from access token
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m1s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The OIDC login flow now reads roles from the access_token (JWT) in
addition to the id_token. This fixes role extraction with providers
like Logto that put scopes/roles in access tokens rather than id_tokens.

- Add audience and additionalScopes to OidcConfig for RFC 8707 resource
  indicator support and configurable extra scopes
- OidcTokenExchanger decodes access_token with at+jwt-compatible processor,
  falls back to id_token if access_token is opaque or has no roles
- syncOidcRoles preserves existing local roles when OIDC returns none
- SPA includes resource and additionalScopes in authorization requests
- Admin UI exposes new config fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:16:52 +02:00
hsiegeln
c502a42f17 refactor: architecture cleanup — OIDC dedup, PKCE, K8s hardening
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m59s
- Extract OidcProviderHelper for shared discovery + JWK source construction
- Add SystemRole.normalizeScope() to centralize role normalization
- Merge duplicate claim extraction in OidcTokenExchanger
- Add PKCE (S256) to OIDC authorization flow (frontend + backend)
- Add SecurityContext (runAsNonRoot) to all K8s deployments
- Fix postgres probe to use $POSTGRES_USER instead of hardcoded username
- Remove default credentials from Dockerfile
- Extract sanitize_branch() to shared .gitea/sanitize-branch.sh
- Fix sidebar to use /exchanges/ paths directly, remove legacy redirects
- Centralize basePath computation in router.tsx via config module

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:57:29 +02:00
hsiegeln
0c77f8d594 feat: add User ID Claim field to OIDC admin config UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
New input in the Claim Mapping section lets admins configure which
id_token claim is used as the unique user identifier (default: sub).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:19:38 +02:00
hsiegeln
7ebbc18b31 fix: make API calls respect BASE_PATH for subpath deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
config.apiBaseUrl now derives from <base> tag when no explicit config
is set (e.g., /server/api/v1 instead of /api/v1). commands.ts authFetch
prepends apiBaseUrl and uses relative paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:04:52 +02:00
hsiegeln
69055f7d74 fix: persist environment selection in Zustand store instead of URL params
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Environment selector was losing its value on navigation because URL search
params were silently dropped by navigate() calls. Moved to a Zustand store
with localStorage persistence so the selection survives navigation, page
refresh, and new tabs. Switching environment now resets all filters, clears
URL params, invalidates queries, and remounts pages via Outlet key. Also
syncs openapi.json schema with running backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:12:16 +02:00
hsiegeln
694d0eef59 feat: add environment filtering across all APIs and UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.

UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:42:26 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
6f00ff2e28 fix: reduce ClickHouse log noise, admin query spam, and diagram scan perf
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
  eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
  instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
  refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
  refresh so sidebar clicks use current time window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:48:30 +02:00
hsiegeln
901dfd1eb8 fix: PAUSED mode disabled queries entirely instead of just polling
Some checks failed
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:25:04 +02:00
hsiegeln
c3b4f70913 feat(#116): update command hooks for synchronous group response
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 30s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add CommandGroupResponse and ConfigUpdateResponse types. Switch
useSendGroupCommand and useSendRouteCommand from openapi-fetch to authFetch
returning CommandGroupResponse. Update useUpdateApplicationConfig to return
ConfigUpdateResponse and fix all consumer onSuccess callbacks to access
saved.config.version instead of saved.version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:01:06 +02:00
hsiegeln
b73f5e6dd4 feat: add Logs tab with cursor-paginated search, level filters, and live tail
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
- Extend GET /api/v1/logs with cursor pagination, multi-level filtering,
  optional application scoping, and level count aggregation
- Add exchangeId, instanceId, application, mdc fields to log responses
- Refactor ClickHouseLogStore with keyset pagination (N+1 pattern)
- Add LogSearchRequest/LogSearchResponse core domain records
- Create LogSearchPageResponse wrapper DTO
- Add Logs as 4th content tab (Exchanges | Dashboard | Runtime | Logs)
- Implement LogSearch component with debounced search, level filter bar,
  expandable log entries, cursor pagination, and live tail mode
- Add cross-navigation: exchange header → logs, log tab → logs tab
- Update ClickHouseLogStoreIT with cursor, multi-level, cross-app tests

Closes: #104

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:47:16 +02:00
hsiegeln
f3feaddbfe feat: show distinct attribute keys in cmd-k Attributes tab
All checks were successful
CI / build (push) Successful in 1m58s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m46s
CI / deploy (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
Add GET /search/attributes/keys endpoint that queries distinct
attribute key names from ClickHouse using JSONExtractKeys. Attribute
keys appear in the cmd-k Attributes tab alongside attribute value
matches from exchange results.

- SearchIndex.distinctAttributeKeys() interface method
- ClickHouseSearchIndex implementation using arrayJoin(JSONExtractKeys)
- SearchController /attributes/keys endpoint
- useAttributeKeys() React Query hook
- buildSearchData includes attribute keys as 'attribute' category items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:39:27 +02:00
hsiegeln
e26266532a fix: regenerate OpenAPI types, fix search scoping by applicationId
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m10s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The identity rename (application→applicationId) broke search filtering
because the stale schema.d.ts still had 'application' as the field name.
The backend silently ignored the unknown field, returning unfiltered results.

- Regenerate openapi.json and schema.d.ts from live backend
- Fix Dashboard: application→applicationId in search request
- Fix RouteDetail: application→applicationId in search request (2 places)
- LayoutShell: scope command palette search by appId/routeId
- LayoutShell: pass sidebarReveal state on sidebar click navigation

Note for DS team: the Sidebar selectedPath logic (line 5451 in dist)
has a hardcoded pathname.startsWith("/exchanges/") guard. This should
be broadened to simply `S ? S : $.pathname` so sidebarReveal works on
all tabs (dashboard, runtime), not just exchanges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:55:19 +02:00
hsiegeln
a028905e41 fix: update agent field names in frontend to match backend DTO
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The AgentInstanceResponse backend DTO uses instanceId, displayName,
applicationId, status — but the stale schema.d.ts still had id, name,
application, state. This caused the runtime table to show no data.

- Update schema.d.ts AgentInstanceResponse fields
- Fix AgentHealth: row.id→instanceId, row.name→displayName,
  row.application→applicationId, inst.id→instanceId
- Fix AgentInstance: agent.id→instanceId, agent.name→displayName
- Fix ExchangeHeader: agent.id→instanceId, agent.state→status
- Fix LayoutShell search: agent.state→status, agentTps→tps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:36:31 +02:00
hsiegeln
f82aa26371 fix: improve ClickHouse admin page, fix AgentHealth type error
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 3m46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Rewrite ClickHouse admin to show useful storage metrics instead of
often-empty system.events data. Add active queries section.

- Replace performance endpoint: query system.parts for disk size,
  uncompressed size, compression ratio, total rows, part count
- Add /queries endpoint querying system.processes for active queries
- Frontend: storage overview strip, tables with total size, active
  queries DataTable
- Fix AgentHealth.tsx type: agentId → instanceId in inline type cast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:18:06 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00