Commit Graph

94 Commits

Author SHA1 Message Date
hsiegeln
873e1d3df7 feat!: move query/logs/routes/diagram/agent-view endpoints under /environments/{envSlug}/
P3C — the last data/query wave of the taxonomy migration. Every user-
facing read endpoint that was keyed on env-as-query-param is now under
the env-scoped URL, making env impossible to omit and unambiguous in
server-side tenant+env filtering.

Server:
- SearchController: /api/v1/search/** → /api/v1/environments/{envSlug}/...
  Endpoints: /executions (GET), /executions/search (POST), /stats,
  /stats/timeseries, /stats/timeseries/by-app, /stats/timeseries/by-route,
  /stats/punchcard, /attributes/keys, /errors/top. Env comes from path.
- LogQueryController: /api/v1/logs → /api/v1/environments/{envSlug}/logs.
- RouteCatalogController: /api/v1/routes/catalog → /api/v1/environments/
  {envSlug}/routes. Env filter unconditional (path).
- RouteMetricsController: /api/v1/routes/metrics →
  /api/v1/environments/{envSlug}/routes/metrics (and /metrics/processors).
- DiagramRenderController: /{contentHash}/render stays flat (hashes are
  globally unique). Find-by-route moved to /api/v1/environments/{envSlug}/
  apps/{appSlug}/routes/{routeId}/diagram — the old GET /diagrams?...
  handler is removed.
- Agent views split cleanly:
  - AgentListController (new): /api/v1/environments/{envSlug}/agents
  - AgentEventsController: /api/v1/environments/{envSlug}/agents/events
  - AgentMetricsController: /api/v1/environments/{envSlug}/agents/
    {agentId}/metrics — now also rejects cross-env agents (404) as a
    defense-in-depth check, fulfilling B3.
  Agent self-service endpoints (register/refresh/heartbeat/deregister)
  remain flat at /api/v1/agents/** — JWT-authoritative.

SPA:
- queries/agents.ts, agent-metrics.ts, logs.ts, catalog.ts (route
  metrics only; /catalog stays flat), processor-metrics.ts,
  executions.ts (attributes/keys, stats, timeseries, search),
  dashboard.ts (all stats/errors/punchcard), correlation.ts,
  diagrams.ts (by-route) — all rewritten to env-scoped URLs.
- Hooks now either read env from useEnvironmentStore internally or
  require it as an argument. Query keys include env so switching env
  invalidates caches.
- useAgents/useAgentEvents signature simplified — env is no longer a
  parameter; it's read from the store. Callers (LayoutShell,
  AgentHealth, AgentInstance) updated accordingly.
- LogTab and useStartupLogs thread env through to useLogs.
- envFetch helper introduced in executions.ts for env-prefixed raw
  fetch until schema.d.ts is regenerated against the new backend.

BREAKING CHANGE: All these flat paths are removed:
  /api/v1/search/**, /api/v1/logs, /api/v1/routes/catalog,
  /api/v1/routes/metrics (and /processors), /api/v1/diagrams
  (lookup), /api/v1/agents (list), /api/v1/agents/events-log,
  /api/v1/agents/{id}/metrics, /api/v1/agent-events.
Clients must use the /api/v1/environments/{envSlug}/... equivalents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:48:25 +02:00
hsiegeln
6d9e456b97 feat!: move apps & deployments under /api/v1/environments/{envSlug}/apps/{appSlug}/...
P3B of the taxonomy migration. App and deployment routes are now
env-scoped in the URL itself, making the (env, app_slug) uniqueness
key explicit. Previously /api/v1/apps/{appSlug} was ambiguous: with
the same app deployed to multiple environments (dev/staging/prod),
the handler called AppService.getBySlug(slug) which returns the
first row matching slug regardless of env.

Server:
- AppController: @RequestMapping("/api/v1/environments/{envSlug}/
  apps"). Every handler now calls
  appService.getByEnvironmentAndSlug(env.id(), appSlug) — 404 if the
  app doesn't exist in *this* env. CreateAppRequest body drops
  environmentId (it's in the path).
- DeploymentController: @RequestMapping("/api/v1/environments/
  {envSlug}/apps/{appSlug}/deployments"). DeployRequest body drops
  environmentId. PromoteRequest body switches from
  targetEnvironmentId (UUID) to targetEnvironment (slug);
  promote handler resolves the target env by slug and looks up the
  app with the same slug in the target env (fails with 404 if the
  target app doesn't exist yet — apps must exist in both source
  and target before promote).
- AppService: added getByEnvironmentAndSlug helper; createApp now
  validates slug against ^[a-z0-9][a-z0-9-]{0,63}$ (400 on
  invalid).

SPA:
- queries/admin/apps.ts: rewritten. Hooks take envSlug where
  env-scoped. Removed useAllApps (no flat endpoint). Renamed path
  param naming: appId → appSlug throughout. Added
  usePromoteDeployment. Query keys include envSlug so cache is
  env-scoped.
- AppsTab.tsx: call sites updated. When no environment is selected,
  the managed-app list is empty — cross-env discovery lives in the
  Runtime tab (catalog). handleDeploy/handleStop/etc. pass envSlug
  to the new hook signatures.

BREAKING CHANGE: /api/v1/apps/** paths removed. Clients must use
/api/v1/environments/{envSlug}/apps/{appSlug}/**. Request bodies
for POST /apps and POST /apps/{slug}/deployments no longer accept
environmentId (use the URL path instead). Promote body uses slug
not UUID.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:38:37 +02:00
hsiegeln
969cdb3bd0 feat!: move config & settings under /api/v1/environments/{envSlug}/...
P3A of the taxonomy migration. Env-scoped config and settings endpoints
now live under the env-prefixed URL shape, making env a first-class
path segment instead of a query param. Agent-authoritative config is
split off into a dedicated endpoint so agent env comes from the JWT
only — never spoofable via URL.

Server:
- ApplicationConfigController: @RequestMapping("/api/v1/environments/
  {envSlug}"). Handlers use @EnvPath Environment env, appSlug as
  @PathVariable. Removed the dual-mode resolveEnvironmentForRead —
  user flow only; agent flow moved to AgentConfigController.
- AgentConfigController (new): GET /api/v1/agents/config. Reads
  instanceId from JWT subject, resolves (app, env) from registry,
  returns AppConfigResponse. Registry miss → falls back to JWT env
  claim for environment, but 404s if application cannot be derived
  (no other source without registry).
- AppSettingsController: @RequestMapping("/api/v1/environments/
  {envSlug}"). List at /app-settings, per-app at /apps/{appSlug}/
  settings. Access class-wide PreAuthorize preserved (ADMIN/OPERATOR).

SPA:
- commands.ts: useAllApplicationConfigs, useApplicationConfig,
  useUpdateApplicationConfig, useProcessorRouteMapping,
  useTestExpression — rewritten URLs to /environments/{env}/apps/
  {app}/... shape. environment now required on every call. Query
  keys include environment so cache is env-scoped.
- dashboard.ts: useAppSettings, useAllAppSettings, useUpdateAppSettings
  rewritten.
- TapConfigModal: new required environment prop; callers updated.
- RouteDetail, ExchangesPage: thread selectedEnv into test-expression
  and modal.

Config changes in SecurityConfig for the new shape landed earlier in
P0.2; no security rule changes needed in this commit.

BREAKING CHANGE: /api/v1/config/** and /api/v1/admin/app-settings/**
paths removed. Agents must use /api/v1/agents/config instead of
GET /api/v1/config/{app}; users must use /api/v1/environments/{env}/
apps/{app}/config and /api/v1/environments/{env}/apps/{app}/settings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:33:25 +02:00
hsiegeln
6b5ee10944 feat!: environment admin URLs use slug; validate and immutabilize slug
UUID-based admin paths were the only remaining UUID-in-URL pattern in
the API. Migrates /api/v1/admin/environments/{id} to /{envSlug} so
slugs are the single environment identifier in every URL. UUIDs stay
internal to the database.

- Controller: @PathVariable UUID id → @PathVariable String envSlug on
  get/update/delete and the two nested endpoints (default-container-
  config, jar-retention). Handlers resolve slug → Environment via
  EnvironmentService.getBySlug, then delegate to existing UUID-based
  service methods.
- Service: create() now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$
  and returns 400 on invalid slugs. Rationale documented in the class:
  slugs are immutable after creation because they appear in URLs,
  Docker network names, container names, and ClickHouse partition keys.
- UpdateEnvironmentRequest has no slug field and Jackson's default
  ignore-unknown behavior drops any slug supplied in a PUT body;
  regression test (updateEnvironment_withSlugInBody_ignoresSlug)
  documents this invariant.
- SPA: mutation args change from { id } to { slug }. EnvironmentsPage
  still uses env.id for local selection state (UUID from DB) but
  passes env.slug to every mutation.

BREAKING CHANGE: /api/v1/admin/environments/{id:UUID}/... paths removed.
Clients must use /{envSlug}/... (slug from the environments list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:23:31 +02:00
hsiegeln
fcb53dd010 fix!: require environment on diagram lookup and attribute keys queries
Closes two cross-env data leakage paths. Both endpoints previously
returned data aggregated across all environments, so a diagram or
attribute key from dev could appear in a prod UI query (and vice versa).

B1: GET /api/v1/diagrams?application=&routeId= now requires
?environment= and resolves agents via
registryService.findByApplicationAndEnvironment instead of
findByApplication. Prevents serving a dev diagram for a prod route.

B2: GET /api/v1/search/attributes/keys now requires ?environment=.
SearchIndex.distinctAttributeKeys gains an environment parameter and
the ClickHouse query adds the env filter alongside the existing
tenant_id filter. Prevents prod attribute names leaking into dev
autocompletion (and vice versa).

SPA hooks updated to thread environment through from
useEnvironmentStore; query keys include environment so React Query
re-fetches on env switch. No call-site changes needed — hook
signatures unchanged.

B3 (AgentMetricsController env scope) deferred to P3C: agent-env is
effectively 1:1 today via the instance_id naming
({envSlug}-{appSlug}-{replicaIndex}), and the URL migration in P3C
to /api/v1/environments/{env}/agents/{agentId}/metrics naturally
introduces env from path. A minimal P1 fix would regress the "view
metrics of a killed agent" case.

BREAKING CHANGE: Both endpoints now require ?environment= (slug).
Clients omitting the parameter receive 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:19:55 +02:00
hsiegeln
9b1ef51d77 feat!: scope per-app config and settings by environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m40s
SonarQube / sonarqube (push) Successful in 4m29s
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
hsiegeln
fd806b9f2d fix: unify container/agent log identity and fix multi-replica log capture
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m18s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Four logging pipeline fixes:

1. Multi-replica startup logs: remove stopLogCaptureByApp from
   SseConnectionManager — container log capture now expires naturally
   after 60s instead of being killed when the first agent connects SSE.
   This ensures all replicas' bootstrap output is captured.

2. Unified instance_id: container logs and agent logs now share the same
   instance identity ({envSlug}-{appSlug}-{replicaIndex}). DeploymentExecutor
   sets CAMELEER_AGENT_INSTANCEID per replica so the agent uses the same
   ID as ContainerLogForwarder. Instance-level log views now show both
   container and agent logs.

3. Labels-first container identity: TraefikLabelBuilder emits cameleer.replica
   and cameleer.instance-id labels. Container names are tenant-prefixed
   ({tenantId}-{envSlug}-{appSlug}-{idx}) for global Docker daemon uniqueness.

4. Environment filter on log queries: useApplicationLogs now passes the
   selected environment to the API, preventing log leakage across environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:54:05 +02:00
hsiegeln
119cf912b8 feat: add useStartupLogs hook for container startup log polling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:21:23 +02:00
hsiegeln
9ac8e3604c fix: allow testing claim mapping rules before saving and keep rows editable after test
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
The test endpoint now accepts inline rules from the client instead of reading
from the database, so unsaved rules can be tested. Matched rows show the
checkmark alongside action buttons instead of replacing them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:52:18 +02:00
hsiegeln
7b73b5c9c5 feat: add per-app sensitive keys section to AppConfigDetailPage
Adds sensitiveKeys/globalSensitiveKeys/mergedSensitiveKeys fields to
ApplicationConfig, unwraps the new AppConfigResponse envelope in
useApplicationConfig, and renders an editable Sensitive Keys section
with read-only global pills and add/remove app-specific key tags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 18:26:05 +02:00
hsiegeln
06c719f0dd feat: add sensitive keys API query hooks 2026-04-14 18:22:28 +02:00
hsiegeln
344700e29e feat: add React Query hooks for claim mapping rules API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 16:46:40 +02:00
hsiegeln
e57343e3df feat: add delta mode for counter metrics using ClickHouse lag()
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.

The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:56:06 +02:00
hsiegeln
00c9a0006e feat: rework runtime charts and fix time range propagation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Runtime page (AgentInstance):
- Rearrange charts: CPU, Memory, GC (top); Threads, Chunks Exported,
  Chunks Dropped (bottom). Removes throughput/error charts (belong on
  Dashboard, not Runtime).
- Pass global time range (from/to) to useAgentMetrics — charts now
  respect the time filter instead of always showing last 60 minutes.
- Bottom row (logs + timeline) fills remaining vertical space.

Dashboard L3:
- Processor metrics section fills remaining vertical space.
- Chart x-axis uses timestamps instead of bucket indices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:59:38 +02:00
hsiegeln
51a4317440 fix: optimistically remove dismissed app from sidebar cache
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m14s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Sets query cache immediately on dismiss success so the sidebar updates
without waiting for the catalog refetch to complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:05:04 +02:00
hsiegeln
90c82238a0 feat: add orphaned app cleanup — auto-filter stale discovered apps, manual dismiss with data purge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:19:59 +02:00
hsiegeln
ee435985a9 feat: add Runtime Type and Custom Arguments fields to deployment Resources tab
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:11:59 +02:00
hsiegeln
6b00bf81e3 feat: add log source filter (app/agent) to runtime log viewers
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 1m7s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 10:27:59 +02:00
hsiegeln
c8cdd846c0 feat: fetch server capabilities and hide infra tabs when disabled
Adds a useServerCapabilities hook that fetches /api/v1/health once per
session (staleTime: Infinity) and extracts the infrastructureEndpoints
flag. buildAdminTreeNodes now accepts an opts parameter so ClickHouse
and Database tabs are hidden when the server reports infra endpoints as
disabled. LayoutShell wires the hook result into the admin tree memo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 23:12:30 +02:00
hsiegeln
b6b93dc3cc fix: prevent admin page redirect during token refresh
adminFetch called logout() directly on 401/403 responses, which cleared
roles and caused RequireAdmin to redirect to /exchanges while users were
editing forms. Now adminFetch attempts a token refresh before failing,
and RequireAdmin tolerates a transient empty-roles state during refresh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:28:45 +02:00
hsiegeln
be585934b9 fix: show descriptive error when creating local user with OIDC enabled
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:19:10 +02:00
hsiegeln
1971c70638 fix: commands respect selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.

Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.

Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:28:09 +02:00
hsiegeln
69dcce2a8f fix: Runtime tab respects selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Add environment parameter to AgentEventsController, AgentEventService,
  and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
  and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:12:33 +02:00
hsiegeln
cb36d7936f fix: auto-compute environment slug + respect environment filter globally
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.

Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
  detail, correlation chains, and processor metrics all filter by
  selected environment.
- Backend RouteMetricsController now accepts environment parameter for
  both route and processor metrics endpoints.

Closes #XYZ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:01:50 +02:00
hsiegeln
2df5e0d7ba feat: active config snapshot, composite StatusDot with tooltip
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability

Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
  agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
  Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:00:54 +02:00
hsiegeln
b86e95f08e feat: unified catalog endpoint and slug-based app navigation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m47s
Consolidate route catalog (agent-driven) and apps table (deployment-
driven) into a single GET /api/v1/catalog?environment={slug} endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.

- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:43:14 +02:00
hsiegeln
7e0536b5b3 feat: update Deployment interface with replicas, stages, new statuses
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 20:29:33 +02:00
hsiegeln
7e47f1628d feat: JAR retention policy with nightly cleanup job
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Per-environment "keep last N versions" setting (default 5, null for
unlimited). Nightly scheduled job at 03:00 deletes old versions from
both database and disk, skipping any version that is currently deployed.

Full stack:
- V6 migration: adds jar_retention_count column to environments
- Environment record, repository, service, admin controller endpoint
- JarRetentionJob: @Scheduled nightly, iterates environments and apps
- UI: retention policy editor on admin Environments page with
  toggle between limited/unlimited and version count input
- AppVersionRepository.delete() for version cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:06:28 +02:00
hsiegeln
863a992cc4 feat: add default container config editor to Environments admin page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
New "Default Resource Limits" section in environment detail view with
memory limit/reserve, CPU shares/limit. These defaults apply to new
apps unless overridden per-app.

Added useUpdateDefaultContainerConfig hook for the PUT endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:52:39 +02:00
hsiegeln
de4ca10fa5 feat: move Apps from admin to main tab bar with container config
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m16s
- Apps tab visible to OPERATOR+ (hidden for VIEWER), scoped by
  sidebar app selection and environment filter
- List view: DataTable with name, environment, updated, created columns
- Detail view: deployments across all envs, version upload with
  per-env deploy target, container config form (resources, ports,
  custom env vars) with explicit Save
- Memory reserve field disabled for non-production environments
  with info hint
- Admin sidebar sorted alphabetically, Applications entry removed
- Old admin AppsPage deleted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:23:30 +02:00
hsiegeln
e04dca55aa feat: add Applications admin page with version upload and deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m12s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- SplitPane layout with environment selector, app list, and detail pane
- Create/delete apps with slug uniqueness validation
- Upload JAR versions with file size display
- Deploy versions and stop running deployments with status badges
- Deployment list auto-refreshes every 5s for live status updates
- Registered at /admin/apps with sidebar entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:24:22 +02:00
hsiegeln
448a63adc9 feat: add About Me dialog showing user info, roles, and groups
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Add GET /api/v1/auth/me endpoint returning current user's UserDetail
- Add AboutMeDialog component with role badges and group memberships
- Add userMenuItems prop to TopBar via design-system update
- Wire "About Me" menu item into user dropdown above Logout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:12:29 +02:00
hsiegeln
9af0043915 feat: add Environment admin UI page
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
SplitPane with create/edit/delete, production flag toggle,
enabled/disabled toggle. Follows existing admin page patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:19:05 +02:00
hsiegeln
7ebbc18b31 fix: make API calls respect BASE_PATH for subpath deployments
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
config.apiBaseUrl now derives from <base> tag when no explicit config
is set (e.g., /server/api/v1 instead of /api/v1). commands.ts authFetch
prepends apiBaseUrl and uses relative paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:04:52 +02:00
hsiegeln
694d0eef59 feat: add environment filtering across all APIs and UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.

UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:42:26 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
6f00ff2e28 fix: reduce ClickHouse log noise, admin query spam, and diagram scan perf
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
  eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
  instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
  refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
  refresh so sidebar clicks use current time window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:48:30 +02:00
hsiegeln
901dfd1eb8 fix: PAUSED mode disabled queries entirely instead of just polling
Some checks failed
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:25:04 +02:00
hsiegeln
c3b4f70913 feat(#116): update command hooks for synchronous group response
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 30s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Add CommandGroupResponse and ConfigUpdateResponse types. Switch
useSendGroupCommand and useSendRouteCommand from openapi-fetch to authFetch
returning CommandGroupResponse. Update useUpdateApplicationConfig to return
ConfigUpdateResponse and fix all consumer onSuccess callbacks to access
saved.config.version instead of saved.version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:01:06 +02:00
hsiegeln
b73f5e6dd4 feat: add Logs tab with cursor-paginated search, level filters, and live tail
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
- Extend GET /api/v1/logs with cursor pagination, multi-level filtering,
  optional application scoping, and level count aggregation
- Add exchangeId, instanceId, application, mdc fields to log responses
- Refactor ClickHouseLogStore with keyset pagination (N+1 pattern)
- Add LogSearchRequest/LogSearchResponse core domain records
- Create LogSearchPageResponse wrapper DTO
- Add Logs as 4th content tab (Exchanges | Dashboard | Runtime | Logs)
- Implement LogSearch component with debounced search, level filter bar,
  expandable log entries, cursor pagination, and live tail mode
- Add cross-navigation: exchange header → logs, log tab → logs tab
- Update ClickHouseLogStoreIT with cursor, multi-level, cross-app tests

Closes: #104

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:47:16 +02:00
hsiegeln
f3feaddbfe feat: show distinct attribute keys in cmd-k Attributes tab
All checks were successful
CI / build (push) Successful in 1m58s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m46s
CI / deploy (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
Add GET /search/attributes/keys endpoint that queries distinct
attribute key names from ClickHouse using JSONExtractKeys. Attribute
keys appear in the cmd-k Attributes tab alongside attribute value
matches from exchange results.

- SearchIndex.distinctAttributeKeys() interface method
- ClickHouseSearchIndex implementation using arrayJoin(JSONExtractKeys)
- SearchController /attributes/keys endpoint
- useAttributeKeys() React Query hook
- buildSearchData includes attribute keys as 'attribute' category items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:39:27 +02:00
hsiegeln
f82aa26371 fix: improve ClickHouse admin page, fix AgentHealth type error
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 3m46s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Rewrite ClickHouse admin to show useful storage metrics instead of
often-empty system.events data. Add active queries section.

- Replace performance endpoint: query system.parts for disk size,
  uncompressed size, compression ratio, total rows, part count
- Add /queries endpoint querying system.processes for active queries
- Frontend: storage overview strip, tables with total size, active
  queries DataTable
- Fix AgentHealth.tsx type: agentId → instanceId in inline type cast

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:18:06 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00
hsiegeln
715cbc1894 feat: synchronous replay endpoint with agent response status
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 56s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Add dedicated POST /agents/{id}/replay endpoint that uses
addCommandWithReply to wait for the agent ACK (30s timeout).
Returns the actual replay result (status, message, data) instead
of just a delivery confirmation.

Frontend toast now reflects the agent's response: "Replay completed"
on success, agent error message on failure, timeout message if the
agent doesn't respond.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:48:02 +02:00
hsiegeln
8b0d473fcd feat: add route control bar and fix replay protocol compliance
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Add ROUTE_CONTROL command type and route-control mapping in
AgentCommandController. New RouteControlBar component in the exchange
header shows Start/Stop/Suspend/Resume actions (grouped pill bar) and
a Replay button, gated by agent capabilities and OPERATOR/ADMIN role.

Fix useReplayExchange hook to match protocol section 16: payload now
uses { routeId, exchange: { body, headers }, originalExchangeId, nonce }
instead of the flat { headers, body } format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 21:42:06 +02:00
hsiegeln
52c22f1eb9 fix: dashboard flickering on poll, animation replay, and scroll
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 54s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
- Add placeholderData to useRouteMetrics and usePunchcard hooks so data
  stays stable between refetches instead of going undefined → flicker
- Disable Recharts animation on Treemap (isAnimationActive=false)
- Make .content scrollable (overflow-y: auto, flex: 1, min-height: 0)
  so charts below the fold are accessible

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 15:42:02 +02:00
hsiegeln
9d2d87e7e1 feat: add treemap and punchcard heatmap to dashboard L1/L2 (#94)
Treemap: rectangle area = transaction volume, color = SLA compliance
(green→red). Shows apps at L1, routes at L2. Click navigates deeper.

Punchcard heatmap: 7-day rolling weekday x 24-hour grid showing
transaction volume and error patterns. Two side-by-side views
(transactions + errors) reveal temporal clustering.

Backend: new GET /search/stats/punchcard endpoint aggregating
stats_1m_all/app by DOW x hour over rolling 7 days.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 10:26:26 +02:00
hsiegeln
213aa86c47 feat: progressive drill-down dashboard with RED metrics and SLA compliance (#94)
Three-level dashboard driven by sidebar selection:
- L1 (no selection): all-apps overview with health table, per-app charts
- L2 (app selected): route performance table, error velocity, top errors
- L3 (route selected): processor table, latency heatmap data, bottleneck KPI

Backend: 3 new endpoints (timeseries/by-app, timeseries/by-route, errors/top),
per-app SLA settings (app_settings table, V12 migration), exact SLA compliance
from executions hypertable, error velocity with acceleration detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:29:20 +02:00
hsiegeln
e5e6175aca feat: use endpointUri for cross-route drill-down instead of label parsing
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m2s
CI / docker (push) Successful in 1m0s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Server:
- Add endpointUri to PositionedNode (from RouteNode)
- Add fromEndpointUri to RouteSummary (catalog API)
- Catalog controller resolves endpoint URI from diagram store

UI:
- Build endpointRouteMap from catalog's fromEndpointUri field
- Drill-down uses exact match on node.endpointUri against the map
- Remove label parsing heuristics (extractTargetEndpoint, camelToKebab)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:31:08 +01:00