cameleer-server/docs/SERVER-CAPABILITIES.md
hsiegeln 48ce75bf38 feat(server): persist server self-metrics into ClickHouse
Snapshot the full Micrometer registry (cameleer business metrics, alerting
metrics, and Spring Boot Actuator defaults) every 60s into a new
server_metrics table so server health survives restarts without an external
Prometheus. Includes a dashboard-builder reference for the SaaS team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:20:45 +02:00


Cameleer Server — Capabilities Reference

Standalone reference for systems integrating with or managing Cameleer Server instances. Generated 2026-04-04. Source of truth: the codebase and OpenAPI spec at /api/v1/api-docs.

What It Does

Cameleer Server is an observability platform for Apache Camel applications. It receives execution traces, metrics, logs, and route diagrams from instrumented Camel agents, stores them in ClickHouse, and serves a web UI for searching, visualizing, and controlling routes.

Core capabilities:

  • Real-time execution tracing with processor-level detail
  • Full-text search across executions, logs, and attributes
  • Route topology diagrams with live execution overlays
  • Application configuration push via SSE
  • Route control (start/stop/suspend) and exchange replay
  • Agent lifecycle management with auto-heal on server restart
  • RBAC with local users, groups, roles, and OIDC federation
  • Multi-tenant isolation (one tenant per server instance)

Multi-Tenancy Model

Each server instance serves exactly one tenant. Multiple tenants share infrastructure but are isolated at the data layer.

| Concern | Isolation |
|---|---|
| PostgreSQL | Schema-per-tenant (?currentSchema=tenant_{id}) |
| ClickHouse | Shared DB, tenant_id column on all tables, partitioned by (tenant_id, toYYYYMM(timestamp)) |
| Configuration | CAMELEER_SERVER_TENANT_ID env var (default: "default") |
| Agents | Each agent belongs to one tenant, one environment |

Environments (dev/staging/prod) are first-class within a tenant. Agents send environmentId at registration and in every heartbeat. The UI filters by environment. JWTs carry an env claim so that an agent's environment persists across server restarts.


Agent Protocol

Lifecycle

Register (bootstrap token) → Receive JWT + SSE URL
    ↓
Connect SSE ← Receive commands (config-update, deep-trace, replay, route-control)
    ↓
Heartbeat (every 30s) → Send capabilities, environmentId, routeStates
    ↓
Deregister (graceful shutdown)

State Machine

LIVE ──(no heartbeat for 90s)──→ STALE ──(300s more)──→ DEAD
  ↑                                 │
  └────(heartbeat arrives)──────────┘

Thresholds are configurable via cameleer.server.agentregistry.* properties.
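The transitions can be illustrated as a pure function of the time since the last heartbeat. This is a minimal sketch that assumes the default thresholds from the Agent Registry Tuning table: 90000 ms to STALE, then a further 300000 ms to DEAD. The function and enum names are hypothetical, and the real registry may evaluate state on a timer rather than on demand.

```python
from enum import Enum

# Defaults from the Agent Registry Tuning table (milliseconds).
STALE_THRESHOLD_MS = 90_000   # heartbeat miss -> STALE
DEAD_THRESHOLD_MS = 300_000   # time spent in STALE -> DEAD


class AgentState(Enum):
    LIVE = "LIVE"
    STALE = "STALE"
    DEAD = "DEAD"


def agent_state(last_heartbeat_ms: int, now_ms: int,
                stale_ms: int = STALE_THRESHOLD_MS,
                dead_ms: int = DEAD_THRESHOLD_MS) -> AgentState:
    """Hypothetical helper: classify an agent by the time since its last heartbeat."""
    silence = now_ms - last_heartbeat_ms
    if silence < stale_ms:
        return AgentState.LIVE
    # DEAD only after dead_ms spent in STALE, on top of the stale threshold.
    if silence < stale_ms + dead_ms:
        return AgentState.STALE
    return AgentState.DEAD
```

A new heartbeat resets last_heartbeat_ms, which is how a STALE agent returns to LIVE in this model.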

Registration

POST /api/v1/agents/register — requires bootstrap token in Authorization: Bearer header.

Request:

{
  "instanceId": "agent-abc-123",
  "applicationId": "order-service",
  "environmentId": "production",
  "version": "3.2.1",
  "routeIds": ["processOrder", "handlePayment"],
  "capabilities": { "replay": true, "routeControl": true }
}

Response:

{
  "instanceId": "agent-abc-123",
  "eventStreamUrl": "/api/v1/agents/agent-abc-123/events",
  "heartbeatIntervalMs": 30000,
  "signingPublicKeyBase64": "<ed25519-public-key>",
  "accessToken": "<jwt>",
  "refreshToken": "<jwt>"
}
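For agent authors, the registration call can be sketched with Python's standard library. The base URL (localhost on the container's default port 8081) and the bootstrap token value are placeholders. The JSON Content-Type header is an assumption. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder: adjust to wherever the server is reachable.
BASE_URL = "http://localhost:8081"


def build_register_request(bootstrap_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) POST /api/v1/agents/register with the bootstrap token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/agents/register",
        data=body,
        headers={
            "Authorization": f"Bearer {bootstrap_token}",
            "Content-Type": "application/json",  # assumed content type
        },
        method="POST",
    )


req = build_register_request("my-bootstrap-token", {  # token value is a placeholder
    "instanceId": "agent-abc-123",
    "applicationId": "order-service",
    "environmentId": "production",
    "version": "3.2.1",
    "routeIds": ["processOrder", "handlePayment"],
    "capabilities": {"replay": True, "routeControl": True},
})
# urllib.request.urlopen(req) would send it; the response body is the JSON shown above.
```

On success, the agent would use the accessToken from the response as the Bearer credential for heartbeats, SSE, and ingestion. It would use the refreshToken for POST /api/v1/agents/{id}/refresh.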

Heartbeat

POST /api/v1/agents/{id}/heartbeat — JWT auth.

{
  "capabilities": { "replay": true, "routeControl": true },
  "environmentId": "production",
  "routeStates": { "processOrder": "Started", "handlePayment": "Suspended" }
}

Auto-heal after server restart: if the agent is not in the registry, the server re-registers it from the JWT claims plus the heartbeat body. Environment priority: heartbeat environmentId > JWT env claim > "default".
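The environment priority rule reduces to a first-non-empty lookup. In this minimal sketch the function name is hypothetical, and treating an empty string the same as a missing value is an assumption.

```python
from typing import Optional


def resolve_environment(heartbeat_env: Optional[str], jwt_env_claim: Optional[str]) -> str:
    """Heartbeat environmentId wins, then the JWT env claim, then "default"."""
    for candidate in (heartbeat_env, jwt_env_claim):
        if candidate:  # None and "" both count as missing in this sketch
            return candidate
    return "default"
```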

SSE Event Stream

GET /api/v1/agents/{id}/events — long-lived SSE connection. Keepalive ping every 15s.

Event types pushed to agents: config-update, deep-trace, replay, set-traced-processors, test-expression, route-control.
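On the agent side, the stream is standard Server-Sent Events. The parser below is a minimal sketch of the SSE wire format: event: and data: fields, a blank line that ends an event, and lines starting with ":" treated as comments. It assumes each command arrives as a named event whose data is JSON. The routeId/action payload fields in the example are hypothetical. How the server encodes its keepalive ping is not documented here, so the sketch simply skips comment lines.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from raw SSE lines (without trailing newlines)."""
    event_type, data_lines = "message", []  # type: str, List[str]
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event, if any data was collected.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used for keepalives
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the spec strips one leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
            # id: and retry: fields are ignored in this sketch


stream = [
    ": ping",
    "event: route-control",
    'data: {"routeId": "processOrder", "action": "stop"}',  # hypothetical payload
    "",
]
events = [(etype, json.loads(data)) for etype, data in parse_sse(stream)]
```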

Token Refresh

POST /api/v1/agents/{id}/refresh — public endpoint, validates refresh token.

{ "refreshToken": "<refresh-jwt>" }

Returns new accessToken + refreshToken. Preserves roles, application, and environment from the original token.
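An agent can decide when to call the refresh endpoint by reading its access token's expiry. The sketch below assumes the tokens carry a standard exp claim, which is not among the claims listed in this reference. It also uses an arbitrary 60-second safety margin. It decodes the payload without verifying the signature, which is acceptable only because the agent is inspecting its own token rather than authenticating anyone.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (client-side inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def needs_refresh(access_token: str, margin_s: int = 60, now: Optional[float] = None) -> bool:
    """True once the token is within margin_s seconds of its (assumed) exp claim."""
    now = time.time() if now is None else now
    return jwt_claims(access_token)["exp"] - now <= margin_s


def _segment(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned example token with exp = 1000 (epoch seconds), for illustration only.
example_token = _segment({"alg": "HS256"}) + "." + _segment({"sub": "agent-abc-123", "exp": 1000}) + ".sig"
```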


Data Ingestion

All ingestion endpoints require JWT with AGENT role.

| Endpoint | Data | Notes |
|---|---|---|
| POST /api/v1/data/executions | Execution chunks (route + processor traces) | Buffered, flushed periodically |
| POST /api/v1/data/diagrams | Route graph definitions | Single or array |
| POST /api/v1/data/events | Agent lifecycle events | Triggers registry state transitions |
| POST /api/v1/data/logs | Log entries (JSON array, source: app/agent) | Buffered, 503 if buffer full |
| POST /api/v1/data/metrics | Metrics snapshots | Buffered, 503 if buffer full |
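The log and metrics endpoints answer 503 when the server-side buffer is full, so a client should back off and retry rather than drop data immediately. This sketch assumes a caller-supplied send() that performs the POST and returns the HTTP status. The attempt count, delays, and full-jitter strategy are choices of this example, not documented server behavior.

```python
import random
import time
from typing import Callable


def post_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() and retry with jittered exponential backoff while it returns 503."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Full jitter: sleep a random time up to base * 2^(attempt-1).
        sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
        status = send()
    return status  # the last status seen, at most max_attempts sends
```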

Command System

Commands are delivered to agents via SSE. Three dispatch modes:

| Mode | Endpoint | Behavior |
|---|---|---|
| Single agent | POST /api/v1/agents/{id}/commands | Async (202), DELIVERED or PENDING |
| Group (application) | POST /api/v1/agents/groups/{group}/commands | Sync wait (10s), returns per-agent results |
| Broadcast (all LIVE) | POST /api/v1/agents/commands | Fire-and-forget (202) |

Command types: config-update, deep-trace, replay, set-traced-processors, test-expression, route-control

Replay has a dedicated sync endpoint: POST /api/v1/agents/{id}/replay (30s timeout, returns result or 504).

Acknowledgment: POST /api/v1/agents/{id}/commands/{commandId}/ack — agent confirms receipt with status/message/data.
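The group mode's "sync wait (10s), returns per-agent results" behavior can be pictured as waiting on one future per targeted agent with a shared deadline. This sketch assumes each agent's acknowledgment completes a Future. The function name and the ACK/TIMEOUT values are hypothetical, not documented server statuses.

```python
from concurrent.futures import Future, wait
from typing import Dict


def collect_group_results(pending: Dict[str, Future], timeout_s: float = 10.0) -> Dict[str, str]:
    """Wait up to timeout_s for all agents, then report each agent's outcome."""
    done, _ = wait(pending.values(), timeout=timeout_s)
    results = {}
    for agent_id, fut in pending.items():
        # Agents that have not answered by the deadline are reported as TIMEOUT.
        results[agent_id] = fut.result() if fut in done else "TIMEOUT"
    return results


agent1, agent2 = Future(), Future()
agent1.set_result("ACK")  # agent-1 acknowledged; agent-2 never answers
example = collect_group_results({"agent-1": agent1, "agent-2": agent2}, timeout_s=0.1)
```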


Query & Analytics API

All query endpoints require JWT with VIEWER role or higher.

| Endpoint | Description |
|---|---|
| GET /api/v1/search/executions | Search by status, time, text, route, app, environment |
| POST /api/v1/search/executions | Advanced search with full filter object |
| GET /api/v1/executions/{id} | Execution detail with processor tree |
| GET /api/v1/executions/{id}/processors/by-id/{pid}/snapshot | Exchange data at processor |

Statistics & Analytics

| Endpoint | Description |
|---|---|
| GET /api/v1/search/stats | Aggregated stats (P99, error rate, SLA compliance) |
| GET /api/v1/search/stats/timeseries | Bucketed time series |
| GET /api/v1/search/stats/timeseries/by-app | Time series grouped by application |
| GET /api/v1/search/stats/timeseries/by-route | Time series grouped by route |
| GET /api/v1/search/stats/punchcard | Transaction heatmap (weekday × hour) |
| GET /api/v1/search/errors/top | Top N errors with velocity trends |
| GET /api/v1/search/attributes/keys | Distinct attribute key names |
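The punchcard buckets transactions by weekday and hour of day. Below is a client-side equivalent over raw timestamps. It assumes UTC bucketing and Monday = 0, which is Python's datetime.weekday() convention. The server's own weekday numbering and time zone handling are not documented here.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def punchcard(timestamps: Iterable[datetime]) -> Counter:
    """Count executions per (weekday, hour) bucket in UTC; weekday 0 = Monday."""
    buckets: Counter = Counter()
    for ts in timestamps:
        ts = ts.astimezone(timezone.utc)  # expects timezone-aware datetimes
        buckets[(ts.weekday(), ts.hour)] += 1
    return buckets
```

For example, two executions at 09:15 and 09:45 UTC on Monday 2026-04-20 both land in bucket (0, 9).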

Route Catalog & Metrics

| Endpoint | Description |
|---|---|
| GET /api/v1/routes/catalog | Applications with routes, agents, health |
| GET /api/v1/routes/metrics | Per-route performance (TPS, P99, error rate) |
| GET /api/v1/routes/metrics/processors | Per-processor metrics for a route |

Logs

| Endpoint | Description |
|---|---|
| GET /api/v1/logs | Cursor-based log search with level aggregation. Filters: source (app/agent), application, agentId, exchangeId, level, logger, q (text), environment, time range |
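With cursor-based search, a client keeps requesting pages until no cursor comes back. This sketch shows that loop. Here fetch_page is a hypothetical wrapper around GET /api/v1/logs. The actual names of the cursor request parameter and response field are not documented in this reference.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (entries, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[dict], Optional[str]]


def iterate_logs(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every log entry, following cursors until the server stops returning one."""
    cursor: Optional[str] = None
    while True:
        entries, cursor = fetch_page(cursor)
        yield from entries
        if cursor is None:
            return
```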

Diagrams

| Endpoint | Description |
|---|---|
| GET /api/v1/diagrams | Find diagram by application + routeId |
| GET /api/v1/diagrams/{hash}/render | SVG or JSON layout |

Agent Monitoring

| Endpoint | Description |
|---|---|
| GET /api/v1/agents | List agents (filter by status, app, environment) |
| GET /api/v1/agents/events-log | Agent lifecycle event history |
| GET /api/v1/agents/{id}/metrics | Agent-level metrics time series |

Server Self-Metrics

The server snapshots its own Micrometer registry into ClickHouse every 60 s (table server_metrics) — JVM, HTTP, DB pools, agent/ingestion business metrics, and alerting metrics. Use this instead of running an external Prometheus when building a server-health dashboard. The live scrape endpoint /api/v1/prometheus remains available for traditional scraping.

See docs/server-self-metrics.md for the full metric catalog, suggested panels, and example queries.


Application Configuration

| Endpoint | Role | Description |
|---|---|---|
| GET /api/v1/config | VIEWER | List all app configs |
| GET /api/v1/config/{app} | VIEWER | Get config (returns defaults if none stored) |
| PUT /api/v1/config/{app} | OPERATOR | Save config + push to all LIVE agents |
| GET /api/v1/config/{app}/processor-routes | VIEWER | Processor-to-route mapping |
| POST /api/v1/config/{app}/test-expression | VIEWER | Test Camel expression via live agent |

Config fields: metricsEnabled, samplingRate, tracedProcessors, logLevels, engineLevel, payloadCaptureMode, version.


Security

Authentication

| Method | Endpoint | Purpose |
|---|---|---|
| Bootstrap token | POST /api/v1/agents/register | One-time agent registration |
| Local credentials | POST /api/v1/auth/login | UI login (username/password) |
| OIDC code exchange | POST /api/v1/auth/oidc/callback | External identity provider |
| OIDC access token | Bearer token in Authorization header | SaaS M2M / external OIDC |
| Token refresh | POST /api/v1/auth/refresh | UI token refresh |
| Token refresh | POST /api/v1/agents/{id}/refresh | Agent token refresh |

JWT Structure

  • Algorithm: HMAC-SHA256
  • Access token: 1 hour (configurable)
  • Refresh token: 7 days (configurable)
  • Claims: sub (agent ID or user:<username>), group (application), env (environment), roles (array), type (access/refresh)
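The token layout can be illustrated with a standard-library HS256 implementation, which computes HMAC-SHA256 over base64url(header).base64url(payload). This is only a sketch. The secret is a placeholder. The exp claim is an assumption, since this reference lists only sub, group, env, roles, and type. The verifier checks the signature but not expiry or token type. Production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Encode and sign a JWT with HMAC-SHA256 (the standard HS256 construction)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the claims if the signature matches, else raise ValueError."""
    header, payload, signature = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), signature):  # constant-time compare
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))


now = int(time.time())
access = sign_hs256({
    "sub": "agent-abc-123",
    "group": "order-service",     # application
    "env": "production",
    "roles": ["AGENT"],
    "type": "access",
    "exp": now + 3600,            # 1 hour; exp itself is an assumption of this sketch
}, secret=b"example-secret")      # placeholder secret
```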

RBAC Roles

| Role | Permissions |
|---|---|
| AGENT | Data ingestion, heartbeat, SSE, command ack |
| VIEWER | Read-only: executions, search, diagrams, metrics, logs, config |
| OPERATOR | VIEWER + send commands, modify config, replay |
| ADMIN | OPERATOR + user/group/role management, OIDC config, database admin |
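Read as a chain, the Permissions column runs VIEWER, then OPERATOR, then ADMIN, with AGENT standing apart, which gives a simple rank check. This is a hypothetical sketch of that interpretation only. The server's backend ACLs remain the authoritative rules.

```python
from typing import Iterable

# VIEWER < OPERATOR < ADMIN; AGENT sits outside this chain.
_UI_RANK = {"VIEWER": 1, "OPERATOR": 2, "ADMIN": 3}


def has_role(user_roles: Iterable[str], required: str) -> bool:
    """True if any held role grants `required` under the inheritance in the table."""
    roles = set(user_roles)
    if required == "AGENT":
        return "AGENT" in roles
    needed = _UI_RANK[required]
    return any(_UI_RANK.get(role, 0) >= needed for role in roles)
```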

UI Role Gating

The UI enforces role-based visibility (backend ACLs remain the authoritative check):

| UI element | VIEWER | OPERATOR | ADMIN |
|---|---|---|---|
| Exchanges, Dashboard, Runtime, Logs | Yes | Yes | Yes |
| Config tab | Read-only | Edit | Edit |
| Route control bar | Hidden | Yes | Yes |
| Diagram node toolbar | Hidden | Yes | Yes |
| Admin sidebar section | Hidden | Hidden | Yes |
| Admin pages (/admin/*) | Redirect to / | Redirect to / | Yes |

Config tab is a main tab alongside Exchanges/Dashboard/Runtime/Logs. Navigation: /config shows all-app config table; /config/:appId filters to that app with detail panel open. Sidebar clicks while on Config stay on the config tab — route clicks resolve to the parent app's config (config is per-app).

Ed25519 Config Signing

Server derives an Ed25519 keypair deterministically from the JWT secret. Public key is shared with agents at registration. Config-update payloads are signed so agents can verify authenticity.

OIDC Integration

Configured via the admin API (/api/v1/admin/oidc) or the admin UI. Supports any OpenID Connect provider. Features:

  • Configurable user ID claim (userIdClaim, default sub; e.g., email, preferred_username)
  • Role claim extraction from the access_token, then the id_token (supports nested paths like realm_access.roles and space-delimited scope strings)
  • Auto-signup: new users are auto-provisioned on their first OIDC login
  • Configurable display name claim
  • Constant-time token rotation via dual bootstrap tokens
  • RFC 8707 resource indicators (audience config)

The backend is a confidential client (client_secret authentication, no PKCE) and supports ES384 (Logto default), ES256, and RS256. Directly assigned system roles are overwritten on every OIDC login, falling back to defaultRoles when OIDC returns none. The server uses getDirectRolesForUser, so group-inherited roles are never touched. Role normalization is done by SystemRole.normalizeScope() (case-insensitive, strips the server: prefix). Shared OIDC infrastructure (discovery, JWK source, algorithm set) is centralized in OidcProviderHelper.
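Role claim extraction with nested paths and space-delimited scope strings can be sketched as follows. The dotted-path syntax mirrors the realm_access.roles example. The function names are hypothetical. Claim names that themselves contain dots are not handled in this sketch.

```python
from typing import Any, List, Optional


def extract_roles(claims: dict, path: str) -> List[str]:
    """Resolve a dotted claim path (e.g. "realm_access.roles") to a list of strings.

    A list value is returned as-is; a string value is split on whitespace,
    which covers OAuth-style space-delimited "scope" claims.
    """
    node: Any = claims
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    if isinstance(node, list):
        return [str(item) for item in node]
    if isinstance(node, str):
        return node.split()
    return []


def roles_from_tokens(access_claims: dict, id_claims: Optional[dict], path: str) -> List[str]:
    """Try the access token first and fall back to the id token."""
    return extract_roles(access_claims, path) or extract_roles(id_claims or {}, path)
```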

SSO Auto-Redirect

When OIDC is configured and enabled, the login page automatically redirects to the OIDC provider with prompt=none for silent SSO. If the user has an active provider session, they are signed in without seeing a login form. If the provider returns consent_required (first login, scopes not yet granted), the flow retries without prompt=none so the user can grant consent once. If it returns login_required (no provider session), the page falls back to the login form. Bypass auto-redirect with /login?local. Logout always redirects to /login?local, either via the OIDC end_session_endpoint (with post_logout_redirect_uri) or as a direct fallback, which prevents SSO re-login loops.
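The auto-redirect rules form a small decision function. In this sketch the step names are hypothetical. The oidc_error argument is the OAuth error parameter returned after a prompt=none attempt, or None when no attempt has been made yet. A successful attempt goes to the callback and never reaches this function. Treating errors other than consent_required the same as login_required is an assumption.

```python
from typing import Optional


def next_login_step(oidc_enabled: bool, query: dict, oidc_error: Optional[str]) -> str:
    """Decide what the login page does next (hypothetical step names)."""
    if not oidc_enabled or "local" in query:
        return "show_login_form"        # /login?local bypasses auto-redirect
    if oidc_error is None:
        return "redirect_prompt_none"   # silent SSO attempt
    if oidc_error == "consent_required":
        return "redirect_interactive"   # retry without prompt=none to grant consent
    return "show_login_form"            # login_required (and, assumed, anything else)
```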

OIDC Resource Server

When CAMELEER_SERVER_SECURITY_OIDCISSUERURI is configured, the server accepts external access tokens (e.g., Logto M2M tokens) in addition to internal HMAC JWTs. Dual-path validation: tries internal HMAC first, falls back to OIDC JWKS validation. Supports ES384, ES256, and RS256 algorithms. Handles RFC 9068 at+jwt token type.
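The dual-path order can be expressed with two pluggable validators. In this minimal sketch, each validator is a hypothetical callable that returns the claims or None. The OIDC validator is absent when no issuer URI is configured.

```python
from typing import Callable, Optional

# A validator returns the token's claims, or None if it rejects the token.
Validator = Callable[[str], Optional[dict]]


def authenticate(token: str, hmac_validator: Validator,
                 oidc_validator: Optional[Validator]) -> Optional[dict]:
    """Try the internal HMAC JWT first; fall back to OIDC JWKS validation if configured."""
    claims = hmac_validator(token)
    if claims is not None:
        return claims
    if oidc_validator is None:  # no OIDC issuer URI configured
        return None
    return oidc_validator(token)
```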

Role mapping is case-insensitive and accepts both bare and server:-prefixed names:

| Scope/claim value | Maps to |
|---|---|
| admin, server:admin, Server:Admin | ADMIN |
| operator, server:operator | OPERATOR |
| viewer, server:viewer | VIEWER |

This applies to both M2M tokens (scope claim) and OIDC user login (configurable rolesClaim from id_token). The server: prefix allows dedicated API resource scopes without colliding with other platform scopes.
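The role mapping amounts to lowercasing, stripping an optional server: prefix, and matching a known role. This sketch mirrors SystemRole.normalizeScope() as described here, not its actual source. Returning None for unrecognized values such as openid is an assumption.

```python
from typing import Optional

_KNOWN_ROLES = {"ADMIN", "OPERATOR", "VIEWER"}
_PREFIX = "server:"


def normalize_scope(value: str) -> Optional[str]:
    """Case-insensitive mapping of a scope/claim value to a system role, or None."""
    v = value.strip().lower()
    if v.startswith(_PREFIX):
        v = v[len(_PREFIX):]
    role = v.upper()
    return role if role in _KNOWN_ROLES else None
```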

| Variable | Purpose |
|---|---|
| CAMELEER_SERVER_SECURITY_OIDCISSUERURI | OIDC issuer URI for token validation (e.g., https://auth.example.com/oidc) |
| CAMELEER_SERVER_SECURITY_OIDCJWKSETURI | Direct JWKS URL (e.g., http://cameleer-logto:3001/oidc/jwks); use when the public issuer isn't reachable from inside containers |
| CAMELEER_SERVER_SECURITY_OIDCAUDIENCE | Expected audience (API resource indicator) |
| CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY | Skip TLS certificate verification for OIDC calls (default false); use when the provider has a self-signed CA |

Logto is proxy-aware (TRUST_PROXY_HEADER=1). The LOGTO_ENDPOINT env var sets the public-facing URL used in OIDC discovery, issuer URI, and redirect URLs. Logto requires its own subdomain (not a path prefix).


Admin API

All admin endpoints require ADMIN role. Prefix: /api/v1/admin/.

User Management

| Endpoint | Method | Description |
|---|---|---|
| /users | GET | List all users |
| /users | POST | Create local user |
| /users/{id} | GET/PUT/DELETE | Get/update/delete user |
| /users/{id}/password | POST | Reset password |
| /users/{id}/roles/{roleId} | POST/DELETE | Assign/remove role |
| /users/{id}/groups/{groupId} | POST/DELETE | Add/remove from group |

Group & Role Management

| Endpoint | Method | Description |
|---|---|---|
| /groups | GET/POST | List/create groups |
| /groups/{id} | GET/PUT/DELETE | Manage group (cycle detection on parent change) |
| /groups/{id}/roles/{roleId} | POST/DELETE | Assign/remove role from group |
| /roles | GET/POST | List/create roles |
| /roles/{id} | GET/PUT/DELETE | Manage role (system roles protected) |
| /rbac/stats | GET | RBAC statistics |
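Cycle detection on a parent change can be done by walking up the ancestor chain from the proposed parent. If that walk reaches the group being edited, the change would make the group its own ancestor. This sketch works over a hypothetical in-memory map from group ID to parent ID.

```python
from typing import Dict, Optional


def creates_cycle(parents: Dict[str, Optional[str]], group_id: str,
                  new_parent: Optional[str]) -> bool:
    """True if setting group_id's parent to new_parent would create a cycle.

    parents maps each group ID to its current parent ID (None for a root).
    """
    node, seen = new_parent, set()
    while node is not None:
        if node == group_id:
            return True   # group_id would become its own ancestor
        if node in seen:
            return True   # existing data is already cyclic; reject defensively
        seen.add(node)
        node = parents.get(node)
    return False
```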

Infrastructure

| Endpoint | Description |
|---|---|
| /database/status | PostgreSQL version, schema, health |
| /database/pool | HikariCP connection pool stats |
| /database/tables | Table sizes and row counts |
| /database/queries | Active queries (with kill) |
| /clickhouse/status | ClickHouse version, uptime |
| /clickhouse/tables | Table info, row counts, sizes |
| /clickhouse/performance | Disk, memory, compression, partitions |
| /clickhouse/queries | Active ClickHouse queries |
| /clickhouse/pipeline | Ingestion pipeline stats |

Settings & Configuration

| Endpoint | Description |
|---|---|
| /app-settings | Per-application settings (CRUD) |
| /thresholds | Monitoring threshold configuration |
| /oidc | OIDC provider configuration (CRUD + test) |
| /audit | Paginated audit log search |
| /usage | UI usage analytics (ClickHouse) |

Storage

PostgreSQL

Used for RBAC, configuration, and audit. Schema-per-tenant isolation via ?currentSchema=tenant_{id}.

Tables: users, groups, roles, user_roles, user_groups, group_roles, server_config, application_config, audit_log.

Flyway migrations (V1-V11) manage schema evolution.

ClickHouse

Used for all observability data. Schema managed by ClickHouseSchemaInitializer (idempotent on startup).

| Table | Engine | Purpose | TTL |
|---|---|---|---|
| executions | ReplacingMergeTree | Route execution records | 365d |
| processor_executions | MergeTree | Per-processor trace data | 365d |
| agent_events | MergeTree | Agent lifecycle audit trail | 365d |
| route_diagrams | ReplacingMergeTree | Route graph definitions | - |
| logs | MergeTree | Application + agent logs (source column: app/agent, mdc Map) | 365d |
| usage_events | MergeTree | UI action tracking | 90d |
| stats_1m_all | AggregatingMergeTree | Global 1-minute rollups | - |
| stats_1m_app | AggregatingMergeTree | Per-application rollups | - |
| stats_1m_route | AggregatingMergeTree | Per-route rollups | - |
| stats_1m_processor | AggregatingMergeTree | Per-processor-type rollups | - |
| stats_1m_processor_detail | AggregatingMergeTree | Per-processor-instance rollups | - |

All tables include tenant_id and environment columns. Partitioned by (tenant_id, toYYYYMM(timestamp)).

Stats tables are fed by Materialized Views from base tables. Query with -Merge() combinators (e.g., countMerge(total_count)).


Deployment

Container Image

Multi-stage Docker build: Maven 3.9 + JDK 17 (build) → JRE 17 (runtime). Port 8081. No default credentials baked in — all database config comes from env vars at runtime.

Registry: gitea.siegeln.net/cameleer/cameleer-server

Infrastructure Requirements

Component Version Purpose
PostgreSQL 16+ RBAC, config, audit
ClickHouse 24.12+ All observability data

Required Environment Variables

| Variable | Required | Default | Purpose |
|---|---|---|---|
| CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN | Yes | - | Bootstrap token for agent registration |
| CAMELEER_SERVER_SECURITY_JWTSECRET | Recommended | Random (ephemeral) | JWT signing secret |
| CAMELEER_SERVER_TENANT_ID | No | default | Tenant identifier |
| CAMELEER_SERVER_SECURITY_UIUSER | No | admin | Default admin username |
| CAMELEER_SERVER_SECURITY_UIPASSWORD | No | admin | Default admin password |
| CAMELEER_SERVER_SECURITY_UIORIGIN | No | http://localhost:5173 | CORS allowed origin (single, legacy) |
| CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS | No | (empty) | Comma-separated CORS origins; overrides UIORIGIN when set |
| CAMELEER_SERVER_CLICKHOUSE_URL | No | jdbc:clickhouse://localhost:8123/cameleer | ClickHouse JDBC URL |
| CAMELEER_SERVER_CLICKHOUSE_USERNAME | No | default | ClickHouse user |
| CAMELEER_SERVER_CLICKHOUSE_PASSWORD | No | (empty) | ClickHouse password |
| SPRING_DATASOURCE_URL | No | jdbc:postgresql://localhost:5432/cameleer | PostgreSQL JDBC URL |
| SPRING_DATASOURCE_USERNAME | No | cameleer | PostgreSQL user |
| SPRING_DATASOURCE_PASSWORD | No | cameleer_dev | PostgreSQL password |
| CAMELEER_SERVER_INGESTION_BODYSIZELIMIT | No | 16384 | Max body size per execution (bytes) |
| CAMELEER_SERVER_SECURITY_OIDCISSUERURI | No | (empty) | OIDC issuer URI; enables resource server mode for M2M tokens |
| CAMELEER_SERVER_SECURITY_OIDCJWKSETURI | No | (empty) | Direct JWKS URL; bypasses OIDC discovery for container networking |
| CAMELEER_SERVER_SECURITY_OIDCAUDIENCE | No | (empty) | Expected JWT audience (API resource indicator) |
| CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY | No | false | Skip TLS cert verification for OIDC calls (self-signed CAs) |
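The precedence between the two CORS variables fits in a few lines. This sketch assumes that a set-but-blank CORSALLOWEDORIGINS counts as unset and that whitespace around each origin is ignored. Neither detail is documented.

```python
import os
from typing import List, Mapping


def allowed_origins(env: Mapping[str, str] = os.environ) -> List[str]:
    """CORSALLOWEDORIGINS (comma-separated) wins when non-empty; else UIORIGIN applies."""
    raw = env.get("CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS", "")
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    if origins:
        return origins
    return [env.get("CAMELEER_SERVER_SECURITY_UIORIGIN", "http://localhost:5173")]
```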

Health Probes

  • Endpoint: GET /api/v1/health (public, no auth)
  • Liveness: 30s initial delay, 10s period
  • Readiness: 10s initial delay, 5s period

Ingestion Tuning

| Variable | Default | Purpose |
|---|---|---|
| CAMELEER_SERVER_INGESTION_BUFFERCAPACITY | 50000 | Ring buffer size |
| CAMELEER_SERVER_INGESTION_BATCHSIZE | 5000 | Flush batch size |
| CAMELEER_SERVER_INGESTION_FLUSHINTERVALMS | 5000 | Periodic flush interval (ms) |
| CAMELEER_SERVER_INGESTION_BODYSIZELIMIT | 16384 | Max body size per execution (bytes) |
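The interaction of the three buffer knobs can be sketched with a bounded in-memory buffer. Writes are refused once capacity is reached, which is the condition behind the 503 responses. Full batches are flushed eagerly, and everything pending is flushed once the interval has elapsed. The class is hypothetical and single-threaded. This reference does not describe the server's actual ring buffer.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class IngestionBuffer:
    """Bounded buffer: rejects writes when full, flushes in fixed-size batches."""

    def __init__(self, sink: Callable[[List[dict]], None], capacity: int = 50_000,
                 batch_size: int = 5_000, flush_interval_ms: int = 5_000):
        self._items: Deque[dict] = deque()
        self._sink = sink
        self._capacity = capacity
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_ms / 1000
        self._last_flush = time.monotonic()

    def offer(self, item: dict) -> bool:
        """Return False when full; an HTTP layer would translate that into a 503."""
        if len(self._items) >= self._capacity:
            return False
        self._items.append(item)
        return True

    def maybe_flush(self) -> int:
        """Flush full batches now, or everything once the interval elapsed; return count."""
        due = time.monotonic() - self._last_flush >= self._flush_interval_s
        written = 0
        while self._items and (due or len(self._items) >= self._batch_size):
            n = min(self._batch_size, len(self._items))
            self._sink([self._items.popleft() for _ in range(n)])
            written += n
        if written:
            self._last_flush = time.monotonic()
        return written


batches: List[List[dict]] = []
buf = IngestionBuffer(batches.append, capacity=3, batch_size=2, flush_interval_ms=60_000)
accepted = [buf.offer({"n": i}) for i in range(4)]  # the fourth offer is rejected
flushed = buf.maybe_flush()                          # one full batch of 2; 1 item waits
```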

Agent Registry Tuning

| Variable | Default (ms) | Purpose |
|---|---|---|
| CAMELEER_SERVER_AGENTREGISTRY_STALETHRESHOLDMS | 90000 | Heartbeat miss → STALE |
| CAMELEER_SERVER_AGENTREGISTRY_DEADTHRESHOLDMS | 300000 | STALE duration → DEAD |
| CAMELEER_SERVER_AGENTREGISTRY_PINGINTERVALMS | 15000 | SSE keepalive interval |
| CAMELEER_SERVER_AGENTREGISTRY_COMMANDEXPIRYMS | 60000 | Pending command TTL |
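A pending command only lives for the configured TTL. This sketch prunes a per-agent pending queue before delivery. It assumes commands are held while an agent has no open SSE stream and are delivered when the agent reconnects. The class, function, and field names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List

COMMAND_EXPIRY_MS = 60_000  # CAMELEER_SERVER_AGENTREGISTRY_COMMANDEXPIRYMS default


@dataclass
class PendingCommand:
    command_id: str
    command_type: str
    created_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def drain_deliverable(queue: List[PendingCommand], now_ms: int,
                      ttl_ms: int = COMMAND_EXPIRY_MS) -> List[PendingCommand]:
    """Return commands still within their TTL and empty the queue (expired ones are dropped)."""
    fresh = [c for c in queue if now_ms - c.created_ms < ttl_ms]
    queue.clear()
    return fresh


q = [PendingCommand("c1", "config-update", created_ms=0),
     PendingCommand("c2", "replay", created_ms=50_000)]
ready = drain_deliverable(q, now_ms=70_000)  # c1 is 70 s old and has expired
```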

Public Endpoints (No Auth)

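These endpoints do not require a JWT access token. Agent registration uses the bootstrap token instead, and the refresh endpoints validate a refresh token: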

  • GET /api/v1/health
  • POST /api/v1/agents/register (requires bootstrap token)
  • POST /api/v1/agents/*/refresh
  • POST /api/v1/auth/login
  • POST /api/v1/auth/refresh
  • GET /api/v1/auth/oidc/config
  • POST /api/v1/auth/oidc/callback
  • GET /api/v1/api-docs/** (OpenAPI spec)
  • GET /swagger-ui.html (Swagger UI)
  • Static resources: /, /index.html, /config.js, /favicon.svg, /assets/**
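A request filter can decide whether a path needs a JWT by matching it against the list above. This sketch assumes Spring-style path patterns, where * matches one segment and a trailing ** matches the remainder. Static resources are omitted for brevity. The register and refresh endpoints still validate their own bootstrap or refresh tokens.

```python
PUBLIC_PATTERNS = [
    ("GET", "/api/v1/health"),
    ("POST", "/api/v1/agents/register"),
    ("POST", "/api/v1/agents/*/refresh"),
    ("POST", "/api/v1/auth/login"),
    ("POST", "/api/v1/auth/refresh"),
    ("GET", "/api/v1/auth/oidc/config"),
    ("POST", "/api/v1/auth/oidc/callback"),
    ("GET", "/api/v1/api-docs/**"),
    ("GET", "/swagger-ui.html"),
]


def _matches(pattern: str, path: str) -> bool:
    """Segment-wise match: '*' matches one segment, a trailing '**' matches the rest."""
    p_parts, s_parts = pattern.strip("/").split("/"), path.strip("/").split("/")
    for i, p in enumerate(p_parts):
        if p == "**":
            return True
        if i >= len(s_parts) or (p != "*" and p != s_parts[i]):
            return False
    return len(p_parts) == len(s_parts)


def is_public(method: str, path: str) -> bool:
    """True if the request can skip JWT authentication."""
    return any(m == method and _matches(p, path) for m, p in PUBLIC_PATTERNS)
```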

All other endpoints require a valid JWT with appropriate role.