# Cameleer3 Server — Capabilities Reference

> Standalone reference for systems integrating with or managing Cameleer3 Server instances.
> Generated 2026-04-04. Source of truth: the codebase and OpenAPI spec at `/api/v1/api-docs`.

## What It Does

Cameleer3 Server is an observability platform for Apache Camel applications. It receives execution traces, metrics, logs, and route diagrams from instrumented Camel agents, stores them in ClickHouse, and serves a web UI for searching, visualizing, and controlling routes.

**Core capabilities:**
- Real-time execution tracing with processor-level detail
- Full-text search across executions, logs, and attributes
- Route topology diagrams with live execution overlays
- Application configuration push via SSE
- Route control (start/stop/suspend) and exchange replay
- Agent lifecycle management with auto-heal on server restart
- RBAC with local users, groups, roles, and OIDC federation
- Multi-tenant isolation (one tenant per server instance)

---

## Multi-Tenancy Model

Each server instance serves exactly one tenant. Multiple tenants can share the same PostgreSQL and ClickHouse infrastructure but are isolated at the data layer.

| Concern | Isolation |
|---------|-----------|
| PostgreSQL | Schema-per-tenant (`?currentSchema=tenant_{id}`) |
| ClickHouse | Shared DB, `tenant_id` column on all tables, partitioned by `(tenant_id, toYYYYMM(timestamp))` |
| Configuration | `CAMELEER_TENANT_ID` env var (default: `"default"`) |
| Agents | Each agent belongs to one tenant, one environment |

**Environments** (dev/staging/prod) are first-class within a tenant. Agents send `environmentId` at registration and in every heartbeat. The UI filters by environment. JWTs carry an `env` claim so that an agent's environment survives server restarts.

---

## Agent Protocol

### Lifecycle

```
Register (bootstrap token) → Receive JWT + SSE URL
    ↓
Connect SSE ← Receive commands (config-update, deep-trace, replay, route-control)
    ↓
Heartbeat (every 30s) → Send capabilities, environmentId, routeStates
    ↓
Deregister (graceful shutdown)
```

### State Machine

```
LIVE ──(no heartbeat for 90s)──→ STALE ──(300s more)──→ DEAD
  ↑                                 │
  └────(heartbeat arrives)──────────┘
```

Thresholds are configurable via `agent-registry.*` properties.
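
The transitions in the diagram can be sketched as a pure function of how long an agent has been silent. This is a minimal illustration, not the server's implementation: the name `agent_state` and its parameters are hypothetical, the boundary values are assumed to be inclusive, and the sketch assumes DEAD is reached 300s after the agent became STALE (390s of silence at the default thresholds). It does not model what happens when a DEAD agent sends a heartbeat.

```python
STALE_THRESHOLD_MS = 90_000   # default: no heartbeat for 90s -> STALE
DEAD_THRESHOLD_MS = 300_000   # default: a further 300s in STALE -> DEAD


def agent_state(ms_since_heartbeat: int,
                stale_ms: int = STALE_THRESHOLD_MS,
                dead_ms: int = DEAD_THRESHOLD_MS) -> str:
    """Return LIVE, STALE, or DEAD for a given silence duration (hypothetical helper)."""
    if ms_since_heartbeat < stale_ms:
        return "LIVE"
    if ms_since_heartbeat < stale_ms + dead_ms:
        return "STALE"
    return "DEAD"


# A heartbeat resets the silence duration to zero, which maps back to LIVE.
print(agent_state(0), agent_state(120_000), agent_state(400_000))
```
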

### Registration

**`POST /api/v1/agents/register`** — requires bootstrap token in `Authorization: Bearer` header.

Request:
```json
{
  "instanceId": "agent-abc-123",
  "displayName": "Order Service #1",
  "applicationId": "order-service",
  "environmentId": "production",
  "version": "3.2.1",
  "routeIds": ["processOrder", "handlePayment"],
  "capabilities": { "replay": true, "routeControl": true }
}
```

Response:
```json
{
  "instanceId": "agent-abc-123",
  "eventStreamUrl": "/api/v1/agents/agent-abc-123/events",
  "heartbeatIntervalMs": 30000,
  "signingPublicKeyBase64": "<ed25519-public-key>",
  "accessToken": "<jwt>",
  "refreshToken": "<jwt>"
}
```
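
As a rough client-side illustration, the registration call could be assembled with Python's standard library as below. The helper name `build_register_request`, the base URL, the token value, and the `Content-Type` header are assumptions made for the example; only the path, the `Authorization: Bearer` scheme, and the body fields come from this reference. The request is built but not sent.

```python
import json
import urllib.request


def build_register_request(base_url: str, bootstrap_token: str,
                           payload: dict) -> urllib.request.Request:
    """Build (but do not send) an agent registration request (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/agents/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bootstrap_token}",
            "Content-Type": "application/json",  # assumed; not stated in this reference
        },
        method="POST",
    )


request = build_register_request(
    "http://localhost:8081",       # assumed host; 8081 is the port named under Deployment
    "example-bootstrap-token",     # placeholder for the configured bootstrap token
    {
        "instanceId": "agent-abc-123",
        "displayName": "Order Service #1",
        "applicationId": "order-service",
        "environmentId": "production",
        "version": "3.2.1",
        "routeIds": ["processOrder", "handlePayment"],
        "capabilities": {"replay": True, "routeControl": True},
    },
)
# Sending would be: urllib.request.urlopen(request), then parsing the JSON response.
```
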

### Heartbeat

**`POST /api/v1/agents/{id}/heartbeat`** — JWT auth.

```json
{
  "capabilities": { "replay": true, "routeControl": true },
  "environmentId": "production",
  "routeStates": { "processOrder": "Started", "handlePayment": "Suspended" }
}
```

The heartbeat auto-heals the registry after a server restart: if the agent is not in the registry, the server re-registers it from the JWT claims and the heartbeat body. Environment priority: heartbeat `environmentId` > JWT `env` claim > `"default"`.
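
The environment priority can be expressed in a few lines. This is only a sketch: `resolve_environment` is a hypothetical name, and treating an empty string the same as a missing value is an assumption.

```python
from typing import Optional


def resolve_environment(heartbeat_env: Optional[str], jwt_env: Optional[str]) -> str:
    """Apply the priority: heartbeat environmentId > JWT env claim > "default"."""
    # Assumption: empty strings count as "not provided".
    return heartbeat_env or jwt_env or "default"


print(resolve_environment("production", "staging"))
print(resolve_environment(None, "staging"))
print(resolve_environment(None, None))
```
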

### SSE Event Stream

**`GET /api/v1/agents/{id}/events`** — long-lived SSE connection. Keepalive ping every 15s.

Event types pushed to agents: `config-update`, `deep-trace`, `replay`, `set-traced-processors`, `test-expression`, `route-control`.

### Token Refresh

**`POST /api/v1/agents/{id}/refresh`** — public endpoint, validates refresh token.

```json
{ "refreshToken": "<refresh-jwt>" }
```

Returns new `accessToken` + `refreshToken`. Preserves roles, application, and environment from the original token.

---

## Data Ingestion

All ingestion endpoints require JWT with `AGENT` role.

| Endpoint | Data | Notes |
|----------|------|-------|
| `POST /api/v1/data/executions` | Execution chunks (route + processor traces) | Buffered, flushed periodically |
| `POST /api/v1/data/diagrams` | Route graph definitions | Single or array |
| `POST /api/v1/data/events` | Agent lifecycle events | Triggers registry state transitions |
| `POST /api/v1/data/logs` | Application log batches | Buffered, 503 if buffer full |
| `POST /api/v1/data/metrics` | Metrics snapshots | Buffered, 503 if buffer full |

---

## Command System

Commands are delivered to agents via SSE. Three dispatch modes:

| Mode | Endpoint | Behavior |
|------|----------|----------|
| Single agent | `POST /api/v1/agents/{id}/commands` | Async (202), DELIVERED or PENDING |
| Group (application) | `POST /api/v1/agents/groups/{group}/commands` | Sync wait (10s), returns per-agent results |
| Broadcast (all LIVE) | `POST /api/v1/agents/commands` | Fire-and-forget (202) |

**Command types:** `config-update`, `deep-trace`, `replay`, `set-traced-processors`, `test-expression`, `route-control`

**Replay** has a dedicated sync endpoint: `POST /api/v1/agents/{id}/replay` (30s timeout, returns result or 504).

**Acknowledgment:** `POST /api/v1/agents/{id}/commands/{commandId}/ack` — agent confirms receipt with status/message/data.

---

## Query & Analytics API

All query endpoints require JWT with `VIEWER` role or higher.

### Execution Search

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/search/executions` | Search by status, time, text, route, app, environment |
| `POST /api/v1/search/executions` | Advanced search with full filter object |
| `GET /api/v1/executions/{id}` | Execution detail with processor tree |
| `GET /api/v1/executions/{id}/processors/by-id/{pid}/snapshot` | Exchange data at processor |

### Statistics & Analytics

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/search/stats` | Aggregated stats (P99, error rate, SLA compliance) |
| `GET /api/v1/search/stats/timeseries` | Bucketed time-series |
| `GET /api/v1/search/stats/timeseries/by-app` | Time series grouped by application |
| `GET /api/v1/search/stats/timeseries/by-route` | Time series grouped by route |
| `GET /api/v1/search/stats/punchcard` | Transaction heatmap (weekday x hour) |
| `GET /api/v1/search/errors/top` | Top N errors with velocity trends |
| `GET /api/v1/search/attributes/keys` | Distinct attribute key names |

### Route Catalog & Metrics

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/routes/catalog` | Applications with routes, agents, health |
| `GET /api/v1/routes/metrics` | Per-route performance (TPS, P99, error rate) |
| `GET /api/v1/routes/metrics/processors` | Per-processor metrics for a route |

### Logs

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/logs` | Cursor-based log search with level aggregation |

### Diagrams

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/diagrams` | Find diagram by application + routeId |
| `GET /api/v1/diagrams/{hash}/render` | SVG or JSON layout |

### Agent Monitoring

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/agents` | List agents (filter by status, app, environment) |
| `GET /api/v1/agents/events-log` | Agent lifecycle event history |
| `GET /api/v1/agents/{id}/metrics` | Agent-level metrics time series |

---

## Application Configuration

| Endpoint | Role | Description |
|----------|------|-------------|
| `GET /api/v1/config` | VIEWER | List all app configs |
| `GET /api/v1/config/{app}` | VIEWER | Get config (returns defaults if none stored) |
| `PUT /api/v1/config/{app}` | OPERATOR | Save config + push to all LIVE agents |
| `GET /api/v1/config/{app}/processor-routes` | VIEWER | Processor-to-route mapping |
| `POST /api/v1/config/{app}/test-expression` | VIEWER | Test Camel expression via live agent |

Config fields: `metricsEnabled`, `samplingRate`, `tracedProcessors`, `logLevels`, `engineLevel`, `payloadCaptureMode`, `version`.

---

## Security

### Authentication

| Method | Endpoint | Purpose |
|--------|----------|---------|
| Bootstrap token | `POST /agents/register` | One-time agent registration |
| Local credentials | `POST /auth/login` | UI login (username/password) |
| OIDC code exchange | `POST /auth/oidc/callback` | External identity provider |
| OIDC access token | Bearer token in Authorization header | SaaS M2M / external OIDC |
| Token refresh | `POST /auth/refresh` | UI token refresh |
| Token refresh | `POST /agents/{id}/refresh` | Agent token refresh |

### JWT Structure

- Algorithm: HMAC-SHA256
- Access token: 1 hour (configurable)
- Refresh token: 7 days (configurable)
- Claims: `sub` (agent ID or `user:<username>`), `group` (application), `env` (environment), `roles` (array), `type` (access/refresh)
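

To make the token shape concrete, the sketch below signs and verifies an HMAC-SHA256 (HS256) JWT carrying the claims listed above, using only the Python standard library. It illustrates the general HS256 construction rather than the server's code: the function names, the example secret, and the claim values are hypothetical, and expiry handling is left out because this reference does not list the expiry claims.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Return header.payload.signature signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the signature and return the claims; raise ValueError on mismatch."""
    signing_input, _, signature = token.rpartition(".")
    expected = _b64url(
        hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    )
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise ValueError("signature mismatch")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


claims = {
    "sub": "agent-abc-123",
    "group": "order-service",
    "env": "production",
    "roles": ["AGENT"],
    "type": "access",
}
token = sign_hs256(claims, b"example-secret")
print(verify_hs256(token, b"example-secret")["roles"])
```

In real integrations a maintained JWT library is the safer choice, since it also validates the header algorithm and token lifetime.
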

### RBAC Roles

| Role | Permissions |
|------|-------------|
| `AGENT` | Data ingestion, heartbeat, SSE, command ack |
| `VIEWER` | Read-only: executions, search, diagrams, metrics, logs, config |
| `OPERATOR` | VIEWER + send commands, modify config, replay |
| `ADMIN` | OPERATOR + user/group/role management, OIDC config, database admin |

### Ed25519 Config Signing

The server derives an Ed25519 keypair deterministically from the JWT secret. The public key is shared with agents at registration. Config-update payloads are signed so agents can verify authenticity.

### OIDC Integration

Configured via the admin API (`/api/v1/admin/oidc`). Supports any OpenID Connect provider. Features: role claim extraction (supports nested paths like `realm_access.roles`), auto-signup (auto-provisions new users on first OIDC login), configurable display name claim, constant-time token rotation via dual bootstrap tokens. Supports ES384 (Logto default), ES256, and RS256 for id_token validation. System roles are synced on every OIDC login (not just the first), so revoking a scope in the provider takes effect on the next login. Group memberships (manually assigned) are never touched by the sync.

### SSO Auto-Redirect

When OIDC is configured and enabled, the login page automatically redirects to the OIDC provider with `prompt=none` for silent SSO. If the user has an active provider session, they are signed in without seeing a login form. If `consent_required` is returned (first login, scopes not yet granted), the flow retries without `prompt=none` so the user can grant consent once. If `login_required` is returned (no provider session), the flow falls back to the login form. Bypass auto-redirect with `/login?local`.
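
The branching described above can be summarized as a small decision function. This is a hedged sketch of the flow as written, not the UI's actual code: `next_sso_step` and its return labels are hypothetical, and sending unrecognized provider errors to the login form is an assumption.

```python
from typing import Optional


def next_sso_step(provider_error: Optional[str], local_bypass: bool = False) -> str:
    """Decide what the login page does after a silent (prompt=none) attempt."""
    if local_bypass:
        # /login?local skips the auto-redirect entirely.
        return "show-login-form"
    if provider_error is None:
        # Active provider session: the user is signed in silently.
        return "signed-in"
    if provider_error == "consent_required":
        # First login: retry so the user can grant consent once.
        return "retry-without-prompt-none"
    # login_required (no provider session); other errors are assumed to land here too.
    return "show-login-form"


print(next_sso_step(None))
print(next_sso_step("consent_required"))
print(next_sso_step("login_required"))
```
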

### OIDC Resource Server

When `CAMELEER_OIDC_ISSUER_URI` is configured, the server accepts external access tokens (e.g., Logto M2M tokens) in addition to internal HMAC JWTs. Dual-path validation: tries internal HMAC first, falls back to OIDC JWKS validation. Supports ES384, ES256, and RS256 algorithms. Handles RFC 9068 `at+jwt` token type.

Role mapping is case-insensitive and accepts both bare and `server:`-prefixed names:

| Scope/claim value | Maps to |
|-------------------|---------|
| `admin`, `server:admin`, `Server:Admin` | ADMIN |
| `operator`, `server:operator` | OPERATOR |
| `viewer`, `server:viewer` | VIEWER |

This applies to both M2M tokens (`scope` claim) and OIDC user login (configurable `rolesClaim` from id_token). The `server:` prefix allows dedicated API resource scopes without colliding with other platform scopes.
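
The mapping rules in the table can be sketched as follows. The helper names `map_role` and `roles_from_scope` are hypothetical, and silently ignoring unrelated scope values is an assumption; the space-separated `scope` format follows OAuth 2.0 convention.

```python
from typing import List, Optional

ROLE_NAMES = {"admin": "ADMIN", "operator": "OPERATOR", "viewer": "VIEWER"}


def map_role(value: str) -> Optional[str]:
    """Map one scope/claim value to a system role, or None if it is unrelated."""
    name = value.strip().lower()          # case-insensitive match
    if name.startswith("server:"):
        name = name[len("server:"):]      # accept the prefixed form as well
    return ROLE_NAMES.get(name)


def roles_from_scope(scope: str) -> List[str]:
    """Extract system roles from a space-separated scope string, keeping order."""
    roles: List[str] = []
    for value in scope.split():
        role = map_role(value)
        if role is not None and role not in roles:
            roles.append(role)
    return roles


print(roles_from_scope("openid Server:Admin viewer"))
```
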

| Variable | Purpose |
|----------|---------|
| `CAMELEER_OIDC_ISSUER_URI` | OIDC issuer URI for token validation (e.g., `https://auth.example.com/oidc`) |
| `CAMELEER_OIDC_JWK_SET_URI` | Direct JWKS URL (e.g., `http://logto:3001/oidc/jwks`) — use when public issuer isn't reachable from inside containers |
| `CAMELEER_OIDC_AUDIENCE` | Expected audience (API resource indicator) |
| `CAMELEER_OIDC_TLS_SKIP_VERIFY` | Skip TLS certificate verification for OIDC calls (default `false`) — use when provider has a self-signed CA |

Logto is proxy-aware (`TRUST_PROXY_HEADER=1`). The `LOGTO_ENDPOINT` env var sets the public-facing URL used in OIDC discovery, issuer URI, and redirect URLs. Logto requires its own subdomain (not a path prefix).

---

## Admin API

All admin endpoints require `ADMIN` role. Prefix: `/api/v1/admin/`.

### User Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/users` | GET | List all users |
| `/users` | POST | Create local user |
| `/users/{id}` | GET/PUT/DELETE | Get/update/delete user |
| `/users/{id}/password` | POST | Reset password |
| `/users/{id}/roles/{roleId}` | POST/DELETE | Assign/remove role |
| `/users/{id}/groups/{groupId}` | POST/DELETE | Add/remove from group |

### Group & Role Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/groups` | GET/POST | List/create groups |
| `/groups/{id}` | GET/PUT/DELETE | Manage group (cycle detection on parent change) |
| `/groups/{id}/roles/{roleId}` | POST/DELETE | Assign/remove role from group |
| `/roles` | GET/POST | List/create roles |
| `/roles/{id}` | GET/PUT/DELETE | Manage role (system roles protected) |
| `/rbac/stats` | GET | RBAC statistics |

### Infrastructure

| Endpoint | Description |
|----------|-------------|
| `/database/status` | PostgreSQL version, schema, health |
| `/database/pool` | HikariCP connection pool stats |
| `/database/tables` | Table sizes and row counts |
| `/database/queries` | Active queries (with kill) |
| `/clickhouse/status` | ClickHouse version, uptime |
| `/clickhouse/tables` | Table info, row counts, sizes |
| `/clickhouse/performance` | Disk, memory, compression, partitions |
| `/clickhouse/queries` | Active ClickHouse queries |
| `/clickhouse/pipeline` | Ingestion pipeline stats |

### Settings & Configuration

| Endpoint | Description |
|----------|-------------|
| `/app-settings` | Per-application settings (CRUD) |
| `/thresholds` | Monitoring threshold configuration |
| `/oidc` | OIDC provider configuration (CRUD + test) |
| `/audit` | Paginated audit log search |
| `/usage` | UI usage analytics (ClickHouse) |

---

## Storage

### PostgreSQL

Used for RBAC, configuration, and audit. Schema-per-tenant isolation via `?currentSchema=tenant_{id}`.

Tables: `users`, `groups`, `roles`, `user_roles`, `user_groups`, `group_roles`, `server_config`, `application_config`, `audit_log`.

Flyway migrations (V1-V11) manage schema evolution.

### ClickHouse

Used for all observability data. Schema managed by `ClickHouseSchemaInitializer` (idempotent on startup).

| Table | Engine | Purpose | TTL |
|-------|--------|---------|-----|
| `executions` | ReplacingMergeTree | Route execution records | 365d |
| `processor_executions` | MergeTree | Per-processor trace data | 365d |
| `agent_events` | MergeTree | Agent lifecycle audit trail | 365d |
| `route_diagrams` | ReplacingMergeTree | Route graph definitions | - |
| `logs` | MergeTree | Application logs | 365d |
| `usage_events` | MergeTree | UI action tracking | 90d |
| `stats_1m_all` | AggregatingMergeTree | Global 1-minute rollups | - |
| `stats_1m_app` | AggregatingMergeTree | Per-application rollups | - |
| `stats_1m_route` | AggregatingMergeTree | Per-route rollups | - |
| `stats_1m_processor` | AggregatingMergeTree | Per-processor-type rollups | - |
| `stats_1m_processor_detail` | AggregatingMergeTree | Per-processor-instance rollups | - |

All tables include `tenant_id` and `environment` columns. Partitioned by `(tenant_id, toYYYYMM(timestamp))`.

Stats tables are fed by Materialized Views from base tables. Query with `-Merge()` combinators (e.g., `countMerge(total_count)`).

---

## Deployment

### Container Image

Multi-stage Docker build: Maven 3.9 + JDK 17 (build) → JRE 17 (runtime). Port 8081.

Registry: `gitea.siegeln.net/cameleer/cameleer3-server`

### Infrastructure Requirements

| Component | Version | Purpose |
|-----------|---------|---------|
| PostgreSQL | 16+ | RBAC, config, audit |
| ClickHouse | 24.12+ | All observability data |

### Required Environment Variables

| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| `CAMELEER_AUTH_TOKEN` | Yes | - | Bootstrap token for agent registration |
| `CAMELEER_JWT_SECRET` | Recommended | Random (ephemeral) | JWT signing secret |
| `CAMELEER_TENANT_ID` | No | `default` | Tenant identifier |
| `CAMELEER_UI_USER` | No | `admin` | Default admin username |
| `CAMELEER_UI_PASSWORD` | No | `admin` | Default admin password |
| `CAMELEER_UI_ORIGIN` | No | `http://localhost:5173` | CORS allowed origin (single, legacy) |
| `CAMELEER_CORS_ALLOWED_ORIGINS` | No | (empty) | Comma-separated CORS origins — overrides `UI_ORIGIN` when set |
| `CLICKHOUSE_URL` | No | `jdbc:clickhouse://localhost:8123/cameleer` | ClickHouse JDBC URL |
| `CLICKHOUSE_USERNAME` | No | `default` | ClickHouse user |
| `CLICKHOUSE_PASSWORD` | No | (empty) | ClickHouse password |
| `SPRING_DATASOURCE_URL` | No | `jdbc:postgresql://localhost:5432/cameleer3` | PostgreSQL JDBC URL |
| `SPRING_DATASOURCE_USERNAME` | No | `cameleer` | PostgreSQL user |
| `SPRING_DATASOURCE_PASSWORD` | No | `cameleer_dev` | PostgreSQL password |
| `CAMELEER_DB_SCHEMA` | No | `tenant_{CAMELEER_TENANT_ID}` | PostgreSQL schema (override for feature branches) |
| `CAMELEER_OIDC_ISSUER_URI` | No | (empty) | OIDC issuer URI — enables resource server mode for M2M tokens |
| `CAMELEER_OIDC_JWK_SET_URI` | No | (empty) | Direct JWKS URL — bypasses OIDC discovery for container networking |
| `CAMELEER_OIDC_AUDIENCE` | No | (empty) | Expected JWT audience (API resource indicator) |
| `CAMELEER_OIDC_TLS_SKIP_VERIFY` | No | `false` | Skip TLS cert verification for OIDC calls (self-signed CAs) |

### Health Probes

- **Endpoint:** `GET /api/v1/health` (public, no auth)
- **Liveness:** 30s initial delay, 10s period
- **Readiness:** 10s initial delay, 5s period

### Ingestion Tuning

| Variable | Default | Purpose |
|----------|---------|---------|
| `INGESTION_BUFFER_CAPACITY` | 50000 | Ring buffer size |
| `INGESTION_BATCH_SIZE` | 5000 | Flush batch size |
| `INGESTION_FLUSH_INTERVAL_MS` | 5000 | Periodic flush interval |

### Agent Registry Tuning

| Variable | Default | Purpose |
|----------|---------|---------|
| `AGENT_REGISTRY_STALE_THRESHOLD_MS` | 90000 | Heartbeat miss → STALE |
| `AGENT_REGISTRY_DEAD_THRESHOLD_MS` | 300000 | STALE duration → DEAD |
| `AGENT_REGISTRY_PING_INTERVAL_MS` | 15000 | SSE keepalive interval |
| `AGENT_REGISTRY_COMMAND_EXPIRY_MS` | 60000 | Pending command TTL |

---

## Public Endpoints (No Auth)

These endpoints do not require a JWT:

- `GET /api/v1/health`
- `POST /api/v1/agents/register` (requires the bootstrap token instead)
- `POST /api/v1/agents/*/refresh`
- `POST /api/v1/auth/login`
- `POST /api/v1/auth/refresh`
- `GET /api/v1/auth/oidc/config`
- `POST /api/v1/auth/oidc/callback`
- `GET /api/v1/api-docs/**` (OpenAPI spec)
- `GET /swagger-ui.html` (Swagger UI)
- Static resources: `/`, `/index.html`, `/config.js`, `/favicon.svg`, `/assets/**`

All other endpoints require a valid JWT with an appropriate role.