Files
cameleer-server/HOWTO.md
hsiegeln 3438216fd9
Some checks failed
CI / build (push) Successful in 1m2s
CI / docker (push) Successful in 15s
CI / deploy (push) Has been cancelled
Update docs for RBAC, OIDC, and user management
Add RBAC role table, OIDC login flow, user admin API examples, and
new configuration properties to HOWTO.md. Update CLAUDE.md with RBAC
roles, OIDC support, and user persistence. Add user repository to
ARCHITECTURE.md component table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 12:41:41 +01:00

399 lines
15 KiB
Markdown

# HOWTO — Cameleer3 Server
## Prerequisites
- Java 17+
- Maven 3.9+
- Node.js 22+ and npm
- Docker & Docker Compose
- Access to the Gitea Maven registry (for `cameleer3-common` dependency)
## Build
```bash
# Build UI first (required for embedded mode)
cd ui && npm ci && npm run build && cd ..
# Backend
mvn clean compile # compile only
mvn clean verify # compile + run all tests (needs Docker for integration tests)
```
## Infrastructure Setup
Start ClickHouse:
```bash
docker compose up -d
```
This starts ClickHouse 25.3 and automatically runs the schema init scripts (`clickhouse/init/01-schema.sql`, `clickhouse/init/02-search-columns.sql`, `clickhouse/init/03-users.sql`).
| Service | Port | Purpose |
|------------|------|------------------|
| ClickHouse | 8123 | HTTP API (JDBC) |
| ClickHouse | 9000 | Native protocol |
ClickHouse credentials: `cameleer` / `cameleer_dev`, database `cameleer3`.
## Run the Server
```bash
mvn clean package -DskipTests
CAMELEER_AUTH_TOKEN=my-secret-token java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
```
The server starts on **port 8081**. The `CAMELEER_AUTH_TOKEN` environment variable is **required** — the server fails fast on startup if it is not set.
For token rotation without downtime, set `CAMELEER_AUTH_TOKEN_PREVIOUS` to the old token while rolling out the new one. The server accepts both during the overlap window.
## API Endpoints
### Authentication (Phase 4)
All endpoints except health, registration, and docs require a JWT Bearer token. The typical flow:
```bash
# 1. Register agent (requires bootstrap token)
curl -s -X POST http://localhost:8081/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer my-secret-token" \
-d '{"agentId":"agent-1","name":"Order Service","group":"order-service-prod","version":"1.0.0","routeIds":["route-1"],"capabilities":["deep-trace","replay"]}'
# Response includes: accessToken, refreshToken, serverPublicKey (Ed25519, Base64)
# 2. Use access token for all subsequent requests
TOKEN="<accessToken from registration>"
# 3. Refresh when access token expires (1h default)
curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/refresh \
-H "Authorization: Bearer <refreshToken>"
# Response: { "accessToken": "new-jwt" }
```
**UI Login (for browser access):**
```bash
# Login with UI credentials (returns JWT tokens)
curl -s -X POST http://localhost:8081/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"admin"}'
# Response: { "accessToken": "...", "refreshToken": "..." }
# Refresh UI token
curl -s -X POST http://localhost:8081/api/v1/auth/refresh \
-H "Content-Type: application/json" \
-d '{"refreshToken":"<refreshToken>"}'
```
UI credentials are configured via `CAMELEER_UI_USER` / `CAMELEER_UI_PASSWORD` env vars (default: `admin` / `admin`).
**Public endpoints (no JWT required):** `GET /api/v1/health`, `POST /api/v1/agents/register` (uses bootstrap token), `POST /api/v1/auth/**`, OpenAPI/Swagger docs.
**Protected endpoints (JWT required):** All other endpoints including ingestion, search, agent management, commands.
**SSE connections:** Authenticated via query parameter: `/agents/{id}/events?token=<jwt>` (EventSource API doesn't support custom headers).
**Ed25519 signatures:** All SSE command payloads (config-update, deep-trace, replay) include a `signature` field. Agents verify payload integrity using the `serverPublicKey` received during registration. The server generates a new ephemeral keypair on each startup — agents must re-register to get the new key.
### RBAC (Role-Based Access Control)
JWTs carry a `roles` claim. Endpoints are restricted by role:
| Role | Access |
|------|--------|
| `AGENT` | Data ingestion (`/data/**`), heartbeat, SSE events, command ack |
| `VIEWER` | Search, execution detail, diagrams, agent list |
| `OPERATOR` | VIEWER + send commands to agents |
| `ADMIN` | OPERATOR + user management (`/admin/**`) |
The env-var local user gets `ADMIN` role. Agents get `AGENT` role at registration.
### OIDC Login (Optional)
When `CAMELEER_OIDC_ENABLED=true`, the server supports external identity providers (e.g. Authentik, Keycloak):
```bash
# 1. SPA checks if OIDC is available
curl -s http://localhost:8081/api/v1/auth/oidc/config
# Returns: { "issuer": "...", "clientId": "...", "authorizationEndpoint": "..." }
# 2. After OIDC redirect, SPA sends the authorization code
curl -s -X POST http://localhost:8081/api/v1/auth/oidc/callback \
-H "Content-Type: application/json" \
-d '{"code":"auth-code-from-provider","redirectUri":"http://localhost:5173/callback"}'
# Returns: { "accessToken": "...", "refreshToken": "..." }
```
Local login remains available as fallback even when OIDC is enabled.
### User Management (ADMIN only)
```bash
# List all users
curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8081/api/v1/admin/users
# Update user roles
curl -s -X PUT http://localhost:8081/api/v1/admin/users/{userId}/roles \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"roles":["VIEWER","OPERATOR"]}'
# Delete user
curl -s -X DELETE http://localhost:8081/api/v1/admin/users/{userId} \
-H "Authorization: Bearer $TOKEN"
```
### Ingestion (POST, returns 202 Accepted)
```bash
# Post route execution data (JWT required)
curl -s -X POST http://localhost:8081/api/v1/data/executions \
-H "Content-Type: application/json" \
-H "X-Protocol-Version: 1" \
-H "Authorization: Bearer $TOKEN" \
-d '{"agentId":"agent-1","routeId":"route-1","executionId":"exec-1","status":"COMPLETED","startTime":"2026-03-11T00:00:00Z","endTime":"2026-03-11T00:00:01Z","processorExecutions":[]}'
# Post route diagram
curl -s -X POST http://localhost:8081/api/v1/data/diagrams \
-H "Content-Type: application/json" \
-H "X-Protocol-Version: 1" \
-H "Authorization: Bearer $TOKEN" \
-d '{"agentId":"agent-1","routeId":"route-1","version":1,"nodes":[],"edges":[]}'
# Post agent metrics
curl -s -X POST http://localhost:8081/api/v1/data/metrics \
-H "Content-Type: application/json" \
-H "X-Protocol-Version: 1" \
-H "Authorization: Bearer $TOKEN" \
-d '[{"agentId":"agent-1","metricName":"cpu","value":42.0,"timestamp":"2026-03-11T00:00:00Z","tags":{}}]'
```
**Note:** The `X-Protocol-Version: 1` header is required on all `/api/v1/data/**` endpoints. Missing or wrong version returns 400.
### Health & Docs
```bash
# Health check
curl -s http://localhost:8081/api/v1/health
# OpenAPI JSON
curl -s http://localhost:8081/api/v1/api-docs
# Swagger UI
open http://localhost:8081/api/v1/swagger-ui.html
```
### Search (Phase 2)
```bash
# Search by status (GET with basic filters)
curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8081/api/v1/search/executions?status=COMPLETED&limit=10"
# Search by time range
curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8081/api/v1/search/executions?timeFrom=2026-03-11T00:00:00Z&timeTo=2026-03-12T00:00:00Z"
# Advanced search (POST with full-text)
curl -s -X POST http://localhost:8081/api/v1/search/executions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"status":"FAILED","text":"NullPointerException","limit":20}'
# Transaction detail (nested processor tree)
curl -s -H "Authorization: Bearer $TOKEN" \
http://localhost:8081/api/v1/executions/{executionId}
# Processor exchange snapshot
curl -s -H "Authorization: Bearer $TOKEN" \
http://localhost:8081/api/v1/executions/{executionId}/processors/{index}/snapshot
# Render diagram as SVG
curl -s -H "Authorization: Bearer $TOKEN" \
-H "Accept: image/svg+xml" \
http://localhost:8081/api/v1/diagrams/{contentHash}/render
# Render diagram as JSON layout
curl -s -H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json" \
http://localhost:8081/api/v1/diagrams/{contentHash}/render
```
**Search response format:** `{ "data": [...], "total": N, "offset": 0, "limit": 50 }`
**Supported search filters (GET):** `status`, `timeFrom`, `timeTo`, `correlationId`, `limit`, `offset`
**Additional POST filters:** `durationMin`, `durationMax`, `text` (global full-text), `textInBody`, `textInHeaders`, `textInErrors`
### Agent Registry & SSE (Phase 3)
```bash
# Register an agent (uses bootstrap token, not JWT — see Authentication section above)
curl -s -X POST http://localhost:8081/api/v1/agents/register \
-H "Content-Type: application/json" \
-H "Authorization: Bearer my-secret-token" \
-d '{"agentId":"agent-1","name":"Order Service","group":"order-service-prod","version":"1.0.0","routeIds":["route-1","route-2"],"capabilities":["deep-trace","replay"]}'
# Heartbeat (call every 30s)
curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/heartbeat \
-H "Authorization: Bearer $TOKEN"
# List agents (optionally filter by status)
curl -s -H "Authorization: Bearer $TOKEN" "http://localhost:8081/api/v1/agents"
curl -s -H "Authorization: Bearer $TOKEN" "http://localhost:8081/api/v1/agents?status=LIVE"
# Connect to SSE event stream (JWT via query parameter)
curl -s -N "http://localhost:8081/api/v1/agents/agent-1/events?token=$TOKEN"
# Send command to single agent
curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/commands \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"type":"config-update","payload":{"samplingRate":0.5}}'
# Send command to agent group
curl -s -X POST http://localhost:8081/api/v1/agents/groups/order-service-prod/commands \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"type":"deep-trace","payload":{"routeId":"route-1","durationSeconds":60}}'
# Broadcast command to all live agents
curl -s -X POST http://localhost:8081/api/v1/agents/commands \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"type":"config-update","payload":{"samplingRate":1.0}}'
# Acknowledge command delivery
curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/commands/{commandId}/ack \
-H "Authorization: Bearer $TOKEN"
```
**Agent lifecycle:** LIVE (heartbeat within 90s) → STALE (missed 3 heartbeats) → DEAD (5min after STALE). DEAD agents kept indefinitely.
**SSE events:** `config-update`, `deep-trace`, `replay` commands pushed in real time. Server sends ping keepalive every 15s.
**Command expiry:** Unacknowledged commands expire after 60 seconds.
### Backpressure
When the write buffer is full (default capacity: 50,000), ingestion endpoints return **503 Service Unavailable**. Already-buffered data is not lost.
## Configuration
Key settings in `cameleer3-server-app/src/main/resources/application.yml`:
| Setting | Default | Description |
|---------|---------|-------------|
| `server.port` | 8081 | Server port |
| `ingestion.buffer-capacity` | 50000 | Max items in write buffer |
| `ingestion.batch-size` | 5000 | Items per ClickHouse batch insert |
| `ingestion.flush-interval-ms` | 1000 | Buffer flush interval (ms) |
| `ingestion.data-ttl-days` | 30 | ClickHouse TTL for auto-deletion |
| `agent-registry.heartbeat-interval-seconds` | 30 | Expected heartbeat interval |
| `agent-registry.stale-threshold-seconds` | 90 | Time before agent marked STALE |
| `agent-registry.dead-threshold-seconds` | 300 | Time after STALE before DEAD |
| `agent-registry.command-expiry-seconds` | 60 | Pending command TTL |
| `agent-registry.keepalive-interval-seconds` | 15 | SSE ping keepalive interval |
| `security.access-token-expiry-ms` | 3600000 | JWT access token lifetime (1h) |
| `security.refresh-token-expiry-ms` | 604800000 | Refresh token lifetime (7d) |
| `security.bootstrap-token` | `${CAMELEER_AUTH_TOKEN}` | Bootstrap token for agent registration (required) |
| `security.bootstrap-token-previous` | `${CAMELEER_AUTH_TOKEN_PREVIOUS}` | Previous bootstrap token for rotation (optional) |
| `security.ui-user` | `admin` | UI login username (`CAMELEER_UI_USER` env var) |
| `security.ui-password` | `admin` | UI login password (`CAMELEER_UI_PASSWORD` env var) |
| `security.ui-origin` | `http://localhost:5173` | CORS allowed origin for UI (`CAMELEER_UI_ORIGIN` env var) |
| `security.jwt-secret` | *(random)* | HMAC secret for JWT signing (`CAMELEER_JWT_SECRET`). If set, tokens survive restarts |
| `security.oidc.enabled` | `false` | Enable OIDC login (`CAMELEER_OIDC_ENABLED`) |
| `security.oidc.issuer-uri` | | OIDC provider issuer URL (`CAMELEER_OIDC_ISSUER`) |
| `security.oidc.client-id` | | OAuth2 client ID (`CAMELEER_OIDC_CLIENT_ID`) |
| `security.oidc.client-secret` | | OAuth2 client secret (`CAMELEER_OIDC_CLIENT_SECRET`) |
| `security.oidc.roles-claim` | `realm_access.roles` | JSONPath to roles in OIDC id_token (`CAMELEER_OIDC_ROLES_CLAIM`) |
| `security.oidc.default-roles` | `VIEWER` | Default roles for new OIDC users (`CAMELEER_OIDC_DEFAULT_ROLES`) |
## Web UI Development
```bash
cd ui
npm install
npm run dev # Vite dev server on http://localhost:5173 (proxies /api to :8081)
npm run build # Production build to ui/dist/
```
Login with `admin` / `admin` (or whatever `CAMELEER_UI_USER` / `CAMELEER_UI_PASSWORD` are set to).
The UI uses runtime configuration via `public/config.js`. In Kubernetes, a ConfigMap overrides this file to set the correct API base URL.
### Regenerate API Types
When the backend OpenAPI spec changes:
```bash
cd ui
npm run generate-api # Requires backend running on :8081
```
## Running Tests
Integration tests use Testcontainers (starts ClickHouse automatically — requires Docker):
```bash
# All tests
mvn verify
# Unit tests only (no Docker needed)
mvn test -pl cameleer3-server-core
# Specific integration test
mvn test -pl cameleer3-server-app -Dtest=ExecutionControllerIT
```
## Verify ClickHouse Data
After posting data and waiting for the flush interval (1s default):
```bash
docker exec -it cameleer3-server-clickhouse-1 clickhouse-client \
--user cameleer --password cameleer_dev -d cameleer3 \
-q "SELECT count() FROM route_executions"
```
## Kubernetes Deployment
The full stack is deployed to k3s via CI/CD on push to `main`. K8s manifests are in `deploy/`.
### Architecture
```
cameleer namespace:
ClickHouse (StatefulSet, 2Gi PVC) ← clickhouse:8123 (ClusterIP)
cameleer3-server (Deployment) ← NodePort 30081
cameleer3-ui (Deployment, Nginx) ← NodePort 30090
```
### Access (from your network)
| Service | URL |
|---------|-----|
| Web UI | `http://192.168.50.86:30090` |
| Server API | `http://192.168.50.86:30081/api/v1/health` |
| Swagger UI | `http://192.168.50.86:30081/api/v1/swagger-ui.html` |
### CI/CD Pipeline
Push to `main` triggers: **build** (UI npm + Maven, unit tests) → **docker** (buildx amd64 for server + UI, push to Gitea registry) → **deploy** (kubectl apply + rolling update).
Required Gitea org secrets: `REGISTRY_TOKEN`, `KUBECONFIG_BASE64`, `CAMELEER_AUTH_TOKEN`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CAMELEER_UI_USER` (optional), `CAMELEER_UI_PASSWORD` (optional), `CAMELEER_JWT_SECRET` (recommended — tokens survive restarts).
### Manual K8s Commands
```bash
# Check pod status
kubectl -n cameleer get pods
# View server logs
kubectl -n cameleer logs -f deploy/cameleer3-server
# View ClickHouse logs
kubectl -n cameleer logs -f statefulset/clickhouse
# Restart server
kubectl -n cameleer rollout restart deployment/cameleer3-server
```