Files
cameleer-server/CLAUDE.md
hsiegeln a4a569a253
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m55s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m6s
fix: improve deployment progress UI and prevent duplicate deployment rows
- Redesign DeploymentProgress component: track-based layout with amber
  brand color, checkmarks for completed steps, user-friendly labels
  (Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
  for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
  stages, database migration reference, and REST endpoint summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:10:59 +02:00

266 lines
20 KiB
Markdown

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.
## Related Project
- **cameleer3** (`https://gitea.siegeln.net/cameleer/cameleer3`) — the Java agent that instruments Camel applications
- Protocol defined in `cameleer3-common/PROTOCOL.md` in the agent repo
- This server depends on `com.cameleer3:cameleer3-common` (shared models and graph API)
## Modules
- `cameleer3-server-core` — domain logic, storage interfaces, services (no Spring dependencies)
- `cameleer3-server-app` — Spring Boot web app, REST controllers, SSE, persistence, Docker orchestration
## Build Commands
```bash
mvn clean compile # Compile all modules
mvn clean verify # Full build with tests
```
## Run
```bash
java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
```
## Key Classes by Package
### Core Module (`cameleer3-server-core/src/main/java/com/cameleer3/server/core/`)
**agent/** — Agent lifecycle and commands
- `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
- `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
- `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
- `AgentEventService` — records agent state changes, heartbeats
**runtime/** — App/Environment/Deployment domain
- `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
- `AppVersion` — record: id, appId, version, jarPath
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `ContainerRequest` — record: 17 fields for Docker container creation
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, cpuShares, cpuLimit, appPort, replicas, routingMode, etc.
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs
**search/** — Execution search
- `SearchService` — search, topErrors, punchcard, distinctAttributeKeys
- `SearchRequest` / `SearchResult` — search DTOs
**storage/** — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
**rbac/** — Role-based access control
- `RbacService` — getDirectRolesForUser, syncOidcRoles, assignRole
- `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
- `UserDetail`, `RoleDetail`, `GroupDetail` — records
**security/** — Auth
- `JwtService` — interface: createAccessToken, validateAccessToken
- `Ed25519SigningService` — interface: sign, verify (config signing)
- `OidcConfig` — record: issuerUri, clientId, audience, rolesClaim, additionalScopes
**ingestion/** — Buffered data pipeline
- `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
- `ChunkAccumulator` — batches data for efficient flush
### App Module (`cameleer3-server-app/src/main/java/com/cameleer3/server/app/`)
**controller/** — REST endpoints
- `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
- `AgentSseController` — GET /sse (Server-Sent Events connection)
- `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
- `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
- `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
- `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
- `ExecutionController` — GET /api/v1/executions (search + detail)
- `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
- `LogQueryController` — GET /api/v1/logs, GET /tail
- `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
- `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
- `RoleAdminController` — CRUD /api/v1/admin/roles
- `GroupAdminController` — CRUD /api/v1/admin/groups
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `LicenseAdminController` — GET/POST /api/v1/admin/license
**runtime/** — Docker orchestration
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor`@Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
**storage/** — PostgreSQL repositories (JdbcTemplate)
- `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
- `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
- `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`
**storage/** — ClickHouse stores
- `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseLogStore`
- `ClickHouseStatsStore` — pre-aggregated stats, punchcard
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
- `ClickHouseSearchIndex` — full-text search
- `ClickHouseUsageTracker` — usage_events for billing
**security/** — Spring Security
- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
- `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
- `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
- `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
- `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
- `OidcProviderHelper` — OIDC discovery, JWK source cache
**agent/** — Agent lifecycle
- `SseConnectionManager` — manages per-agent SSE connections, delivers commands
- `AgentLifecycleMonitor`@Scheduled 10s, LIVE->STALE->DEAD transitions
**retention/** — JAR cleanup
- `JarRetentionJob`@Scheduled 03:00 daily, per-environment retention, skips deployed versions
**config/** — Spring beans
- `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
- `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
- `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
- `StorageBeanConfig` — all repositories
- `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer
## Key Conventions
- Java 17+ required
- Spring Boot 3.4.3 parent POM
- Depends on `com.cameleer3:cameleer3-common` from Gitea Maven registry
- Jackson `JavaTimeModule` for `Instant` deserialization
- Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim > `"default"`). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when registry has incomplete data.
- Multi-tenancy: each server instance serves one tenant (configured via `CAMELEER_TENANT_ID`, default: `"default"`). Environments (dev/staging/prod) are first-class — agents send `environmentId` at registration and in heartbeats. JWT carries `env` claim for environment persistence across token refresh. PostgreSQL isolated via schema-per-tenant (`?currentSchema=tenant_{id}`). ClickHouse shared DB with `tenant_id` + `environment` columns, partitioned by `(tenant_id, toYYYYMM(timestamp))`.
- Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in `clickhouse/*.sql`, run idempotently on startup by `ClickHouseSchemaInitializer`. Use `IF NOT EXISTS` for CREATE and ADD PROJECTION.
- Logging: ClickHouse JDBC set to INFO (`com.clickhouse`), HTTP client to WARN (`org.apache.hc.client5`) in application.yml
- Security: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from JWT secret via HMAC-SHA256), bootstrap token for registration. CORS: `CAMELEER_CORS_ALLOWED_ORIGINS` (comma-separated) overrides `CAMELEER_UI_ORIGIN` for multi-origin setups (e.g., reverse proxy). UI role gating: Admin sidebar/routes hidden for non-ADMIN; diagram toolbar and route control hidden for VIEWER. Read-only for VIEWER, editable for OPERATOR+. Role helpers: `useIsAdmin()`, `useCanControl()` in `auth-store.ts`. Route guard: `RequireAdmin` in `auth/RequireAdmin.tsx`.
- OIDC: Optional external identity provider support (token exchange pattern). Configured via admin API/UI, stored in database (`server_config` table). Configurable `userIdClaim` (default `sub`) determines which id_token claim is used as the user identifier. Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when `CAMELEER_OIDC_ISSUER_URI` is set. `CAMELEER_OIDC_JWK_SET_URI` overrides JWKS discovery for container networking. `CAMELEER_OIDC_TLS_SKIP_VERIFY=true` disables TLS cert verification for OIDC calls (self-signed CAs). Scope-based role mapping via `SystemRole.normalizeScope()` (case-insensitive, strips `server:` prefix): `admin`/`server:admin` -> ADMIN, `operator`/`server:operator` -> OPERATOR, `viewer`/`server:viewer` -> VIEWER. SSO: when OIDC enabled, UI auto-redirects to provider with `prompt=none` for silent sign-in; falls back to `/login?local` on `login_required`, retries without `prompt=none` on `consent_required`. Logout always redirects to `/login?local` (via OIDC end_session or direct fallback) to prevent SSO re-login loops. Auto-signup provisions new OIDC users with default roles. System roles synced on every OIDC login via `syncOidcRoles` — always overwrites directly-assigned roles (falls back to `defaultRoles` when OIDC returns none); uses `getDirectRolesForUser` to avoid touching group-inherited roles. Group memberships are never touched. Supports ES384, ES256, RS256. Shared OIDC logic in `OidcProviderHelper` (discovery, JWK source, algorithm set).
- OIDC role extraction: `OidcTokenExchanger` reads roles from the **access_token** first (JWT with `at+jwt` type, decoded by a separate processor), then falls back to id_token. `OidcConfig` includes `audience` (RFC 8707 resource indicator — included in both authorization request and token exchange POST body to trigger JWT access tokens) and `additionalScopes` (extra scopes for the SPA to request). The `rolesClaim` config points to the claim name in the token (e.g., `"roles"` for Custom JWT claims, `"realm_access.roles"` for Keycloak). All provider-specific configuration is external — no provider-specific code in the server.
- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`
- Usage analytics: ClickHouse `usage_events` table tracks authenticated UI requests, flushed every 5s
## Database Migrations
PostgreSQL (Flyway): `cameleer3-server-app/src/main/resources/db/migration/`
- V1 — RBAC (users, roles, groups, audit_log)
- V2 — Claim mappings (OIDC)
- V3 — Runtime management (apps, environments, deployments, app_versions)
- V4 — Environment config (default_container_config JSONB)
- V5 — App container config (container_config JSONB on apps)
- V6 — JAR retention policy (jar_retention_count on environments)
- V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
ClickHouse: `cameleer3-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)
## CI/CD & Deployment
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
- Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime
- `REGISTRY_TOKEN` build arg required for `cameleer3-common` dependency resolution
- Registry: `gitea.siegeln.net/cameleer/cameleer3-server` (container images)
- K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
- Deployment target: k3s at 192.168.50.86, namespace `cameleer` (main), `cam-<slug>` (feature branches)
- Feature branches: isolated namespace, PG schema; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
- Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `postgres-credentials`, `clickhouse-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
- K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without securityContext (needs root for entrypoint setup).
- Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
- Docker build uses buildx registry cache + `--provenance=false` for Gitea compatibility
- CI: branch slug sanitization extracted to `.gitea/sanitize-branch.sh`, sourced by docker and deploy-feature jobs
## UI Structure
The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments**.
- **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
- **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
- **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
- **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
- Config sub-tabs: **Variables | Monitoring | Traces & Taps | Route Recording | Resources**
- Create app: full page at `/apps/new` (not a modal)
- Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
### Key UI Files
- `ui/src/router.tsx` — React Router v6 routes
- `ui/src/config.ts` — apiBaseUrl, basePath
- `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
- `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
- `ui/src/components/ContentTabs.tsx` — main tab switcher
- `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
- `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
- `ui/src/hooks/useScope.ts` — TabKey type, scope inference
## UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly.
- Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer3-{16,32,48,192,512}.png`, and `cameleer3-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
- Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). basePath is centralized in `ui/src/config.ts`; router.tsx imports it instead of re-reading `<base>` tag.
- Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.
## Docker Orchestration
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
- `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer3-server` DNS alias.
- `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer3-server` label. Detects die/oom/start/stop events and updates deployment replica states.
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
### Deployment Status Model
Deployments move through these statuses:
| Status | Meaning |
|--------|---------|
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Some replicas healthy, some dead |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
### JAR Management
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_JAR_DOCKER_VOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
### nginx / Reverse Proxy
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
## Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.