CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project

Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.

  • cameleer3 (https://gitea.siegeln.net/cameleer/cameleer3) — the Java agent that instruments Camel applications
  • Protocol defined in cameleer3-common/PROTOCOL.md in the agent repo
  • This server depends on com.cameleer3:cameleer3-common (shared models and graph API)

Modules

  • cameleer3-server-core — domain logic, storage interfaces, services (no Spring dependencies)
  • cameleer3-server-app — Spring Boot web app, REST controllers, SSE, persistence, Docker orchestration

Build Commands

mvn clean compile          # Compile all modules
mvn clean verify           # Full build with tests

Run

java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar

Key Classes by Package

Core Module (cameleer3-server-core/src/main/java/com/cameleer3/server/core/)

agent/ — Agent lifecycle and commands

  • AgentRegistryService — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
  • AgentInfo — record: id, name, application, environmentId, version, routeIds, capabilities, state
  • AgentCommand — record: id, type, targetAgent, payload, createdAt, expiresAt
  • AgentEventService — records agent state changes, heartbeats

runtime/ — App/Environment/Deployment domain

  • App — record: id, environmentId, slug, displayName, containerConfig (JSONB)
  • AppVersion — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
  • Environment — record: id, slug, jarRetentionCount
  • Deployment — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
  • DeploymentStatus — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
  • DeployStage — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
  • DeploymentService — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
  • RuntimeType — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
  • RuntimeDetector — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
  • ContainerRequest — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
  • ResolvedContainerConfig — record: typed config with memoryLimitMb, cpuShares, cpuLimit, appPort, replicas, routingMode, routeControlEnabled, replayEnabled, runtimeType, customArgs, etc.
  • ConfigMerger — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
  • RuntimeOrchestrator — interface: startContainer, stopContainer, getContainerStatus, getLogs

search/ — Execution search

  • SearchService — search, topErrors, punchcard, distinctAttributeKeys
  • SearchRequest / SearchResult — search DTOs

storage/ — Storage abstractions

  • ExecutionStore, MetricsStore, DiagramStore, SearchIndex, LogIndex — interfaces

rbac/ — Role-based access control

  • RbacService — getDirectRolesForUser, syncOidcRoles, assignRole
  • SystemRole — enum: AGENT, VIEWER, OPERATOR, ADMIN; normalizeScope() maps scopes
  • UserDetail, RoleDetail, GroupDetail — records

admin/ — Server-wide admin config

  • SensitiveKeysConfig — record: keys (List, immutable)
  • SensitiveKeysRepository — interface: find(), save()
  • SensitiveKeysMerger — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
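
The merge rule above can be sketched as follows. This is an illustrative re-implementation, not the actual SensitiveKeysMerger source; the class name and method shape are assumptions:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the merge rule: union of global and per-app keys, case-insensitive
// dedup, first-seen casing preserved, null when neither layer is configured.
public final class SensitiveKeysMergerSketch {

    public static List<String> merge(List<String> global, List<String> perApp) {
        if (global == null && perApp == null) {
            return null; // nothing configured: agents fall back to built-in defaults
        }
        Set<String> seen = new HashSet<>();
        List<String> merged = new ArrayList<>();
        addLayer(global, seen, merged); // global baseline first, so its casing wins
        addLayer(perApp, seen, merged); // per-app keys can only add, never remove
        return merged;
    }

    private static void addLayer(List<String> layer, Set<String> seen, List<String> out) {
        if (layer == null) return;
        for (String key : layer) {
            if (seen.add(key.toLowerCase(Locale.ROOT))) {
                out.add(key); // first occurrence keeps its original casing
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("Password", "token"),
                                 List.of("PASSWORD", "apiKey")));
        // [Password, token, apiKey]
    }
}
```

Note that ordering the global layer first is what makes its casing win on case-insensitive collisions.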

security/ — Auth

  • JwtService — interface: createAccessToken, validateAccessToken
  • Ed25519SigningService — interface: sign, verify (config signing)
  • OidcConfig — record: issuerUri, clientId, audience, rolesClaim, additionalScopes

ingestion/ — Buffered data pipeline

  • IngestionService — ingestExecution, ingestMetric, ingestLog, ingestDiagram
  • ChunkAccumulator — batches data for efficient flush

App Module (cameleer3-server-app/src/main/java/com/cameleer3/server/app/)

controller/ — REST endpoints

  • AgentRegistrationController — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
  • AgentSseController — GET /sse (Server-Sent Events connection)
  • AgentCommandController — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
  • AppController — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
  • DeploymentController — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
  • EnvironmentAdminController — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
  • ExecutionController — GET /api/v1/executions (search + detail)
  • SearchController — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
  • LogQueryController — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
  • LogIngestionController — POST /api/v1/data/logs (accepts List<LogEntry> JSON array, each entry has source: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
  • CatalogController — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than discoveryttldays with no live agents.
  • ChunkIngestionController — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
  • UserAdminController — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
  • RoleAdminController — CRUD /api/v1/admin/roles
  • GroupAdminController — CRUD /api/v1/admin/groups
  • OidcConfigAdminController — GET/POST /api/v1/admin/oidc, POST /test
  • SensitiveKeysAdminController — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts { keys: [...] } with optional ?pushToAgents=true to fan out merged keys to all LIVE agents. Stored in server_config table (key sensitive_keys).
  • AuditLogController — GET /api/v1/admin/audit
  • MetricsController — GET /api/v1/metrics, GET /timeseries
  • DiagramController — GET /api/v1/diagrams/{id}, POST /
  • DiagramRenderController — POST /api/v1/diagrams/render (ELK layout)
  • LicenseAdminController — GET/POST /api/v1/admin/license

runtime/ — Docker orchestration

  • DockerRuntimeOrchestrator — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
  • DeploymentExecutor — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE.
    • Primary network for app containers is set via the CAMELEER_SERVER_RUNTIME_DOCKERNETWORK env var (in SaaS mode: cameleer-tenant-{slug}); apps also connect to cameleer-traefik (routing) and cameleer-env-{tenantId}-{envSlug} (per-environment discovery) as additional networks.
    • Resolves runtimeType "auto" to a concrete type from AppVersion.detectedRuntimeType at PRE_FLIGHT (fails the deployment if unresolvable).
    • Builds a framework-specific Docker entrypoint per runtime type (Spring Boot PropertiesLauncher, Quarkus -jar, plain Java classpath, native binary).
    • Sets CAMELEER_AGENT_* env vars from ResolvedContainerConfig (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires a redeployment.
  • DockerNetworkManager — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
  • DockerEventMonitor — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
  • TraefikLabelBuilder — generates Traefik Docker labels for path-based or subdomain routing
  • PrometheusLabelBuilder — generates Prometheus Docker labels (prometheus.scrape/path/port) per runtime type for docker_sd_configs auto-discovery
  • DisabledRuntimeOrchestrator — no-op when runtime not enabled

metrics/ — Prometheus observability

  • ServerMetrics — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via /api/v1/prometheus.

storage/ — PostgreSQL repositories (JdbcTemplate)

  • PostgresAppRepository, PostgresAppVersionRepository, PostgresEnvironmentRepository
  • PostgresDeploymentRepository — includes JSONB replica_states, deploy_stage, findByContainerId
  • PostgresUserRepository, PostgresRoleRepository, PostgresGroupRepository
  • PostgresAuditRepository, PostgresOidcConfigRepository, PostgresClaimMappingRepository, PostgresSensitiveKeysRepository

storage/ — ClickHouse stores

  • ClickHouseExecutionStore, ClickHouseMetricsStore, ClickHouseLogStore
  • ClickHouseStatsStore — pre-aggregated stats, punchcard
  • ClickHouseDiagramStore, ClickHouseAgentEventRepository
  • ClickHouseSearchIndex — full-text search
  • ClickHouseUsageTracker — usage_events for billing

security/ — Spring Security

  • SecurityConfig — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
  • JwtAuthenticationFilter — OncePerRequestFilter, validates Bearer tokens
  • JwtServiceImpl — HMAC-SHA256 JWT (Nimbus JOSE)
  • OidcAuthController — /api/v1/auth/oidc (login-uri, token-exchange, logout)
  • OidcTokenExchanger — code -> tokens, role extraction from access_token then id_token
  • OidcProviderHelper — OIDC discovery, JWK source cache

agent/ — Agent lifecycle

  • SseConnectionManager — manages per-agent SSE connections, delivers commands
  • AgentLifecycleMonitor — @Scheduled 10s, LIVE->STALE->DEAD transitions

retention/ — JAR cleanup

  • JarRetentionJob — @Scheduled 03:00 daily, per-environment retention, skips deployed versions

config/ — Spring beans

  • RuntimeOrchestratorAutoConfig — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
  • RuntimeBeanConfig — DeploymentExecutor, AppService, EnvironmentService
  • SecurityBeanConfig — JwtService, Ed25519, BootstrapTokenValidator
  • StorageBeanConfig — all repositories
  • ClickHouseConfig — ClickHouse JdbcTemplate, schema initializer

Key Conventions

  • Java 17+ required
  • Spring Boot 3.4.3 parent POM
  • Depends on com.cameleer3:cameleer3-common from Gitea Maven registry
  • Jackson JavaTimeModule for Instant deserialization
  • Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
  • Environment filtering: all data queries (exchanges, dashboard stats, route metrics, agent events, correlation) filter by the selected environment. All commands (config-update, route-control, set-traced-processors, replay) target only agents in the selected environment when one is selected. AgentRegistryService.findByApplicationAndEnvironment() for environment-scoped command dispatch. Backend endpoints accept optional environment query parameter; null = all environments (backward compatible).
  • Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT env claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat environmentId > JWT env claim > "default"). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when registry has incomplete data.
  • Multi-tenancy: each server instance serves one tenant (configured via CAMELEER_SERVER_TENANT_ID, default: "default"). Environments (dev/staging/prod) are first-class — agents send environmentId at registration and in heartbeats. JWT carries env claim for environment persistence across token refresh. PostgreSQL isolated via schema-per-tenant (?currentSchema=tenant_{id}). ClickHouse shared DB with tenant_id + environment columns, partitioned by (tenant_id, toYYYYMM(timestamp)).
  • Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in clickhouse/*.sql, run idempotently on startup by ClickHouseSchemaInitializer. Use IF NOT EXISTS for CREATE and ADD PROJECTION.
  • Logging: ClickHouse JDBC set to INFO (com.clickhouse), HTTP client to WARN (org.apache.hc.client5) in application.yml
  • Security:
    • JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from the JWT secret via HMAC-SHA256), bootstrap token for registration.
    • CORS: CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS (comma-separated) overrides CAMELEER_SERVER_SECURITY_UIORIGIN for multi-origin setups (e.g., reverse proxy).
    • Infrastructure access: CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false disables the Database and ClickHouse admin endpoints (set by the SaaS provisioner on tenant servers). The health endpoint exposes the flag for UI tab visibility.
    • UI role gating: Admin sidebar/routes hidden for non-ADMIN; diagram toolbar and route control hidden for VIEWER. Read-only for VIEWER, editable for OPERATOR+. Role helpers: useIsAdmin(), useCanControl() in auth-store.ts. Route guard: RequireAdmin in auth/RequireAdmin.tsx.
    • Last-ADMIN guard: the system prevents removal of the last ADMIN role (409 Conflict on role removal, user deletion, and group role removal).
    • Password policy: min 12 chars, 3-of-4 character classes, no username match (enforced on user creation and admin password reset).
    • Brute-force protection: 5 failed attempts -> 15 min lockout (tracked via failed_login_attempts / locked_until on the users table).
    • Token revocation: token_revoked_before column on users, checked in JwtAuthenticationFilter, set on password change.
  • OIDC:
    • Optional external identity provider support (token exchange pattern). Configured via the admin API/UI, stored in the database (server_config table). A configurable userIdClaim (default sub) determines which id_token claim is used as the user identifier.
    • Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when CAMELEER_SERVER_SECURITY_OIDCISSUERURI is set. CAMELEER_SERVER_SECURITY_OIDCJWKSETURI overrides JWKS discovery for container networking. CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY=true disables TLS cert verification for OIDC calls (self-signed CAs).
    • Scope-based role mapping via SystemRole.normalizeScope() (case-insensitive, strips the server: prefix): admin/server:admin -> ADMIN, operator/server:operator -> OPERATOR, viewer/server:viewer -> VIEWER.
    • SSO: when OIDC is enabled, the UI auto-redirects to the provider with prompt=none for silent sign-in; it falls back to /login?local on login_required and retries without prompt=none on consent_required. Logout always redirects to /login?local (via OIDC end_session or a direct fallback) to prevent SSO re-login loops.
    • Auto-signup provisions new OIDC users with default roles. System roles are synced on every OIDC login via syncOidcRoles — it always overwrites directly-assigned roles (falling back to defaultRoles when OIDC returns none) and uses getDirectRolesForUser to avoid touching group-inherited roles. Group memberships are never touched.
    • Supports ES384, ES256, RS256. Shared OIDC logic in OidcProviderHelper (discovery, JWK source, algorithm set).
  • OIDC role extraction: OidcTokenExchanger reads roles from the access_token first (JWT with at+jwt type, decoded by a separate processor), then falls back to id_token. OidcConfig includes audience (RFC 8707 resource indicator — included in both authorization request and token exchange POST body to trigger JWT access tokens) and additionalScopes (extra scopes for the SPA to request). The rolesClaim config points to the claim name in the token (e.g., "roles" for Custom JWT claims, "realm_access.roles" for Keycloak). All provider-specific configuration is external — no provider-specific code in the server.
  • Sensitive keys: Global enforced baseline for masking sensitive data in agent payloads. Admin configures via PUT /api/v1/admin/sensitive-keys (stored in server_config table, key sensitive_keys). Per-app additions stored in ApplicationConfig.sensitiveKeys. Merge rule: final = global UNION per-app (case-insensitive dedup, per-app can only add, never remove global keys). When no config exists, agents use built-in defaults. ApplicationConfigController.getConfig() returns AppConfigResponse wrapping config with globalSensitiveKeys and mergedSensitiveKeys for UI rendering. Config-update SSE payloads carry the merged list. SaaS propagation: platform calls the same admin API on each tenant server (no special protocol).
  • User persistence: PostgreSQL users table, admin CRUD at /api/v1/admin/users
  • Usage analytics: ClickHouse usage_events table tracks authenticated UI requests, flushed every 5s
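
The scope-to-role mapping described under Security/OIDC can be sketched as below. This is illustrative; the real logic lives in SystemRole.normalizeScope(), and the class name and Optional return type here are assumptions:

```java
import java.util.Locale;
import java.util.Optional;

// Sketch of the scope-to-role mapping: case-insensitive, optional "server:"
// prefix stripped, unknown scopes ignored.
public final class ScopeMappingSketch {

    public enum SystemRole { AGENT, VIEWER, OPERATOR, ADMIN }

    public static Optional<SystemRole> normalizeScope(String scope) {
        if (scope == null) return Optional.empty();
        String s = scope.trim().toLowerCase(Locale.ROOT); // case-insensitive
        if (s.startsWith("server:")) {
            s = s.substring("server:".length());          // strip the server: prefix
        }
        return switch (s) {
            case "admin"    -> Optional.of(SystemRole.ADMIN);
            case "operator" -> Optional.of(SystemRole.OPERATOR);
            case "viewer"   -> Optional.of(SystemRole.VIEWER);
            default         -> Optional.empty();          // unknown scopes are ignored
        };
    }

    public static void main(String[] args) {
        System.out.println(normalizeScope("Server:Admin")); // Optional[ADMIN]
    }
}
```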

Database Migrations

PostgreSQL (Flyway): cameleer3-server-app/src/main/resources/db/migration/

  • V1 — RBAC (users, roles, groups, audit_log)
  • V2 — Claim mappings (OIDC)
  • V3 — Runtime management (apps, environments, deployments, app_versions)
  • V4 — Environment config (default_container_config JSONB)
  • V5 — App container config (container_config JSONB on apps)
  • V6 — JAR retention policy (jar_retention_count on environments)
  • V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
  • V8 — Deployment active config (resolved_config JSONB on deployments)
  • V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
  • V10 — Runtime type detection (detected_runtime_type, detected_main_class on app_versions)

ClickHouse: cameleer3-server-app/src/main/resources/clickhouse/init.sql (run idempotently on startup)

CI/CD & Deployment

  • CI workflow: .gitea/workflows/ci.yml — build -> docker -> deploy on push to main or feature branches
  • Build step skips integration tests (-DskipITs) — Testcontainers needs Docker daemon
  • Docker: multi-stage build (Dockerfile), $BUILDPLATFORM for native Maven on ARM64 runner, amd64 runtime. docker-entrypoint.sh imports /certs/ca.pem into JVM truststore before starting the app (supports custom CAs for OIDC discovery without CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY).
  • REGISTRY_TOKEN build arg required for cameleer3-common dependency resolution
  • Registry: gitea.siegeln.net/cameleer/cameleer3-server (container images)
  • K8s manifests in deploy/ — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
  • Deployment target: k3s at 192.168.50.86, namespace cameleer (main), cam-<slug> (feature branches)
  • Feature branches: isolated namespace, PG schema; Traefik Ingress at <slug>-api.cameleer.siegeln.net
  • Secrets managed in CI deploy step (idempotent --dry-run=client | kubectl apply): cameleer-auth, cameleer-postgres-credentials, cameleer-clickhouse-credentials
  • K8s probes: server uses /api/v1/health, PostgreSQL uses pg_isready -U "$POSTGRES_USER" (env var, not hardcoded)
  • K8s security: server and database pods run with securityContext.runAsNonRoot. UI (nginx) runs without securityContext (needs root for entrypoint setup).
  • Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
  • Docker build uses buildx registry cache + --provenance=false for Gitea compatibility
  • CI: branch slug sanitization extracted to .gitea/sanitize-branch.sh, sourced by docker and deploy-feature jobs

UI Structure

The UI has 4 main tabs: Exchanges, Dashboard, Runtime, Deployments.

  • Exchanges — route execution search and detail (ui/src/pages/Exchanges/)
  • Dashboard — metrics and stats with L1/L2/L3 drill-down (ui/src/pages/DashboardTab/)
  • Runtime — live agent status, logs, commands (ui/src/pages/RuntimeTab/)
  • Deployments — app management, JAR upload, deployment lifecycle (ui/src/pages/AppsTab/)
    • Config sub-tabs: Variables | Monitoring | Traces & Taps | Route Recording | Resources
    • Create app: full page at /apps/new (not a modal)
    • Deployment progress: ui/src/components/DeploymentProgress.tsx (7-stage step indicator)

Admin pages (ADMIN-only, under /admin/):

  • Sensitive Keys (ui/src/pages/Admin/SensitiveKeysPage.tsx) — global sensitive key masking config with tag/pill editor, push-to-agents toggle. Per-app additions shown in AppConfigDetailPage.tsx with read-only global pills (greyed Badge) + editable per-app pills (Tag with remove).

Key UI Files

  • ui/src/router.tsx — React Router v6 routes
  • ui/src/config.ts — apiBaseUrl, basePath
  • ui/src/auth/auth-store.ts — Zustand: accessToken, user, roles, login/logout
  • ui/src/api/environment-store.ts — Zustand: selected environment (localStorage)
  • ui/src/components/ContentTabs.tsx — main tab switcher
  • ui/src/components/ExecutionDiagram/ — interactive trace view (canvas)
  • ui/src/components/ProcessDiagram/ — ELK-rendered route diagram
  • ui/src/hooks/useScope.ts — TabKey type, scope inference

UI Styling

  • Always use @cameleer/design-system CSS variables for colors (var(--amber), var(--error), var(--success), etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG fill/stroke attributes; SVG presentation attributes resolve var() correctly.
  • Shared CSS modules in ui/src/styles/ (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
  • Shared PageLoader component replaces copy-pasted spinner patterns.
  • Design system components used consistently: Select, Tabs, Toggle, Button, LogViewer, Label — prefer DS components over raw HTML elements.
  • Environment slugs are auto-computed from display name (read-only in UI).
  • Brand assets: @cameleer/design-system/assets/ provides camel-logo.svg (currentColor), cameleer3-{16,32,48,192,512}.png, and cameleer3-logo.png. Copied to ui/public/ for use as favicon (favicon-16.png, favicon-32.png) and logo (camel-logo.svg — login dialog 36px, sidebar 28x24px).
  • Sidebar generates /exchanges/ paths directly (no legacy /apps/ redirects). basePath is centralized in ui/src/config.ts; router.tsx imports it instead of re-reading <base> tag.
  • Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. ?text= search query). Switching environment resets all filters and remounts pages.

Docker Orchestration

When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

  • ConfigMerger (core/runtime/ConfigMerger.java) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes runtimeType (default "auto") and customArgs (default "").
  • TraefikLabelBuilder (app/runtime/TraefikLabelBuilder.java) — generates Traefik Docker labels for path-based (/{envSlug}/{appSlug}/) or subdomain-based ({appSlug}-{envSlug}.{domain}) routing. Supports strip-prefix and SSL offloading toggles.
  • PrometheusLabelBuilder (app/runtime/PrometheusLabelBuilder.java) — generates Prometheus docker_sd_configs labels per resolved runtime type: Spring Boot /actuator/prometheus:8081, Quarkus/native /q/metrics:9000, plain Java /metrics:9464. Labels merged into container metadata alongside Traefik labels at deploy time.
  • DockerNetworkManager (app/runtime/DockerNetworkManager.java) — manages two Docker network tiers:
    • cameleer-traefik — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with cameleer3-server DNS alias.
    • cameleer-env-{slug} — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: cameleer-env-{tenantId}-{envSlug} (overloaded envNetworkName(tenantId, envSlug) method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
  • DockerEventMonitor (app/runtime/DockerEventMonitor.java) — persistent Docker event stream listener for containers with managed-by=cameleer3-server label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
  • DeploymentProgress (ui/src/components/DeploymentProgress.tsx) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
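
The three-layer ConfigMerger precedence can be sketched like this, with a reduced field set for illustration. The real resolve() covers the full ResolvedContainerConfig; this sketch assumes the global layer supplies a value for every field (it holds the application.yml defaults):

```java
// Sketch of the three-layer merge: global -> environment -> app, where a later
// layer's non-null value overrides an earlier one.
public final class ConfigMergeSketch {

    public record Layer(Integer memoryLimitMb, Integer replicas, String runtimeType) {}
    public record Resolved(int memoryLimitMb, int replicas, String runtimeType) {}

    public static Resolved resolve(Layer global, Layer env, Layer app) {
        return new Resolved(
                pick(global.memoryLimitMb(), env.memoryLimitMb(), app.memoryLimitMb()),
                pick(global.replicas(), env.replicas(), app.replicas()),
                pick(global.runtimeType(), env.runtimeType(), app.runtimeType()));
    }

    @SafeVarargs
    private static <T> T pick(T... layers) {
        T value = null;
        for (T layer : layers) {
            if (layer != null) value = layer; // last non-null wins (app > env > global)
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                new Layer(512, 1, "auto"),   // global defaults
                new Layer(1024, null, null), // environment override
                new Layer(null, 2, null)));  // app override
    }
}
```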

Deployment Status Model

Deployments move through these statuses:

Status    Meaning
STOPPED   Intentionally stopped or initial state
STARTING  Deploy in progress
RUNNING   All replicas healthy and serving
DEGRADED  Some replicas healthy, some dead
STOPPING  Graceful shutdown in progress
FAILED    Terminal failure (pre-flight, health check, or crash)

Replica support: deployments can specify a replica count. DEGRADED is used when at least one but not all replicas are healthy.
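
A minimal sketch of how replica health could map to these statuses. This is illustrative only: in the real server the transitions are driven by DeploymentExecutor and DockerEventMonitor, and STARTING/STOPPING/STOPPED come from lifecycle actions rather than health counts:

```java
// Sketch: derive a status from replica health counts, per the table above.
public final class ReplicaStatusSketch {

    public enum DeploymentStatus { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    public static DeploymentStatus fromReplicaHealth(int healthy, int total) {
        if (total <= 0 || healthy < 0 || healthy > total) {
            throw new IllegalArgumentException("invalid replica counts");
        }
        if (healthy == total) return DeploymentStatus.RUNNING;  // all replicas serving
        if (healthy > 0)      return DeploymentStatus.DEGRADED; // at least one, not all
        return DeploymentStatus.FAILED;                         // no replica alive
    }

    public static void main(String[] args) {
        System.out.println(fromReplicaHealth(1, 3)); // DEGRADED
    }
}
```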

Deploy stages (DeployStage): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).

Blue/green strategy: when re-deploying, new replicas are started and health-checked before the old ones are stopped, minimizing downtime.

Deployment uniqueness: DeploymentService.createDeployment() deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.

JAR Management

  • Retention policy per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
  • Nightly cleanup job (JarRetentionJob, Spring @Scheduled 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
  • Volume-based JAR mounting for Docker-in-Docker setups: set CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).

Runtime Type Detection

The server detects the app framework from uploaded JARs and builds framework-specific Docker entrypoints:

  • Detection (RuntimeDetector): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes META-INF/MANIFEST.MF Main-Class: Spring Boot loader prefix → spring-boot, Quarkus entry point → quarkus, other Main-Class → plain-java (extracts class name). Results stored on AppVersion (detected_runtime_type, detected_main_class).
  • Runtime types (RuntimeType enum): AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE. Configurable per app/environment via containerConfig.runtimeType (default "auto").
  • Entrypoint per type: Spring Boot uses PropertiesLauncher with -Dloader.path for log appender; Quarkus uses -jar (appender compiled in); plain Java uses classpath with appender JAR; native runs binary directly (agent compiled in). All JVM types get -javaagent:/app/agent.jar.
  • Custom arguments (containerConfig.customArgs): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses sh -c).
  • AUTO resolution: at deploy time (PRE_FLIGHT), "auto" resolves to the detected type from AppVersion. Fails deployment if detection was unsuccessful — user must set type explicitly.
  • UI: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
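
The detection order can be sketched roughly as follows. The Spring Boot and Quarkus launcher class prefixes below are assumptions made for the sketch; the real RuntimeDetector reads the actual JAR manifest:

```java
// Sketch of the detection order: ZIP magic bytes first, then Main-Class probing.
public final class RuntimeDetectionSketch {

    /** @param header    first bytes of the uploaded file
     *  @param mainClass Main-Class from META-INF/MANIFEST.MF, or null if absent
     *  @return detected runtime type, or null when detection is unsuccessful */
    public static String detect(byte[] header, String mainClass) {
        if (header.length < 2 || header[0] != 'P' || header[1] != 'K') {
            return "native"; // non-ZIP magic bytes: treat as a native binary
        }
        if (mainClass == null) {
            return null; // no Main-Class: detection fails, user must set the type
        }
        if (mainClass.startsWith("org.springframework.boot.loader.")) {
            return "spring-boot"; // Spring Boot loader launcher (assumed prefix)
        }
        if (mainClass.startsWith("io.quarkus.")) {
            return "quarkus"; // Quarkus entry point (assumed prefix)
        }
        return "plain-java"; // any other Main-Class: plain Java, class name recorded
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[]{'P', 'K', 3, 4}, "com.example.Main")); // plain-java
    }
}
```

Returning null for an absent Main-Class mirrors the AUTO-resolution rule above: a deployment with runtimeType "auto" fails when detection was unsuccessful.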

SaaS Multi-Tenant Network Isolation

In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:

  • Tenant network (cameleer-tenant-{slug}) — primary internal bridge for all of a tenant's containers. Set as CAMELEER_SERVER_RUNTIME_DOCKERNETWORK for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
  • Shared services network — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and cameleer-traefik for HTTP routing.
  • Tenant-scoped environment networks (cameleer-env-{tenantId}-{envSlug}) — per-environment discovery is scoped per tenant, so alpha-corp's "dev" environment network is separate from beta-corp's "dev" environment network.

nginx / Reverse Proxy

  • client_max_body_size 200m is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
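
A minimal fragment showing where the directive belongs; server_name and the upstream target are placeholders for this illustration, not values from the repo:

```nginx
server {
    listen 80;
    server_name cameleer.example.com;

    # Allow JAR uploads up to 200 MB (the nginx default is 1m, which returns 413)
    client_max_body_size 200m;

    location / {
        proxy_pass http://cameleer3-server:8080;
    }
}
```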

Prometheus Metrics

Server exposes /api/v1/prometheus (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and http.server.requests metrics automatically. Business metrics via ServerMetrics component:

Gauges (auto-polled):

Metric                                  Tags                                        Source
cameleer.agents.connected               state (live, stale, dead, shutdown)         AgentRegistryService.findByState()
cameleer.agents.sse.active              —                                           SseConnectionManager.getConnectionCount()
cameleer.ingestion.buffer.size          type (execution, processor, log, metrics)   WriteBuffer.size()
cameleer.ingestion.accumulator.pending  —                                           ChunkAccumulator.getPendingCount()

Counters:

Metric                        Tags                                            Instrumented in
cameleer.ingestion.drops      reason (buffer_full, no_agent, no_identity)     LogIngestionController
cameleer.agents.transitions   transition (went_stale, went_dead, recovered)   AgentLifecycleMonitor
cameleer.deployments.outcome  status (running, failed, degraded)              DeploymentExecutor
cameleer.auth.failures        reason (invalid_token, revoked, oidc_rejected)  JwtAuthenticationFilter

Timers:

Metric                             Tags                              Instrumented in
cameleer.ingestion.flush.duration  type (execution, processor, log)  ExecutionFlushScheduler
cameleer.deployments.duration      —                                 DeploymentExecutor

Agent container Prometheus labels (set by PrometheusLabelBuilder at deploy time):

Runtime Type      prometheus.path       prometheus.port
spring-boot       /actuator/prometheus  8081
quarkus / native  /q/metrics            9000
plain-java        /metrics              9464

All containers also get prometheus.scrape=true. These labels enable Prometheus docker_sd_configs auto-discovery.
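
The table above can be sketched as a label map per runtime type. The real implementation is PrometheusLabelBuilder; this standalone class and its method name are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: Prometheus Docker labels per resolved runtime type, per the table above.
public final class PrometheusLabelSketch {

    public static Map<String, String> labelsFor(String runtimeType) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("prometheus.scrape", "true"); // every managed container is a scrape target
        switch (runtimeType) {
            case "spring-boot" -> {
                labels.put("prometheus.path", "/actuator/prometheus");
                labels.put("prometheus.port", "8081");
            }
            case "quarkus", "native" -> {
                labels.put("prometheus.path", "/q/metrics");
                labels.put("prometheus.port", "9000");
            }
            case "plain-java" -> {
                labels.put("prometheus.path", "/metrics");
                labels.put("prometheus.port", "9464");
            }
            default -> throw new IllegalArgumentException("unknown runtime type: " + runtimeType);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labelsFor("quarkus"));
    }
}
```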

Agent Metric Names (Micrometer)

Agents send MetricsSnapshot records with Micrometer-convention metric names. The server stores them generically (ClickHouse agent_metrics.metric_name). The UI references specific names in AgentInstance.tsx for JVM charts.

JVM metrics (used by UI):

Metric name              UI usage
process.cpu.usage.value  CPU % stat card + chart
jvm.memory.used.value    Heap MB stat card + chart (tags: area=heap)
jvm.memory.max.value     Heap max for % calculation (tags: area=heap)
jvm.threads.live.value   Thread count chart
jvm.gc.pause.total_time  GC time chart

Camel route metrics (stored, queried by dashboard):

| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | routeId, camelContext |
| `camel.exchanges.failed.count` | counter | routeId, camelContext |
| `camel.exchanges.total.count` | counter | routeId, camelContext |
| `camel.exchanges.failures.handled.count` | counter | routeId, camelContext |
| `camel.route.policy.count` | count | routeId, camelContext |
| `camel.route.policy.total_time` | total | routeId, camelContext |
| `camel.route.policy.max` | gauge | routeId, camelContext |
| `camel.routes.running.value` | gauge | — |

Mean processing time = camel.route.policy.total_time / camel.route.policy.count. Min processing time is not available (Micrometer does not track minimums).
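A minimal sketch of that mean computation (the counter readings are hypothetical sample values):

```java
// Mean route processing time from the two camel.route.policy series.
public class MeanProcessingTime {
    public static void main(String[] args) {
        double totalTimeMs = 1250.0; // camel.route.policy.total_time (ms)
        double count = 500.0;        // camel.route.policy.count

        // Guard against a route that has not processed any exchange yet.
        double meanMs = count > 0 ? totalTimeMs / count : 0.0;
        System.out.println(meanMs); // 2.5
    }
}
```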

Cameleer agent metrics:

| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | instanceId |
| `cameleer.chunks.dropped.count` | counter | instanceId, reason |
| `cameleer.sse.reconnects.count` | counter | instanceId |
| `cameleer.taps.evaluated.count` | counter | instanceId |
| `cameleer.metrics.exported.count` | counter | instanceId |

Disabled Skills

  • Do NOT use any gsd:* skills in this project. This includes all /gsd: prefixed commands.

GitNexus — Code Intelligence

This project is indexed by GitNexus as cameleer3-server (6155 symbols, 15501 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

If any GitNexus tool warns that the index is stale, run `npx gitnexus analyze` in a terminal first.

Always Do

  • MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run gitnexus_impact({target: "symbolName", direction: "upstream"}) and report the blast radius (direct callers, affected processes, risk level) to the user.
  • MUST run gitnexus_detect_changes() before committing to verify your changes only affect expected symbols and execution flows.
  • MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
  • When exploring unfamiliar code, use gitnexus_query({query: "concept"}) to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
  • When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use gitnexus_context({name: "symbolName"}).

When Debugging

  1. gitnexus_query({query: "<error or symptom>"}) — find execution flows related to the issue
  2. gitnexus_context({name: "<suspect function>"}) — see all callers, callees, and process participation
  3. READ gitnexus://repo/cameleer3-server/process/{processName} — trace the full execution flow step by step
  4. For regressions: gitnexus_detect_changes({scope: "compare", base_ref: "main"}) — see what your branch changed

When Refactoring

  • Renaming: MUST use gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true}) first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with dry_run: false.
  • Extracting/Splitting: MUST run gitnexus_context({name: "target"}) to see all incoming/outgoing refs, then gitnexus_impact({target: "target", direction: "upstream"}) to find all external callers before moving code.
  • After any refactor: run gitnexus_detect_changes({scope: "all"}) to verify only expected files changed.

Never Do

  • NEVER edit a function, class, or method without first running gitnexus_impact on it.
  • NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
  • NEVER rename symbols with find-and-replace — use gitnexus_rename which understands the call graph.
  • NEVER commit changes without running gitnexus_detect_changes() to check affected scope.

Tools Quick Reference

| Tool | When to use | Command |
|---|---|---|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |

Impact Risk Levels

| Depth | Meaning | Action |
|---|---|---|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |

Resources

| Resource | Use for |
|---|---|
| `gitnexus://repo/cameleer3-server/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer3-server/clusters` | All functional areas |
| `gitnexus://repo/cameleer3-server/processes` | All execution flows |
| `gitnexus://repo/cameleer3-server/process/{name}` | Step-by-step execution trace |

Self-Check Before Finishing

Before completing any code modification task, verify:

  1. gitnexus_impact was run for all modified symbols
  2. No HIGH/CRITICAL risk warnings were ignored
  3. gitnexus_detect_changes() confirms changes match expected scope
  4. All d=1 (WILL BREAK) dependents were updated

Keeping the Index Fresh

After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:

```
npx gitnexus analyze
```

If the index previously included embeddings, preserve them by adding --embeddings:

```
npx gitnexus analyze --embeddings
```

To check whether embeddings exist, inspect .gitnexus/meta.json — the stats.embeddings field shows the count (0 means no embeddings). Running analyze without --embeddings will delete any previously generated embeddings.
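A hypothetical pre-flight check for that field (this parses a sample document inline; in the repo, read `.gitnexus/meta.json` itself):

```shell
# Sample meta.json content; the real file lives at .gitnexus/meta.json.
meta='{"stats":{"symbols":6155,"relationships":15501,"embeddings":0}}'
embeddings=$(printf '%s' "$meta" | python3 -c 'import json,sys; print(json.load(sys.stdin)["stats"]["embeddings"])')
if [ "$embeddings" -eq 0 ]; then
  echo "no embeddings: plain 'npx gitnexus analyze' is safe"
else
  echo "embeddings present: use 'npx gitnexus analyze --embeddings'"
fi
```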

Claude Code users: A PostToolUse hook handles this automatically after git commit and git merge.

CLI

| Task | Read this skill file |
|---|---|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |