CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project

Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.

  • cameleer3 (https://gitea.siegeln.net/cameleer/cameleer3) — the Java agent that instruments Camel applications
  • Protocol defined in cameleer3-common/PROTOCOL.md in the agent repo
  • This server depends on com.cameleer3:cameleer3-common (shared models and graph API)

Modules

  • cameleer3-server-core — domain logic, storage interfaces, services (no Spring dependencies)
  • cameleer3-server-app — Spring Boot web app, REST controllers, SSE, persistence, Docker orchestration

Build Commands

mvn clean compile          # Compile all modules
mvn clean verify           # Full build with tests

Run

java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar

Key Classes by Package

Core Module (cameleer3-server-core/src/main/java/com/cameleer3/server/core/)

agent/ — Agent lifecycle and commands

  • AgentRegistryService — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
  • AgentInfo — record: id, name, application, environmentId, version, routeIds, capabilities, state
  • AgentCommand — record: id, type, targetAgent, payload, createdAt, expiresAt
  • AgentEventService — records agent state changes, heartbeats

runtime/ — App/Environment/Deployment domain

  • App — record: id, environmentId, slug, displayName, containerConfig (JSONB)
  • AppVersion — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
  • Environment — record: id, slug, jarRetentionCount
  • Deployment — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
  • DeploymentStatus — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
  • DeployStage — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
  • DeploymentService — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
  • RuntimeType — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
  • RuntimeDetector — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
  • ContainerRequest — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
  • ResolvedContainerConfig — record: typed config with memoryLimitMb, cpuShares, cpuLimit, appPort, replicas, routingMode, routeControlEnabled, replayEnabled, runtimeType, customArgs, etc.
  • ConfigMerger — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
  • RuntimeOrchestrator — interface: startContainer, stopContainer, getContainerStatus, getLogs

search/ — Execution search

  • SearchService — search, topErrors, punchcard, distinctAttributeKeys
  • SearchRequest / SearchResult — search DTOs

storage/ — Storage abstractions

  • ExecutionStore, MetricsStore, DiagramStore, SearchIndex, LogIndex — interfaces

rbac/ — Role-based access control

  • RbacService — getDirectRolesForUser, syncOidcRoles, assignRole
  • SystemRole — enum: AGENT, VIEWER, OPERATOR, ADMIN; normalizeScope() maps scopes
  • UserDetail, RoleDetail, GroupDetail — records

admin/ — Server-wide admin config

  • SensitiveKeysConfig — record: keys (List, immutable)
  • SensitiveKeysRepository — interface: find(), save()
  • SensitiveKeysMerger — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
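
The merge rule above can be sketched as follows. This is an illustrative re-implementation, not the actual SensitiveKeysMerger source; the class name and method shape are assumptions:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of the merge rule: union of global and per-app keys, case-insensitive
// dedup, first-seen casing preserved, null when neither layer is configured.
public final class SensitiveKeysMergerSketch {

    public static List<String> merge(List<String> global, List<String> perApp) {
        if (global == null && perApp == null) {
            return null; // nothing configured: agents fall back to built-in defaults
        }
        Set<String> seen = new HashSet<>();
        List<String> merged = new ArrayList<>();
        addLayer(global, seen, merged); // global baseline first, so its casing wins
        addLayer(perApp, seen, merged); // per-app keys can only add, never remove
        return merged;
    }

    private static void addLayer(List<String> layer, Set<String> seen, List<String> out) {
        if (layer == null) return;
        for (String key : layer) {
            if (seen.add(key.toLowerCase(Locale.ROOT))) {
                out.add(key); // first occurrence keeps its original casing
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("Password", "token"),
                                 List.of("PASSWORD", "apiKey")));
        // [Password, token, apiKey]
    }
}
```

Note that ordering the global layer first is what makes its casing win on case-insensitive collisions.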

security/ — Auth

  • JwtService — interface: createAccessToken, validateAccessToken
  • Ed25519SigningService — interface: sign, verify (config signing)
  • OidcConfig — record: issuerUri, clientId, audience, rolesClaim, additionalScopes

ingestion/ — Buffered data pipeline

  • IngestionService — ingestExecution, ingestMetric, ingestLog, ingestDiagram
  • ChunkAccumulator — batches data for efficient flush

App Module (cameleer3-server-app/src/main/java/com/cameleer3/server/app/)

controller/ — REST endpoints

  • AgentRegistrationController — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
  • AgentSseController — GET /sse (Server-Sent Events connection)
  • AgentCommandController — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
  • AppController — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
  • DeploymentController — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
  • EnvironmentAdminController — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
  • ExecutionController — GET /api/v1/executions (search + detail)
  • SearchController — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
  • LogQueryController — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
  • LogIngestionController — POST /api/v1/data/logs (accepts List<LogEntry> JSON array, each entry has source: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
  • CatalogController — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than discoveryttldays with no live agents.
  • ChunkIngestionController — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
  • UserAdminController — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
  • RoleAdminController — CRUD /api/v1/admin/roles
  • GroupAdminController — CRUD /api/v1/admin/groups
  • OidcConfigAdminController — GET/POST /api/v1/admin/oidc, POST /test
  • SensitiveKeysAdminController — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts { keys: [...] } with optional ?pushToAgents=true to fan out merged keys to all LIVE agents. Stored in server_config table (key sensitive_keys).
  • AuditLogController — GET /api/v1/admin/audit
  • MetricsController — GET /api/v1/metrics, GET /timeseries
  • DiagramController — GET /api/v1/diagrams/{id}, POST /
  • DiagramRenderController — POST /api/v1/diagrams/render (ELK layout)
  • LicenseAdminController — GET/POST /api/v1/admin/license

runtime/ — Docker orchestration

  • DockerRuntimeOrchestrator — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
  • DeploymentExecutor — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE.
    • Primary network for app containers is set via the CAMELEER_SERVER_RUNTIME_DOCKERNETWORK env var (in SaaS mode: cameleer-tenant-{slug}); apps also connect to cameleer-traefik (routing) and cameleer-env-{tenantId}-{envSlug} (per-environment discovery) as additional networks.
    • Resolves runtimeType "auto" to a concrete type from AppVersion.detectedRuntimeType at PRE_FLIGHT (fails the deployment if unresolvable).
    • Builds a framework-specific Docker entrypoint per runtime type (Spring Boot PropertiesLauncher, Quarkus -jar, plain Java classpath, native binary).
    • Sets CAMELEER_AGENT_* env vars from ResolvedContainerConfig (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires a redeployment.
  • DockerNetworkManager — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
  • DockerEventMonitor — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
  • TraefikLabelBuilder — generates Traefik Docker labels for path-based or subdomain routing
  • PrometheusLabelBuilder — generates Prometheus Docker labels (prometheus.scrape/path/port) per runtime type for docker_sd_configs auto-discovery
  • DisabledRuntimeOrchestrator — no-op when runtime not enabled

metrics/ — Prometheus observability

  • ServerMetrics — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via /api/v1/prometheus.

storage/ — PostgreSQL repositories (JdbcTemplate)

  • PostgresAppRepository, PostgresAppVersionRepository, PostgresEnvironmentRepository
  • PostgresDeploymentRepository — includes JSONB replica_states, deploy_stage, findByContainerId
  • PostgresUserRepository, PostgresRoleRepository, PostgresGroupRepository
  • PostgresAuditRepository, PostgresOidcConfigRepository, PostgresClaimMappingRepository, PostgresSensitiveKeysRepository

storage/ — ClickHouse stores

  • ClickHouseExecutionStore, ClickHouseMetricsStore, ClickHouseLogStore
  • ClickHouseStatsStore — pre-aggregated stats, punchcard
  • ClickHouseDiagramStore, ClickHouseAgentEventRepository
  • ClickHouseSearchIndex — full-text search
  • ClickHouseUsageTracker — usage_events for billing

security/ — Spring Security

  • SecurityConfig — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
  • JwtAuthenticationFilter — OncePerRequestFilter, validates Bearer tokens
  • JwtServiceImpl — HMAC-SHA256 JWT (Nimbus JOSE)
  • OidcAuthController — /api/v1/auth/oidc (login-uri, token-exchange, logout)
  • OidcTokenExchanger — code -> tokens, role extraction from access_token then id_token
  • OidcProviderHelper — OIDC discovery, JWK source cache

agent/ — Agent lifecycle

  • SseConnectionManager — manages per-agent SSE connections, delivers commands
  • AgentLifecycleMonitor — @Scheduled 10s, LIVE->STALE->DEAD transitions

retention/ — JAR cleanup

  • JarRetentionJob — @Scheduled 03:00 daily, per-environment retention, skips deployed versions

config/ — Spring beans

  • RuntimeOrchestratorAutoConfig — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
  • RuntimeBeanConfig — DeploymentExecutor, AppService, EnvironmentService
  • SecurityBeanConfig — JwtService, Ed25519, BootstrapTokenValidator
  • StorageBeanConfig — all repositories
  • ClickHouseConfig — ClickHouse JdbcTemplate, schema initializer

Key Conventions

  • Java 17+ required
  • Spring Boot 3.4.3 parent POM
  • Depends on com.cameleer3:cameleer3-common from Gitea Maven registry
  • Jackson JavaTimeModule for Instant deserialization
  • Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
  • Environment filtering: all data queries (exchanges, dashboard stats, route metrics, agent events, correlation) filter by the selected environment. All commands (config-update, route-control, set-traced-processors, replay) target only agents in the selected environment when one is selected. AgentRegistryService.findByApplicationAndEnvironment() for environment-scoped command dispatch. Backend endpoints accept optional environment query parameter; null = all environments (backward compatible).
  • Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT env claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat environmentId > JWT env claim > "default"). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when registry has incomplete data.
  • Multi-tenancy: each server instance serves one tenant (configured via CAMELEER_SERVER_TENANT_ID, default: "default"). Environments (dev/staging/prod) are first-class — agents send environmentId at registration and in heartbeats. JWT carries env claim for environment persistence across token refresh. PostgreSQL isolated via schema-per-tenant (?currentSchema=tenant_{id}). ClickHouse shared DB with tenant_id + environment columns, partitioned by (tenant_id, toYYYYMM(timestamp)).
  • Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in clickhouse/*.sql, run idempotently on startup by ClickHouseSchemaInitializer. Use IF NOT EXISTS for CREATE and ADD PROJECTION.
  • Logging: ClickHouse JDBC set to INFO (com.clickhouse), HTTP client to WARN (org.apache.hc.client5) in application.yml
  • Security:
    • JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from the JWT secret via HMAC-SHA256), bootstrap token for registration.
    • CORS: CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS (comma-separated) overrides CAMELEER_SERVER_SECURITY_UIORIGIN for multi-origin setups (e.g., reverse proxy).
    • Infrastructure access: CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false disables the Database and ClickHouse admin endpoints (set by the SaaS provisioner on tenant servers). The health endpoint exposes the flag for UI tab visibility.
    • UI role gating: Admin sidebar/routes hidden for non-ADMIN; diagram toolbar and route control hidden for VIEWER. Read-only for VIEWER, editable for OPERATOR+. Role helpers: useIsAdmin(), useCanControl() in auth-store.ts. Route guard: RequireAdmin in auth/RequireAdmin.tsx.
    • Last-ADMIN guard: the system prevents removal of the last ADMIN role (409 Conflict on role removal, user deletion, and group role removal).
    • Password policy: min 12 chars, 3-of-4 character classes, no username match (enforced on user creation and admin password reset).
    • Brute-force protection: 5 failed attempts -> 15 min lockout (tracked via failed_login_attempts / locked_until on the users table).
    • Token revocation: token_revoked_before column on users, checked in JwtAuthenticationFilter, set on password change.
  • OIDC:
    • Optional external identity provider support (token exchange pattern). Configured via the admin API/UI, stored in the database (server_config table). A configurable userIdClaim (default sub) determines which id_token claim is used as the user identifier.
    • Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when CAMELEER_SERVER_SECURITY_OIDCISSUERURI is set. CAMELEER_SERVER_SECURITY_OIDCJWKSETURI overrides JWKS discovery for container networking. CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY=true disables TLS cert verification for OIDC calls (self-signed CAs).
    • Scope-based role mapping via SystemRole.normalizeScope() (case-insensitive, strips the server: prefix): admin/server:admin -> ADMIN, operator/server:operator -> OPERATOR, viewer/server:viewer -> VIEWER.
    • SSO: when OIDC is enabled, the UI auto-redirects to the provider with prompt=none for silent sign-in; it falls back to /login?local on login_required and retries without prompt=none on consent_required. Logout always redirects to /login?local (via OIDC end_session or a direct fallback) to prevent SSO re-login loops.
    • Auto-signup provisions new OIDC users with default roles. System roles are synced on every OIDC login via syncOidcRoles — it always overwrites directly-assigned roles (falling back to defaultRoles when OIDC returns none) and uses getDirectRolesForUser to avoid touching group-inherited roles. Group memberships are never touched.
    • Supports ES384, ES256, RS256. Shared OIDC logic in OidcProviderHelper (discovery, JWK source, algorithm set).
  • OIDC role extraction: OidcTokenExchanger reads roles from the access_token first (JWT with at+jwt type, decoded by a separate processor), then falls back to id_token. OidcConfig includes audience (RFC 8707 resource indicator — included in both authorization request and token exchange POST body to trigger JWT access tokens) and additionalScopes (extra scopes for the SPA to request). The rolesClaim config points to the claim name in the token (e.g., "roles" for Custom JWT claims, "realm_access.roles" for Keycloak). All provider-specific configuration is external — no provider-specific code in the server.
  • Sensitive keys: Global enforced baseline for masking sensitive data in agent payloads. Admin configures via PUT /api/v1/admin/sensitive-keys (stored in server_config table, key sensitive_keys). Per-app additions stored in ApplicationConfig.sensitiveKeys. Merge rule: final = global UNION per-app (case-insensitive dedup, per-app can only add, never remove global keys). When no config exists, agents use built-in defaults. ApplicationConfigController.getConfig() returns AppConfigResponse wrapping config with globalSensitiveKeys and mergedSensitiveKeys for UI rendering. Config-update SSE payloads carry the merged list. SaaS propagation: platform calls the same admin API on each tenant server (no special protocol).
  • User persistence: PostgreSQL users table, admin CRUD at /api/v1/admin/users
  • Usage analytics: ClickHouse usage_events table tracks authenticated UI requests, flushed every 5s
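
The scope-to-role mapping described under Security/OIDC can be sketched as below. This is illustrative; the real logic lives in SystemRole.normalizeScope(), and the class name and Optional return type here are assumptions:

```java
import java.util.Locale;
import java.util.Optional;

// Sketch of the scope-to-role mapping: case-insensitive, optional "server:"
// prefix stripped, unknown scopes ignored.
public final class ScopeMappingSketch {

    public enum SystemRole { AGENT, VIEWER, OPERATOR, ADMIN }

    public static Optional<SystemRole> normalizeScope(String scope) {
        if (scope == null) return Optional.empty();
        String s = scope.trim().toLowerCase(Locale.ROOT); // case-insensitive
        if (s.startsWith("server:")) {
            s = s.substring("server:".length());          // strip the server: prefix
        }
        return switch (s) {
            case "admin"    -> Optional.of(SystemRole.ADMIN);
            case "operator" -> Optional.of(SystemRole.OPERATOR);
            case "viewer"   -> Optional.of(SystemRole.VIEWER);
            default         -> Optional.empty();          // unknown scopes are ignored
        };
    }

    public static void main(String[] args) {
        System.out.println(normalizeScope("Server:Admin")); // Optional[ADMIN]
    }
}
```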

Database Migrations

PostgreSQL (Flyway): cameleer3-server-app/src/main/resources/db/migration/

  • V1 — RBAC (users, roles, groups, audit_log)
  • V2 — Claim mappings (OIDC)
  • V3 — Runtime management (apps, environments, deployments, app_versions)
  • V4 — Environment config (default_container_config JSONB)
  • V5 — App container config (container_config JSONB on apps)
  • V6 — JAR retention policy (jar_retention_count on environments)
  • V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
  • V8 — Deployment active config (resolved_config JSONB on deployments)
  • V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
  • V10 — Runtime type detection (detected_runtime_type, detected_main_class on app_versions)

ClickHouse: cameleer3-server-app/src/main/resources/clickhouse/init.sql (run idempotently on startup)

CI/CD & Deployment

  • CI workflow: .gitea/workflows/ci.yml — build -> docker -> deploy on push to main or feature branches
  • Build step skips integration tests (-DskipITs) — Testcontainers needs Docker daemon
  • Docker: multi-stage build (Dockerfile), $BUILDPLATFORM for native Maven on ARM64 runner, amd64 runtime. docker-entrypoint.sh imports /certs/ca.pem into JVM truststore before starting the app (supports custom CAs for OIDC discovery without CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY).
  • REGISTRY_TOKEN build arg required for cameleer3-common dependency resolution
  • Registry: gitea.siegeln.net/cameleer/cameleer3-server (container images)
  • K8s manifests in deploy/ — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
  • Deployment target: k3s at 192.168.50.86, namespace cameleer (main), cam-<slug> (feature branches)
  • Feature branches: isolated namespace, PG schema; Traefik Ingress at <slug>-api.cameleer.siegeln.net
  • Secrets managed in CI deploy step (idempotent --dry-run=client | kubectl apply): cameleer-auth, cameleer-postgres-credentials, cameleer-clickhouse-credentials
  • K8s probes: server uses /api/v1/health, PostgreSQL uses pg_isready -U "$POSTGRES_USER" (env var, not hardcoded)
  • K8s security: server and database pods run with securityContext.runAsNonRoot. UI (nginx) runs without securityContext (needs root for entrypoint setup).
  • Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
  • Docker build uses buildx registry cache + --provenance=false for Gitea compatibility
  • CI: branch slug sanitization extracted to .gitea/sanitize-branch.sh, sourced by docker and deploy-feature jobs

UI Structure

The UI has 4 main tabs: Exchanges, Dashboard, Runtime, Deployments.

  • Exchanges — route execution search and detail (ui/src/pages/Exchanges/)
  • Dashboard — metrics and stats with L1/L2/L3 drill-down (ui/src/pages/DashboardTab/)
  • Runtime — live agent status, logs, commands (ui/src/pages/RuntimeTab/)
  • Deployments — app management, JAR upload, deployment lifecycle (ui/src/pages/AppsTab/)
    • Config sub-tabs: Variables | Monitoring | Traces & Taps | Route Recording | Resources
    • Create app: full page at /apps/new (not a modal)
    • Deployment progress: ui/src/components/DeploymentProgress.tsx (7-stage step indicator)

Admin pages (ADMIN-only, under /admin/):

  • Sensitive Keys (ui/src/pages/Admin/SensitiveKeysPage.tsx) — global sensitive key masking config with tag/pill editor, push-to-agents toggle. Per-app additions shown in AppConfigDetailPage.tsx with read-only global pills (greyed Badge) + editable per-app pills (Tag with remove).

Key UI Files

  • ui/src/router.tsx — React Router v6 routes
  • ui/src/config.ts — apiBaseUrl, basePath
  • ui/src/auth/auth-store.ts — Zustand: accessToken, user, roles, login/logout
  • ui/src/api/environment-store.ts — Zustand: selected environment (localStorage)
  • ui/src/components/ContentTabs.tsx — main tab switcher
  • ui/src/components/ExecutionDiagram/ — interactive trace view (canvas)
  • ui/src/components/ProcessDiagram/ — ELK-rendered route diagram
  • ui/src/hooks/useScope.ts — TabKey type, scope inference

UI Styling

  • Always use @cameleer/design-system CSS variables for colors (var(--amber), var(--error), var(--success), etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG fill/stroke attributes; SVG presentation attributes resolve var() correctly.
  • Shared CSS modules in ui/src/styles/ (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
  • Shared PageLoader component replaces copy-pasted spinner patterns.
  • Design system components used consistently: Select, Tabs, Toggle, Button, LogViewer, Label — prefer DS components over raw HTML elements.
  • Environment slugs are auto-computed from display name (read-only in UI).
  • Brand assets: @cameleer/design-system/assets/ provides camel-logo.svg (currentColor), cameleer3-{16,32,48,192,512}.png, and cameleer3-logo.png. Copied to ui/public/ for use as favicon (favicon-16.png, favicon-32.png) and logo (camel-logo.svg — login dialog 36px, sidebar 28x24px).
  • Sidebar generates /exchanges/ paths directly (no legacy /apps/ redirects). basePath is centralized in ui/src/config.ts; router.tsx imports it instead of re-reading <base> tag.
  • Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. ?text= search query). Switching environment resets all filters and remounts pages.

Docker Orchestration

When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:

  • ConfigMerger (core/runtime/ConfigMerger.java) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes runtimeType (default "auto") and customArgs (default "").
  • TraefikLabelBuilder (app/runtime/TraefikLabelBuilder.java) — generates Traefik Docker labels for path-based (/{envSlug}/{appSlug}/) or subdomain-based ({appSlug}-{envSlug}.{domain}) routing. Supports strip-prefix and SSL offloading toggles.
  • PrometheusLabelBuilder (app/runtime/PrometheusLabelBuilder.java) — generates Prometheus docker_sd_configs labels per resolved runtime type: Spring Boot /actuator/prometheus:8081, Quarkus/native /q/metrics:9000, plain Java /metrics:9464. Labels merged into container metadata alongside Traefik labels at deploy time.
  • DockerNetworkManager (app/runtime/DockerNetworkManager.java) — manages two Docker network tiers:
    • cameleer-traefik — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with cameleer3-server DNS alias.
    • cameleer-env-{slug} — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: cameleer-env-{tenantId}-{envSlug} (overloaded envNetworkName(tenantId, envSlug) method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
  • DockerEventMonitor (app/runtime/DockerEventMonitor.java) — persistent Docker event stream listener for containers with managed-by=cameleer3-server label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
  • DeploymentProgress (ui/src/components/DeploymentProgress.tsx) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
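
The three-layer ConfigMerger precedence can be sketched like this, with a reduced field set for illustration. The real resolve() covers the full ResolvedContainerConfig; this sketch assumes the global layer supplies a value for every field (it holds the application.yml defaults):

```java
// Sketch of the three-layer merge: global -> environment -> app, where a later
// layer's non-null value overrides an earlier one.
public final class ConfigMergeSketch {

    public record Layer(Integer memoryLimitMb, Integer replicas, String runtimeType) {}
    public record Resolved(int memoryLimitMb, int replicas, String runtimeType) {}

    public static Resolved resolve(Layer global, Layer env, Layer app) {
        return new Resolved(
                pick(global.memoryLimitMb(), env.memoryLimitMb(), app.memoryLimitMb()),
                pick(global.replicas(), env.replicas(), app.replicas()),
                pick(global.runtimeType(), env.runtimeType(), app.runtimeType()));
    }

    @SafeVarargs
    private static <T> T pick(T... layers) {
        T value = null;
        for (T layer : layers) {
            if (layer != null) value = layer; // last non-null wins (app > env > global)
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                new Layer(512, 1, "auto"),   // global defaults
                new Layer(1024, null, null), // environment override
                new Layer(null, 2, null)));  // app override
    }
}
```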

Deployment Status Model

Deployments move through these statuses:

Status    Meaning
STOPPED   Intentionally stopped or initial state
STARTING  Deploy in progress
RUNNING   All replicas healthy and serving
DEGRADED  Some replicas healthy, some dead
STOPPING  Graceful shutdown in progress
FAILED    Terminal failure (pre-flight, health check, or crash)

Replica support: deployments can specify a replica count. DEGRADED is used when at least one but not all replicas are healthy.
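
A minimal sketch of how replica health could map to these statuses. This is illustrative only: in the real server the transitions are driven by DeploymentExecutor and DockerEventMonitor, and STARTING/STOPPING/STOPPED come from lifecycle actions rather than health counts:

```java
// Sketch: derive a status from replica health counts, per the table above.
public final class ReplicaStatusSketch {

    public enum DeploymentStatus { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    public static DeploymentStatus fromReplicaHealth(int healthy, int total) {
        if (total <= 0 || healthy < 0 || healthy > total) {
            throw new IllegalArgumentException("invalid replica counts");
        }
        if (healthy == total) return DeploymentStatus.RUNNING;  // all replicas serving
        if (healthy > 0)      return DeploymentStatus.DEGRADED; // at least one, not all
        return DeploymentStatus.FAILED;                         // no replica alive
    }

    public static void main(String[] args) {
        System.out.println(fromReplicaHealth(1, 3)); // DEGRADED
    }
}
```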

Deploy stages (DeployStage): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).

Blue/green strategy: when re-deploying, new replicas are started and health-checked before the old ones are stopped, minimizing downtime.

Deployment uniqueness: DeploymentService.createDeployment() deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.

JAR Management

  • Retention policy per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
  • Nightly cleanup job (JarRetentionJob, Spring @Scheduled 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
  • Volume-based JAR mounting for Docker-in-Docker setups: set CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).

Runtime Type Detection

The server detects the app framework from uploaded JARs and builds framework-specific Docker entrypoints:

  • Detection (RuntimeDetector): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes META-INF/MANIFEST.MF Main-Class: Spring Boot loader prefix → spring-boot, Quarkus entry point → quarkus, other Main-Class → plain-java (extracts class name). Results stored on AppVersion (detected_runtime_type, detected_main_class).
  • Runtime types (RuntimeType enum): AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE. Configurable per app/environment via containerConfig.runtimeType (default "auto").
  • Entrypoint per type: Spring Boot uses PropertiesLauncher with -Dloader.path for log appender; Quarkus uses -jar (appender compiled in); plain Java uses classpath with appender JAR; native runs binary directly (agent compiled in). All JVM types get -javaagent:/app/agent.jar.
  • Custom arguments (containerConfig.customArgs): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses sh -c).
  • AUTO resolution: at deploy time (PRE_FLIGHT), "auto" resolves to the detected type from AppVersion. Fails deployment if detection was unsuccessful — user must set type explicitly.
  • UI: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
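
The detection order can be sketched roughly as follows. The Spring Boot and Quarkus launcher class prefixes below are assumptions made for the sketch; the real RuntimeDetector reads the actual JAR manifest:

```java
// Sketch of the detection order: ZIP magic bytes first, then Main-Class probing.
public final class RuntimeDetectionSketch {

    /** @param header    first bytes of the uploaded file
     *  @param mainClass Main-Class from META-INF/MANIFEST.MF, or null if absent
     *  @return detected runtime type, or null when detection is unsuccessful */
    public static String detect(byte[] header, String mainClass) {
        if (header.length < 2 || header[0] != 'P' || header[1] != 'K') {
            return "native"; // non-ZIP magic bytes: treat as a native binary
        }
        if (mainClass == null) {
            return null; // no Main-Class: detection fails, user must set the type
        }
        if (mainClass.startsWith("org.springframework.boot.loader.")) {
            return "spring-boot"; // Spring Boot loader launcher (assumed prefix)
        }
        if (mainClass.startsWith("io.quarkus.")) {
            return "quarkus"; // Quarkus entry point (assumed prefix)
        }
        return "plain-java"; // any other Main-Class: plain Java, class name recorded
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[]{'P', 'K', 3, 4}, "com.example.Main")); // plain-java
    }
}
```

Returning null for an absent Main-Class mirrors the AUTO-resolution rule above: a deployment with runtimeType "auto" fails when detection was unsuccessful.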

SaaS Multi-Tenant Network Isolation

In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:

  • Tenant network (cameleer-tenant-{slug}) — primary internal bridge for all of a tenant's containers. Set as CAMELEER_SERVER_RUNTIME_DOCKERNETWORK for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
  • Shared services network — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and cameleer-traefik for HTTP routing.
  • Tenant-scoped environment networks (cameleer-env-{tenantId}-{envSlug}) — per-environment discovery is scoped per tenant, so alpha-corp's "dev" environment network is separate from beta-corp's "dev" environment network.

nginx / Reverse Proxy

  • client_max_body_size 200m is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
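
A minimal fragment showing where the directive belongs; server_name and the upstream target are placeholders for this illustration, not values from the repo:

```nginx
server {
    listen 80;
    server_name cameleer.example.com;

    # Allow JAR uploads up to 200 MB (the nginx default is 1m, which returns 413)
    client_max_body_size 200m;

    location / {
        proxy_pass http://cameleer3-server:8080;
    }
}
```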

Prometheus Metrics

Server exposes /api/v1/prometheus (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and http.server.requests metrics automatically. Business metrics via ServerMetrics component:

Gauges (auto-polled):

Metric                                  Tags                                        Source
cameleer.agents.connected               state (live, stale, dead, shutdown)         AgentRegistryService.findByState()
cameleer.agents.sse.active              —                                           SseConnectionManager.getConnectionCount()
cameleer.ingestion.buffer.size          type (execution, processor, log, metrics)   WriteBuffer.size()
cameleer.ingestion.accumulator.pending  —                                           ChunkAccumulator.getPendingCount()

Counters:

Metric                        Tags                                            Instrumented in
cameleer.ingestion.drops      reason (buffer_full, no_agent, no_identity)     LogIngestionController
cameleer.agents.transitions   transition (went_stale, went_dead, recovered)   AgentLifecycleMonitor
cameleer.deployments.outcome  status (running, failed, degraded)              DeploymentExecutor
cameleer.auth.failures        reason (invalid_token, revoked, oidc_rejected)  JwtAuthenticationFilter

Timers:

Metric                             Tags                              Instrumented in
cameleer.ingestion.flush.duration  type (execution, processor, log)  ExecutionFlushScheduler
cameleer.deployments.duration      —                                 DeploymentExecutor

Agent container Prometheus labels (set by PrometheusLabelBuilder at deploy time):

Runtime Type      prometheus.path       prometheus.port
spring-boot       /actuator/prometheus  8081
quarkus / native  /q/metrics            9000
plain-java        /metrics              9464

All containers also get prometheus.scrape=true. These labels enable Prometheus docker_sd_configs auto-discovery.
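
The table above can be sketched as a label map per runtime type. The real implementation is PrometheusLabelBuilder; this standalone class and its method name are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: Prometheus Docker labels per resolved runtime type, per the table above.
public final class PrometheusLabelSketch {

    public static Map<String, String> labelsFor(String runtimeType) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("prometheus.scrape", "true"); // every managed container is a scrape target
        switch (runtimeType) {
            case "spring-boot" -> {
                labels.put("prometheus.path", "/actuator/prometheus");
                labels.put("prometheus.port", "8081");
            }
            case "quarkus", "native" -> {
                labels.put("prometheus.path", "/q/metrics");
                labels.put("prometheus.port", "9000");
            }
            case "plain-java" -> {
                labels.put("prometheus.path", "/metrics");
                labels.put("prometheus.port", "9464");
            }
            default -> throw new IllegalArgumentException("unknown runtime type: " + runtimeType);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labelsFor("quarkus"));
    }
}
```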

Agent Metric Names (Micrometer)

Agents send MetricsSnapshot records with Micrometer-convention metric names. The server stores them generically (ClickHouse agent_metrics.metric_name). The UI references specific names in AgentInstance.tsx for JVM charts.

JVM metrics (used by UI):

Metric name              UI usage
process.cpu.usage.value  CPU % stat card + chart
jvm.memory.used.value    Heap MB stat card + chart (tags: area=heap)
jvm.memory.max.value     Heap max for % calculation (tags: area=heap)
jvm.threads.live.value   Thread count chart
jvm.gc.pause.total_time  GC time chart

Camel route metrics (stored, queried by dashboard):

| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | routeId, camelContext |
| `camel.exchanges.failed.count` | counter | routeId, camelContext |
| `camel.exchanges.total.count` | counter | routeId, camelContext |
| `camel.exchanges.failures.handled.count` | counter | routeId, camelContext |
| `camel.route.policy.count` | count | routeId, camelContext |
| `camel.route.policy.total_time` | total | routeId, camelContext |
| `camel.route.policy.max` | gauge | routeId, camelContext |
| `camel.routes.running.value` | gauge | — |

Mean processing time = camel.route.policy.total_time / camel.route.policy.count. Min processing time is not available (Micrometer does not track minimums).
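A minimal sketch of that mean computation (the counter readings are hypothetical sample values):

```java
// Mean route processing time from the two camel.route.policy series.
public class MeanProcessingTime {
    public static void main(String[] args) {
        double totalTimeMs = 1250.0; // camel.route.policy.total_time (ms)
        double count = 500.0;        // camel.route.policy.count

        // Guard against a route that has not processed any exchange yet.
        double meanMs = count > 0 ? totalTimeMs / count : 0.0;
        System.out.println(meanMs); // 2.5
    }
}
```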

Cameleer agent metrics:

| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | instanceId |
| `cameleer.chunks.dropped.count` | counter | instanceId, reason |
| `cameleer.sse.reconnects.count` | counter | instanceId |
| `cameleer.taps.evaluated.count` | counter | instanceId |
| `cameleer.metrics.exported.count` | counter | instanceId |

Disabled Skills

  • Do NOT use any gsd:* skills in this project. This includes all /gsd: prefixed commands.

GitNexus — Code Intelligence

This project is indexed by GitNexus as cameleer3-server (6155 symbols, 15501 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.

If any GitNexus tool warns that the index is stale, run `npx gitnexus analyze` in a terminal first.

Always Do

  • MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run gitnexus_impact({target: "symbolName", direction: "upstream"}) and report the blast radius (direct callers, affected processes, risk level) to the user.
  • MUST run gitnexus_detect_changes() before committing to verify your changes only affect expected symbols and execution flows.
  • MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
  • When exploring unfamiliar code, use gitnexus_query({query: "concept"}) to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
  • When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use gitnexus_context({name: "symbolName"}).

When Debugging

  1. gitnexus_query({query: "<error or symptom>"}) — find execution flows related to the issue
  2. gitnexus_context({name: "<suspect function>"}) — see all callers, callees, and process participation
  3. READ gitnexus://repo/cameleer3-server/process/{processName} — trace the full execution flow step by step
  4. For regressions: gitnexus_detect_changes({scope: "compare", base_ref: "main"}) — see what your branch changed

When Refactoring

  • Renaming: MUST use gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true}) first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with dry_run: false.
  • Extracting/Splitting: MUST run gitnexus_context({name: "target"}) to see all incoming/outgoing refs, then gitnexus_impact({target: "target", direction: "upstream"}) to find all external callers before moving code.
  • After any refactor: run gitnexus_detect_changes({scope: "all"}) to verify only expected files changed.

Never Do

  • NEVER edit a function, class, or method without first running gitnexus_impact on it.
  • NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
  • NEVER rename symbols with find-and-replace — use gitnexus_rename which understands the call graph.
  • NEVER commit changes without running gitnexus_detect_changes() to check affected scope.

Tools Quick Reference

| Tool | When to use | Command |
|---|---|---|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |

Impact Risk Levels

| Depth | Meaning | Action |
|---|---|---|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |

Resources

| Resource | Use for |
|---|---|
| `gitnexus://repo/cameleer3-server/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer3-server/clusters` | All functional areas |
| `gitnexus://repo/cameleer3-server/processes` | All execution flows |
| `gitnexus://repo/cameleer3-server/process/{name}` | Step-by-step execution trace |

Self-Check Before Finishing

Before completing any code modification task, verify:

  1. gitnexus_impact was run for all modified symbols
  2. No HIGH/CRITICAL risk warnings were ignored
  3. gitnexus_detect_changes() confirms changes match expected scope
  4. All d=1 (WILL BREAK) dependents were updated

Keeping the Index Fresh

After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:

```
npx gitnexus analyze
```

If the index previously included embeddings, preserve them by adding --embeddings:

```
npx gitnexus analyze --embeddings
```

To check whether embeddings exist, inspect .gitnexus/meta.json — the stats.embeddings field shows the count (0 means no embeddings). Running analyze without --embeddings will delete any previously generated embeddings.
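A hypothetical pre-flight check for that field (this parses a sample document inline; in the repo, read `.gitnexus/meta.json` itself):

```shell
# Sample meta.json content; the real file lives at .gitnexus/meta.json.
meta='{"stats":{"symbols":6155,"relationships":15501,"embeddings":0}}'
embeddings=$(printf '%s' "$meta" | python3 -c 'import json,sys; print(json.load(sys.stdin)["stats"]["embeddings"])')
if [ "$embeddings" -eq 0 ]; then
  echo "no embeddings: plain 'npx gitnexus analyze' is safe"
else
  echo "embeddings present: use 'npx gitnexus analyze --embeddings'"
fi
```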

Claude Code users: A PostToolUse hook handles this automatically after git commit and git merge.

CLI

| Task | Read this skill file |
|---|---|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |