CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project
Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.
Related Project
- cameleer3 (https://gitea.siegeln.net/cameleer/cameleer3) — the Java agent that instruments Camel applications
- Protocol defined in `cameleer3-common/PROTOCOL.md` in the agent repo
- This server depends on `com.cameleer3:cameleer3-common` (shared models and graph API)
Modules
- `cameleer3-server-core` — domain logic, storage interfaces, services (no Spring dependencies)
- `cameleer3-server-app` — Spring Boot web app, REST controllers, SSE, persistence, Docker orchestration
Build Commands
mvn clean compile # Compile all modules
mvn clean verify # Full build with tests
Run
java -jar cameleer3-server-app/target/cameleer3-server-app-1.0-SNAPSHOT.jar
Key Classes by Package
Core Module (cameleer3-server-core/src/main/java/com/cameleer3/server/core/)
agent/ — Agent lifecycle and commands
- `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
- `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
- `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
- `AgentEventService` — records agent state changes, heartbeats
runtime/ — App/Environment/Deployment domain
- `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
- `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
- `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
- `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, cpuShares, cpuLimit, appPort, replicas, routingMode, routeControlEnabled, replayEnabled, runtimeType, customArgs, etc.
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs
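The three-layer resolve described for `ConfigMerger` can be sketched as a map-based reduction. This is illustrative only: the real implementation returns a typed `ResolvedContainerConfig`, and the field names below are a reduced, hypothetical subset.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ConfigMerger idea: later layers override earlier ones,
// and unset (null) values fall through to the previous layer.
public class ConfigMergeSketch {
    static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                       Map<String, Object> envConfig,
                                       Map<String, Object> appConfig) {
        Map<String, Object> merged = new HashMap<>(globalDefaults);
        envConfig.forEach((k, v) -> { if (v != null) merged.put(k, v); });
        appConfig.forEach((k, v) -> { if (v != null) merged.put(k, v); });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> global = Map.of("memoryLimitMb", 512, "replicas", 1, "runtimeType", "auto");
        Map<String, Object> env = Map.of("memoryLimitMb", 1024);
        Map<String, Object> app = Map.of("replicas", 2);
        Map<String, Object> resolved = resolve(global, env, app);
        System.out.println(resolved.get("memoryLimitMb")); // 1024 — environment layer wins over global
        System.out.println(resolved.get("replicas"));      // 2 — app layer wins
        System.out.println(resolved.get("runtimeType"));   // auto — falls through to the global default
    }
}
```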
search/ — Execution search
- `SearchService` — search, topErrors, punchcard, distinctAttributeKeys
- `SearchRequest` / `SearchResult` — search DTOs
storage/ — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
rbac/ — Role-based access control
- `RbacService` — getDirectRolesForUser, syncOidcRoles, assignRole
- `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
- `UserDetail`, `RoleDetail`, `GroupDetail` — records
security/ — Auth
- `JwtService` — interface: createAccessToken, validateAccessToken
- `Ed25519SigningService` — interface: sign, verify (config signing)
- `OidcConfig` — record: issuerUri, clientId, audience, rolesClaim, additionalScopes
ingestion/ — Buffered data pipeline
- `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
- `ChunkAccumulator` — batches data for efficient flush
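The batching idea behind `ChunkAccumulator` can be sketched as follows. This is a hypothetical shape, not the server's actual class: the point is that items buffer until a threshold and are handed to a flush callback in batches, so ClickHouse receives a few large inserts instead of many small ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative accumulator: buffers items, flushes full batches to a callback.
public class AccumulatorSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> pending = new ArrayList<>();

    public AccumulatorSketch(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    public synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) flush();
    }

    public synchronized void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending)); // copy so the flusher owns the batch
        pending.clear();
    }

    public synchronized int getPendingCount() { return pending.size(); }

    public static void main(String[] args) {
        List<List<Integer>> flushed = new ArrayList<>();
        AccumulatorSketch<Integer> acc = new AccumulatorSketch<>(3, flushed::add);
        for (int i = 0; i < 7; i++) acc.add(i);
        System.out.println(flushed.size());        // 2 — two full batches flushed
        System.out.println(acc.getPendingCount()); // 1 — one item still buffered
    }
}
```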
App Module (cameleer3-server-app/src/main/java/com/cameleer3/server/app/)
controller/ — REST endpoints
- `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
- `AgentSseController` — GET /sse (Server-Sent Events connection)
- `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
- `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
- `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
- `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
- `ExecutionController` — GET /api/v1/executions (search + detail)
- `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
- `LogQueryController` — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
- `LogIngestionController` — POST /api/v1/data/logs (accepts `List<LogEntry>` JSON array, each entry has `source`: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
- `CatalogController` — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than `discoveryttldays` with no live agents.
- `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
- `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
- `RoleAdminController` — CRUD /api/v1/admin/roles
- `GroupAdminController` — CRUD /api/v1/admin/groups
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `LicenseAdminController` — GET/POST /api/v1/admin/license
runtime/ — Docker orchestration
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Primary network for app containers is set via the `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to a concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails the deployment if unresolvable). Builds a framework-specific Docker entrypoint per runtime type (Spring Boot PropertiesLauncher, Quarkus `-jar`, plain Java classpath, native binary). Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (prometheus.scrape/path/port) per runtime type for `docker_sd_configs` auto-discovery
- `DisabledRuntimeOrchestrator` — no-op when runtime is not enabled
metrics/ — Prometheus observability
- `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
storage/ — PostgreSQL repositories (JdbcTemplate)
- `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
- `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
- `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`
storage/ — ClickHouse stores
- `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseLogStore`
- `ClickHouseStatsStore` — pre-aggregated stats, punchcard
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
- `ClickHouseSearchIndex` — full-text search
- `ClickHouseUsageTracker` — usage_events for billing
security/ — Spring Security
- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
- `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
- `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
- `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
- `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
- `OidcProviderHelper` — OIDC discovery, JWK source cache
agent/ — Agent lifecycle
- `SseConnectionManager` — manages per-agent SSE connections, delivers commands
- `AgentLifecycleMonitor` — @Scheduled 10s, LIVE->STALE->DEAD transitions
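The LIVE -> STALE -> DEAD classification that the lifecycle monitor performs on a schedule can be sketched from heartbeat age. Note the 30s/120s thresholds here are illustrative assumptions, not the server's actual values.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of heartbeat-age classification; thresholds are hypothetical.
public class LifecycleSketch {
    enum AgentState { LIVE, STALE, DEAD }

    static AgentState classify(Instant lastHeartbeat, Instant now) {
        Duration age = Duration.between(lastHeartbeat, now);
        if (age.compareTo(Duration.ofSeconds(120)) > 0) return AgentState.DEAD;
        if (age.compareTo(Duration.ofSeconds(30)) > 0) return AgentState.STALE;
        return AgentState.LIVE;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(classify(now.minusSeconds(5), now));   // LIVE
        System.out.println(classify(now.minusSeconds(60), now));  // STALE
        System.out.println(classify(now.minusSeconds(300), now)); // DEAD
    }
}
```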
retention/ — JAR cleanup
- `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions
config/ — Spring beans
- `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
- `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
- `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
- `StorageBeanConfig` — all repositories
- `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer
Key Conventions
- Java 17+ required
- Spring Boot 3.4.3 parent POM
- Depends on `com.cameleer3:cameleer3-common` from Gitea Maven registry
- Jackson `JavaTimeModule` for `Instant` deserialization
- Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
- Environment filtering: all data queries (exchanges, dashboard stats, route metrics, agent events, correlation) filter by the selected environment. All commands (config-update, route-control, set-traced-processors, replay) target only agents in the selected environment when one is selected. `AgentRegistryService.findByApplicationAndEnvironment()` for environment-scoped command dispatch. Backend endpoints accept an optional `environment` query parameter; null = all environments (backward compatible).
- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from the JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim > `"default"`). Capabilities and route states are updated on every heartbeat (protocol v2). Route catalog falls back to ClickHouse stats for route discovery when the registry has incomplete data.
- Multi-tenancy: each server instance serves one tenant (configured via `CAMELEER_SERVER_TENANT_ID`, default: `"default"`). Environments (dev/staging/prod) are first-class — agents send `environmentId` at registration and in heartbeats. JWT carries the `env` claim for environment persistence across token refresh. PostgreSQL isolated via schema-per-tenant (`?currentSchema=tenant_{id}`). ClickHouse shared DB with `tenant_id` + `environment` columns, partitioned by `(tenant_id, toYYYYMM(timestamp))`.
- Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in `clickhouse/*.sql`, run idempotently on startup by `ClickHouseSchemaInitializer`. Use `IF NOT EXISTS` for CREATE and ADD PROJECTION.
- Logging: ClickHouse JDBC set to INFO (`com.clickhouse`), HTTP client to WARN (`org.apache.hc.client5`) in application.yml
- Security: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from the JWT secret via HMAC-SHA256), bootstrap token for registration. CORS: `CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS` (comma-separated) overrides `CAMELEER_SERVER_SECURITY_UIORIGIN` for multi-origin setups (e.g., reverse proxy). Infrastructure access: `CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false` disables Database and ClickHouse admin endpoints (set by the SaaS provisioner on tenant servers). The health endpoint exposes the flag for UI tab visibility. UI role gating: Admin sidebar/routes hidden for non-ADMIN; diagram toolbar and route control hidden for VIEWER. Read-only for VIEWER, editable for OPERATOR+. Role helpers: `useIsAdmin()`, `useCanControl()` in `auth-store.ts`. Route guard: `RequireAdmin` in `auth/RequireAdmin.tsx`. Last-ADMIN guard: the system prevents removal of the last ADMIN role (409 Conflict on role removal, user deletion, group role removal). Password policy: min 12 chars, 3-of-4 character classes, no username match (enforced on user creation and admin password reset). Brute-force protection: 5 failed attempts -> 15 min lockout (tracked via `failed_login_attempts`/`locked_until` on the users table). Token revocation: `token_revoked_before` column on users, checked in `JwtAuthenticationFilter`, set on password change.
- OIDC: optional external identity provider support (token exchange pattern). Configured via admin API/UI, stored in the database (`server_config` table). Configurable `userIdClaim` (default `sub`) determines which id_token claim is used as the user identifier. Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when `CAMELEER_SERVER_SECURITY_OIDCISSUERURI` is set. `CAMELEER_SERVER_SECURITY_OIDCJWKSETURI` overrides JWKS discovery for container networking. `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY=true` disables TLS cert verification for OIDC calls (self-signed CAs). Scope-based role mapping via `SystemRole.normalizeScope()` (case-insensitive, strips the `server:` prefix): `admin`/`server:admin` -> ADMIN, `operator`/`server:operator` -> OPERATOR, `viewer`/`server:viewer` -> VIEWER. SSO: when OIDC is enabled, the UI auto-redirects to the provider with `prompt=none` for silent sign-in; falls back to `/login?local` on `login_required`, retries without `prompt=none` on `consent_required`. Logout always redirects to `/login?local` (via OIDC end_session or direct fallback) to prevent SSO re-login loops. Auto-signup provisions new OIDC users with default roles. System roles are synced on every OIDC login via `syncOidcRoles` — always overwrites directly-assigned roles (falls back to `defaultRoles` when OIDC returns none); uses `getDirectRolesForUser` to avoid touching group-inherited roles. Group memberships are never touched. Supports ES384, ES256, RS256. Shared OIDC logic in `OidcProviderHelper` (discovery, JWK source, algorithm set).
- OIDC role extraction: `OidcTokenExchanger` reads roles from the access_token first (JWT with `at+jwt` type, decoded by a separate processor), then falls back to the id_token. `OidcConfig` includes `audience` (RFC 8707 resource indicator — included in both the authorization request and the token exchange POST body to trigger JWT access tokens) and `additionalScopes` (extra scopes for the SPA to request). The `rolesClaim` config points to the claim name in the token (e.g., `"roles"` for custom JWT claims, `"realm_access.roles"` for Keycloak). All provider-specific configuration is external — no provider-specific code in the server.
- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`
- Usage analytics: ClickHouse `usage_events` table tracks authenticated UI requests, flushed every 5s
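The documented password policy (min 12 chars, at least 3 of 4 character classes, no username match) can be sketched as a pure check. The method name is illustrative, not the server's actual API.

```java
// Sketch of the stated password policy; not the server's real validator.
public class PasswordPolicySketch {
    static boolean isAcceptable(String password, String username) {
        if (password.length() < 12) return false;                              // minimum length
        if (password.toLowerCase().contains(username.toLowerCase())) return false; // no username match
        int classes = 0;                                                        // count character classes
        if (password.chars().anyMatch(Character::isLowerCase)) classes++;
        if (password.chars().anyMatch(Character::isUpperCase)) classes++;
        if (password.chars().anyMatch(Character::isDigit)) classes++;
        if (password.chars().anyMatch(c -> !Character.isLetterOrDigit(c))) classes++;
        return classes >= 3;                                                    // 3-of-4 rule
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("Tr0ub4dor&longer", "alice")); // true
        System.out.println(isAcceptable("short1!", "alice"));          // false — too short
        System.out.println(isAcceptable("alicealice12!A", "alice"));   // false — contains username
    }
}
```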
Database Migrations
PostgreSQL (Flyway): cameleer3-server-app/src/main/resources/db/migration/
- V1 — RBAC (users, roles, groups, audit_log)
- V2 — Claim mappings (OIDC)
- V3 — Runtime management (apps, environments, deployments, app_versions)
- V4 — Environment config (default_container_config JSONB)
- V5 — App container config (container_config JSONB on apps)
- V6 — JAR retention policy (jar_retention_count on environments)
- V7 — Deployment orchestration (target_state, deployment_strategy, replica_states JSONB, deploy_stage)
- V8 — Deployment active config (resolved_config JSONB on deployments)
- V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
- V10 — Runtime type detection (detected_runtime_type, detected_main_class on app_versions)
ClickHouse: cameleer3-server-app/src/main/resources/clickhouse/init.sql (run idempotently on startup)
CI/CD & Deployment
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
- Build step skips integration tests (`-DskipITs`) — Testcontainers needs a Docker daemon
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on the ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into the JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`). `REGISTRY_TOKEN` build arg required for `cameleer3-common` dependency resolution
- Registry: `gitea.siegeln.net/cameleer/cameleer3-server` (container images)
- K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
- Deployment target: k3s at 192.168.50.86, namespace `cameleer` (main), `cam-<slug>` (feature branches)
- Feature branches: isolated namespace, PG schema; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
- Secrets managed in the CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `cameleer-postgres-credentials`, `cameleer-clickhouse-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
- K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without a securityContext (needs root for entrypoint setup).
- Docker: the server Dockerfile has no default credentials — all DB config comes from env vars at runtime
- Docker build uses buildx registry cache + `--provenance=false` for Gitea compatibility
- CI: branch slug sanitization extracted to `.gitea/sanitize-branch.sh`, sourced by the docker and deploy-feature jobs
UI Structure
The UI has 4 main tabs: Exchanges, Dashboard, Runtime, Deployments.
- Exchanges — route execution search and detail (`ui/src/pages/Exchanges/`)
- Dashboard — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
- Runtime — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
- Deployments — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
- Config sub-tabs: Variables | Monitoring | Traces & Taps | Route Recording | Resources
- Create app: full page at `/apps/new` (not a modal)
- Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
Key UI Files
- `ui/src/router.tsx` — React Router v6 routes
- `ui/src/config.ts` — apiBaseUrl, basePath
- `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
- `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
- `ui/src/components/ContentTabs.tsx` — main tab switcher
- `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
- `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
- `ui/src/hooks/useScope.ts` — TabKey type, scope inference
UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly.
- Shared CSS modules in `ui/src/styles/` (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
- Shared `PageLoader` component replaces copy-pasted spinner patterns.
- Design system components used consistently: `Select`, `Tabs`, `Toggle`, `Button`, `LogViewer`, `Label` — prefer DS components over raw HTML elements.
- Environment slugs are auto-computed from the display name (read-only in the UI).
- Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer3-{16,32,48,192,512}.png`, and `cameleer3-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
- Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). basePath is centralized in `ui/src/config.ts`; router.tsx imports it instead of re-reading the `<base>` tag.
- Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.
Docker Orchestration
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- ConfigMerger (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- TraefikLabelBuilder (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles.
- PrometheusLabelBuilder (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels are merged into container metadata alongside Traefik labels at deploy time.
- DockerNetworkManager (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers: `cameleer-traefik`, a shared network where Traefik, the server, and all app containers attach (the server joins via docker-compose with a `cameleer3-server` DNS alias); and `cameleer-env-{slug}`, a per-environment isolated network where containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically named environments.
- DockerEventMonitor (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with the `managed-by=cameleer3-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- DeploymentProgress (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing the 7 deploy stages with amber active/green completed styling.
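The network-naming scheme above can be sketched directly from the text — a single-tenant form `cameleer-env-{slug}` and the tenant-scoped overload `cameleer-env-{tenantId}-{envSlug}` (the class name here is illustrative):

```java
// Sketch of the two envNetworkName forms described for DockerNetworkManager.
public class NetworkNameSketch {
    // Single-tenant form: per-environment discovery network.
    static String envNetworkName(String envSlug) {
        return "cameleer-env-" + envSlug;
    }

    // SaaS overload: tenant-scoped to avoid cross-tenant collisions
    // when two tenants both have a "dev" environment.
    static String envNetworkName(String tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }

    public static void main(String[] args) {
        System.out.println(envNetworkName("dev"));               // cameleer-env-dev
        System.out.println(envNetworkName("alpha-corp", "dev")); // cameleer-env-alpha-corp-dev
    }
}
```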
Deployment Status Model
Deployments move through these statuses:
| Status | Meaning |
|---|---|
| STOPPED | Intentionally stopped or initial state |
| STARTING | Deploy in progress |
| RUNNING | All replicas healthy and serving |
| DEGRADED | Some replicas healthy, some dead |
| STOPPING | Graceful shutdown in progress |
| FAILED | Terminal failure (pre-flight, health check, or crash) |
Replica support: deployments can specify a replica count. DEGRADED is used when at least one but not all replicas are healthy.
Deploy stages (DeployStage): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
Blue/green strategy: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
Deployment uniqueness: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
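The uniqueness rule can be sketched as follows. The terminal-status set (STOPPED, FAILED) comes from the text; the method and class shapes are hypothetical, not the server's actual repository API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: terminal deployments are purged before a new one is inserted.
public class DeploymentUniquenessSketch {
    enum Status { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    // STOPPED and FAILED are the two terminal statuses named above.
    static boolean isTerminal(Status s) {
        return s == Status.STOPPED || s == Status.FAILED;
    }

    // Returns the deployments that would survive the pre-create purge.
    static List<Status> purgeTerminal(List<Status> existing) {
        return existing.stream().filter(s -> !isTerminal(s)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Status> existing = List.of(Status.FAILED, Status.RUNNING, Status.STOPPED);
        System.out.println(purgeTerminal(existing)); // [RUNNING]
    }
}
```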
JAR Management
- Retention policy per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- Nightly cleanup job (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- Volume-based JAR mounting for Docker-in-Docker setups: set `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
Runtime Type Detection
The server detects the app framework from uploaded JARs and builds framework-specific Docker entrypoints:
- Detection (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes the `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix → `spring-boot`, Quarkus entry point → `quarkus`, other Main-Class → `plain-java` (extracts the class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
- Runtime types (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
- Entrypoint per type: Spring Boot uses `PropertiesLauncher` with `-Dloader.path` for the log appender; Quarkus uses `-jar` (appender compiled in); plain Java uses a classpath with the appender JAR; native runs the binary directly (agent compiled in). All JVM types get `-javaagent:/app/agent.jar`.
- Custom arguments (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (the entrypoint uses `sh -c`).
- AUTO resolution: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails the deployment if detection was unsuccessful — the user must set the type explicitly.
- UI: the Resources tab shows a Runtime Type dropdown (with a detection hint from the latest uploaded version) and a Custom Arguments text field.
SaaS Multi-Tenant Network Isolation
In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
- Tenant network (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
- Shared services network — the server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
- Tenant-scoped environment networks (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
nginx / Reverse Proxy
`client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without it, large JAR uploads return 413.
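A minimal location-level sketch of the directive in context — the upstream name and path here are illustrative assumptions, not the deployment's actual config:

```nginx
# Allow JAR uploads up to 200 MB; without this, nginx returns 413 Request Entity Too Large.
client_max_body_size 200m;

location /api/ {
    # Upstream name is hypothetical — substitute the real server address.
    proxy_pass http://cameleer3-server:8080;
}
```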
Prometheus Metrics
The server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics come from the `ServerMetrics` component:
Gauges (auto-polled):
| Metric | Tags | Source |
|---|---|---|
| `cameleer.agents.connected` | `state` (live, stale, dead, shutdown) | `AgentRegistryService.findByState()` |
| `cameleer.agents.sse.active` | — | `SseConnectionManager.getConnectionCount()` |
| `cameleer.ingestion.buffer.size` | `type` (execution, processor, log, metrics) | `WriteBuffer.size()` |
| `cameleer.ingestion.accumulator.pending` | — | `ChunkAccumulator.getPendingCount()` |
Counters:
| Metric | Tags | Instrumented in |
|---|---|---|
| `cameleer.ingestion.drops` | `reason` (buffer_full, no_agent, no_identity) | `LogIngestionController` |
| `cameleer.agents.transitions` | `transition` (went_stale, went_dead, recovered) | `AgentLifecycleMonitor` |
| `cameleer.deployments.outcome` | `status` (running, failed, degraded) | `DeploymentExecutor` |
| `cameleer.auth.failures` | `reason` (invalid_token, revoked, oidc_rejected) | `JwtAuthenticationFilter` |
Timers:
| Metric | Tags | Instrumented in |
|---|---|---|
| `cameleer.ingestion.flush.duration` | `type` (execution, processor, log) | `ExecutionFlushScheduler` |
| `cameleer.deployments.duration` | — | `DeploymentExecutor` |
Agent container Prometheus labels (set by PrometheusLabelBuilder at deploy time):
| Runtime Type | `prometheus.path` | `prometheus.port` |
|---|---|---|
| spring-boot | /actuator/prometheus | 8081 |
| quarkus / native | /q/metrics | 9000 |
| plain-java | /metrics | 9464 |
All containers also get `prometheus.scrape=true`. These labels enable Prometheus `docker_sd_configs` auto-discovery.
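A sketch of a Prometheus scrape job consuming these labels via `docker_sd_configs` — the job name and socket path are assumptions; the relabeling follows Prometheus's Docker SD convention, where container label dots become underscores in `__meta_docker_container_label_*`:

```yaml
scrape_configs:
  - job_name: cameleer-apps            # name is illustrative
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Only scrape containers labeled prometheus.scrape=true
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep
      # Use the per-runtime metrics path from the prometheus.path label
      - source_labels: [__meta_docker_container_label_prometheus_path]
        target_label: __metrics_path__
      # Point the scrape address at the advertised prometheus.port
      - source_labels: [__address__, __meta_docker_container_label_prometheus_port]
        regex: "([^:]+)(?::\\d+)?;(\\d+)"
        replacement: "$1:$2"
        target_label: __address__
```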
Agent Metric Names (Micrometer)
Agents send `MetricsSnapshot` records with Micrometer-convention metric names. The server stores them generically (ClickHouse `agent_metrics.metric_name`). The UI references specific names in `AgentInstance.tsx` for JVM charts.
JVM metrics (used by UI):
| Metric name | UI usage |
|---|---|
| `process.cpu.usage.value` | CPU % stat card + chart |
| `jvm.memory.used.value` | Heap MB stat card + chart (tags: area=heap) |
| `jvm.memory.max.value` | Heap max for % calculation (tags: area=heap) |
| `jvm.threads.live.value` | Thread count chart |
| `jvm.gc.pause.total_time` | GC time chart |
Camel route metrics (stored, queried by dashboard):
| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | routeId, camelContext |
| `camel.exchanges.failed.count` | counter | routeId, camelContext |
| `camel.exchanges.total.count` | counter | routeId, camelContext |
| `camel.exchanges.failures.handled.count` | counter | routeId, camelContext |
| `camel.route.policy.count` | count | routeId, camelContext |
| `camel.route.policy.total_time` | total | routeId, camelContext |
| `camel.route.policy.max` | gauge | routeId, camelContext |
| `camel.routes.running.value` | gauge | — |
Mean processing time = `camel.route.policy.total_time` / `camel.route.policy.count`. Min processing time is not available (Micrometer does not track minimums).
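The mean-processing-time formula above, as code, with a guard for a zero count (no completed exchanges in the window); the method name is illustrative:

```java
// Mean = total_time / count, guarding the division when count is zero.
public class RouteTimingSketch {
    static double meanProcessingMillis(double policyTotalTime, double policyCount) {
        return policyCount == 0 ? 0.0 : policyTotalTime / policyCount;
    }

    public static void main(String[] args) {
        System.out.println(meanProcessingMillis(1200.0, 48.0)); // 25.0
        System.out.println(meanProcessingMillis(0.0, 0.0));     // 0.0 — no samples, no division
    }
}
```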
Cameleer agent metrics:
| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | instanceId |
| `cameleer.chunks.dropped.count` | counter | instanceId, reason |
| `cameleer.sse.reconnects.count` | counter | instanceId |
| `cameleer.taps.evaluated.count` | counter | instanceId |
| `cameleer.metrics.exported.count` | counter | instanceId |
Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:`-prefixed commands.
GitNexus — Code Intelligence
This project is indexed by GitNexus as cameleer3-server (6027 symbols, 15299 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
If any GitNexus tool warns that the index is stale, run `npx gitnexus analyze` in a terminal first.
Always Do
- MUST run impact analysis before editing any symbol. Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- MUST run `gitnexus_detect_changes()` before committing to verify your changes only affect expected symbols and execution flows.
- MUST warn the user if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
### When Debugging

- `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
- `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
- `READ gitnexus://repo/cameleer3-server/process/{processName}` — trace the full execution flow step by step
- For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
### When Refactoring

- Renaming: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- Extracting/Splitting: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
### Never Do

- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename`, which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
### Tools Quick Reference

| Tool | When to use | Command |
|---|---|---|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
### Impact Risk Levels
| Depth | Meaning | Action |
|---|---|---|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |
### Resources

| Resource | Use for |
|---|---|
| `gitnexus://repo/cameleer3-server/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer3-server/clusters` | All functional areas |
| `gitnexus://repo/cameleer3-server/processes` | All execution flows |
| `gitnexus://repo/cameleer3-server/process/{name}` | Step-by-step execution trace |
### Self-Check Before Finishing

Before completing any code modification task, verify:

- `gitnexus_impact` was run for all modified symbols
- No HIGH/CRITICAL risk warnings were ignored
- `gitnexus_detect_changes()` confirms changes match expected scope
- All d=1 (WILL BREAK) dependents were updated
### Keeping the Index Fresh

After committing code changes, the GitNexus index becomes stale. Re-run `npx gitnexus analyze` to update it.

If the index previously included embeddings, preserve them by adding `--embeddings`: `npx gitnexus analyze --embeddings`.

To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). Running analyze without `--embeddings` will delete any previously generated embeddings.
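A minimal sketch of that check, assuming `meta.json` has the `stats.embeddings` shape described above (the JSON is inlined for illustration; in practice you would read `.gitnexus/meta.json` from disk):

```python
import json

# Sample meta.json content; real usage would read the .gitnexus/meta.json file
meta = json.loads('{"stats": {"embeddings": 384}}')

count = meta.get("stats", {}).get("embeddings", 0)
# Re-run with --embeddings only when embeddings already exist,
# so a plain re-analyze does not silently delete them.
cmd = "npx gitnexus analyze" + (" --embeddings" if count > 0 else "")
print(cmd)  # → npx gitnexus analyze --embeddings
```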
Claude Code users: a PostToolUse hook handles this automatically after `git commit` and `git merge`.
### CLI
| Task | Read this skill file |
|---|---|
| Understand architecture / "How does X work?" | .claude/skills/gitnexus/gitnexus-exploring/SKILL.md |
| Blast radius / "What breaks if I change X?" | .claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md |
| Trace bugs / "Why is X failing?" | .claude/skills/gitnexus/gitnexus-debugging/SKILL.md |
| Rename / extract / split / refactor | .claude/skills/gitnexus/gitnexus-refactoring/SKILL.md |
| Tools, resources, schema reference | .claude/skills/gitnexus/gitnexus-guide/SKILL.md |
| Index, status, clean, wiki CLI commands | .claude/skills/gitnexus/gitnexus-cli/SKILL.md |