chore: track .claude/rules/ and add self-maintenance instruction
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 5m22s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s

Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
hsiegeln
2026-04-16 09:26:53 +02:00
parent 95730b02ad
commit 810f493639
9 changed files with 538 additions and 1 deletions

View File

@@ -0,0 +1,109 @@
---
paths:
- "cameleer-server-app/**"
---
# App Module Key Classes
`cameleer-server-app/src/main/java/com/cameleer/server/app/`
## controller/ — REST endpoints
- `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
- `AgentSseController` — GET /sse (Server-Sent Events connection)
- `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
- `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
- `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
- `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
- `ExecutionController` — GET /api/v1/executions (search + detail)
- `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
- `LogQueryController` — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
- `LogIngestionController` — POST /api/v1/data/logs (accepts `List<LogEntry>` JSON array, each entry has `source`: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
- `CatalogController` — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than `discoveryttldays` with no live agents.
- `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
- `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
- `RoleAdminController` — CRUD /api/v1/admin/roles
- `GroupAdminController` — CRUD /api/v1/admin/groups
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `SensitiveKeysAdminController` — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true` to fan out merged keys to all LIVE agents. Stored in `server_config` table (key `sensitive_keys`).
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `ClaimMappingAdminController` — CRUD /api/v1/admin/claim-mappings, POST /test (accepts inline rules + claims for preview without saving)
- `LicenseAdminController` — GET/POST /api/v1/admin/license
- `AgentEventsController` — GET /api/v1/agent-events (agent state change history)
- `AgentMetricsController` — GET /api/v1/agent-metrics (JVM/Camel metrics per agent instance)
- `AppSettingsController` — GET/PUT /api/v1/apps/{appId}/settings
- `ApplicationConfigController` — GET/PUT /api/v1/apps/{appId}/config (traced processors, route recording, sensitive keys per app)
- `ClickHouseAdminController` — GET /api/v1/admin/clickhouse (ClickHouse admin, conditional on infrastructure endpoints)
- `DatabaseAdminController` — GET /api/v1/admin/database (PG admin, conditional on infrastructure endpoints)
- `DetailController` — GET /api/v1/detail (execution detail with processor tree)
- `EventIngestionController` — POST /api/v1/data/events (agent event ingestion)
- `RbacStatsController` — GET /api/v1/admin/rbac/stats
- `RouteCatalogController` — GET /api/v1/routes/catalog (merged route catalog from registry + ClickHouse)
- `RouteMetricsController` — GET /api/v1/route-metrics (per-route Camel metrics)
- `ThresholdAdminController` — CRUD /api/v1/admin/thresholds
- `UsageAnalyticsController` — GET /api/v1/admin/usage (ClickHouse usage_events)
## runtime/ — Docker orchestration
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
- `DeploymentExecutor`@Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
- `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
## metrics/ — Prometheus observability
- `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
## storage/ — PostgreSQL repositories (JdbcTemplate)
- `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
- `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
- `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`, `PostgresSensitiveKeysRepository`
- `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`
## storage/ — ClickHouse stores
- `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseMetricsQueryStore`
- `ClickHouseStatsStore` — pre-aggregated stats, punchcard
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
- `ClickHouseUsageTracker` — usage_events for billing
## search/ — ClickHouse search and log stores
- `ClickHouseLogStore` — log storage and query, MDC-based exchange/processor correlation
- `ClickHouseSearchIndex` — full-text search
## security/ — Spring Security
- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
- `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
- `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
- `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
- `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
- `OidcProviderHelper` — OIDC discovery, JWK source cache
## agent/ — Agent lifecycle
- `SseConnectionManager` — manages per-agent SSE connections, delivers commands
- `AgentLifecycleMonitor`@Scheduled 10s, LIVE->STALE->DEAD transitions
- `SsePayloadSigner` — Ed25519 signs SSE payloads for agent verification
## retention/ — JAR cleanup
- `JarRetentionJob`@Scheduled 03:00 daily, per-environment retention, skips deployed versions
## config/ — Spring beans
- `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
- `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
- `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
- `StorageBeanConfig` — all repositories
- `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer

24
.claude/rules/cicd.md Normal file
View File

@@ -0,0 +1,24 @@
---
paths:
- ".gitea/**"
- "deploy/**"
- "Dockerfile"
- "docker-entrypoint.sh"
---
# CI/CD & Deployment
- CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
- Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
- `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
- Registry: `gitea.siegeln.net/cameleer/cameleer-server` (container images)
- K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
- Deployment target: k3s at 192.168.50.86, namespace `cameleer` (main), `cam-<slug>` (feature branches)
- Feature branches: isolated namespace, PG schema; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
- Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `cameleer-postgres-credentials`, `cameleer-clickhouse-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
- K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without securityContext (needs root for entrypoint setup).
- Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
- Docker build uses buildx registry cache + `--provenance=false` for Gitea compatibility
- CI: branch slug sanitization extracted to `.gitea/sanitize-branch.sh`, sourced by docker and deploy-feature jobs

View File

@@ -0,0 +1,97 @@
---
paths:
- "cameleer-server-core/**"
---
# Core Module Key Classes
`cameleer-server-core/src/main/java/com/cameleer/server/core/`
## agent/ — Agent lifecycle and commands
- `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
- `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
- `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
- `AgentEventService` — records agent state changes, heartbeats
- `AgentState` — enum: LIVE, STALE, DEAD, SHUTDOWN
- `CommandType` — enum for command types (config-update, deep-trace, replay, route-control, etc.)
- `CommandStatus` — enum for command acknowledgement states
- `CommandReply` — record: command execution result from agent
- `AgentEventRecord`, `AgentEventRepository` — event persistence
- `AgentEventListener` — callback interface for agent events
- `RouteStateRegistry` — tracks per-agent route states
## runtime/ — App/Environment/Deployment domain
- `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
- `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, jarRetentionCount
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
- `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
- `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
- `ContainerStatus` — record: state, running, exitCode, error
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
- `RoutingMode` — enum for routing strategies
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
- `AppRepository`, `AppVersionRepository`, `EnvironmentRepository`, `DeploymentRepository` — repository interfaces
- `AppService`, `EnvironmentService` — domain services
## search/ — Execution search and stats
- `SearchService` — search, count, stats, statsForApp, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys
- `SearchRequest` / `SearchResult` — search DTOs
- `ExecutionStats`, `ExecutionSummary` — stats aggregation records
- `StatsTimeseries`, `TopError` — timeseries and error DTOs
- `LogSearchRequest` / `LogSearchResponse` — log search DTOs
## storage/ — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
- `LogEntryResult` — log query result record
- `model/``ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`
## rbac/ — Role-based access control
- `RbacService` — interface: role/group CRUD, assignRoleToUser, removeRoleFromUser, addUserToGroup, removeUserFromGroup, getDirectRolesForUser, getEffectiveRolesForUser, clearManagedAssignments, assignManagedRole, addUserToManagedGroup, getStats, listUsers
- `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
- `UserDetail`, `RoleDetail`, `GroupDetail` — records
- `UserSummary`, `RoleSummary`, `GroupSummary` — lightweight list records
- `RbacStats` — aggregate stats record
- `AssignmentOrigin` — enum: DIRECT, CLAIM_MAPPING (tracks how roles were assigned)
- `ClaimMappingRule` — record: OIDC claim-to-role mapping rule
- `ClaimMappingService` — interface: CRUD for claim mapping rules
- `ClaimMappingRepository` — persistence interface
- `RoleRepository`, `GroupRepository` — persistence interfaces
## admin/ — Server-wide admin config
- `SensitiveKeysConfig` — record: keys (List<String>, immutable)
- `SensitiveKeysRepository` — interface: find(), save()
- `SensitiveKeysMerger` — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
- `AppSettings`, `AppSettingsRepository` — per-app settings config and persistence
- `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
- `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory`, `AuditRepository` — audit trail records and persistence
## security/ — Auth
- `JwtService` — interface: createAccessToken, createRefreshToken, validateAccessToken, validateRefreshToken
- `Ed25519SigningService` — interface: sign, getPublicKeyBase64 (config signing)
- `OidcConfig` — record: enabled, issuerUri, clientId, clientSecret, rolesClaim, defaultRoles, autoSignup, displayNameClaim, userIdClaim, audience, additionalScopes
- `OidcConfigRepository` — persistence interface
- `PasswordPolicyValidator` — min 12 chars, 3-of-4 character classes, no username match
- `UserInfo`, `UserRepository` — user identity records and persistence
- `InvalidTokenException` — thrown on revoked/expired tokens
## ingestion/ — Buffered data pipeline
- `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
- `ChunkAccumulator` — batches data for efficient flush
- `WriteBuffer` — bounded ring buffer for async flush
- `BufferedLogEntry` — log entry wrapper with metadata
- `MergedExecution`, `TaggedExecution`, `TaggedDiagram` — tagged ingestion records

View File

@@ -0,0 +1,76 @@
---
paths:
- "cameleer-server-app/**/runtime/**"
- "cameleer-server-core/**/runtime/**"
- "deploy/**"
- "docker-compose*.yml"
- "Dockerfile"
- "docker-entrypoint.sh"
---
# Docker Orchestration
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
- `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
- `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
## Deployment Status Model
| Status | Meaning |
|--------|---------|
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Some replicas healthy, some dead |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
**Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
## JAR Management
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
## Runtime Type Detection
The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate `cameleer-log-appender.jar` or `PropertiesLauncher` is needed:
- **Detection** (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix -> `spring-boot`, Quarkus entry point -> `quarkus`, other Main-Class -> `plain-java` (extracts class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
- **Runtime types** (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
- **Entrypoint per type**: All JVM types use `java -javaagent:/app/agent.jar -jar app.jar`. Plain Java uses `-cp` with explicit main class instead of `-jar`. Native runs the binary directly.
- **Custom arguments** (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses `sh -c`).
- **AUTO resolution**: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails deployment if detection was unsuccessful — user must set type explicitly.
- **UI**: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
## SaaS Multi-Tenant Network Isolation
In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
- **Tenant network** (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
- **Shared services network** — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
- **Tenant-scoped environment networks** (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
## nginx / Reverse Proxy
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.

98
.claude/rules/gitnexus.md Normal file
View File

@@ -0,0 +1,98 @@
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
## Tools Quick Reference
| Tool | When to use | Command |
|------|-------------|---------|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
## Impact Risk Levels
| Depth | Meaning | Action |
|-------|---------|--------|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |
## Resources
| Resource | Use for |
|----------|---------|
| `gitnexus://repo/cameleer-server/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer-server/clusters` | All functional areas |
| `gitnexus://repo/cameleer-server/processes` | All execution flows |
| `gitnexus://repo/cameleer-server/process/{name}` | Step-by-step execution trace |
## Self-Check Before Finishing
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |

85
.claude/rules/metrics.md Normal file
View File

@@ -0,0 +1,85 @@
---
paths:
- "cameleer-server-app/**/metrics/**"
- "cameleer-server-app/**/ServerMetrics*"
- "ui/src/pages/RuntimeTab/**"
- "ui/src/pages/DashboardTab/**"
---
# Prometheus Metrics
Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
## Gauges (auto-polled)
| Metric | Tags | Source |
|--------|------|--------|
| `cameleer.agents.connected` | `state` (live, stale, dead, shutdown) | `AgentRegistryService.findByState()` |
| `cameleer.agents.sse.active` | — | `SseConnectionManager.getConnectionCount()` |
| `cameleer.ingestion.buffer.size` | `type` (execution, processor, log, metrics) | `WriteBuffer.size()` |
| `cameleer.ingestion.accumulator.pending` | — | `ChunkAccumulator.getPendingCount()` |
## Counters
| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.drops` | `reason` (buffer_full, no_agent, no_identity) | `LogIngestionController` |
| `cameleer.agents.transitions` | `transition` (went_stale, went_dead, recovered) | `AgentLifecycleMonitor` |
| `cameleer.deployments.outcome` | `status` (running, failed, degraded) | `DeploymentExecutor` |
| `cameleer.auth.failures` | `reason` (invalid_token, revoked, oidc_rejected) | `JwtAuthenticationFilter` |
## Timers
| Metric | Tags | Instrumented in |
|--------|------|-----------------|
| `cameleer.ingestion.flush.duration` | `type` (execution, processor, log) | `ExecutionFlushScheduler` |
| `cameleer.deployments.duration` | — | `DeploymentExecutor` |
## Agent container Prometheus labels (set by PrometheusLabelBuilder at deploy time)
| Runtime Type | `prometheus.path` | `prometheus.port` |
|---|---|---|
| `spring-boot` | `/actuator/prometheus` | `8081` |
| `quarkus` / `native` | `/q/metrics` | `9000` |
| `plain-java` | `/metrics` | `9464` |
All containers also get `prometheus.scrape=true`. These labels enable Prometheus `docker_sd_configs` auto-discovery.
## Agent Metric Names (Micrometer)
Agents send `MetricsSnapshot` records with Micrometer-convention metric names. The server stores them generically (ClickHouse `agent_metrics.metric_name`). The UI references specific names in `AgentInstance.tsx` for JVM charts.
### JVM metrics (used by UI)
| Metric name | UI usage |
|---|---|
| `process.cpu.usage.value` | CPU % stat card + chart |
| `jvm.memory.used.value` | Heap MB stat card + chart (tags: `area=heap`) |
| `jvm.memory.max.value` | Heap max for % calculation (tags: `area=heap`) |
| `jvm.threads.live.value` | Thread count chart |
| `jvm.gc.pause.total_time` | GC time chart |
### Camel route metrics (stored, queried by dashboard)
| Metric name | Type | Tags |
|---|---|---|
| `camel.exchanges.succeeded.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failed.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.total.count` | counter | `routeId`, `camelContext` |
| `camel.exchanges.failures.handled.count` | counter | `routeId`, `camelContext` |
| `camel.route.policy.count` | count | `routeId`, `camelContext` |
| `camel.route.policy.total_time` | total | `routeId`, `camelContext` |
| `camel.route.policy.max` | gauge | `routeId`, `camelContext` |
| `camel.routes.running.value` | gauge | — |
Mean processing time = `camel.route.policy.total_time / camel.route.policy.count`. Min processing time is not available (Micrometer does not track minimums).
### Cameleer agent metrics
| Metric name | Type | Tags |
|---|---|---|
| `cameleer.chunks.exported.count` | counter | `instanceId` |
| `cameleer.chunks.dropped.count` | counter | `instanceId`, `reason` |
| `cameleer.sse.reconnects.count` | counter | `instanceId` |
| `cameleer.taps.evaluated.count` | counter | `instanceId` |
| `cameleer.metrics.exported.count` | counter | `instanceId` |

43
.claude/rules/ui.md Normal file
View File

@@ -0,0 +1,43 @@
---
paths:
- "ui/**"
---
# UI Structure
The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments**.
- **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
- **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
- **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
- **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
- Config sub-tabs: **Monitoring | Resources | Variables | Traces & Taps | Route Recording**
- Create app: full page at `/apps/new` (not a modal)
- Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
**Admin pages** (ADMIN-only, under `/admin/`):
- **Sensitive Keys** (`ui/src/pages/Admin/SensitiveKeysPage.tsx`) — global sensitive key masking config. Shows agent built-in defaults as outlined Badge reference, editable Tag pills for custom keys, amber-highlighted push-to-agents toggle. Keys add to (not replace) agent defaults. Per-app sensitive key additions managed via `ApplicationConfigController` API. Note: `AppConfigDetailPage.tsx` exists but is not routed in `router.tsx`.
## Key UI Files
- `ui/src/router.tsx` — React Router v6 routes
- `ui/src/config.ts` — apiBaseUrl, basePath
- `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
- `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
- `ui/src/components/ContentTabs.tsx` — main tab switcher
- `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
- `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
- `ui/src/hooks/useScope.ts` — TabKey type, scope inference
- `ui/src/components/StartupLogPanel.tsx` — deployment startup log viewer (container logs from ClickHouse, polls 3s while STARTING)
- `ui/src/api/queries/logs.ts``useStartupLogs` hook for container startup log polling, `useLogs`/`useApplicationLogs` for general log search
## UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly. All colors use CSS variables (no hardcoded hex).
- Shared CSS modules in `ui/src/styles/` (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
- Shared `PageLoader` component replaces copy-pasted spinner patterns.
- Design system components used consistently: `Select`, `Tabs`, `Toggle`, `Button`, `LogViewer`, `Label` — prefer DS components over raw HTML elements. `LogViewer` renders optional source badges (`container`, `app`, `agent`) via `LogEntry.source` field (DS v0.1.49+).
- Environment slugs are auto-computed from display name (read-only in UI).
- Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer-{16,32,48,192,512}.png`, and `cameleer-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
- Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). basePath is centralized in `ui/src/config.ts`; router.tsx imports it instead of re-reading `<base>` tag.
- Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.

3
.gitignore vendored
View File

@@ -38,7 +38,8 @@ Thumbs.db
logs/ logs/
# Claude # Claude
.claude/ .claude/*
!.claude/rules/
.superpowers/ .superpowers/
.playwright-mcp/ .playwright-mcp/
.worktrees/ .worktrees/

View File

@@ -67,6 +67,10 @@ PostgreSQL (Flyway): `cameleer-server-app/src/main/resources/db/migration/`
ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup) ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)
## Maintaining .claude/rules/
When adding, removing, or renaming classes, controllers, endpoints, UI components, or metrics, update the corresponding `.claude/rules/` file as part of the same change. The rule files are the class/API map that future sessions rely on — stale rules cause wrong assumptions. Treat rule file updates like updating an import: part of the change, not a separate task.
## Disabled Skills ## Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands. - Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.