chore: track .claude/rules/ and add self-maintenance instruction

Un-ignore .claude/rules/ so path-scoped rule files are shared via git. Add instruction in CLAUDE.md to update rule files when modifying classes, controllers, endpoints, or metrics — keeps rules current as part of normal workflow rather than requiring separate maintenance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:26:53 +02:00
parent 95730b02ad
commit 810f493639
9 changed files with 538 additions and 1 deletions
--- a/.claude/rules/app-classes.md
+++ b/.claude/rules/app-classes.md
@@ -0,0 +1,109 @@
 ---
 paths:
  - "cameleer-server-app/**"
 ---
 # App Module Key Classes
 `cameleer-server-app/src/main/java/com/cameleer/server/app/`
 ## controller/ — REST endpoints
 - `AgentRegistrationController` — POST /register, POST /heartbeat, GET / (list), POST /refresh-token
 - `AgentSseController` — GET /sse (Server-Sent Events connection)
 - `AgentCommandController` — POST /broadcast, POST /{agentId}, POST /{agentId}/ack
 - `AppController` — CRUD /api/v1/apps, POST /{appId}/upload-jar, GET /{appId}/versions
 - `DeploymentController` — GET/POST /api/v1/apps/{appId}/deployments, POST /{id}/stop, POST /{id}/promote, GET /{id}/logs
 - `EnvironmentAdminController` — CRUD /api/v1/admin/environments, PUT /{id}/jar-retention
 - `ExecutionController` — GET /api/v1/executions (search + detail)
 - `SearchController` — POST /api/v1/search, GET /routes, GET /top-errors, GET /punchcard
 - `LogQueryController` — GET /api/v1/logs (filters: source, application, agentId, exchangeId, level, logger, q, environment, time range)
 - `LogIngestionController` — POST /api/v1/data/logs (accepts `List<LogEntry>` JSON array, each entry has `source`: app/agent). Logs WARN for: missing agent identity, unregistered agents, empty payloads, buffer-full drops, deserialization failures. Normal acceptance at DEBUG.
 - `CatalogController` — GET /api/v1/catalog (unified app catalog merging PG managed apps + in-memory agents + CH stats), DELETE /api/v1/catalog/{applicationId} (ADMIN: dismiss app, purge all CH data + PG record). Auto-filters discovered apps older than `discoveryttldays` with no live agents.
 - `ChunkIngestionController` — POST /api/v1/ingestion/chunk/{executions|metrics|diagrams}
 - `UserAdminController` — CRUD /api/v1/admin/users, POST /{id}/roles, POST /{id}/set-password
 - `RoleAdminController` — CRUD /api/v1/admin/roles
 - `GroupAdminController` — CRUD /api/v1/admin/groups
 - `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
 - `SensitiveKeysAdminController` — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true` to fan out merged keys to all LIVE agents. Stored in `server_config` table (key `sensitive_keys`).
 - `AuditLogController` — GET /api/v1/admin/audit
 - `MetricsController` — GET /api/v1/metrics, GET /timeseries
 - `DiagramController` — GET /api/v1/diagrams/{id}, POST /
 - `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
 - `ClaimMappingAdminController` — CRUD /api/v1/admin/claim-mappings, POST /test (accepts inline rules + claims for preview without saving)
 - `LicenseAdminController` — GET/POST /api/v1/admin/license
 - `AgentEventsController` — GET /api/v1/agent-events (agent state change history)
 - `AgentMetricsController` — GET /api/v1/agent-metrics (JVM/Camel metrics per agent instance)
 - `AppSettingsController` — GET/PUT /api/v1/apps/{appId}/settings
 - `ApplicationConfigController` — GET/PUT /api/v1/apps/{appId}/config (traced processors, route recording, sensitive keys per app)
 - `ClickHouseAdminController` — GET /api/v1/admin/clickhouse (ClickHouse admin, conditional on infrastructure endpoints)
 - `DatabaseAdminController` — GET /api/v1/admin/database (PG admin, conditional on infrastructure endpoints)
 - `DetailController` — GET /api/v1/detail (execution detail with processor tree)
 - `EventIngestionController` — POST /api/v1/data/events (agent event ingestion)
 - `RbacStatsController` — GET /api/v1/admin/rbac/stats
 - `RouteCatalogController` — GET /api/v1/routes/catalog (merged route catalog from registry + ClickHouse)
 - `RouteMetricsController` — GET /api/v1/route-metrics (per-route Camel metrics)
 - `ThresholdAdminController` — CRUD /api/v1/admin/thresholds
 - `UsageAnalyticsController` — GET /api/v1/admin/usage (ClickHouse usage_events)
 ## runtime/ — Docker orchestration
 - `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
 - `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}` (globally unique on Docker daemon). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}`.
 - `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
 - `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
 - `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
 - `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
 - `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
 - `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
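
The naming scheme described for `DeploymentExecutor` and `TraefikLabelBuilder` can be sketched as below. `ReplicaNaming` and its method names are hypothetical and only illustrate the string patterns stated above; the real classes may assemble them differently.

```java
import java.util.Map;

/** Hypothetical helper illustrating the documented name patterns. */
final class ReplicaNaming {
    private ReplicaNaming() {}

    /** Instance ID shared by agent and container logs: {envSlug}-{appSlug}-{replicaIndex}. */
    static String instanceId(String envSlug, String appSlug, int replicaIndex) {
        return envSlug + "-" + appSlug + "-" + replicaIndex;
    }

    /** Container name: {tenantId}-{envSlug}-{appSlug}-{replicaIndex}. */
    static String containerName(String tenantId, String envSlug, String appSlug, int replicaIndex) {
        return tenantId + "-" + instanceId(envSlug, appSlug, replicaIndex);
    }

    /** Per-replica identity labels: cameleer.replica and cameleer.instance-id. */
    static Map<String, String> identityLabels(String envSlug, String appSlug, int replicaIndex) {
        return Map.of(
            "cameleer.replica", String.valueOf(replicaIndex),
            "cameleer.instance-id", instanceId(envSlug, appSlug, replicaIndex));
    }
}
```

The container name is the instance ID prefixed with the tenant, which is what keeps names unique on a shared Docker daemon.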
 ## metrics/ — Prometheus observability
 - `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
 ## storage/ — PostgreSQL repositories (JdbcTemplate)
 - `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
 - `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId
 - `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
 - `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`, `PostgresSensitiveKeysRepository`
 - `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`
 ## storage/ — ClickHouse stores
 - `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseMetricsQueryStore`
 - `ClickHouseStatsStore` — pre-aggregated stats, punchcard
 - `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
 - `ClickHouseUsageTracker` — usage_events for billing
 ## search/ — ClickHouse search and log stores
 - `ClickHouseLogStore` — log storage and query, MDC-based exchange/processor correlation
 - `ClickHouseSearchIndex` — full-text search
 ## security/ — Spring Security
 - `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional
 - `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
 - `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
 - `OidcAuthController` — /api/v1/auth/oidc (login-uri, token-exchange, logout)
 - `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
 - `OidcProviderHelper` — OIDC discovery, JWK source cache
 ## agent/ — Agent lifecycle
 - `SseConnectionManager` — manages per-agent SSE connections, delivers commands
 - `AgentLifecycleMonitor` — @Scheduled 10s, LIVE->STALE->DEAD transitions
 - `SsePayloadSigner` — Ed25519 signs SSE payloads for agent verification
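
A minimal sketch of the LIVE->STALE->DEAD stepping that `AgentLifecycleMonitor` performs on each scheduled run. The thresholds (30s, 120s), the class `LifecycleRules`, and the treatment of DEAD and SHUTDOWN are assumptions for illustration; this file does not state the real values.

```java
import java.time.Duration;
import java.time.Instant;

enum AgentState { LIVE, STALE, DEAD, SHUTDOWN }

final class LifecycleRules {
    // Assumed thresholds; the actual configuration is not given in this document.
    static final Duration STALE_AFTER = Duration.ofSeconds(30);
    static final Duration DEAD_AFTER = Duration.ofSeconds(120);

    private LifecycleRules() {}

    /** Computes the state after one monitor pass; an agent moves at most one step per pass. */
    static AgentState next(AgentState current, Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        switch (current) {
            case LIVE:
                return silence.compareTo(STALE_AFTER) >= 0 ? AgentState.STALE : AgentState.LIVE;
            case STALE:
                if (silence.compareTo(STALE_AFTER) < 0) {
                    return AgentState.LIVE; // heartbeat resumed
                }
                return silence.compareTo(DEAD_AFTER) >= 0 ? AgentState.DEAD : AgentState.STALE;
            default:
                return current; // DEAD and SHUTDOWN are left to other code paths (assumption)
        }
    }
}
```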
 ## retention/ — JAR cleanup
 - `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions
 ## config/ — Spring beans
 - `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
 - `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
 - `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
 - `StorageBeanConfig` — all repositories
 - `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer
--- a/.claude/rules/cicd.md
+++ b/.claude/rules/cicd.md
@@ -0,0 +1,24 @@
 ---
 paths:
  - ".gitea/**"
  - "deploy/**"
  - "Dockerfile"
  - "docker-entrypoint.sh"
 ---
 # CI/CD & Deployment
 - CI workflow: `.gitea/workflows/ci.yml` — build -> docker -> deploy on push to main or feature branches
 - Build step skips integration tests (`-DskipITs`) — Testcontainers needs Docker daemon
 - Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
 - `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
 - Registry: `gitea.siegeln.net/cameleer/cameleer-server` (container images)
 - K8s manifests in `deploy/` — Kustomize base + overlays (main/feature), shared infra (PostgreSQL, ClickHouse, Logto) as top-level manifests
 - Deployment target: k3s at 192.168.50.86, namespace `cameleer` (main), `cam-<slug>` (feature branches)
 - Feature branches: isolated namespace, PG schema; Traefik Ingress at `<slug>-api.cameleer.siegeln.net`
 - Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `cameleer-postgres-credentials`, `cameleer-clickhouse-credentials`
 - K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
 - K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without securityContext (needs root for entrypoint setup).
 - Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
 - Docker build uses buildx registry cache + `--provenance=false` for Gitea compatibility
 - CI: branch slug sanitization extracted to `.gitea/sanitize-branch.sh`, sourced by docker and deploy-feature jobs
--- a/.claude/rules/core-classes.md
+++ b/.claude/rules/core-classes.md
@@ -0,0 +1,97 @@
 ---
 paths:
  - "cameleer-server-core/**"
 ---
 # Core Module Key Classes
 `cameleer-server-core/src/main/java/com/cameleer/server/core/`
 ## agent/ — Agent lifecycle and commands
 - `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
 - `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
 - `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
 - `AgentEventService` — records agent state changes, heartbeats
 - `AgentState` — enum: LIVE, STALE, DEAD, SHUTDOWN
 - `CommandType` — enum for command types (config-update, deep-trace, replay, route-control, etc.)
 - `CommandStatus` — enum for command acknowledgement states
 - `CommandReply` — record: command execution result from agent
 - `AgentEventRecord`, `AgentEventRepository` — event persistence
 - `AgentEventListener` — callback interface for agent events
 - `RouteStateRegistry` — tracks per-agent route states
 ## runtime/ — App/Environment/Deployment domain
 - `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
 - `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
 - `Environment` — record: id, slug, jarRetentionCount
 - `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
 - `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
 - `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
 - `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
 - `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
 - `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
 - `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
 - `ContainerStatus` — record: state, running, exitCode, error
 - `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
 - `RoutingMode` — enum for routing strategies
 - `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
 - `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
 - `AppRepository`, `AppVersionRepository`, `EnvironmentRepository`, `DeploymentRepository` — repository interfaces
 - `AppService`, `EnvironmentService` — domain services
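
The layered resolution in `ConfigMerger` can be pictured with the sketch below. `ResolvedConfigSketch` is a trimmed-down, hypothetical stand-in for `ResolvedContainerConfig` (four fields instead of the full set), and representing the two JSONB layers as `Map<String, Object>` is an assumption.

```java
import java.util.Map;

/** Hypothetical subset of ResolvedContainerConfig, for illustration only. */
record ResolvedConfigSketch(int memoryLimitMb, int replicas, String runtimeType, String customArgs) {}

final class ConfigMergerSketch {
    private ConfigMergerSketch() {}

    /** Later layers win: app overrides environment, environment overrides global defaults. */
    static ResolvedConfigSketch resolve(ResolvedConfigSketch globalDefaults,
                                        Map<String, Object> envConfig,
                                        Map<String, Object> appConfig) {
        return new ResolvedConfigSketch(
            pick("memoryLimitMb", globalDefaults.memoryLimitMb(), envConfig, appConfig),
            pick("replicas", globalDefaults.replicas(), envConfig, appConfig),
            pick("runtimeType", globalDefaults.runtimeType(), envConfig, appConfig),
            pick("customArgs", globalDefaults.customArgs(), envConfig, appConfig));
    }

    /** Returns the app value if present, else the environment value, else the fallback. */
    @SuppressWarnings("unchecked")
    private static <T> T pick(String key, T fallback, Map<String, Object> env, Map<String, Object> app) {
        Object value = app.get(key);
        if (value == null) {
            value = env.get(key);
        }
        return value == null ? fallback : (T) value;
    }
}
```

Because the function has no side effects, the same inputs always give the same resolved config, which makes the merge easy to unit test.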
 ## search/ — Execution search and stats
 - `SearchService` — search, count, stats, statsForApp, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys
 - `SearchRequest` / `SearchResult` — search DTOs
 - `ExecutionStats`, `ExecutionSummary` — stats aggregation records
 - `StatsTimeseries`, `TopError` — timeseries and error DTOs
 - `LogSearchRequest` / `LogSearchResponse` — log search DTOs
 ## storage/ — Storage abstractions
 - `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `SearchIndex`, `LogIndex` — interfaces
 - `LogEntryResult` — log query result record
 - `model/` — `ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`
 ## rbac/ — Role-based access control
 - `RbacService` — interface: role/group CRUD, assignRoleToUser, removeRoleFromUser, addUserToGroup, removeUserFromGroup, getDirectRolesForUser, getEffectiveRolesForUser, clearManagedAssignments, assignManagedRole, addUserToManagedGroup, getStats, listUsers
 - `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
 - `UserDetail`, `RoleDetail`, `GroupDetail` — records
 - `UserSummary`, `RoleSummary`, `GroupSummary` — lightweight list records
 - `RbacStats` — aggregate stats record
 - `AssignmentOrigin` — enum: DIRECT, CLAIM_MAPPING (tracks how roles were assigned)
 - `ClaimMappingRule` — record: OIDC claim-to-role mapping rule
 - `ClaimMappingService` — interface: CRUD for claim mapping rules
 - `ClaimMappingRepository` — persistence interface
 - `RoleRepository`, `GroupRepository` — persistence interfaces
 ## admin/ — Server-wide admin config
 - `SensitiveKeysConfig` — record: keys (`List<String>`, immutable)
 - `SensitiveKeysRepository` — interface: find(), save()
 - `SensitiveKeysMerger` — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
 - `AppSettings`, `AppSettingsRepository` — per-app settings config and persistence
 - `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
 - `AuditService` — audit logging facade
 - `AuditRecord`, `AuditResult`, `AuditCategory`, `AuditRepository` — audit trail records and persistence
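
The merge rule stated for `SensitiveKeysMerger` (union, case-insensitive dedup, first-seen casing kept, null when both inputs are null) can be sketched as follows. The class name `SensitiveKeysMergeSketch` and the use of plain lists instead of `SensitiveKeysConfig` are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SensitiveKeysMergeSketch {
    private SensitiveKeysMergeSketch() {}

    /** Union of both lists, global first; returns null only when both inputs are null. */
    static List<String> merge(List<String> global, List<String> perApp) {
        if (global == null && perApp == null) {
            return null;
        }
        List<String> merged = new ArrayList<>();
        Set<String> seenLowercase = new HashSet<>();
        addAll(global, merged, seenLowercase);
        addAll(perApp, merged, seenLowercase);
        return List.copyOf(merged);
    }

    /** Appends keys whose lower-cased form has not been seen yet, keeping their original casing. */
    private static void addAll(List<String> source, List<String> merged, Set<String> seenLowercase) {
        if (source == null) {
            return;
        }
        for (String key : source) {
            if (seenLowercase.add(key.toLowerCase(Locale.ROOT))) {
                merged.add(key);
            }
        }
    }
}
```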
 ## security/ — Auth
 - `JwtService` — interface: createAccessToken, createRefreshToken, validateAccessToken, validateRefreshToken
 - `Ed25519SigningService` — interface: sign, getPublicKeyBase64 (config signing)
 - `OidcConfig` — record: enabled, issuerUri, clientId, clientSecret, rolesClaim, defaultRoles, autoSignup, displayNameClaim, userIdClaim, audience, additionalScopes
 - `OidcConfigRepository` — persistence interface
 - `PasswordPolicyValidator` — min 12 chars, 3-of-4 character classes, no username match
 - `UserInfo`, `UserRepository` — user identity records and persistence
 - `InvalidTokenException` — thrown on revoked/expired tokens
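
A sketch of the three rules listed for `PasswordPolicyValidator`. The four character classes (lowercase, uppercase, digit, other) and reading "no username match" as case-insensitive equality are assumptions; the real validator may define both differently.

```java
final class PasswordPolicySketch {
    private static final int MIN_LENGTH = 12;
    private static final int MIN_CLASSES = 3;

    private PasswordPolicySketch() {}

    static boolean isValid(String password, String username) {
        if (password == null || password.length() < MIN_LENGTH) {
            return false;
        }
        boolean lower = false, upper = false, digit = false, other = false;
        for (char c : password.toCharArray()) {
            if (Character.isLowerCase(c)) lower = true;
            else if (Character.isUpperCase(c)) upper = true;
            else if (Character.isDigit(c)) digit = true;
            else other = true;
        }
        int classes = (lower ? 1 : 0) + (upper ? 1 : 0) + (digit ? 1 : 0) + (other ? 1 : 0);
        if (classes < MIN_CLASSES) {
            return false;
        }
        // Assumed interpretation: reject a password that equals the username, ignoring case.
        return username == null || !password.equalsIgnoreCase(username);
    }
}
```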
 ## ingestion/ — Buffered data pipeline
 - `IngestionService` — ingestExecution, ingestMetric, ingestLog, ingestDiagram
 - `ChunkAccumulator` — batches data for efficient flush
 - `WriteBuffer` — bounded ring buffer for async flush
 - `BufferedLogEntry` — log entry wrapper with metadata
 - `MergedExecution`, `TaggedExecution`, `TaggedDiagram` — tagged ingestion records
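
The bounded-buffer behaviour behind `WriteBuffer` can be illustrated with the sketch below. It swaps in `java.util.concurrent.ArrayBlockingQueue` rather than a hand-written ring buffer, and the class and method names are hypothetical; only the idea (reject when full, drain in batches for the flush) comes from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedBufferSketch<T> {
    private final BlockingQueue<T> queue;

    BoundedBufferSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking; returns false when the buffer is full so the caller can count a drop. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /** Removes up to maxItems in insertion order for one flush batch. */
    List<T> drain(int maxItems) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxItems);
        return batch;
    }

    /** Current depth, the kind of value a buffer-size gauge would report. */
    int size() {
        return queue.size();
    }
}
```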
--- a/.claude/rules/docker-orchestration.md
+++ b/.claude/rules/docker-orchestration.md
@@ -0,0 +1,76 @@
 ---
 paths:
  - "cameleer-server-app/**/runtime/**"
  - "cameleer-server-core/**/runtime/**"
  - "deploy/**"
  - "docker-compose*.yml"
  - "Dockerfile"
  - "docker-entrypoint.sh"
 ---
 # Docker Orchestration
 When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
 - **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
 - **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
 - **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
 - **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
 - **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with the `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (`@Scheduled` every 30s) inspects actual container state and corrects deployment status mismatches (for example, a deployment left in DEGRADED although all replicas are healthy).
 - **DeploymentProgress** (`ui/src/components/DeploymentProgress.tsx`) — UI step indicator showing 7 deploy stages with amber active/green completed styling.
 - **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
 - **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
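
The "every 2s or 50 lines" batching described for `ContainerLogForwarder` can be sketched as a small size-or-age batcher. `LineBatcherSketch`, its `tick` method, and the injected clock values are hypothetical; the sketch is single-threaded and leaves out the follow-stream threads and the ClickHouse insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class LineBatcherSketch {
    private static final int MAX_LINES = 50;
    private static final long MAX_AGE_MILLIS = 2_000;

    private final Consumer<List<String>> sink;
    private final List<String> pending = new ArrayList<>();
    private long firstLineAtMillis;

    LineBatcherSketch(Consumer<List<String>> sink) {
        this.sink = sink;
    }

    /** Adds a line and flushes as soon as the batch holds 50 lines. */
    void add(String line, long nowMillis) {
        if (pending.isEmpty()) {
            firstLineAtMillis = nowMillis;
        }
        pending.add(line);
        if (pending.size() >= MAX_LINES) {
            flush();
        }
    }

    /** Called periodically; flushes a non-empty batch whose first line is at least 2s old. */
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstLineAtMillis >= MAX_AGE_MILLIS) {
            flush();
        }
    }

    private void flush() {
        sink.accept(List.copyOf(pending));
        pending.clear();
    }
}
```

Passing the clock in as a parameter keeps the time-based branch testable without sleeping.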
 ## DeploymentExecutor Details
 Primary network for app containers is set via the `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to a concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails the deployment if unresolvable). Builds the Docker entrypoint per runtime type: every JVM type attaches the agent with `-javaagent:/app/agent.jar`; Spring Boot and Quarkus launch with `-jar`, plain Java uses `-cp` with the main class, and native runs the binary directly. Sets the per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties, so changing them requires redeployment.
 ## Deployment Status Model
 | Status | Meaning |
 |--------|---------|
 | `STOPPED` | Intentionally stopped or initial state |
 | `STARTING` | Deploy in progress |
 | `RUNNING` | All replicas healthy and serving |
 | `DEGRADED` | Some replicas healthy, some dead |
 | `STOPPING` | Graceful shutdown in progress |
 | `FAILED` | Terminal failure (pre-flight, health check, or crash) |
 **Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
 **Deploy stages** (`DeployStage`): PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE (or FAILED at any stage).
 **Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
 **Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
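
The relationship between replica health and `RUNNING`/`DEGRADED` can be sketched as below. The enum and class names are hypothetical, and mapping "no healthy replica" (or an empty list) to `FAILED` is an assumption; the text above only defines `RUNNING` and `DEGRADED` in terms of replica health.

```java
import java.util.List;

enum DeploymentStatusSketch { RUNNING, DEGRADED, FAILED }

final class ReplicaHealthSketch {
    private ReplicaHealthSketch() {}

    /** Summarizes per-replica health flags into one deployment status. */
    static DeploymentStatusSketch summarize(List<Boolean> replicaHealthy) {
        long healthy = replicaHealthy.stream().filter(Boolean::booleanValue).count();
        if (healthy == 0) {
            return DeploymentStatusSketch.FAILED; // assumption: nothing healthy counts as failed
        }
        if (healthy == replicaHealthy.size()) {
            return DeploymentStatusSketch.RUNNING; // all replicas healthy
        }
        return DeploymentStatusSketch.DEGRADED; // at least one, but not all, healthy
    }
}
```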
 ## JAR Management
 - **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
 - **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
 - **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
 ## Runtime Type Detection
 The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate `cameleer-log-appender.jar` or `PropertiesLauncher` is needed:
 - **Detection** (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix -> `spring-boot`, Quarkus entry point -> `quarkus`, other Main-Class -> `plain-java` (extracts class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
 - **Runtime types** (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
 - **Entrypoint per type**: JVM types start with `java -javaagent:/app/agent.jar`. Spring Boot and Quarkus use `-jar app.jar`; plain Java uses `-cp` with an explicit main class instead of `-jar`. Native runs the binary directly.
 - **Custom arguments** (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses `sh -c`).
 - **AUTO resolution**: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails deployment if detection was unsuccessful — user must set type explicitly.
 - **UI**: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
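
The detection order above (magic bytes first, then the manifest `Main-Class`) can be sketched as a pure function. The two marker strings are assumptions based on the usual Spring Boot loader package and Quarkus runner class; the real `RuntimeDetector` may match other values, and reading the manifest from the JAR is left out.

```java
final class RuntimeDetectionSketch {
    // Assumed markers; not confirmed by this document.
    private static final String SPRING_BOOT_LOADER_PREFIX = "org.springframework.boot.loader.";
    private static final String QUARKUS_ENTRY_POINT = "io.quarkus.bootstrap.runner.QuarkusEntryPoint";

    private RuntimeDetectionSketch() {}

    /** ZIP archives, and therefore JARs, start with the bytes 'P' 'K'. */
    static boolean looksLikeZip(byte[] header) {
        return header != null && header.length >= 2 && header[0] == 'P' && header[1] == 'K';
    }

    /** Returns the detected runtime type, or null when a JAR has no usable Main-Class. */
    static String classify(byte[] header, String manifestMainClass) {
        if (!looksLikeZip(header)) {
            return "native";
        }
        if (manifestMainClass == null || manifestMainClass.isBlank()) {
            return null; // detection failed; AUTO cannot be resolved later
        }
        if (manifestMainClass.startsWith(SPRING_BOOT_LOADER_PREFIX)) {
            return "spring-boot";
        }
        if (manifestMainClass.equals(QUARKUS_ENTRY_POINT)) {
            return "quarkus";
        }
        return "plain-java";
    }
}
```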
 ## SaaS Multi-Tenant Network Isolation
 In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
 - **Tenant network** (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
 - **Shared services network** — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
 - **Tenant-scoped environment networks** (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
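
The network names used in this section follow fixed patterns, sketched below. `NetworkNamesSketch` and `tenantNetworkName` are hypothetical; the overloaded `envNetworkName` mirrors the method mentioned for `DockerNetworkManager`, though its real signature is not shown here.

```java
final class NetworkNamesSketch {
    static final String TRAEFIK_NETWORK = "cameleer-traefik";

    private NetworkNamesSketch() {}

    /** Single-tenant form: cameleer-env-{slug}. */
    static String envNetworkName(String envSlug) {
        return "cameleer-env-" + envSlug;
    }

    /** SaaS form: cameleer-env-{tenantId}-{envSlug}, so equal env slugs do not collide across tenants. */
    static String envNetworkName(String tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }

    /** Primary tenant bridge: cameleer-tenant-{slug}. */
    static String tenantNetworkName(String tenantSlug) {
        return "cameleer-tenant-" + tenantSlug;
    }
}
```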
 ## nginx / Reverse Proxy
 - `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
--- a/.claude/rules/gitnexus.md
+++ b/.claude/rules/gitnexus.md
@@ -0,0 +1,98 @@
 # GitNexus — Code Intelligence
 This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 ## Always Do
 - **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
 - **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
 - **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
 - When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
 - When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
 ## When Debugging
 1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
 2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
 3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
 4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
 ## When Refactoring
 - **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
 - **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
 - After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
 ## Never Do
 - NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
 - NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
 - NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
 - NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
 ## Tools Quick Reference
 | Tool | When to use | Command |
 |------|-------------|---------|
 | `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
 | `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
 | `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
 | `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
 | `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
 | `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
 ## Impact Risk Levels
 | Depth | Meaning | Action |
 |-------|---------|--------|
 | d=1 | WILL BREAK — direct callers/importers | MUST update these |
 | d=2 | LIKELY AFFECTED — indirect deps | Should test |
 | d=3 | MAY NEED TESTING — transitive | Test if critical path |
 ## Resources
 | Resource | Use for |
 |----------|---------|
 | `gitnexus://repo/cameleer-server/context` | Codebase overview, check index freshness |
 | `gitnexus://repo/cameleer-server/clusters` | All functional areas |
 | `gitnexus://repo/cameleer-server/processes` | All execution flows |
 | `gitnexus://repo/cameleer-server/process/{name}` | Step-by-step execution trace |
 ## Self-Check Before Finishing
 Before completing any code modification task, verify:
 1. `gitnexus_impact` was run for all modified symbols
 2. No HIGH/CRITICAL risk warnings were ignored
 3. `gitnexus_detect_changes()` confirms changes match expected scope
 4. All d=1 (WILL BREAK) dependents were updated
 ## Keeping the Index Fresh
 After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
 ```bash
 npx gitnexus analyze
 ```
 If the index previously included embeddings, preserve them by adding `--embeddings`:
 ```bash
 npx gitnexus analyze --embeddings
 ```
 To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
 > Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
 ## CLI
 | Task | Read this skill file |
 |------|---------------------|
 | Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
 | Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
 | Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
 | Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
 | Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
 | Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
--- a/.claude/rules/metrics.md
+++ b/.claude/rules/metrics.md
@@ -0,0 +1,85 @@
 ---
 paths:
  - "cameleer-server-app/**/metrics/**"
  - "cameleer-server-app/**/ServerMetrics*"
  - "ui/src/pages/RuntimeTab/**"
  - "ui/src/pages/DashboardTab/**"
 ---
 # Prometheus Metrics
 Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
 ## Gauges (auto-polled)
 | Metric | Tags | Source |
 |--------|------|--------|
 | `cameleer.agents.connected` | `state` (live, stale, dead, shutdown) | `AgentRegistryService.findByState()` |
 | `cameleer.agents.sse.active` | — | `SseConnectionManager.getConnectionCount()` |
 | `cameleer.ingestion.buffer.size` | `type` (execution, processor, log, metrics) | `WriteBuffer.size()` |
 | `cameleer.ingestion.accumulator.pending` | — | `ChunkAccumulator.getPendingCount()` |
 ## Counters
 | Metric | Tags | Instrumented in |
 |--------|------|-----------------|
 | `cameleer.ingestion.drops` | `reason` (buffer_full, no_agent, no_identity) | `LogIngestionController` |
 | `cameleer.agents.transitions` | `transition` (went_stale, went_dead, recovered) | `AgentLifecycleMonitor` |
 | `cameleer.deployments.outcome` | `status` (running, failed, degraded) | `DeploymentExecutor` |
 | `cameleer.auth.failures` | `reason` (invalid_token, revoked, oidc_rejected) | `JwtAuthenticationFilter` |
 ## Timers
 | Metric | Tags | Instrumented in |
 |--------|------|-----------------|
 | `cameleer.ingestion.flush.duration` | `type` (execution, processor, log) | `ExecutionFlushScheduler` |
 | `cameleer.deployments.duration` | — | `DeploymentExecutor` |
 ## Agent Container Prometheus Labels (set by `PrometheusLabelBuilder` at deploy time)
 | Runtime Type | `prometheus.path` | `prometheus.port` |
 |---|---|---|
 | `spring-boot` | `/actuator/prometheus` | `8081` |
 | `quarkus` / `native` | `/q/metrics` | `9000` |
 | `plain-java` | `/metrics` | `9464` |
 All containers also get `prometheus.scrape=true`. These labels enable Prometheus `docker_sd_configs` auto-discovery.
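
The full label set per runtime type can be sketched as a lookup. This is an illustration of the table above only: the real logic lives in the Java class `PrometheusLabelBuilder`, and the names `RuntimeType`, `ENDPOINTS`, and `prometheusLabels` are hypothetical.

```typescript
// Illustration only; not the actual PrometheusLabelBuilder implementation.
type RuntimeType = "spring-boot" | "quarkus" | "native" | "plain-java";

// Scrape path and port per runtime type, copied from the table above.
const ENDPOINTS: Record<RuntimeType, { path: string; port: string }> = {
  "spring-boot": { path: "/actuator/prometheus", port: "8081" },
  quarkus: { path: "/q/metrics", port: "9000" },
  native: { path: "/q/metrics", port: "9000" },
  "plain-java": { path: "/metrics", port: "9464" },
};

// Returns the container labels for one runtime type.
// Every container gets prometheus.scrape=true in addition to path and port.
function prometheusLabels(runtimeType: RuntimeType): Record<string, string> {
  const { path, port } = ENDPOINTS[runtimeType];
  return {
    "prometheus.scrape": "true",
    "prometheus.path": path,
    "prometheus.port": port,
  };
}
```

Label values are strings because container labels are plain key/value text, so the port is kept as `"8081"` rather than a number.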
 ## Agent Metric Names (Micrometer)
 Agents send `MetricsSnapshot` records with Micrometer-convention metric names. The server stores them generically (ClickHouse `agent_metrics.metric_name`). The UI references specific names in `AgentInstance.tsx` for JVM charts.
 ### JVM metrics (used by UI)
 | Metric name | UI usage |
 |---|---|
 | `process.cpu.usage.value` | CPU % stat card + chart |
 | `jvm.memory.used.value` | Heap MB stat card + chart (tags: `area=heap`) |
 | `jvm.memory.max.value` | Heap max for % calculation (tags: `area=heap`) |
 | `jvm.threads.live.value` | Thread count chart |
 | `jvm.gc.pause.total_time` | GC time chart |
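
The heap percentage mentioned in the table can be derived as follows. This is a sketch, not code taken from `AgentInstance.tsx`; it assumes both values are reported in bytes and filtered to `area=heap`, and the helper names are hypothetical.

```typescript
// Hypothetical helper: heap usage as a percentage, computed from
// jvm.memory.used.value and jvm.memory.max.value (assumed to be bytes).
function heapUsedPercent(usedBytes: number, maxBytes: number): number | null {
  // A JVM can report max as -1 when no limit is defined; avoid dividing by <= 0.
  if (!Number.isFinite(usedBytes) || !Number.isFinite(maxBytes) || maxBytes <= 0) {
    return null;
  }
  return (usedBytes / maxBytes) * 100;
}

// Converts bytes to MB for a stat card (1 MB = 1024 * 1024 bytes here).
function bytesToMb(bytes: number): number {
  return bytes / (1024 * 1024);
}
```

Returning `null` instead of `0` lets the caller show "n/a" when the maximum is unknown, rather than a misleading 0 %.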
 ### Camel route metrics (stored, queried by dashboard)
 | Metric name | Type | Tags |
 |---|---|---|
 | `camel.exchanges.succeeded.count` | counter | `routeId`, `camelContext` |
 | `camel.exchanges.failed.count` | counter | `routeId`, `camelContext` |
 | `camel.exchanges.total.count` | counter | `routeId`, `camelContext` |
 | `camel.exchanges.failures.handled.count` | counter | `routeId`, `camelContext` |
 | `camel.route.policy.count` | count | `routeId`, `camelContext` |
 | `camel.route.policy.total_time` | total | `routeId`, `camelContext` |
 | `camel.route.policy.max` | gauge | `routeId`, `camelContext` |
 | `camel.routes.running.value` | gauge | — |
 Mean processing time = `camel.route.policy.total_time / camel.route.policy.count`. Min processing time is not available (Micrometer does not track minimums).
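
The mean formula above can be sketched in TypeScript. The helper names are hypothetical, and the windowed variant assumes both series are cumulative counters, which the source does not state.

```typescript
// Hypothetical helper: mean route processing time from
// camel.route.policy.total_time and camel.route.policy.count.
// The result is in whatever time unit total_time is stored in.
function meanProcessingTime(totalTime: number, count: number): number | null {
  // No recorded exchanges: the mean is undefined, not zero.
  if (!Number.isFinite(totalTime) || !Number.isFinite(count) || count <= 0) {
    return null;
  }
  return totalTime / count;
}

// Assumption: values are cumulative. The mean over a window then comes
// from the deltas between two samples rather than from the raw totals.
function meanOverWindow(
  start: { totalTime: number; count: number },
  end: { totalTime: number; count: number },
): number | null {
  return meanProcessingTime(end.totalTime - start.totalTime, end.count - start.count);
}
```

For example, `meanProcessingTime(1500, 3)` yields `500`, and a route with no exchanges in the window yields `null` instead of a division by zero.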
 ### Cameleer agent metrics
 | Metric name | Type | Tags |
 |---|---|---|
 | `cameleer.chunks.exported.count` | counter | `instanceId` |
 | `cameleer.chunks.dropped.count` | counter | `instanceId`, `reason` |
 | `cameleer.sse.reconnects.count` | counter | `instanceId` |
 | `cameleer.taps.evaluated.count` | counter | `instanceId` |
 | `cameleer.metrics.exported.count` | counter | `instanceId` |
--- a/.claude/rules/ui.md
+++ b/.claude/rules/ui.md
@@ -0,0 +1,43 @@
 ---
 paths:
  - "ui/**"
 ---
 # UI Structure
 The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments**.
 - **Exchanges** — route execution search and detail (`ui/src/pages/Exchanges/`)
 - **Dashboard** — metrics and stats with L1/L2/L3 drill-down (`ui/src/pages/DashboardTab/`)
 - **Runtime** — live agent status, logs, commands (`ui/src/pages/RuntimeTab/`)
 - **Deployments** — app management, JAR upload, deployment lifecycle (`ui/src/pages/AppsTab/`)
  - Config sub-tabs: **Monitoring | Resources | Variables | Traces & Taps | Route Recording**
  - Create app: full page at `/apps/new` (not a modal)
  - Deployment progress: `ui/src/components/DeploymentProgress.tsx` (7-stage step indicator)
 **Admin pages** (ADMIN-only, under `/admin/`):
 - **Sensitive Keys** (`ui/src/pages/Admin/SensitiveKeysPage.tsx`) — global sensitive key masking config. Shows agent built-in defaults as outlined Badge reference, editable Tag pills for custom keys, amber-highlighted push-to-agents toggle. Keys add to (not replace) agent defaults. Per-app sensitive key additions managed via `ApplicationConfigController` API. Note: `AppConfigDetailPage.tsx` exists but is not routed in `router.tsx`.
 ## Key UI Files
 - `ui/src/router.tsx` — React Router v6 routes
 - `ui/src/config.ts` — apiBaseUrl, basePath
 - `ui/src/auth/auth-store.ts` — Zustand: accessToken, user, roles, login/logout
 - `ui/src/api/environment-store.ts` — Zustand: selected environment (localStorage)
 - `ui/src/components/ContentTabs.tsx` — main tab switcher
 - `ui/src/components/ExecutionDiagram/` — interactive trace view (canvas)
 - `ui/src/components/ProcessDiagram/` — ELK-rendered route diagram
 - `ui/src/hooks/useScope.ts` — TabKey type, scope inference
 - `ui/src/components/StartupLogPanel.tsx` — deployment startup log viewer (container logs from ClickHouse, polls 3s while STARTING)
 - `ui/src/api/queries/logs.ts` — `useStartupLogs` hook for container startup log polling, `useLogs`/`useApplicationLogs` for general log search
 ## UI Styling
 - Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.); never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes. SVG presentation attributes resolve `var()` correctly.
 - Shared CSS modules in `ui/src/styles/` (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
 - Shared `PageLoader` component replaces copy-pasted spinner patterns.
 - Design system components used consistently: `Select`, `Tabs`, `Toggle`, `Button`, `LogViewer`, `Label` — prefer DS components over raw HTML elements. `LogViewer` renders optional source badges (`container`, `app`, `agent`) via `LogEntry.source` field (DS v0.1.49+).
 - Environment slugs are auto-computed from display name (read-only in UI).
 - Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer-{16,32,48,192,512}.png`, and `cameleer-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
 - Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). `basePath` is centralized in `ui/src/config.ts`; `router.tsx` imports it instead of re-reading the `<base>` tag.
 - Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.
--- a/.gitignore
+++ b/.gitignore
@@ -38,7 +38,8 @@ Thumbs.db
 logs/
 # Claude
-.claude/
+.claude/*
+!.claude/rules/
 .superpowers/
 .playwright-mcp/
 .worktrees/
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -67,6 +67,10 @@ PostgreSQL (Flyway): `cameleer-server-app/src/main/resources/db/migration/`
 ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)
 ## Maintaining .claude/rules/
 When adding, removing, or renaming classes, controllers, endpoints, UI components, or metrics, update the corresponding `.claude/rules/` file as part of the same change. The rule files are the class/API map that future sessions rely on — stale rules cause wrong assumptions. Treat rule file updates like updating an import: part of the change, not a separate task.
 ## Disabled Skills
 - Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.