# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project
Cameleer SaaS — **vendor management plane** for the Cameleer observability stack. Two personas: **vendor** (platform:admin) manages the platform and provisions tenants; **tenant admin** (tenant:manage) manages their observability instance. The vendor creates tenants, which provisions per-tenant cameleer3-server + UI instances via Docker API. No example tenant — clean slate bootstrap, vendor creates everything.
## Ecosystem
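This repo is the SaaS layer on top of two proven components (`cameleer3` and `cameleer3-server`). Two supporting repos, the marketing site and the shared design system, are listed alongside them: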
- **cameleer3** (sibling repo) — Java agent using ByteBuddy for zero-code instrumentation of Camel apps. Captures route executions, processor traces, payloads, metrics, and route graph topology. Deploys as `-javaagent` JAR.
- **cameleer3-server** (sibling repo) — Spring Boot observability backend. Receives agent data via HTTP, pushes config/commands via SSE. PostgreSQL + ClickHouse storage. React SPA dashboard. JWT auth with Ed25519 config signing. Docker container orchestration for app deployments.
- **cameleer-website** — Marketing site (Astro 5)
- **design-system** — Shared React component library (`@cameleer/design-system` on Gitea npm registry)
Agent-server protocol is defined in `cameleer3/cameleer3-common/PROTOCOL.md`. The agent and server are mature, proven components — this repo wraps them with multi-tenancy, billing, and self-service onboarding.
## Key Classes
### Java Backend (`src/main/java/net/siegeln/cameleer/saas/`)
**config/** — Security, tenant isolation, web config
- `SecurityConfig.java` — OAuth2 JWT decoder (ES384, issuer/audience validation, scope extraction)
- `TenantIsolationInterceptor.java` — HandlerInterceptor on `/api/**`; JWT org_id -> TenantContext, path variable validation, fail-closed
- `TenantContext.java` — ThreadLocal<UUID> tenant ID storage
- `WebConfig.java` — registers TenantIsolationInterceptor
- `PublicConfigController.java` — GET /api/config (Logto endpoint, SPA client ID, scopes)
- `MeController.java` — GET /api/me (authenticated user, tenant list)
**tenant/** — Tenant data model
- `TenantEntity.java` — JPA entity (id, name, slug, tier, status, logto_org_id, stripe IDs, settings JSONB)
**vendor/** — Vendor console (platform:admin only)
- `VendorTenantService.java` — orchestrates tenant creation: DB record -> Logto org -> admin user -> license -> Docker provisioning -> OIDC config push -> redirect URI registration
- `VendorTenantController.java` — REST at `/api/vendor/tenants` (platform:admin required)
**portal/** — Tenant admin portal (org-scoped)
- `TenantPortalService.java` — customer-facing: dashboard (health from server), license, OIDC config, team, settings
- `TenantPortalController.java` — REST at `/api/tenant/*` (org-scoped)
**provisioning/** — Pluggable tenant provisioning
- `TenantProvisioner.java` — pluggable interface (like server's RuntimeOrchestrator)
- `DockerTenantProvisioner.java` — Docker implementation, creates per-tenant server + UI containers
- `TenantProvisionerAutoConfig.java` — auto-detects Docker socket
**license/** — License management
- `LicenseEntity.java` — JPA entity (id, tenant_id, tier, features JSONB, limits JSONB, expires_at)
- `LicenseService.java` — generation, validation, feature/limit lookups
- `LicenseController.java` — POST issue, GET verify, DELETE revoke
**identity/** — Logto & server integration
- `LogtoConfig.java` — Logto endpoint, M2M credentials (reads from bootstrap file)
- `LogtoManagementClient.java` — Logto Management API calls (create org, create user, add to org)
- `ServerApiClient.java` — M2M client for cameleer3-server API (Logto M2M token, `X-Cameleer-Protocol-Version: 1` header)
**audit/** — Audit logging
- `AuditEntity.java` — JPA entity (actor_id, tenant_id, action, resource, status)
- `AuditService.java` — log audit events (TENANT_CREATE, TENANT_UPDATE, etc.)
### React Frontend (`ui/src/`)
- `main.tsx` — React 19 root
- `router.tsx` — `/vendor/*` + `/tenant/*` routes with `RequireScope` guards and a `LandingRedirect` that waits for scopes
- `Layout.tsx` — persona-aware sidebar: vendor sees expandable "Vendor" section (Tenants + Logto link), tenant admin sees Dashboard/License/OIDC/Team/Settings
- `OrgResolver.tsx` — merges global + org-scoped token scopes (vendor's platform:admin is global)
- `config.ts` — fetch Logto config from /platform/api/config
- `auth/useAuth.ts` — auth hook (isAuthenticated, logout, signIn)
- `auth/useOrganization.ts` — Zustand store for current tenant
- `auth/useScopes.ts` — decode JWT scopes, hasScope()
- `auth/ProtectedRoute.tsx` — guard (redirects to /login)
- **Vendor pages**: `VendorTenantsPage.tsx`, `CreateTenantPage.tsx`, `TenantDetailPage.tsx`
- **Tenant pages**: `TenantDashboardPage.tsx`, `TenantLicensePage.tsx`, `OidcConfigPage.tsx`, `TeamPage.tsx`, `SettingsPage.tsx`
### Custom Sign-in UI (`ui/sign-in/src/`)
- `SignInPage.tsx` — form with @cameleer/design-system components
- `experience-api.ts` — Logto Experience API client (4-step: init -> verify -> identify -> submit)
## Architecture Context
The SaaS platform is a **vendor management plane**. It does not proxy requests to servers — instead it provisions dedicated per-tenant cameleer3-server instances via Docker API. Each tenant gets isolated server + UI containers with their own database schemas, networks, and Traefik routing.
### Routing (single-domain, path-based via Traefik)
All services on one hostname. Two env vars control everything: `PUBLIC_HOST` + `PUBLIC_PROTOCOL`.
| Path | Target | Notes |
|------|--------|-------|
| `/platform/*` | cameleer-saas:8080 | SPA + API (`server.servlet.context-path: /platform`) |
| `/platform/vendor/*` | (SPA routes) | Vendor console (platform:admin) |
| `/platform/tenant/*` | (SPA routes) | Tenant admin portal (org-scoped) |
| `/t/{slug}/*` | per-tenant server-ui | Provisioned tenant UI containers (Traefik labels) |
| `/` | redirect -> `/platform/` | Via `docker/traefik-dynamic.yml` |
| `/*` (catch-all) | cameleer-logto:3001 (priority=1) | Custom sign-in UI, OIDC, interaction |
- SPA assets at `/_app/` (Vite `assetsDir: '_app'`) to avoid conflict with Logto's `/assets/`
- Logto `ENDPOINT` = `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}` (same domain, same origin)
- TLS: self-signed cert init container (`traefik-certs`) for dev, ACME for production
- Root `/` -> `/platform/` redirect via Traefik file provider (`docker/traefik-dynamic.yml`)
- LoginPage auto-redirects to Logto OIDC (no intermediate button)
- Per-tenant server containers get Traefik labels for `/t/{slug}/*` routing at provisioning time
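
As a rough illustration of how the routing table above resolves a request path, the following sketch mirrors the matching order in plain Java. The class `RouteSketch`, its `resolve` method, and the returned labels are hypothetical names used only for illustration; the real routing is done by Traefik rules and container labels, not by application code.

```java
// Hypothetical sketch: mirrors the Traefik routing table in plain Java.
final class RouteSketch {
    private RouteSketch() {}

    /** Returns a label for the target that would serve the given request path. */
    static String resolve(String path) {
        if (path.equals("/")) {
            return "redirect:/platform/";      // root redirect (docker/traefik-dynamic.yml)
        }
        if (path.startsWith("/platform/")) {
            return "cameleer-saas:8080";       // SPA + API under the /platform context path
        }
        if (path.startsWith("/t/")) {
            String rest = path.substring(3);   // strip the leading "/t/"
            int slash = rest.indexOf('/');
            String slug = slash < 0 ? rest : rest.substring(0, slash);
            if (!slug.isEmpty()) {
                return "server-ui:" + slug;    // per-tenant UI container (label is illustrative)
            }
        }
        return "cameleer-logto:3001";          // catch-all with priority=1
    }
}
```

The catch-all is checked last, which corresponds to its low Traefik priority: any more specific rule wins before Logto sees the request.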
### Docker Networks
Compose-defined networks:
| Network | Name on Host | Purpose |
|---------|-------------|---------|
| `cameleer` | `cameleer-saas_cameleer` | Compose default — shared services (DB, Logto, SaaS) |
| `cameleer-traefik` | `cameleer-traefik` (fixed `name:`) | Traefik + provisioned tenant containers |
Per-tenant networks (created dynamically by `DockerTenantProvisioner`):
| Network | Name Pattern | Purpose |
|---------|-------------|---------|
| Tenant network | `cameleer-tenant-{slug}` | Internal bridge, no internet — isolates tenant server + apps |
| Environment network | `cameleer-env-{tenantId}-{envSlug}` | Tenant-scoped (includes tenantId to prevent slug collision across tenants) |
Server containers join three networks: tenant network (primary), shared services network (`cameleer`), and traefik network. Apps deployed by the server use the tenant network as primary.
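
The naming patterns in the tables above can be sketched as a small helper. `NetworkNames` and its methods are hypothetical and not taken from `DockerTenantProvisioner`; the sketch also assumes the host-side network names from the tables are the ones passed to the Docker API.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical helper: builds the Docker network names described in the tables.
final class NetworkNames {
    static final String SHARED_SERVICES = "cameleer-saas_cameleer"; // compose default, name on host
    static final String TRAEFIK = "cameleer-traefik";               // fixed name

    private NetworkNames() {}

    /** Per-tenant internal bridge network. */
    static String tenantNetwork(String slug) {
        return "cameleer-tenant-" + slug;
    }

    /** Environment network; the tenant ID prevents slug collisions across tenants. */
    static String environmentNetwork(UUID tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }

    /** Networks a provisioned server container joins, primary network first. */
    static List<String> serverNetworks(String slug) {
        return List.of(tenantNetwork(slug), SHARED_SERVICES, TRAEFIK);
    }
}
```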
### Custom sign-in UI (`ui/sign-in/`)
Separate Vite+React SPA replacing Logto's default sign-in page. Visually matches cameleer3-server LoginPage.
- Built as custom Logto Docker image (`cameleer-logto`): `ui/sign-in/Dockerfile` = node build stage + `FROM ghcr.io/logto-io/logto:latest` + COPY dist over `/etc/logto/packages/experience/dist/`
- Uses `@cameleer/design-system` components (Card, Input, Button, FormField, Alert)
- Authenticates via Logto Experience API (4 steps: init -> verify password -> identify -> submit, followed by a redirect)
- `CUSTOM_UI_PATH` env var does NOT work for Logto OSS — must volume-mount or replace the experience dist directory
- Favicon bundled in `ui/sign-in/public/favicon.svg` (served by Logto, not SaaS)
### Auth enforcement
- All API endpoints enforce OAuth2 scopes via `@PreAuthorize("hasAuthority('SCOPE_xxx')")` annotations
- Tenant isolation enforced by `TenantIsolationInterceptor` (a single `HandlerInterceptor` on `/api/**` that resolves JWT org_id to TenantContext and validates `{tenantId}`, `{environmentId}`, `{appId}` path variables; fail-closed, platform admins bypass)
- 13 OAuth2 scopes on the Logto API resource (`https://api.cameleer.local`): 10 platform scopes + 3 server scopes (`server:admin`, `server:operator`, `server:viewer`), served to the frontend from `GET /platform/api/config`
- Server scopes map to server RBAC roles via JWT `scope` claim (SaaS platform path) or `roles` claim (server-ui OIDC login path)
- Org roles: `owner` -> `server:admin` + `tenant:manage`, `operator` -> `server:operator`, `viewer` -> `server:viewer`
- `saas-vendor` global role injected via `docker/vendor-seed.sh` (`VENDOR_SEED_ENABLED=true` in dev) — has `platform:admin` + all tenant scopes
- Custom `JwtDecoder` in `SecurityConfig.java` — ES384 algorithm, `at+jwt` token type, split issuer-uri (string validation) / jwk-set-uri (Docker-internal fetch), audience validation (`https://api.cameleer.local`)
- Logto Custom JWT (Phase 7b in bootstrap) injects a `roles` claim into access tokens based on org roles and global roles — this makes role data available to the server without Logto-specific code
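
The fail-closed tenant check described above can be sketched as a pure decision function. This is a simplification under stated assumptions: `TenantIsolationSketch` and `isAllowed` are hypothetical names, the JWT `org_id` is assumed to be already resolved to a tenant UUID, and only the `{tenantId}` path variable is modelled. The actual `TenantIsolationInterceptor` may differ.

```java
import java.util.UUID;

// Hypothetical sketch of the fail-closed tenant isolation decision.
final class TenantIsolationSketch {
    private TenantIsolationSketch() {}

    /**
     * @param tokenTenantId tenant resolved from the JWT org_id, or null if none
     * @param pathTenantId  {tenantId} path variable, or null if the route has none
     * @param platformAdmin true when the caller holds platform:admin
     * @return true only when the request may proceed
     */
    static boolean isAllowed(UUID tokenTenantId, UUID pathTenantId, boolean platformAdmin) {
        if (platformAdmin) {
            return true;                      // platform admins bypass the check
        }
        if (tokenTenantId == null) {
            return false;                     // fail closed: no tenant context in the token
        }
        if (pathTenantId == null) {
            return true;                      // nothing in the path to validate
        }
        return tokenTenantId.equals(pathTenantId);
    }
}
```

Every branch that cannot positively confirm access returns `false`, which is what "fail-closed" means here.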
### Auth routing by persona
| Persona | Logto role | Key scope | Landing route |
|---------|-----------|-----------|---------------|
| Vendor | `saas-vendor` (global) | `platform:admin` | `/vendor/tenants` |
| Tenant admin | org `owner` | `tenant:manage` | `/tenant` (dashboard) |
| Regular user (operator/viewer) | org member | `server:operator` or `server:viewer` | Redirected to server dashboard directly |
- `LandingRedirect` component waits for scopes to load, then routes to the correct persona landing page
- `RequireScope` guard on route groups enforces scope requirements
- SSO bridge: Logto session carries over to provisioned server's OIDC flow (Traditional Web App per tenant)
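
The persona table above can be read as a precedence rule over scopes. The sketch below expresses that rule in Java for illustration only; the real `LandingRedirect` is a React component, and `LandingSketch`, `landingRoute`, and the `"server-dashboard"` placeholder are hypothetical. Because the vendor role also carries all tenant scopes, `platform:admin` has to be checked first.

```java
import java.util.Set;

// Hypothetical sketch: picks a landing route from the loaded scopes.
final class LandingSketch {
    private LandingSketch() {}

    static String landingRoute(Set<String> scopes) {
        if (scopes.contains("platform:admin")) {
            return "/vendor/tenants";   // vendor console
        }
        if (scopes.contains("tenant:manage")) {
            return "/tenant";           // tenant admin dashboard
        }
        return "server-dashboard";      // placeholder: regular users go to the server dashboard
    }
}
```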
### Per-tenant server env vars (set by DockerTenantProvisioner)
These env vars are injected into provisioned per-tenant server containers:
| Env var | Value | Purpose |
|---------|-------|---------|
| `CAMELEER_OIDC_ISSUER_URI` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}/oidc` | Token issuer claim validation |
| `CAMELEER_OIDC_JWK_SET_URI` | `http://logto:3001/oidc/jwks` | Docker-internal JWK fetch |
| `CAMELEER_OIDC_TLS_SKIP_VERIFY` | `true` | Skip cert verify for OIDC discovery (dev only) |
| `CAMELEER_CORS_ALLOWED_ORIGINS` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}` | Allow browser requests through Traefik |
| `CAMELEER_RUNTIME_ENABLED` | `true` | Enable Docker orchestration |
| `CAMELEER_SERVER_URL` | `http://cameleer3-server-{slug}:8081` | Per-tenant server URL (DNS alias on tenant network) |
| `CAMELEER_ROUTING_DOMAIN` | `${PUBLIC_HOST}` | Domain for Traefik routing labels |
| `CAMELEER_ROUTING_MODE` | `path` | `path` or `subdomain` routing |
| `BASE_PATH` (server-ui) | `/t/{slug}` | React Router basename + `<base>` tag |
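
To make the table concrete, here is a minimal sketch that assembles the same variables for one tenant. `TenantServerEnvSketch` and its method names are hypothetical and not taken from `DockerTenantProvisioner`; the values simply follow the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds the env vars from the table for a given tenant slug.
final class TenantServerEnvSketch {
    private TenantServerEnvSketch() {}

    /** Env vars for the per-tenant server container. */
    static Map<String, String> build(String publicProtocol, String publicHost, String slug) {
        String origin = publicProtocol + "://" + publicHost;
        Map<String, String> env = new LinkedHashMap<>();
        env.put("CAMELEER_OIDC_ISSUER_URI", origin + "/oidc");
        env.put("CAMELEER_OIDC_JWK_SET_URI", "http://logto:3001/oidc/jwks"); // Docker-internal
        env.put("CAMELEER_OIDC_TLS_SKIP_VERIFY", "true");                    // dev only
        env.put("CAMELEER_CORS_ALLOWED_ORIGINS", origin);
        env.put("CAMELEER_RUNTIME_ENABLED", "true");
        env.put("CAMELEER_SERVER_URL", "http://cameleer3-server-" + slug + ":8081");
        env.put("CAMELEER_ROUTING_DOMAIN", publicHost);
        env.put("CAMELEER_ROUTING_MODE", "path");
        return env;
    }

    /** BASE_PATH for the per-tenant server-ui container. */
    static String uiBasePath(String slug) {
        return "/t/" + slug;
    }
}
```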
### Server OIDC role extraction (two paths)
| Path | Token type | Role source | How it works |
|------|-----------|-------------|--------------|
| SaaS platform -> server API | Logto org-scoped access token | `scope` claim | `JwtAuthenticationFilter.extractRolesFromScopes()` reads `server:admin` from scope |
| Server-ui SSO login | Logto JWT access token (via Traditional Web App) | `roles` claim | `OidcTokenExchanger` decodes access_token, reads `roles` injected by Custom JWT |
The server's OIDC config (`OidcConfig`) includes `audience` (RFC 8707 resource indicator) and `additionalScopes`. The `audience` is sent as `resource` in both the authorization request and token exchange, which makes Logto return a JWT access token instead of opaque. The Custom JWT script maps org roles to `roles: ["server:admin"]`. If OIDC returns no roles and the user already exists, `syncOidcRoles` preserves existing local roles.
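
The two behaviours described above, reading server roles from the space-delimited `scope` claim and keeping local roles when OIDC supplies none, can be sketched as follows. This is an illustrative approximation, not the server's code: `RoleExtractionSketch` and its method signatures are hypothetical, and a `null` list of existing roles is assumed to mean "user does not exist yet".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of scope-based role extraction and role preservation.
final class RoleExtractionSketch {
    private RoleExtractionSketch() {}

    /** Keeps only the server:* entries of a space-delimited scope claim. */
    static List<String> fromScopeClaim(String scope) {
        List<String> roles = new ArrayList<>();
        if (scope == null) {
            return roles;
        }
        for (String s : scope.trim().split("\\s+")) {
            if (s.startsWith("server:")) {
                roles.add(s);
            }
        }
        return roles;
    }

    /** Preserves existing local roles when OIDC returns none for a known user. */
    static List<String> syncRoles(List<String> oidcRoles, List<String> existingLocalRoles) {
        if (oidcRoles.isEmpty() && existingLocalRoles != null) {
            return existingLocalRoles;
        }
        return oidcRoles;
    }
}
```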
### Deployment pipeline
App deployment is handled by the cameleer3-server's `DeploymentExecutor` (7-stage async flow):
1. PRE_FLIGHT — validate config, check JAR exists
2. PULL_IMAGE — pull base image if missing
3. CREATE_NETWORK — ensure the `cameleer-traefik` and `cameleer-env-{tenantId}-{envSlug}` networks exist
4. START_REPLICAS — create N containers with Traefik labels
5. HEALTH_CHECK — poll `/cameleer/health` on agent port 9464
6. SWAP_TRAFFIC — stop old deployment (blue/green)
7. COMPLETE — mark RUNNING or DEGRADED
Key files:
- `DeploymentExecutor.java` (in cameleer3-server) — async staged deployment
- `DockerRuntimeOrchestrator.java` (in cameleer3-server) — Docker client, container lifecycle
- `docker/runtime-base/Dockerfile` — base image with agent JAR, maps env vars to `-D` system properties
- `ServerApiClient.java` — M2M token acquisition for SaaS->server API calls (agent status). Uses `X-Cameleer-Protocol-Version: 1` header
- Docker socket access: `group_add: ["0"]` in docker-compose.dev.yml (not root group membership in Dockerfile)
- Network: deployed containers join `cameleer-tenant-{slug}` (primary, isolation) + `cameleer-traefik` (routing) + `cameleer-env-{tenantId}-{envSlug}` (environment isolation)
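
The seven stages listed above run strictly in order. A minimal sketch of that ordering, assuming the flow stops at the first failing stage (the source does not spell out the failure behaviour), might look like this. `DeployStage` and `DeploymentPipelineSketch` are hypothetical names and do not reflect the real `DeploymentExecutor`.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Stage names follow the numbered list; declaration order is execution order.
enum DeployStage {
    PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
}

// Hypothetical sketch of a staged runner.
final class DeploymentPipelineSketch {
    private DeploymentPipelineSketch() {}

    /**
     * Runs each stage in order via the supplied runner (true = stage succeeded).
     * Returns the first failing stage, or empty when all stages succeed.
     * Stopping at the first failure is an assumption of this sketch.
     */
    static Optional<DeployStage> firstFailure(Predicate<DeployStage> stageRunner) {
        for (DeployStage stage : DeployStage.values()) {
            if (!stageRunner.test(stage)) {
                return Optional.of(stage);
            }
        }
        return Optional.empty();
    }
}
```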
### Bootstrap (`docker/logto-bootstrap.sh`)
Idempotent script run via `logto-bootstrap` init container. **Clean slate** — no example tenant, no viewer user, no server configuration. Phases:
1. Wait for Logto health (no server to wait for — servers are provisioned per-tenant)
2. Get Management API token (reads `m-default` secret from DB)
3. Create Logto apps (SPA, Traditional Web App with `skipConsent`, M2M with Management API role + server API role)
3b. Create API resource scopes (10 platform + 3 server scopes)
4. Create org roles (owner, operator, viewer with API resource scope assignments) + M2M server role (`cameleer-m2m-server` with `server:admin` scope)
5. Create admin user (platform owner with Logto console access)
7b. Configure Logto Custom JWT for access tokens (maps org roles -> `roles` claim: admin->server:admin, member->server:viewer)
8. Configure Logto sign-in branding (Cameleer colors `#C6820E`/`#D4941E`, logo from `/platform/logo.svg`)
9. Cleanup seeded Logto apps
10. Write bootstrap results to `/data/logto-bootstrap.json`
Vendor user is seeded separately via `docker/vendor-seed.sh` (`VENDOR_SEED_ENABLED=true` in dev). The compose stack is: Traefik + PostgreSQL + ClickHouse + Logto + logto-bootstrap + cameleer-saas. No `cameleer3-server` or `cameleer3-server-ui` in compose — those are provisioned per-tenant by `DockerTenantProvisioner`.
### Tenant Provisioning Flow
When vendor creates a tenant via `VendorTenantService`:
1. Create `TenantEntity` (status=PROVISIONING) + Logto organization
2. Create admin user in Logto with owner org role
3. Add vendor user to new org for support access
4. Register OIDC redirect URIs for `/t/{slug}/oidc/callback` on Logto Traditional Web App
5. Generate license (tier-appropriate, 365 days)
6. Create tenant-isolated Docker network (`cameleer-tenant-{slug}`)
7. Create server + UI containers with correct env vars, Traefik labels, health check
8. Wait for health check (`/api/v1/health`, not `/actuator/health` which requires auth)
9. Push license token to server via M2M API
10. Push OIDC config (Logto Traditional Web App credentials) to server for SSO
11. Update tenant status -> ACTIVE
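
Several steps in the flow above depend on values derived from the tenant slug or the issue date. The sketch below collects those derivations in one place for illustration. `ProvisioningValuesSketch` and its methods are hypothetical and not part of `VendorTenantService`; the 365-day validity is computed as exactly 365 days from the issue instant, which is an assumption about how the period is counted.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: values derived during tenant provisioning.
final class ProvisioningValuesSketch {
    /** Unauthenticated health endpoint polled in step 8. */
    static final String HEALTH_PATH = "/api/v1/health";

    private ProvisioningValuesSketch() {}

    /** OIDC redirect path registered on the Logto Traditional Web App (step 4). */
    static String oidcCallbackPath(String slug) {
        return "/t/" + slug + "/oidc/callback";
    }

    /** Tenant-isolated Docker network created in step 6. */
    static String tenantNetwork(String slug) {
        return "cameleer-tenant-" + slug;
    }

    /** License expiry for a license issued at the given instant (step 5). */
    static Instant licenseExpiry(Instant issuedAt) {
        return issuedAt.plus(365, ChronoUnit.DAYS);
    }
}
```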
## Database Migrations
PostgreSQL (Flyway): `src/main/resources/db/migration/`
- V001 — tenants (id, name, slug, tier, status, logto_org_id, stripe IDs, settings JSONB)
- V002 — licenses (id, tenant_id, tier, features JSONB, limits JSONB, expires_at)
- V003 — environments (tenant -> environments 1:N)
- V004 — api_keys (auth tokens for agent registration)
- V005 — apps (Camel applications)
- V006 — deployments (app versions, deployment history)
- V007 — audit_log
- V008 — app resource limits
- V010 — cleanup of migrated tables
## Related Conventions
- Gitea-hosted: `gitea.siegeln.net/cameleer/`
- CI: `.gitea/workflows/` — Gitea Actions
- K8s target: k3s cluster at 192.168.50.86
- Docker images: CI builds and pushes all images — Dockerfiles use multi-stage builds, no local builds needed
  - `cameleer-saas` — SaaS vendor management plane (frontend + JAR baked in)
  - `cameleer-logto` — custom Logto with sign-in UI baked in
  - `cameleer3-server` / `cameleer3-server-ui` — provisioned per-tenant (not in compose, created by `DockerTenantProvisioner`)
  - `cameleer-runtime-base` — base image for deployed apps (agent JAR + JRE). CI downloads latest agent SNAPSHOT from Gitea Maven registry. Uses `CAMELEER_SERVER_URL` env var (not `CAMELEER_EXPORT_ENDPOINT`).
- Docker builds: `--no-cache`, `--provenance=false` for Gitea compatibility
- `docker-compose.dev.yml` — exposes ports for direct access, sets `SPRING_PROFILES_ACTIVE: dev`, `VENDOR_SEED_ENABLED: true`. Volume-mounts `./ui/dist` into the container so local UI builds are served without rebuilding the Docker image (`SPRING_WEB_RESOURCES_STATIC_LOCATIONS` overrides classpath). Adds Docker socket mount for tenant provisioning.
- Design system: import from `@cameleer/design-system` (Gitea npm registry)
## Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.
<!-- gitnexus:start -->
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer-saas** (1686 symbols, 2709 relationships, 97 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer-saas/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
## Tools Quick Reference
| Tool | When to use | Command |
|------|-------------|---------|
| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
## Impact Risk Levels
| Depth | Meaning | Action |
|-------|---------|--------|
| d=1 | WILL BREAK — direct callers/importers | MUST update these |
| d=2 | LIKELY AFFECTED — indirect deps | Should test |
| d=3 | MAY NEED TESTING — transitive | Test if critical path |
## Resources
| Resource | Use for |
|----------|---------|
| `gitnexus://repo/cameleer-saas/context` | Codebase overview, check index freshness |
| `gitnexus://repo/cameleer-saas/clusters` | All functional areas |
| `gitnexus://repo/cameleer-saas/processes` | All execution flows |
| `gitnexus://repo/cameleer-saas/process/{name}` | Step-by-step execution trace |
## Self-Check Before Finishing
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
<!-- gitnexus:end -->