Cameleer SaaS — **vendor management plane** for the Cameleer observability stack. Two personas: **vendor** (platform:admin) manages the platform and provisions tenants; **tenant admin** (tenant:manage) manages their observability instance. The vendor creates tenants, which provisions per-tenant cameleer3-server + UI instances via Docker API. No example tenant — clean slate bootstrap, vendor creates everything.
Agent-server protocol is defined in `cameleer3/cameleer3-common/PROTOCOL.md`. The agent and server are mature, proven components — this repo wraps them with multi-tenancy, billing, and self-service onboarding.
-`VendorTenantController.java` — REST at `/api/vendor/tenants` (platform:admin required). List endpoint returns `VendorTenantSummary` with fleet health data (agentCount, environmentCount, agentLimit) fetched in parallel via `CompletableFuture`.
-`InfrastructureService.java` — raw JDBC queries against shared PostgreSQL and ClickHouse for per-tenant infrastructure monitoring (schema sizes, table stats, row counts, disk usage)
-`InfrastructureController.java` — REST at `/api/vendor/infrastructure` (platform:admin required). PostgreSQL and ClickHouse overview with per-tenant breakdown.
-`TenantPortalService.java` — customer-facing: dashboard (health + agent/env counts from server via M2M), license, SSO connectors, team, settings (public endpoint URL), server restart/upgrade, password management (own + team + server admin)
-`TenantPortalController.java` — REST at `/api/tenant/*` (org-scoped, includes CA cert management at `/api/tenant/ca`, password endpoints at `/api/tenant/password` and `/api/tenant/server/admin-password`)
-`DockerTenantProvisioner.java` — Docker implementation, creates per-tenant server + UI containers. `upgrade(slug)` force-pulls latest images and removes server+UI containers (preserves app containers, volumes, networks) for re-provisioning. `remove(slug)` does full cleanup: label-based container removal, env networks, tenant network, JAR volume.
-`TenantDataCleanupService.java` — GDPR data erasure on tenant delete: drops PostgreSQL `tenant_{slug}` schema, deletes ClickHouse data across all tables with `tenant_id` column
-`AuditService.java` — log audit events (TENANT_CREATE, TENANT_UPDATE, etc.); auto-resolves actor name from Logto when actorEmail is null (cached in-memory)
The SaaS platform is a **vendor management plane**. It does not proxy requests to servers — instead it provisions dedicated per-tenant cameleer3-server instances via Docker API. Each tenant gets isolated server + UI containers with their own database schemas, networks, and Traefik routing.
All services on one hostname. Infrastructure containers (Traefik, Logto) use `PUBLIC_HOST` + `PUBLIC_PROTOCOL` env vars directly. The SaaS app reads these via `CAMELEER_SAAS_PROVISIONING_PUBLICHOST` / `CAMELEER_SAAS_PROVISIONING_PUBLICPROTOCOL` (Spring Boot properties `cameleer.saas.provisioning.publichost` / `cameleer.saas.provisioning.publicprotocol`).
- TLS: `traefik-certs` init container generates self-signed cert (dev) or copies user-supplied cert via `CERT_FILE`/`KEY_FILE`/`CA_FILE` env vars. Default cert configured in `docker/traefik-dynamic.yml` (NOT static `traefik.yml` — Traefik v3 ignores `tls.stores.default` in static config). Runtime cert replacement via vendor UI (stage/activate/restore). ACME for production (future). Server containers import `/certs/ca.pem` into JVM truststore at startup via `docker-entrypoint.sh` for OIDC trust.
Server containers join three networks: tenant network (primary), shared services network (`cameleer`), and traefik network. Apps deployed by the server use the tenant network as primary.
**IMPORTANT:** Dynamically-created containers MUST have `traefik.docker.network=cameleer-traefik` label. Traefik's Docker provider defaults to `network: cameleer` (compose-internal name) for IP resolution, which doesn't match dynamically-created containers connected via Docker API using the host network name (`cameleer-saas_cameleer`). Without this label, Traefik returns 504 Gateway Timeout for `/t/{slug}/api/*` paths.
- Tenant isolation enforced by `TenantIsolationInterceptor` (a single `HandlerInterceptor` on `/api/**` that resolves JWT org_id to TenantContext and validates `{tenantId}`, `{environmentId}`, `{appId}` path variables; fail-closed, platform admins bypass)
- 13 OAuth2 scopes on the Logto API resource (`https://api.cameleer.local`): 10 platform scopes + 3 server scopes (`server:admin`, `server:operator`, `server:viewer`), served to the frontend from `GET /platform/api/config`
- Logto Custom JWT (Phase 7b in bootstrap) injects a `roles` claim into access tokens based on org roles and global roles — this makes role data available to the server without Logto-specific code
| `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY` | `true` (conditional) | Skip cert verify for OIDC discovery; only set when no `/certs/ca.pem` exists. When ca.pem exists, the server's `docker-entrypoint.sh` imports it into the JVM truststore instead. |
| `CAMELEER_SAAS_PROVISIONING_PUBLICHOST` | `cameleer.saas.provisioning.publichost` | Public hostname (same value as infrastructure `PUBLIC_HOST`) |
| `CAMELEER_SAAS_PROVISIONING_PUBLICPROTOCOL` | `cameleer.saas.provisioning.publicprotocol` | Public protocol (same value as infrastructure `PUBLIC_PROTOCOL`) |
**Note:** `PUBLIC_HOST` and `PUBLIC_PROTOCOL` remain as infrastructure env vars for Traefik and Logto containers. The SaaS app reads its own copies via the `CAMELEER_SAAS_PROVISIONING_*` prefix. `LOGTO_ENDPOINT` and `LOGTO_DB_PASSWORD` are infrastructure env vars for the Logto service and are unchanged.
The server's OIDC config (`OidcConfig`) includes `audience` (RFC 8707 resource indicator) and `additionalScopes`. The `audience` is sent as `resource` in both the authorization request and token exchange, which makes Logto return a JWT access token instead of opaque. The Custom JWT script maps org roles to `roles: ["server:admin"]`.
**CRITICAL:** `additionalScopes` MUST include `urn:logto:scope:organizations` and `urn:logto:scope:organization_roles` — without these, Logto doesn't populate `context.user.organizationRoles` in the Custom JWT script, so the `roles` claim is empty and all users get `defaultRoles` (VIEWER). The server's `OidcAuthController.applyClaimMappings()` uses OIDC token roles (from Custom JWT) as fallback when no DB claim mapping rules exist: claim mapping rules > OIDC token roles > defaultRoles.
The compose stack is: Traefik + traefik-certs (init) + PostgreSQL + ClickHouse + Logto + logto-bootstrap (init) + cameleer-saas. No `cameleer3-server` or `cameleer3-server-ui` in compose — those are provisioned per-tenant by `DockerTenantProvisioner`.
8. Create server container with env vars, Traefik labels (`traefik.docker.network`), health check, Docker socket bind, JAR volume, certs volume (ro)
9. Create UI container with `CAMELEER_API_URL`, `BASE_PATH`, Traefik strip-prefix labels
10. Wait for health check (`/api/v1/health`, not `/actuator/health` which requires auth)
11. Push license token to server via M2M API
12. Push OIDC config (Traditional Web App credentials + `additionalScopes: [urn:logto:scope:organizations, urn:logto:scope:organization_roles]`) to server for SSO
13. Update tenant status -> ACTIVE (or set `provisionError` on failure)
**Server restart** (available to vendor + tenant admin):
-`POST /api/vendor/tenants/{id}/restart` (vendor) and `POST /api/tenant/server/restart` (tenant)
-`POST /api/tenant/team/{userId}/password` — tenant admin resets a team member's Logto password (validates org membership first)
-`POST /api/tenant/server/admin-password` — tenant admin resets the server's built-in local admin password (via M2M API to `POST /api/v1/admin/users/user:admin/password`)
-`cameleer-runtime-base` — base image for deployed apps (agent JAR + JRE). CI downloads latest agent SNAPSHOT from Gitea Maven registry. Uses `CAMELEER_SERVER_RUNTIME_SERVERURL` env var (not CAMELEER_EXPORT_ENDPOINT).
-`docker-compose.dev.yml` — exposes ports for direct access, sets `SPRING_PROFILES_ACTIVE: dev`, `VENDOR_SEED_ENABLED: true`. Volume-mounts `./ui/dist` into the container so local UI builds are served without rebuilding the Docker image (`SPRING_WEB_RESOURCES_STATIC_LOCATIONS` overrides classpath). Adds Docker socket mount for tenant provisioning.
This project is indexed by GitNexus as **cameleer-saas** (2468 symbols, 5458 relationships, 207 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1.`gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2.`gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3.`READ gitnexus://repo/cameleer-saas/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
Before completing any code modification task, verify:
1.`gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3.`gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |