diff --git a/CLAUDE.md b/CLAUDE.md
index af94528..b54a3f1 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -33,17 +33,27 @@ Agent-server protocol is defined in `cameleer3/cameleer3-common/PROTOCOL.md`. Th
 - `TenantEntity.java` — JPA entity (id, name, slug, tier, status, logto_org_id, stripe IDs, settings JSONB)
 
 **vendor/** — Vendor console (platform:admin only)
-- `VendorTenantService.java` — orchestrates tenant creation: DB record -> Logto org -> admin user -> license -> Docker provisioning -> OIDC config push -> redirect URI registration
+- `VendorTenantService.java` — orchestrates tenant creation (sync: DB + Logto + license, async: Docker provisioning + config push), suspend/activate, delete, restart server, license renewal
 - `VendorTenantController.java` — REST at `/api/vendor/tenants` (platform:admin required)
 
 **portal/** — Tenant admin portal (org-scoped)
-- `TenantPortalService.java` — customer-facing: dashboard (health from server), license, OIDC config, team, settings
+- `TenantPortalService.java` — customer-facing: dashboard (health from server), license, SSO connectors, team, settings, server restart
 - `TenantPortalController.java` — REST at `/api/tenant/*` (org-scoped)
 
 **provisioning/** — Pluggable tenant provisioning
 - `TenantProvisioner.java` — pluggable interface (like server's RuntimeOrchestrator)
 - `DockerTenantProvisioner.java` — Docker implementation, creates per-tenant server + UI containers
 - `TenantProvisionerAutoConfig.java` — auto-detects Docker socket
+- `DockerCertificateManager.java` — file-based cert management with atomic `.wip` swap (Docker volume)
+- `DisabledCertificateManager.java` — no-op when certs dir unavailable
+- `CertificateManagerAutoConfig.java` — auto-detects `/certs` directory
+
+**certificate/** — TLS certificate lifecycle management
+- `CertificateManager.java` — provider interface (Docker now, K8s later)
+- `CertificateService.java` — orchestrates stage/activate/restore/discard, DB metadata, tenant CA staleness
+- `CertificateController.java` — REST at `/api/vendor/certificates` (platform:admin required)
+- `CertificateEntity.java` — JPA entity (status: ACTIVE/STAGED/ARCHIVED, subject, fingerprint, etc.)
+- `CertificateStartupListener.java` — seeds DB from filesystem on boot (for bootstrap-generated certs)
 
 **license/** — License management
 - `LicenseEntity.java` — JPA entity (id, tenant_id, tier, features JSONB, limits JSONB, expires_at)
@@ -52,26 +62,26 @@ Agent-server protocol is defined in `cameleer3/cameleer3-common/PROTOCOL.md`. Th
 
 **identity/** — Logto & server integration
 - `LogtoConfig.java` — Logto endpoint, M2M credentials (reads from bootstrap file)
-- `LogtoManagementClient.java` — Logto Management API calls (create org, create user, add to org)
+- `LogtoManagementClient.java` — Logto Management API calls (create org, create user, add to org, get user, SSO connectors, JIT provisioning)
 - `ServerApiClient.java` — M2M client for cameleer3-server API (Logto M2M token, `X-Cameleer-Protocol-Version: 1` header)
 
 **audit/** — Audit logging
-- `AuditEntity.java` — JPA entity (actor_id, tenant_id, action, resource, status)
-- `AuditService.java` — log audit events (TENANT_CREATE, TENANT_UPDATE, etc.)
+- `AuditEntity.java` — JPA entity (actor_id, actor_email, tenant_id, action, resource, status)
+- `AuditService.java` — log audit events (TENANT_CREATE, TENANT_UPDATE, etc.); auto-resolves actor name from Logto when actorEmail is null (cached in-memory)
 
 ### React Frontend (`ui/src/`)
 
 - `main.tsx` — React 19 root
 - `router.tsx` — `/vendor/*` + `/tenant/*` with `RequireScope` guards and `LandingRedirect` that waits for scopes
-- `Layout.tsx` — persona-aware sidebar: vendor sees expandable "Vendor" section (Tenants + Logto link), tenant admin sees Dashboard/License/OIDC/Team/Settings
+- `Layout.tsx` — persona-aware sidebar: vendor sees expandable "Vendor" section (Tenants, Audit Log, Certificates, Identity/Logto), tenant admin sees Dashboard/License/SSO/Team/Audit/Settings
 - `OrgResolver.tsx` — merges global + org-scoped token scopes (vendor's platform:admin is global)
 - `config.ts` — fetch Logto config from /platform/api/config
 - `auth/useAuth.ts` — auth hook (isAuthenticated, logout, signIn)
 - `auth/useOrganization.ts` — Zustand store for current tenant
 - `auth/useScopes.ts` — decode JWT scopes, hasScope()
 - `auth/ProtectedRoute.tsx` — guard (redirects to /login)
-- **Vendor pages**: `VendorTenantsPage.tsx`, `CreateTenantPage.tsx`, `TenantDetailPage.tsx`
-- **Tenant pages**: `TenantDashboardPage.tsx`, `TenantLicensePage.tsx`, `OidcConfigPage.tsx`, `TeamPage.tsx`, `SettingsPage.tsx`
+- **Vendor pages**: `VendorTenantsPage.tsx`, `CreateTenantPage.tsx`, `TenantDetailPage.tsx`, `VendorAuditPage.tsx`, `CertificatesPage.tsx`
+- **Tenant pages**: `TenantDashboardPage.tsx`, `TenantLicensePage.tsx`, `SsoPage.tsx`, `TeamPage.tsx`, `TenantAuditPage.tsx`, `SettingsPage.tsx`
 
 ### Custom Sign-in UI (`ui/sign-in/src/`)
 
@@ -97,7 +107,7 @@ All services on one hostname. Two env vars control everything: `PUBLIC_HOST` + `
 
 - SPA assets at `/_app/` (Vite `assetsDir: '_app'`) to avoid conflict with Logto's `/assets/`
 - Logto `ENDPOINT` = `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}` (same domain, same origin)
-- TLS: self-signed cert init container (`traefik-certs`) for dev, ACME for production
+- TLS: `traefik-certs` init container generates self-signed cert (dev) or copies user-supplied cert via `CERT_FILE`/`KEY_FILE`/`CA_FILE` env vars. Runtime cert replacement via vendor UI (stage/activate/restore). ACME for production (future).
 - Root `/` -> `/platform/` redirect via Traefik file provider (`docker/traefik-dynamic.yml`)
 - LoginPage auto-redirects to Logto OIDC (no intermediate button)
 - Per-tenant server containers get Traefik labels for `/t/{slug}/*` routing at provisioning time
@@ -163,7 +173,7 @@ These env vars are injected into provisioned per-tenant server containers:
 |---------|-------|---------|
 | `CAMELEER_OIDC_ISSUER_URI` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}/oidc` | Token issuer claim validation |
 | `CAMELEER_OIDC_JWK_SET_URI` | `http://logto:3001/oidc/jwks` | Docker-internal JWK fetch |
-| `CAMELEER_OIDC_TLS_SKIP_VERIFY` | `true` | Skip cert verify for OIDC discovery (dev only) |
+| `CAMELEER_OIDC_TLS_SKIP_VERIFY` | `true` (conditional) | Skip cert verify for OIDC discovery; only set when no `/certs/ca.pem` exists |
 | `CAMELEER_CORS_ALLOWED_ORIGINS` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}` | Allow browser requests through Traefik |
 | `CAMELEER_RUNTIME_ENABLED` | `true` | Enable Docker orchestration |
 | `CAMELEER_SERVER_URL` | `http://cameleer3-server-{slug}:8081` | Per-tenant server URL (DNS alias on tenant network) |
@@ -181,6 +191,7 @@ These env vars are injected into provisioned per-tenant server containers:
 |-------|---------------|---------|
 | `/var/run/docker.sock` | `/var/run/docker.sock` | Docker socket for app deployment orchestration |
 | `cameleer-jars-{slug}` (volume) | `/data/jars` | Shared JAR storage — server writes, deployed app containers read |
+| `cameleer-saas_certs` (volume, ro) | `/certs` | Platform TLS certs + CA bundle for OIDC trust |
 
 ### Server OIDC role extraction (two paths)
 
@@ -228,23 +239,32 @@ Idempotent script run via `logto-bootstrap` init container. **Clean slate** —
 
 12. (Optional) Vendor seed: create `saas-vendor` global role, vendor user, grant Logto console access (`VENDOR_SEED_ENABLED=true` in dev).
 
-The compose stack is: Traefik + PostgreSQL + ClickHouse + Logto + logto-bootstrap + cameleer-saas. The compose stack is: Traefik + PostgreSQL + ClickHouse + Logto + logto-bootstrap + cameleer-saas. No `cameleer3-server` or `cameleer3-server-ui` in compose — those are provisioned per-tenant by `DockerTenantProvisioner`.
+The compose stack is: Traefik + traefik-certs (init) + PostgreSQL + ClickHouse + Logto + logto-bootstrap (init) + cameleer-saas. No `cameleer3-server` or `cameleer3-server-ui` in compose — those are provisioned per-tenant by `DockerTenantProvisioner`.
 
 ### Tenant Provisioning Flow
 
 When vendor creates a tenant via `VendorTenantService`:
+
+**Synchronous (in `createAndProvision`):**
 1. Create `TenantEntity` (status=PROVISIONING) + Logto organization
 2. Create admin user in Logto with owner org role
 3. Add vendor user to new org for support access
 4. Register OIDC redirect URIs for `/t/{slug}/oidc/callback` on Logto Traditional Web App
 5. Generate license (tier-appropriate, 365 days)
-6. Create tenant-isolated Docker network (`cameleer-tenant-{slug}`)
-7. Create server container with env vars, Traefik labels (`traefik.docker.network`), health check, Docker socket bind, JAR volume (`cameleer-jars-{slug}:/data/jars`)
-8. Create UI container with `CAMELEER_API_URL`, `BASE_PATH`, Traefik strip-prefix labels
-9. Wait for health check (`/api/v1/health`, not `/actuator/health` which requires auth)
-10. Push license token to server via M2M API
-11. Push OIDC config (Traditional Web App credentials + `additionalScopes: [urn:logto:scope:organizations, urn:logto:scope:organization_roles]`) to server for SSO
-11. Update tenant status -> ACTIVE
+6. Return immediately — UI shows provisioning spinner, polls via `refetchInterval`
+
+**Asynchronous (in `provisionAsync`, `@Async`):**
+7. Create tenant-isolated Docker network (`cameleer-tenant-{slug}`)
+8. Create server container with env vars, Traefik labels (`traefik.docker.network`), health check, Docker socket bind, JAR volume, certs volume (ro)
+9. Create UI container with `CAMELEER_API_URL`, `BASE_PATH`, Traefik strip-prefix labels
+10. Wait for health check (`/api/v1/health`, not `/actuator/health` which requires auth)
+11. Push license token to server via M2M API
+12. Push OIDC config (Traditional Web App credentials + `additionalScopes: [urn:logto:scope:organizations, urn:logto:scope:organization_roles]`) to server for SSO
+13. Update tenant status -> ACTIVE (or set `provisionError` on failure)
+
+**Server restart** (available to vendor + tenant admin):
+- `POST /api/vendor/tenants/{id}/restart` (vendor) and `POST /api/tenant/server/restart` (tenant)
+- Calls `TenantProvisioner.stop(slug)` then `start(slug)` — restarts server + UI containers only
 
 ## Database Migrations
 
@@ -258,6 +278,8 @@ PostgreSQL (Flyway): `src/main/resources/db/migration/`
 - V007 — audit_log
 - V008 — app resource limits
 - V010 — cleanup of migrated tables
+- V011 — add provisioning fields (server_endpoint, provision_error)
+- V012 — certificates table + tenants.ca_applied_at
 
 ## Related Conventions
 
@@ -280,7 +302,7 @@ PostgreSQL (Flyway): `src/main/resources/db/migration/`
 
 # GitNexus — Code Intelligence
 
-This project is indexed by GitNexus as **cameleer-saas** (1976 symbols, 3805 relationships, 165 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **cameleer-saas** (2057 symbols, 4069 relationships, 172 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 
diff --git a/HOWTO.md b/HOWTO.md
index 679d4e4..6d8d7a4 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -35,19 +35,21 @@ curl http://localhost:8080/actuator/health
 
 ## Architecture
 
-The platform runs as a Docker Compose stack with 6 services:
+The platform runs as a Docker Compose stack:
 
 | Service | Image | Port | Purpose |
 |---------|-------|------|---------|
-| **traefik** | traefik:v3 | 80, 443 | Reverse proxy, TLS, routing |
+| **traefik-certs** | alpine:latest | — | Init container: generates self-signed cert or copies user-supplied cert |
+| **traefik** | traefik:v3 | 80, 443, 3002 | Reverse proxy, TLS termination, routing |
 | **postgres** | postgres:16-alpine | 5432* | Platform database + Logto database |
 | **logto** | ghcr.io/logto-io/logto | 3001*, 3002* | Identity provider (OIDC) |
-| **cameleer-saas** | cameleer-saas:latest | 8080* | SaaS API server |
-| **cameleer3-server** | cameleer3-server:latest | 8081 | Observability backend |
+| **cameleer-saas** | cameleer-saas:latest | 8080* | SaaS API server + vendor UI |
 | **clickhouse** | clickhouse-server:latest | 8123* | Trace/metrics/log storage |
 
 *Ports exposed to host only with `docker-compose.dev.yml` overlay.
 
+Per-tenant `cameleer3-server` and `cameleer3-server-ui` containers are provisioned dynamically by `DockerTenantProvisioner` — they are NOT part of the compose stack.
+
 ## Installation
 
 ### 1. Environment Configuration
@@ -83,7 +85,25 @@ This creates `keys/ed25519.key` (private) and `keys/ed25519.pub` (public). The k
 
 If no key files are configured, the platform generates ephemeral keys on startup (suitable for development only -- keys change on every restart).
 
-### 3. Start the Stack
+### 3. TLS Certificate (Optional)
+
+By default, the `traefik-certs` init container generates a self-signed certificate for `PUBLIC_HOST`. To supply your own certificate at bootstrap time, set these env vars in `.env`:
+
+```bash
+CERT_FILE=/path/to/cert.pem # PEM-encoded certificate
+KEY_FILE=/path/to/key.pem # PEM-encoded private key
+CA_FILE=/path/to/ca.pem # Optional: CA bundle (for private CA trust)
+```
+
+The init container validates that the key matches the certificate before accepting. If validation fails, the container exits with an error.
+
+**Runtime certificate replacement** is available via the vendor UI at `/vendor/certificates`:
+- Upload a new cert+key+CA bundle (staged, not yet active)
+- Validate and activate (atomic swap, Traefik hot-reloads)
+- Roll back to the previous certificate if needed
+- Track which tenants need a restart to pick up CA bundle changes
+
+### 4. Start the Stack
 
 **Development** (ports exposed for direct access):
 ```bash
@@ -95,7 +115,7 @@ docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d
 docker compose up -d
 ```
 
-### 4. Verify Services
+### 5. Verify Services
 
 ```bash
 # Health check
@@ -287,6 +307,42 @@ Query params: `since`, `until` (ISO timestamps), `limit` (default 500), `stream`
 |------|-------------|
 | `/dashboard` | cameleer3-server observability dashboard (forward-auth protected) |
 
+### Vendor: Certificates (platform:admin)
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/vendor/certificates` | Overview (active, staged, archived, stale count) |
+| POST | `/api/vendor/certificates/stage` | Upload cert+key+CA (multipart) |
+| POST | `/api/vendor/certificates/activate` | Promote staged -> active |
+| POST | `/api/vendor/certificates/restore` | Swap archived <-> active |
+| DELETE | `/api/vendor/certificates/staged` | Discard staged cert |
+| GET | `/api/vendor/certificates/stale-tenants` | Count tenants needing CA restart |
+
+### Vendor: Tenants (platform:admin)
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/vendor/tenants` | List all tenants |
+| POST | `/api/vendor/tenants` | Create tenant (async provisioning) |
+| GET | `/api/vendor/tenants/{id}` | Tenant detail + server state |
+| POST | `/api/vendor/tenants/{id}/restart` | Restart server containers |
+| POST | `/api/vendor/tenants/{id}/suspend` | Suspend tenant |
+| POST | `/api/vendor/tenants/{id}/activate` | Activate tenant |
+| DELETE | `/api/vendor/tenants/{id}` | Delete tenant |
+| POST | `/api/vendor/tenants/{id}/license` | Renew license |
+
+### Tenant Portal (org-scoped)
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/tenant/dashboard` | Tenant dashboard data |
+| GET | `/api/tenant/license` | License details |
+| POST | `/api/tenant/server/restart` | Restart server |
+| GET | `/api/tenant/team` | List team members |
+| POST | `/api/tenant/team/invite` | Invite team member |
+| DELETE | `/api/tenant/team/{userId}` | Remove team member |
+| GET | `/api/tenant/settings` | Tenant settings |
+| GET | `/api/tenant/sso` | List SSO connectors |
+| POST | `/api/tenant/sso` | Create SSO connector |
+| GET | `/api/tenant/audit` | Tenant audit log |
+
 ### Health
 | Method | Path | Description |
 |--------|------|-------------|