cameleer-saas/docs/superpowers/specs/2026-04-09-platform-redesign.md
# Cameleer SaaS Platform Redesign — Design Spec
**Date:** 2026-04-09
**Status:** Approved (brainstorming session)
**Scope:** Redesign the SaaS platform from a read-only tenant viewer into a functional vendor management plane with tenant provisioning, license management, and customer self-service.
## Context
The SaaS platform currently has 3 pages (Dashboard, License, Admin Tenants) — all read-only. It cannot create tenants, provision servers, manage licenses, or let customers configure their own settings. The backend has foundations (TenantService, LicenseService, LogtoManagementClient, ServerApiClient, audit logging) but none are exposed through management workflows.
This spec redesigns the platform around two personas — **vendor** (us) and **customer** (tenant admin) — with a clear separation of concerns.
### Architectural Decisions (from brainstorming)
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Server isolation | Shared data stores, isolated server per tenant | Server is already standalone; PostgreSQL/ClickHouse shared with tenant_id partitioning |
| Auth model | Hybrid — SaaS uses Logto, server uses customer OIDC | Clean separation: SaaS is vendor plane, server is product plane |
| Tenant admin access | Both SaaS + server, with SSO bridge | Admin configures in SaaS, jumps to server for operations |
| Server data in SaaS | License compliance + health summary | Quick pulse without duplicating the server dashboard |
| Provisioning mechanism | Docker API via docker-java | Already a dependency, same pattern as server's RuntimeOrchestrator |
| Docker/K8s support | Pluggable interface, Docker first | Mirror server's RuntimeOrchestrator + auto-detection pattern |
---
## 1. Personas & User Stories
### Vendor (platform:admin scope)
| ID | Story | Acceptance Criteria |
|----|-------|-------------------|
| V1 | As a vendor, I want to create a tenant so I can onboard a new customer | Form collects name, slug, tier. Creates DB record + Logto org. Status = PROVISIONING. |
| V2 | As a vendor, I want to provision a server for a tenant so they have a running Cameleer instance | After tenant creation, SaaS creates a cameleer-server container via Docker API with correct env vars, network, and Traefik labels. Health check passes → status = ACTIVE. |
| V3 | As a vendor, I want to generate and assign a license to a tenant | License created with tier-appropriate features/limits/expiry. Token pushed to tenant's server via M2M API. |
| V4 | As a vendor, I want to suspend a tenant who hasn't paid | Suspend stops the server container and marks tenant SUSPENDED. Reactivation restarts it. |
| V5 | As a vendor, I want to view fleet health at a glance | Tenant list shows each tenant's server status (running/stopped/error), agent count vs limit, license expiry. |
| V6 | As a vendor, I want to delete/offboard a tenant | Stops and removes server container, revokes license, marks tenant DELETED. |
### Customer (tenant admin, org-scoped JWT)
| ID | Story | Acceptance Criteria |
|----|-------|-------------------|
| C1 | As a tenant admin, I want to see my dashboard with server health and license usage | Dashboard shows: server status (up/down), connected agents vs limit, environments vs limit, feature entitlements. |
| C2 | As a tenant admin, I want to configure external OIDC for my team | Form to set issuer URI, client ID, client secret, audience, claim mappings. SaaS pushes config to the tenant's server via M2M API. |
| C3 | As a tenant admin, I want to manage team members | View/invite/remove users in Logto org. Assign roles (owner/operator/viewer) that flow through to server access. |
| C4 | As a tenant admin, I want to access the server dashboard seamlessly | "Open Server Dashboard" navigates to the tenant's server URL. Initial auth via Logto (same OIDC provider until customer configures their own). |
| C5 | As a tenant admin, I want to view my license details | Tier, features, limits, validity, days remaining — enriched with actual usage data from server. |
| C6 | As a tenant admin, I want to see my organization settings | Tenant name, slug, tier, created date. Read-only (tier changes go through vendor). |
---
## 2. Information Architecture
### Route Structure
```
/platform/
├── /vendor/                    (platform:admin only)
│   ├── /vendor/tenants         Tenant list with fleet health overview
│   ├── /vendor/tenants/new     Create tenant flow (create → provision → license)
│   └── /vendor/tenants/:id     Tenant detail: server status, license, actions
├── /tenant/                    (org-scoped, any authenticated user)
│   ├── /tenant/                Dashboard: server health + license usage
│   ├── /tenant/license         License details + usage vs limits
│   ├── /tenant/oidc            External OIDC configuration
│   ├── /tenant/team            Team members + role management
│   └── /tenant/settings        Organization settings
├── /login                      Logto OIDC redirect
└── /callback                   Logto callback handler
```
### Navigation
**Sidebar adapts to persona:**
- **Vendor** (`platform:admin`): "Tenants" section at top. If a tenant is selected (e.g., viewing detail), the tenant portal sections appear below for support/debugging.
- **Customer** (no `platform:admin`): Dashboard, License, OIDC, Team, Settings.
- **Footer**: "Open Server Dashboard" (contextual to current tenant).
**Landing page:**
- `platform:admin` → `/vendor/tenants`
- Otherwise → `/tenant/`
### What Happens to Existing Pages
| Current | Becomes | Changes |
|---------|---------|---------|
| `DashboardPage` | `/tenant/` | Add health data from server, license usage indicators |
| `LicensePage` | `/tenant/license` | Add usage enrichment (agents used/limit, envs used/limit) |
| `AdminTenantsPage` | `/vendor/tenants` | Full CRUD, health indicators, provision/suspend/delete actions |
---
## 3. Provisioning Architecture
### Pluggable Interface
Following the server's `RuntimeOrchestrator` pattern with auto-detection:
```java
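public interface TenantProvisioner {
    /** Whether the underlying runtime (Docker first, K8s later) can be used. */
    boolean isAvailable();

    /** Provisions the tenant's server through the underlying runtime. */
    ProvisionResult provision(TenantProvisionRequest request);

    // Lifecycle operations on an already provisioned tenant
    void start(String tenantId);
    void stop(String tenantId);
    void remove(String tenantId);

    ServerStatus getStatus(String tenantId);
    String getServerEndpoint(String tenantId);
}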
```
**Auto-detection** (same pattern as server's `RuntimeOrchestratorAutoConfig`):
```java
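import java.nio.file.Files;
import java.nio.file.Path;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TenantProvisionerAutoConfig {
    @Bean
    TenantProvisioner tenantProvisioner() {
        // Docker socket present: use the Docker implementation
        if (Files.exists(Path.of("/var/run/docker.sock"))) {
            return new DockerTenantProvisioner(dockerClientConfig());
        }
        // Future: K8s detection (service account token)
        return new DisabledTenantProvisioner();
    }
}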
```
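
The spec describes `DisabledTenantProvisioner` only as a no-op fallback. The following is a minimal sketch of one possible shape, assuming it reports itself unavailable, returns a failed result from `provision()`, and rejects lifecycle calls. The record fields and the `ServerStatus` values are placeholders invented for illustration; the spec does not define them.

```java
// Placeholder types: the spec names these but does not define their fields.
record TenantProvisionRequest(String tenantId, String slug) {}
record ProvisionResult(boolean success, String serverEndpoint, String error) {}
enum ServerStatus { RUNNING, STOPPED, ERROR, NOT_PROVISIONED }

interface TenantProvisioner {
    boolean isAvailable();
    ProvisionResult provision(TenantProvisionRequest request);
    void start(String tenantId);
    void stop(String tenantId);
    void remove(String tenantId);
    ServerStatus getStatus(String tenantId);
    String getServerEndpoint(String tenantId);
}

/** Fallback used when no container runtime is detected (assumed behavior). */
final class DisabledTenantProvisioner implements TenantProvisioner {
    private static final String REASON =
            "No container runtime detected; provisioning is disabled";

    @Override public boolean isAvailable() { return false; }

    // Report the failure as a result so the caller can store it on the tenant.
    @Override public ProvisionResult provision(TenantProvisionRequest request) {
        return new ProvisionResult(false, null, REASON);
    }

    // Lifecycle calls cannot succeed without a runtime, so they fail loudly.
    @Override public void start(String tenantId) { throw new UnsupportedOperationException(REASON); }
    @Override public void stop(String tenantId) { throw new UnsupportedOperationException(REASON); }
    @Override public void remove(String tenantId) { throw new UnsupportedOperationException(REASON); }

    @Override public ServerStatus getStatus(String tenantId) { return ServerStatus.NOT_PROVISIONED; }
    @Override public String getServerEndpoint(String tenantId) { return null; }
}
```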
### Docker Implementation
`DockerTenantProvisioner` uses docker-java to manage per-tenant server containers:
**Container specification per tenant:**
| Config | Value | Source |
|--------|-------|--------|
| Image | `gitea.siegeln.net/cameleer/cameleer-server:${VERSION}` | Global config |
| Name | `cameleer-server-${tenant.slug}` | Derived from tenant |
| Network | `cameleer` + `cameleer-traefik` | Fixed networks from compose |
| DNS alias | `cameleer-server-${tenant.slug}` | For SaaS→server M2M calls |
| Health check | `wget -q -O- http://localhost:8081/actuator/health` | Server's actuator |
| Restart policy | `unless-stopped` | Standard for services |
**Environment variables injected per tenant:**
| Env var | Value | Purpose |
|---------|-------|---------|
| `SPRING_DATASOURCE_URL` | `jdbc:postgresql://postgres:5432/cameleer` | Shared PostgreSQL |
| `CAMELEER_TENANT_ID` | `${tenant.slug}` | Tenant isolation key |
| `CAMELEER_OIDC_ISSUER_URI` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}/oidc` | Logto as initial OIDC |
| `CAMELEER_OIDC_JWK_SET_URI` | `http://logto:3001/oidc/jwks` | Docker-internal JWK |
| `CAMELEER_CORS_ALLOWED_ORIGINS` | `${PUBLIC_PROTOCOL}://${PUBLIC_HOST}` | Browser CORS |
| `CAMELEER_LICENSE_TOKEN` | `${license.token}` | License for this tenant |
| `CAMELEER_RUNTIME_ENABLED` | `true` | Enable Docker orchestration |
| `CAMELEER_SERVER_URL` | `http://cameleer-server-${slug}:8081` | Self-reference for agents |
| `CAMELEER_ROUTING_DOMAIN` | `${PUBLIC_HOST}` | Traefik routing domain |
| `CAMELEER_ROUTING_MODE` | `path` | Path-based routing |
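
As an illustration, the table above could be assembled in one place so the container spec and any tests share a single definition. This is a sketch only: `TenantServerEnv` and its method names are hypothetical, and the values are copied from the table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assembles the per-tenant environment from the table above. */
final class TenantServerEnv {
    private TenantServerEnv() {}

    static Map<String, String> build(String slug, String publicProtocol,
                                     String publicHost, String licenseToken) {
        String publicBase = publicProtocol + "://" + publicHost;
        Map<String, String> env = new LinkedHashMap<>(); // preserves the table order
        env.put("SPRING_DATASOURCE_URL", "jdbc:postgresql://postgres:5432/cameleer");
        env.put("CAMELEER_TENANT_ID", slug);
        env.put("CAMELEER_OIDC_ISSUER_URI", publicBase + "/oidc");
        env.put("CAMELEER_OIDC_JWK_SET_URI", "http://logto:3001/oidc/jwks");
        env.put("CAMELEER_CORS_ALLOWED_ORIGINS", publicBase);
        env.put("CAMELEER_LICENSE_TOKEN", licenseToken);
        env.put("CAMELEER_RUNTIME_ENABLED", "true");
        env.put("CAMELEER_SERVER_URL", "http://cameleer-server-" + slug + ":8081");
        env.put("CAMELEER_ROUTING_DOMAIN", publicHost);
        env.put("CAMELEER_ROUTING_MODE", "path");
        return env;
    }

    /** Docker's container config takes the environment as KEY=value strings. */
    static List<String> toDockerEnv(Map<String, String> env) {
        return env.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .toList();
    }
}
```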
**Traefik labels for per-tenant routing:**
```
traefik.enable=true
traefik.http.routers.server-${slug}.rule=PathPrefix(`/t/${slug}`)
traefik.http.routers.server-${slug}.tls=true
traefik.http.services.server-${slug}.loadbalancer.server.port=8081
```
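
The labels above could be generated per tenant along these lines. `TraefikLabels.forServer` is a hypothetical helper name; the label keys and values mirror the listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper producing the Traefik labels for one tenant's server container. */
final class TraefikLabels {
    private TraefikLabels() {}

    static Map<String, String> forServer(String slug) {
        String router = "traefik.http.routers.server-" + slug;
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.enable", "true");
        // Traefik rule syntax wraps the path in backticks.
        labels.put(router + ".rule", "PathPrefix(`/t/" + slug + "`)");
        labels.put(router + ".tls", "true");
        labels.put("traefik.http.services.server-" + slug + ".loadbalancer.server.port", "8081");
        return labels;
    }
}
```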
**Server UI container per tenant:**
Each tenant also gets a `cameleer-server-ui` container:
| Config | Value |
|--------|-------|
| Name | `cameleer-server-ui-${tenant.slug}` |
| Image | `gitea.siegeln.net/cameleer/cameleer-server-ui:${VERSION}` |
| Env | `BASE_PATH=/t/${slug}` |
| Traefik | `PathPrefix(/t/${slug})` with `priority=2` (higher than API) |
The server UI serves static assets and proxies API calls to the backend. The `BASE_PATH` env var configures React Router's basename and nginx proxy target.
### Provision Flow
```
Vendor clicks "Create Tenant"
→ POST /api/vendor/tenants
1. Validate slug uniqueness
2. Create TenantEntity (status=PROVISIONING)
3. Create Logto organization
4. Generate license (tier-appropriate, 365 days)
5. Create server container (DockerTenantProvisioner.provision())
6. Create server UI container
7. Wait for health check (poll /actuator/health, timeout 60s)
8. Push license to server via M2M API (ServerApiClient)
9. Update status → ACTIVE
10. Audit log: TENANT_CREATE + TENANT_PROVISION + LICENSE_GENERATE
```
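If provisioning fails at any step, the flow stops and the tenant remains in PROVISIONING status, with the error message stored on the tenant (the `provisionError` field listed in Section 7). The vendor can then retry provisioning or delete the tenant.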
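
One way to get this failure behavior is to run the steps as an ordered list and stop at the first exception. The sketch below assumes exactly that; `ProvisionFlow` and `ProvisionOutcome` are hypothetical names, and the real steps would call the services listed in the flow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Outcome of one provisioning attempt; field names are illustrative. */
record ProvisionOutcome(String status, String failedStep, String provisionError) {}

final class ProvisionFlow {
    private final Map<String, Runnable> steps = new LinkedHashMap<>();

    ProvisionFlow step(String name, Runnable action) {
        steps.put(name, action);
        return this;
    }

    /** Runs steps in order; the first failure stops the flow and keeps status PROVISIONING. */
    ProvisionOutcome run() {
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            try {
                step.getValue().run();
            } catch (RuntimeException e) {
                // Later steps are skipped; the message is what the vendor would see.
                return new ProvisionOutcome("PROVISIONING", step.getKey(), e.getMessage());
            }
        }
        return new ProvisionOutcome("ACTIVE", null, null);
    }
}
```

Recording the failed step name alongside the message is an addition for illustration; the spec only requires the error message.
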
### Suspend / Activate Flow
```
Suspend:
1. Stop server + UI containers (DockerTenantProvisioner.stop())
2. Set tenant status → SUSPENDED
3. Audit log: TENANT_SUSPEND
Activate:
1. Start server + UI containers (DockerTenantProvisioner.start())
2. Wait for health check
3. Set tenant status → ACTIVE
4. Audit log: TENANT_ACTIVATE
```
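
Both the provision and activate flows wait for a health check. A minimal polling sketch follows, assuming a caller-supplied probe (for example, a call to `/actuator/health`) and a fixed poll interval; the spec gives a 60s timeout for provisioning but does not state the interval or the activate timeout, so those are parameters here. `HealthWaiter` is a hypothetical name.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class HealthWaiter {
    private HealthWaiter() {}

    /** Polls the probe until it reports healthy or the timeout would be exceeded. */
    static boolean waitUntilHealthy(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // A starting server may refuse connections; treat that as "not yet healthy".
            }
            if (deadline - System.nanoTime() < interval.toNanos()) {
                return false; // another sleep would overshoot the timeout
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```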
### Delete Flow
```
Delete:
1. Stop and remove server + UI containers (DockerTenantProvisioner.remove())
2. Revoke active license
3. Delete Logto organization (LogtoManagementClient.deleteOrganization())
4. Set tenant status → DELETED (soft delete, keep record for audit)
5. Audit log: TENANT_DELETE
```
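
Taken together, the provision, suspend/activate, and delete flows imply a small status state machine. The sketch below encodes the transitions read off those flows. Treating DELETED as terminal and disallowing PROVISIONING → SUSPENDED are assumptions, not statements from the spec.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum TenantStatus {
    PROVISIONING, ACTIVE, SUSPENDED, DELETED;

    // Allowed moves, derived from the provision, suspend/activate, and delete flows.
    private static final Map<TenantStatus, Set<TenantStatus>> ALLOWED = Map.of(
            PROVISIONING, EnumSet.of(ACTIVE, DELETED),
            ACTIVE, EnumSet.of(SUSPENDED, DELETED),
            SUSPENDED, EnumSet.of(ACTIVE, DELETED),
            DELETED, EnumSet.noneOf(TenantStatus.class)); // assumed terminal (soft delete)

    boolean canTransitionTo(TenantStatus target) {
        return ALLOWED.get(this).contains(target);
    }
}
```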
---
## 4. Server Communication
### SaaS → Server (M2M API)
The existing `ServerApiClient` pattern (Logto M2M token, `X-Cameleer-Protocol-Version: 1` header) is extended for per-tenant endpoints:
```java
public class ServerApiClient {
    // Existing: uses configured server-endpoint
    // New: accepts dynamic endpoint per tenant
    public ServerHealth getHealth(String serverEndpoint) { ... }
    public void pushLicenseToken(String serverEndpoint, String token) { ... }
    public void pushOidcConfig(String serverEndpoint, OidcConfigRequest config) { ... }
    public ServerUsage getUsage(String serverEndpoint) { ... }
}
```
The `serverEndpoint` is resolved per tenant: `http://cameleer-server-${slug}:8081` (Docker-internal DNS).
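
For illustration, a per-tenant health request could be built with the JDK's `java.net.http` API as sketched below. `ServerRequests` is a hypothetical helper, not the existing `ServerApiClient`. Sending the M2M token as a `Bearer` authorization header and the 5-second timeout are assumptions; the endpoint pattern, the `/actuator/health` path, and the protocol header come from the spec.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class ServerRequests {
    private ServerRequests() {}

    /** Docker-internal endpoint for a tenant's server. */
    static String endpointFor(String slug) {
        return "http://cameleer-server-" + slug + ":8081";
    }

    /** Builds (but does not send) a health request with the M2M token and protocol header. */
    static HttpRequest healthRequest(String serverEndpoint, String m2mToken) {
        return HttpRequest.newBuilder(URI.create(serverEndpoint + "/actuator/health"))
                .header("Authorization", "Bearer " + m2mToken) // assumed token transport
                .header("X-Cameleer-Protocol-Version", "1")
                .timeout(Duration.ofSeconds(5))                // assumed value
                .GET()
                .build();
    }
}
```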
### Health & Usage Data
**ServerHealth** (from server's `/actuator/health` + `/api/admin/status`):
- Server status: UP/DOWN
- Connected agents: count
- Active applications: count
- Error rate (last hour)
**ServerUsage** (from server API — new endpoint or existing data):
- Agent count vs license limit
- Environment count vs license limit
- Which features are actively used (topology, lineage, etc.)
The SaaS caches health data per tenant (refresh every 30s for the fleet view, on-demand for detail pages).
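
The caching rule above (30s freshness for the fleet view, always-fresh for detail pages) can be sketched as a small time-to-live cache. `HealthCache` is a hypothetical class; `V` stands in for the `ServerHealth` type, and the time source is injected so the expiry logic can be tested without sleeping.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical per-tenant health cache with a fixed time-to-live. */
final class HealthCache<V> {
    private record Entry<T>(T value, Instant fetchedAt) {}

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> now;
    private final Function<String, V> loader;

    HealthCache(Duration ttl, Supplier<Instant> now, Function<String, V> loader) {
        this.ttl = ttl;
        this.now = now;
        this.loader = loader;
    }

    /** Fleet view: serve the cached value unless it has reached the TTL. */
    V get(String tenantId) {
        Entry<V> e = entries.get(tenantId);
        if (e == null || !now.get().isBefore(e.fetchedAt().plus(ttl))) {
            return refresh(tenantId);
        }
        return e.value();
    }

    /** Detail pages: always fetch from the server and update the cache. */
    V refresh(String tenantId) {
        V fresh = loader.apply(tenantId);
        entries.put(tenantId, new Entry<>(fresh, now.get()));
        return fresh;
    }
}
```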
### SSO Bridge
**Initial state** (before customer OIDC): The tenant's server trusts Logto. The tenant admin has a Logto account. "Open Server Dashboard" navigates to `/t/{slug}/` — the server's OIDC flow detects the existing Logto session and authenticates the user.
**After customer OIDC**: The SaaS pushes the customer's OIDC config to the server via `ServerApiClient.pushOidcConfig()`. The server switches to trusting the customer's provider. The tenant admin authenticates via their company's OIDC when accessing the server.
---
## 5. Backend API Design
### Vendor Endpoints (platform:admin required)
| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/api/vendor/tenants` | List all tenants with health summary |
| `POST` | `/api/vendor/tenants` | Create tenant (triggers full provisioning flow) |
| `GET` | `/api/vendor/tenants/{id}` | Tenant detail with server status |
| `PATCH` | `/api/vendor/tenants/{id}` | Update tenant metadata (name, tier) |
| `POST` | `/api/vendor/tenants/{id}/suspend` | Suspend tenant |
| `POST` | `/api/vendor/tenants/{id}/activate` | Reactivate tenant |
| `DELETE` | `/api/vendor/tenants/{id}` | Offboard tenant |
| `POST` | `/api/vendor/tenants/{id}/license` | Generate/renew license |
| `GET` | `/api/vendor/tenants/{id}/health` | Server health check (on-demand) |
### Tenant Endpoints (org-scoped, tenant from JWT)
| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/api/tenant/dashboard` | Aggregated health + license usage |
| `GET` | `/api/tenant/license` | License details with usage data |
| `GET` | `/api/tenant/oidc` | Current OIDC configuration |
| `PUT` | `/api/tenant/oidc` | Update OIDC config (push to server) |
| `GET` | `/api/tenant/team` | Team members (from Logto org) |
| `POST` | `/api/tenant/team/invite` | Invite member |
| `PATCH` | `/api/tenant/team/{userId}/role` | Change member role |
| `DELETE` | `/api/tenant/team/{userId}` | Remove member |
| `GET` | `/api/tenant/settings` | Org settings |
### Existing Endpoints to Modify
| Current | Change |
|---------|--------|
| `GET /api/tenants` | Move to `/api/vendor/tenants`, add health data |
| `POST /api/tenants` | Move to `/api/vendor/tenants`, add provisioning |
| `GET /api/tenants/{id}` | Keep for backward compat, also available at `/api/vendor/tenants/{id}` |
| `GET /api/tenants/{id}/license` | Keep, also available at `/api/tenant/license` |
| `POST /api/tenants/{id}/license` | Move to `/api/vendor/tenants/{id}/license` |
| `GET /api/me` | Keep (used by OrgResolver) |
| `GET /api/config` | Keep (used by frontend bootstrap) |
---
## 6. Frontend Design
### Vendor Console
**Tenant List** (`/vendor/tenants`):
- DataTable with columns: Name, Slug, Tier (Badge), Status (Badge), Server (health indicator), Agents (used/limit), License (expiry or "None"), Created
- Row click → tenant detail
- "+ Create Tenant" button in header
- Status badges: ACTIVE (green), PROVISIONING (blue), SUSPENDED (amber), DELETED (gray)
- Server health: green dot (UP), red dot (DOWN), gray dot (no server)
**Create Tenant** (`/vendor/tenants/new`):
- Form with: Name, Slug (auto-generated from name, editable), Tier (dropdown: LOW/MID/HIGH/BUSINESS)
- On submit: shows provisioning progress (creating record → creating org → generating license → starting server → health check → done)
- Progress displayed as a step indicator or timeline
- On success: redirect to tenant detail
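
Because the slug ends up in container names, the Docker DNS alias, and the `/t/{slug}` URL path, the auto-generated value should be restricted to a safe character set. The spec does not define the exact slug rules, so the sketch below assumes lowercase ASCII letters, digits, and single hyphens. `Slugs.fromName` is a hypothetical helper, shown in Java for consistency with the backend snippets even though the form lives in the frontend.

```java
import java.text.Normalizer;
import java.util.Locale;

final class Slugs {
    private Slugs() {}

    static String fromName(String name) {
        String ascii = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");       // drop accents split off by NFD
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // collapse other characters into one hyphen
                .replaceAll("^-+|-+$", "");      // trim leading and trailing hyphens
    }
}
```

A name with no usable characters yields an empty string, which the form would have to reject before submission.
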
**Tenant Detail** (`/vendor/tenants/:id`):
- Header: Tenant name + tier badge + status badge
- KPI strip: Server Status, Agents (used/limit), Environments (used/limit), License (days remaining)
- Sections:
  - **Server**: Status, endpoint URL, start/stop/restart actions
  - **License**: Current license details, "Renew" button
  - **Info**: Slug, created date, Logto org ID
- Actions: Suspend/Activate toggle, Delete (with confirmation)
### Tenant Portal
**Dashboard** (`/tenant/`):
- KPI strip: Server Status, Agents (used/limit), Environments (used/limit), License (days remaining)
- Quick links: "Open Server Dashboard", "View License", "Configure OIDC"
- If server is DOWN: prominent alert banner
**License** (`/tenant/license`):
- Reuses existing LicensePage layout
- Adds usage indicators: "2 of 3 agents", "1 of 1 environments"
- Progress bars for limits approaching capacity
- License token section (show/hide + copy)
**OIDC Configuration** (`/tenant/oidc`):
- Form: Issuer URI, Client ID, Client Secret (masked), Audience, Roles Claim
- Current status: "Using Logto (default)" or "External OIDC configured"
- Save pushes config to server via SaaS API
- "Test Connection" button (calls server's OIDC discovery endpoint)
- "Reset to Logto" button (reverts to default)
**Team Management** (`/tenant/team`):
- DataTable: Name, Email, Role (dropdown: Owner/Operator/Viewer), Actions (Remove)
- "+ Invite Member" button → form with email + role
- Role changes update Logto org membership
- Cannot remove the last owner
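
The "last owner" rule is best enforced on the backend as well as in the UI. A minimal sketch of the check, assuming members are available as a map from user ID to role name; `TeamRules` is a hypothetical helper and the lowercase role strings follow the owner/operator/viewer roles named in story C3.

```java
import java.util.Map;

final class TeamRules {
    private TeamRules() {}

    /** members maps userId to role ("owner", "operator", or "viewer"). */
    static boolean canRemove(Map<String, String> members, String userId) {
        String role = members.get(userId);
        if (role == null) {
            return false; // not a member of this organization
        }
        if (!"owner".equals(role)) {
            return true;
        }
        long owners = members.values().stream().filter("owner"::equals).count();
        return owners > 1; // the organization must keep at least one owner
    }
}
```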
**Settings** (`/tenant/settings`):
- Read-only info: Name, Slug, Tier, Status, Created
- Server endpoint URL
- "Contact support to change tier" message (tier changes go through vendor)
### Shared Components
- **ServerStatusBadge**: Green dot + "Running", Red dot + "Stopped", Gray dot + "Provisioning"
- **UsageIndicator**: "2 / 3 agents" with progress bar, color-coded (green below 80% of the limit, amber from 80% up to the limit, red once the limit is reached)
- **ProvisioningProgress**: Step indicator for tenant creation flow
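
The `UsageIndicator` thresholds can be expressed as a small pure function. The component itself is a React component; the rule is shown in Java here only for consistency with the other snippets. Treating usage above the limit, or a non-positive limit, as red is an assumption the spec does not cover.

```java
enum UsageLevel { GREEN, AMBER, RED }

final class Usage {
    private Usage() {}

    static UsageLevel level(int used, int limit) {
        if (limit <= 0 || used >= limit) {
            return UsageLevel.RED; // at or over the limit (assumed: also when no allowance)
        }
        // Integer comparison avoids floating-point edge cases at exactly 80%.
        return (long) used * 100 < (long) limit * 80 ? UsageLevel.GREEN : UsageLevel.AMBER;
    }

    /** Text shown next to the progress bar, for example "2 / 3 agents". */
    static String label(int used, int limit, String unit) {
        return used + " / " + limit + " " + unit;
    }
}
```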
### Layout Changes
- Remove TopBar server controls (status filters, time range, auto-refresh) — these are not relevant to the SaaS platform. Use a simplified TopBar with breadcrumb, theme toggle, and user menu only.
- Sidebar: persona-aware navigation (vendor vs customer sections)
- Sidebar footer: "Open Server Dashboard" link with tenant-specific URL (`/t/{slug}/`)
---
## 7. Files to Create/Modify
### New Backend Files
| File | Purpose |
|------|---------|
| `provisioning/TenantProvisioner.java` | Pluggable provisioning interface |
| `provisioning/TenantProvisionRequest.java` | Provision request record |
| `provisioning/ProvisionResult.java` | Provision result record |
| `provisioning/ServerStatus.java` | Server health status record |
| `provisioning/DockerTenantProvisioner.java` | Docker implementation |
| `provisioning/DisabledTenantProvisioner.java` | No-op fallback |
| `provisioning/TenantProvisionerAutoConfig.java` | Auto-detection config |
| `vendor/VendorTenantController.java` | Vendor API endpoints |
| `vendor/VendorTenantService.java` | Vendor business logic (orchestrates provisioning + license + Logto) |
| `tenant/TenantPortalController.java` | Customer API endpoints |
| `tenant/TenantPortalService.java` | Customer business logic (reads from server, manages team) |
### Modified Backend Files
| File | Changes |
|------|---------|
| `identity/ServerApiClient.java` | Add per-tenant endpoint support, health/usage/OIDC methods |
| `identity/LogtoManagementClient.java` | Add user invite, role management, list org members |
| `tenant/TenantEntity.java` | Add `serverEndpoint` field, `provisionError` field |
| `tenant/TenantService.java` | Keep existing methods, used by VendorTenantService |
| `license/LicenseService.java` | Keep existing, add revoke method |
| `config/SecurityConfig.java` | Add vendor/tenant endpoint security rules |
| `config/TenantIsolationInterceptor.java` | Handle `/api/tenant/*` (resolve from JWT, no path variable) |
### New Frontend Files
| File | Purpose |
|------|---------|
| `pages/vendor/VendorTenantsPage.tsx` | Tenant list with fleet health |
| `pages/vendor/CreateTenantPage.tsx` | Create tenant wizard |
| `pages/vendor/TenantDetailPage.tsx` | Tenant detail with actions |
| `pages/tenant/TenantDashboardPage.tsx` | Customer dashboard (evolves from DashboardPage) |
| `pages/tenant/TenantLicensePage.tsx` | License with usage (evolves from LicensePage) |
| `pages/tenant/OidcConfigPage.tsx` | External OIDC configuration |
| `pages/tenant/TeamPage.tsx` | Team management |
| `pages/tenant/SettingsPage.tsx` | Organization settings |
| `components/ServerStatusBadge.tsx` | Shared server status indicator |
| `components/UsageIndicator.tsx` | License usage progress bar |
| `api/vendor-hooks.ts` | React Query hooks for vendor API |
| `api/tenant-hooks.ts` | React Query hooks for tenant API |
### Modified Frontend Files
| File | Changes |
|------|---------|
| `router.tsx` | Restructure routes: `/vendor/*`, `/tenant/*` |
| `components/Layout.tsx` | Persona-aware sidebar, simplified TopBar, tenant-specific server link |
| `auth/OrgResolver.tsx` | Handle vendor landing (redirect to `/vendor/tenants`) |
| `types/api.ts` | Add vendor/tenant API types |
| `api/client.ts` | No changes needed (generic fetch wrapper) |
### Files to Remove
| File | Reason |
|------|--------|
| `pages/DashboardPage.tsx` | Replaced by `tenant/TenantDashboardPage.tsx` |
| `pages/LicensePage.tsx` | Replaced by `tenant/TenantLicensePage.tsx` |
| `pages/AdminTenantsPage.tsx` | Replaced by `vendor/VendorTenantsPage.tsx` |
### Docker Changes
| File | Changes |
|------|---------|
| `docker-compose.yml` | Mount Docker socket into cameleer-saas container |
| `docker-compose.dev.yml` | Add Docker socket mount, group_add for Docker access |
### Database Migration
New migration `V011`:
- Add `server_endpoint` column to `tenants` (nullable VARCHAR, stores Docker-internal URL)
- Add `provision_error` column to `tenants` (nullable TEXT, stores last error message)
- Add `DELETED` to status enum (for soft-delete offboarding)
---
## 8. Existing Compose Stack Changes
The default `cameleer-server` and `cameleer-server-ui` containers in docker-compose.yml become the "bootstrap" server for the `default` tenant. When provisioning is enabled, new tenants get their own dynamically-created containers.
The existing compose stack continues to work as-is for development. The provisioner creates additional containers alongside the compose-managed ones.
For the `default` tenant (created by bootstrap), the SaaS recognizes the existing compose-managed server and doesn't try to provision a new one. This is detected by checking if a container named `cameleer-server-default` (or the compose-managed `cameleer-server`) already exists.
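
The name check described above might look like the following sketch. `BootstrapServerCheck` is a hypothetical helper that takes the container names already fetched from Docker. The Docker Engine API reports names with a leading slash, so the sketch strips it before comparing.

```java
import java.util.Collection;

final class BootstrapServerCheck {
    private BootstrapServerCheck() {}

    /** True if the default tenant already has a server container, provisioned or compose-managed. */
    static boolean defaultTenantHasServer(Collection<String> containerNames) {
        return containerNames.stream()
                .map(n -> n.startsWith("/") ? n.substring(1) : n)
                .anyMatch(n -> n.equals("cameleer-server-default") || n.equals("cameleer-server"));
    }
}
```

Exact matching matters here: a prefix match on `cameleer-server` would also hit `cameleer-server-ui` and other tenants' containers.
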
---
## 9. Out of Scope
- **Kubernetes provisioning** — interface defined, implementation deferred
- **Billing/Stripe** — fields exist in DB, no integration in this spec
- **Mobile responsiveness** — deferred
- **Self-service signup** — tenants created by vendor only
- **Custom domains** — deferred
- **Email notifications** — deferred
- **Usage-based metering** — deferred (license limits are checked but not metered)
---
## 10. Related Issues
| Issue | Relevance |
|-------|-----------|
| #1 | Epic: SaaS Management Platform |
| #3 | Tenant Provisioning & Lifecycle |
| #25 | K8s Operational Layer (deferred) |
| #29 | Billing & Metering (deferred) |
| #37 | Admin: Tenant creation UI — superseded by this spec |
| #38 | Cross-app session management — addressed by SSO bridge |