cameleer-saas/docs/superpowers/specs/2026-04-09-platform-redesign.md
hsiegeln 7b92de4017 docs: add platform redesign spec with user stories
Redesign SaaS platform from read-only viewer into vendor management
plane with tenant provisioning, license management, and customer
self-service. Two personas (vendor/customer), pluggable provisioning
interface (Docker first, K8s later), per-tenant server instances.

User stories tracked as Gitea issues #40-#51. Closes #37.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 22:29:01 +02:00

Cameleer SaaS Platform Redesign — Design Spec

Date: 2026-04-09
Status: Approved (brainstorming session)
Scope: Redesign the SaaS platform from a read-only tenant viewer into a functional vendor management plane with tenant provisioning, license management, and customer self-service.

Context

The SaaS platform currently has 3 pages (Dashboard, License, Admin Tenants) — all read-only. It cannot create tenants, provision servers, manage licenses, or let customers configure their own settings. The backend has foundations (TenantService, LicenseService, LogtoManagementClient, ServerApiClient, audit logging) but none are exposed through management workflows.

This spec redesigns the platform around two personas — vendor (us) and customer (tenant admin) — with a clear separation of concerns.

Architectural Decisions (from brainstorming)

| Decision | Choice | Rationale |
|---|---|---|
| Server isolation | Shared data stores, isolated server per tenant | Server is already standalone; PostgreSQL/ClickHouse shared with tenant_id partitioning |
| Auth model | Hybrid: SaaS uses Logto, server uses customer OIDC | Clean separation: SaaS is vendor plane, server is product plane |
| Tenant admin access | Both SaaS + server, with SSO bridge | Admin configures in SaaS, jumps to server for operations |
| Server data in SaaS | License compliance + health summary | Quick pulse without duplicating the server dashboard |
| Provisioning mechanism | Docker API via docker-java | Already a dependency, same pattern as server's RuntimeOrchestrator |
| Docker/K8s support | Pluggable interface, Docker first | Mirror server's RuntimeOrchestrator + auto-detection pattern |

1. Personas & User Stories

Vendor (platform:admin scope)

| ID | Story | Acceptance Criteria |
|---|---|---|
| V1 | As a vendor, I want to create a tenant so I can onboard a new customer | Form collects name, slug, tier. Creates DB record + Logto org. Status = PROVISIONING. |
| V2 | As a vendor, I want to provision a server for a tenant so they have a running Cameleer instance | After tenant creation, SaaS creates a cameleer3-server container via Docker API with correct env vars, network, and Traefik labels. Health check passes → status = ACTIVE. |
| V3 | As a vendor, I want to generate and assign a license to a tenant | License created with tier-appropriate features/limits/expiry. Token pushed to tenant's server via M2M API. |
| V4 | As a vendor, I want to suspend a tenant who hasn't paid | Suspend stops the server container and marks tenant SUSPENDED. Reactivation restarts it. |
| V5 | As a vendor, I want to view fleet health at a glance | Tenant list shows each tenant's server status (running/stopped/error), agent count vs limit, license expiry. |
| V6 | As a vendor, I want to delete/offboard a tenant | Stops and removes server container, revokes license, marks tenant DELETED. |

Customer (tenant admin, org-scoped JWT)

| ID | Story | Acceptance Criteria |
|---|---|---|
| C1 | As a tenant admin, I want to see my dashboard with server health and license usage | Dashboard shows: server status (up/down), connected agents vs limit, environments vs limit, feature entitlements. |
| C2 | As a tenant admin, I want to configure external OIDC for my team | Form to set issuer URI, client ID, client secret, audience, claim mappings. SaaS pushes config to the tenant's server via M2M API. |
| C3 | As a tenant admin, I want to manage team members | View/invite/remove users in Logto org. Assign roles (owner/operator/viewer) that flow through to server access. |
| C4 | As a tenant admin, I want to access the server dashboard seamlessly | "Open Server Dashboard" navigates to the tenant's server URL. Initial auth via Logto (same OIDC provider until customer configures their own). |
| C5 | As a tenant admin, I want to view my license details | Tier, features, limits, validity, days remaining, enriched with actual usage data from server. |
| C6 | As a tenant admin, I want to see my organization settings | Tenant name, slug, tier, created date. Read-only (tier changes go through vendor). |

2. Information Architecture

Route Structure

/platform/
├── /vendor/                        (platform:admin only)
│   ├── /vendor/tenants             Tenant list with fleet health overview
│   ├── /vendor/tenants/new         Create tenant flow (create → provision → license)
│   └── /vendor/tenants/:id         Tenant detail — server status, license, actions
│
├── /tenant/                        (org-scoped, any authenticated user)
│   ├── /tenant/                    Dashboard — server health + license usage
│   ├── /tenant/license             License details + usage vs limits
│   ├── /tenant/oidc                External OIDC configuration
│   ├── /tenant/team                Team members + role management
│   └── /tenant/settings            Organization settings
│
├── /login                          Logto OIDC redirect
└── /callback                       Logto callback handler

Navigation

Sidebar adapts to persona:

  • Vendor (platform:admin): "Tenants" section at top. If a tenant is selected (e.g., viewing detail), the tenant portal sections appear below for support/debugging.
  • Customer (no platform:admin): Dashboard, License, OIDC, Team, Settings.
  • Footer: "Open Server Dashboard" (contextual to current tenant).

Landing page:

  • platform:admin → /vendor/tenants
  • Otherwise → /tenant/

What Happens to Existing Pages

| Current | Becomes | Changes |
|---|---|---|
| DashboardPage | /tenant/ | Add health data from server, license usage indicators |
| LicensePage | /tenant/license | Add usage enrichment (agents used/limit, envs used/limit) |
| AdminTenantsPage | /vendor/tenants | Full CRUD, health indicators, provision/suspend/delete actions |

3. Provisioning Architecture

Pluggable Interface

Following the server's RuntimeOrchestrator pattern with auto-detection:

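/** Abstraction over the runtime that hosts per-tenant server instances (Docker first, K8s later). */
public interface TenantProvisioner {
    /** True when the underlying runtime is reachable from the SaaS container. */
    boolean isAvailable();
    /** Creates and starts the tenant's containers. */
    ProvisionResult provision(TenantProvisionRequest request);
    void start(String tenantId);
    void stop(String tenantId);
    /** Stops and removes the tenant's containers. */
    void remove(String tenantId);
    ServerStatus getStatus(String tenantId);
    /** Internal URL used for SaaS→server M2M calls. */
    String getServerEndpoint(String tenantId);
}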

Auto-detection (same pattern as server's RuntimeOrchestratorAutoConfig):

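import java.nio.file.Files;
import java.nio.file.Path;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TenantProvisionerAutoConfig {
    @Bean
    TenantProvisioner tenantProvisioner() {
        // Docker counts as available when the daemon socket is mounted into the SaaS container.
        if (Files.exists(Path.of("/var/run/docker.sock"))) {
            return new DockerTenantProvisioner(dockerClientConfig());
        }
        // Future: K8s detection (service account token)
        // No runtime detected: fall back to the no-op provisioner.
        return new DisabledTenantProvisioner();
    }
}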

Docker Implementation

DockerTenantProvisioner uses docker-java to manage per-tenant server containers:

Container specification per tenant:

| Config | Value | Source |
|---|---|---|
| Image | gitea.siegeln.net/cameleer/cameleer3-server:${VERSION} | Global config |
| Name | cameleer-server-${tenant.slug} | Derived from tenant |
| Network | cameleer + cameleer-traefik | Fixed networks from compose |
| DNS alias | cameleer-server-${tenant.slug} | For SaaS→server M2M calls |
| Health check | wget -q -O- http://localhost:8081/actuator/health | Server's actuator |
| Restart policy | unless-stopped | Standard for services |
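
The container specification above can be captured as a small value type. This is a minimal sketch, not the actual DockerTenantProvisioner code: the TenantContainerSpec record and its forTenant factory are hypothetical names, and the "CMD" prefix on the health check assumes the exec-style form used by the Docker API.

```java
import java.util.List;

// Hypothetical value type mirroring the container specification table.
record TenantContainerSpec(
        String image,
        String name,
        List<String> networks,
        String dnsAlias,
        List<String> healthCheckCmd,
        String restartPolicy) {

    // Derives every per-tenant value from the slug and the globally configured version.
    static TenantContainerSpec forTenant(String slug, String version) {
        String name = "cameleer-server-" + slug;
        return new TenantContainerSpec(
                "gitea.siegeln.net/cameleer/cameleer3-server:" + version,
                name,
                List.of("cameleer", "cameleer-traefik"),
                name, // the DNS alias equals the container name
                List.of("CMD", "wget", "-q", "-O-", "http://localhost:8081/actuator/health"),
                "unless-stopped");
    }
}
```

Keeping the derivation in one place means the container name, the DNS alias, and the M2M endpoint cannot drift apart.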

Environment variables injected per tenant:

| Env var | Value | Purpose |
|---|---|---|
| SPRING_DATASOURCE_URL | jdbc:postgresql://postgres:5432/cameleer3 | Shared PostgreSQL |
| CAMELEER_TENANT_ID | ${tenant.slug} | Tenant isolation key |
| CAMELEER_OIDC_ISSUER_URI | ${PUBLIC_PROTOCOL}://${PUBLIC_HOST}/oidc | Logto as initial OIDC |
| CAMELEER_OIDC_JWK_SET_URI | http://logto:3001/oidc/jwks | Docker-internal JWK |
| CAMELEER_CORS_ALLOWED_ORIGINS | ${PUBLIC_PROTOCOL}://${PUBLIC_HOST} | Browser CORS |
| CAMELEER_LICENSE_TOKEN | ${license.token} | License for this tenant |
| CAMELEER_RUNTIME_ENABLED | true | Enable Docker orchestration |
| CAMELEER_SERVER_URL | http://cameleer-server-${slug}:8081 | Self-reference for agents |
| CAMELEER_ROUTING_DOMAIN | ${PUBLIC_HOST} | Traefik routing domain |
| CAMELEER_ROUTING_MODE | path | Path-based routing |
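
One way to assemble these variables is sketched below. The TenantEnvironment class and its method names are hypothetical; the keys and values are copied from the table, and the "KEY=value" rendering assumes the usual Docker env entry format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds the per-tenant environment from the table above.
final class TenantEnvironment {

    static Map<String, String> build(String slug, String licenseToken,
                                     String publicProtocol, String publicHost) {
        String publicBase = publicProtocol + "://" + publicHost;
        Map<String, String> env = new LinkedHashMap<>(); // keeps the table order
        env.put("SPRING_DATASOURCE_URL", "jdbc:postgresql://postgres:5432/cameleer3");
        env.put("CAMELEER_TENANT_ID", slug);
        env.put("CAMELEER_OIDC_ISSUER_URI", publicBase + "/oidc");
        env.put("CAMELEER_OIDC_JWK_SET_URI", "http://logto:3001/oidc/jwks");
        env.put("CAMELEER_CORS_ALLOWED_ORIGINS", publicBase);
        env.put("CAMELEER_LICENSE_TOKEN", licenseToken);
        env.put("CAMELEER_RUNTIME_ENABLED", "true");
        env.put("CAMELEER_SERVER_URL", "http://cameleer-server-" + slug + ":8081");
        env.put("CAMELEER_ROUTING_DOMAIN", publicHost);
        env.put("CAMELEER_ROUTING_MODE", "path");
        return env;
    }

    // Renders the map as "KEY=value" entries.
    static List<String> asDockerEnv(Map<String, String> env) {
        return env.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }
}
```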

Traefik labels for per-tenant routing:

traefik.enable=true
traefik.http.routers.server-${slug}.rule=PathPrefix(`/t/${slug}`)
traefik.http.routers.server-${slug}.tls=true
traefik.http.services.server-${slug}.loadbalancer.server.port=8081
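
The labels can be generated from the slug in the same way. The sketch below assumes a hypothetical TraefikLabels helper; the label keys and values are the four listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper producing the per-tenant Traefik labels for the server container.
final class TraefikLabels {

    static Map<String, String> forServer(String slug) {
        String router = "traefik.http.routers.server-" + slug;
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.enable", "true");
        // Traefik rule values are wrapped in backticks.
        labels.put(router + ".rule", "PathPrefix(`/t/" + slug + "`)");
        labels.put(router + ".tls", "true");
        labels.put("traefik.http.services.server-" + slug + ".loadbalancer.server.port", "8081");
        return labels;
    }
}
```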

Server UI container per tenant:

Each tenant also gets a cameleer3-server-ui container:

| Config | Value |
|---|---|
| Name | cameleer-server-ui-${tenant.slug} |
| Image | gitea.siegeln.net/cameleer/cameleer3-server-ui:${VERSION} |
| Env | BASE_PATH=/t/${slug} |
| Traefik | PathPrefix(`/t/${slug}`) with priority=2 (higher than API) |

The server UI serves static assets and proxies API calls to the backend. The BASE_PATH env var configures React Router's basename and nginx proxy target.

Provision Flow

Vendor clicks "Create Tenant"
  → POST /api/vendor/tenants
    1. Validate slug uniqueness
    2. Create TenantEntity (status=PROVISIONING)
    3. Create Logto organization
    4. Generate license (tier-appropriate, 365 days)
    5. Create server container (DockerTenantProvisioner.provision())
    6. Create server UI container
    7. Wait for health check (poll /actuator/health, timeout 60s)
    8. Push license to server via M2M API (ServerApiClient)
    9. Update status → ACTIVE
    10. Audit log: TENANT_CREATE + TENANT_PROVISION + LICENSE_GENERATE

If provisioning fails at any step, the tenant remains in PROVISIONING status with an error message. The vendor can retry or delete.
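
The failure behaviour can be sketched as a step runner that stops at the first error and leaves the tenant in PROVISIONING. This is an illustrative outline only: ProvisionFlow, its nested Tenant class, and the step names passed in are hypothetical, and the real steps would call TenantService, LogtoManagementClient, the provisioner, and ServerApiClient.

```java
import java.util.Map;

// Hypothetical runner for the ordered provisioning steps.
final class ProvisionFlow {
    enum Status { PROVISIONING, ACTIVE }

    static final class Tenant {
        Status status = Status.PROVISIONING;
        String provisionError; // null when the last attempt succeeded
    }

    /** Runs the steps in iteration order; stops at the first failure and records it. */
    static void run(Tenant tenant, Map<String, Runnable> steps) {
        tenant.provisionError = null;
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            try {
                step.getValue().run();
            } catch (RuntimeException e) {
                // Status stays PROVISIONING so the vendor can retry or delete.
                tenant.provisionError = step.getKey() + ": " + e.getMessage();
                return;
            }
        }
        tenant.status = Status.ACTIVE;
    }
}
```

Passing the steps as an ordered map (for example a LinkedHashMap) keeps the step name available for the error message. A retry simply calls run again; in this sketch, making each step idempotent is left to the caller.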

Suspend / Activate Flow

Suspend:
  1. Stop server + UI containers (DockerTenantProvisioner.stop())
  2. Set tenant status → SUSPENDED
  3. Audit log: TENANT_SUSPEND

Activate:
  1. Start server + UI containers (DockerTenantProvisioner.start())
  2. Wait for health check
  3. Set tenant status → ACTIVE
  4. Audit log: TENANT_ACTIVATE
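
The "wait for health check" step, used here and in the provision flow, amounts to polling with a deadline. A minimal sketch follows, assuming a hypothetical HealthPoller helper; the probe would wrap a call such as ServerApiClient.getHealth(), and the poll interval is an assumption since the spec only fixes the 60s timeout for provisioning.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for the "wait for health check" step.
final class HealthPoller {

    /** Returns true as soon as the probe succeeds, false once the timeout has elapsed. */
    static boolean waitUntilHealthy(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        do {
            if (probe.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        } while (System.nanoTime() - deadline < 0);
        return false;
    }
}
```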

Delete Flow

Delete:
  1. Stop and remove server + UI containers (DockerTenantProvisioner.remove())
  2. Revoke active license
  3. Delete Logto organization (LogtoManagementClient.deleteOrganization())
  4. Set tenant status → DELETED (soft delete, keep record for audit)
  5. Audit log: TENANT_DELETE
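
Taken together, the provision, suspend, activate, and delete flows imply a small state machine over the tenant status. The sketch below is one possible encoding; treating DELETED as terminal and allowing deletion from every other state are assumptions drawn from the flows above, and the TenantStatus method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the tenant lifecycle implied by the flows above.
enum TenantStatus {
    PROVISIONING, ACTIVE, SUSPENDED, DELETED;

    Set<TenantStatus> allowedNext() {
        switch (this) {
            case PROVISIONING: return EnumSet.of(ACTIVE, DELETED);
            case ACTIVE:       return EnumSet.of(SUSPENDED, DELETED);
            case SUSPENDED:    return EnumSet.of(ACTIVE, DELETED);
            default:           return EnumSet.noneOf(TenantStatus.class); // DELETED is terminal (assumption)
        }
    }

    /** Returns the new status, or throws if the transition is not part of the lifecycle. */
    TenantStatus transitionTo(TenantStatus next) {
        if (!allowedNext().contains(next)) {
            throw new IllegalStateException("Illegal transition " + this + " -> " + next);
        }
        return next;
    }
}
```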

4. Server Communication

SaaS → Server (M2M API)

The existing ServerApiClient pattern (Logto M2M token, X-Cameleer-Protocol-Version: 1 header) is extended for per-tenant endpoints:

public class ServerApiClient {
    // Existing: uses configured server-endpoint
    // New: accepts dynamic endpoint per tenant

    public ServerHealth getHealth(String serverEndpoint) { ... }
    public void pushLicenseToken(String serverEndpoint, String token) { ... }
    public void pushOidcConfig(String serverEndpoint, OidcConfigRequest config) { ... }
    public ServerUsage getUsage(String serverEndpoint) { ... }
}

The serverEndpoint is resolved per tenant: http://cameleer-server-${slug}:8081 (Docker-internal DNS).

Health & Usage Data

ServerHealth (from server's /actuator/health + /api/admin/status):

  • Server status: UP/DOWN
  • Connected agents: count
  • Active applications: count
  • Error rate (last hour)

ServerUsage (from server API — new endpoint or existing data):

  • Agent count vs license limit
  • Environment count vs license limit
  • Which features are actively used (topology, lineage, etc.)

The SaaS caches health data per tenant (refresh every 30s for the fleet view, on-demand for detail pages).
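
A per-tenant cache with a time-to-live covers both access patterns. The following is a rough sketch, assuming a hypothetical HealthCache class: get() serves the fleet view from cache for up to the TTL (30s in this spec), while refresh() is the on-demand path for detail pages. The clock is injected only to make the behaviour testable.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical TTL cache keyed by tenant ID.
final class HealthCache<H> {
    private record Entry<H>(H value, Instant fetchedAt) {}

    private final Map<String, Entry<H>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Supplier<Instant> clock;
    private final Function<String, H> loader;

    HealthCache(Duration ttl, Supplier<Instant> clock, Function<String, H> loader) {
        this.ttl = ttl;
        this.clock = clock;
        this.loader = loader;
    }

    /** Fleet view: returns the cached value unless it is at least as old as the TTL. */
    H get(String tenantId) {
        Entry<H> cached = entries.get(tenantId);
        if (cached != null && Duration.between(cached.fetchedAt(), clock.get()).compareTo(ttl) < 0) {
            return cached.value();
        }
        return refresh(tenantId);
    }

    /** Detail pages: always fetches from the server and updates the cache. */
    H refresh(String tenantId) {
        H fresh = loader.apply(tenantId);
        entries.put(tenantId, new Entry<>(fresh, clock.get()));
        return fresh;
    }
}
```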

SSO Bridge

Initial state (before customer OIDC): The tenant's server trusts Logto. The tenant admin has a Logto account. "Open Server Dashboard" navigates to /t/{slug}/ — the server's OIDC flow detects the existing Logto session and authenticates the user.

After customer OIDC: The SaaS pushes the customer's OIDC config to the server via ServerApiClient.pushOidcConfig(). The server switches to trusting the customer's provider. The tenant admin authenticates via their company's OIDC when accessing the server.


5. Backend API Design

Vendor Endpoints (platform:admin required)

| Method | Path | Purpose |
|---|---|---|
| GET | /api/vendor/tenants | List all tenants with health summary |
| POST | /api/vendor/tenants | Create tenant (triggers full provisioning flow) |
| GET | /api/vendor/tenants/{id} | Tenant detail with server status |
| PATCH | /api/vendor/tenants/{id} | Update tenant metadata (name, tier) |
| POST | /api/vendor/tenants/{id}/suspend | Suspend tenant |
| POST | /api/vendor/tenants/{id}/activate | Reactivate tenant |
| DELETE | /api/vendor/tenants/{id} | Offboard tenant |
| POST | /api/vendor/tenants/{id}/license | Generate/renew license |
| GET | /api/vendor/tenants/{id}/health | Server health check (on-demand) |

Tenant Endpoints (org-scoped, tenant from JWT)

| Method | Path | Purpose |
|---|---|---|
| GET | /api/tenant/dashboard | Aggregated health + license usage |
| GET | /api/tenant/license | License details with usage data |
| GET | /api/tenant/oidc | Current OIDC configuration |
| PUT | /api/tenant/oidc | Update OIDC config (push to server) |
| GET | /api/tenant/team | Team members (from Logto org) |
| POST | /api/tenant/team/invite | Invite member |
| PATCH | /api/tenant/team/{userId}/role | Change member role |
| DELETE | /api/tenant/team/{userId} | Remove member |
| GET | /api/tenant/settings | Org settings |

Existing Endpoints to Modify

| Current | Change |
|---|---|
| GET /api/tenants | Move to /api/vendor/tenants, add health data |
| POST /api/tenants | Move to /api/vendor/tenants, add provisioning |
| GET /api/tenants/{id} | Keep for backward compat, also available at /api/vendor/tenants/{id} |
| GET /api/tenants/{id}/license | Keep, also available at /api/tenant/license |
| POST /api/tenants/{id}/license | Move to /api/vendor/tenants/{id}/license |
| GET /api/me | Keep (used by OrgResolver) |
| GET /api/config | Keep (used by frontend bootstrap) |

6. Frontend Design

Vendor Console

Tenant List (/vendor/tenants):

  • DataTable with columns: Name, Slug, Tier (Badge), Status (Badge), Server (health indicator), Agents (used/limit), License (expiry or "None"), Created
  • Row click → tenant detail
  • "+ Create Tenant" button in header
  • Status badges: ACTIVE (green), PROVISIONING (blue), SUSPENDED (amber), DELETED (gray)
  • Server health: green dot (UP), red dot (DOWN), gray dot (no server)

Create Tenant (/vendor/tenants/new):

  • Form with: Name, Slug (auto-generated from name, editable), Tier (dropdown: LOW/MID/HIGH/BUSINESS)
  • On submit: shows provisioning progress (creating record → creating org → generating license → starting server → health check → done)
  • Progress displayed as a step indicator or timeline
  • On success: redirect to tenant detail

Tenant Detail (/vendor/tenants/:id):

  • Header: Tenant name + tier badge + status badge
  • KPI strip: Server Status, Agents (used/limit), Environments (used/limit), License (days remaining)
  • Sections:
    • Server: Status, endpoint URL, start/stop/restart actions
    • License: Current license details, "Renew" button
    • Info: Slug, created date, Logto org ID
  • Actions: Suspend/Activate toggle, Delete (with confirmation)

Tenant Portal

Dashboard (/tenant/):

  • KPI strip: Server Status, Agents (used/limit), Environments (used/limit), License (days remaining)
  • Quick links: "Open Server Dashboard", "View License", "Configure OIDC"
  • If server is DOWN: prominent alert banner

License (/tenant/license):

  • Reuses existing LicensePage layout
  • Adds usage indicators: "2 of 3 agents", "1 of 1 environments"
  • Progress bars for limits approaching capacity
  • License token section (show/hide + copy)

OIDC Configuration (/tenant/oidc):

  • Form: Issuer URI, Client ID, Client Secret (masked), Audience, Roles Claim
  • Current status: "Using Logto (default)" or "External OIDC configured"
  • Save pushes config to server via SaaS API
  • "Test Connection" button (calls server's OIDC discovery endpoint)
  • "Reset to Logto" button (reverts to default)

Team Management (/tenant/team):

  • DataTable: Name, Email, Role (dropdown: Owner/Operator/Viewer), Actions (Remove)
  • "+ Invite Member" button → form with email + role
  • Role changes update Logto org membership
  • Cannot remove the last owner
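
The last-owner rule is best enforced on the backend as well as in the UI. A minimal sketch of the check, assuming a hypothetical TeamRules helper and a map from user ID to the role names used in this spec (owner/operator/viewer):

```java
import java.util.Map;

// Hypothetical guard for DELETE /api/tenant/team/{userId}.
final class TeamRules {

    /** members maps userId to role name ("owner", "operator", "viewer"). */
    static boolean canRemove(Map<String, String> members, String userId) {
        if (!"owner".equals(members.get(userId))) {
            // Non-owners can always be removed; unknown users cannot.
            return members.containsKey(userId);
        }
        long owners = members.values().stream().filter("owner"::equals).count();
        return owners > 1; // at least one owner must remain
    }
}
```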

Settings (/tenant/settings):

  • Read-only info: Name, Slug, Tier, Status, Created
  • Server endpoint URL
  • "Contact support to change tier" message (tier changes go through vendor)

Shared Components

  • ServerStatusBadge: Green dot + "Running", Red dot + "Stopped", Gray dot + "Provisioning"
  • UsageIndicator: "2 / 3 agents" with progress bar, color-coded (green < 80%, amber < 100%, red = 100%)
  • ProvisioningProgress: Step indicator for tenant creation flow
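
The UsageIndicator thresholds are independent of the UI layer and can be stated as a pure function. The sketch below uses Java to match the backend snippets in this spec; the UsageLevel name is hypothetical, and treating a non-positive limit as exhausted is an assumption.

```java
// Hypothetical threshold logic for the UsageIndicator colours.
final class UsageLevel {
    enum Level { GREEN, AMBER, RED }

    static Level of(int used, int limit) {
        if (limit <= 0 || used >= limit) {
            return Level.RED; // at or over the limit (non-positive limit: assumption)
        }
        // Integer arithmetic avoids rounding at exactly 80%.
        return (used * 100L < limit * 80L) ? Level.GREEN : Level.AMBER;
    }
}
```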

Layout Changes

  • Remove TopBar server controls (status filters, time range, auto-refresh) — these are not relevant to the SaaS platform. Use a simplified TopBar with breadcrumb, theme toggle, and user menu only.
  • Sidebar: persona-aware navigation (vendor vs customer sections)
  • Sidebar footer: "Open Server Dashboard" link with tenant-specific URL (/t/{slug}/)

7. Files to Create/Modify

New Backend Files

| File | Purpose |
|---|---|
| provisioning/TenantProvisioner.java | Pluggable provisioning interface |
| provisioning/TenantProvisionRequest.java | Provision request record |
| provisioning/ProvisionResult.java | Provision result record |
| provisioning/ServerStatus.java | Server health status record |
| provisioning/DockerTenantProvisioner.java | Docker implementation |
| provisioning/DisabledTenantProvisioner.java | No-op fallback |
| provisioning/TenantProvisionerAutoConfig.java | Auto-detection config |
| vendor/VendorTenantController.java | Vendor API endpoints |
| vendor/VendorTenantService.java | Vendor business logic (orchestrates provisioning + license + Logto) |
| tenant/TenantPortalController.java | Customer API endpoints |
| tenant/TenantPortalService.java | Customer business logic (reads from server, manages team) |

Modified Backend Files

| File | Changes |
|---|---|
| identity/ServerApiClient.java | Add per-tenant endpoint support, health/usage/OIDC methods |
| identity/LogtoManagementClient.java | Add user invite, role management, list org members |
| tenant/TenantEntity.java | Add serverEndpoint field, provisionError field |
| tenant/TenantService.java | Keep existing methods, used by VendorTenantService |
| license/LicenseService.java | Keep existing, add revoke method |
| config/SecurityConfig.java | Add vendor/tenant endpoint security rules |
| config/TenantIsolationInterceptor.java | Handle /api/tenant/* (resolve from JWT, no path variable) |

New Frontend Files

| File | Purpose |
|---|---|
| pages/vendor/VendorTenantsPage.tsx | Tenant list with fleet health |
| pages/vendor/CreateTenantPage.tsx | Create tenant wizard |
| pages/vendor/TenantDetailPage.tsx | Tenant detail with actions |
| pages/tenant/TenantDashboardPage.tsx | Customer dashboard (evolves from DashboardPage) |
| pages/tenant/TenantLicensePage.tsx | License with usage (evolves from LicensePage) |
| pages/tenant/OidcConfigPage.tsx | External OIDC configuration |
| pages/tenant/TeamPage.tsx | Team management |
| pages/tenant/SettingsPage.tsx | Organization settings |
| components/ServerStatusBadge.tsx | Shared server status indicator |
| components/UsageIndicator.tsx | License usage progress bar |
| api/vendor-hooks.ts | React Query hooks for vendor API |
| api/tenant-hooks.ts | React Query hooks for tenant API |

Modified Frontend Files

| File | Changes |
|---|---|
| router.tsx | Restructure routes: /vendor/*, /tenant/* |
| components/Layout.tsx | Persona-aware sidebar, simplified TopBar, tenant-specific server link |
| auth/OrgResolver.tsx | Handle vendor landing (redirect to /vendor/tenants) |
| types/api.ts | Add vendor/tenant API types |
| api/client.ts | No changes needed (generic fetch wrapper) |

Files to Remove

| File | Reason |
|---|---|
| pages/DashboardPage.tsx | Replaced by tenant/TenantDashboardPage.tsx |
| pages/LicensePage.tsx | Replaced by tenant/TenantLicensePage.tsx |
| pages/AdminTenantsPage.tsx | Replaced by vendor/VendorTenantsPage.tsx |

Docker Changes

| File | Changes |
|---|---|
| docker-compose.yml | Mount Docker socket into cameleer-saas container |
| docker-compose.dev.yml | Add Docker socket mount, group_add for Docker access |

Database Migration

New migration V011:

  • Add server_endpoint column to tenants (nullable VARCHAR, stores Docker-internal URL)
  • Add provision_error column to tenants (nullable TEXT, stores last error message)
  • Add DELETED to status enum (for soft-delete offboarding)

8. Existing Compose Stack Changes

The default cameleer3-server and cameleer3-server-ui containers in docker-compose.yml become the "bootstrap" server for the default tenant. When provisioning is enabled, new tenants get their own dynamically-created containers.

The existing compose stack continues to work as-is for development. The provisioner creates additional containers alongside the compose-managed ones.

For the default tenant (created by bootstrap), the SaaS recognizes the existing compose-managed server and doesn't try to provision a new one. This is detected by checking if a container named cameleer-server-default (or the compose-managed cameleer3-server) already exists.
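
The detection described above reduces to a name lookup against the containers that already exist. A minimal sketch, assuming a hypothetical BootstrapDetection helper, a default tenant whose slug is "default", and a caller that supplies the set of existing container names (for example from a Docker list-containers call):

```java
import java.util.Set;

// Hypothetical check that runs before provisioning a tenant's server.
final class BootstrapDetection {

    /** True when a server for this tenant already exists and provisioning must be skipped. */
    static boolean hasExistingServer(String slug, Set<String> existingContainerNames) {
        if (existingContainerNames.contains("cameleer-server-" + slug)) {
            return true;
        }
        // The compose-managed bootstrap server belongs to the default tenant only.
        return "default".equals(slug) && existingContainerNames.contains("cameleer3-server");
    }
}
```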


9. Out of Scope

  • Kubernetes provisioning — interface defined, implementation deferred
  • Billing/Stripe — fields exist in DB, no integration in this spec
  • Mobile responsiveness — deferred
  • Self-service signup — tenants created by vendor only
  • Custom domains — deferred
  • Email notifications — deferred
  • Usage-based metering — deferred (license limits are checked but not metered)

| Issue | Relevance |
|---|---|
| #1 | Epic: SaaS Management Platform |
| #3 | Tenant Provisioning & Lifecycle |
| #25 | K8s Operational Layer (deferred) |
| #29 | Billing & Metering (deferred) |
| #37 | Admin: Tenant creation UI (superseded by this spec) |
| #38 | Cross-app session management (addressed by SSO bridge) |