Dual Deployment Architecture: Docker + Kubernetes

Date: 2026-04-04
Status: Approved
Supersedes: Portions of 2026-03-29-saas-platform-prd.md (deployment model, phase ordering, auth strategy)

Context

Cameleer SaaS must serve two deployment targets:

  • Docker Compose — production-viable for small customers and air-gapped installs (single-tenant per stack)
  • Kubernetes — managed SaaS and enterprise self-hosted (multi-tenant)

The original PRD assumed K8s-only production. This design restructures the architecture and roadmap to treat Docker Compose as a first-class production target, uses the Docker+K8s dual requirement as a filter for build-vs-buy decisions, and reorders the phase roadmap to ship a deployable product faster.

Key constraints:

  • The application is always multi-tenant — Docker deployments have exactly 1 tenant
  • Don't build custom abstractions over K8s-only primitives when no Docker equivalent exists
  • Prefer right-sized OSS tools over Swiss Army knives or custom builds
  • K8s-only features (NetworkPolicies, HPA, Flux CD) are operational enhancements, never functional requirements

Build-vs-Buy Decisions

BUY (Use 3rd Party OSS)

| Subsystem | Tool | License | Why This Tool |
| --- | --- | --- | --- |
| Identity & Auth | Logto | MPL-2.0 | Lightest IdP (2 containers, ~0.5-1 GB). Orgs, RBAC, M2M tokens, OIDC/SSO federation all in OSS. Replaces ~3-4 months of custom auth build (OIDC, SSO, teams, invites, MFA, password reset, custom roles). |
| Reverse Proxy | Traefik | MIT | Native Docker provider (labels) and K8s provider (IngressRoute CRDs). Same mental model in both environments. Already on the k3s cluster. ForwardAuth middleware for tenant-aware routing. Auto-HTTPS via Let's Encrypt. ~256 MB RAM. |
| Database | PostgreSQL | PostgreSQL License | Already chosen. Platform data + Logto data (separate schemas). |
| Trace/Metrics Storage | ClickHouse | Apache-2.0 | Replaced OpenSearch in the cameleer-server stack. Columnar OLAP, excellent for time-series observability data. |
| Schema Migrations | Flyway | Apache-2.0 | Already in place. |
| Billing (subscriptions) | Stripe | N/A (API) | Start with Stripe Checkout for fixed-tier subscriptions. No custom billing infrastructure on day 1. |
| Billing (usage metering) | Lago (deferred) | AGPL-3.0 | Purpose-built for event-based metering. 8 containers, so deploy only when usage-based pricing launches. Design the event model with Lago's API shape in mind from day 1. Integrate via API only, so that no AGPL code is linked into the platform. |
| GitOps (K8s only) | Flux CD | Apache-2.0 | K8s-only, and that's acceptable. Docker deployments get release tarballs + upgrade scripts. |
| Image Builds (K8s) | Kaniko | Apache-2.0 | Daemonless container image builds inside K8s. For Docker mode, docker build via docker-java is simpler. |
| Monitoring | Prometheus + Grafana + Loki | Apache-2.0 (Prometheus) / AGPL-3.0 (Grafana, Loki) | Works in both Docker and K8s. Optional for Docker (customer's choice), standard for K8s SaaS. |
| TLS Certificates | Traefik ACME (Docker) / cert-manager (K8s) | MIT / Apache-2.0 | Standard tools, no custom code. |
| Container Registry (K8s) | Gitea Registry (SaaS) / registry:2 (self-hosted) | | Docker mode doesn't need a registry (local image cache). |

BUILD (Custom / Core IP)

| Subsystem | Why Build |
| --- | --- |
| License signing & validation | Ed25519 signed JWT with tier, features, limits, expiry. Dual mode: online API check + offline signed file. No off-the-shelf tool does this. Core IP. |
| Agent bootstrap tokens | Tightly coupled to the cameleer agent protocol (PROTOCOL.md). Custom Ed25519 tokens for agent registration. |
| Tenant lifecycle | CRUD, configuration, status management. Core business logic. User management (invites, teams, roles) is delegated to Logto's organization model. |
| Runtime orchestration | The core of the "managed Camel runtime" product. RuntimeOrchestrator interface with Docker and K8s implementations. No off-the-shelf tool does "managed Camel runtime with agent injection." |
| Image build pipeline | Templated Dockerfile: JRE + cameleer-agent.jar + customer JAR + -javaagent flag. Simple but custom. |
| Feature gating | Tier-based feature gating logic. Which features are available at which tier. Business logic. |
| Billing integration | Stripe API calls, subscription lifecycle, webhook handling. Thin integration layer. |
| Observability proxy | Routing authenticated requests to tenant-specific cameleer-server instances. |
| MOAT features | Debugger, Lineage, Correlation: the defensible product. Built in cameleer agent + server. |
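
The license row above describes an Ed25519-signed JWT. As a minimal sketch of what signing and offline verification could look like with only the JDK (Java 15+ ships an `Ed25519` signature algorithm), the helper below builds a compact token of the form header.payload.signature. The class name `LicenseTokens`, its method names, and the claim layout are hypothetical and not taken from the codebase. The sketch checks the signature only; expiry and tier checks would sit on top of it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Optional;

/** Hypothetical helper: compact JWS (header.payload.signature) signed with Ed25519. */
final class LicenseTokens {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder B64D = Base64.getUrlDecoder();
    // JOSE header announcing an EdDSA signature.
    private static final String HEADER = "{\"alg\":\"EdDSA\",\"typ\":\"JWT\"}";

    private LicenseTokens() {}

    /** Signs the given JSON claims and returns the compact token. */
    static String sign(String claimsJson, PrivateKey key) {
        String signingInput = B64.encodeToString(HEADER.getBytes(StandardCharsets.UTF_8))
                + "." + B64.encodeToString(claimsJson.getBytes(StandardCharsets.UTF_8));
        try {
            Signature sig = Signature.getInstance("Ed25519");
            sig.initSign(key);
            sig.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64.encodeToString(sig.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 signing failed", e);
        }
    }

    /** Returns the claims JSON if the signature verifies, otherwise an empty Optional. */
    static Optional<String> verify(String token, PublicKey key) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return Optional.empty();
        }
        try {
            Signature sig = Signature.getInstance("Ed25519");
            sig.initVerify(key);
            sig.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
            if (!sig.verify(B64D.decode(parts[2]))) {
                return Optional.empty();
            }
            return Optional.of(new String(B64D.decode(parts[1]), StandardCharsets.UTF_8));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return Optional.empty(); // malformed Base64 or signature bytes
        }
    }

    /** Generates a throwaway key pair; production keys would be loaded from mounted files. */
    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }
}
```

Because verification needs only the public key, the same `verify` path could serve the offline signed-file mode, while the online check would add a revocation lookup.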

SKIP / DEFER

| Subsystem | Why Skip |
| --- | --- |
| Secrets management (Vault) | Docker: env vars + mounted files. K8s: K8s Secrets. Vault is enterprise-tier complexity. Defer until demanded. |
| Custom role management UI | Logto provides this. |
| OIDC provider implementation | Logto provides this. |
| WireGuard VPN / VPC peering | Far future, dedicated-tier only. |
| Cluster API for dedicated tiers | Don't design for this until enterprise customers exist. |
| Management agent for updates | Watchtower is optional for connected customers. Air-gapped gets release tarballs. Don't build custom. |

Architecture

Platform Stack (Docker Compose — 6 base containers)

+-------------------------------------------------------+
|  Traefik (reverse proxy, TLS, ForwardAuth)            |
|  - Docker: labels-based routing                       |
|  - K8s: IngressRoute CRDs                             |
+--------+---------------------+------------------------+
         |                     |
+--------v--------+  +---------v-----------+
| cameleer-saas   |  | cameleer-server    |
| (Spring Boot)   |  | (observability)     |
| Control plane   |  | Per-tenant instance |
+---+-------+-----+  +----------+----------+
    |       |                    |
+---v--+ +--v----+     +---------v---------+
| PG   | | Logto |     | ClickHouse        |
|      | | (IdP) |     | (traces/metrics)  |
+------+ +-------+     +-------------------+

Customer Camel apps are additional containers dynamically managed by the control plane via Docker API (Docker mode) or K8s API (K8s mode).

Auth Flow

User login:
  Browser -> Traefik -> Logto (OIDC flow) -> JWT issued by Logto
  
API request:
  Browser -> Traefik -> ForwardAuth (cameleer-saas /auth/verify)
    -> Validates Logto JWT, injects X-Tenant-Id header
    -> Traefik forwards to upstream service

Machine auth (agent bootstrap):
  cameleer-agent -> cameleer-saas /api/agent/register
    -> Validates bootstrap token (Ed25519)
    -> Issues agent session token
    -> Agent connects to cameleer-server

Logto handles all user-facing identity. The cameleer-saas app handles machine-to-machine auth (agent tokens, license tokens) using Ed25519.
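
The ForwardAuth step in the API request flow can be sketched as a small decision function. Traefik forwards the original request to the upstream only when the auth endpoint answers with a 2xx status, and it copies only the response headers listed in the middleware's `authResponseHeaders` option. The names `AuthDecision` and `ForwardAuthVerifier`, and the idea of passing JWT validation in as a function, are illustrative assumptions rather than the actual cameleer-saas code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical result of /auth/verify: an HTTP status plus headers for Traefik to copy. */
record AuthDecision(int status, Map<String, String> headers) {}

final class ForwardAuthVerifier {
    // Validates a bearer token and returns its tenant ID; the real check
    // (signature, issuer, expiry against Logto) is assumed to live behind this function.
    private final Function<String, Optional<String>> tenantFromToken;

    ForwardAuthVerifier(Function<String, Optional<String>> tenantFromToken) {
        this.tenantFromToken = tenantFromToken;
    }

    AuthDecision verify(String authorizationHeader) {
        if (authorizationHeader == null || !authorizationHeader.startsWith("Bearer ")) {
            return new AuthDecision(401, Map.of());
        }
        String token = authorizationHeader.substring("Bearer ".length()).trim();
        return tenantFromToken.apply(token)
                // 200 tells Traefik to pass the request on with X-Tenant-Id attached.
                .map(tenantId -> new AuthDecision(200, Map.of("X-Tenant-Id", tenantId)))
                // Any non-2xx status is returned to the caller and the upstream is never reached.
                .orElseGet(() -> new AuthDecision(401, Map.of()));
    }
}
```

Keeping the decision free of HTTP framework types makes it easy to unit test; a Spring controller would only translate `AuthDecision` into a response.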

Runtime Orchestration

RuntimeOrchestrator (interface)
  + deployApp(tenantId, appId, envId, imageRef, config) -> Deployment
  + stopApp(tenantId, appId, envId) -> void
  + restartApp(tenantId, appId, envId) -> void
  + getAppLogs(tenantId, appId, envId, since) -> Stream<LogLine>
  + getAppStatus(tenantId, appId, envId) -> AppStatus
  + listApps(tenantId) -> List<AppSummary>

DockerRuntimeOrchestrator (docker-java library)
  - Talks to Docker daemon via /var/run/docker.sock
  - Creates containers with labels for Traefik routing
  - Manages container lifecycle
  - Builds images locally via docker build

KubernetesRuntimeOrchestrator (fabric8 kubernetes-client)
  - Creates Deployments, Services, ConfigMaps in tenant namespace
  - Builds images via Kaniko Jobs, pushes to registry
  - Manages rollout lifecycle
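
Rendered as Java, the interface above might look like the following sketch. The spec names `Deployment`, `AppStatus`, `AppSummary`, and `LogLine` but does not define their fields, so the record components here are assumptions, as is the `NOT_FOUND` status. The in-memory implementation is a hypothetical test double that tracks state only; it is not the Docker or K8s orchestrator.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Illustrative value types; field choices are assumptions.
enum AppStatus { RUNNING, STOPPED, NOT_FOUND }
record Deployment(String tenantId, String appId, String envId, String imageRef) {}
record AppSummary(String appId, String envId, AppStatus status) {}
record LogLine(Instant timestamp, String message) {}

interface RuntimeOrchestrator {
    Deployment deployApp(String tenantId, String appId, String envId,
                         String imageRef, Map<String, String> config);
    void stopApp(String tenantId, String appId, String envId);
    void restartApp(String tenantId, String appId, String envId);
    Stream<LogLine> getAppLogs(String tenantId, String appId, String envId, Instant since);
    AppStatus getAppStatus(String tenantId, String appId, String envId);
    List<AppSummary> listApps(String tenantId);
}

/** In-memory stand-in for tests; it tracks state only and starts no containers. */
final class InMemoryRuntimeOrchestrator implements RuntimeOrchestrator {
    private record Key(String tenantId, String appId, String envId) {}

    private final Map<Key, AppStatus> apps = new ConcurrentHashMap<>();

    @Override
    public Deployment deployApp(String tenantId, String appId, String envId,
                                String imageRef, Map<String, String> config) {
        apps.put(new Key(tenantId, appId, envId), AppStatus.RUNNING);
        return new Deployment(tenantId, appId, envId, imageRef);
    }

    @Override
    public void stopApp(String tenantId, String appId, String envId) {
        apps.computeIfPresent(new Key(tenantId, appId, envId), (k, v) -> AppStatus.STOPPED);
    }

    @Override
    public void restartApp(String tenantId, String appId, String envId) {
        apps.computeIfPresent(new Key(tenantId, appId, envId), (k, v) -> AppStatus.RUNNING);
    }

    @Override
    public Stream<LogLine> getAppLogs(String tenantId, String appId, String envId, Instant since) {
        return Stream.empty(); // no real process, so no logs
    }

    @Override
    public AppStatus getAppStatus(String tenantId, String appId, String envId) {
        return apps.getOrDefault(new Key(tenantId, appId, envId), AppStatus.NOT_FOUND);
    }

    @Override
    public List<AppSummary> listApps(String tenantId) {
        return apps.entrySet().stream()
                .filter(e -> e.getKey().tenantId().equals(tenantId))
                .map(e -> new AppSummary(e.getKey().appId(), e.getKey().envId(), e.getValue()))
                .sorted(Comparator.comparing(AppSummary::appId).thenComparing(AppSummary::envId))
                .toList();
    }
}
```

Every method takes the tenant ID explicitly, which keeps tenant scoping visible at each call site in both the single-tenant Docker mode and the multi-tenant K8s mode.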

Image Build Pipeline

Customer uploads JAR
  -> Validation (file type, size, SHA-256, security scan)
  -> Templated Dockerfile generation:
       FROM eclipse-temurin:21-jre-alpine
       COPY cameleer-agent.jar /opt/agent/
       COPY customer-app.jar /opt/app/
       ENTRYPOINT ["java", "-javaagent:/opt/agent/cameleer-agent.jar", "-jar", "/opt/app/customer-app.jar"]
  -> Build:
       Docker mode: docker build via docker-java (local image cache)
       K8s mode: Kaniko Job -> push to registry
  -> Deploy to requested environment
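
The Dockerfile generation step could be as small as the sketch below. The class `DockerfileTemplate` and the file-name whitelist are assumptions added for illustration; the whitelist is there because the JAR name ends up inside generated build instructions, so a name containing a newline or quote must not get through.

```java
/** Hypothetical renderer for the templated Dockerfile described in the pipeline. */
final class DockerfileTemplate {
    private DockerfileTemplate() {}

    static String render(String agentJar, String appJar) {
        requireJarName(agentJar);
        requireJarName(appJar);
        return String.join("\n",
                "FROM eclipse-temurin:21-jre-alpine",
                "COPY " + agentJar + " /opt/agent/",
                "COPY " + appJar + " /opt/app/",
                "ENTRYPOINT [\"java\", \"-javaagent:/opt/agent/" + agentJar
                        + "\", \"-jar\", \"/opt/app/" + appJar + "\"]",
                "");
    }

    // Accepts plain file names only: letters, digits, dot, underscore, hyphen, ending in .jar.
    private static void requireJarName(String name) {
        if (name == null || !name.matches("[A-Za-z0-9._-]+\\.jar")) {
            throw new IllegalArgumentException("unsafe JAR file name: " + name);
        }
    }
}
```

Calling `DockerfileTemplate.render("cameleer-agent.jar", "customer-app.jar")` reproduces the four instructions listed above, so the same text can be fed to docker build in Docker mode or to a Kaniko Job in K8s mode.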

Multi-Tenancy Model

  • Always multi-tenant. Docker Compose has 1 pre-configured tenant.
  • Schema-per-tenant in PostgreSQL for platform data isolation.
  • Logto organizations map 1:1 to tenants. Logto handles user-tenant membership.
  • ClickHouse data partitioned by tenant_id.
  • cameleer-server instances are per-tenant (separate containers/pods).
  • K8s bonus: Namespace-per-tenant for network isolation, resource quotas.

Environment Model

Each tenant can have multiple logical environments (tier-dependent):

| Tier | Environments |
| --- | --- |
| Low | prod only |
| Mid | dev, prod |
| High+ | dev, staging, prod + custom |

Each environment is a separate deployment of the same app image with different configuration:

  • Docker: separate container, different env vars
  • K8s: separate Deployment, different ConfigMap

Promotion = deploy same image tag to a different environment with that environment's config.
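
A minimal sketch of the tier table and the promotion rule, under the assumption that "High+" can be modelled as one tier that also accepts arbitrary custom environment names. `Tier`, `EnvDeployment`, and `Promotion` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

enum Tier {
    LOW(Set.of("prod"), false),
    MID(Set.of("dev", "prod"), false),
    HIGH(Set.of("dev", "staging", "prod"), true); // "High+": custom environments allowed

    private final Set<String> builtIn;
    private final boolean customAllowed;

    Tier(Set<String> builtIn, boolean customAllowed) {
        this.builtIn = builtIn;
        this.customAllowed = customAllowed;
    }

    boolean allows(String environment) {
        return environment != null && (customAllowed || builtIn.contains(environment));
    }
}

/** One deployment of an image in one environment, with that environment's config. */
record EnvDeployment(String environment, String imageRef, Map<String, String> config) {}

final class Promotion {
    private Promotion() {}

    /** Promotion keeps the image reference and swaps in the target environment's config. */
    static EnvDeployment promote(Tier tier, EnvDeployment source,
                                 String targetEnv, Map<String, String> targetConfig) {
        if (!tier.allows(targetEnv)) {
            throw new IllegalArgumentException(
                    "tier " + tier + " does not include environment " + targetEnv);
        }
        return new EnvDeployment(targetEnv, source.imageRef(), Map.copyOf(targetConfig));
    }
}
```

Since the image reference is carried over unchanged, what runs in prod is byte-for-byte the image that was tested in dev; only configuration differs.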

Configuration Strategy

The application is configured entirely via environment variables and Spring Boot profiles:

# Detected at startup
cameleer.deployment.mode: docker | kubernetes  # auto-detected
cameleer.deployment.docker.socket: /var/run/docker.sock
cameleer.deployment.k8s.namespace-template: tenant-{tenantId}

# Identity provider
cameleer.identity.issuer-uri: http://logto:3001/oidc
cameleer.identity.client-id: ${LOGTO_CLIENT_ID}
cameleer.identity.client-secret: ${LOGTO_CLIENT_SECRET}

# Ed25519 keys (externalized, not per-boot)
cameleer.jwt.private-key-path: /etc/cameleer/keys/ed25519.key
cameleer.jwt.public-key-path: /etc/cameleer/keys/ed25519.pub

# Database
spring.datasource.url: ${DATABASE_URL}

# ClickHouse
cameleer.clickhouse.url: ${CLICKHOUSE_URL}
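
The spec says the deployment mode is auto-detected but does not say how. One plausible order, shown here purely as an assumption, is: an explicit override first, then the `KUBERNETES_SERVICE_HOST` variable that Kubernetes sets inside every pod, then the presence of the Docker socket. The override name `CAMELEER_DEPLOYMENT_MODE` follows Spring Boot's usual environment-variable form of `cameleer.deployment.mode`. The class name and the injected `pathExists` predicate are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

enum DeploymentMode { DOCKER, KUBERNETES }

final class DeploymentModeDetector {
    private DeploymentModeDetector() {}

    /**
     * @param env          environment variables, e.g. System.getenv()
     * @param pathExists   file-existence check, e.g. p -> Files.exists(Path.of(p))
     * @param dockerSocket value of cameleer.deployment.docker.socket
     */
    static DeploymentMode detect(Map<String, String> env,
                                 Predicate<String> pathExists,
                                 String dockerSocket) {
        String explicit = env.get("CAMELEER_DEPLOYMENT_MODE");
        if (explicit != null && !explicit.isBlank()) {
            return DeploymentMode.valueOf(explicit.trim().toUpperCase(Locale.ROOT));
        }
        if (env.containsKey("KUBERNETES_SERVICE_HOST")) {
            return DeploymentMode.KUBERNETES; // running inside a pod
        }
        if (pathExists.test(dockerSocket)) {
            return DeploymentMode.DOCKER;
        }
        throw new IllegalStateException("cannot detect deployment mode; set it explicitly");
    }

    /** Expands a template such as tenant-{tenantId} into a concrete namespace name. */
    static String namespaceFor(String template, String tenantId) {
        return template.replace("{tenantId}", tenantId);
    }
}
```

Passing the environment and the file check in as parameters keeps the detection logic testable without a real cluster or Docker daemon.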

Docker Compose Production Template

services:
  traefik:
    image: traefik:v3
    ports: ["80:80", "443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/etc/traefik/traefik.yml
      - acme:/etc/traefik/acme
    labels:
      # Dashboard (optional, secured)
      
  cameleer-saas:
    image: gitea.siegeln.net/cameleer/cameleer-saas:${VERSION}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # For runtime orchestration
      - ./keys:/etc/cameleer/keys:ro
    environment:
      - DATABASE_URL=jdbc:postgresql://postgres:5432/cameleer_saas
      - LOGTO_CLIENT_ID=${LOGTO_CLIENT_ID}
      - LOGTO_CLIENT_SECRET=${LOGTO_CLIENT_SECRET}
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=PathPrefix(`/api`)
      
  logto:
    image: svhd/logto:latest
    environment:
      - DB_URL=postgresql://postgres:5432/logto
    labels:
      - traefik.enable=true
      - traefik.http.routers.auth.rule=PathPrefix(`/auth`)

  cameleer-server:
    image: gitea.siegeln.net/cameleer/cameleer-server:${VERSION}
    environment:
      - CLICKHOUSE_URL=jdbc:clickhouse://clickhouse:8123/cameleer
    labels:
      - traefik.enable=true
      - traefik.http.routers.observe.rule=PathPrefix(`/observe`)

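  postgres:
    image: postgres:16-alpine
    environment:
      # The official postgres image refuses to initialize without a superuser password
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes: [pgdata:/var/lib/postgresql/data]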
    
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    volumes: [chdata:/var/lib/clickhouse]

volumes:
  pgdata:
  chdata:
  acme:

Docker vs K8s Feature Matrix

| Feature | Docker Compose | Kubernetes |
| --- | --- | --- |
| Deploy Camel apps | Yes (Docker API) | Yes (K8s API) |
| Multiple environments | Yes (separate containers) | Yes (separate Deployments) |
| Agent injection | Yes | Yes |
| Observability (traces, topology) | Yes | Yes |
| Identity / SSO / Teams | Yes (Logto) | Yes (Logto) |
| Licensing | Yes | Yes |
| Auto-scaling | No | Yes (HPA) |
| Network isolation (multi-tenant) | Docker networks | NetworkPolicies |
| GitOps deployment | No (manual updates) | Yes (Flux CD) |
| Rolling updates | Manual restart | Native |
| Platform monitoring | Optional (customer adds Grafana) | Standard (Prometheus/Grafana/Loki) |
| Certificate management | Traefik ACME | cert-manager |

Revised Phase Roadmap

Phase 2: Tenants + Identity + Licensing

Goal: A customer can sign up, get a tenant, and access the platform via Traefik.

  • Integrate Logto as identity provider
    • Replace custom user-facing auth (login, registration, password management)
    • Keep Ed25519 JWT for machine tokens (agent bootstrap, license signing)
    • Configure Logto organizations to map to tenants
  • Tenant entity + CRUD API
  • License token generation (Ed25519 signed JWT: tier, features, limits, expiry)
  • Traefik integration with ForwardAuth middleware
  • Docker Compose production stack (6 containers)
  • Externalize Ed25519 keys (mounted files, not per-boot)

Files to modify/create:

  • src/main/java/net/siegeln/cameleer/saas/tenant/ — new package
  • src/main/java/net/siegeln/cameleer/saas/license/ — new package
  • src/main/java/net/siegeln/cameleer/saas/config/SecurityConfig.java — Logto OIDC integration
  • src/main/resources/db/migration/V005__create_tenants.sql
  • src/main/resources/db/migration/V006__create_licenses.sql
  • docker-compose.yml — expand to full production stack
  • traefik.yml — static config
  • src/main/resources/application.yml — Logto + Traefik config

Phase 3: Runtime Orchestration + Environments

Goal: Customer can upload a Camel JAR, deploy it to dev/prod, see it running with agent attached.

  • RuntimeOrchestrator interface
  • DockerRuntimeOrchestrator implementation (docker-java)
  • Customer JAR upload endpoint
  • Image build pipeline (Dockerfile template + docker build)
  • Logical environment model (dev/staging/prod per tenant, tier-dependent)
  • Environment-specific config overlays
  • App lifecycle API (deploy, start, stop, restart, logs, health)

Key dependencies: docker-java, Kaniko (for future K8s)

Phase 4: Observability Pipeline

Goal: Customer can see traces, metrics, and route topology for deployed apps.

  • Connect cameleer-server to customer app containers
  • ClickHouse tenant-scoped data partitioning
  • Observability API proxy (tenant-aware routing to cameleer-server)
  • Basic topology graph endpoint
  • Agent ↔ server connectivity verification
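
The tenant-aware proxy item above boils down to mapping the tenant ID from the `X-Tenant-Id` header to that tenant's cameleer-server instance and refusing everything else. The following is a sketch under stated assumptions: the class name, the static map of upstreams, and the example host name and port are all hypothetical, and a real implementation would presumably look instances up from the orchestrator.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from tenant ID to that tenant's cameleer-server base URI. */
final class TenantUpstreamResolver {
    private final Map<String, URI> upstreams;

    TenantUpstreamResolver(Map<String, URI> upstreams) {
        this.upstreams = Map.copyOf(upstreams);
    }

    /** Returns the upstream URI for the request, or empty if the tenant or path is not acceptable. */
    Optional<URI> resolve(String tenantId, String path) {
        if (tenantId == null || path == null) {
            return Optional.empty();
        }
        // Only absolute paths are accepted. A leading "//" would be parsed as a new
        // authority and could redirect the proxy to an arbitrary host.
        if (!path.startsWith("/") || path.startsWith("//")) {
            return Optional.empty();
        }
        URI base = upstreams.get(tenantId);
        if (base == null) {
            return Optional.empty(); // unknown tenant: never fall back to another tenant's server
        }
        try {
            return Optional.of(base.resolve(path));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // path is not a valid URI reference
        }
    }
}
```

Returning an empty result for unknown tenants, instead of a default upstream, is what supports the isolation check in the Phase 4 verification.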

Phase 5: K8s Operational Layer

Goal: Same product works on K8s with operational enhancements.

  • KubernetesRuntimeOrchestrator implementation (fabric8)
  • Kaniko-based image builds
  • Flux CD integration for platform GitOps
  • Namespace-per-tenant provisioning
  • NetworkPolicies, ResourceQuotas
  • Helm chart for K8s deployment
  • Registry integration (Gitea registry / registry:2)

Phase 6: Billing

Goal: Customers can subscribe and pay.

  • Stripe Checkout integration
  • Subscription lifecycle (create, upgrade, downgrade, cancel)
  • Tier enforcement (feature gating based on active subscription)
  • Usage tracking in platform DB (prep for Lago integration later)
  • Webhook handling for payment events

Phase 7: Security Hardening + Monitoring

Goal: Production-hardened platform.

  • Prometheus/Grafana/Loki stack (optional Docker compose overlay, standard K8s)
  • SOC 2 compliance review
  • Rate limiting
  • Container image signing (cosign)
  • Supply chain security (SBOM, Trivy scanning)
  • Audit log shipping to separate sink

Frontend (React Shell) — Parallel Track (Phase 2+)

  • Can start as soon as Phase 2 API contracts are defined
  • Uses @cameleer/design-system
  • Screens: login, dashboard, app deployment, environment management, observability views, team management, billing

Verification Plan

Phase 2 Verification

  1. docker compose up starts all 6 containers
  2. Navigate to Logto admin, create a user
  3. User logs in via OIDC flow through Traefik
  4. API calls carrying a valid Logto JWT reach the upstream service with the X-Tenant-Id header injected by ForwardAuth
  5. License token can be generated and verified
  6. All existing tests still pass

Phase 3 Verification

  1. Upload a sample Camel JAR via API
  2. Platform builds container image
  3. Deploy to "dev" environment
  4. Container starts with cameleer agent attached
  5. App is reachable via Traefik routing
  6. Logs are accessible via API
  7. Deploy same image to "prod" with different config

Phase 4 Verification

  1. Running Camel app sends traces to cameleer-server
  2. Traces visible in ClickHouse with correct tenant_id
  3. Topology graph shows route structure
  4. One tenant cannot see another tenant's data

Phase 5 Verification

  1. Helm install deploys full platform to k3s
  2. Tenant provisioning creates namespace + resources
  3. App deployment creates K8s Deployment + Service
  4. Kaniko builds image and pushes to registry
  5. NetworkPolicy blocks cross-tenant traffic
  6. Same API contracts work as Docker mode

End-to-End Smoke Test (Any Phase)

# Docker Compose
docker compose up -d
# Create tenant + user via API/Logto
# Upload sample Camel JAR
# Deploy to environment
# Verify agent connects to cameleer-server
# Verify traces in ClickHouse
# Verify observability API returns data