2026-04-04 14:45:33 +02:00
# Dual Deployment Architecture: Docker + Kubernetes
**Date:** 2026-04-04
**Status:** Approved
**Supersedes:** Portions of `2026-03-29-saas-platform-prd.md` (deployment model, phase ordering, auth strategy)
## Context
Cameleer SaaS must serve two deployment targets:
- **Docker Compose** — production-viable for small customers and air-gapped installs (single-tenant per stack)
- **Kubernetes** — managed SaaS and enterprise self-hosted (multi-tenant)

The original PRD assumed K8s-only production. This design restructures the architecture and roadmap to treat Docker Compose as a first-class production target, uses the Docker+K8s dual requirement as a filter for build-vs-buy decisions, and reorders the phase roadmap to ship a deployable product faster.

Key constraints:
- The application is **always multi-tenant**: Docker deployments have exactly 1 tenant
- Don't build custom abstractions over K8s-only primitives when no Docker equivalent exists
- Prefer right-sized OSS tools over Swiss Army knives or custom builds
- K8s-only features (NetworkPolicies, HPA, Flux CD) are operational enhancements, never functional requirements
## Build-vs-Buy Decisions
### BUY (Use 3rd Party OSS)
| Subsystem | Tool | License | Why This Tool |
|---|---|---|---|
| **Identity & Auth** | **Logto** | MPL-2.0 | Lightest IdP (2 containers, ~0.5-1 GB). Orgs, RBAC, M2M tokens, and OIDC/SSO federation are all in the OSS edition. Replaces ~3-4 months of custom auth work (OIDC, SSO, teams, invites, MFA, password reset, custom roles). |
| **Reverse Proxy** | **Traefik** | MIT | Native Docker provider (labels) and K8s provider (IngressRoute CRDs), so the same mental model applies in both environments. Already on the k3s cluster. ForwardAuth middleware for tenant-aware routing. Auto-HTTPS via Let's Encrypt. ~256 MB RAM. |
| **Database** | **PostgreSQL** | PostgreSQL License | Already chosen. Platform data + Logto data (separate schemas). |
| **Trace/Metrics Storage** | **ClickHouse** | Apache-2.0 | Replaced OpenSearch in the cameleer-server stack. Columnar OLAP, well suited to time-series observability data. |
| **Schema Migrations** | **Flyway** | Apache-2.0 | Already in place. |
| **Billing (subscriptions)** | **Stripe** | N/A (API) | Start with Stripe Checkout for fixed-tier subscriptions; no custom billing infrastructure on day 1. |
| **Billing (usage metering)** | **Lago** (deferred) | AGPL-3.0 | Purpose-built for event-based metering. 8 containers; deploy only when usage-based pricing launches. Design the event model with Lago's API shape in mind from day 1. Integrate via API only, so no AGPL code is linked into the platform. |
| **GitOps (K8s only)** | **Flux CD** | Apache-2.0 | K8s-only, which is acceptable. Docker deployments get release tarballs + upgrade scripts. |
| **Image Builds (K8s)** | **Kaniko** | Apache-2.0 | Daemonless container image builds inside K8s. For Docker mode, `docker build` via docker-java is simpler. |
| **Monitoring** | **Prometheus + Grafana + Loki** | Apache-2.0 (Prometheus) / AGPL-3.0 (Grafana, Loki) | Works in both Docker and K8s. Optional for Docker (customer's choice), standard for K8s SaaS. |
| **TLS Certificates** | **Traefik ACME** (Docker) / **cert-manager** (K8s) | MIT / Apache-2.0 | Standard tools, no custom code. |
| **Container Registry (K8s)** | **Gitea Registry** (SaaS) / **registry:2** (self-hosted) | MIT / Apache-2.0 | Docker mode doesn't need a registry (local image cache). |
### BUILD (Custom / Core IP)
| Subsystem | Why Build |
|---|---|
| **License signing & validation** | Ed25519-signed JWT with tier, features, limits, expiry. Dual mode: online API check + offline signed file. No off-the-shelf tool does this. Core IP. |
| **Agent bootstrap tokens** | Tightly coupled to the cameleer agent protocol (PROTOCOL.md). Custom Ed25519 tokens for agent registration. |
| **Tenant lifecycle** | CRUD, configuration, status management. Core business logic. User management (invites, teams, roles) is delegated to Logto's organization model. |
| **Runtime orchestration** | The core of the "managed Camel runtime" product. `RuntimeOrchestrator` interface with Docker and K8s implementations. No off-the-shelf tool does "managed Camel runtime with agent injection." |
| **Image build pipeline** | Templated Dockerfile: JRE + cameleer-agent.jar + customer JAR + `-javaagent` flag. Simple but custom. |
| **Feature gating** | Tier-based feature gating logic: which features are available at which tier. Business logic. |
| **Billing integration** | Stripe API calls, subscription lifecycle, webhook handling. Thin integration layer. |
| **Observability proxy** | Routes authenticated requests to tenant-specific cameleer-server instances. |
| **MOAT features** | Debugger, Lineage, Correlation: the defensible product. Built in the cameleer agent + server. |
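
The license token is specified only as an Ed25519-signed JWT carrying tier, features, limits, and expiry. A minimal sketch of signing and offline verification with the JDK's built-in Ed25519 support (Java 15+), assuming a compact JWS whose header sets `alg` to `EdDSA` as registered in RFC 8037. The class name `LicenseTokens`, the hand-built JSON, and the regex-based `exp` check are illustrative shortcuts; a real implementation would use a JWT library and validate the header as well. The same primitive could back the agent bootstrap tokens.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compact JWS (header.payload.signature) signed with Ed25519.
final class LicenseTokens {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();
    // RFC 8037 registers "EdDSA" as the JOSE algorithm name for Ed25519 signatures.
    private static final String HEADER = "{\"alg\":\"EdDSA\",\"typ\":\"JWT\"}";
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

    private LicenseTokens() {}

    static String sign(String payloadJson, PrivateKey key) throws GeneralSecurityException {
        String signingInput = ENC.encodeToString(HEADER.getBytes(StandardCharsets.UTF_8))
                + "." + ENC.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(key);
        signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
        return signingInput + "." + ENC.encodeToString(signer.sign());
    }

    // Returns the payload JSON if the signature verifies and "exp" lies in the future.
    static String verify(String token, PublicKey key, long nowEpochSeconds) throws GeneralSecurityException {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            throw new SignatureException("malformed token");
        }
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(key);
        verifier.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
        if (!verifier.verify(DEC.decode(parts[2]))) {
            throw new SignatureException("bad signature");
        }
        String payload = new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
        // Crude claim extraction for the sketch; a JSON parser belongs here in real code.
        Matcher exp = EXP.matcher(payload);
        if (!exp.find() || Long.parseLong(exp.group(1)) <= nowEpochSeconds) {
            throw new SignatureException("expired or missing exp claim");
        }
        return payload;
    }
}
```

Offline validation then needs only the public key mounted at `cameleer.jwt.public-key-path`; loading the key files from disk is omitted here.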
### SKIP / DEFER
| Subsystem | Why Skip |
|---|---|
| **Secrets management (Vault)** | Docker: env vars + mounted files. K8s: K8s Secrets. Vault is enterprise-tier complexity. Defer until demanded. |
| **Custom role management UI** | Logto provides this. |
| **OIDC provider implementation** | Logto provides this. |
| **WireGuard VPN / VPC peering** | Far future, dedicated-tier only. |
| **Cluster API for dedicated tiers** | Don't design for this until enterprise customers exist. |
| **Management agent for updates** | Watchtower is optional for connected customers; air-gapped installs get release tarballs. Don't build a custom agent. |
## Architecture
### Platform Stack (Docker Compose — 6 base containers)
```
+-------------------------------------------------------+
| Traefik (reverse proxy, TLS, ForwardAuth)             |
|   - Docker: labels-based routing                      |
|   - K8s: IngressRoute CRDs                            |
+--------+---------------------+------------------------+
         |                     |
+--------v--------+  +---------v-----------+
| cameleer-saas   |  | cameleer-server     |
| (Spring Boot)   |  | (observability)     |
| Control plane   |  | Per-tenant instance |
+---+-------+-----+  +----------+----------+
    |       |                   |
+---v--+ +--v----+    +---------v---------+
|  PG  | | Logto |    |    ClickHouse     |
|      | | (IdP) |    | (traces/metrics)  |
+------+ +-------+    +-------------------+
```
Customer Camel apps are **additional containers** dynamically managed by the control plane via the Docker API (Docker mode) or the K8s API (K8s mode).
### Auth Flow
```
User login:
  Browser -> Traefik -> Logto (OIDC flow) -> JWT issued by Logto

API request:
  Browser -> Traefik -> ForwardAuth (cameleer-saas /auth/verify)
    -> Validates Logto JWT, injects X-Tenant-Id header
    -> Traefik forwards to upstream service

Machine auth (agent bootstrap):
  cameleer-agent -> cameleer-saas /api/agent/register
    -> Validates bootstrap token (Ed25519)
    -> Issues agent session token
    -> Agent connects to cameleer-server
```
Logto handles all user-facing identity. The cameleer-saas app handles machine-to-machine auth (agent tokens, license tokens) using Ed25519.
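
Traefik's ForwardAuth middleware calls the auth endpoint for every request: a 2xx response lets the original request through, with any headers listed in the middleware's `authResponseHeaders` copied onto it, while any other response is returned to the client. A minimal sketch of the decision behind `/auth/verify`, kept free of Spring so it runs standalone. The `ForwardAuthDecider` class and its `tokenToTenant` hook (which would wrap Logto JWT validation, for example against Logto's JWKS) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical core of cameleer-saas /auth/verify, independent of Spring MVC.
final class ForwardAuthDecider {
    // Status and headers returned to Traefik.
    record Decision(int status, Map<String, String> responseHeaders) {}

    // Assumed hook: validates a Logto access token and returns the tenant
    // (Logto organization) it is scoped to, or empty if the token is invalid.
    private final Function<String, Optional<String>> tokenToTenant;

    ForwardAuthDecider(Function<String, Optional<String>> tokenToTenant) {
        this.tokenToTenant = tokenToTenant;
    }

    Decision decide(String authorizationHeader) {
        if (authorizationHeader == null || !authorizationHeader.startsWith("Bearer ")) {
            return new Decision(401, Map.of());
        }
        String token = authorizationHeader.substring("Bearer ".length()).trim();
        return tokenToTenant.apply(token)
                // 2xx lets Traefik forward the request; X-Tenant-Id travels via authResponseHeaders.
                .map(tenant -> new Decision(200, Map.of("X-Tenant-Id", tenant)))
                .orElseGet(() -> new Decision(401, Map.of()));
    }
}
```

On the Traefik side, the middleware would be declared with labels such as `traefik.http.middlewares.<name>.forwardauth.address` and `traefik.http.middlewares.<name>.forwardauth.authResponseHeaders=X-Tenant-Id`.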
### Runtime Orchestration
```
RuntimeOrchestrator (interface)
  + deployApp(tenantId, appId, envId, imageRef, config) -> Deployment
  + stopApp(tenantId, appId, envId) -> void
  + restartApp(tenantId, appId, envId) -> void
  + getAppLogs(tenantId, appId, envId, since) -> Stream<LogLine>
  + getAppStatus(tenantId, appId, envId) -> AppStatus
  + listApps(tenantId) -> List<AppSummary>

DockerRuntimeOrchestrator (docker-java library)
  - Talks to the Docker daemon via /var/run/docker.sock
  - Creates containers with labels for Traefik routing
  - Manages container lifecycle
  - Builds images locally via docker build

KubernetesRuntimeOrchestrator (fabric8 kubernetes-client)
  - Creates Deployments, Services, ConfigMaps in the tenant namespace
  - Builds images via Kaniko Jobs and pushes them to the registry
  - Manages rollout lifecycle
```
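
The interface above, transcribed into Java as a sketch. The parameter types (`String` IDs, a `Map` for config, an `Instant` for `since`) and the shapes of `Deployment`, `AppStatus`, `AppSummary`, and `LogLine` are assumptions, since the design only names them. `InMemoryRuntimeOrchestrator` is a hypothetical, single-threaded test double that lets the API layer be exercised without Docker or K8s.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical value types; the design only names them.
enum AppStatus { RUNNING, STOPPED }
record Deployment(String tenantId, String appId, String envId, String imageRef) {}
record AppSummary(String appId, String envId, AppStatus status) {}
record LogLine(Instant timestamp, String message) {}

interface RuntimeOrchestrator {
    Deployment deployApp(String tenantId, String appId, String envId, String imageRef, Map<String, String> config);
    void stopApp(String tenantId, String appId, String envId);
    void restartApp(String tenantId, String appId, String envId);
    Stream<LogLine> getAppLogs(String tenantId, String appId, String envId, Instant since);
    AppStatus getAppStatus(String tenantId, String appId, String envId);
    List<AppSummary> listApps(String tenantId);
}

// Single-threaded test double; real implementations would call docker-java or fabric8.
final class InMemoryRuntimeOrchestrator implements RuntimeOrchestrator {
    private record Key(String tenantId, String appId, String envId) {}

    private final Map<Key, AppStatus> statuses = new HashMap<>();
    private final Map<Key, List<LogLine>> logs = new HashMap<>();

    @Override
    public Deployment deployApp(String tenantId, String appId, String envId, String imageRef, Map<String, String> config) {
        Key key = new Key(tenantId, appId, envId);
        statuses.put(key, AppStatus.RUNNING);
        logs.computeIfAbsent(key, k -> new ArrayList<>()).add(new LogLine(Instant.now(), "deployed " + imageRef));
        return new Deployment(tenantId, appId, envId, imageRef);
    }

    @Override
    public void stopApp(String tenantId, String appId, String envId) {
        statuses.replace(new Key(tenantId, appId, envId), AppStatus.STOPPED);
    }

    @Override
    public void restartApp(String tenantId, String appId, String envId) {
        statuses.replace(new Key(tenantId, appId, envId), AppStatus.RUNNING);
    }

    @Override
    public Stream<LogLine> getAppLogs(String tenantId, String appId, String envId, Instant since) {
        // Lines at or after 'since'.
        return logs.getOrDefault(new Key(tenantId, appId, envId), List.of()).stream()
                .filter(line -> !line.timestamp().isBefore(since));
    }

    @Override
    public AppStatus getAppStatus(String tenantId, String appId, String envId) {
        // Unknown apps are reported as stopped in this sketch.
        return statuses.getOrDefault(new Key(tenantId, appId, envId), AppStatus.STOPPED);
    }

    @Override
    public List<AppSummary> listApps(String tenantId) {
        return statuses.entrySet().stream()
                .filter(e -> e.getKey().tenantId().equals(tenantId))
                .map(e -> new AppSummary(e.getKey().appId(), e.getKey().envId(), e.getValue()))
                .toList();
    }
}
```

Because both real orchestrators would sit behind the same interface, the controller code and its tests do not need to know which deployment mode is active.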
### Image Build Pipeline
```
Customer uploads JAR
  -> Validation (file type, size, SHA-256, security scan)
  -> Templated Dockerfile generation:
       FROM eclipse-temurin:21-jre-alpine
       COPY cameleer-agent.jar /opt/agent/
       COPY customer-app.jar /opt/app/
       ENTRYPOINT ["java", "-javaagent:/opt/agent/cameleer-agent.jar", "-jar", "/opt/app/customer-app.jar"]
  -> Build:
       Docker mode: docker build via docker-java (local image cache)
       K8s mode:    Kaniko Job -> push to registry
  -> Deploy to requested environment
```
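
The validation and Dockerfile-generation steps above, sketched in Java under stated assumptions: `JarUploadPipeline` is a hypothetical name, the 200 MB limit is invented for illustration, the file-type check only looks for the ZIP local-file-header magic bytes (`50 4B 03 04`) that JAR files begin with, and the security scan is left out. The Dockerfile text is the template from this section, unchanged.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical first stages of the image build pipeline.
final class JarUploadPipeline {
    static final long MAX_BYTES = 200L * 1024 * 1024; // assumed limit, not part of the design
    static final String DOCKERFILE = String.join("\n",
            "FROM eclipse-temurin:21-jre-alpine",
            "COPY cameleer-agent.jar /opt/agent/",
            "COPY customer-app.jar /opt/app/",
            "ENTRYPOINT [\"java\", \"-javaagent:/opt/agent/cameleer-agent.jar\", \"-jar\", \"/opt/app/customer-app.jar\"]",
            "");

    record ValidatedJar(byte[] bytes, String sha256) {}

    private JarUploadPipeline() {}

    static ValidatedJar validate(byte[] upload) {
        if (upload.length == 0 || upload.length > MAX_BYTES) {
            throw new IllegalArgumentException("JAR size out of range: " + upload.length);
        }
        // A non-empty JAR is a ZIP archive starting with a local file header: "PK" 0x03 0x04.
        if (upload.length < 4 || upload[0] != 'P' || upload[1] != 'K' || upload[2] != 3 || upload[3] != 4) {
            throw new IllegalArgumentException("not a ZIP/JAR file");
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(upload);
            return new ValidatedJar(upload, HexFormat.of().formatHex(digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JDK", e);
        }
    }
}
```

The build step would then place the validated bytes in the build context as `customer-app.jar` next to `cameleer-agent.jar` and `DOCKERFILE`.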
### Multi-Tenancy Model
- **Always multi-tenant.** Docker Compose has 1 pre-configured tenant.
- **Schema-per-tenant** in PostgreSQL for platform data isolation.
- **Logto organizations** map 1:1 to tenants. Logto handles user-tenant membership.
- **ClickHouse** data partitioned by tenant_id.
- **cameleer-server** instances are per-tenant (separate containers/pods).
- **K8s bonus:** Namespace-per-tenant for network isolation, resource quotas.
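
Schema-per-tenant and namespace-per-tenant both need a name derived from the tenant ID, and both targets cap names at 63 characters (PostgreSQL's default identifier limit and the Kubernetes DNS-label limit for namespaces). A sketch assuming tenant IDs are lowercase slugs; the `tenant_` schema prefix and the `TenantNames` helper are hypothetical, while the namespace template is the `tenant-{tenantId}` value from the configuration section.

```java
import java.util.regex.Pattern;

// Hypothetical helper deriving per-tenant resource names from a tenant slug.
final class TenantNames {
    // Lowercase alphanumerics and inner hyphens: valid in a DNS label and easy to map to SQL.
    private static final Pattern SLUG = Pattern.compile("[a-z0-9]([a-z0-9-]*[a-z0-9])?");
    private static final int MAX_NAME = 63; // K8s DNS-label limit; also PostgreSQL's default identifier limit

    private TenantNames() {}

    static String namespace(String template, String tenantId) {
        String ns = template.replace("{tenantId}", requireSlug(tenantId));
        if (ns.length() > MAX_NAME) {
            throw new IllegalArgumentException("namespace too long: " + ns);
        }
        return ns;
    }

    static String schema(String tenantId) {
        // Hyphens are not allowed in unquoted PostgreSQL identifiers, so map them to underscores.
        String schema = "tenant_" + requireSlug(tenantId).replace('-', '_');
        if (schema.length() > MAX_NAME) {
            throw new IllegalArgumentException("schema name too long: " + schema);
        }
        return schema;
    }

    private static String requireSlug(String tenantId) {
        if (tenantId == null || !SLUG.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("invalid tenant id: " + tenantId);
        }
        return tenantId;
    }
}
```

Rejecting anything outside the slug pattern up front also keeps tenant IDs from ever reaching SQL or the K8s API unescaped.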
### Environment Model
Each tenant can have multiple logical environments (tier-dependent):
| Tier | Environments |
|---|---|
| Low | prod only |
| Mid | dev, prod |
| High+ | dev, staging, prod + custom |

Each environment is a separate deployment of the same app image with different configuration:
- Docker: separate container, different env vars
- K8s: separate Deployment, different ConfigMap

Promotion means deploying the same image tag to a different environment with that environment's config.
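
The tier table and the promotion rule, sketched as a small Java model. `Tier`, `AppDeployment`, and `Environments` are hypothetical names; `HIGH` stands for the High+ row, and custom environments are assumed to come from tenant configuration. Promotion copies the source image reference unchanged and swaps in the target environment's config.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the tier table; HIGH stands for the "High+" row.
enum Tier { LOW, MID, HIGH }

record AppDeployment(String appId, String env, String imageRef, Map<String, String> config) {}

final class Environments {
    private Environments() {}

    static Set<String> builtIn(Tier tier) {
        return switch (tier) {
            case LOW -> Set.of("prod");
            case MID -> Set.of("dev", "prod");
            case HIGH -> Set.of("dev", "staging", "prod");
        };
    }

    // Custom environments are only honored on the High+ tier.
    static boolean allows(Tier tier, String env, Set<String> customEnvs) {
        return builtIn(tier).contains(env) || (tier == Tier.HIGH && customEnvs.contains(env));
    }

    // Promotion: same image reference, target environment's config.
    static AppDeployment promote(AppDeployment source, String targetEnv, Map<String, String> targetConfig,
                                 Tier tier, Set<String> customEnvs) {
        if (!allows(tier, targetEnv, customEnvs)) {
            throw new IllegalArgumentException(targetEnv + " is not available on tier " + tier);
        }
        return new AppDeployment(source.appId(), targetEnv, source.imageRef(), Map.copyOf(targetConfig));
    }
}
```

Keeping the image reference immutable across promotion means the artifact tested in `dev` is byte-for-byte the one that reaches `prod`.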
### Configuration Strategy
The application is configured entirely via environment variables and Spring Boot profiles:
```yaml
# Detected at startup
cameleer.deployment.mode: docker | kubernetes # auto-detected
cameleer.deployment.docker.socket: /var/run/docker.sock
cameleer.deployment.k8s.namespace-template: tenant-{tenantId}
# Identity provider
cameleer.identity.issuer-uri: http://logto:3001/oidc
cameleer.identity.client-id: ${LOGTO_CLIENT_ID}
cameleer.identity.client-secret: ${LOGTO_CLIENT_SECRET}
# Ed25519 keys (externalized to mounted files, not regenerated on each boot)
cameleer.jwt.private-key-path: /etc/cameleer/keys/ed25519.key
cameleer.jwt.public-key-path: /etc/cameleer/keys/ed25519.pub
# Database
spring.datasource.url: ${DATABASE_URL}
# ClickHouse
cameleer.clickhouse.url: ${CLICKHOUSE_URL}
```
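
`cameleer.deployment.mode` is marked as auto-detected. One plausible detection order, sketched here as an assumption rather than the project's actual logic: an explicit setting wins; otherwise the `KUBERNETES_SERVICE_HOST` variable (injected into pod containers) or the default service-account token mount indicates Kubernetes, and the Docker socket indicates Docker. `DeploymentModeDetector` is hypothetical; the environment map and file check are injectable so the logic can be tested.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical startup logic behind cameleer.deployment.mode.
final class DeploymentModeDetector {
    enum Mode { DOCKER, KUBERNETES }

    // Default mount point of the pod's service-account token.
    static final Path K8S_TOKEN = Path.of("/var/run/secrets/kubernetes.io/serviceaccount/token");
    static final Path DOCKER_SOCKET = Path.of("/var/run/docker.sock");

    private DeploymentModeDetector() {}

    static Mode detect(String configured, Map<String, String> env, Predicate<Path> exists) {
        // An explicit setting always wins.
        if (configured != null && !configured.isBlank()) {
            return Mode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        }
        // Kubernetes injects KUBERNETES_SERVICE_HOST into pod containers.
        if (env.containsKey("KUBERNETES_SERVICE_HOST") || exists.test(K8S_TOKEN)) {
            return Mode.KUBERNETES;
        }
        if (exists.test(DOCKER_SOCKET)) {
            return Mode.DOCKER;
        }
        throw new IllegalStateException("cannot detect deployment mode; set cameleer.deployment.mode");
    }

    // Production entry point: real environment and file system.
    static Mode detect(String configured) {
        return detect(configured, System.getenv(), Files::exists);
    }
}
```

Checking Kubernetes first avoids misdetection if a Docker socket happens to be mounted into a pod.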
### Docker Compose Production Template
```yaml
services:
  traefik:
    image: traefik:v3
    ports: ["80:80", "443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/etc/traefik/traefik.yml
      - acme:/etc/traefik/acme
    # labels:  # dashboard (optional, secured)

  cameleer-saas:
    image: gitea.siegeln.net/cameleer/cameleer-saas:${VERSION}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # for runtime orchestration
      - ./keys:/etc/cameleer/keys:ro
    environment:
      - DATABASE_URL=jdbc:postgresql://postgres:5432/cameleer_saas
      - SPRING_DATASOURCE_USERNAME=postgres
      - SPRING_DATASOURCE_PASSWORD=${POSTGRES_PASSWORD}
      - LOGTO_CLIENT_ID=${LOGTO_CLIENT_ID}
      - LOGTO_CLIENT_SECRET=${LOGTO_CLIENT_SECRET}
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=PathPrefix(`/api`)

  logto:
    image: svhd/logto:latest
    environment:
      - DB_URL=postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/logto
    labels:
      - traefik.enable=true
      - traefik.http.routers.auth.rule=PathPrefix(`/auth`)

  cameleer-server:
    image: gitea.siegeln.net/cameleer/cameleer-server:${VERSION}
    environment:
      - CLICKHOUSE_URL=jdbc:clickhouse://clickhouse:8123/cameleer
    labels:
      - traefik.enable=true
      - traefik.http.routers.observe.rule=PathPrefix(`/observe`)

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}  # required by the image on first start
      - POSTGRES_DB=cameleer_saas
    volumes: [pgdata:/var/lib/postgresql/data]

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    volumes: [chdata:/var/lib/clickhouse]

volumes:
  pgdata:
  chdata:
  acme:
```
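
The template routes the fixed services by path; customer app containers are added at runtime by `DockerRuntimeOrchestrator` with Traefik labels. A sketch of the label set it might generate, assuming a host-based scheme (`<app>-<env>.<tenant>.<baseDomain>`) that this design does not specify. The `traefik.*` keys are standard Traefik Docker-provider labels; the `cameleer.*` keys and the `TraefikLabels` helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical label set for a dynamically created customer app container.
final class TraefikLabels {
    private TraefikLabels() {}

    static Map<String, String> forApp(String tenant, String app, String env, String baseDomain, int port) {
        String name = tenant + "-" + app + "-" + env;                    // router/service names must be unique
        String host = app + "-" + env + "." + tenant + "." + baseDomain; // assumed hostname scheme
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.routers." + name + ".rule", "Host(`" + host + "`)");
        labels.put("traefik.http.routers." + name + ".service", name);
        labels.put("traefik.http.services." + name + ".loadbalancer.server.port", Integer.toString(port));
        // Platform-owned labels (not read by Traefik) so the orchestrator can find its containers again.
        labels.put("cameleer.tenant", tenant);
        labels.put("cameleer.app", app);
        labels.put("cameleer.env", env);
        return labels;
    }
}
```

With docker-java, the map would be passed to `createContainerCmd(image).withLabels(labels)`; the container must also join a network Traefik can reach (the `traefik.docker.network` label selects one when several exist).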
### Docker vs K8s Feature Matrix
| Feature | Docker Compose | Kubernetes |
|---|---|---|
| Deploy Camel apps | Yes (Docker API) | Yes (K8s API) |
| Multiple environments | Yes (separate containers) | Yes (separate Deployments) |
| Agent injection | Yes | Yes |
| Observability (traces, topology) | Yes | Yes |
| Identity / SSO / Teams | Yes (Logto) | Yes (Logto) |
| Licensing | Yes | Yes |
| Auto-scaling | No | Yes (HPA) |
| Network isolation (multi-tenant) | Docker networks | NetworkPolicies |
| GitOps deployment | No (manual updates) | Yes (Flux CD) |
| Rolling updates | Manual restart | Native |
| Platform monitoring | Optional (customer adds Grafana) | Standard (Prometheus/Grafana/Loki) |
| Certificate management | Traefik ACME | cert-manager |
## Revised Phase Roadmap
### Phase 2: Tenants + Identity + Licensing
**Goal:** A customer can sign up, get a tenant, and access the platform via Traefik.
- Integrate Logto as identity provider
- Replace custom user-facing auth (login, registration, password management)
- Keep Ed25519 JWT for machine tokens (agent bootstrap, license signing)
- Configure Logto organizations to map to tenants
- Tenant entity + CRUD API
- License token generation (Ed25519-signed JWT: tier, features, limits, expiry)
- Traefik integration with ForwardAuth middleware
- Docker Compose production stack (6 containers)
- Externalize Ed25519 keys (mounted files instead of keys generated at each boot)

**Files to modify/create:**
- `src/main/java/net/siegeln/cameleer/saas/tenant/` — new package
- `src/main/java/net/siegeln/cameleer/saas/license/` — new package
- `src/main/java/net/siegeln/cameleer/saas/config/SecurityConfig.java` — Logto OIDC integration
- `src/main/resources/db/migration/V005__create_tenants.sql`
- `src/main/resources/db/migration/V006__create_licenses.sql`
- `docker-compose.yml` — expand to full production stack
- `traefik.yml` — static config
- `src/main/resources/application.yml` — Logto + Traefik config
### Phase 3: Runtime Orchestration + Environments
**Goal:** Customer can upload a Camel JAR, deploy it to dev/prod, see it running with agent attached.
- `RuntimeOrchestrator` interface
- `DockerRuntimeOrchestrator` implementation (docker-java)
- Customer JAR upload endpoint
- Image build pipeline (Dockerfile template + docker build)
- Logical environment model (dev/staging/prod per tenant, tier-dependent)
- Environment-specific config overlays
- App lifecycle API (deploy, start, stop, restart, logs, health)

**Key dependencies:** docker-java, Kaniko (for future K8s)
### Phase 4: Observability Pipeline
**Goal:** Customer can see traces, metrics, and route topology for deployed apps.
- Connect cameleer-server to customer app containers
- ClickHouse tenant-scoped data partitioning
- Observability API proxy (tenant-aware routing to cameleer-server)
- Basic topology graph endpoint
- Agent ↔ server connectivity verification
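
Partitioning ClickHouse data by `tenant_id` only isolates tenants if every query also filters on it. A hedged sketch of a query helper that always binds the authenticated tenant as the first parameter; the `TenantScopedQuery` name, the `traces` table, and the column names in the usage are hypothetical. ClickHouse row policies (`CREATE ROW POLICY`) are an alternative that enforces the same filter inside the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard: every trace query carries a tenant_id predicate bound from the authenticated tenant.
final class TenantScopedQuery {
    record Sql(String text, List<Object> params) {}

    private TenantScopedQuery() {}

    // 'table' and 'filter' come from server-side code only; user input goes into 'filterParams'.
    static Sql select(String tenantId, String table, String filter, List<Object> filterParams) {
        if (tenantId == null || tenantId.isBlank()) {
            throw new IllegalArgumentException("tenant required");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table).append(" WHERE tenant_id = ?");
        List<Object> params = new ArrayList<>();
        params.add(tenantId);
        if (filter != null && !filter.isBlank()) {
            // Parenthesize so an OR inside the filter cannot escape the tenant predicate.
            sql.append(" AND (").append(filter).append(")");
            params.addAll(filterParams);
        }
        return new Sql(sql.toString(), List.copyOf(params));
    }
}
```

The resulting text and parameters would be handed to a JDBC `PreparedStatement` on the ClickHouse driver, with the tenant taken from the `X-Tenant-Id` header set by ForwardAuth rather than from the request body.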
### Phase 5: K8s Operational Layer
**Goal:** Same product works on K8s with operational enhancements.
- `KubernetesRuntimeOrchestrator` implementation (fabric8)
- Kaniko-based image builds
- Flux CD integration for platform GitOps
- Namespace-per-tenant provisioning
- NetworkPolicies, ResourceQuotas
- Helm chart for K8s deployment
- Registry integration (Gitea registry / registry:2)
### Phase 6: Billing
**Goal:** Customers can subscribe and pay.
- Stripe Checkout integration
- Subscription lifecycle (create, upgrade, downgrade, cancel)
- Tier enforcement (feature gating based on active subscription)
- Usage tracking in platform DB (prep for Lago integration later)
- Webhook handling for payment events
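
Tier enforcement can read the tier, features, and limits claims from the verified license token. A minimal sketch of the check; the `Entitlements` record, the feature and limit names in the usage, and the rule that a missing limit means unlimited are all assumptions. In the Spring app, a failed `require` would typically map to HTTP 403.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical entitlement check built from the claims of a verified license token.
record Entitlements(String tier, Set<String> features, Map<String, Long> limits) {

    boolean hasFeature(String feature) {
        return features.contains(feature);
    }

    // A missing limit is treated as unlimited (assumption).
    boolean withinLimit(String limit, long currentUsage, long requested) {
        Long max = limits.get(limit);
        return max == null || currentUsage + requested <= max;
    }

    void require(String feature) {
        if (!hasFeature(feature)) {
            throw new IllegalStateException("feature '" + feature + "' is not included in tier " + tier);
        }
    }
}
```

Deriving entitlements from the signed token, rather than from a mutable DB row, keeps the same check working for offline Docker installs.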
### Phase 7: Security Hardening + Monitoring
**Goal:** Production-hardened platform.
- Prometheus/Grafana/Loki stack (optional Docker Compose overlay; standard on K8s)
- SOC 2 compliance review
- Rate limiting
- Container image signing (cosign)
- Supply chain security (SBOM, Trivy scanning)
- Audit log shipping to separate sink
### Frontend (React Shell) — Parallel Track (Phase 2+)
- Can start as soon as Phase 2 API contracts are defined
- Uses `@cameleer/design-system`
- Screens: login, dashboard, app deployment, environment management, observability views, team management, billing
## Verification Plan
### Phase 2 Verification
1. `docker compose up` starts all 6 containers
2. Navigate to Logto admin, create a user
3. User logs in via OIDC flow through Traefik
4. API calls carrying a Logto JWT reach upstream services with an `X-Tenant-Id` header injected by ForwardAuth
5. License token can be generated and verified
6. All existing tests still pass
### Phase 3 Verification
1. Upload a sample Camel JAR via API
2. Platform builds container image
3. Deploy to "dev" environment
4. Container starts with cameleer agent attached
5. App is reachable via Traefik routing
6. Logs are accessible via API
7. Deploy same image to "prod" with different config
### Phase 4 Verification
1. Running Camel app sends traces to cameleer-server
2. Traces visible in ClickHouse with correct tenant_id
3. Topology graph shows route structure
4. Different tenant cannot see another tenant's data
### Phase 5 Verification
1. Helm install deploys full platform to k3s
2. Tenant provisioning creates namespace + resources
3. App deployment creates K8s Deployment + Service
4. Kaniko builds image and pushes to registry
5. NetworkPolicy blocks cross-tenant traffic
6. Same API contracts work as Docker mode
### End-to-End Smoke Test (Any Phase)
```bash
# Docker Compose
docker compose up -d
# Create tenant + user via API/Logto
# Upload sample Camel JAR
# Deploy to environment
# Verify agent connects to cameleer-server
# Verify traces in ClickHouse
# Verify observability API returns data
```