Add SaaS platform PRD
Comprehensive product requirements document covering: - Four-tier structure (Low/Mid shared, High/Business dedicated) - Modular monolith architecture (Spring Boot + React) - Camel application runtime with agent auto-injection - Flux CD GitOps provisioning, build-once-deploy-often pipeline - Dual-mode license (SaaS API + air-gapped file) - SOC 2 day-1 compliance, zero-trust tenant isolation - Stripe billing (usage-based + committed resources) - Platform self-monitoring (Prometheus/Grafana/Loki) - Exchange Replay MOAT feature Gitea epics: cameleer/cameleer-saas #1-#13 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Cameleer SaaS Platform — Product Requirements Document

**Status:** Draft — Awaiting Review
**Date:** 2026-03-29
**Author:** Hendrik Siegeln + Claude (brainstorming session)
**Gitea Project:** cameleer/cameleer-saas
**Gitea Epics:** #1–#13

---

## 1. Product Definition

**Cameleer SaaS** is a Camel application runtime platform with built-in observability. Customers deploy Apache Camel applications and get zero-configuration tracing, topology mapping, payload lineage, distributed correlation, live debugging, and exchange replay — powered by the cameleer3 agent (auto-injected) and cameleer3-server (managed per tenant).

### Three Pillars

1. **Runtime** — Deploy and run Camel applications with automatic agent injection
2. **Observability** — Per-tenant cameleer3-server (traces, topology, lineage, correlation, debugger, replay)
3. **Management** — Auth, billing, teams, provisioning, secrets, environments

### Two Deployment Modes

- **SaaS (managed)** — Fully managed by the Cameleer platform
- **Self-hosted / Air-gapped** — Customer-operated, license-enforced feature parity with SaaS tiers

### Relationship to Existing Components

| Component | Role | Changes Required |
|-----------|------|------------------|
| cameleer3 (agent) | Zero-code Camel instrumentation, auto-injected into customer JARs | MOAT features (lineage, correlation, debugger, replay) |
| cameleer3-server | Per-tenant observability backend | Managed mode (trust SaaS JWT), license module, MOAT features |
| cameleer-saas (this repo) | SaaS management platform — control plane | New: everything in this document |
| design-system | Shared React component library | Used by both SaaS shell and server UI |

---

## 2. Tier Structure

### Tier Matrix

| Dimension | Low | Mid | High | Business |
|-----------|-----|-----|------|----------|
| **Infrastructure** | Shared cluster, shared PostgreSQL/OpenSearch | Shared cluster, shared PostgreSQL/OpenSearch | Dedicated cluster(s) | Dedicated cluster(s) |
| **Pricing** | Base fee + usage (data volume, CPU, RAM) | Base fee + usage (data volume, CPU, RAM) | Committed resources | Committed resources |
| **Environments** | 1 (prod) | 2 (dev, prod) | Unlimited | Unlimited |
| **Agents** | Limited | Higher limit | Unlimited | Unlimited |
| **Data Retention** | 7 days | 30 days | 90 days | Custom |
| **Topology Graph** | Yes | Yes | Yes | Yes |
| **Payload Lineage** | Limited (route-scope only, max 10 captures/min) | Full | Full | Full |
| **Cross-Service Correlation** | No | Yes | Yes | Yes |
| **Live Route Debugger** | No | No | Yes | Yes |
| **Exchange Replay** | No | No | Yes | Yes |
| **SSO / OIDC** | No | No | Yes | Yes |
| **Custom Roles** | No | No | Yes | Yes |
| **Team Management** | Basic | Basic | Full | Full |
| **Secrets** | Platform-native | Platform-native + 1 vault | Unlimited vaults | Unlimited vaults |
| **Support** | Docs | Email | Priority | Dedicated CSM |
| **SLA** | Best effort | 99.5% | 99.9% | 99.95%+ (custom) |
| **VPN (future)** | No | No | Yes | Yes |

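The tier matrix is easiest to keep consistent if a single definition feeds license generation, quota checks, and feature gates. The following is a minimal Java sketch of that idea with hypothetical names (`Feature`, `Tier`). It uses `-1` for "unlimited" (or, for Business retention, contract-defined) and omits agent limits because the matrix does not give numbers yet:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature flags mirroring the tier matrix rows.
enum Feature { TOPOLOGY, LINEAGE_LIMITED, LINEAGE_FULL, CORRELATION, DEBUGGER, REPLAY, SSO_OIDC, CUSTOM_ROLES }

// One place that encodes the tier matrix; -1 means unlimited (or contract-defined for retention).
enum Tier {
    LOW(1, 7, 0, EnumSet.of(Feature.TOPOLOGY, Feature.LINEAGE_LIMITED)),
    MID(2, 30, 1, EnumSet.of(Feature.TOPOLOGY, Feature.LINEAGE_FULL, Feature.CORRELATION)),
    HIGH(-1, 90, -1, EnumSet.complementOf(EnumSet.of(Feature.LINEAGE_LIMITED))),
    BUSINESS(-1, -1, -1, EnumSet.complementOf(EnumSet.of(Feature.LINEAGE_LIMITED)));

    final int maxEnvironments;   // Low: prod only; Mid: dev + prod
    final int retentionDays;     // Business: set per contract
    final int maxExternalVaults; // Low: platform-native secrets only
    private final Set<Feature> features;

    Tier(int maxEnvironments, int retentionDays, int maxExternalVaults, Set<Feature> features) {
        this.maxEnvironments = maxEnvironments;
        this.retentionDays = retentionDays;
        this.maxExternalVaults = maxExternalVaults;
        this.features = Set.copyOf(features); // immutable copy
    }

    boolean has(Feature feature) { return features.contains(feature); }

    boolean allowsEnvironments(int count) { return maxEnvironments < 0 || count <= maxEnvironments; }

    boolean isDedicated() { return this == HIGH || this == BUSINESS; }
}
```

Because Business retention is contract-defined, a real license would carry the negotiated value instead of this placeholder.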

### Pricing Models

**Usage-based (Low/Mid):**
- Optional small monthly base fee
- Metered dimensions: data volume (GB ingested), CPU (core·hours), RAM (GB·hours)
- Stripe metered subscriptions with periodic usage reporting

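For the Low/Mid tiers, the live cost estimate reduces to the base fee plus each metered quantity multiplied by its unit price. The sketch below uses hypothetical types and made-up rates (real prices would come from the Stripe price objects). It uses `BigDecimal` so that monetary amounts never pass through floating point:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Metered quantities for one tenant and billing period (hypothetical shape).
record UsageSnapshot(BigDecimal gbIngested, BigDecimal coreHours, BigDecimal ramGbHours) {}

// Hypothetical unit prices; a zero base fee models the "optional" case.
record PriceSheet(BigDecimal baseFee, BigDecimal perGb, BigDecimal perCoreHour, BigDecimal perRamGbHour) {

    /** Base fee plus each metered dimension times its unit price, rounded to cents. */
    BigDecimal estimate(UsageSnapshot usage) {
        return baseFee
                .add(usage.gbIngested().multiply(perGb))
                .add(usage.coreHours().multiply(perCoreHour))
                .add(usage.ramGbHours().multiply(perRamGbHour))
                .setScale(2, RoundingMode.HALF_UP);
    }
}
```

For example, a 10.00 base fee with 50 GB at 0.10, 720 core·hours at 0.04, and 1,440 GB·hours at 0.005 gives 10.00 + 5.00 + 28.80 + 7.20 = 51.00.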

**Committed resources (High/Business):**
- Fixed pricing based on reserved cluster capacity (CPU cores, RAM, storage, node count)
- Annual or multi-year contracts
- Overage alerts (upsell, not automatic billing)

---

## 3. System Architecture

### Approach: Modular Monolith Control Plane

Single Spring Boot application with well-bounded internal modules. K8s ingress handles tenant routing. Flux CD handles infrastructure reconciliation.

```
[Browser] → [Ingress (Traefik/Envoy)] → [SaaS Platform (modular Spring Boot)]
                        ↓ (tenant routes)                 ↓ (provisioning)
            [Tenant cameleer3-server]              [Flux CD → K8s]
```

### Component Map

```
                  ┌─────────────────────────────────────────┐
                  │         Ingress (Traefik/Envoy)         │
                  │     TLS termination, tenant routing     │
                  └──────┬──────────────┬──────────────┬────┘
                         │              │              │
              ┌──────────▼──────────┐   │    ┌─────────▼─────────┐
              │  SaaS Management    │   │    │ Grafana/Prometheus│
              │  Platform           │   │    │ (self-monitoring) │
              │  (Spring Boot)      │   │    └───────────────────┘
              │                     │   │
              │  Modules:           │   │
              │  ├─ Auth            │   │
              │  ├─ Billing         │   │
              │  ├─ Provisioning    │   │
              │  ├─ Runtime         │   │
              │  ├─ License         │   │
              │  ├─ Secrets         │   │
              │  └─ Audit           │   │
              └──┬───┬──────┬───────┘   │
                 │   │      │           │
       ┌─────────┘   │      └───┐       │
       ▼             ▼          ▼       ▼
┌──────────────┐ ┌────────┐ ┌───────┐  ┌────────────────────────────┐
│ Platform DB  │ │ Stripe │ │Flux CD│  │ Shared K8s Cluster         │
│ (PostgreSQL) │ │  API   │ └───┬───┘  │                            │
│ - tenants    │ └────────┘     │      │ ┌────────────────────────┐ │
│ - users      │           ┌────▼────┐ │ │ tenant-a namespace     │ │
│ - teams      │           │ GitOps  │ │ │ ├─ cameleer3-server    │ │
│ - audit log  │           │  Repo   │ │ │ ├─ camel-app-1         │ │
│ - licenses   │           │(HelmRel)│ │ │ ├─ camel-app-2         │ │
└──────────────┘           └─────────┘ │ │ └─ NetworkPolicies     │ │
                                       │ └────────────────────────┘ │
                                       │ ┌────────────────────────┐ │
                                       │ │ tenant-b namespace     │ │
                                       │ │ └─ ...                 │ │
                                       │ └────────────────────────┘ │
                                       │                            │
                                       │ Shared:                    │
                                       │ ├─ PostgreSQL              │
                                       │ │  (tenant schemas)        │
                                       │ ├─ OpenSearch              │
                                       │ │  (tenant indices)        │
                                       │ └─ Container Registry      │
                                       └────────────────────────────┘
```

### Dedicated Tier (High/Business)

The same management platform routes to the customer's dedicated cluster(s). PostgreSQL, OpenSearch, and the container registry run as dedicated instances inside the customer's cluster. Provisioning is semi-manual at launch (Flux bootstrap); full Cluster API automation is deferred.

### Tech Stack

| Component | Technology |
|-----------|------------|
| Management Platform backend | Spring Boot 3, Java 21 |
| Management Platform frontend | React, @cameleer/design-system |
| Platform database | PostgreSQL |
| Tenant observability | cameleer3-server (Spring Boot), PostgreSQL, OpenSearch |
| GitOps | Flux CD |
| K8s distribution | Talos (production), k3s (dev) |
| Ingress | Traefik or Envoy |
| Billing | Stripe (Subscriptions + Usage Records API) |
| Auth | Spring Security OAuth2, Ed25519 JWT |
| Secrets sync | K8s External Secrets Operator |
| Container registry | Platform-managed (Harbor or Gitea Container Registry) |
| Monitoring | Prometheus, Grafana, Loki, Alertmanager |
| Image signing | cosign/sigstore |
| Image scanning | Trivy |

### Key Architectural Decisions

1. **Modular monolith** — Single Spring Boot app with clean module boundaries. Extractable later if needed.
2. **K8s ingress handles routing** — Tenant routing via path or subdomain. No custom API gateway.
3. **Flux CD for reconciliation** — HelmRelease CRs per tenant. Drift detection, self-healing. K8s-distribution-agnostic.
4. **Platform DB separate from tenant data** — Management platform has its own PostgreSQL. Tenant observability data lives in separate shared (or dedicated) instances.
5. **Immutable artifact pipeline** — JAR upload → container image → promote through environments. Same binary everywhere.
6. **Dual-mode auth** — SaaS mode: platform is the IdP. Air-gapped mode: server uses standalone auth with a local license file.
7. **SOC 2 baked in** — Not bolted on. Audit logging, encryption, image signing, SBOM from day 1.
8. **Self-monitoring** — Prometheus + Grafana stack, completely separate from tenant observability.

---

## 4. Data Architecture

### Platform Database (Management Platform)

Stores all SaaS control plane data — completely separate from tenant observability data.

| Table/Domain | Purpose |
|---|---|
| `tenants` | Tenant record: ID, name, tier, status, Stripe customer ID, created_at |
| `users` | Platform users: email, password hash, MFA, status |
| `tenant_members` | User-to-tenant mapping with role |
| `teams` | Team groupings within a tenant |
| `roles` / `permissions` | RBAC definitions (predefined + custom for high/business) |
| `licenses` | License records: tenant, tier, feature flags, limits, expiry, signing key |
| `audit_log` | Immutable append-only log: actor, action, resource, timestamp, IP, tenant |
| `applications` | Deployed Camel app metadata: name, tenant, version, image ref, status |
| `secrets_metadata` | Secret references (actual values in K8s Secrets or external vault) |
| `vault_configs` | External vault connection configs per tenant |
| `provisioning_events` | Tenant provisioning pipeline state and history |
| `billing_usage` | Aggregated usage snapshots before Stripe reporting |

### Tenant Data (Shared PostgreSQL)

Each tenant's cameleer3-server uses its own PostgreSQL schema on the shared instance (dedicated instance for high/business). This is the existing cameleer3-server data model — unchanged:

- Route executions, processor traces, metrics
- Route graph topology
- Agent registrations, config history
- Lineage captures, correlation traces, debug sessions

### Tenant Data (Shared OpenSearch)

- `{tenant_id}-executions-*` — time-series execution data
- `{tenant_id}-traces-*` — processor-level traces
- Full index-level isolation with index templates per tenant

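The naming scheme is simple. However, the tenant ID is interpolated into index patterns and role ACLs, so it has to be constrained. The following sketch assumes a daily `yyyy.MM.dd` suffix behind the `*` and a hypothetical tenant-ID format. Hyphens are excluded from tenant IDs because `-` is the separator. Otherwise a tenant named `acme-executions` would create indices that tenant `acme`'s `acme-executions-*` pattern also matches:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Pattern;

final class TenantIndices {
    enum Kind { EXECUTIONS, TRACES }

    // Assumed tenant-ID format: lowercase letters and digits only. No '-', '*' or ',' means one
    // tenant's wildcard pattern can never match another tenant's indices.
    private static final Pattern TENANT_ID = Pattern.compile("[a-z0-9]{1,48}");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    private TenantIndices() {}

    private static String prefix(String tenantId, Kind kind) {
        if (!TENANT_ID.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("invalid tenant id: " + tenantId);
        }
        return tenantId + "-" + kind.name().toLowerCase(Locale.ROOT) + "-";
    }

    /** Concrete daily index for writes, e.g. acme-executions-2026.03.29. */
    static String writeIndex(String tenantId, Kind kind, LocalDate day) {
        return prefix(tenantId, kind) + day.format(DAY);
    }

    /** Wildcard pattern for searches, the tenant's index template, and its OpenSearch role. */
    static String pattern(String tenantId, Kind kind) {
        return prefix(tenantId, kind) + "*";
    }
}
```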

### Self-Monitoring Data

Completely separate: Prometheus TSDB for metrics, Loki for logs.

---

## 5. Identity & Access Management

### Architecture

The SaaS management platform is the single identity plane. It owns authentication and authorization. Per-tenant cameleer3-server instances trust SaaS-issued tokens.

- Spring Security OAuth2 for OIDC federation with customer IdPs
- Ed25519 JWT signing (consistent with existing cameleer3-server pattern)
- Tokens carry: tenant ID, user ID, roles, feature entitlements
- cameleer3-server validates SaaS-issued JWTs in managed mode
- Standalone mode retains its own auth for air-gapped deployments

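The following is a minimal sketch of issuing and verifying an Ed25519-signed JWT with only the JDK (the `Ed25519` algorithm names for `KeyPairGenerator` and `Signature` are available since Java 15). The JWS header uses `EdDSA`, the `alg` value RFC 8037 registers for Ed25519. To stay dependency-free, claims are passed as a pre-serialized JSON string. Claim names such as `tid` or `roles` would be assumptions. A real service would use a JOSE library and also check `exp`, issuer, and audience:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.Base64;

final class Ed25519Jwt {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();
    // RFC 8037: "EdDSA" is the JWS "alg" value for Ed25519 signatures.
    private static final String HEADER = "{\"alg\":\"EdDSA\",\"typ\":\"JWT\"}";

    private Ed25519Jwt() {}

    static KeyPair newKeyPair() throws GeneralSecurityException {
        return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
    }

    /** Signs base64url(header) + "." + base64url(claims) and appends the base64url signature. */
    static String sign(String claimsJson, PrivateKey key) throws GeneralSecurityException {
        String signingInput = ENC.encodeToString(HEADER.getBytes(StandardCharsets.UTF_8))
                + "." + ENC.encodeToString(claimsJson.getBytes(StandardCharsets.UTF_8));
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(key);
        signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
        return signingInput + "." + ENC.encodeToString(signer.sign());
    }

    /** Returns the claims JSON if the signature verifies. The algorithm is fixed here, never read from the token. */
    static String verify(String jwt, PublicKey key) throws GeneralSecurityException {
        String[] parts = jwt.split("\\.", -1);
        if (parts.length != 3) throw new SignatureException("malformed JWT");
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(key);
        verifier.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
        if (!verifier.verify(DEC.decode(parts[2]))) throw new SignatureException("invalid signature");
        return new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
    }
}
```

The verifier pins the algorithm server-side instead of trusting the token's `alg` header, so a forged header cannot switch it to a weaker scheme.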

### RBAC Model

| Role | Capabilities |
|------|-------------|
| Owner | Full tenant admin, billing, team management, delete tenant |
| Admin | Manage apps, secrets, team members, environments. No billing. |
| Developer | Deploy apps, view traces, use debugger/replay. No team management. |
| Viewer | Read-only access to dashboards, traces, topology |

High/Business tiers: custom roles with granular permissions (e.g., "can replay in dev but not prod").

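The predefined roles nest, each one adding to the role below it. The custom-role example ("can replay in dev but not prod") needs permissions scoped per environment. A sketch with hypothetical permission names:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical permission names derived from the RBAC table.
enum Permission {
    VIEW, DEPLOY, DEBUG, REPLAY,
    MANAGE_APPS, MANAGE_SECRETS, MANAGE_TEAM, MANAGE_ENVIRONMENTS,
    BILLING, DELETE_TENANT
}

// Predefined roles: each one extends the previous role's permissions.
final class Roles {
    private Roles() {}

    static final Set<Permission> VIEWER = Set.of(Permission.VIEW);
    static final Set<Permission> DEVELOPER =
            plus(VIEWER, Permission.DEPLOY, Permission.DEBUG, Permission.REPLAY);
    static final Set<Permission> ADMIN = plus(DEVELOPER,
            Permission.MANAGE_APPS, Permission.MANAGE_SECRETS, Permission.MANAGE_TEAM, Permission.MANAGE_ENVIRONMENTS);
    static final Set<Permission> OWNER = Set.copyOf(EnumSet.allOf(Permission.class));

    private static Set<Permission> plus(Set<Permission> base, Permission... extra) {
        EnumSet<Permission> result = EnumSet.copyOf(base);
        result.addAll(Set.of(extra));
        return Set.copyOf(result); // immutable
    }
}

/** High/Business custom role: permissions granted per environment. */
record CustomRole(String name, Map<String, Set<Permission>> byEnvironment) {
    boolean allows(Permission permission, String environment) {
        return byEnvironment.getOrDefault(environment, Set.of()).contains(permission);
    }
}
```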

### Team Management

- Invite by email
- Role assignment per user
- Basic (low/mid): single team, predefined roles
- Full (high/business): multiple teams, custom roles, team-scoped permissions

---

## 6. Tenant Provisioning

### Shared Tier Flow (Low/Mid)

```
Customer signs up + payment
  → Create tenant record + Stripe customer/subscription
  → Generate signed license token (Ed25519)
  → Create Flux HelmRelease CR
  → Flux reconciles: namespace, ResourceQuota, NetworkPolicies, cameleer3-server
  → Provision PostgreSQL schema + per-tenant credentials
  → Provision OpenSearch index template + per-tenant credentials
  → Readiness check: server healthy, DB migrated, auth working
  → Generate bootstrap tokens, present onboarding instructions
  → Tenant status → ACTIVE
```

**Target: < 5 minutes from payment to active environment.**

### Dedicated Tier Flow (High/Business)

Semi-manual at launch:
1. Customer signs committed resource agreement
2. Operator provisions dedicated cluster (Talos)
3. Flux bootstrap deploys full stack
4. Management platform configured to route to dedicated cluster
5. From this point, automated (same lifecycle management as shared)

Full Cluster API automation is deferred to a future release.

### Lifecycle Operations

| Operation | Mechanism |
|-----------|-----------|
| Suspension (non-payment) | Scale tenant workloads to 0, license set to suspended |
| Reactivation | Scale back up, license reactivated |
| Deletion | Remove namespace, drop PostgreSQL schema, delete OpenSearch indices, scrub audit log references. GDPR compliant. |
| Tier upgrade (shared → dedicated) | Provision dedicated cluster, migrate data, update routing. Downtime window coordinated with the customer. |
| Tier downgrade | Reverse of upgrade. Data retention limits applied. |

### Failure Handling

- Each provisioning step is idempotent and retryable
- State machine in platform DB tracks progress per step
- Failed provisioning → alert ops + notify customer with ETA
- Partial provisioning cleanup on permanent failure

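The failure-handling rules (idempotent, retryable steps tracked by a state machine) suggest a runner that persists per-step state and, when rerun, resumes at the first step that is not done. The following sketch makes three assumptions: step names that loosely mirror the shared-tier flow, a fixed retry budget, and an in-memory map standing in for `provisioning_events`:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

// Step names loosely mirror the shared-tier flow; declaration order is execution order.
enum ProvisioningStep {
    CREATE_TENANT_AND_STRIPE, ISSUE_LICENSE, CREATE_HELM_RELEASE, AWAIT_FLUX_RECONCILE,
    PROVISION_PG_SCHEMA, PROVISION_OPENSEARCH, READINESS_CHECK, ISSUE_BOOTSTRAP_TOKENS, ACTIVATE
}

enum StepState { PENDING, DONE, FAILED }

final class ProvisioningRunner {
    // In-memory stand-in for per-step state persisted in provisioning_events.
    private final Map<ProvisioningStep, StepState> state = new EnumMap<>(ProvisioningStep.class);
    private final int maxAttempts;

    ProvisioningRunner(int maxAttempts) {
        this.maxAttempts = maxAttempts;
        for (ProvisioningStep step : ProvisioningStep.values()) state.put(step, StepState.PENDING);
    }

    /**
     * Executes every step that is not DONE, in order. Steps must be idempotent, so re-running one
     * that partially succeeded is safe. Returns the step that exhausted its retries, if any.
     */
    Optional<ProvisioningStep> run(Consumer<ProvisioningStep> executor) {
        for (ProvisioningStep step : ProvisioningStep.values()) {
            if (state.get(step) == StepState.DONE) continue; // resume after earlier progress
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    executor.accept(step);
                    ok = true;
                } catch (RuntimeException e) {
                    // a real runner would record the attempt and back off before retrying
                }
            }
            state.put(step, ok ? StepState.DONE : StepState.FAILED);
            if (!ok) return Optional.of(step); // caller alerts ops, notifies the customer, plans cleanup
        }
        return Optional.empty();
    }

    StepState stateOf(ProvisioningStep step) { return state.get(step); }
}
```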

---

## 7. Camel Application Runtime

### JAR Upload → Immutable Image

1. **Validation** — File type check, size limit per tier, SHA-256 checksum, Trivy security scan, secret detection (reject JARs with embedded credentials)
2. **Image Build** — Templated Dockerfile: distroless JRE base + customer JAR + cameleer3-agent.jar + `-javaagent` flag + agent pre-configured for tenant server. Image tagged: `registry/{tenant}/{app}:v{N}-{sha256short}`. Signed with cosign. SBOM attached.
3. **Registry Push** — Per-tenant repository in platform container registry
4. **Deploy** — K8s Deployment in tenant namespace with resource limits, secrets mounted, config injected, NetworkPolicy applied, liveness/readiness probes

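The checksum from step 1 can double as the `{sha256short}` part of the image tag. This sketch streams the uploaded JAR through SHA-256 and derives the reference. It assumes a 12-character short digest and a caller-supplied registry host:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ImageTags {
    private static final int SHORT_DIGEST_CHARS = 12; // assumed length of {sha256short}

    private ImageTags() {}

    /** Streams the uploaded JAR through SHA-256 (consuming and closing it) and returns the hex digest. */
    static String sha256Hex(InputStream jar) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
        try (DigestInputStream in = new DigestInputStream(jar, md)) {
            in.transferTo(OutputStream.nullOutputStream()); // digest is updated as bytes are read
        }
        return HexFormat.of().formatHex(md.digest());
    }

    /** Builds registry/{tenant}/{app}:v{N}-{sha256short}. */
    static String imageRef(String registry, String tenant, String app, int version, String sha256Hex) {
        if (version < 1) throw new IllegalArgumentException("version must be >= 1");
        return registry + "/" + tenant + "/" + app + ":v" + version + "-" + sha256Hex.substring(0, SHORT_DIGEST_CHARS);
    }
}
```

Container repository path components must be lowercase, so tenant and app names would need normalizing into slugs before they reach `imageRef`.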

### Environment Promotion

```
dev → staging → prod
(same image tag, different config + secrets per environment)
```

- Promotion = deploy existing image tag to target environment (no rebuild)
- Rollback = redeploy previous image tag
- Every promotion is audit-logged (who, what, from, to)

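Promotion and rollback only ever move existing tags between environments. The sketch below covers that bookkeeping with hypothetical names. The environment order would come from the tenant's tier (for example `dev -> prod` on Mid). Each promotion returns the who/what/from/to record that the audit log needs:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Promotions {
    /** What the audit log records for each promotion: who, what, from, to. */
    record AuditEntry(String actor, String imageTag, String from, String to) {}

    private final List<String> order;                                   // e.g. dev -> staging -> prod
    private final Map<String, Deque<String>> history = new HashMap<>(); // newest tag first

    Promotions(List<String> order) { this.order = List.copyOf(order); }

    void deploy(String env, String imageTag) {
        history.computeIfAbsent(env, e -> new ArrayDeque<>()).push(imageTag);
    }

    String current(String env) {
        Deque<String> tags = history.get(env);
        return tags == null ? null : tags.peek();
    }

    /** Deploys the exact tag running in {@code from} to the next environment; nothing is rebuilt. */
    AuditEntry promote(String actor, String from) {
        int i = order.indexOf(from);
        if (i < 0 || i == order.size() - 1) throw new IllegalArgumentException("no environment after " + from);
        String tag = current(from);
        if (tag == null) throw new IllegalStateException("nothing deployed in " + from);
        String to = order.get(i + 1);
        deploy(to, tag);
        return new AuditEntry(actor, tag, from, to);
    }

    /** Drops the current tag and returns the previous one, which is what gets redeployed. */
    String rollback(String env) {
        Deque<String> tags = history.get(env);
        if (tags == null || tags.size() < 2) throw new IllegalStateException("no previous version in " + env);
        tags.pop();
        return tags.peek();
    }
}
```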

### Environment Model

| Tier | Default Environments | Custom Environments |
|------|---------------------|-------------------|
| Low | prod | No |
| Mid | dev, prod | No |
| High | dev, staging, prod | Unlimited |
| Business | dev, staging, prod | Unlimited |

### Application Deployment Page

Central UI for managing each deployed application:

- **Deploy** — Upload JAR, view build status, deploy to environment, promote, rollback
- **Configuration** — Environment variables, JVM options, agent config overrides, application properties. Per-environment. Changes trigger a rolling restart.
- **Secrets** — Create/edit platform-managed secrets. Link external vault secrets. Scoped per environment. Masked in the UI; revealing a value is audit-logged.
- **Status** — Pod health, resource usage, agent connection status, recent events
- **Logs** — Live stdout/stderr stream
- **Versions** — Image history, promotion history, rollback targets

### Application Lifecycle

| Action | Mechanism |
|--------|-----------|
| Deploy | Upload JAR → build image → deploy to environment |
| Promote | Redeploy same image tag to next environment |
| Rollback | Redeploy previous image tag |
| Scale | Update replica count |
| Stop | Scale to 0 (preserves config) |
| Delete | Remove Deployment + clean registry images per retention |
| Logs | Stream via K8s log API |

---

## 8. Observability Integration

### Architecture

Each tenant gets a dedicated cameleer3-server instance:
- Shared tiers: deployed in tenant's namespace
- Dedicated tiers: deployed in tenant's cluster

The K8s ingress (not a separate API gateway) routes `/t/{tenant}/api/*` to the correct server instance. The server's React UI is embedded in the SaaS shell: the shell provides navigation, the tenant switcher, and billing pages, and the product UI renders inside it.

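In practice this mapping lives in ingress path rules. The sketch below shows the same mapping as plain logic that a route generator or a routing test could share. Three things are assumptions: the tenant-slug format, the `tenant-{slug}` namespace naming, and the `cameleer3-server` Service on port 8080:

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TenantRouter {
    // /t/{tenant}/api and anything below it; the tenant-slug format is an assumption.
    private static final Pattern ROUTE = Pattern.compile("^/t/([a-z0-9]{1,48})/api(/.*)?$");

    private TenantRouter() {}

    /** Maps a public path to the tenant's in-cluster cameleer3-server URI, or empty if it is not a tenant route. */
    static Optional<URI> upstream(String path) {
        Matcher m = ROUTE.matcher(path);
        if (!m.matches()) return Optional.empty();
        String tenant = m.group(1);
        String rest = m.group(2) == null ? "/" : m.group(2);
        // Assumed naming: namespace "tenant-{slug}", Service "cameleer3-server", port 8080 (scheme also assumed).
        return Optional.of(URI.create(
                "http://cameleer3-server.tenant-" + tenant + ".svc.cluster.local:8080/api" + rest));
    }
}
```

The tenant segment only selects the upstream. Authorization still has to come from the server validating the SaaS-issued JWT, including a check that the token's tenant matches the instance the request reached.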

### Agent Connection

- Agent bootstrap tokens generated by the SaaS platform
- Agents connect directly to their tenant's cameleer3-server instance
- Agent auto-injected into customer Camel apps deployed on the platform
- External agents (customer-hosted Camel apps) can also connect using bootstrap tokens

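The PRD does not say how bootstrap tokens are stored. One common approach, assumed here, works as follows. The platform generates a high-entropy random token and shows it to the user once. It persists only the token's SHA-256 hash, so a leaked platform database does not yield usable tokens. The `cmb_{tenant}_` prefix is hypothetical; its only purpose is to let the server find the right tenant before it compares hashes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HexFormat;

final class BootstrapTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Plaintext is shown once to the user; only the hash is stored. */
    record Issued(String plaintext, String sha256Hex) {}

    private BootstrapTokens() {}

    static Issued issue(String tenantId) {
        byte[] raw = new byte[32]; // 256 bits of entropy
        RANDOM.nextBytes(raw);
        String token = "cmb_" + tenantId + "_" + Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        return new Issued(token, hash(token));
    }

    /** Hashes the presented token and compares it with the stored hash in constant time. */
    static boolean matches(String presented, String storedSha256Hex) {
        return MessageDigest.isEqual(
                hash(presented).getBytes(StandardCharsets.US_ASCII),
                storedSha256Hex.getBytes(StandardCharsets.US_ASCII));
    }

    static String hash(String token) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(token.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }
}
```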

### MOAT Features (gated by license)

| Feature | Description | Tier Availability |
|---------|-------------|-------------------|
| **Topology Graph** | Route dependency visualization from existing execution data | All tiers |
| **Payload Flow Lineage** | Per-processor before/after capture + format-aware diff | Limited on Low (route-scope only, max 10 captures/min), Full on Mid+ |
| **Cross-Service Correlation** | Distributed trace assembly + service dependency graph | Mid+ |
| **Live Route Debugger** | Browser-based route stepping with breakpoints | High+ |
| **Exchange Replay** | Re-execute recorded exchange with modified payload, fully audited | High+ |

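The Low-tier lineage cap ("max 10 captures/min") can be enforced with a token bucket that holds up to 10 captures and refills at 10 per minute. That choice is ours; a fixed one-minute window would also work. The injectable clock is there to keep the sketch testable. Whether the limiter runs in the agent or the server is left open:

```java
import java.util.function.LongSupplier;

/** Token bucket: holds up to N captures and refills continuously at N per minute. */
final class CaptureLimiter {
    private static final double NANOS_PER_MINUTE = 60_000_000_000.0;

    private final int capacity;
    private final LongSupplier nanoClock; // e.g. System::nanoTime in production
    private double tokens;
    private long lastRefill;

    CaptureLimiter(int capturesPerMinute, LongSupplier nanoClock) {
        this.capacity = capturesPerMinute;
        this.nanoClock = nanoClock;
        this.tokens = capturesPerMinute; // start full so an initial burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    /** True if a lineage capture may be taken now; false means skip this capture. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * (double) capacity / NANOS_PER_MINUTE);
        lastRefill = now;
        if (tokens < 1.0) return false;
        tokens -= 1.0;
        return true;
    }
}
```

On Mid and above the limiter would simply not be applied, since lineage is unrestricted there.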

### Server Configuration

- SaaS platform pushes tier-specific config: feature flags, retention limits, resource limits
- Server runs in "managed mode": trusts SaaS-issued JWTs, reports metrics back to platform
- Air-gapped mode: standalone with local license file

---

## 9. Secrets Management

### Day 1 Requirements

- **Platform-native secret store** — Stored as K8s Secrets with encryption at rest enabled; secret manifests kept in Git are encrypted with sealed-secrets or SOPS
- **External vault integration** — HashiCorp Vault at launch. AWS Secrets Manager, Azure Key Vault, GCP Secret Manager deferred to future release.
- **Injection** — Secrets injected into Camel app containers as environment variables or mounted files
- **Rotation** — Update secret → rolling restart of affected apps
- **RBAC** — Only authorized team members can create/view/rotate secrets
- **Per-environment scoping** — Dev secrets ≠ prod secrets
- **K8s External Secrets Operator** — Syncs external vault secrets into K8s Secrets

### Tenant Isolation

- Secrets strictly scoped to tenant + environment
- No cross-tenant secret access possible
- Envelope encryption with per-tenant keys on shared storage

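Envelope encryption with per-tenant keys typically works in two layers. Each secret is encrypted with a fresh data-encryption key (DEK), and only the DEK is encrypted ("wrapped") with the tenant's key-encryption key (KEK). The JDK-only sketch below uses AES-256-GCM for both layers. In production the KEK would live in Vault or a KMS rather than in memory. The sketch also binds a `tenant/environment/name` string as associated data; this is an assumed convention that makes a ciphertext copied to another tenant or environment fail to decrypt:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class Envelope {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12, TAG_BITS = 128;

    /** Stored form: the wrapped DEK and the ciphertext, each with its own IV. */
    record Sealed(byte[] wrappedDek, byte[] dekIv, byte[] ciphertext, byte[] dataIv) {}

    private Envelope() {}

    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    static Sealed seal(SecretKey tenantKek, String aad, byte[] plaintext) throws GeneralSecurityException {
        SecretKey dek = newAesKey(); // fresh data key per secret
        byte[] dataIv = randomIv();
        byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, dek, dataIv, aad, plaintext);
        byte[] dekIv = randomIv();
        byte[] wrappedDek = gcm(Cipher.ENCRYPT_MODE, tenantKek, dekIv, aad, dek.getEncoded());
        return new Sealed(wrappedDek, dekIv, ciphertext, dataIv);
    }

    static byte[] open(SecretKey tenantKek, String aad, Sealed s) throws GeneralSecurityException {
        byte[] dekBytes = gcm(Cipher.DECRYPT_MODE, tenantKek, s.dekIv(), aad, s.wrappedDek()); // fails on wrong KEK or AAD
        SecretKey dek = new SecretKeySpec(dekBytes, "AES");
        return gcm(Cipher.DECRYPT_MODE, dek, s.dataIv(), aad, s.ciphertext());
    }

    private static byte[] gcm(int mode, SecretKey key, byte[] iv, String aad, byte[] input) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
        c.updateAAD(aad.getBytes(StandardCharsets.UTF_8));
        return c.doFinal(input);
    }

    private static byte[] randomIv() {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        return iv;
    }
}
```

Rotating a tenant's KEK only requires re-wrapping the small DEKs, not re-encrypting every secret. That property is one way to achieve the tenant key rotation without downtime that §12 calls for.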

### Audit

- Every secret access logged (create, read, update, delete, inject)
- Audit trail queryable by tenant admins

---

## 10. License & Feature Gating

### License Token

Ed25519-signed JWT containing:
- Tenant ID, tier, expiry
- Feature flags (topology, lineage, correlation, debugger, replay)
- Resource limits (agents, retention, environments, vaults, debug sessions)
- SSO/OIDC and custom roles entitlements

### Dual Validation

| Mode | Mechanism |
|------|-----------|
| **SaaS** | Server polls platform API `GET /api/license/{tenant}`, caches 5 min, 24h grace on API failure |
| **Air-gapped** | Server validates local file `/etc/cameleer/license.jwt`, Ed25519 signature verification |

Both modes produce the same `LicenseContext` singleton used throughout the server.

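In SaaS mode the server caches the platform's answer for 5 minutes. If the API fails, it keeps serving the last good answer for up to 24 hours. The following is a sketch of that cache, with three caveats. Measuring the grace window from the last successful fetch is our interpretation. The fetcher stands in for `GET /api/license/{tenant}`. And because the PRD does not say what happens once grace runs out, the sketch simply returns empty:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Supplier;

/** SaaS-mode license source: 5-minute cache, 24-hour grace on API failure. */
final class CachedLicenseSource<L> {
    static final Duration TTL = Duration.ofMinutes(5);
    static final Duration GRACE = Duration.ofHours(24);

    private final Supplier<L> fetchFromPlatform; // stands in for GET /api/license/{tenant}; throws on failure
    private final Supplier<Instant> clock;
    private L lastGood;
    private Instant lastSuccess;

    CachedLicenseSource(Supplier<L> fetchFromPlatform, Supplier<Instant> clock) {
        this.fetchFromPlatform = fetchFromPlatform;
        this.clock = clock;
    }

    synchronized Optional<L> current() {
        Instant now = clock.get();
        if (lastSuccess != null && now.isBefore(lastSuccess.plus(TTL))) {
            return Optional.of(lastGood); // fresh enough: no API call
        }
        try {
            lastGood = Objects.requireNonNull(fetchFromPlatform.get());
            lastSuccess = now;
            return Optional.of(lastGood);
        } catch (RuntimeException apiFailure) {
            if (lastSuccess != null && now.isBefore(lastSuccess.plus(GRACE))) {
                return Optional.of(lastGood); // platform unreachable: keep the last good license
            }
            return Optional.empty(); // nothing confirmed within the grace window
        }
    }
}
```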

### Enforcement

- Feature endpoints return 403 with `not_entitled` reason and upgrade URL
- Graceful degradation: features disabled, not errors
- License expiry: 7-day grace period (read-only mode), then hard cutoff

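Expiry handling has three phases: normal operation, a 7-day read-only grace period, and a hard cutoff. The following sketch covers the phase calculation and the body of a 403 for a feature the tenant is not entitled to. The JSON field names other than the `not_entitled` reason are assumptions. The string is built by hand only because its inputs are trusted constants:

```java
import java.time.Duration;
import java.time.Instant;

enum LicenseState { ACTIVE, GRACE_READ_ONLY, EXPIRED }

final class LicenseEnforcement {
    static final Duration EXPIRY_GRACE = Duration.ofDays(7);

    private LicenseEnforcement() {}

    static LicenseState stateAt(Instant expiresAt, Instant now) {
        if (now.isBefore(expiresAt)) return LicenseState.ACTIVE;
        if (now.isBefore(expiresAt.plus(EXPIRY_GRACE))) return LicenseState.GRACE_READ_ONLY;
        return LicenseState.EXPIRED; // hard cutoff
    }

    /** Reads stay available during grace; writes need an active license. */
    static boolean permits(LicenseState state, boolean isWrite) {
        return switch (state) {
            case ACTIVE -> true;
            case GRACE_READ_ONLY -> !isWrite;
            case EXPIRED -> false;
        };
    }

    /** Body of a 403 for a feature outside the tenant's entitlements (inputs are trusted constants). */
    static String notEntitledBody(String feature, String upgradeUrl) {
        return "{\"reason\":\"not_entitled\",\"feature\":\"" + feature + "\",\"upgradeUrl\":\"" + upgradeUrl + "\"}";
    }
}
```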

### Lifecycle

- Generated on tenant signup, regenerated on tier change
- Air-gapped: downloadable from management platform
- Non-payment: license suspended → grace period → expired

---

## 11. Networking & Tenant Isolation

### Day 1: Namespace Isolation (Shared Tiers)

K8s NetworkPolicies per tenant namespace:
- **Default deny** all ingress/egress between tenant namespaces
- **Allow:** tenant namespace → shared PostgreSQL/OpenSearch (authenticated per-tenant credentials)
- **Allow:** tenant namespace → public internet (Camel app external connectivity)
- **Allow:** SaaS platform namespace → all tenant namespaces (management access)
- **Allow:** tenant Camel apps → tenant cameleer3-server (intra-namespace)

### Zero-Trust Tenant Boundary

- Per-tenant database credentials (not shared superuser with row filtering)
- Per-tenant OpenSearch roles with index-level ACLs
- Connection pooling per tenant (PgBouncer per namespace)
- A compromised tenant server cannot query another tenant's data: its credentials and ACLs only reach its own schema and indices

### Future: VPN / Private Connectivity

- WireGuard or IPsec tunnels to customer infrastructure
- Private DNS resolution for customer internal hostnames
- Available on High/Business tiers only

---

## 12. Security & SOC 2 Compliance

### Encryption

| Layer | Mechanism |
|-------|-----------|
| In transit (external) | TLS 1.3 at ingress |
| In transit (internal) | mTLS between services |
| In transit (agent ↔ server) | TLS + Ed25519-signed config payloads |
| At rest (databases) | Volume encryption (LUKS or cloud-native) |
| At rest (secrets) | Envelope encryption with per-tenant keys |
| At rest (registry) | Encrypted storage backend |

### Audit Trail

Every state-changing action produces an immutable audit record:
- Actor, tenant, action, resource, environment, source IP, result, metadata
- Append-only table with no UPDATE/DELETE grants
- Minimum 1 year retention
- Shipped to separate write-only sink (survives platform DB compromise)
- Covers: auth events, provisioning, deployments, config changes, secret access, billing, replay executions, debug sessions, admin actions

### Container Hardening

- Distroless base images (no shell in production)
- Read-only filesystem
- K8s Pod Security Standards: restricted profile (no root, no privilege escalation, no host access)
- Resource limits enforced — compromised tenant can't fork-bomb the node

### Supply Chain Security

- Container images signed with cosign/sigstore
- SBOM generated per build
- Dependency pinning (no floating versions)
- Trivy scanning in CI — block on critical CVEs
- Customer JAR uploads scanned

### Breach Detection

- Anomaly alerting: unusual API patterns, auth failures, cross-namespace DNS queries
- Runtime security scanning (Falco or similar)
- Audit log anomaly detection

### Payload Protection

- Application-level encryption of customer exchange payloads with per-tenant keys before writing to PostgreSQL/OpenSearch
- Tenant key rotation without downtime
- Payload redaction rules configurable per tenant (agent already supports this)

### Compliance

- SOC 2 Trust Services Criteria: Security (Common Criteria CC1–CC9, notably CC6 logical access), Availability (A1), Processing Integrity (PI1), Confidentiality (C1), Privacy (P1–P8)
- Evidence collection: git history (change management), audit log (access), Prometheus (availability)
- Evaluate Vanta or Drata for continuous compliance monitoring

---

## 13. Platform Operations & Self-Monitoring

### Monitoring Stack

| Tool | Purpose |
|------|---------|
| Prometheus | Metrics collection (platform + tenant infra + K8s) |
| Grafana | Dashboards |
| Loki | Log aggregation |
| Alertmanager | Alert routing → PagerDuty/OpsGenie/Slack |
| Uptime Kuma or Checkly | External synthetic monitoring |

Completely separate from tenant observability data.

### Key Day-1 Alerts

- Control plane down/degraded
- Tenant provisioning failure
- Database connection pool exhaustion
- OpenSearch cluster red/yellow
- Flux reconciliation failure
- TLS certificate expiry < 14 days
- Metering pipeline stale > 1 hour
- Disk usage > 80% on any PV
- Tenant cameleer3-server unhealthy > 5 minutes
- OOMKill on any tenant workload

### Dashboards

- Platform overview: tenant count, active agents, provisioning queue, error rates
- Per-tenant health: server status, app status, resource usage
- Billing: MRR, usage trends, metering pipeline health
- Infrastructure: cluster capacity, node utilization, storage growth
- Security: auth failures, audit log anomalies, certificate status

### SLA Reporting

- Automated uptime calculation per tenant
- SLA breach detection and alerting
- Monthly availability reports for high/business tier customers

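Per-tenant SLA reporting comes down to comparing measured downtime with the downtime budget that the tier's target allows. In a 30-day month (43,200 minutes), 99.5% allows 216 minutes, 99.9% allows 43.2 minutes, and 99.95% allows 21.6 minutes. The sketch below covers both calculations and assumes downtime has already been measured, for example by the synthetic monitors:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;

final class SlaReport {
    private static final BigDecimal HUNDRED = BigDecimal.valueOf(100);

    private SlaReport() {}

    /** Measured availability as a percentage with three decimals, e.g. 99.500. */
    static BigDecimal availabilityPercent(Duration period, Duration downtime) {
        BigDecimal total = BigDecimal.valueOf(period.toSeconds());
        BigDecimal up = total.subtract(BigDecimal.valueOf(downtime.toSeconds()));
        return up.multiply(HUNDRED).divide(total, 3, RoundingMode.HALF_UP);
    }

    /** Downtime budget for a target such as 99.5 (percent), rounded down to whole seconds. */
    static Duration allowedDowntime(Duration period, BigDecimal targetPercent) {
        BigDecimal fraction = HUNDRED.subtract(targetPercent).divide(HUNDRED); // exact: division by 100 terminates
        BigDecimal seconds = fraction.multiply(BigDecimal.valueOf(period.toSeconds()));
        return Duration.ofSeconds(seconds.setScale(0, RoundingMode.FLOOR).longValueExact());
    }

    static boolean breached(Duration period, Duration downtime, BigDecimal targetPercent) {
        return downtime.compareTo(allowedDowntime(period, targetPercent)) > 0;
    }
}
```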

---

## 14. Billing & Metering

### Metering Pipeline (Low/Mid Tiers)

```
K8s Metrics → Metrics Collector → Usage Aggregator (hourly) → Stripe Usage Records API
```

| Dimension | Unit | Source |
|-----------|------|--------|
| CPU | core·hours | K8s metrics (namespace aggregate) |
| RAM | GB·hours | K8s metrics (namespace aggregate) |
| Data volume | GB ingested | cameleer3-server reports |

- Aggregated per tenant, per hour, stored in platform DB before Stripe submission
- Idempotent aggregation (safe to re-run)
- Staleness alert if no data for > 1 hour
- Monthly reconciliation: platform records vs Stripe invoices

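Making the hourly aggregation idempotent mostly comes down to keying each result by (tenant, hour) and overwriting the bucket rather than adding to it. The sketch below assumes hypothetical namespace-level gauge samples: cores and GB of RAM in use, each covering a fixed interval. A rerun must receive every sample for the hours it touches; otherwise it would overwrite a complete bucket with a partial one:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical namespace-level gauge reading covering `interval` (e.g. scraped every 60 s).
record Sample(String tenant, Instant at, double cpuCores, double ramGb, Duration interval) {}

record HourKey(String tenant, Instant hourStart) {}

record HourUsage(double coreHours, double ramGbHours) {}

final class UsageAggregator {
    // Stand-in for the billing_usage table with a unique (tenant, hour) key.
    final Map<HourKey, HourUsage> billingUsage = new HashMap<>();

    /** Recomputes every hour touched by {@code samples} and overwrites it, so re-running is safe. */
    void aggregate(List<Sample> samples) {
        Map<HourKey, double[]> sums = new HashMap<>();
        for (Sample s : samples) {
            HourKey key = new HourKey(s.tenant(), s.at().truncatedTo(ChronoUnit.HOURS));
            double hours = s.interval().toMillis() / 3_600_000.0;
            double[] acc = sums.computeIfAbsent(key, k -> new double[2]);
            acc[0] += s.cpuCores() * hours; // core-hours
            acc[1] += s.ramGb() * hours;    // GB-hours
        }
        // Overwrite (upsert); never add to an existing bucket.
        sums.forEach((key, acc) -> billingUsage.put(key, new HourUsage(acc[0], acc[1])));
    }
}
```

In a real pipeline the map write would be an upsert into `billing_usage` on a unique (tenant, hour) key. The Stripe submission could reuse that same key as its idempotency key.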

### Committed Resources (High/Business)

- Fixed Stripe subscription per resource bundle
- Overage alerts (upsell, not automatic billing)
- Annual/multi-year contracts

### Billing UI

- Current period usage with live cost estimate
- Historical usage charts per dimension
- Invoice history
- Plan management (upgrade/downgrade)

---

## 15. Management Platform UI

### Navigation

| Section | Content |
|---------|---------|
| **Dashboard** | Platform overview: apps, health, usage summary |
| **Apps** | List deployed Camel applications |
| **App → Deploy** | Upload JAR, build status, deploy/promote/rollback |
| **App → Configuration** | Env vars, JVM options, agent config. Per environment. |
| **App → Secrets** | Manage secrets, link vaults. Per environment. |
| **App → Status** | Pod health, resource usage, agent connection, events |
| **App → Logs** | Live stdout/stderr stream |
| **App → Versions** | Image history, promotion log, rollback |
| **Observe** | Embedded cameleer3-server UI (topology, traces, lineage, correlation, debugger, replay) |
| **Team** | Users, roles, invites |
| **Settings** | Tenant config, SSO/OIDC, vault connections |
| **Billing** | Usage, invoices, plan management |

### Design

- SaaS shell built with `@cameleer/design-system`
- cameleer3-server React UI embedded (same design system, visual consistency)
- Responsive but desktop-primary (observability tooling is a desktop workflow)

---

## 16. Day-1 vs Future Scope

### Day 1 (Launch)

| Epic | Scope |
|------|-------|
| #1 Management Platform | Modular monolith, React shell |
| #2 Identity & Access | Registration, login, teams, JWT, OIDC (high/business) |
| #3 Tenant Provisioning | Automated shared tiers, semi-manual dedicated |
| #4 Billing & Metering | Stripe usage-based + committed. Full metering pipeline. |
| #5 Camel Runtime | JAR upload → immutable image → deploy. Agent auto-injection. |
| #6 Observability | Per-tenant server, embedded UI, all MOAT features gated by tier |
| #7 License Module | Dual-mode (SaaS API + local file), feature gating |
| #8 Networking | Namespace isolation, NetworkPolicies, public internet |
| #9 Secrets | Platform-native + HashiCorp Vault. Per-environment scoping. |
| #10 Environments | Build-once-deploy-often. Tier-based environment model. |
| #11 Security & SOC 2 | Full SOC 2 foundations, zero-trust tenant boundaries, audit logging |
| #12 Self-Monitoring | Prometheus/Grafana/Loki/Alertmanager, key alerts, dashboards |
| #13 Exchange Replay | MOAT feature, extends debugger infrastructure |

### Deferred (Future)

| Feature | Reason |
|---------|--------|
| Automated dedicated cluster provisioning (Cluster API) | Semi-manual sufficient for early high/business customers |
| Container image deployment | JAR upload covers day 1 |
| Git-based deployment | Nice-to-have |
| VPN / private connectivity | Public internet sufficient at launch |
| Auto-scaling (HPA) | Manual scaling sufficient |
| Data residency / region selection | Single region at launch |
| Cross-tenant correlation federation | Designed, deferred to v2 |
| Additional vault providers (AWS, Azure, GCP) | HashiCorp Vault covers day 1 |
| Compliance tooling integration (Vanta/Drata) | Manual evidence collection initially |
| Vulnerability scanning in registry | Trivy in CI covers basics |

---

## 17. Gitea Issue Map

| # | Epic | Labels |
|---|------|--------|
| 1 | SaaS Management Platform | epic, platform |
| 2 | Identity & Access Management | epic, auth |
| 3 | Tenant Provisioning & Lifecycle | epic, infra |
| 4 | Billing & Metering | epic, billing |
| 5 | Camel Application Runtime | epic, runtime |
| 6 | Observability Integration | epic, observability |
| 7 | License & Feature Gating | epic, licensing |
| 8 | Networking & Tenant Isolation | epic, networking |
| 9 | Secrets Management | epic, secrets |
| 10 | Environments & Promotion Pipeline | epic, runtime, day-1 |
| 11 | Security & SOC 2 Compliance | epic, security |
| 12 | Platform Operations & Self-Monitoring | epic, ops |
| 13 | MOAT: Exchange Replay | epic, observability |

MOAT features (Debugger, Lineage, Correlation) tracked in cameleer/cameleer3 #57–#72.