From 2ed6430aea766aa0e290414c4b02e7066784945d Mon Sep 17 00:00:00 2001
From: hsiegeln <37154749+hsiegeln@users.noreply.github.com>
Date: Sun, 29 Mar 2026 23:45:15 +0200
Subject: [PATCH] Add SaaS platform PRD

Comprehensive product requirements document covering:
- Four-tier structure (Low/Mid shared, High/Business dedicated)
- Modular monolith architecture (Spring Boot + React)
- Camel application runtime with agent auto-injection
- Flux CD GitOps provisioning, build-once-deploy-often pipeline
- Dual-mode license (SaaS API + air-gapped file)
- SOC 2 day-1 compliance, zero-trust tenant isolation
- Stripe billing (usage-based + committed resources)
- Platform self-monitoring (Prometheus/Grafana/Loki)
- Exchange Replay MOAT feature

Gitea epics: cameleer/cameleer-saas #1-#13

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../specs/2026-03-29-saas-platform-prd.md | 684 ++++++++++++++++++
 1 file changed, 684 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-29-saas-platform-prd.md

diff --git a/docs/superpowers/specs/2026-03-29-saas-platform-prd.md b/docs/superpowers/specs/2026-03-29-saas-platform-prd.md
new file mode 100644
index 0000000..c660f75
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-29-saas-platform-prd.md
@@ -0,0 +1,684 @@
+# Cameleer SaaS Platform — Product Requirements Document
+
+**Status:** Draft — Awaiting Review
+**Date:** 2026-03-29
+**Author:** Hendrik Siegeln + Claude (brainstorming session)
+**Gitea Project:** cameleer/cameleer-saas
+**Gitea Epics:** #1–#13
+
+---
+
+## 1. Product Definition
+
+**Cameleer SaaS** is a Camel application runtime platform with built-in observability. Customers deploy Apache Camel applications and get zero-configuration tracing, topology mapping, payload lineage, distributed correlation, live debugging, and exchange replay — powered by the cameleer3 agent (auto-injected) and cameleer3-server (managed per tenant).
+
+### Three Pillars
+
+1. **Runtime** — Deploy and run Camel applications with automatic agent injection
+2. **Observability** — Per-tenant cameleer3-server (traces, topology, lineage, correlation, debugger, replay)
+3. **Management** — Auth, billing, teams, provisioning, secrets, environments
+
+### Two Deployment Modes
+
+- **SaaS (managed)** — Fully managed by the Cameleer platform
+- **Self-hosted / Air-gapped** — Customer-operated, license-enforced feature parity with SaaS tiers
+
+### Relationship to Existing Components
+
+| Component | Role | Changes Required |
+|-----------|------|------------------|
+| cameleer3 (agent) | Zero-code Camel instrumentation, auto-injected into customer JARs | MOAT features (lineage, correlation, debugger, replay) |
+| cameleer3-server | Per-tenant observability backend | Managed mode (trust SaaS JWT), license module, MOAT features |
+| cameleer-saas (this repo) | SaaS management platform — control plane | New: everything in this document |
+| design-system | Shared React component library | Used by both SaaS shell and server UI |
+
+---
+
+## 2. Tier Structure
+
+### Tier Matrix
+
+| Dimension | Low | Mid | High | Business |
+|-----------|-----|-----|------|----------|
+| **Infrastructure** | Shared cluster, shared PG/OS | Shared cluster, shared PG/OS | Dedicated cluster(s) | Dedicated cluster(s) |
+| **Pricing** | Base fee + usage (data vol, CPU, RAM) | Base fee + usage (data vol, CPU, RAM) | Committed resources | Committed resources |
+| **Environments** | 1 (prod) | 2 (dev, prod) | Unlimited | Unlimited |
+| **Agents** | Limited | Higher limit | Unlimited | Unlimited |
+| **Data Retention** | 7 days | 30 days | 90 days | Custom |
+| **Topology Graph** | Yes | Yes | Yes | Yes |
+| **Payload Lineage** | Limited (route-scope only, max 10 captures/min) | Full | Full | Full |
+| **Cross-Service Correlation** | No | Yes | Yes | Yes |
+| **Live Route Debugger** | No | No | Yes | Yes |
+| **Exchange Replay** | No | No | Yes | Yes |
+| **SSO / OIDC** | No | No | Yes | Yes |
+| **Custom Roles** | No | No | Yes | Yes |
+| **Team Management** | Basic | Basic | Full | Full |
+| **Secrets** | Platform-native | Platform-native + 1 vault | Unlimited vaults | Unlimited vaults |
+| **Support** | Docs | Email | Priority | Dedicated CSM |
+| **SLA** | Best effort | 99.5% | 99.9% | 99.95%+ custom |
+| **VPN (future)** | No | No | Yes | Yes |
+
+### Pricing Models
+
+**Usage-based (Low/Mid):**
+- Optional small monthly base fee
+- Metered dimensions: data volume (GB ingested), CPU (core·hours), RAM (GB·hours)
+- Stripe metered subscriptions with periodic usage reporting
+
+**Committed resources (High/Business):**
+- Fixed pricing based on reserved cluster capacity (CPU cores, RAM, storage, node count)
+- Annual or multi-year contracts
+- Overage alerts (upsell, not automatic billing)
+
+---
+
+## 3. System Architecture
+
+### Approach: Modular Monolith Control Plane
+
+Single Spring Boot application with well-bounded internal modules. K8s ingress handles tenant routing. Flux CD handles infrastructure reconciliation.
+
+```
+[Browser] → [Ingress (Traefik/Envoy)] → [SaaS Platform (modular Spring Boot)]
+                     ↓ (tenant routes)              ↓ (provisioning)
+           [Tenant cameleer3-server]          [Flux CD → K8s]
+```
+
+### Component Map
+
+```
+                  ┌─────────────────────────────────────────┐
+                  │         Ingress (Traefik/Envoy)         │
+                  │     TLS termination, tenant routing     │
+                  └──────┬──────────────┬──────────────┬────┘
+                         │              │              │
+              ┌──────────▼──────────┐   │    ┌─────────▼─────────┐
+              │  SaaS Management    │   │    │ Grafana/Prometheus│
+              │  Platform           │   │    │ (self-monitoring) │
+              │  (Spring Boot)      │   │    └───────────────────┘
+              │                     │   │
+              │  Modules:           │   │
+              │  ├─ Auth            │   │
+              │  ├─ Billing         │   │
+              │  ├─ Provisioning    │   │
+              │  ├─ Runtime         │   │
+              │  ├─ License         │   │
+              │  ├─ Secrets         │   │
+              │  └─ Audit           │   │
+              └──┬───┬──────┬───────┘   │
+                 │   │      │           │
+        ┌────────┘   │      └──────┐    │
+        ▼            ▼             ▼    │
+┌──────────────┐ ┌────────┐  ┌──────────▼───────────────┐
+│ Platform DB  │ │ Stripe │  │  Shared K8s Cluster      │
+│ (PostgreSQL) │ │  API   │  │                          │
+│ - tenants    │ └────────┘  │ ┌─────────────────────┐  │
+│ - users      │             │ │ tenant-a namespace  │  │
+│ - teams      │   ┌─────┐   │ │ ├─ cameleer3-server │  │
+│ - audit log  │   │Flux │   │ │ ├─ camel-app-1      │  │
+│ - licenses   │   │ CD  │   │ │ ├─ camel-app-2      │  │
+└──────────────┘   └──┬──┘   │ │ └─ NetworkPolicies  │  │
+                      │      │ └─────────────────────┘  │
+              ┌───────▼──┐   │ ┌─────────────────────┐  │
+              │ GitOps   │   │ │ tenant-b namespace  │  │
+              │ Repo     │   │ │ └─ ...              │  │
+              │(HelmRel) │   │ └─────────────────────┘  │
+              └──────────┘   │                          │
+                             │ Shared:                  │
+                             │ ├─ PostgreSQL (tenant    │
+                             │ │    schemas)            │
+                             │ ├─ OpenSearch (tenant    │
+                             │ │    indices)            │
+                             │ └─ Container Registry    │
+                             └──────────────────────────┘
+```
+
+### Dedicated Tier (High/Business)
+
+Same management platform routes to dedicated cluster(s) per customer. Dedicated PostgreSQL, OpenSearch, and container registry within the customer's cluster. Provisioned semi-manually at launch (Flux bootstrap), full Cluster API automation deferred.
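
To make the split between shared and dedicated tiers concrete, here is a minimal routing sketch in Python (chosen only for brevity; the platform itself is specified as Spring Boot). It assumes path-based tenant routing of the form `/t/{tenant}/api/*`, one namespace per tenant named after the tenant ID, and a dedicated-tier tenant record that carries a cluster hostname. The `Tenant` fields, the service name `cameleer3-server`, port 8080, and both function names are hypothetical and not defined by this spec. In practice this mapping would more likely live in ingress configuration than in application code.

```python
from dataclasses import dataclass
from typing import Optional

SHARED_TIERS = {"low", "mid"}
DEDICATED_TIERS = {"high", "business"}


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    tier: str
    # Hypothetical field: hostname of the customer's dedicated cluster ingress.
    dedicated_cluster_host: Optional[str] = None


def resolve_upstream(tenant: Tenant) -> str:
    """Return the base URL of the tenant's cameleer3-server instance."""
    if tenant.tier in SHARED_TIERS:
        # Standard K8s service DNS: <service>.<namespace>.svc.cluster.local
        # Service name and port are assumptions for illustration.
        return f"http://cameleer3-server.{tenant.tenant_id}.svc.cluster.local:8080"
    if tenant.tier in DEDICATED_TIERS:
        if not tenant.dedicated_cluster_host:
            raise ValueError(f"tenant {tenant.tenant_id} has no dedicated cluster")
        return f"https://{tenant.dedicated_cluster_host}"
    raise ValueError(f"unknown tier: {tenant.tier}")


def route(path: str, tenants: dict) -> Optional[tuple]:
    """Map '/t/{tenant}/api/...' to (upstream base URL, upstream path).

    Returns None for paths that are not tenant API routes or name an
    unknown tenant, so the caller can fall through to the SaaS shell.
    """
    parts = path.split("/", 3)  # "/t/acme/api/x" -> ["", "t", "acme", "api/x"]
    if len(parts) < 4 or parts[1] != "t" or not parts[3].startswith("api/"):
        return None
    tenant = tenants.get(parts[2])
    if tenant is None:
        return None
    return resolve_upstream(tenant), "/" + parts[3]


if __name__ == "__main__":
    registry = {
        "tenant-a": Tenant("tenant-a", "low"),
        "tenant-b": Tenant("tenant-b", "high", "tenant-b.example.internal"),
    }
    print(route("/t/tenant-a/api/routes", registry))
    print(route("/t/tenant-b/api/routes", registry))
```

Keeping the tier check in one function means a shared-to-dedicated upgrade only changes the tenant record, not the routing logic.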
+
+### Tech Stack
+
+| Component | Technology |
+|-----------|------------|
+| Management Platform backend | Spring Boot 3, Java 21 |
+| Management Platform frontend | React, @cameleer/design-system |
+| Platform database | PostgreSQL |
+| Tenant observability | cameleer3-server (Spring Boot), PostgreSQL, OpenSearch |
+| GitOps | Flux CD |
+| K8s distribution | Talos (production), k3s (dev) |
+| Ingress | Traefik or Envoy |
+| Billing | Stripe (Subscriptions + Usage Records API) |
+| Auth | Spring Security OAuth2, Ed25519 JWT |
+| Secrets sync | K8s External Secrets Operator |
+| Container registry | Platform-managed (Harbor or Gitea Container Registry) |
+| Monitoring | Prometheus, Grafana, Loki, Alertmanager |
+| Image signing | cosign/sigstore |
+| Image scanning | Trivy |
+
+### Key Architectural Decisions
+
+1. **Modular monolith** — Single Spring Boot app with clean module boundaries. Extractable later if needed.
+2. **K8s ingress handles routing** — Tenant routing via path or subdomain. No custom API gateway.
+3. **Flux CD for reconciliation** — HelmRelease CRs per tenant. Drift detection, self-healing. K8s-distribution-agnostic.
+4. **Platform DB separate from tenant data** — Management platform has its own PostgreSQL. Tenant observability data in separate shared (or dedicated) instances.
+5. **Immutable artifact pipeline** — JAR upload → container image → promote through environments. Same binary everywhere.
+6. **Dual-mode auth** — SaaS mode: platform is the IdP. Air-gapped mode: server uses standalone auth with local license file.
+7. **SOC 2 baked in** — Not bolted on. Audit logging, encryption, image signing, SBOM from day 1.
+8. **Self-monitoring** — Prometheus + Grafana stack, completely separate from tenant observability.
+
+---
+
+## 4. Data Architecture
+
+### Platform Database (Management Platform)
+
+Stores all SaaS control plane data — completely separate from tenant observability data.
+
+| Table/Domain | Purpose |
+|---|---|
+| `tenants` | Tenant record: ID, name, tier, status, Stripe customer ID, created_at |
+| `users` | Platform users: email, password hash, MFA, status |
+| `tenant_members` | User-to-tenant mapping with role |
+| `teams` | Team groupings within a tenant |
+| `roles` / `permissions` | RBAC definitions (predefined + custom for high/business) |
+| `licenses` | License records: tenant, tier, feature flags, limits, expiry, signing key |
+| `audit_log` | Immutable append-only log: actor, action, resource, timestamp, IP, tenant |
+| `applications` | Deployed Camel app metadata: name, tenant, version, image ref, status |
+| `secrets_metadata` | Secret references (actual values in K8s Secrets or external vault) |
+| `vault_configs` | External vault connection configs per tenant |
+| `provisioning_events` | Tenant provisioning pipeline state and history |
+| `billing_usage` | Aggregated usage snapshots before Stripe reporting |
+
+### Tenant Data (Shared PostgreSQL)
+
+Each tenant's cameleer3-server uses its own PostgreSQL schema on the shared instance (dedicated instance for high/business). This is the existing cameleer3-server data model — unchanged:
+
+- Route executions, processor traces, metrics
+- Route graph topology
+- Agent registrations, config history
+- Lineage captures, correlation traces, debug sessions
+
+### Tenant Data (Shared OpenSearch)
+
+- `{tenant_id}-executions-*` — time-series execution data
+- `{tenant_id}-traces-*` — processor-level traces
+- Full index-level isolation with index templates per tenant
+
+### Self-Monitoring Data
+
+Completely separate: Prometheus TSDB for metrics, Loki for logs.
+
+---
+
+## 5. Identity & Access Management
+
+### Architecture
+
+The SaaS management platform is the single identity plane. It owns authentication and authorization. Per-tenant cameleer3-server instances trust SaaS-issued tokens.
+
+- Spring Security OAuth2 for OIDC federation with customer IdPs
+- Ed25519 JWT signing (consistent with existing cameleer3-server pattern)
+- Tokens carry: tenant ID, user ID, roles, feature entitlements
+- cameleer3-server validates SaaS-issued JWTs in managed mode
+- Standalone mode retains its own auth for air-gapped deployments
+
+### RBAC Model
+
+| Role | Capabilities |
+|------|-------------|
+| Owner | Full tenant admin, billing, team management, delete tenant |
+| Admin | Manage apps, secrets, team members, environments. No billing. |
+| Developer | Deploy apps, view traces, use debugger/replay. No team management. |
+| Viewer | Read-only access to dashboards, traces, topology |
+
+High/Business tiers: custom roles with granular permissions (e.g., "can replay in dev but not prod").
+
+### Team Management
+
+- Invite by email
+- Role assignment per user
+- Basic (low/mid): single team, predefined roles
+- Full (high/business): multiple teams, custom roles, team-scoped permissions
+
+---
+
+## 6. Tenant Provisioning
+
+### Shared Tier Flow (Low/Mid)
+
+```
+Customer signs up + payment
+  → Create tenant record + Stripe customer/subscription
+  → Generate signed license token (Ed25519)
+  → Create Flux HelmRelease CR
+  → Flux reconciles: namespace, ResourceQuota, NetworkPolicies, cameleer3-server
+  → Provision PostgreSQL schema + per-tenant credentials
+  → Provision OpenSearch index template + per-tenant credentials
+  → Readiness check: server healthy, DB migrated, auth working
+  → Generate bootstrap tokens, present onboarding instructions
+  → Tenant status → ACTIVE
+```
+
+**Target: < 5 minutes from payment to active environment.**
+
+### Dedicated Tier Flow (High/Business)
+
+Semi-manual at launch:
+1. Customer signs committed resource agreement
+2. Operator provisions dedicated cluster (Talos)
+3. Flux bootstrap deploys full stack
+4. Management platform configured to route to dedicated cluster
+5. From this point, automated (same lifecycle management as shared)
+
+Full Cluster API automation deferred to future release.
+
+### Lifecycle Operations
+
+| Operation | Mechanism |
+|-----------|-----------|
+| Suspension (non-payment) | Scale tenant workloads to 0, license set to suspended |
+| Reactivation | Scale back up, license reactivated |
+| Deletion | Remove namespace, drop PG schema, delete OS indices, scrub audit log references. GDPR compliant. |
+| Tier upgrade (shared → dedicated) | Provision dedicated cluster, migrate data, update routing. Downtime window coordinated. |
+| Tier downgrade | Reverse of upgrade. Data retention limits applied. |
+
+### Failure Handling
+
+- Each provisioning step is idempotent and retryable
+- State machine in platform DB tracks progress per step
+- Failed provisioning → alert ops + notify customer with ETA
+- Partial provisioning cleanup on permanent failure
+
+---
+
+## 7. Camel Application Runtime
+
+### JAR Upload → Immutable Image
+
+1. **Validation** — File type check, size limit per tier, SHA-256 checksum, Trivy security scan, secret detection (reject JARs with embedded credentials)
+2. **Image Build** — Templated Dockerfile: distroless JRE base + customer JAR + cameleer3-agent.jar + `-javaagent` flag + agent pre-configured for tenant server. Image tagged: `registry/{tenant}/{app}:v{N}-{sha256short}`. Signed with cosign. SBOM attached.
+3. **Registry Push** — Per-tenant repository in platform container registry
+4. **Deploy** — K8s Deployment in tenant namespace with resource limits, secrets mounted, config injected, NetworkPolicy applied, liveness/readiness probes
+
+### Environment Promotion
+
+```
+dev → staging → prod
+  (same image tag, different config + secrets per environment)
+```
+
+- Promotion = deploy existing image tag to target environment (no rebuild)
+- Rollback = redeploy previous image tag
+- Every promotion audit logged (who, what, from, to)
+
+### Environment Model
+
+| Tier | Default Environments | Custom Environments |
+|------|---------------------|-------------------|
+| Low | prod | No |
+| Mid | dev, prod | No |
+| High | dev, staging, prod | Unlimited |
+| Business | dev, staging, prod | Unlimited |
+
+### Application Deployment Page
+
+Central UI for managing each deployed application:
+
+- **Deploy** — Upload JAR, view build status, deploy to environment, promote, rollback
+- **Configuration** — Environment variables, JVM options, agent config overrides, application properties. Per-environment. Changes trigger rolling restart.
+- **Secrets** — Create/edit platform-managed secrets. Link external vault secrets. Scoped per environment. Masked in UI, reveal with audit log.
+- **Status** — Pod health, resource usage, agent connection status, recent events
+- **Logs** — Live stdout/stderr stream
+- **Versions** — Image history, promotion history, rollback targets
+
+### Application Lifecycle
+
+| Action | Mechanism |
+|--------|-----------|
+| Deploy | Upload JAR → build image → deploy to environment |
+| Promote | Redeploy same image tag to next environment |
+| Rollback | Redeploy previous image tag |
+| Scale | Update replica count |
+| Stop | Scale to 0 (preserves config) |
+| Delete | Remove Deployment + clean registry images per retention |
+| Logs | Stream via K8s log API |
+
+---
+
+## 8. Observability Integration
+
+### Architecture
+
+Each tenant gets a dedicated cameleer3-server instance:
+- Shared tiers: deployed in tenant's namespace
+- Dedicated tiers: deployed in tenant's cluster
+
+The ingress routes `/t/{tenant}/api/*` to the correct server instance. The server's React UI is embedded in the SaaS shell (nav, tenant switcher, billing pages provided by the shell; product UI rendered inside).
+
+### Agent Connection
+
+- Agent bootstrap tokens generated by the SaaS platform
+- Agents connect directly to their tenant's cameleer3-server instance
+- Agent auto-injected into customer Camel apps deployed on the platform
+- External agents (customer-hosted Camel apps) can also connect using bootstrap tokens
+
+### MOAT Features (gated by license)
+
+| Feature | Description | Tier Availability |
+|---------|-------------|-------------------|
+| **Topology Graph** | Route dependency visualization from existing execution data | All tiers |
+| **Payload Flow Lineage** | Per-processor before/after capture + format-aware diff | Limited on Low (route-scope only, max 10 captures/min), Full on Mid+ |
+| **Cross-Service Correlation** | Distributed trace assembly + service dependency graph | Mid+ |
+| **Live Route Debugger** | Browser-based route stepping with breakpoints | High+ |
+| **Exchange Replay** | Re-execute recorded exchange with modified payload, fully audited | High+ |
+
+### Server Configuration
+
+- SaaS platform pushes tier-specific config: feature flags, retention limits, resource limits
+- Server runs in "managed mode": trusts SaaS-issued JWTs, reports metrics back to platform
+- Air-gapped mode: standalone with local license file
+
+---
+
+## 9. Secrets Management
+
+### Day 1 Requirements
+
+- **Platform-native secret store** — Encrypted at rest in K8s Secrets (sealed-secrets or SOPS)
+- **External vault integration** — HashiCorp Vault at launch. AWS Secrets Manager, Azure Key Vault, GCP Secret Manager deferred to future release.
+- **Injection** — Secrets injected into Camel app containers as environment variables or mounted files
+- **Rotation** — Update secret → rolling restart of affected apps
+- **RBAC** — Only authorized team members can create/view/rotate secrets
+- **Per-environment scoping** — Dev secrets ≠ prod secrets
+- **K8s External Secrets Operator** — Syncs external vault secrets into K8s Secrets
+
+### Tenant Isolation
+
+- Secrets strictly scoped to tenant + environment
+- No cross-tenant secret access possible
+- Envelope encryption with per-tenant keys on shared storage
+
+### Audit
+
+- Every secret access logged (create, read, update, delete, inject)
+- Audit trail queryable by tenant admins
+
+---
+
+## 10. License & Feature Gating
+
+### License Token
+
+Ed25519-signed JWT containing:
+- Tenant ID, tier, expiry
+- Feature flags (topology, lineage, correlation, debugger, replay)
+- Resource limits (agents, retention, environments, vaults, debug sessions)
+- SSO/OIDC and custom roles entitlements
+
+### Dual Validation
+
+| Mode | Mechanism |
+|------|-----------|
+| **SaaS** | Server polls platform API `GET /api/license/{tenant}`, caches 5 min, 24h grace on API failure |
+| **Air-gapped** | Server validates local file `/etc/cameleer/license.jwt`, Ed25519 signature verification |
+
+Both modes produce the same `LicenseContext` singleton used throughout the server.
+
+### Enforcement
+
+- Feature endpoints return 403 with `not_entitled` reason and upgrade URL
+- Graceful degradation: features disabled, not errors
+- License expiry: 7-day grace period (read-only mode), then hard cutoff
+
+### Lifecycle
+
+- Generated on tenant signup, regenerated on tier change
+- Air-gapped: downloadable from management platform
+- Non-payment: license suspended → grace period → expired
+
+---
+
+## 11. Networking & Tenant Isolation
+
+### Day 1: Namespace Isolation (Shared Tiers)
+
+K8s NetworkPolicies per tenant namespace:
+- **Default deny** all ingress/egress between tenant namespaces
+- **Allow:** tenant namespace → shared PostgreSQL/OpenSearch (authenticated per-tenant credentials)
+- **Allow:** tenant namespace → public internet (Camel app external connectivity)
+- **Allow:** SaaS platform namespace → all tenant namespaces (management access)
+- **Allow:** tenant Camel apps → tenant cameleer3-server (intra-namespace)
+
+### Zero-Trust Tenant Boundary
+
+- Per-tenant database credentials (not shared superuser with row filtering)
+- Per-tenant OpenSearch roles with index-level ACLs
+- Connection pooling per tenant (PgBouncer per namespace)
+- A compromised tenant server physically cannot query another tenant's data
+
+### Future: VPN / Private Connectivity
+
+- WireGuard or IPsec tunnels to customer infrastructure
+- Private DNS resolution for customer internal hostnames
+- Available on High/Business tiers only
+
+---
+
+## 12. Security & SOC 2 Compliance
+
+### Encryption
+
+| Layer | Mechanism |
+|-------|-----------|
+| In transit (external) | TLS 1.3 at ingress |
+| In transit (internal) | mTLS between services |
+| In transit (agent ↔ server) | TLS + Ed25519-signed config payloads |
+| At rest (databases) | Volume encryption (LUKS or cloud-native) |
+| At rest (secrets) | Envelope encryption with per-tenant keys |
+| At rest (registry) | Encrypted storage backend |
+
+### Audit Trail
+
+Every state-changing action produces an immutable audit record:
+- Actor, tenant, action, resource, environment, source IP, result, metadata
+- Append-only table with no UPDATE/DELETE grants
+- Minimum 1 year retention
+- Shipped to separate write-only sink (survives platform DB compromise)
+- Covers: auth events, provisioning, deployments, config changes, secret access, billing, replay executions, debug sessions, admin actions
+
+### Container Hardening
+
+- Distroless base images (no shell in production)
+- Read-only filesystem
+- K8s Pod Security Standards: restricted profile (no root, no privilege escalation, no host access)
+- Resource limits enforced — compromised tenant can't fork-bomb the node
+
+### Supply Chain Security
+
+- Container images signed with cosign/sigstore
+- SBOM generated per build
+- Dependency pinning (no floating versions)
+- Trivy scanning in CI — block on critical CVEs
+- Customer JAR uploads scanned
+
+### Breach Detection
+
+- Anomaly alerting: unusual API patterns, auth failures, cross-namespace DNS queries
+- Runtime security scanning (Falco or similar)
+- Audit log anomaly detection
+
+### Payload Protection
+
+- Application-level encryption of customer exchange payloads with per-tenant keys before writing to PG/OS
+- Tenant key rotation without downtime
+- Payload redaction rules configurable per tenant (agent already supports this)
+
+### Compliance
+
+- SOC 2 Trust Service Criteria: Security (CC6), Availability (A1), Processing Integrity (PI1), Confidentiality (C1), Privacy (P1-P8)
+- Evidence collection: git history (change management), audit log (access), Prometheus (availability)
+- Evaluate Vanta or Drata for continuous compliance monitoring
+
+---
+
+## 13. Platform Operations & Self-Monitoring
+
+### Monitoring Stack
+
+| Tool | Purpose |
+|------|---------|
+| Prometheus | Metrics collection (platform + tenant infra + K8s) |
+| Grafana | Dashboards |
+| Loki | Log aggregation |
+| Alertmanager | Alert routing → PagerDuty/OpsGenie/Slack |
+| Uptime Kuma or Checkly | External synthetic monitoring |
+
+Completely separate from tenant observability data.
+
+### Key Day-1 Alerts
+
+- Control plane down/degraded
+- Tenant provisioning failure
+- Database connection pool exhaustion
+- OpenSearch cluster red/yellow
+- Flux reconciliation failure
+- TLS certificate expiry < 14 days
+- Metering pipeline stale > 1 hour
+- Disk usage > 80% on any PV
+- Tenant cameleer3-server unhealthy > 5 minutes
+- OOMKill on any tenant workload
+
+### Dashboards
+
+- Platform overview: tenant count, active agents, provisioning queue, error rates
+- Per-tenant health: server status, app status, resource usage
+- Billing: MRR, usage trends, metering pipeline health
+- Infrastructure: cluster capacity, node utilization, storage growth
+- Security: auth failures, audit log anomalies, certificate status
+
+### SLA Reporting
+
+- Automated uptime calculation per tenant
+- SLA breach detection and alerting
+- Monthly availability reports for high/business tier customers
+
+---
+
+## 14. Billing & Metering
+
+### Metering Pipeline (Low/Mid Tiers)
+
+```
+K8s Metrics → Metrics Collector → Usage Aggregator (hourly) → Stripe Usage Records API
+```
+
+| Dimension | Unit | Source |
+|-----------|------|--------|
+| CPU | core·hours | K8s metrics (namespace aggregate) |
+| RAM | GB·hours | K8s metrics (namespace aggregate) |
+| Data volume | GB ingested | cameleer3-server reports |
+
+- Aggregated per tenant, per hour, stored in platform DB before Stripe submission
+- Idempotent aggregation (safe to re-run)
+- Staleness alert if no data for > 1 hour
+- Monthly reconciliation: platform records vs Stripe invoices
+
+### Committed Resources (High/Business)
+
+- Fixed Stripe subscription per resource bundle
+- Overage alerts (upsell, not automatic billing)
+- Annual/multi-year contracts
+
+### Billing UI
+
+- Current period usage with live cost estimate
+- Historical usage charts per dimension
+- Invoice history
+- Plan management (upgrade/downgrade)
+
+---
+
+## 15. Management Platform UI
+
+### Navigation
+
+| Section | Content |
+|---------|---------|
+| **Dashboard** | Platform overview: apps, health, usage summary |
+| **Apps** | List deployed Camel applications |
+| **App → Deploy** | Upload JAR, build status, deploy/promote/rollback |
+| **App → Configuration** | Env vars, JVM options, agent config. Per environment. |
+| **App → Secrets** | Manage secrets, link vaults. Per environment. |
+| **App → Status** | Pod health, resource usage, agent connection, events |
+| **App → Logs** | Live stdout/stderr stream |
+| **App → Versions** | Image history, promotion log, rollback |
+| **Observe** | Embedded cameleer3-server UI (topology, traces, lineage, correlation, debugger, replay) |
+| **Team** | Users, roles, invites |
+| **Settings** | Tenant config, SSO/OIDC, vault connections |
+| **Billing** | Usage, invoices, plan management |
+
+### Design
+
+- SaaS shell built with `@cameleer/design-system`
+- cameleer3-server React UI embedded (same design system, visual consistency)
+- Responsive but desktop-primary (observability tooling is a desktop workflow)
+
+---
+
+## 16. Day-1 vs Future Scope
+
+### Day 1 (Launch)
+
+| Epic | Scope |
+|------|-------|
+| #1 Management Platform | Modular monolith, React shell |
+| #2 Identity & Access | Registration, login, teams, JWT, OIDC (high/business) |
+| #3 Tenant Provisioning | Automated shared tiers, semi-manual dedicated |
+| #4 Billing & Metering | Stripe usage-based + committed. Full metering pipeline. |
+| #5 Camel Runtime | JAR upload → immutable image → deploy. Agent auto-injection. |
+| #6 Observability | Per-tenant server, embedded UI, all MOAT features gated by tier |
+| #7 License Module | Dual-mode (SaaS API + local file), feature gating |
+| #8 Networking | Namespace isolation, NetworkPolicies, public internet |
+| #9 Secrets | Platform-native + HashiCorp Vault. Per-environment scoping. |
+| #10 Environments | Build-once-deploy-often. Tier-based environment model. |
+| #11 Security & SOC 2 | Full SOC 2 foundations, zero-trust tenant boundaries, audit logging |
+| #12 Self-Monitoring | Prometheus/Grafana/Loki/Alertmanager, key alerts, dashboards |
+| #13 Exchange Replay | MOAT feature, extends debugger infrastructure |
+
+### Deferred (Future)
+
+| Feature | Reason |
+|---------|--------|
+| Automated dedicated cluster provisioning (Cluster API) | Semi-manual sufficient for early high/business customers |
+| Container image deployment | JAR upload covers day 1 |
+| Git-based deployment | Nice-to-have |
+| VPN / private connectivity | Public internet sufficient at launch |
+| Auto-scaling (HPA) | Manual scaling sufficient |
+| Data residency / region selection | Single region at launch |
+| Cross-tenant correlation federation | Designed, deferred to v2 |
+| Additional vault providers (AWS, Azure, GCP) | HashiCorp Vault covers day 1 |
+| Compliance tooling integration (Vanta/Drata) | Manual evidence collection initially |
+| Vulnerability scanning in registry | Trivy in CI covers basics |
+
+---
+
+## 17. Gitea Issue Map
+
+| # | Epic | Labels |
+|---|------|--------|
+| 1 | SaaS Management Platform | epic, platform |
+| 2 | Identity & Access Management | epic, auth |
+| 3 | Tenant Provisioning & Lifecycle | epic, infra |
+| 4 | Billing & Metering | epic, billing |
+| 5 | Camel Application Runtime | epic, runtime |
+| 6 | Observability Integration | epic, observability |
+| 7 | License & Feature Gating | epic, licensing |
+| 8 | Networking & Tenant Isolation | epic, networking |
+| 9 | Secrets Management | epic, secrets |
+| 10 | Environments & Promotion Pipeline | epic, runtime, day-1 |
+| 11 | Security & SOC 2 Compliance | epic, security |
+| 12 | Platform Operations & Self-Monitoring | epic, ops |
+| 13 | MOAT: Exchange Replay | epic, observability |
+
+MOAT features (Debugger, Lineage, Correlation) tracked in cameleer/cameleer3 #57–#72.
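
As a reviewer's aid, the enforcement rules from section 10 (403 with a `not_entitled` reason and upgrade URL, then a 7-day read-only grace period after expiry, then hard cutoff) can be sketched as below. This is a minimal Python illustration under stated assumptions, not the cameleer3-server implementation: the per-tier feature sets are read off the tier matrix in section 2 (Low-tier lineage is only "limited", which this sketch does not model), and `UPGRADE_URL`, the `license_expired` and `license_read_only` reason strings, and both function names are hypothetical. Only `LicenseContext` and `not_entitled` are names taken from the PRD.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # "7-day grace period (read-only mode)"
UPGRADE_URL = "https://example.invalid/billing/upgrade"  # hypothetical placeholder

_ALL = frozenset({"topology", "lineage", "correlation", "debugger", "replay"})
# Derived from the tier matrix; Low-tier lineage limits are not modelled here.
FEATURES_BY_TIER = {
    "low": frozenset({"topology", "lineage"}),
    "mid": frozenset({"topology", "lineage", "correlation"}),
    "high": _ALL,
    "business": _ALL,
}


@dataclass(frozen=True)
class LicenseContext:
    tenant_id: str
    tier: str
    features: frozenset
    expires_at: datetime


def license_mode(lic: LicenseContext, now: datetime) -> str:
    """Classify the license as 'active', 'read_only' (in grace) or 'expired'."""
    if now < lic.expires_at:
        return "active"
    if now < lic.expires_at + GRACE_PERIOD:
        return "read_only"
    return "expired"


def check_feature(lic: LicenseContext, feature: str, now: datetime,
                  is_write: bool = False) -> tuple:
    """Return (HTTP status, response body) for a feature endpoint call."""
    mode = license_mode(lic, now)
    if mode == "expired":
        return 403, {"reason": "license_expired"}  # hypothetical reason string
    if feature not in lic.features:
        return 403, {"reason": "not_entitled", "upgrade_url": UPGRADE_URL}
    if mode == "read_only" and is_write:
        return 403, {"reason": "license_read_only"}  # hypothetical reason string
    return 200, {}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    mid = LicenseContext("tenant-a", "mid", FEATURES_BY_TIER["mid"],
                         now + timedelta(days=30))
    print(check_feature(mid, "correlation", now))
    print(check_feature(mid, "replay", now))
```

Because both validation modes are specified to produce the same `LicenseContext`, a check like this would not need to know whether the license came from the platform API or a local file.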