# License Enforcement — Design

**Date:** 2026-04-25

**Status:** Approved (brainstorm); pending writing-plans

**Related:** cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)

## Problem

`cameleer-server` ships a license skeleton (`LicenseValidator`, `LicenseGate`, admin endpoint) but enforces nothing. Open mode (no license configured) currently grants *all* features and *no* limits, which is the opposite of what we want for a self-hosted distribution that needs to gate scale behind a paid license.

We want:

1. A self-hosted server with **no license** to operate within a small, hard-coded "default tier" that is enough to evaluate the product but not enough to run it in production.
2. Licenses to express **arbitrary per-customer limits** (no fixed tiers) on a vendor-defined set of resources: entity counts, compute footprint, retention.
3. A **standalone minter**, owned by the vendor, that signs licenses with an Ed25519 private key the customer never sees.
4. Licenses to be **persisted** on the server, **installable** via env var, file, or admin POST, and **renewable** by replacement.
5. **Revocation** handled out of band (the vendor suspends the SaaS tenant or issues short-`exp` licenses); no online revocation callback in v1.

## Non-goals

- Feature flags. The current `Feature` enum (topology/lineage/correlation/debugger/replay) is dead scaffolding and gets removed; this design covers quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Deferred to a follow-up.
- Online revocation. The vendor uses a shorter `exp` plus reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates are rejected.
- Minter keypair generation tooling. The vendor uses standard `openssl genpkey -algorithm ed25519` out of band.

---

## 1. Architecture

### 1.1 Module layout

```
cameleer-server-core/                  (existing: pure domain, no Spring)
└── license/
    ├── LicenseInfo                    (record, see §2)
    ├── LicenseLimits                  (typed wrapper over the limits map)
    ├── LicenseValidator               (existing, payload schema updated)
    ├── LicenseGate                    (existing, gutted: no Feature; getLimits() only)
    ├── LicenseStateMachine            (NEW; pure FSM over ABSENT / ACTIVE / GRACE / EXPIRED)
    └── DefaultTierLimits              (constant; §3.2 numbers)

cameleer-server-app/                   (existing: Spring, web, persistence)
├── license/
│   ├── LicenseRepository              (NEW; PostgreSQL persistence)
│   ├── LicenseService                 (NEW; load/save/replace; emits state events)
│   ├── LicenseEnforcer                (NEW; assertWithinCap entry point)
│   ├── LicenseUsageReader             (NEW; counts current usage for /usage endpoint)
│   ├── LicenseCapExceededException    (NEW; mapped to 403 by ControllerAdvice)
│   └── LicenseMetrics                 (NEW; Prometheus gauges)
├── controller/
│   ├── LicenseAdminController         (existing; extended, persists, audited)
│   └── LicenseUsageController         (NEW; GET /admin/license/usage)
└── config/
    └── LicenseBeanConfig              (existing; extended for DB load order)

cameleer-license-minter/               (NEW; top-level Maven module)
├── pom.xml                            (depends on cameleer-server-core)
├── LicenseMinter                      (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli               (CLI main class)
```

### 1.2 Why a separate `cameleer-license-minter` module

The minter is not shipped in the runtime JAR. The vendor distributes it independently or builds it from source on a trusted machine; customers never receive it. This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection: forging a license requires the vendor's private key, and the public key in the server is enough to reject forged tokens regardless of where the minter code lives.
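To make the last point concrete, here is a minimal sketch using the JDK's built-in Ed25519 provider (assuming Java 17; the design does not name a crypto library). It uses the `base64(payload).base64(signature)` envelope from §2 and shows that a token signed with any key other than the vendor's fails verification against the vendor public key alone. The class and method names (`TokenSignatureCheck`, `sign`, `verify`) are hypothetical and are not the existing `LicenseValidator` API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical illustration of §1.2: verification needs only the vendor public key. */
final class TokenSignatureCheck {

    /** Signs the payload bytes and returns base64(payload).base64(signature). */
    static String sign(byte[] payload, PrivateKey privateKey) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(privateKey);
        signer.update(payload);
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(payload) + "." + b64.encodeToString(signer.sign());
    }

    /** Server side: true only if the signature over the payload matches the vendor public key. */
    static boolean verify(String token, PublicKey vendorPublicKey) throws GeneralSecurityException {
        int dot = token.indexOf('.');
        if (dot < 0) {
            return false;                                   // malformed envelope
        }
        Base64.Decoder b64 = Base64.getDecoder();
        byte[] payload = b64.decode(token.substring(0, dot));
        byte[] signature = b64.decode(token.substring(dot + 1));
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(vendorPublicKey);
        verifier.update(payload);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("Ed25519");
        KeyPair vendor = gen.generateKeyPair();
        KeyPair attacker = gen.generateKeyPair();           // e.g. someone running a copy of the minter
        byte[] payload = "{\"licenseId\":\"demo\"}".getBytes(StandardCharsets.UTF_8);

        System.out.println(verify(sign(payload, vendor.getPrivate()), vendor.getPublic()));   // true
        System.out.println(verify(sign(payload, attacker.getPrivate()), vendor.getPublic())); // false
    }
}
```

Having the minter code does not help an attacker: without the vendor's private key, no signature they produce verifies against the public key the server is configured with.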
### 1.3 Dependency graph

```
cameleer-license-minter ──▶ cameleer-server-core    (LicenseInfo schema only)
cameleer-server-app     ──▶ cameleer-server-core    (validator, gate, FSM, defaults)
cameleer-saas           ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas           ──▶ cameleer-server-core    (transitive)
```

`cameleer-server-app` has **no** dependency on `cameleer-license-minter`.

---

## 2. License envelope

Wire format unchanged: `base64(payload).base64(ed25519_signature)`.

Payload schema:

```json
{
  "licenseId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "acme-corp",
  "label": "ACME prod 2026",
  "iat": 1745539200,
  "exp": 1777075200,
  "gracePeriodDays": 30,
  "limits": {
    "max_environments": 5,
    "max_apps": 50,
    "max_agents": 100,
    "max_users": 25,
    "max_outbound_connections": 10,
    "max_alert_rules": 200,
    "max_total_cpu_millis": 32000,
    "max_total_memory_mb": 65536,
    "max_total_replicas": 100,
    "max_execution_retention_days": 90,
    "max_log_retention_days": 30,
    "max_metric_retention_days": 365,
    "max_jar_retention_count": 10
  }
}
```

### 2.1 Field rules

| Field | Required | Notes |
|---|---|---|
| `licenseId` | yes | UUID. Used in audit and future revocation. |
| `tenantId` | optional | If present and different from `CAMELEER_SERVER_TENANT_ID`, the server treats the license as absent and logs an error. Air-gapped customers may omit it. |
| `label` | optional | Free-form human description. Surfaced in UI. |
| `iat` | yes | Issued-at time, Unix seconds. |
| `exp` | yes | Expiry time, Unix seconds. |
| `gracePeriodDays` | optional, default `0` | Number of days after `exp` during which the license limits still apply (state `GRACE`). |
| `limits.*` | each optional | A missing key inherits from `DefaultTierLimits`. A license can lift any subset. |

### 2.2 Removed from the current envelope

- `tier` (string): was a non-functional label. Folded into `label`.
- `features` (array): out of scope. `Feature` enum deleted.

---

## 3. License state machine

```
┌─────────┐  install valid  ┌────────┐  exp   ┌────────┐  exp + grace passes  ┌─────────┐
│ ABSENT  │ ──────────────▶ │ ACTIVE │ ─────▶ │ GRACE  │ ───────────────────▶ │ EXPIRED │
└─────────┘                 └────────┘        └────────┘                      └─────────┘
     ▲                          ▲
     │ install invalid          │ replace valid
     │ (sig/tenant/parse)       │ (from any state)

All transitions persist + audit-log.
```

### 3.1 State semantics

| State | Effective limits | Trigger |
|---|---|---|
| `ABSENT` | `DefaultTierLimits` | No DB row, or signature/tenant/parse failure. |
| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. |
| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI banner. |
| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. Distinct UI label vs. `ABSENT`. |

State is recomputed on every limit check (a clock comparison only), so no scheduler is needed for transitions. The only background behaviour is the Prometheus gauge refresh.

### 3.2 Default tier (the "no license" caps)

| Limit | Default |
|---|---|
| `max_environments` | 1 |
| `max_apps` | 3 |
| `max_agents` | 5 |
| `max_users` | 3 |
| `max_outbound_connections` | 1 |
| `max_alert_rules` | 2 |
| `max_total_cpu_millis` | 2000 (2 cores) |
| `max_total_memory_mb` | 2048 (2 GB) |
| `max_total_replicas` | 5 |
| `max_execution_retention_days` | 1 |
| `max_log_retention_days` | 1 |
| `max_metric_retention_days` | 1 |
| `max_jar_retention_count` | 3 |

Encoded as `public static final Map DEFAULTS` in `DefaultTierLimits`. Keys match the license payload exactly.

---

## 4. Enforcement map

Every limit check goes through one method on `LicenseEnforcer`:

```java
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
```

Throws `LicenseCapExceededException(limitKey, current, cap)` when `currentUsage + requestedDelta > cap`.
A `@ControllerAdvice` maps it to `403` with body `{"error":"license cap reached","limit":"max_apps","current":3,"cap":3}`.

| Limit | Call site | Failure response |
|---|---|---|
| `max_environments` | `EnvironmentService.create` (start) | 403 |
| `max_apps` | `AppService.createApp` | 403 |
| `max_agents` | `AgentRegistryService.register` | 403; the agent is treated as unregistered (no SSE, no commands) |
| `max_users` | `UserAdminController.createUser` and `OidcAuthController.callback` (auto-signup) | 403 / OIDC login failure |
| `max_outbound_connections` | `OutboundConnectionServiceImpl.create` | 403 |
| `max_alert_rules` | `AlertRuleController.create` | 403 |
| `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deployments plus the new one) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row |
| `max_total_memory_mb` | same | same |
| `max_total_replicas` | same | same |
| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.1) + `ClickHouseSchemaInitializer.applyRetention()` at boot | 422 on update; boot pins the effective TTL to `min(licenseCap, configured)` |
| `max_log_retention_days` | same | same |
| `max_metric_retention_days` | same | same |
| `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 |

### 4.1 Per-environment retention fields

Three new columns on `environments` (Flyway V2):

```sql
ALTER TABLE environments
    ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
    ADD COLUMN log_retention_days       INTEGER NOT NULL DEFAULT 1,
    ADD COLUMN metric_retention_days    INTEGER NOT NULL DEFAULT 1;
```

These hold the configured per-environment values. The effective ClickHouse TTL is `min(licenseCap, configured)`, applied at startup by `ClickHouseSchemaInitializer`. The admin UI surfaces the configured values; `EnvironmentService.update` rejects values above the license cap with 422.
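The enforcement contract from §3 and §4 can be sketched in plain Java. This is a hypothetical illustration, not the `LicenseEnforcer` implementation: it passes the effective limits map explicitly (the design's `assertWithinCap` takes only the key, usage and delta), uses a nested `License` record as a stand-in for `LicenseInfo`, treats `license == null` as every ABSENT trigger (no row, bad signature, tenant mismatch, parse failure), and copies only three of the §3.2 defaults.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of §3 state computation and §4 cap checks; not the real LicenseEnforcer. */
final class LicenseEnforcerSketch {

    enum State { ABSENT, ACTIVE, GRACE, EXPIRED }

    /** Stand-in for LicenseInfo: only the fields the state machine and caps need. */
    record License(Instant exp, int gracePeriodDays, Map<String, Long> limits) {}

    static final class LicenseCapExceededException extends RuntimeException {
        final String limitKey;
        final long current;
        final long cap;

        LicenseCapExceededException(String limitKey, long current, long cap) {
            super("license cap reached: " + limitKey + " (current " + current + ", cap " + cap + ")");
            this.limitKey = limitKey;
            this.current = current;
            this.cap = cap;
        }
    }

    // Three of the §3.2 default-tier values, for illustration only.
    static final Map<String, Long> DEFAULTS = Map.of(
            "max_apps", 3L,
            "max_agents", 5L,
            "max_execution_retention_days", 1L);

    /** Pure clock comparison; null stands for every ABSENT trigger. */
    static State state(License license, Instant now) {
        if (license == null) return State.ABSENT;
        if (now.isBefore(license.exp())) return State.ACTIVE;                   // now < exp
        Instant graceEnd = license.exp().plus(Duration.ofDays(license.gracePeriodDays()));
        return now.isBefore(graceEnd) ? State.GRACE : State.EXPIRED;          // exp <= now < exp + grace
    }

    /** ACTIVE and GRACE: merge(default, license.limits); ABSENT and EXPIRED: defaults only. */
    static Map<String, Long> effectiveLimits(License license, Instant now) {
        Map<String, Long> merged = new HashMap<>(DEFAULTS);
        State s = state(license, now);
        if (s == State.ACTIVE || s == State.GRACE) {
            merged.putAll(license.limits());                                   // explicit license keys win
        }
        return merged;
    }

    static void assertWithinCap(Map<String, Long> limits, String limitKey, long currentUsage, long requestedDelta) {
        long cap = limits.get(limitKey);  // assumes every known key has a default; unknown key -> NPE
        if (currentUsage + requestedDelta > cap) {
            throw new LicenseCapExceededException(limitKey, currentUsage, cap);
        }
    }

    /** Effective ClickHouse TTL for one retention limit (§4.1): min(licenseCap, configured). */
    static long effectiveRetentionDays(Map<String, Long> limits, String limitKey, long configuredDays) {
        return Math.min(limits.get(limitKey), configuredDays);
    }
}
```

With `gracePeriodDays = 0` the GRACE window is empty, so the state moves straight from ACTIVE to EXPIRED at `exp`. Because the state is derived from the clock on every call, no scheduler is involved, matching §3.1.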
### 4.2 Boot-time invariant

If a license is installed that *lowers* a cap below current usage (for example, 10 apps exist but the new license allows 5), the server logs one WARN per affected limit at boot. **No deletion**: new creates are rejected; existing resources keep working.

---

## 5. Usage endpoint

`GET /api/v1/admin/license/usage` (ADMIN only):

```json
{
  "state": "ACTIVE",
  "expiresAt": "2027-04-25T00:00:00Z",
  "daysRemaining": 365,
  "gracePeriodDays": 30,
  "tenantId": "acme-corp",
  "label": "ACME prod 2026",
  "limits": [
    {"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
    {"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
    {"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
    {"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
  ]
}
```

`source` is `"default"` when the cap comes from `DefaultTierLimits` (the license omits this key, or there is no license) and `"license"` when the cap is explicit in the license. It drives the SaaS UI's "free tier" badge.

`LicenseUsageReader` issues one cheap aggregate per limit: a `SELECT COUNT(*)` per entity table, plus a single `SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas)` over non-stopped deployments that covers all three compute limits.

`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope}`, with the raw token omitted from the response.

---

## 6. Lifecycle, persistence, install paths

### 6.1 Storage

Flyway V2 migration:

```sql
CREATE TABLE license (
    tenant_id    TEXT PRIMARY KEY,      -- one row per server (= one tenant)
    token        TEXT NOT NULL,         -- full signed token
    license_id   UUID NOT NULL,
    installed_at TIMESTAMPTZ NOT NULL,
    installed_by TEXT NOT NULL,         -- users.user_id (bare) or 'system' for env/file boot
    expires_at   TIMESTAMPTZ NOT NULL
);
```

### 6.2 Boot order

`LicenseBeanConfig`:

1. If the `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite) → load.
2. Else, if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
3. Else, if a `license` row exists in the DB → validate → load.
4. Otherwise → `ABSENT`.

The env var and file act as **idempotent overrides**: they always win and replace the DB row, so the operator's last action survives reboots. Consequently, a license installed at runtime via POST is replaced again at the next boot for as long as the env var or file remains configured.

### 6.3 Runtime install

`POST /api/v1/admin/license { "token": "..." }` (existing):

- Validates against the configured public key.
- On success, persists to the `license` table (`installed_by = user_id`), updates the in-memory `LicenseGate`, and audits.
- On failure, returns 400 with the validator error message and audits the rejection.

### 6.4 Public key custody

`CAMELEER_SERVER_LICENSE_PUBLICKEY` (existing) remains the only verification key. It is a build- or deploy-time setting bound to the vendor distribution and is **not stored in the DB**. If it is unset *and* a license is present, all licenses are rejected (existing behaviour).

### 6.5 Audit trail

New `AuditCategory.LICENSE`. Actions:

| Action | When | Payload |
|---|---|---|
| `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (`source` = `env`/`file`/`api`) |
| `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
| `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy}` |

---

## 7. Minter

### 7.1 `LicenseMinter` (library)

Pure function, packaged in `cameleer-license-minter`:

```java
public final class LicenseMinter {
    public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
```

Serializes `LicenseInfo` to canonical JSON (sorted keys), signs those bytes with Ed25519, and returns `base64(payload).base64(signature)`. cameleer-saas calls this directly to mint per-tenant tokens.
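A minimal sketch of what `mint` could do, assuming Java 17 (pattern-matching `instanceof`, built-in Ed25519 provider). It is hypothetical in two ways: a plain `Map` stands in for `LicenseInfo`, and the hand-rolled serializer handles only strings, integers, booleans and nested maps. Keys are sorted with a `TreeMap` and no whitespace is emitted, so two maps with the same content serialize to the same bytes regardless of insertion order.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of LicenseMinter.mint; a plain Map stands in for LicenseInfo. */
final class LicenseMinterSketch {

    static String mint(Map<String, ?> payload, PrivateKey ed25519PrivateKey) throws GeneralSecurityException {
        byte[] json = canonicalJson(payload).getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(ed25519PrivateKey);
        signer.update(json);                                  // sign the canonical JSON bytes
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(json) + "." + b64.encodeToString(signer.sign());
    }

    /** Canonical JSON: keys sorted, no whitespace. Supports String, integer Number, Boolean, nested Map. */
    static String canonicalJson(Object value) {
        if (value instanceof Map<?, ?> map) {
            TreeMap<String, Object> sorted = new TreeMap<>();
            map.forEach((k, v) -> sorted.put(String.valueOf(k), v));
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(canonicalJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof String s) return quote(s);
        if (value instanceof Number || value instanceof Boolean) return String.valueOf(value);
        throw new IllegalArgumentException("unsupported JSON value: " + value);
    }

    /** JSON string literal with quote, backslash and control-character escaping. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

Because Ed25519 signing is deterministic (RFC 8032), the same key and the same canonical payload also yield a byte-identical token.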
### 7.2 `LicenseMinterCli` (CLI)

```bash
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
    --private-key=/secure/vendor.key \
    --tenant=acme-corp \
    --label="ACME prod 2026" \
    --expires=2027-04-25 \
    --grace-days=30 \
    --max-apps=50 \
    --max-agents=100 \
    --max-total-cpu-millis=32000 \
    --max-total-memory-mb=65536 \
    --max-execution-retention-days=90 \
    --output=acme-license.tok
```

- `--private-key` reads a PEM-encoded Ed25519 private key (the output of `openssl genpkey -algorithm ed25519`).
- Unspecified `--max-*` flags are omitted from the payload, so the license inherits the default for that key.
- Unknown flags fail fast.
- `--output` writes the token to a file; if omitted, the token is printed to stdout.

Keypair generation happens **out of band**: the vendor uses `openssl` and stores both halves in their secret manager. We deliberately do not ship a `--gen-keypair` subcommand, to keep the boundary clean.

---

## 8. Telemetry

Prometheus metrics scraped via `/api/v1/prometheus`:

| Metric | Labels | Notes |
|---|---|---|
| `cameleer_license_state` | `state` (`ABSENT`, `ACTIVE`, `GRACE`, `EXPIRED`) | Gauge, one series per state with value 0 or 1; exactly one is 1. |
| `cameleer_license_days_remaining` | (none) | Gauge; days until `exp`, negative in GRACE/EXPIRED. |
| `cameleer_license_limit_utilisation` | `limit="max_apps"` etc. | Gauge, `current / cap`; at least 0 and above 1 when usage exceeds a lowered cap. |
| `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |

State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED, and `WARN` on cap rejection (sampled to avoid log spam).

---

## 9. Dead-code removal

Performed in the **first commit** of the implementation. Per the project's "no backwards compatibility shims" preference, there is no deprecated path or feature flag.

- Delete `Feature.java`.
- Delete `LicenseGate.isEnabled(Feature)`.
- Delete the `LicenseInfo.features` field and `LicenseInfo.hasFeature(Feature)`.
- Delete `LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled` and the `Set.of(Feature.values())` assertion on `LicenseInfo.open()`.
- Update `LicenseValidator` to ignore `features` if present in old tokens (silently dropped, not an error).

---

## 10. Testing

| Layer | Tests |
|---|---|
| Core unit | `LicenseValidatorTest`: signature, expiry, tenant mismatch, missing required fields, unknown extra fields. |
| Core unit | `LicenseStateMachineTest`: all four transitions including the grace boundary, replace from any state, invalid install. |
| Core unit | `DefaultTierLimitsTest`: every documented key has a default. |
| Minter unit | `LicenseMinterTest`: round-trip with a throwaway Ed25519 keypair; canonical JSON is stable across runs. |
| Minter CLI | `LicenseMinterCliTest`: invokes `main` with `--private-key=tmp` and checks that the output token validates. |
| App unit | `LicenseEnforcerTest`: for each limit, cap reached, under cap, default tier with no license, missing cap inherits default. |
| App integration | `LicenseLifecycleIT`: install via env, replace via POST, restart restores from DB. Driven through REST. |
| App integration | `LicenseEnforcementIT`: REST-driven; hits each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes a `cap_exceeded` audit row check. |
| Boot | `SchemaBootstrapIT` extension: `license` table exists, `environments` retention columns exist, retention pinning honoured at boot. |

No raw-SQL seeding of caps in ITs. All caps are installed via the REST endpoint or the env var.

---

## 11. Open follow-ups (deliberately deferred)

- Ingestion-rate limits (`max_executions_per_minute`, `max_logs_per_minute`).
- Online revocation callback (the `revocation_check_url` envelope field).
- Concurrent debug session limit (`max_concurrent_debug_sessions` from the SaaS epic).
- A "license usage history" report so vendors can see growth over time.
- Open a tracking issue on `cameleer/cameleer-server` (Gitea); none exists today.

---

## 12. Risk register

| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults are documented; the vendor can ship a longer-`exp` "trial" license at install time if needed. |
| Customer raises `gracePeriodDays` (or any limit) by editing the token. | The token is signed; any edit invalidates the signature. |
| License removed from the DB out of band: the server lands in ABSENT and rejects new resources, while existing ones already exceed the default tier. | Boot-time WARN per over-cap limit. UI banner on the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; vendors rotate by redeploying with the new key. |
| Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | The existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
| Per-env retention column added, but old ClickHouse partitions are retained longer. | Documented: ClickHouse applies the TTL change on its next merge cycle. New rows always honour the new TTL. |
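The compute-cap row above depends on `DeploymentExecutor.PRE_FLIGHT` summing replicas × CPU and replicas × memory over non-stopped deployments plus the new one (§4), and rejecting unset compute fields. A hypothetical sketch of that check follows, assuming Java 17; the `Deployment`, `Caps` and `Status` types and the reason strings are illustrative assumptions, not existing classes. Boxed `Integer` fields model "unset" as `null`.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the PRE_FLIGHT compute-cap check; not the real DeploymentExecutor. */
final class ComputePreFlightSketch {

    enum Status { STARTING, RUNNING, STOPPED }

    /** Boxed cpu/memory so that "unset" can be represented as null. */
    record Deployment(Status status, int replicas, Integer cpuMillis, Integer memoryMb) {}

    /** Effective caps, e.g. the §3.2 defaults: 2000 millicores, 2048 MB, 5 replicas. */
    record Caps(long maxTotalCpuMillis, long maxTotalMemoryMb, long maxTotalReplicas) {}

    /** Empty result: the deploy may proceed. Otherwise: the reason recorded with status FAILED. */
    static Optional<String> preFlight(List<Deployment> existing, Deployment candidate, Caps caps) {
        if (candidate.cpuMillis() == null || candidate.memoryMb() == null) {
            return Optional.of("compute limits unset");     // reject before doing any arithmetic
        }
        long cpu = (long) candidate.replicas() * candidate.cpuMillis();
        long memory = (long) candidate.replicas() * candidate.memoryMb();
        long replicas = candidate.replicas();
        for (Deployment d : existing) {
            if (d.status() == Status.STOPPED) continue;      // only non-stopped deployments count
            // Assumes existing deployments already passed PRE_FLIGHT, so their fields are set.
            cpu += (long) d.replicas() * d.cpuMillis();
            memory += (long) d.replicas() * d.memoryMb();
            replicas += d.replicas();
        }
        if (replicas > caps.maxTotalReplicas()) {
            return Optional.of("max_total_replicas: " + replicas + " > " + caps.maxTotalReplicas());
        }
        if (cpu > caps.maxTotalCpuMillis()) {
            return Optional.of("max_total_cpu_millis: " + cpu + " > " + caps.maxTotalCpuMillis());
        }
        if (memory > caps.maxTotalMemoryMb()) {
            return Optional.of("max_total_memory_mb: " + memory + " > " + caps.maxTotalMemoryMb());
        }
        return Optional.empty();
    }
}
```

The strict `>` comparisons follow §4's `currentUsage + requestedDelta > cap` rule, so a deploy that lands exactly on a cap is allowed.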