cameleer-server/docs/superpowers/specs/2026-04-25-license-enforcement-design.md
hsiegeln e0be6a069f docs(license): apply review feedback to enforcement design
- Add INVALID state to FSM (signature/tenant/parse failure ≠ ABSENT)
  with loud UI/audit/metric severity; ABSENT stays a calm state.
- Make tenantId required in the license envelope (it's already inside
  the signed payload, so a self-hosted customer cannot strip it).
- Move ClickHouse TTL recompute from boot-only to a
  RetentionPolicyApplier @EventListener(LicenseChangedEvent), so a
  long-running server that lands in EXPIRED tightens TTL automatically.
- Add LicenseRevalidationJob (daily) that re-runs signature check
  against the DB row and updates last_validated_at; transitions to
  INVALID on failure (catches public-key rotation drift).
- Add last_validated_at column to the license table, surfaced on the
  /usage endpoint and as cameleer_license_last_validated_age_seconds.
- Enrich enforcement-failure responses and the /usage endpoint with a
  per-state human-readable message so 403s and the UI both explain
  WHY caps changed.
- Add --verify (with --public-key) to the minter CLI to round-trip a
  freshly-minted token through LicenseValidator before shipping it,
  deleting the output file on verify failure.
- Add corresponding tests, telemetry gauge, and a runtime-recompute IT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 09:42:16 +02:00

# License Enforcement — Design
**Date:** 2026-04-25
**Status:** Approved (brainstorm); pending writing-plans
**Related:** cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)
## Problem
`cameleer-server` ships a license skeleton (`LicenseValidator`, `LicenseGate`, admin endpoint) but
nothing enforces anything. Open mode (no license configured) currently grants *all* features and
*no* limits — the opposite of what we want for a self-hosted distribution that needs to gate scale
behind a paid license.
We want:
1. A self-hosted server with **no license** to operate within a small, hard-coded "default tier"
that is enough to evaluate the product but not enough to run it in production.
2. Licenses to express **arbitrary per-customer limits** (no fixed tiers) on a vendor-defined set
of resources: entity counts, compute footprint, retention.
3. A **standalone minter** owned by the vendor that signs licenses with an Ed25519 private key the
customer never sees.
4. Licenses to be **persisted** on the server, **installable** via env var, file, or admin POST,
and **renewable** by replacement.
5. **Revocation** handled out of band (vendor suspends the SaaS tenant, or issues short-`exp`
licenses) — no online revocation callback in v1.
## Non-goals
- Feature flags. The current `Feature` enum (topology/lineage/correlation/debugger/replay) is dead
scaffolding and gets removed; this design is about quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Defer to a follow-up.
- Online revocation. Vendor uses shorter `exp` + reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates reject.
- Minter keypair generation tooling. Vendor uses standard `openssl genpkey -algorithm ed25519`
out of band.
---
## 1. Architecture
### 1.1 Module layout
```
cameleer-server-core/                        (existing — pure domain, no Spring)
└── license/
    ├── LicenseInfo                          (record — see §2)
    ├── LicenseLimits                        (typed wrapper over the limits map)
    ├── LicenseValidator                     (existing, payload schema updated)
    ├── LicenseGate                          (existing, gutted: no Feature; getLimits() only)
    ├── LicenseStateMachine                  (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED / INVALID)
    └── DefaultTierLimits                    (constant — §3.2 numbers)
cameleer-server-app/                         (existing — Spring, web, persistence)
├── license/
│   ├── LicenseRepository                    (NEW — PostgreSQL persistence)
│   ├── LicenseService                       (NEW — load/save/replace; publishes LicenseChangedEvent)
│   ├── LicenseEnforcer                      (NEW — assertWithinCap entry point)
│   ├── LicenseUsageReader                   (NEW — counts current usage for /usage endpoint)
│   ├── LicenseCapExceededException          (NEW — mapped to 403 by ControllerAdvice)
│   ├── LicenseRevalidationJob               (NEW — @Scheduled daily; updates last_validated_at)
│   ├── RetentionPolicyApplier               (NEW — @EventListener(LicenseChangedEvent); recomputes ClickHouse TTL + per-env caps)
│   └── LicenseMetrics                       (NEW — Prometheus gauges)
├── controller/
│   ├── LicenseAdminController               (existing — extended; persists, audited)
│   └── LicenseUsageController               (NEW — GET /admin/license/usage)
└── config/
    └── LicenseBeanConfig                    (existing — extended for DB load order)
cameleer-license-minter/                     (NEW — top-level Maven module)
├── pom.xml                                  (depends on cameleer-server-core)
├── LicenseMinter                            (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli                     (CLI main class, supports --verify)
```
### 1.2 Why a separate `cameleer-license-minter` module
The minter is not shipped in the runtime JAR. The vendor distributes it independently or builds it
from source on a trusted machine. Customers never receive it.
This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection:
forging a license requires the vendor's private key, and the public key in the server is enough to
reject forged tokens regardless of where the minter code lives.
### 1.3 Dependency graph
```
cameleer-license-minter ──▶ cameleer-server-core (LicenseInfo schema only)
cameleer-server-app ──▶ cameleer-server-core (validator, gate, FSM, defaults)
cameleer-saas ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas ──▶ cameleer-server-core (transitive)
```
`cameleer-server-app` has **no** dependency on `cameleer-license-minter`.
---
## 2. License envelope
Wire format unchanged: `base64(payload).base64(ed25519_signature)`. Payload schema:
```json
{
  "licenseId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "acme-corp",
  "label": "ACME prod 2026 — site:hamburg",
  "iat": 1745539200,
  "exp": 1777075200,
  "gracePeriodDays": 30,
  "limits": {
    "max_environments": 5,
    "max_apps": 50,
    "max_agents": 100,
    "max_users": 25,
    "max_outbound_connections": 10,
    "max_alert_rules": 200,
    "max_total_cpu_millis": 32000,
    "max_total_memory_mb": 65536,
    "max_total_replicas": 100,
    "max_execution_retention_days": 90,
    "max_log_retention_days": 30,
    "max_metric_retention_days": 365,
    "max_jar_retention_count": 10
  }
}
```
### 2.1 Field rules
| Field | Required | Notes |
|---|---|---|
| `licenseId` | yes | UUID. Used in audit + future revocation. |
| `tenantId` | **yes** | Must match `CAMELEER_SERVER_TENANT_ID`. Mismatch = `INVALID` state (see §3). The field is inside the signed payload, so a self-hosted customer cannot strip it to make a license portable across tenants — any edit invalidates the signature. Air-gapped customers receive a license bound to a vendor-issued tenant id (not necessarily a UUID — any non-empty slug). |
| `label` | optional | Free-form human description. Surfaced in UI. |
| `iat` | yes | Unix seconds. |
| `exp` | yes | Unix seconds. |
| `gracePeriodDays` | optional, default `0` | Days `exp` may be in the past while limits still apply. |
| `limits.*` | each optional | Missing key inherits from `DefaultTierLimits`. A license can lift any subset. |
### 2.2 Removed from the current envelope
- `tier` (string) — was a non-functional label. Folded into `label`.
- `features` (array) — out of scope. `Feature` enum deleted.
---
## 3. License state machine
```
ABSENT ── install valid ──▶ ACTIVE ── exp ──▶ GRACE ── exp + grace passes ──▶ EXPIRED

ACTIVE, GRACE, EXPIRED ── replace valid ──▶ ACTIVE
INVALID                ── replace valid ──▶ ACTIVE
install fails (signature / tenant / parse / public-key-missing) ──▶ INVALID

all transitions persist + audit-log
```
### 3.1 State semantics
| State | Effective limits | Trigger | Severity |
|--- |--- |--- |--- |
| `ABSENT` | `DefaultTierLimits` | No DB row. Clean install with no license configured. | INFO |
| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | INFO |
| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI warning banner. | WARN |
| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. UI label distinct from ABSENT. | ERROR |
| `INVALID` | `DefaultTierLimits` | Signature failure, tenant mismatch, parse error, or public key not configured but a token is present. | **ERROR — loud** |
`ABSENT` and `INVALID` produce the same enforcement (default tier) but are surfaced very
differently:
- **`ABSENT`** is a clean state — fresh install, no license yet. UI shows a calm "Install a
license to lift the default-tier caps" call to action. No audit row beyond the boot log line.
- **`INVALID`** is an active error — tampering, wrong public key, or a paste that lost
characters. UI shows a red banner with the validator's error message
(e.g. "License signature verification failed", "License tenantId 'acme-corp' does not match
server tenant 'beta-corp'"). Audit row written under
`AuditCategory.LICENSE` action `reject_license`. Prometheus
`cameleer_license_state{state="INVALID"} = 1` so an alert can fire.
State is recomputed on every limit check (clock comparison only against parsed in-memory
`LicenseInfo`) — no scheduler needed for `ACTIVE → GRACE → EXPIRED` transitions. A separate
**daily revalidation job** (§6.6) re-runs the signature check against the DB row to catch slow
failures like public-key rotation drift.
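
The clock-driven part of the table above can be sketched as a pure function. This is a minimal illustration, not the actual `LicenseStateMachine` API: the class name `LicenseStateSketch` and the method `stateAt` are hypothetical, and the sketch assumes the license has already passed validation (`ABSENT` and `INVALID` are decided at load or install time, not by the clock).

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: names are illustrative, not the real LicenseStateMachine API.
final class LicenseStateSketch {

    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    /**
     * Time-based state for a license that already passed signature,
     * tenant, and parse checks. Mirrors the Trigger column in §3.1.
     */
    static State stateAt(Instant now, Instant exp, int gracePeriodDays) {
        Instant graceEnd = exp.plus(Duration.ofDays(gracePeriodDays));
        if (now.isBefore(exp)) {
            return State.ACTIVE;   // now < exp
        }
        if (now.isBefore(graceEnd)) {
            return State.GRACE;    // exp <= now < exp + gracePeriodDays
        }
        return State.EXPIRED;      // now >= exp + gracePeriodDays
    }
}
```

With `gracePeriodDays = 0` the `GRACE` window is empty, so a license goes straight from `ACTIVE` to `EXPIRED` at `exp`.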
### 3.2 Default tier (the "no license" caps)
| Limit | Default |
|---|---|
| `max_environments` | 1 |
| `max_apps` | 3 |
| `max_agents` | 5 |
| `max_users` | 3 |
| `max_outbound_connections` | 1 |
| `max_alert_rules` | 2 |
| `max_total_cpu_millis` | 2000 (2 cores) |
| `max_total_memory_mb` | 2048 (2 GB) |
| `max_total_replicas` | 5 |
| `max_execution_retention_days` | 1 |
| `max_log_retention_days` | 1 |
| `max_metric_retention_days` | 1 |
| `max_jar_retention_count` | 3 |
Encoded as `public static final Map<String, Integer> DEFAULTS` in `DefaultTierLimits`. Keys
match the license payload exactly.
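
A minimal sketch of `merge(default, license.limits)` from §3.1, assuming a plain per-key override: any key present in the license replaces the default, and any missing key inherits it. The class `EffectiveLimitsSketch` is hypothetical, and only a subset of the defaults is reproduced for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: illustrates per-key override; not the real DefaultTierLimits class.
final class EffectiveLimitsSketch {

    // Subset of the §3.2 table.
    static final Map<String, Integer> DEFAULTS = Map.of(
            "max_environments", 1,
            "max_apps", 3,
            "max_agents", 5);

    /** Starts from the defaults, then lets each key in the license override its default. */
    static Map<String, Integer> merge(Map<String, Integer> defaults,
                                      Map<String, Integer> licenseLimits) {
        Map<String, Integer> effective = new HashMap<>(defaults);
        effective.putAll(licenseLimits);
        return Map.copyOf(effective);
    }
}
```

For example, a license carrying only `max_apps = 50` yields `max_apps = 50` while `max_agents` stays at the default of 5.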
---
## 4. Enforcement map
Every limit check goes through one method on `LicenseEnforcer`:
```java
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
```
Throws `LicenseCapExceededException(limitKey, current, cap)` when `currentUsage + requestedDelta > cap`.
A `@ControllerAdvice` maps it to `403` with a body that explains the "why" so operators can act
without grepping logs:
```json
{
  "error": "license cap reached",
  "limit": "max_apps",
  "current": 3,
  "cap": 3,
  "state": "EXPIRED",
  "message": "License expired 5 days ago: system reverted to default tier (3 apps). Current usage is 3. Install or renew the license to create more apps."
}
```
The `message` field is rendered server-side from a small template per state:
| State | Message template |
|--- |---|
| `ABSENT` | "No license installed: default tier applies (cap = {cap} for {limit}). Install a license to raise this." |
| `ACTIVE` | "License cap reached: {limit} = {cap}. Current usage is {current}. Contact your vendor to raise the cap." |
| `GRACE` | "License expired {n} day(s) ago and is in its grace period (ends in {m} days). Cap unchanged at {cap}. Renew before grace ends." |
| `EXPIRED`| "License expired {n} days ago: system reverted to default tier (cap = {cap} for {limit}). Current usage is {current}. Renew the license to lift the cap." |
| `INVALID`| "License rejected ({reason}): default tier applies (cap = {cap} for {limit}). Fix the license to raise this." |
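
The cap check itself is small. The following is a sketch under stated assumptions rather than the real `LicenseEnforcer`: `LicenseEnforcerSketch` and its nested `CapExceeded` exception are hypothetical stand-ins, the effective limits are passed in as a plain map, and treating an unknown limit key as a programming error is an assumption of this sketch.

```java
import java.util.Map;

// Sketch only: stand-in for LicenseEnforcer / LicenseCapExceededException.
final class LicenseEnforcerSketch {

    /** Hypothetical stand-in for LicenseCapExceededException(limitKey, current, cap). */
    static final class CapExceeded extends RuntimeException {
        final String limitKey;
        final long current;
        final long cap;

        CapExceeded(String limitKey, long current, long cap) {
            super("license cap reached: " + limitKey + " (current=" + current + ", cap=" + cap + ")");
            this.limitKey = limitKey;
            this.current = current;
            this.cap = cap;
        }
    }

    private final Map<String, Integer> effectiveLimits;

    LicenseEnforcerSketch(Map<String, Integer> effectiveLimits) {
        this.effectiveLimits = effectiveLimits;
    }

    /** Throws when currentUsage + requestedDelta would exceed the effective cap. */
    void assertWithinCap(String limitKey, long currentUsage, long requestedDelta) {
        Integer cap = effectiveLimits.get(limitKey);
        if (cap == null) {
            // Assumption: an unknown key is a caller bug, not a license problem.
            throw new IllegalArgumentException("unknown limit key: " + limitKey);
        }
        if (currentUsage + requestedDelta > cap) {
            throw new CapExceeded(limitKey, currentUsage, cap);
        }
    }
}
```

With `max_apps = 3`, creating a third app (`currentUsage = 2`, `requestedDelta = 1`) passes, while creating a fourth (`currentUsage = 3`) throws.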
### 4.1 Per-limit call sites
| Limit | Call site | Failure response |
|---|---|---|
| `max_environments` | `EnvironmentService.create` (start) | 403 |
| `max_apps` | `AppService.createApp` | 403 |
| `max_agents` | `AgentRegistryService.register` | 403 — agent treated as unregistered (no SSE, no commands) |
| `max_users` | `UserAdminController.createUser` and `OidcAuthController.callback` (auto-signup) | 403 / OIDC login failure |
| `max_outbound_connections` | `OutboundConnectionServiceImpl.create` | 403 |
| `max_alert_rules` | `AlertRuleController.create` | 403 |
| `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deploys + new) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row |
| `max_total_memory_mb` | same | same |
| `max_total_replicas` | same | same |
| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.2) + `RetentionPolicyApplier` (see §4.3) | 422 on update; ClickHouse TTL recomputed on every license change |
| `max_log_retention_days` | same | same |
| `max_metric_retention_days` | same | same |
| `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 |
### 4.2 Per-environment retention fields
Three new columns on `environments` (Flyway V2):
```sql
ALTER TABLE environments
  ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
  ADD COLUMN log_retention_days INTEGER NOT NULL DEFAULT 1,
  ADD COLUMN metric_retention_days INTEGER NOT NULL DEFAULT 1;
```
These are the configured per-env values. The effective ClickHouse TTL is
`min(licenseCap, configured)`. Admin UI surfaces the configured values;
`EnvironmentService.update` rejects values above the license cap with 422.
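
The two rules above (effective TTL and update validation) reduce to a `min` and a comparison. A minimal sketch with hypothetical names (`RetentionSketch`, `effectiveDays`, `exceedsCap`):

```java
// Sketch only: helper names are illustrative.
final class RetentionSketch {

    /** Effective ClickHouse TTL in days: min(licenseCap, configured). */
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** True when an update request should be rejected with 422. */
    static boolean exceedsCap(int requestedDays, int licenseCapDays) {
        return requestedDays > licenseCapDays;
    }
}
```

For an environment configured at 30 days under a license cap of 90, the effective TTL is 30. If the license later expires and the cap falls back to the default of 1, the effective TTL becomes 1 while the configured value stays 30 in the `environments` row.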
### 4.3 Runtime retention recompute
`RetentionPolicyApplier` is `@EventListener(LicenseChangedEvent)`:
- Triggered on every `LicenseService.replace(...)` (boot install, env-var override, file
override, POST `/admin/license`) **and** on every state transition the revalidation job
detects (e.g. license becomes `EXPIRED`, caps drop to default).
- Recomputes the effective TTL per env (`min(licenseCap, configured)`), then issues
`ALTER TABLE … MODIFY TTL …` on the affected ClickHouse tables (executions, processors,
logs, metrics, route_diagrams, agent_events). One ALTER per table per affected env.
- Errors are logged WARN; a failed ALTER does not block the license install — the operator can
retry by reposting the license. The previous TTL keeps applying until the next successful
ALTER.
- At boot, `LicenseService.loadInitial(...)` publishes one `LicenseChangedEvent` after the
load order in §6.2 settles, so the boot path goes through the same applier as runtime
changes.
Result: a server that stays up for months and lands in `EXPIRED` will see ClickHouse TTLs
collapse to default-tier values automatically — no restart needed.
### 4.4 Boot-time invariant
If a license is added that *lowers* a cap below current usage (10 apps, license now allows 5), the
server logs one WARN per limit at boot. **No deletion**. New creates reject; existing resources
keep working.
---
## 5. Usage endpoint
`GET /api/v1/admin/license/usage` (ADMIN only):
```json
{
  "state": "ACTIVE",
  "expiresAt": "2027-04-25T00:00:00Z",
  "daysRemaining": 365,
  "gracePeriodDays": 30,
  "tenantId": "acme-corp",
  "label": "ACME prod 2026",
  "lastValidatedAt": "2026-04-26T03:14:07Z",
  "message": "License active. 365 days remaining.",
  "limits": [
    {"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
    {"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
    {"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
    {"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
  ]
}
```
`source` is `"default"` when the cap comes from `DefaultTierLimits` (i.e. the license omits this
key, or there is no license), and `"license"` when the cap is explicit in the license. Drives the
SaaS UI's "free tier" badge.
`message` carries the same human-readable explanation that the 403 body uses, varying by state:
- `ABSENT` — "No license installed. Default tier applies."
- `ACTIVE` — "License active. {n} days remaining."
- `GRACE` — "License expired {n} days ago. Grace period ends in {m} days. Renew now to avoid degradation."
- `EXPIRED`— "License expired {n} days ago. System reverted to default tier."
- `INVALID`— "License rejected: {reason}. Default tier applies. Fix the license to recover."
`LicenseUsageReader` issues one cheap aggregate per limit (`SELECT COUNT(*)` per entity table; a
single grouped `SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas)` over
non-stopped deployments).
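
The compute aggregate is the same arithmetic the PRE_FLIGHT check in §4.1 needs. The sketch below restates the grouped `SUM` in Java for illustration only; the real reader issues SQL. The record `Deployment` and its field names are assumptions modelled on the column names in the query above.

```java
import java.util.List;

// Sketch only: in-memory restatement of the grouped SUM query.
final class ComputeUsageSketch {

    /** Hypothetical projection of a deployment row. */
    record Deployment(int replicas, int cpuMillis, int memoryMb, boolean stopped) {}

    record Totals(long cpuMillis, long memoryMb, long replicas) {}

    /** Sums replicas * cpuMillis, replicas * memoryMb, and replicas over non-stopped deployments. */
    static Totals totals(List<Deployment> deployments) {
        long cpu = 0;
        long mem = 0;
        long reps = 0;
        for (Deployment d : deployments) {
            if (d.stopped()) {
                continue; // stopped deployments do not count against compute caps
            }
            cpu += (long) d.replicas() * d.cpuMillis();
            mem += (long) d.replicas() * d.memoryMb();
            reps += d.replicas();
        }
        return new Totals(cpu, mem, reps);
    }
}
```

Two running deployments of 2 × 500 millis / 512 MB and 3 × 1000 millis / 1024 MB total 4000 CPU millis, 4096 MB, and 5 replicas, which would already exceed the default tier's 2000 millis and 2048 MB.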
`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope, lastValidatedAt}`
with the raw token omitted from the response.
---
## 6. Lifecycle, persistence, install paths
### 6.1 Storage
Flyway V2 migration:
```sql
CREATE TABLE license (
  tenant_id         TEXT PRIMARY KEY,       -- one row per server (= one tenant)
  token             TEXT NOT NULL,          -- full signed token
  license_id        UUID NOT NULL,
  installed_at      TIMESTAMPTZ NOT NULL,
  installed_by      TEXT NOT NULL,          -- users.user_id (bare) or 'system' for env/file boot
  expires_at        TIMESTAMPTZ NOT NULL,
  last_validated_at TIMESTAMPTZ NOT NULL    -- updated by boot, install, and revalidation job
);
```
`last_validated_at` is the timestamp of the most recent **successful** signature/parse round-trip
against the current public key. Useful for troubleshooting "why did my license stop working" — a
stale `last_validated_at` next to a recent `now` is a strong signal that revalidation is failing
and the operator should check the public key.
### 6.2 Boot order
`LicenseBeanConfig`:
1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite,
sets `last_validated_at = now`) → load.
2. Else if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
3. Else read `license` row from DB → validate → on success update `last_validated_at = now` →
load.
4. Else `ABSENT`.
After steps 1–3 the service publishes one `LicenseChangedEvent` so the retention applier and
metrics gauges initialise off the same code path as runtime changes.
Env-var / file act as **idempotent overrides** — they always win and replace the DB row, so the
operator's last action survives reboots.
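
The precedence in the boot order can be sketched as a simple first-match resolution. The names `BootSourceSketch`, `Source`, and `resolve` are hypothetical, and the sketch only picks the source; validation and the DB write happen afterwards as described in the numbered steps.

```java
import java.util.Optional;

// Sketch only: picks which input wins at boot; does not validate or persist.
final class BootSourceSketch {

    enum Source { ENV, FILE, DB, NONE }

    static Source resolve(Optional<String> envToken,
                          Optional<String> licenseFilePath,
                          Optional<String> dbToken) {
        if (envToken.isPresent()) {
            return Source.ENV;   // step 1: CAMELEER_SERVER_LICENSE_TOKEN always wins
        }
        if (licenseFilePath.isPresent()) {
            return Source.FILE;  // step 2: CAMELEER_SERVER_LICENSE_FILE
        }
        if (dbToken.isPresent()) {
            return Source.DB;    // step 3: previously persisted row
        }
        return Source.NONE;      // step 4: ABSENT
    }
}
```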
### 6.3 Runtime install
`POST /api/v1/admin/license { "token": "..." }` (existing):
- Validates against the configured public key.
- On success, persists to `license` table (`installed_by = user_id`, `last_validated_at = now`),
updates the in-memory `LicenseGate`, publishes `LicenseChangedEvent`, audits.
- On failure, returns 400 with the validator error message and audits the rejection.
Server transitions to `INVALID` state if a previously-loaded license was replaced; otherwise
remains in its prior state (the rejected token is *not* written to DB).
### 6.4 Public key custody
`CAMELEER_SERVER_LICENSE_PUBLICKEY` (existing) remains the only verification key. Build- /
deploy-time secret bound to the vendor distribution. **Not stored in DB.** If unset *and* a
license is present → reject all licenses (existing behaviour) → `INVALID` state.
### 6.5 Audit trail
New `AuditCategory.LICENSE`. Actions:
| Action | When | Payload |
|---|---|---|
| `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (`source` = `env`/`file`/`api`) |
| `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
| `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
| `revalidate_license` | Daily job result, on **failure only** | `{licenseId, reason}` |
| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy, state}` |
### 6.6 Daily revalidation job
`LicenseRevalidationJob`:
- `@Scheduled(cron = "0 0 3 * * *")` (03:00 server local time) plus an immediate run 60s
after boot.
- Reads the DB token, re-runs `LicenseValidator.validate(token)` against the current public
key.
- On success: `UPDATE license SET last_validated_at = now WHERE tenant_id = ?`.
- On failure (e.g. operator rotated the public key without reinstalling the license, or DB
row was tampered with directly): transition state to `INVALID`, publish
`LicenseChangedEvent` (so retention recomputes too), audit `revalidate_license` with the
reason, log `ERROR`.
- Cheap (no I/O beyond one DB read + one DB write); safe to run frequently. 03:00 is chosen
to coincide with off-peak so any resulting log noise lands when humans aren't deploying.
---
## 7. Minter
### 7.1 `LicenseMinter` (library)
Pure function, packaged in `cameleer-license-minter`:
```java
public final class LicenseMinter {
    public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
```
Serializes `LicenseInfo` to canonical JSON (sorted keys), signs the bytes with Ed25519, returns
`base64(payload).base64(signature)`. cameleer-saas calls this directly to mint per-tenant tokens.
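
A minimal sketch of the sign-and-verify round-trip using the JDK's built-in Ed25519 support (`Signature.getInstance("Ed25519")`, available since Java 15). Several things here are assumptions: the class `MintSketch` is hypothetical, the payload is passed in as an already-canonical JSON string instead of a `LicenseInfo`, and the standard Base64 alphabet is assumed because the design does not state which variant the wire format uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;
import java.util.Base64;

// Sketch only: not the real LicenseMinter / LicenseValidator.
final class MintSketch {

    /** Signs the payload bytes and returns base64(payload).base64(signature). */
    static String mint(String canonicalJson, PrivateKey privateKey) {
        try {
            byte[] payload = canonicalJson.getBytes(StandardCharsets.UTF_8);
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(privateKey);
            signer.update(payload);
            Base64.Encoder enc = Base64.getEncoder();
            return enc.encodeToString(payload) + "." + enc.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 signing failed", e);
        }
    }

    /** Returns true only if the token has two parts and the signature matches the payload. */
    static boolean verify(String token, PublicKey publicKey) {
        String[] parts = token.split("\\.");
        if (parts.length != 2) {
            return false;
        }
        try {
            byte[] payload = Base64.getDecoder().decode(parts[0]);
            byte[] signature = Base64.getDecoder().decode(parts[1]);
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKey);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (IllegalArgumentException | SignatureException e) {
            return false; // malformed base64 or malformed signature bytes
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 unavailable or key unusable", e);
        }
    }
}
```

Swapping the payload half of a valid token for different content makes `verify` return `false`, which is the property the `tenantId` and `gracePeriodDays` protections in this design rely on.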
### 7.2 `LicenseMinterCli` (CLI)
```bash
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
  --private-key=/secure/vendor.key \
  --public-key=/secure/vendor.pub \
  --tenant=acme-corp \
  --label="ACME prod 2026" \
  --expires=2027-04-25 \
  --grace-days=30 \
  --max-apps=50 \
  --max-agents=100 \
  --max-total-cpu-millis=32000 \
  --max-total-memory-mb=65536 \
  --max-execution-retention-days=90 \
  --output=acme-license.tok \
  --verify
```
- `--private-key` reads a PEM-encoded Ed25519 private key (output of
`openssl genpkey -algorithm ed25519`).
- `--public-key` *(used only with `--verify`)* reads the matching public key. Required when
`--verify` is set; ignored otherwise.
- Unspecified `--max-*` flags are omitted from the payload — the license inherits the default for
that key.
- Unknown flags fail fast.
- `--output` writes the token; if omitted, prints to stdout.
- `--verify` round-trips the freshly-minted token through `LicenseValidator` against
  `--public-key` *after* writing the output file. This catches:
  - corruption between `String → file` write,
  - wrong-key pairing (vendor accidentally pointed `--public-key` at a different keypair's
    public half),
  - signature mismatch from a buggy build of the minter.

  On verify failure the CLI exits non-zero, prints the validator error, and (if `--output` was
  written) deletes the output file so the bad token does not get shipped.
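
The write-then-verify behaviour can be sketched as follows. This is an illustration under assumptions, not the CLI's actual code: `VerifyAfterWriteSketch` is hypothetical, the validator is modelled as a `Predicate<String>`, and the exit code `1` stands in for "non-zero".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

// Sketch only: models --output plus --verify; the validator is injected.
final class VerifyAfterWriteSketch {

    /** Returns 0 on success, 1 when verification fails (the output file is deleted). */
    static int writeAndVerify(String token, Path output, Predicate<String> validator) {
        try {
            Files.writeString(output, token);
            // Validate what actually landed on disk, not the in-memory string,
            // so corruption during the write is caught too.
            String readBack = Files.readString(output);
            if (validator.test(readBack)) {
                return 0;
            }
            Files.deleteIfExists(output); // do not leave a bad token behind
            return 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```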
Keypair generation is **out of band** — vendor uses `openssl` and stores both halves in their
secret manager. We deliberately do not ship a `--gen-keypair` subcommand to keep the boundary
clean.
---
## 8. Telemetry
Prometheus gauges scraped via `/api/v1/prometheus`:
| Metric | Labels | Notes |
|---|---|---|
| `cameleer_license_state` | `state="ABSENT\|ACTIVE\|GRACE\|EXPIRED\|INVALID"` | Boolean — exactly one is 1. |
| `cameleer_license_days_remaining` | (none) | Negative in GRACE/EXPIRED. |
| `cameleer_license_limit_utilisation`| `limit="max_apps"` etc. | `current / cap`, in `[0, 1+]`. |
| `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |
| `cameleer_license_last_validated_age_seconds` | (none) | `now - last_validated_at`. Spikes if the daily revalidation job is failing. |
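
The gauge values in the table are simple derivations. A minimal sketch, with hypothetical names (`LicenseGaugeSketch`, `stateGauge`, `utilisation`) and no metrics library wired in; it assumes `cap > 0`.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch only: computes gauge values; registration with a metrics registry is omitted.
final class LicenseGaugeSketch {

    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    /** One-hot encoding for cameleer_license_state: the current state is 1, all others 0. */
    static Map<State, Integer> stateGauge(State current) {
        Map<State, Integer> values = new EnumMap<>(State.class);
        for (State s : State.values()) {
            values.put(s, s == current ? 1 : 0);
        }
        return values;
    }

    /** current / cap; can exceed 1 when a cap was lowered below existing usage. */
    static double utilisation(long current, long cap) {
        return (double) current / cap;
    }
}
```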
State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED,
`ERROR` on INVALID, `WARN` on cap reject (sampled to avoid log spam).
Recommended alert (in cameleer-saas Grafana, not shipped with the server): page on
`cameleer_license_state{state="INVALID"} == 1` for > 5 minutes.
---
## 9. Dead-code removal
Performed in the **first commit** of the implementation. Per the project's "no backwards
compatibility shims" preference, no deprecated path or feature flag.
- Delete `Feature.java`.
- Delete `LicenseGate.isEnabled(Feature)`.
- Delete `LicenseInfo.features` field, `LicenseInfo.hasFeature(Feature)`.
- Delete `LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled` and `LicenseInfo.open()`'s
`Set.of(Feature.values())` assertion.
- Update `LicenseValidator` to ignore `features` if present in old tokens (silently dropped,
not an error).
---
## 10. Testing
| Layer | Tests |
|---|---|
| Core unit | `LicenseValidatorTest` — signature, expiry, tenant mismatch, missing required fields (`tenantId`, `licenseId`, `iat`, `exp`), unknown extra fields. |
| Core unit | `LicenseStateMachineTest` — all five states and the transitions between them, including grace boundary, replace from any state, invalid install routes to `INVALID`, valid install from `INVALID` recovers to `ACTIVE`. |
| Core unit | `DefaultTierLimitsTest` — every documented key has a default. |
| Minter unit | `LicenseMinterTest` — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
| Minter CLI | `LicenseMinterCliTest` — invokes `main` with `--private-key=tmp` and checks output token validates; `--verify` happy path; `--verify` failure path deletes the output file and exits non-zero. |
| App unit | `LicenseEnforcerTest` — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default, message text varies per state. |
| App unit | `RetentionPolicyApplierTest` — license-changed event recomputes effective TTL per env; failed ALTER logs WARN and does not throw. |
| App integration | `LicenseLifecycleIT` — install via env, replace via POST, restart restores from DB, public-key removal at runtime transitions to `INVALID`, daily revalidation job updates `last_validated_at`. Driven through REST. |
| App integration | `LicenseEnforcementIT` — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes `cap_exceeded` audit row check and verifies the 403 body's `message` field matches the state. |
| App integration | `RetentionRuntimeRecomputeIT` — install license with `max_log_retention_days=30`, observe `logs` TTL ALTER fires; replace with `max_log_retention_days=7`, observe TTL drops to 7 without restart. |
| Boot | `SchemaBootstrapIT` extension — `license` table exists with `last_validated_at`, `environments` retention columns exist, retention pinning honoured at boot. |
No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.
---
## 11. Open follow-ups (deliberately deferred)
- Ingestion-rate limits (`max_executions_per_minute`, `max_logs_per_minute`).
- Online revocation callback (the `revocation_check_url` envelope field).
- Concurrent debug session limit (`max_concurrent_debug_sessions` from the SaaS epic).
- A "license usage history" report for vendors to see growth over time.
- Open a tracking issue on `cameleer/cameleer-server` (Gitea) — none exists today.
---
## 12. Risk register
| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-`exp` "trial" license at install time if needed. |
| Customer lowers `gracePeriodDays` field by editing token. | Token is signed; any edit invalidates the signature. |
| License removed from DB out of band, server lands in ABSENT and rejects new resources but old ones are above default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. Daily revalidation job catches a rotation that wasn't paired with a reinstall (state → `INVALID`, alertable). |
| Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | Existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
| Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |
| `RetentionPolicyApplier` issues blocking ALTERs from the event listener thread. | Applier runs ALTERs serialised but on a separate executor (not the publisher thread) so a slow ClickHouse does not stall the install API call. License install API returns immediately with the new state; retention recompute completes asynchronously and is observable via metrics. |