# License Enforcement — Design

**Date:** 2026-04-25
**Status:** Approved (brainstorm); pending writing-plans
**Related:** cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)

## Problem

`cameleer-server` ships a license skeleton (`LicenseValidator`, `LicenseGate`, admin endpoint) but
nothing enforces anything. Open mode (no license configured) currently grants *all* features and
*no* limits — the opposite of what we want for a self-hosted distribution that needs to gate scale
behind a paid license.

We want:

1. A self-hosted server with **no license** to operate within a small, hard-coded "default tier"
   that is enough to evaluate the product but not enough to run it in production.
2. Licenses to express **arbitrary per-customer limits** (no fixed tiers) on a vendor-defined set
   of resources: entity counts, compute footprint, retention.
3. A **standalone minter** owned by the vendor that signs licenses with an Ed25519 private key the
   customer never sees.
4. Licenses to be **persisted** on the server, **installable** via env var, file, or admin POST,
   and **renewable** by replacement.
5. **Revocation** handled out of band (vendor suspends the SaaS tenant, or issues short-`exp`
   licenses) — no online revocation callback in v1.

## Non-goals

- Feature flags. The current `Feature` enum (topology/lineage/correlation/debugger/replay) is dead
  scaffolding and gets removed; this design is about quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Defer to a follow-up.
- Online revocation. Vendor uses shorter `exp` + reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates reject.
- Minter keypair generation tooling. Vendor uses standard `openssl genpkey -algorithm ed25519`
  out of band.

---

## 1. Architecture

### 1.1 Module layout

```
cameleer-server-core/                     (existing — pure domain, no Spring)
└── license/
    ├── LicenseInfo                       (record — see §2)
    ├── LicenseLimits                     (typed wrapper over the limits map)
    ├── LicenseValidator                  (existing, payload schema updated)
    ├── LicenseGate                       (existing, gutted: no Feature; getLimits() only)
    ├── LicenseStateMachine               (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED / INVALID)
    └── DefaultTierLimits                 (constant — §3.2 numbers)

cameleer-server-app/                      (existing — Spring, web, persistence)
├── license/
│   ├── LicenseRepository                 (NEW — PostgreSQL persistence)
│   ├── LicenseService                    (NEW — load/save/replace; publishes LicenseChangedEvent)
│   ├── LicenseEnforcer                   (NEW — assertWithinCap entry point)
│   ├── LicenseUsageReader                (NEW — counts current usage for /usage endpoint)
│   ├── LicenseCapExceededException       (NEW — mapped to 403 by ControllerAdvice)
│   ├── LicenseRevalidationJob            (NEW — @Scheduled daily; updates last_validated_at)
│   ├── RetentionPolicyApplier            (NEW — @EventListener(LicenseChangedEvent); recomputes ClickHouse TTL + per-env caps)
│   └── LicenseMetrics                    (NEW — Prometheus gauges)
├── controller/
│   ├── LicenseAdminController            (existing — extended; persists, audited)
│   └── LicenseUsageController            (NEW — GET /admin/license/usage)
└── config/
    └── LicenseBeanConfig                 (existing — extended for DB load order)

cameleer-license-minter/                  (NEW — top-level Maven module)
├── pom.xml                               (depends on cameleer-server-core)
├── LicenseMinter                         (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli                  (CLI main class, supports --verify)
```

### 1.2 Why a separate `cameleer-license-minter` module

Not shipped in the runtime JAR. Vendor distributes it independently or builds it from source on a
trusted machine. Customers never receive it.

This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection:
forging a license requires the vendor's private key, and the public key in the server is enough to
reject forged tokens regardless of where the minter code lives.

### 1.3 Dependency graph

```
cameleer-license-minter ──▶ cameleer-server-core    (LicenseInfo schema only)
cameleer-server-app     ──▶ cameleer-server-core    (validator, gate, FSM, defaults)
cameleer-saas           ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas           ──▶ cameleer-server-core    (transitive)
```

`cameleer-server-app` has **no** dependency on `cameleer-license-minter`.

---

## 2. License envelope

Wire format unchanged: `base64(payload).base64(ed25519_signature)`. Payload schema:

```json
{
  "licenseId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "acme-corp",
  "label": "ACME prod 2026 — site:hamburg",
  "iat": 1745539200,
  "exp": 1777075200,
  "gracePeriodDays": 30,
  "limits": {
    "max_environments": 5,
    "max_apps": 50,
    "max_agents": 100,
    "max_users": 25,
    "max_outbound_connections": 10,
    "max_alert_rules": 200,
    "max_total_cpu_millis": 32000,
    "max_total_memory_mb": 65536,
    "max_total_replicas": 100,
    "max_execution_retention_days": 90,
    "max_log_retention_days": 30,
    "max_metric_retention_days": 365,
    "max_jar_retention_count": 10
  }
}
```

### 2.1 Field rules

| Field | Required | Notes |
|---|---|---|
| `licenseId` | yes | UUID. Used in audit + future revocation. |
| `tenantId` | **yes** | Must match `CAMELEER_SERVER_TENANT_ID`. Mismatch = `INVALID` state (see §3). The field is inside the signed payload, so a self-hosted customer cannot strip it to make a license portable across tenants — any edit invalidates the signature. Air-gapped customers receive a license bound to a vendor-issued tenant id (not necessarily a UUID — any non-empty slug). |
| `label` | optional | Free-form human description. Surfaced in UI. |
| `iat` | yes | Unix seconds. |
| `exp` | yes | Unix seconds. |
| `gracePeriodDays` | optional, default `0` | Days `exp` may be in the past while limits still apply. |
| `limits.*` | each optional | Missing key inherits from `DefaultTierLimits`. A license can lift any subset. |

### 2.2 Removed from the current envelope

- `tier` (string) — was a non-functional label. Folded into `label`.
- `features` (array) — out of scope. `Feature` enum deleted.

---

## 3. License state machine

```
                                                      exp + grace passes
  ┌─────────┐  install valid  ┌────────┐  exp  ┌────────┐  ────────►  ┌─────────┐
  │ ABSENT  │ ───────────────▶│ ACTIVE │──────▶│ GRACE  │             │ EXPIRED │
  └─────────┘                 └────────┘       └────────┘             └─────────┘
    ▲                             │               │ ▲                      │
    │                             │ replace       │ │ replace valid        │ replace
    │                             ▼               │ │                      ▼
    │  ┌─────────┐                └───────────────┴─┴──────────────────────┘
    └──┤ INVALID │ ──── replace valid ────────────────────────────────▶ ACTIVE
       └─────────┘
            ▲
            │ install fails (signature / tenant / parse / public-key-missing)
  all transitions persist + audit-log
```

### 3.1 State semantics

| State | Effective limits | Trigger | Severity |
|---|---|---|---|
| `ABSENT` | `DefaultTierLimits` | No DB row. Clean install with no license configured. | INFO |
| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | INFO |
| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI warning banner. | WARN |
| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. UI label distinct from ABSENT. | ERROR |
| `INVALID` | `DefaultTierLimits` | Signature failure, tenant mismatch, parse error, or public key not configured but a token is present. | **ERROR — loud** |

`ABSENT` and `INVALID` produce the same enforcement (default tier) but are surfaced very
differently:

- **`ABSENT`** is a clean state — fresh install, no license yet. UI shows a calm "Install a
  license to lift the default-tier caps" call to action. No audit row beyond the boot log line.
- **`INVALID`** is an active error — tampering, wrong public key, or a paste that lost
  characters. UI shows a red banner with the validator's error message
  (e.g. "License signature verification failed", "License tenantId 'acme-corp' does not match
  server tenant 'beta-corp'"). Audit row written under
  `AuditCategory.LICENSE` action `reject_license`. Prometheus
  `cameleer_license_state{state="INVALID"} = 1` so an alert can fire.

State is recomputed on every limit check (clock comparison only against parsed in-memory
`LicenseInfo`) — no scheduler needed for `ACTIVE → GRACE → EXPIRED` transitions. A separate
**daily revalidation job** (§6.6) re-runs the signature check against the DB row to catch slow
failures like public-key rotation drift.

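
The clock-only part of the state table in §3.1 can be read as a pure function. The following is a minimal sketch under assumptions: `LicenseState`, `LicenseStateSketch`, and the `hasLicense` / `valid` flags are hypothetical names, not the real `LicenseStateMachine` API. Only the five state names and the boundary rules (`now < exp`, `exp ≤ now < exp + gracePeriodDays`) come from the design.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical names; only the states and the boundary rules are taken from §3.1.
enum LicenseState { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

final class LicenseStateSketch {
    private LicenseStateSketch() {}

    /**
     * @param hasLicense false when no license row exists (ABSENT)
     * @param valid      false when signature, tenant, or parse checks failed (INVALID)
     */
    static LicenseState stateAt(Instant now, boolean hasLicense, boolean valid,
                                Instant exp, int gracePeriodDays) {
        if (!hasLicense) return LicenseState.ABSENT;
        if (!valid) return LicenseState.INVALID;
        if (now.isBefore(exp)) return LicenseState.ACTIVE;        // now < exp
        Instant graceEnd = exp.plus(Duration.ofDays(gracePeriodDays));
        if (now.isBefore(graceEnd)) return LicenseState.GRACE;    // exp <= now < exp + grace
        return LicenseState.EXPIRED;                              // now >= exp + grace
    }
}
```

Because the function depends only on `now` and the parsed license, calling it on every limit check is enough to move through `ACTIVE → GRACE → EXPIRED` without a scheduler. With `gracePeriodDays = 0` the `GRACE` window is empty, so a license goes straight from `ACTIVE` to `EXPIRED` at `exp`.
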
### 3.2 Default tier (the "no license" caps)

| Limit | Default |
|---|---|
| `max_environments` | 1 |
| `max_apps` | 3 |
| `max_agents` | 5 |
| `max_users` | 3 |
| `max_outbound_connections` | 1 |
| `max_alert_rules` | 2 |
| `max_total_cpu_millis` | 2000 (2 cores) |
| `max_total_memory_mb` | 2048 (2 GB) |
| `max_total_replicas` | 5 |
| `max_execution_retention_days` | 1 |
| `max_log_retention_days` | 1 |
| `max_metric_retention_days` | 1 |
| `max_jar_retention_count` | 3 |

Encoded as `public static final Map<String, Integer> DEFAULTS` in `DefaultTierLimits`. Keys
match the license payload exactly.

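
The `merge(default, license.limits)` rule from §3.1, together with the "missing key inherits from `DefaultTierLimits`" rule from §2.1, could look roughly like the sketch below. `EffectiveLimitsSketch` and its methods are hypothetical, the map holds only three of the thirteen documented defaults, and letting the license value always win (even when it is lower than the default) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; DEFAULTS is a three-key excerpt of the §3.2 table.
final class EffectiveLimitsSketch {
    static final Map<String, Integer> DEFAULTS = Map.of(
            "max_apps", 3,
            "max_agents", 5,
            "max_outbound_connections", 1);

    private EffectiveLimitsSketch() {}

    /** Starts from the defaults, then overlays whatever keys the license carries. */
    static Map<String, Integer> merge(Map<String, Integer> defaults,
                                      Map<String, Integer> licenseLimits) {
        Map<String, Integer> effective = new LinkedHashMap<>(defaults);
        effective.putAll(licenseLimits); // assumption: the license value wins for any key it sets
        return effective;
    }

    /** "license" when the key is explicit in the license, "default" otherwise. */
    static String source(String key, Map<String, Integer> licenseLimits) {
        return licenseLimits.containsKey(key) ? "license" : "default";
    }
}
```

A license that sets only `max_apps` leaves `max_agents` at its default of 5, which is the "lift any subset" behaviour described in §2.1.
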

---

## 4. Enforcement map

Every limit check goes through one method on `LicenseEnforcer`:

```java
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
```

Throws `LicenseCapExceededException(limitKey, current, cap)` when `currentUsage + requestedDelta > cap`.
A `@ControllerAdvice` maps it to `403` with a body that explains the "why" so operators can act
without grepping logs:

```json
{
  "error": "license cap reached",
  "limit": "max_apps",
  "current": 3,
  "cap": 3,
  "state": "EXPIRED",
  "message": "License expired 5 days ago: system reverted to default tier (3 apps). Current usage is 3. Install or renew the license to create more apps."
}
```

The `message` field is rendered server-side from a small template per state:

| State | Message template |
|---|---|
| `ABSENT` | "No license installed: default tier applies (cap = {cap} for {limit}). Install a license to raise this." |
| `ACTIVE` | "License cap reached: {limit} = {cap}. Current usage is {current}. Contact your vendor to raise the cap." |
| `GRACE` | "License expired {n} day(s) ago and is in its grace period (ends in {m} days). Cap unchanged at {cap}. Renew before grace ends." |
| `EXPIRED` | "License expired {n} days ago: system reverted to default tier (cap = {cap} for {limit}). Current usage is {current}. Renew the license to lift the cap." |
| `INVALID` | "License rejected ({reason}): default tier applies (cap = {cap} for {limit}). Fix the license to raise this." |

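
As a rough illustration of the single entry point, the check `currentUsage + requestedDelta > cap` can be sketched as follows. This is not the real `LicenseEnforcer`: the constructor taking a plain map, the public fields on the exception, and rejecting unknown limit keys with `IllegalArgumentException` are all assumptions made to keep the example self-contained.

```java
import java.util.Map;

// Sketch of the exception named in §1.1; the public fields are an assumption.
class LicenseCapExceededException extends RuntimeException {
    final String limitKey;
    final long current;
    final long cap;

    LicenseCapExceededException(String limitKey, long current, long cap) {
        super("license cap reached: " + limitKey + " (current=" + current + ", cap=" + cap + ")");
        this.limitKey = limitKey;
        this.current = current;
        this.cap = cap;
    }
}

// Hypothetical stand-in for LicenseEnforcer, fed with already-merged effective limits.
final class LicenseEnforcerSketch {
    private final Map<String, Integer> effectiveLimits;

    LicenseEnforcerSketch(Map<String, Integer> effectiveLimits) {
        this.effectiveLimits = Map.copyOf(effectiveLimits);
    }

    void assertWithinCap(String limitKey, long currentUsage, long requestedDelta) {
        Integer cap = effectiveLimits.get(limitKey);
        if (cap == null) {
            // Assumption: an unknown key is a programming error, not a license failure.
            throw new IllegalArgumentException("unknown limit key: " + limitKey);
        }
        if (currentUsage + requestedDelta > cap) {
            throw new LicenseCapExceededException(limitKey, currentUsage, cap);
        }
    }
}
```

Note that usage exactly at the cap is allowed to stay; only a request that would push usage above the cap is rejected, which matches the 403 example where `current` equals `cap`.
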
### 4.1 Per-limit call sites

| Limit | Call site | Failure response |
|---|---|---|
| `max_environments` | `EnvironmentService.create` (start) | 403 |
| `max_apps` | `AppService.createApp` | 403 |
| `max_agents` | `AgentRegistryService.register` | 403 — agent treated as unregistered (no SSE, no commands) |
| `max_users` | `UserAdminController.createUser` and `OidcAuthController.callback` (auto-signup) | 403 / OIDC login failure |
| `max_outbound_connections` | `OutboundConnectionServiceImpl.create` | 403 |
| `max_alert_rules` | `AlertRuleController.create` | 403 |
| `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deploys + new) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row |
| `max_total_memory_mb` | same | same |
| `max_total_replicas` | same | same |
| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.2) + `RetentionPolicyApplier` (see §4.3) | 422 on update; ClickHouse TTL recomputed on every license change |
| `max_log_retention_days` | same | same |
| `max_metric_retention_days` | same | same |
| `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 |

### 4.2 Per-environment retention fields

Three new columns on `environments` (Flyway V2):

```sql
ALTER TABLE environments
    ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
    ADD COLUMN log_retention_days       INTEGER NOT NULL DEFAULT 1,
    ADD COLUMN metric_retention_days    INTEGER NOT NULL DEFAULT 1;
```

These are the configured per-env values. The effective ClickHouse TTL is
`min(licenseCap, configured)`. Admin UI surfaces the configured values;
`EnvironmentService.update` rejects values above the license cap with 422.

### 4.3 Runtime retention recompute

`RetentionPolicyApplier` is an `@EventListener(LicenseChangedEvent)`:

- Triggered on every `LicenseService.replace(...)` (boot install, env-var override, file
  override, POST `/admin/license`) **and** on every state transition the revalidation job
  detects (e.g. license becomes `EXPIRED`, caps drop to default).
- Recomputes the effective TTL per env (`min(licenseCap, configured)`), then issues
  `ALTER TABLE … MODIFY TTL …` on the affected ClickHouse tables (executions, processors,
  logs, metrics, route_diagrams, agent_events). One ALTER per table per affected env.
- Errors are logged WARN; a failed ALTER does not block the license install — the operator can
  retry by reposting the license. The previous TTL keeps applying until the next successful
  ALTER.
- At boot, `LicenseService.loadInitial(...)` publishes one `LicenseChangedEvent` after the
  load order in §6.2 settles, so the boot path goes through the same applier as runtime
  changes.

Result: a server that stays up for months and lands in `EXPIRED` will see ClickHouse TTLs
collapse to default-tier values automatically — no restart needed.

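
The arithmetic behind the recompute is small enough to sketch. The helper below is illustrative only: `RetentionTtlSketch` is a hypothetical name, the `timestamp` column used in the usage note is a placeholder (the design does not name the TTL column), and the sketch ignores how the real applier scopes a TTL to a single environment. The `ALTER TABLE … MODIFY TTL … + INTERVAL n DAY` shape follows ClickHouse's documented syntax.

```java
// Hypothetical helper; table and column names are passed in because the design does not fix them.
final class RetentionTtlSketch {
    private RetentionTtlSketch() {}

    /** Effective retention is the smaller of the license cap and the configured per-env value. */
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** Builds the ClickHouse statement text; executing it is left to the caller. */
    static String alterTtl(String table, String timestampColumn, int days) {
        return "ALTER TABLE " + table
                + " MODIFY TTL " + timestampColumn
                + " + INTERVAL " + days + " DAY";
    }
}
```

For example, an environment configured for 90 days of logs under a license that caps `max_log_retention_days` at 30 yields `effectiveDays(30, 90) == 30`, and `alterTtl("logs", "timestamp", 30)` produces the statement the applier would issue. When the license expires and the cap falls back to the default of 1, the same computation yields 1 day.
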
### 4.4 Boot-time invariant

If a license is added that *lowers* a cap below current usage (e.g. 10 apps exist, license now
allows 5), the server logs one WARN per limit at boot. **No deletion**. New creates reject;
existing resources keep working.

---

## 5. Usage endpoint

`GET /api/v1/admin/license/usage` (ADMIN only):

```json
{
  "state": "ACTIVE",
  "expiresAt": "2027-04-25T00:00:00Z",
  "daysRemaining": 365,
  "gracePeriodDays": 30,
  "tenantId": "acme-corp",
  "label": "ACME prod 2026",
  "lastValidatedAt": "2026-04-26T03:14:07Z",
  "message": "License active. 365 days remaining.",
  "limits": [
    {"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
    {"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
    {"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
    {"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
  ]
}
```

`source` is `"default"` when the cap comes from `DefaultTierLimits` (i.e. the license omits this
key, or there is no license), and `"license"` when the cap is explicit in the license. Drives the
SaaS UI's "free tier" badge.

`message` carries the same human-readable explanation that the 403 body uses, varying by state:

- `ABSENT` — "No license installed. Default tier applies."
- `ACTIVE` — "License active. {n} days remaining."
- `GRACE` — "License expired {n} days ago. Grace period ends in {m} days. Renew now to avoid degradation."
- `EXPIRED` — "License expired {n} days ago. System reverted to default tier."
- `INVALID` — "License rejected: {reason}. Default tier applies. Fix the license to recover."

`LicenseUsageReader` issues one cheap aggregate per limit (`SELECT COUNT(*)` per entity table; a
single grouped `SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas)` over
non-stopped deployments).

`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope, lastValidatedAt}`
with the raw token omitted from the response.

---

## 6. Lifecycle, persistence, install paths

### 6.1 Storage

Flyway V2 migration:

```sql
CREATE TABLE license (
    tenant_id         TEXT PRIMARY KEY,      -- one row per server (= one tenant)
    token             TEXT NOT NULL,         -- full signed token
    license_id        UUID NOT NULL,
    installed_at      TIMESTAMPTZ NOT NULL,
    installed_by      TEXT NOT NULL,         -- users.user_id (bare) or 'system' for env/file boot
    expires_at        TIMESTAMPTZ NOT NULL,
    last_validated_at TIMESTAMPTZ NOT NULL   -- updated by boot, install, and revalidation job
);
```

`last_validated_at` is the timestamp of the most recent **successful** signature/parse round-trip
against the current public key. Useful for troubleshooting "why did my license stop working" — a
stale `last_validated_at` next to a recent `now` is a strong signal that revalidation is failing
and the operator should check the public key.

### 6.2 Boot order

`LicenseBeanConfig`:

1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite,
   sets `last_validated_at = now`) → load.
2. Else if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
3. Else read `license` row from DB → validate → on success update `last_validated_at = now` →
   load.
4. Else `ABSENT`.

After steps 1–3 the service publishes one `LicenseChangedEvent` so the retention applier and
metrics gauges initialise off the same code path as runtime changes.

Env-var / file act as **idempotent overrides** — they always win and replace the DB row, so the
operator's last action survives reboots.

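
The precedence in the boot order above can be sketched as a small pure function. `BootSourceSketch` and its members are hypothetical, and treating a blank value the same as an unset one is an assumption. The sketch only selects the source; validation, and the resulting `INVALID` state when it fails, would happen afterwards.

```java
// Hypothetical helper; arguments are the already-read values, null when not configured.
final class BootSourceSketch {
    enum Source { ENV, FILE, DB, ABSENT }

    private BootSourceSketch() {}

    /** Env var beats file, file beats DB row, and nothing at all means ABSENT. */
    static Source pick(String envToken, String fileToken, String dbToken) {
        if (isSet(envToken)) return Source.ENV;
        if (isSet(fileToken)) return Source.FILE;
        if (isSet(dbToken)) return Source.DB;
        return Source.ABSENT;
    }

    /** Per §6.2, env and file installs overwrite the DB row; a DB load only refreshes last_validated_at. */
    static boolean overwritesDbRow(Source source) {
        return source == Source.ENV || source == Source.FILE;
    }

    // Assumption: blank counts as "not configured".
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```
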
### 6.3 Runtime install

`POST /api/v1/admin/license { "token": "..." }` (existing):

- Validates against the configured public key.
- On success, persists to `license` table (`installed_by = user_id`, `last_validated_at = now`),
  updates the in-memory `LicenseGate`, publishes `LicenseChangedEvent`, audits.
- On failure, returns 400 with the validator error message and audits the rejection.
  Server transitions to `INVALID` state if a previously-loaded license was replaced; otherwise
  remains in its prior state (the rejected token is *not* written to DB).

### 6.4 Public key custody

`CAMELEER_SERVER_LICENSE_PUBLICKEY` (existing) remains the only verification key. Build- /
deploy-time secret bound to the vendor distribution. **Not stored in DB.** If unset *and* a
license is present → reject all licenses (existing behaviour) → `INVALID` state.

### 6.5 Audit trail

New `AuditCategory.LICENSE`. Actions:

| Action | When | Payload |
|---|---|---|
| `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (`source` = `env`/`file`/`api`) |
| `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
| `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
| `revalidate_license` | Daily job result, on **failure only** | `{licenseId, reason}` |
| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy, state}` |

### 6.6 Daily revalidation job

`LicenseRevalidationJob`:

- `@Scheduled(cron = "0 0 3 * * *")` (03:00 server local time) plus an initial run 60s
  after boot.
- Reads the DB token, re-runs `LicenseValidator.validate(token)` against the current public
  key.
- On success: `UPDATE license SET last_validated_at = now WHERE tenant_id = ?`.
- On failure (e.g. operator rotated the public key without reinstalling the license, or DB
  row was tampered with directly): transition state to `INVALID`, publish
  `LicenseChangedEvent` (so retention recomputes too), audit `revalidate_license` with the
  reason, log `ERROR`.
- Cheap (no I/O beyond one DB read + one DB write); safe to run frequently. 03:00 is chosen
  to coincide with off-peak so any log noise lands when humans aren't deploying.

---

## 7. Minter

### 7.1 `LicenseMinter` (library)

Pure function, packaged in `cameleer-license-minter`:

```java
public final class LicenseMinter {
    public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
```

Serializes `LicenseInfo` to canonical JSON (sorted keys), signs the bytes with Ed25519, returns
`base64(payload).base64(signature)`. cameleer-saas calls this directly to mint per-tenant tokens.

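
To make the wire format concrete, here is a minimal sign-and-verify round trip using the JDK's built-in Ed25519 support (Java 15 and later). It is a sketch, not the real `LicenseMinter` or `LicenseValidator`: the class and method names are hypothetical, the payload is passed in as already-serialized bytes (canonical JSON serialization is out of scope here), and the standard Base64 alphabet is an assumption, since the design only says `base64`.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical names; standard (non-URL-safe) Base64 is an assumption.
final class TokenRoundTripSketch {
    private TokenRoundTripSketch() {}

    /** Throwaway keypair for experiments; real keys come from the vendor's secret manager. */
    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the payload bytes and returns base64(payload).base64(signature). */
    static String sign(byte[] payloadJson, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(privateKey);
            signer.update(payloadJson);
            Base64.Encoder enc = Base64.getEncoder();
            return enc.encodeToString(payloadJson) + "." + enc.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True only if the token has exactly two parts and the signature matches the payload. */
    static boolean verify(String token, PublicKey publicKey) {
        String[] parts = token.split("\\.", -1);
        if (parts.length != 2) return false;
        try {
            byte[] payload = Base64.getDecoder().decode(parts[0]);
            byte[] signature = Base64.getDecoder().decode(parts[1]);
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKey);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (IllegalArgumentException | GeneralSecurityException e) {
            return false; // malformed Base64 or a malformed signature counts as invalid
        }
    }
}
```

Swapping the payload half of a token while keeping the signature half makes `verify` return `false`, which is the property §2.1 relies on when it says a customer cannot edit `tenantId` without invalidating the signature.
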
### 7.2 `LicenseMinterCli` (CLI)

```bash
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
  --private-key=/secure/vendor.key \
  --public-key=/secure/vendor.pub \
  --tenant=acme-corp \
  --label="ACME prod 2026" \
  --expires=2027-04-25 \
  --grace-days=30 \
  --max-apps=50 \
  --max-agents=100 \
  --max-total-cpu-millis=32000 \
  --max-total-memory-mb=65536 \
  --max-execution-retention-days=90 \
  --output=acme-license.tok \
  --verify
```

- `--private-key` reads a PEM-encoded Ed25519 private key (output of
  `openssl genpkey -algorithm ed25519`).
- `--public-key` *(used only with `--verify`)* reads the matching public key. Required when
  `--verify` is set; ignored otherwise.
- Unspecified `--max-*` flags are omitted from the payload — the license inherits the default for
  that key.
- Unknown flags fail fast.
- `--output` writes the token; if omitted, prints to stdout.
- `--verify` round-trips the freshly-minted token through `LicenseValidator` against
  `--public-key` *after* writing the output file. This catches:
  - corruption between `String → file` write,
  - wrong-key pairing (vendor accidentally pointed `--public-key` at a different keypair's
    public half),
  - signature mismatch from a buggy build of the minter.

  On verify failure the CLI exits non-zero, prints the validator error, and (if `--output` was
  written) deletes the output file so the bad token does not get shipped.

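
The write, read back, verify, delete-on-failure sequence described for `--verify` could be structured like the sketch below. `VerifyAfterWriteSketch` is a hypothetical name, the validator is abstracted as a `Predicate<String>` rather than the real `LicenseValidator`, and the exit codes 0 and 1 are assumptions (the design only requires a non-zero exit on failure).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical helper; the Predicate stands in for LicenseValidator plus the public key.
final class VerifyAfterWriteSketch {
    private VerifyAfterWriteSketch() {}

    /** Returns the process exit code: 0 when the written token validates, 1 otherwise. */
    static int writeAndVerify(String token, Path output, Predicate<String> validator) {
        try {
            Files.writeString(output, token, StandardCharsets.UTF_8);
            // Validate what actually landed on disk, not the in-memory string.
            String written = Files.readString(output, StandardCharsets.UTF_8);
            if (validator.test(written)) {
                return 0;
            }
            Files.deleteIfExists(output); // do not leave a bad token behind
            return 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the file back before validating is what lets the check catch corruption in the `String → file` step, not just a wrong key or a bad signature.
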
Keypair generation is **out of band** — vendor uses `openssl` and stores both halves in their
secret manager. We deliberately do not ship a `--gen-keypair` subcommand to keep the boundary
clean.

---

## 8. Telemetry

Prometheus gauges scraped via `/api/v1/prometheus`:

| Metric | Labels | Notes |
|---|---|---|
| `cameleer_license_state` | `state="ABSENT\|ACTIVE\|GRACE\|EXPIRED\|INVALID"` | Boolean — exactly one is 1. |
| `cameleer_license_days_remaining` | (none) | Negative in GRACE/EXPIRED. |
| `cameleer_license_limit_utilisation` | `limit="max_apps"` etc. | `current / cap`, in `[0, 1+]`. |
| `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |
| `cameleer_license_last_validated_age_seconds` | (none) | `now - last_validated_at`. Spikes if the daily revalidation job is failing. |

State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED,
`ERROR` on INVALID, `WARN` on cap reject (sampled to avoid log spam).

Recommended alert (in cameleer-saas Grafana, not shipped with the server): page on
`cameleer_license_state{state="INVALID"} == 1` for > 5 minutes.

---

## 9. Dead-code removal

Performed in the **first commit** of the implementation. Per the project's "no backwards
compatibility shims" preference, no deprecated path or feature flag.

- Delete `Feature.java`.
- Delete `LicenseGate.isEnabled(Feature)`.
- Delete `LicenseInfo.features` field, `LicenseInfo.hasFeature(Feature)`.
- Delete `LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled` and `LicenseInfo.open()`'s
  `Set.of(Feature.values())` assertion.
- Update `LicenseValidator` to ignore `features` if present in old tokens (silently dropped,
  not an error).

---

## 10. Testing

| Layer | Tests |
|---|---|
| Core unit | `LicenseValidatorTest` — signature, expiry, tenant mismatch, missing required fields (`tenantId`, `licenseId`, `iat`, `exp`), unknown extra fields. |
| Core unit | `LicenseStateMachineTest` — all five states and their transitions including grace boundary, replace from any state, invalid install routes to `INVALID`, valid install from `INVALID` recovers to `ACTIVE`. |
| Core unit | `DefaultTierLimitsTest` — every documented key has a default. |
| Minter unit | `LicenseMinterTest` — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
| Minter CLI | `LicenseMinterCliTest` — invokes `main` with `--private-key=tmp` and checks output token validates; `--verify` happy path; `--verify` failure path deletes the output file and exits non-zero. |
| App unit | `LicenseEnforcerTest` — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default, message text varies per state. |
| App unit | `RetentionPolicyApplierTest` — license-changed event recomputes effective TTL per env; failed ALTER logs WARN and does not throw. |
| App integration | `LicenseLifecycleIT` — install via env, replace via POST, restart restores from DB, public-key removal at runtime transitions to `INVALID`, daily revalidation job updates `last_validated_at`. Driven through REST. |
| App integration | `LicenseEnforcementIT` — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes `cap_exceeded` audit row check and verifies the 403 body's `message` field matches the state. |
| App integration | `RetentionRuntimeRecomputeIT` — install license with `max_log_retention_days=30`, observe `logs` TTL ALTER fires; replace with `max_log_retention_days=7`, observe TTL drops to 7 without restart. |
| Boot | `SchemaBootstrapIT` extension — `license` table exists with `last_validated_at`, `environments` retention columns exist, retention pinning honoured at boot. |

No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.

---

## 11. Open follow-ups (deliberately deferred)

- Ingestion-rate limits (`max_executions_per_minute`, `max_logs_per_minute`).
- Online revocation callback (the `revocation_check_url` envelope field).
- Concurrent debug session limit (`max_concurrent_debug_sessions` from the SaaS epic).
- A "license usage history" report for vendors to see growth over time.
- Open a tracking issue on `cameleer/cameleer-server` (Gitea) — none exists today.

---

## 12. Risk register

| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-`exp` "trial" license at install time if needed. |
| Customer changes the `gracePeriodDays` field by editing the token. | Token is signed; any edit invalidates the signature. |
| License removed from DB out of band: server lands in ABSENT and rejects new resources while existing ones remain above the default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. Daily revalidation job catches a rotation that wasn't paired with a reinstall (state → `INVALID`, alertable). |
| Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | Existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
| Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |
| `RetentionPolicyApplier` issues blocking ALTERs from the event listener thread. | Applier runs ALTERs serialised but on a separate executor (not the publisher thread) so a slow ClickHouse does not stall the install API call. License install API returns immediately with the new state; retention recompute completes asynchronously and is observable via metrics. |