2026-04-25 21:55:18 +02:00
# License Enforcement — Design
**Date:** 2026-04-25
**Status:** Approved (brainstorm); pending writing-plans
**Related:** cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)
## Problem
`cameleer-server` ships a license skeleton (`LicenseValidator`, `LicenseGate`, admin endpoint) but
nothing enforces anything. Open mode (no license configured) currently grants *all* features and
*no* limits — the opposite of what we want for a self-hosted distribution that needs to gate scale
behind a paid license.
We want:
1. A self-hosted server with **no license** to operate within a small, hard-coded "default tier"
   that is enough to evaluate the product but not enough to run it in production.
2. Licenses to express **arbitrary per-customer limits** (no fixed tiers) on a vendor-defined set
   of resources: entity counts, compute footprint, retention.
3. A **standalone minter** owned by the vendor that signs licenses with an Ed25519 private key the
   customer never sees.
4. Licenses to be **persisted** on the server, **installable** via env var, file, or admin POST,
   and **renewable** by replacement.
5. **Revocation** handled out of band (vendor suspends the SaaS tenant, or issues short-`exp`
   licenses) — no online revocation callback in v1.
## Non-goals
- Feature flags. The current `Feature` enum (topology/lineage/correlation/debugger/replay) is dead
scaffolding and gets removed; this design is about quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Defer to a follow-up.
- Online revocation. Vendor uses shorter `exp` + reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates reject.
- Minter keypair generation tooling. Vendor uses standard `openssl genpkey -algorithm ed25519`
out of band.
---
## 1. Architecture
### 1.1 Module layout
```
cameleer-server-core/ (existing — pure domain, no Spring)
└── license/
├── LicenseInfo (record — see §2)
├── LicenseLimits (typed wrapper over the limits map)
├── LicenseValidator (existing, payload schema updated)
├── LicenseGate (existing, gutted: no Feature; getLimits() only)
├── LicenseStateMachine (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED / INVALID)
└── DefaultTierLimits (constant — §3.2 numbers)
cameleer-server-app/ (existing — Spring, web, persistence)
├── license/
│ ├── LicenseRepository (NEW — PostgreSQL persistence)
│ ├── LicenseService (NEW — load/save/replace; publishes LicenseChangedEvent)
│ ├── LicenseEnforcer (NEW — assertWithinCap entry point)
│ ├── LicenseUsageReader (NEW — counts current usage for /usage endpoint)
│ ├── LicenseCapExceededException (NEW — mapped to 403 by ControllerAdvice)
│ ├── LicenseRevalidationJob (NEW — @Scheduled daily; updates last_validated_at)
│ ├── RetentionPolicyApplier (NEW — @EventListener(LicenseChangedEvent); recomputes ClickHouse TTL + per-env caps)
│ └── LicenseMetrics (NEW — Prometheus gauges)
├── controller/
│ ├── LicenseAdminController (existing — extended; persists, audited)
│ └── LicenseUsageController (NEW — GET /admin/license/usage)
└── config/
└── LicenseBeanConfig (existing — extended for DB load order)
cameleer-license-minter/ (NEW — top-level Maven module)
├── pom.xml (depends on cameleer-server-core)
├── LicenseMinter (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli (CLI main class, supports --verify)
```
### 1.2 Why a separate `cameleer-license-minter` module
Not shipped in the runtime JAR. Vendor distributes it independently or builds it from source on a
trusted machine. Customers never receive it.
This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection: forging
a license requires the vendor's private key, and the public key embedded in the server is all it needs
to detect and reject forged tokens, regardless of where the minter code lives.
### 1.3 Dependency graph
```
cameleer-license-minter ──▶ cameleer-server-core (LicenseInfo schema only)
cameleer-server-app ──▶ cameleer-server-core (validator, gate, FSM, defaults)
cameleer-saas ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas ──▶ cameleer-server-core (transitive)
```
`cameleer-server-app` has **no** dependency on `cameleer-license-minter`.

---
## 2. License envelope
Wire format unchanged: `base64(payload).base64(ed25519_signature)`. Payload schema:
```json
{
"licenseId": "550e8400-e29b-41d4-a716-446655440000",
"tenantId": "acme-corp",
"label": "ACME prod 2026 — site:hamburg",
"iat": 1745539200,
"exp": 1777075200,
"gracePeriodDays": 30,
"limits": {
"max_environments": 5,
"max_apps": 50,
"max_agents": 100,
"max_users": 25,
"max_outbound_connections": 10,
"max_alert_rules": 200,
"max_total_cpu_millis": 32000,
"max_total_memory_mb": 65536,
"max_total_replicas": 100,
"max_execution_retention_days": 90,
"max_log_retention_days": 30,
"max_metric_retention_days": 365,
"max_jar_retention_count": 10
}
}
```
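
A minimal sketch of the signature half of `LicenseValidator`, using only the JDK's built-in Ed25519 provider (JDK 15+). It assumes standard (non-URL-safe) base64 and that the signature covers the raw JSON payload bytes, which matches the minter description in §7.1. `TokenVerifier` is a hypothetical name; JSON parsing and the §2.1 field rules are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical helper: checks base64(payload).base64(signature) and returns the JSON payload.
final class TokenVerifier {
    private TokenVerifier() {}

    static String verify(String token, PublicKey ed25519PublicKey) throws GeneralSecurityException {
        int dot = token.indexOf('.');
        // Exactly one dot, with something on both sides.
        if (dot <= 0 || dot != token.lastIndexOf('.') || dot == token.length() - 1) {
            throw new IllegalArgumentException("License token must be base64(payload).base64(signature)");
        }
        byte[] payload = Base64.getDecoder().decode(token.substring(0, dot));
        byte[] signature = Base64.getDecoder().decode(token.substring(dot + 1));

        Signature verifier = Signature.getInstance("Ed25519"); // JDK 15+ built-in provider
        verifier.initVerify(ed25519PublicKey);
        verifier.update(payload); // assumption: the signature covers the decoded JSON bytes
        if (!verifier.verify(signature)) {
            throw new SecurityException("License signature verification failed");
        }
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Because `.` is not part of the standard base64 alphabet, splitting on the single dot is unambiguous, and changing any payload byte (for example `gracePeriodDays`) makes `verify` return `false`.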
### 2.1 Field rules
| Field | Required | Notes |
|---|---|---|
| `licenseId` | yes | UUID. Used in audit + future revocation. |
| `tenantId` | **yes** | Must match `CAMELEER_SERVER_TENANT_ID`. Mismatch = `INVALID` state (see §3). The field is inside the signed payload, so a self-hosted customer cannot strip it to make a license portable across tenants — any edit invalidates the signature. Air-gapped customers receive a license bound to a vendor-issued tenant id (not necessarily a UUID — any non-empty slug). |
| `label` | optional | Free-form human description. Surfaced in UI. |
| `iat` | yes | Unix seconds. |
| `exp` | yes | Unix seconds. |
| `gracePeriodDays` | optional, default `0` | Days `exp` may be in the past while limits still apply. |
| `limits.*` | each optional | Missing key inherits from `DefaultTierLimits`. A license can lift any subset. |
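
The field rules can be sketched as a factory on a hypothetical shape of the `LicenseInfo` record. It takes already-parsed values (a `null` models an absent key) plus the server's configured tenant id; JSON parsing is out of scope, and the error strings mirror the example in §3.1 rather than the real validator's messages.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical shape of the LicenseInfo record; label may be null, limits holds only the keys the payload sets.
record LicenseInfo(UUID licenseId, String tenantId, String label,
                   long iat, long exp, int gracePeriodDays, Map<String, Integer> limits) {

    // Applies the §2.1 field rules; a null argument models an absent JSON key.
    static LicenseInfo fromFields(String licenseId, String tenantId, String label,
                                  Long iat, Long exp, Integer gracePeriodDays,
                                  Map<String, Integer> limits, String serverTenantId) {
        if (licenseId == null || tenantId == null || iat == null || exp == null) {
            throw new IllegalArgumentException("License is missing a required field (licenseId, tenantId, iat, exp)");
        }
        if (tenantId.isBlank()) {
            throw new IllegalArgumentException("License tenantId must be a non-empty slug");
        }
        if (!tenantId.equals(serverTenantId)) {
            throw new IllegalArgumentException("License tenantId '" + tenantId
                    + "' does not match server tenant '" + serverTenantId + "'");
        }
        return new LicenseInfo(
                UUID.fromString(licenseId),                      // throws IllegalArgumentException if not a UUID
                tenantId,
                label,
                iat,
                exp,
                gracePeriodDays == null ? 0 : gracePeriodDays,   // optional, default 0
                limits == null ? Map.of() : Map.copyOf(limits)); // missing keys inherit defaults later (§3.2)
    }
}
```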
### 2.2 Removed from the current envelope
- `tier` (string) — was a non-functional label. Folded into `label`.
- `features` (array) — out of scope. `Feature` enum deleted.
---
## 3. License state machine
```
exp + grace passes
┌─────────┐ install valid ┌────────┐ exp ┌────────┐ ────────► ┌─────────┐
│ ABSENT │ ───────────────▶│ ACTIVE │──────▶│ GRACE │ │ EXPIRED │
└─────────┘ └────────┘ └────────┘ └─────────┘
▲ │ │ ▲ │
│ │ replace │ │ replace valid │ replace
│ ▼ │ │ ▼
│ ┌─────────┐ └──────────────┴─┴───────────────────┘
└──┤ INVALID │ ──── replace valid ────────────────────────────────▶ ACTIVE
└─────────┘
▲
│ install fails (signature / tenant / parse / public-key-missing)
all transitions persist + audit-log
```
### 3.1 State semantics
| State | Effective limits | Trigger | Severity |
|--- |--- |--- |--- |
| `ABSENT` | `DefaultTierLimits` | No DB row. Clean install with no license configured. | INFO |
| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | INFO |
| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI warning banner. | WARN |
| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. UI label distinct from ABSENT. | ERROR |
| `INVALID` | `DefaultTierLimits` | Signature failure, tenant mismatch, parse error, or public key not configured but a token is present. | **ERROR — loud** |

`ABSENT` and `INVALID` produce the same enforcement (default tier) but are surfaced very
differently:
- **`ABSENT`** is a clean state — fresh install, no license yet. UI shows a calm "Install a
  license to lift the default-tier caps" call to action. No audit row beyond the boot log line.
- **`INVALID`** is an active error — tampering, wrong public key, or a paste that lost
  characters. UI shows a red banner with the validator's error message
  (e.g. "License signature verification failed", "License tenantId 'acme-corp' does not match
  server tenant 'beta-corp'"). Audit row written under
  `AuditCategory.LICENSE` action `reject_license`. Prometheus
  `cameleer_license_state{state="INVALID"} = 1` so an alert can fire.

State is recomputed on every limit check (clock comparison only, against the parsed in-memory
`LicenseInfo`) — no scheduler needed for `ACTIVE → GRACE → EXPIRED` transitions. A separate
**daily revalidation job** (§6.6) re-runs the signature check against the DB row to catch slow
failures like public-key rotation drift.
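
A sketch of the pure state computation, assuming the caller already knows whether a token is present and whether validation failed. `LicenseStates` and its parameter shape are hypothetical. The boundaries follow the table above: at exactly `exp` the state is `GRACE` (or `EXPIRED` when `gracePeriodDays` is 0), and at `exp + gracePeriodDays` it is `EXPIRED`.

```java
import java.time.Instant;

enum LicenseState { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

// Hypothetical pure function; cheap enough to call on every limit check.
final class LicenseStates {
    private LicenseStates() {}

    static LicenseState compute(boolean tokenPresent, String validationError,
                                long expEpochSeconds, int gracePeriodDays, Instant now) {
        if (!tokenPresent) return LicenseState.ABSENT;
        if (validationError != null) return LicenseState.INVALID;     // signature, tenant, parse, public key
        long nowSec = now.getEpochSecond();
        long graceEnd = expEpochSeconds + gracePeriodDays * 86_400L;
        if (nowSec < expEpochSeconds) return LicenseState.ACTIVE;     // now < exp
        if (nowSec < graceEnd) return LicenseState.GRACE;             // exp <= now < exp + grace
        return LicenseState.EXPIRED;                                  // now >= exp + grace
    }

    // ACTIVE and GRACE use merge(default, license.limits); every other state uses the default tier.
    static boolean licenseLimitsApply(LicenseState state) {
        return state == LicenseState.ACTIVE || state == LicenseState.GRACE;
    }
}
```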
### 3.2 Default tier (the "no license" caps)
| Limit | Default |
|---|---|
| `max_environments` | 1 |
| `max_apps` | 3 |
| `max_agents` | 5 |
| `max_users` | 3 |
| `max_outbound_connections` | 1 |
| `max_alert_rules` | 2 |
| `max_total_cpu_millis` | 2000 (2 cores) |
| `max_total_memory_mb` | 2048 (2 GB) |
| `max_total_replicas` | 5 |
| `max_execution_retention_days` | 1 |
| `max_log_retention_days` | 1 |
| `max_metric_retention_days` | 1 |
| `max_jar_retention_count` | 3 |

Encoded as `public static final Map<String, Integer> DEFAULTS` in `DefaultTierLimits`. Keys
match the license payload exactly.
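
A sketch of `DefaultTierLimits` with the §3.2 numbers and the `merge(default, license.limits)` step from §3.1. The nested `EffectiveCap` record and its `source` field are assumptions chosen to line up with the `source` values of the usage endpoint (§5). License keys that are not in `DEFAULTS` are ignored here, and in `ABSENT`, `EXPIRED`, and `INVALID` the caller would pass an empty map so every key resolves to the default tier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DefaultTierLimits {
    private DefaultTierLimits() {}

    // §3.2 defaults; keys match the license payload exactly.
    public static final Map<String, Integer> DEFAULTS = Map.ofEntries(
            Map.entry("max_environments", 1),
            Map.entry("max_apps", 3),
            Map.entry("max_agents", 5),
            Map.entry("max_users", 3),
            Map.entry("max_outbound_connections", 1),
            Map.entry("max_alert_rules", 2),
            Map.entry("max_total_cpu_millis", 2000),
            Map.entry("max_total_memory_mb", 2048),
            Map.entry("max_total_replicas", 5),
            Map.entry("max_execution_retention_days", 1),
            Map.entry("max_log_retention_days", 1),
            Map.entry("max_metric_retention_days", 1),
            Map.entry("max_jar_retention_count", 3));

    // Hypothetical: a cap plus where it came from ("license" or "default").
    record EffectiveCap(int cap, String source) {}

    // Every default key gets a cap; the license value wins only where the payload sets that key.
    static Map<String, EffectiveCap> merge(Map<String, Integer> licenseLimits) {
        Map<String, EffectiveCap> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : DEFAULTS.entrySet()) {
            Integer fromLicense = licenseLimits.get(e.getKey());
            out.put(e.getKey(), fromLicense != null
                    ? new EffectiveCap(fromLicense, "license")
                    : new EffectiveCap(e.getValue(), "default"));
        }
        return out;
    }
}
```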
---
## 4. Enforcement map
Every limit check goes through one method on `LicenseEnforcer`:
```java
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
```
Throws `LicenseCapExceededException(limitKey, current, cap)` when `currentUsage + requestedDelta > cap`.
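
A minimal sketch of the enforcer and its exception. The caps are read through a `Supplier` on every call because the state, and therefore the effective caps, is recomputed per check (§3.1); the supplier and the unknown-key guard are assumptions rather than part of the stated design.

```java
import java.util.Map;
import java.util.function.Supplier;

// Carries what the 403 body needs: which limit, current usage, and the effective cap.
final class LicenseCapExceededException extends RuntimeException {
    final String limitKey;
    final long current;
    final long cap;

    LicenseCapExceededException(String limitKey, long current, long cap) {
        super("license cap reached: " + limitKey + " = " + cap + ", current usage " + current);
        this.limitKey = limitKey;
        this.current = current;
        this.cap = cap;
    }
}

final class LicenseEnforcer {
    // Hypothetical source of merge(default, license.limits) for the state computed right now.
    private final Supplier<Map<String, Integer>> effectiveCaps;

    LicenseEnforcer(Supplier<Map<String, Integer>> effectiveCaps) {
        this.effectiveCaps = effectiveCaps;
    }

    void assertWithinCap(String limitKey, long currentUsage, long requestedDelta) {
        Integer cap = effectiveCaps.get().get(limitKey);
        if (cap == null) {
            throw new IllegalArgumentException("Unknown limit key: " + limitKey); // catches typos at call sites
        }
        if (currentUsage + requestedDelta > cap) {
            throw new LicenseCapExceededException(limitKey, currentUsage, cap);
        }
    }
}
```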
A `@ControllerAdvice` maps it to `403` with a body that explains the "why" so operators can act
without grepping logs:
```json
{
"error": "license cap reached",
"limit": "max_apps",
"current": 3,
"cap": 3,
"state": "EXPIRED",
"message": "License expired 5 days ago: system reverted to default tier (3 apps). Current usage is 3. Install or renew the license to create more apps."
}
```
The `message` field is rendered server-side from a small template per state:
| State | Message template |
|--- |---|
| `ABSENT` | "No license installed: default tier applies (cap = N for {limit}). Install a license to raise this." |
| `ACTIVE` | "License cap reached: {limit} = {cap}. Current usage is {current}. Contact your vendor to raise the cap." |
| `GRACE` | "License expired {n} day(s) ago and is in its grace period (ends in {m} days). Cap unchanged at {cap}. Renew before grace ends." |
| `EXPIRED` | "License expired {n} days ago: system reverted to default tier (cap = N for {limit}). Current usage is {current}. Renew the license to lift the cap." |
| `INVALID` | "License rejected ({reason}): default tier applies (cap = N for {limit}). Fix the license to raise this." |
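
A sketch of rendering the `message` field from these templates with a Java 17 switch expression. `CapMessages` and its parameter list are hypothetical, and it assumes the `N` in the default-tier templates is the effective cap, which in those states is the default-tier value.

```java
enum LicenseState { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

// Hypothetical renderer for the 403 body's "message"; the switch must cover every state.
final class CapMessages {
    private CapMessages() {}

    static String render(LicenseState state, String limit, long current, long cap,
                         long daysSinceExp, long graceDaysLeft, String reason) {
        return switch (state) {
            case ABSENT -> "No license installed: default tier applies (cap = " + cap + " for " + limit
                    + "). Install a license to raise this.";
            case ACTIVE -> "License cap reached: " + limit + " = " + cap + ". Current usage is " + current
                    + ". Contact your vendor to raise the cap.";
            case GRACE -> "License expired " + daysSinceExp + " day(s) ago and is in its grace period (ends in "
                    + graceDaysLeft + " days). Cap unchanged at " + cap + ". Renew before grace ends.";
            case EXPIRED -> "License expired " + daysSinceExp + " days ago: system reverted to default tier (cap = "
                    + cap + " for " + limit + "). Current usage is " + current + ". Renew the license to lift the cap.";
            case INVALID -> "License rejected (" + reason + "): default tier applies (cap = " + cap + " for "
                    + limit + "). Fix the license to raise this.";
        };
    }
}
```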
### 4.1 Per-limit call sites
| Limit | Call site | Failure response |
|---|---|---|
| `max_environments` | `EnvironmentService.create` (start) | 403 |
| `max_apps` | `AppService.createApp` | 403 |
| `max_agents` | `AgentRegistryService.register` | 403 — agent treated as unregistered (no SSE, no commands) |
| `max_users` | `UserAdminController.createUser` and `OidcAuthController.callback` (auto-signup) | 403 / OIDC login failure |
| `max_outbound_connections` | `OutboundConnectionServiceImpl.create` | 403 |
| `max_alert_rules` | `AlertRuleController.create` | 403 |
| `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deploys + new) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row |
| `max_total_memory_mb` | same | same |
| `max_total_replicas` | same | same |
| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.2) + `RetentionPolicyApplier` (see §4.3) | 422 on update; ClickHouse TTL recomputed on every license change |
| `max_log_retention_days` | same | same |
| `max_metric_retention_days` | same | same |
| `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 |
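
The compute check at `PRE_FLIGHT` can be sketched as a sum over non-stopped deployments plus the new one, each replica counting its full CPU and memory limit (the same arithmetic as the `SUM(replicas * cpuMillis)` aggregate in §5). `DeploymentFootprint` and `ComputePreFlight` are hypothetical names, and the sketch does not handle a redeploy that replaces an existing footprint.

```java
import java.util.List;

// Hypothetical view of one deployment's resolved container config.
record DeploymentFootprint(int replicas, int cpuMillis, int memoryMb, boolean stopped) {}

final class ComputePreFlight {
    private ComputePreFlight() {}

    // {cpu millis, memory MB, replicas} summed over non-stopped deployments.
    static long[] totals(List<DeploymentFootprint> deployments) {
        long cpu = 0, mem = 0, replicas = 0;
        for (DeploymentFootprint d : deployments) {
            if (d.stopped()) continue;
            cpu += (long) d.replicas() * d.cpuMillis();
            mem += (long) d.replicas() * d.memoryMb();
            replicas += d.replicas();
        }
        return new long[] {cpu, mem, replicas};
    }

    // Returns the first limit key the candidate would exceed, or null if it fits.
    static String firstExceeded(List<DeploymentFootprint> existing, DeploymentFootprint candidate,
                                long maxCpuMillis, long maxMemoryMb, long maxReplicas) {
        long[] t = totals(existing);
        if (t[0] + (long) candidate.replicas() * candidate.cpuMillis() > maxCpuMillis) return "max_total_cpu_millis";
        if (t[1] + (long) candidate.replicas() * candidate.memoryMb() > maxMemoryMb) return "max_total_memory_mb";
        if (t[2] + candidate.replicas() > maxReplicas) return "max_total_replicas";
        return null;
    }
}
```

With the default-tier caps (2000 millis, 2048 MB, 5 replicas), a deployment that returns a non-null key would fail fast at `PRE_FLIGHT` with status `FAILED`.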
### 4.2 Per-environment retention fields
Three new columns on `environments` (Flyway V2):
```sql
ALTER TABLE environments
ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN log_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN metric_retention_days INTEGER NOT NULL DEFAULT 1;
```
These are the configured per-env values. The effective ClickHouse TTL is
`min(licenseCap, configured)`. The admin UI surfaces the configured values;
`EnvironmentService.update` rejects values above the license cap with 422.
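
The two rules in this subsection reduce to a `min` and a bounds check. A sketch with hypothetical names; mapping the exception to 422 is left to the controller layer.

```java
// Hypothetical helpers for §4.2: stored per-env value versus the license cap.
final class RetentionRules {
    private RetentionRules() {}

    // TTL actually applied in ClickHouse for one env and one signal (executions, logs, or metrics).
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    // EnvironmentService.update guard: values above the cap are rejected (mapped to 422 upstream).
    static void validateUpdate(String limitKey, int requestedDays, int licenseCapDays) {
        if (requestedDays > licenseCapDays) {
            throw new IllegalArgumentException(limitKey + " = " + requestedDays
                    + " exceeds the license cap of " + licenseCapDays + " days");
        }
    }
}
```

If the license lands in `EXPIRED`, `max_log_retention_days` falls back to 1, so `effectiveDays(1, 30)` is 1 while the stored value of 30 is untouched and applies again once a license with a high enough cap is installed.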
### 4.3 Runtime retention recompute
`RetentionPolicyApplier` is `@EventListener(LicenseChangedEvent)`:
- Triggered on every `LicenseService.replace(...)` (boot install, env-var override, file
  override, POST `/admin/license`) **and** on every state transition the revalidation job
  detects (e.g. license becomes `EXPIRED`, caps drop to default).
- Recomputes the effective TTL per env (`min(licenseCap, configured)`), then issues
`ALTER TABLE … MODIFY TTL …` on the affected ClickHouse tables (executions, processors,
logs, metrics, route_diagrams, agent_events). One ALTER per table per affected env.
- Errors are logged WARN; a failed ALTER does not block the license install — the operator can
retry by reposting the license. The previous TTL keeps applying until the next successful
ALTER.
- At boot, `LicenseService.loadInitial(...)` publishes one `LicenseChangedEvent` after the
load order in §6.2 settles, so the boot path goes through the same applier as runtime
changes.

Result: a server that stays up for months and lands in `EXPIRED` will see ClickHouse TTLs
collapse to default-tier values automatically — no restart needed.
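
A sketch of the threading model described here and in the §12 risk register: the listener hands statements to a single-threaded executor, so ALTERs run one at a time and off the publisher's thread, and a failed statement is logged at WARN without aborting the rest. The DDL strings are opaque, the `Consumer<String>` standing in for the ClickHouse client is a hypothetical placeholder, Spring wiring (`@EventListener`) is omitted, and `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.logging.Logger;

final class RetentionPolicyApplierSketch {
    private static final Logger LOG = Logger.getLogger("RetentionPolicyApplier");

    // One worker thread: ALTERs are serialised and never run on the event publisher's thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<String> clickHouse; // hypothetical: executes one DDL statement

    RetentionPolicyApplierSketch(Consumer<String> clickHouse) {
        this.clickHouse = clickHouse;
    }

    // Called on LicenseChangedEvent; returns immediately while the statements run in the background.
    Future<?> onLicenseChanged(List<String> alterStatements) {
        return executor.submit(() -> {
            for (String ddl : alterStatements) {
                try {
                    clickHouse.accept(ddl);
                } catch (RuntimeException e) {
                    // Logged and skipped; the previous TTL keeps applying for this table.
                    LOG.warning("Retention ALTER failed, previous TTL still applies: " + e.getMessage());
                }
            }
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

The executor thread is not a daemon, so `shutdown()` belongs in the application's stop hook.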
### 4.4 Boot-time invariant
2026-04-25 21:55:18 +02:00
If a license is added that *lowers* a cap below current usage (10 apps, license now allows 5), the
server logs one WARN per limit at boot. **No deletion.** New creates reject; existing resources
keep working.

---
## 5. Usage endpoint
`GET /api/v1/admin/license/usage` (ADMIN only):
```json
{
"state": "ACTIVE",
"expiresAt": "2027-04-25T00:00:00Z",
"daysRemaining": 365,
"gracePeriodDays": 30,
"tenantId": "acme-corp",
"label": "ACME prod 2026",
"lastValidatedAt": "2026-04-26T03:14:07Z",
"message": "License active. 365 days remaining.",
"limits": [
{"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
{"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
{"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
{"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
]
}
```
`source` is `"default"` when the cap comes from `DefaultTierLimits` (i.e. the license omits this
key, or there is no license), and `"license"` when the cap is explicit in the license. Drives the
SaaS UI's "free tier" badge.
`message` carries the same human-readable explanation that the 403 body uses, varying by state:
- `ABSENT` — "No license installed. Default tier applies."
- `ACTIVE` — "License active. {n} days remaining."
- `GRACE` — "License expired {n} days ago. Grace period ends in {m} days. Renew now to avoid degradation."
- `EXPIRED` — "License expired {n} days ago. System reverted to default tier."
- `INVALID` — "License rejected: {reason}. Default tier applies. Fix the license to recover."

`LicenseUsageReader` issues one cheap aggregate per limit (`SELECT COUNT(*)` per entity table; a
single `SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas)` over
non-stopped deployments).
`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope, lastValidatedAt}`
with the raw token omitted from the response.

---
## 6. Lifecycle, persistence, install paths
### 6.1 Storage
Flyway V2 migration:
```sql
CREATE TABLE license (
tenant_id TEXT PRIMARY KEY, -- one row per server (= one tenant)
token TEXT NOT NULL, -- full signed token
license_id UUID NOT NULL,
installed_at TIMESTAMPTZ NOT NULL,
installed_by TEXT NOT NULL, -- users.user_id (bare) or 'system' for env/file boot
expires_at TIMESTAMPTZ NOT NULL,
last_validated_at TIMESTAMPTZ NOT NULL -- updated by boot, install, and revalidation job
);
```
`last_validated_at` is the timestamp of the most recent **successful** signature/parse round-trip
against the current public key. It is useful for troubleshooting "why did my license stop working":
a `last_validated_at` that lags far behind `now` is a strong signal that revalidation is failing
and the operator should check the public key.
### 6.2 Boot order
`LicenseBeanConfig`:
1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite,
   sets `last_validated_at = now`) → load.
2. Else if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
3. Else if a `license` row exists in the DB → validate → on success update
   `last_validated_at = now` → load; on failure the state is `INVALID`.
4. Else `ABSENT`.

After steps 1–3 the service publishes one `LicenseChangedEvent` so the retention applier and
metrics gauges initialise off the same code path as runtime changes.

Env-var / file act as **idempotent overrides** — they always win and replace the DB row, so the
operator's last action survives reboots.
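
The precedence above can be sketched as a pure resolver that reports which source wins and which token to validate; writing env/file tokens back to the DB, validation, and event publishing stay with the caller. `BootOrder`, `BootToken`, `TokenSource`, and the trimming of surrounding whitespace (for example a trailing newline in the file) are assumptions.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

enum TokenSource { ENV, FILE, DB, NONE }

// Hypothetical result: token is null only for NONE (which maps to ABSENT).
record BootToken(TokenSource source, String token) {}

final class BootOrder {
    private BootOrder() {}

    static BootToken resolve(Optional<String> envToken, Optional<String> filePath,
                             Function<String, String> readFile,
                             Supplier<Optional<String>> dbToken) {
        if (envToken.isPresent() && !envToken.get().isBlank()) {
            return new BootToken(TokenSource.ENV, envToken.get().trim());              // step 1
        }
        if (filePath.isPresent() && !filePath.get().isBlank()) {
            return new BootToken(TokenSource.FILE, readFile.apply(filePath.get()).trim()); // step 2
        }
        return dbToken.get()                                                            // step 3, only queried if needed
                .map(t -> new BootToken(TokenSource.DB, t))
                .orElse(new BootToken(TokenSource.NONE, null));                         // step 4
    }
}
```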
### 6.3 Runtime install
`POST /api/v1/admin/license { "token": "..." }` (existing):
- Validates against the configured public key.
- On success, persists to `license` table (`installed_by = user_id`, `last_validated_at = now`),
  updates the in-memory `LicenseGate`, publishes `LicenseChangedEvent`, audits.
- On failure, returns 400 with the validator error message and audits the rejection.

Server transitions to `INVALID` state if a previously-loaded license was replaced; otherwise
remains in its prior state (the rejected token is *not* written to DB).
### 6.4 Public key custody
`CAMELEER_SERVER_LICENSE_PUBLICKEY` (existing) remains the only verification key. It is a build- or
deploy-time secret bound to the vendor distribution. **Not stored in DB.** If it is unset *and* a
license is present → reject all licenses (existing behaviour) → `INVALID` state.
### 6.5 Audit trail
New `AuditCategory.LICENSE` . Actions:
| Action | When | Payload |
|---|---|---|
| `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (`source` = `env`/`file`/`api`) |
| `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
| `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
| `revalidate_license` | Daily job result, on **failure only** | `{licenseId, reason}` |
| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy, state}` |
### 6.6 Daily revalidation job
`LicenseRevalidationJob` :
- `@Scheduled(cron = "0 0 3 * * *")` (03:00 server local time) plus an immediate run 60s
after boot.
- Reads the DB token, re-runs `LicenseValidator.validate(token)` against the current public
key.
- On success: `UPDATE license SET last_validated_at = now() WHERE tenant_id = ?`.
- On failure (e.g. operator rotated the public key without reinstalling the license, or DB
  row was tampered with directly): transition state to `INVALID`, publish
  `LicenseChangedEvent` (so retention recomputes too), audit `revalidate_license` with the
  reason, log `ERROR`.
- Cheap (no I/O beyond one DB read + one DB write); safe to run frequently. 03:00 is chosen
  to coincide with off-peak hours, so any `ERROR` noise lands when humans aren't deploying.
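
One revalidation pass, sketched as a pure function so the `last_validated_at` semantics from §6.1 are explicit: it advances only on success, so its age (exported as `cameleer_license_last_validated_age_seconds`, §8) keeps growing while failures persist. `Revalidation`, `RevalidationResult`, and the validator-as-function shape (null on success, reason on failure) are hypothetical; persisting, auditing, and publishing `LicenseChangedEvent` are left to the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Function;

// Hypothetical outcome of one pass; reason is null on success.
record RevalidationResult(boolean valid, Instant lastValidatedAt, String reason) {}

final class Revalidation {
    private Revalidation() {}

    static RevalidationResult run(String dbToken, Function<String, String> validate,
                                  Instant previousValidatedAt, Instant now) {
        String error = validate.apply(dbToken);
        return error == null
                ? new RevalidationResult(true, now, null)                   // UPDATE ... last_validated_at
                : new RevalidationResult(false, previousValidatedAt, error); // caller transitions to INVALID
    }

    // Value for the age gauge: seconds since the last successful validation.
    static long ageSeconds(Instant lastValidatedAt, Instant now) {
        return Duration.between(lastValidatedAt, now).getSeconds();
    }
}
```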
---
## 7. Minter
### 7.1 `LicenseMinter` (library)
Pure function, packaged in `cameleer-license-minter` :
```java
public final class LicenseMinter {
public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
```
Serializes `LicenseInfo` to canonical JSON (sorted keys), signs the bytes with Ed25519, returns
`base64(payload).base64(signature)`. cameleer-saas calls this directly to mint per-tenant tokens.
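
A self-contained sketch of the minting primitive. It takes the payload as a `Map` (a stand-in for `LicenseInfo`) and hand-rolls a minimal canonical JSON writer (sorted keys, no whitespace, strings and integers only) so the example needs no JSON library; a real implementation would more likely configure its JSON mapper to sort keys. `LicenseMinterSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

final class LicenseMinterSketch {
    private LicenseMinterSketch() {}

    // Signs the canonical JSON bytes with Ed25519 and returns base64(payload).base64(signature).
    static String mint(Map<String, ?> payload, PrivateKey ed25519PrivateKey) throws GeneralSecurityException {
        byte[] json = canonicalJson(payload).getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(ed25519PrivateKey);
        signer.update(json);
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(json) + "." + b64.encodeToString(signer.sign());
    }

    static String canonicalJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value instanceof Map<?, ?> map) {
            TreeMap<String, Object> sorted = new TreeMap<>(); // sorted keys make the bytes deterministic
            map.forEach((k, v) -> sorted.put(String.valueOf(k), v));
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(e.getKey(), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof String s) {
            writeString(s, sb);
        } else if (value instanceof Integer || value instanceof Long) {
            sb.append(value);
        } else {
            throw new IllegalArgumentException("Unsupported JSON value: " + value);
        }
    }

    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c)); // control characters
            else sb.append(c);
        }
        sb.append('"');
    }
}
```

Ed25519 signing is deterministic (RFC 8032), so the same payload and key yield a byte-identical token, which is what makes a "canonical JSON is stable across runs" test meaningful.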
### 7.2 `LicenseMinterCli` (CLI)
```bash
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
--private-key=/secure/vendor.key \
--public-key=/secure/vendor.pub \
--tenant=acme-corp \
--label="ACME prod 2026" \
--expires=2027-04-25 \
--grace-days=30 \
--max-apps=50 \
--max-agents=100 \
--max-total-cpu-millis=32000 \
--max-total-memory-mb=65536 \
--max-execution-retention-days=90 \
--output=acme-license.tok \
--verify
```
- `--private-key` reads a PEM-encoded Ed25519 private key (output of
  `openssl genpkey -algorithm ed25519`).
- `--public-key` *(used only with `--verify`)* reads the matching public key. Required when
  `--verify` is set; ignored otherwise.
- Unspecified `--max-*` flags are omitted from the payload — the license inherits the default for
  that key.
- Unknown flags fail fast.
- `--output` writes the token; if omitted, prints to stdout.
- `--verify` round-trips the freshly-minted token through `LicenseValidator` against
  `--public-key` *after* writing the output file. This catches:
  - corruption during the `String → file` write,
  - wrong-key pairing (vendor accidentally pointed `--public-key` at a different keypair's
    public half),
  - signature mismatch from a buggy build of the minter.

  On verify failure the CLI exits non-zero, prints the validator error, and (if `--output` was
  written) deletes the output file so the bad token does not get shipped.

Keypair generation is **out of band** — vendor uses `openssl` and stores both halves in their
secret manager. We deliberately do not ship a `--gen-keypair` subcommand to keep the boundary
clean.
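
For the `--private-key` and `--public-key` flags, PEM loading needs nothing beyond the JDK (15+). `openssl genpkey -algorithm ed25519` writes an unencrypted PKCS#8 `PRIVATE KEY` block, and the public half is usually exported as an X.509 `PUBLIC KEY` block (SubjectPublicKeyInfo). The sketch assumes exactly those two formats and rejects anything else, including encrypted keys; `PemKeys` is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

final class PemKeys {
    private PemKeys() {}

    static PrivateKey privateKey(String pem) throws GeneralSecurityException {
        return KeyFactory.getInstance("Ed25519").generatePrivate(new PKCS8EncodedKeySpec(der(pem, "PRIVATE KEY")));
    }

    static PublicKey publicKey(String pem) throws GeneralSecurityException {
        return KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(der(pem, "PUBLIC KEY")));
    }

    // Extracts and decodes the base64 body between the BEGIN/END lines of the given PEM type.
    private static byte[] der(String pem, String type) {
        String begin = "-----BEGIN " + type + "-----";
        String end = "-----END " + type + "-----";
        int b = pem.indexOf(begin);
        int e = pem.indexOf(end);
        if (b < 0 || e < b) {
            throw new IllegalArgumentException("Expected a PEM block of type " + type);
        }
        String body = pem.substring(b + begin.length(), e).replaceAll("\\s", "");
        return Base64.getDecoder().decode(body);
    }
}
```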
---
## 8. Telemetry
Prometheus gauges scraped via `/api/v1/prometheus` :
| Metric | Labels | Notes |
|---|---|---|
| `cameleer_license_state` | `state` = one of `ABSENT`, `ACTIVE`, `GRACE`, `EXPIRED`, `INVALID` | One 0/1 series per state; exactly one is 1. |
| `cameleer_license_days_remaining` | (none) | Negative in GRACE/EXPIRED. |
| `cameleer_license_limit_utilisation` | `limit="max_apps"` etc. | `current / cap`; ≥ 0, and can exceed 1 when usage is above the cap (e.g. after a cap was lowered). |
| `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |
| `cameleer_license_last_validated_age_seconds` | (none) | `now - last_validated_at`. Keeps growing past one day if the daily revalidation job is failing. |

State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED,
`ERROR` on INVALID, `WARN` on cap reject (sampled to avoid log spam).

Recommended alert (in cameleer-saas Grafana, not shipped with the server): page on
`cameleer_license_state{state="INVALID"} == 1` for > 5 minutes.

---
## 9. Dead-code removal
Performed in the **first commit** of the implementation. Per the project's "no backwards
compatibility shims" preference, no deprecated path or feature flag.
- Delete `Feature.java`.
- Delete `LicenseGate.isEnabled(Feature)`.
- Delete the `LicenseInfo.features` field and `LicenseInfo.hasFeature(Feature)`.
- Delete `LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled` and `LicenseInfo.open()`'s
  `Set.of(Feature.values())` assertion.
- Update `LicenseValidator` to ignore `features` if present in old tokens (silently dropped,
not an error).
---
## 10. Testing
| Layer | Tests |
|---|---|
| Core unit | `LicenseValidatorTest` — signature, expiry, tenant mismatch, missing required fields (`tenantId`, `licenseId`, `iat`, `exp`), unknown extra fields. |
| Core unit | `LicenseStateMachineTest` — all five states and their transitions, including the grace boundary, replace from any state, invalid install routes to `INVALID`, valid install from `INVALID` recovers to `ACTIVE`. |
| Core unit | `DefaultTierLimitsTest` — every documented key has a default. |
| Minter unit | `LicenseMinterTest` — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
| Minter CLI | `LicenseMinterCliTest` — invokes `main` with `--private-key=tmp` and checks output token validates; `--verify` happy path; `--verify` failure path deletes the output file and exits non-zero. |
| App unit | `LicenseEnforcerTest` — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default, message text varies per state. |
| App unit | `RetentionPolicyApplierTest` — license-changed event recomputes effective TTL per env; failed ALTER logs WARN and does not throw. |
| App integration | `LicenseLifecycleIT` — install via env, replace via POST, restart restores from DB, public-key removal at runtime transitions to `INVALID`, daily revalidation job updates `last_validated_at`. Driven through REST. |
| App integration | `LicenseEnforcementIT` — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes `cap_exceeded` audit row check and verifies the 403 body's `message` field matches the state. |
| App integration | `RetentionRuntimeRecomputeIT` — install license with `max_log_retention_days=30`, observe `logs` TTL ALTER fires; replace with `max_log_retention_days=7`, observe TTL drops to 7 without restart. |
| Boot | `SchemaBootstrapIT` extension — `license` table exists with `last_validated_at`, `environments` retention columns exist, retention pinning honoured at boot. |

No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.

---
## 11. Open follow-ups (deliberately deferred)
- Ingestion-rate limits (`max_executions_per_minute`, `max_logs_per_minute`).
- Online revocation callback (the `revocation_check_url` envelope field).
- Concurrent debug session limit (`max_concurrent_debug_sessions` from the SaaS epic).
- A "license usage history" report for vendors to see growth over time.
- Open a tracking issue on `cameleer/cameleer-server` (Gitea) — none exists today.
---
## 12. Risk register
| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-`exp` "trial" license at install time if needed. |
| Customer lowers `gracePeriodDays` field by editing token. | Token is signed; any edit invalidates the signature. |
| License removed from DB out of band; the server lands in ABSENT and rejects new resources, but existing ones already exceed the default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. Daily revalidation job catches a rotation that wasn't paired with a reinstall (state → `INVALID` , alertable). |
| Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | Existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
| Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |
| `RetentionPolicyApplier` issues blocking ALTERs from the event listener thread. | Applier runs ALTERs serialised but on a separate executor (not the publisher thread) so a slow ClickHouse does not stall the install API call. License install API returns immediately with the new state; retention recompute completes asynchronously and is observable via metrics. |