License Enforcement — Design
Date: 2026-04-25
Status: Approved (brainstorm); pending writing-plans
Related: cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)
Problem
cameleer-server ships a license skeleton (LicenseValidator, LicenseGate, admin endpoint) but
nothing enforces anything. Open mode (no license configured) currently grants all features and
no limits — the opposite of what we want for a self-hosted distribution that needs to gate scale
behind a paid license.
We want:
- A self-hosted server with no license to operate within a small, hard-coded "default tier" that is enough to evaluate the product but not enough to run it in production.
- Licenses to express arbitrary per-customer limits (no fixed tiers) on a vendor-defined set of resources: entity counts, compute footprint, retention.
- A standalone minter owned by the vendor that signs licenses with an Ed25519 private key the customer never sees.
- Licenses to be persisted on the server, installable via env var, file, or admin POST, and renewable by replacement.
- Revocation handled out of band (vendor suspends the SaaS tenant, or issues short-`exp` licenses); no online revocation callback in v1.
Non-goals
- Feature flags. The current `Feature` enum (topology/lineage/correlation/debugger/replay) is dead scaffolding and gets removed; this design is about quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Defer to a follow-up.
- Online revocation. Vendor uses shorter `exp` + reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates reject.
- Minter keypair generation tooling. Vendor uses standard `openssl genpkey -algorithm ed25519` out of band.
1. Architecture
1.1 Module layout
cameleer-server-core/ (existing — pure domain, no Spring)
└── license/
├── LicenseInfo (record — see §2)
├── LicenseLimits (typed wrapper over the limits map)
├── LicenseValidator (existing, payload schema updated)
├── LicenseGate (existing, gutted: no Feature; getLimits() only)
├── LicenseStateMachine (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED / INVALID)
└── DefaultTierLimits (constant — §3.2 numbers)
cameleer-server-app/ (existing — Spring, web, persistence)
├── license/
│ ├── LicenseRepository (NEW — PostgreSQL persistence)
│ ├── LicenseService (NEW — load/save/replace; publishes LicenseChangedEvent)
│ ├── LicenseEnforcer (NEW — assertWithinCap entry point)
│ ├── LicenseUsageReader (NEW — counts current usage for /usage endpoint)
│ ├── LicenseCapExceededException (NEW — mapped to 403 by ControllerAdvice)
│ ├── LicenseRevalidationJob (NEW — @Scheduled daily; updates last_validated_at)
│ ├── RetentionPolicyApplier (NEW — @EventListener(LicenseChangedEvent); recomputes ClickHouse TTL + per-env caps)
│ └── LicenseMetrics (NEW — Prometheus gauges)
├── controller/
│ ├── LicenseAdminController (existing — extended; persists, audited)
│ └── LicenseUsageController (NEW — GET /admin/license/usage)
└── config/
└── LicenseBeanConfig (existing — extended for DB load order)
cameleer-license-minter/ (NEW — top-level Maven module)
├── pom.xml (depends on cameleer-server-core)
├── LicenseMinter (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli (CLI main class, supports --verify)
1.2 Why a separate cameleer-license-minter module
Not shipped in the runtime JAR. Vendor distributes it independently or builds it from source on a trusted machine. Customers never receive it.
This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection: forging a license requires the vendor's private key, and the public key in the server is enough to reject forged tokens regardless of where the minter code lives.
1.3 Dependency graph
cameleer-license-minter ──▶ cameleer-server-core (LicenseInfo schema; LicenseValidator for --verify)
cameleer-server-app ──▶ cameleer-server-core (validator, gate, FSM, defaults)
cameleer-saas ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas ──▶ cameleer-server-core (transitive)
cameleer-server-app has no dependency on cameleer-license-minter.
2. License envelope
Wire format unchanged: base64(payload).base64(ed25519_signature). Payload schema:
{
"licenseId": "550e8400-e29b-41d4-a716-446655440000",
"tenantId": "acme-corp",
"label": "ACME prod 2026 — site:hamburg",
"iat": 1745539200,
"exp": 1777075200,
"gracePeriodDays": 30,
"limits": {
"max_environments": 5,
"max_apps": 50,
"max_agents": 100,
"max_users": 25,
"max_outbound_connections": 10,
"max_alert_rules": 200,
"max_total_cpu_millis": 32000,
"max_total_memory_mb": 65536,
"max_total_replicas": 100,
"max_execution_retention_days": 90,
"max_log_retention_days": 30,
"max_metric_retention_days": 365,
"max_jar_retention_count": 10
}
}
2.1 Field rules
| Field | Required | Notes |
|---|---|---|
| `licenseId` | yes | UUID. Used in audit + future revocation. |
| `tenantId` | yes | Must match `CAMELEER_SERVER_TENANT_ID`. Mismatch = `INVALID` state (see §3). The field is inside the signed payload, so a self-hosted customer cannot strip it to make a license portable across tenants; any edit invalidates the signature. Air-gapped customers receive a license bound to a vendor-issued tenant id (not necessarily a UUID; any non-empty slug). |
| `label` | optional | Free-form human description. Surfaced in UI. |
| `iat` | yes | Unix seconds. |
| `exp` | yes | Unix seconds. |
| `gracePeriodDays` | optional, default `0` | Days `exp` may be in the past while limits still apply. |
| `limits.*` | each optional | Missing key inherits from `DefaultTierLimits`. A license can lift any subset. |
2.2 Removed from the current envelope
- `tier` (string): was a non-functional label. Folded into `label`.
- `features` (array): out of scope. `Feature` enum deleted.
3. License state machine
ABSENT ── install valid ──▶ ACTIVE
ACTIVE ── exp ──▶ GRACE
GRACE ── exp + grace passes ──▶ EXPIRED
ACTIVE / GRACE / EXPIRED ── replace valid ──▶ ACTIVE
INVALID ── replace valid ──▶ ACTIVE
install fails (signature / tenant / parse / public-key-missing) ──▶ INVALID
all transitions persist + audit-log
3.1 State semantics
| State | Effective limits | Trigger | Severity |
|---|---|---|---|
| `ABSENT` | `DefaultTierLimits` | No DB row. Clean install with no license configured. | INFO |
| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | INFO |
| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI warning banner. | WARN |
| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. UI label distinct from `ABSENT`. | ERROR |
| `INVALID` | `DefaultTierLimits` | Signature failure, tenant mismatch, parse error, or public key not configured but a token is present. | ERROR (loud) |
ABSENT and INVALID produce the same enforcement (default tier) but are surfaced very
differently:
- `ABSENT` is a clean state: fresh install, no license yet. UI shows a calm "Install a license to lift the default-tier caps" call to action. No audit row beyond the boot log line.
- `INVALID` is an active error: tampering, wrong public key, or a paste that lost characters. UI shows a red banner with the validator's error message (e.g. "License signature verification failed", "License tenantId 'acme-corp' does not match server tenant 'beta-corp'"). Audit row written under `AuditCategory.LICENSE`, action `reject_license`. Prometheus `cameleer_license_state{state="INVALID"} = 1` so an alert can fire.
State is recomputed on every limit check (clock comparison only against parsed in-memory
LicenseInfo) — no scheduler needed for ACTIVE → GRACE → EXPIRED transitions. A separate
daily revalidation job (§6.6) re-runs the signature check against the DB row to catch slow
failures like public-key rotation drift.
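The clock-only recomputation described above can be sketched as a pure function. This is a minimal illustration, not the real `LicenseStateMachine`: the class name, the `compute` signature, and the use of a nullable `validationError` to represent a failed validation are assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the clock-only state computation; names are illustrative. */
final class LicenseStateSketch {
    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    /**
     * @param now             current time
     * @param exp             license expiry, or null when no license is loaded
     * @param gracePeriodDays days after exp during which license limits still apply
     * @param validationError non-null when signature/tenant/parse validation failed
     */
    static State compute(Instant now, Instant exp, int gracePeriodDays, String validationError) {
        if (validationError != null) return State.INVALID;
        if (exp == null) return State.ABSENT;
        if (now.isBefore(exp)) return State.ACTIVE;                   // now < exp
        Instant graceEnd = exp.plus(Duration.ofDays(gracePeriodDays));
        return now.isBefore(graceEnd) ? State.GRACE : State.EXPIRED;  // exp <= now < exp + grace
    }
}
```

Because the function only compares `now` against values already parsed into memory, calling it on every limit check stays cheap, which is why no scheduler is needed for the time-based transitions.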
3.2 Default tier (the "no license" caps)
| Limit | Default |
|---|---|
| `max_environments` | 1 |
| `max_apps` | 3 |
| `max_agents` | 5 |
| `max_users` | 3 |
| `max_outbound_connections` | 1 |
| `max_alert_rules` | 2 |
| `max_total_cpu_millis` | 2000 (2 cores) |
| `max_total_memory_mb` | 2048 (2 GB) |
| `max_total_replicas` | 5 |
| `max_execution_retention_days` | 1 |
| `max_log_retention_days` | 1 |
| `max_metric_retention_days` | 1 |
| `max_jar_retention_count` | 3 |
Encoded as public static final Map<String, Integer> DEFAULTS in DefaultTierLimits. Keys
match the license payload exactly.
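The `merge(default, license.limits)` rule from §3.1 can be illustrated with a small sketch. The class and method names are hypothetical, and only a subset of the default keys is shown for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of merge(default, license.limits); not the real DefaultTierLimits. */
final class EffectiveLimitsSketch {
    // Subset of the default-tier values, for brevity.
    static final Map<String, Integer> DEFAULTS = Map.of(
            "max_environments", 1,
            "max_apps", 3,
            "max_agents", 5);

    /** Keys present in the license override the default; missing keys inherit it. */
    static Map<String, Integer> merge(Map<String, Integer> defaults, Map<String, Integer> licenseLimits) {
        Map<String, Integer> effective = new LinkedHashMap<>(defaults);
        effective.putAll(licenseLimits);
        return effective;
    }
}
```

For example, a license carrying only `max_apps = 50` yields an effective `max_apps` of 50 while `max_agents` stays at the default of 5.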
4. Enforcement map
Every limit check goes through one method on LicenseEnforcer:
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
Throws LicenseCapExceededException(limitKey, current, cap) when currentUsage + requestedDelta > cap.
A @ControllerAdvice maps it to 403 with a body that explains the "why" so operators can act
without grepping logs:
{
"error": "license cap reached",
"limit": "max_apps",
"current": 3,
"cap": 3,
"state": "EXPIRED",
"message": "License expired 5 days ago: system reverted to default tier (3 apps). Current usage is 3. Install or renew the license to create more apps."
}
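The check behind `assertWithinCap` can be sketched as follows. This is an assumption-laden illustration: `LicenseEnforcerSketch` and the nested `CapExceeded` class are stand-ins for `LicenseEnforcer` and `LicenseCapExceededException`, and rejecting unknown limit keys with `IllegalArgumentException` is a choice made for the example, not something the design specifies.

```java
import java.util.Map;

/** Hypothetical sketch of the single enforcement entry point. */
final class LicenseEnforcerSketch {
    /** Stand-in for LicenseCapExceededException(limitKey, current, cap). */
    static final class CapExceeded extends RuntimeException {
        final String limitKey;
        final long current;
        final long cap;

        CapExceeded(String limitKey, long current, long cap) {
            super("license cap reached: " + limitKey + " (current=" + current + ", cap=" + cap + ")");
            this.limitKey = limitKey;
            this.current = current;
            this.cap = cap;
        }
    }

    private final Map<String, Integer> effectiveLimits;

    LicenseEnforcerSketch(Map<String, Integer> effectiveLimits) {
        this.effectiveLimits = effectiveLimits;
    }

    void assertWithinCap(String limitKey, long currentUsage, long requestedDelta) {
        Integer cap = effectiveLimits.get(limitKey);
        if (cap == null) {
            // Assumption: an unknown key is a programming error, not a license error.
            throw new IllegalArgumentException("unknown limit key: " + limitKey);
        }
        if (currentUsage + requestedDelta > cap) {
            throw new CapExceeded(limitKey, currentUsage, cap);
        }
    }
}
```

With a cap of 3, a request to add one app at usage 2 passes (2 + 1 is not greater than 3), while the same request at usage 3 is rejected.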
The message field is rendered server-side from a small template per state:
| State | Message template |
|---|---|
| `ABSENT` | "No license installed: default tier applies (cap = N for {limit}). Install a license to raise this." |
| `ACTIVE` | "License cap reached: {limit} = {cap}. Current usage is {current}. Contact your vendor to raise the cap." |
| `GRACE` | "License expired {n} day(s) ago and is in its grace period (ends in {m} days). Cap unchanged at {cap}. Renew before grace ends." |
| `EXPIRED` | "License expired {n} days ago: system reverted to default tier (cap = N for {limit}). Current usage is {current}. Renew the license to lift the cap." |
| `INVALID` | "License rejected ({reason}): default tier applies (cap = N for {limit}). Fix the license to raise this." |
4.1 Per-limit call sites
| Limit | Call site | Failure response |
|---|---|---|
| `max_environments` | `EnvironmentService.create` (start) | 403 |
| `max_apps` | `AppService.createApp` | 403 |
| `max_agents` | `AgentRegistryService.register` | 403; agent treated as unregistered (no SSE, no commands) |
| `max_users` | `UserAdminController.createUser` and `OidcAuthController.callback` (auto-signup) | 403 / OIDC login failure |
| `max_outbound_connections` | `OutboundConnectionServiceImpl.create` | 403 |
| `max_alert_rules` | `AlertRuleController.create` | 403 |
| `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deploys + new) | Deploy fails fast at `PRE_FLIGHT`, status `FAILED`, audit row |
| `max_total_memory_mb` | same | same |
| `max_total_replicas` | same | same |
| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.2) + `RetentionPolicyApplier` (see §4.3) | 422 on update; ClickHouse TTL recomputed on every license change |
| `max_log_retention_days` | same | same |
| `max_metric_retention_days` | same | same |
| `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 |
4.2 Per-environment retention fields
Three new columns on environments (Flyway V2):
ALTER TABLE environments
ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN log_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN metric_retention_days INTEGER NOT NULL DEFAULT 1;
These are the configured per-env values. The effective ClickHouse TTL is
min(licenseCap, configured). Admin UI surfaces the configured values;
EnvironmentService.update rejects values above the license cap with 422.
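The two rules above (clamp at read time, reject at write time) can be sketched like this. The class and method names are hypothetical, and the exception type stands in for whatever the real service maps to a 422.

```java
/** Hypothetical sketch of the per-environment retention rules. */
final class RetentionSketch {
    /** Effective ClickHouse TTL in days: the configured value, clamped by the license cap. */
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** Update-time rule: a value above the cap is rejected rather than silently clamped. */
    static void validateConfigured(String limitKey, int licenseCapDays, int requestedDays) {
        if (requestedDays > licenseCapDays) {
            // Assumption: the real service maps this failure to HTTP 422.
            throw new IllegalArgumentException(
                    limitKey + ": requested " + requestedDays + " days exceeds license cap " + licenseCapDays);
        }
    }
}
```

The clamp still matters even though updates are validated: an environment configured for 30 days under a 90-day cap has an effective TTL of 30, but if the cap later drops to the default of 1, `effectiveDays(1, 30)` yields 1 without touching the stored value.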
4.3 Runtime retention recompute
RetentionPolicyApplier is @EventListener(LicenseChangedEvent):
- Triggered on every `LicenseService.replace(...)` (boot install, env-var override, file override, `POST /admin/license`) and on every state transition the revalidation job detects (e.g. license becomes `EXPIRED`, caps drop to default).
- Recomputes the effective TTL per env (`min(licenseCap, configured)`), then issues `ALTER TABLE … MODIFY TTL …` on the affected ClickHouse tables (executions, processors, logs, metrics, route_diagrams, agent_events). One ALTER per table per affected env.
- Errors are logged WARN; a failed ALTER does not block the license install, and the operator can retry by reposting the license. The previous TTL keeps applying until the next successful ALTER.
- At boot, `LicenseService.loadInitial(...)` publishes one `LicenseChangedEvent` after the load order in §6.2 settles, so the boot path goes through the same applier as runtime changes.
Result: a server that stays up for months and lands in EXPIRED will see ClickHouse TTLs
collapse to default-tier values automatically — no restart needed.
4.4 Boot-time invariant
If a license is added that lowers a cap below current usage (10 apps, license now allows 5), the server logs one WARN per limit at boot. No deletion. New creates reject; existing resources keep working.
5. Usage endpoint
GET /api/v1/admin/license/usage (ADMIN only):
{
"state": "ACTIVE",
"expiresAt": "2027-04-25T00:00:00Z",
"daysRemaining": 365,
"gracePeriodDays": 30,
"tenantId": "acme-corp",
"label": "ACME prod 2026",
"lastValidatedAt": "2026-04-26T03:14:07Z",
"message": "License active. 365 days remaining.",
"limits": [
{"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
{"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
{"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
{"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
]
}
source is "default" when the cap comes from DefaultTierLimits (i.e. the license omits this
key, or there is no license), and "license" when the cap is explicit in the license. Drives the
SaaS UI's "free tier" badge.
message carries the same human-readable explanation that the 403 body uses, varying by state:
- `ABSENT`: "No license installed. Default tier applies."
- `ACTIVE`: "License active. {n} days remaining."
- `GRACE`: "License expired {n} days ago. Grace period ends in {m} days. Renew now to avoid degradation."
- `EXPIRED`: "License expired {n} days ago. System reverted to default tier."
- `INVALID`: "License rejected: {reason}. Default tier applies. Fix the license to recover."
LicenseUsageReader issues one cheap aggregate per limit (SELECT COUNT(*) per entity table; a
single grouped SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas) over
non-stopped deployments).
GET /api/v1/admin/license (existing) is extended to return {state, envelope, lastValidatedAt}
with the raw token omitted from the response.
6. Lifecycle, persistence, install paths
6.1 Storage
Flyway V2 migration:
CREATE TABLE license (
tenant_id TEXT PRIMARY KEY, -- one row per server (= one tenant)
token TEXT NOT NULL, -- full signed token
license_id UUID NOT NULL,
installed_at TIMESTAMPTZ NOT NULL,
installed_by TEXT NOT NULL, -- users.user_id (bare) or 'system' for env/file boot
expires_at TIMESTAMPTZ NOT NULL,
last_validated_at TIMESTAMPTZ NOT NULL -- updated by boot, install, and revalidation job
);
last_validated_at is the timestamp of the most recent successful signature/parse round-trip
against the current public key. Useful for troubleshooting "why did my license stop working" — a
stale last_validated_at next to a recent now is a strong signal that revalidation is failing
and the operator should check the public key.
6.2 Boot order
LicenseBeanConfig:
1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite, sets `last_validated_at = now`) → load.
2. Else if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
3. Else read `license` row from DB → validate → on success update `last_validated_at = now` → load.
4. Else `ABSENT`.

After steps 1–3 the service publishes one `LicenseChangedEvent` so the retention applier and
metrics gauges initialise off the same code path as runtime changes.
Env-var / file act as idempotent overrides — they always win and replace the DB row, so the operator's last action survives reboots.
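The precedence of the boot order can be sketched as a small resolver. Everything here is illustrative: `BootOrderSketch`, `Source`, and `Resolved` are hypothetical names, and the three inputs are passed in as already-read `Optional` values rather than being fetched from the environment, a file, or PostgreSQL.

```java
import java.util.Optional;

/** Hypothetical sketch of the boot-time precedence: env var, then file, then DB row, else ABSENT. */
final class BootOrderSketch {
    enum Source { ENV, FILE, DB, NONE }

    /** Which source won, and the token it supplied (null for NONE). */
    record Resolved(Source source, String token) {}

    static Resolved resolve(Optional<String> envToken, Optional<String> fileToken, Optional<String> dbToken) {
        if (envToken.isPresent()) return new Resolved(Source.ENV, envToken.get());
        if (fileToken.isPresent()) return new Resolved(Source.FILE, fileToken.get());
        if (dbToken.isPresent()) return new Resolved(Source.DB, dbToken.get());
        return new Resolved(Source.NONE, null);
    }

    /** ENV and FILE act as overrides and replace the DB row; DB and NONE leave it untouched. */
    static boolean overwritesDbRow(Source source) {
        return source == Source.ENV || source == Source.FILE;
    }
}
```

Validation is deliberately left out of the sketch; in the design each resolved token still goes through the validator before it is written or loaded.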
6.3 Runtime install
POST /api/v1/admin/license { "token": "..." } (existing):
- Validates against the configured public key.
- On success, persists to `license` table (`installed_by = user_id`, `last_validated_at = now`), updates the in-memory `LicenseGate`, publishes `LicenseChangedEvent`, audits.
- On failure, returns 400 with the validator error message and audits the rejection. Server transitions to `INVALID` state if a previously-loaded license was replaced; otherwise remains in its prior state (the rejected token is not written to DB).
6.4 Public key custody
CAMELEER_SERVER_LICENSE_PUBLICKEY (existing) remains the only verification key. Build- /
deploy-time secret bound to the vendor distribution. Not stored in DB. If unset and a
license is present → reject all licenses (existing behaviour) → INVALID state.
6.5 Audit trail
New AuditCategory.LICENSE. Actions:
| Action | When | Payload |
|---|---|---|
| `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (source = `env`/`file`/`api`) |
| `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
| `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
| `revalidate_license` | Daily job result, on failure only | `{licenseId, reason}` |
| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy, state}` |
6.6 Daily revalidation job
LicenseRevalidationJob:
- `@Scheduled(cron = "0 0 3 * * *")` (03:00 server local time) plus an immediate run 60s after boot.
- Reads the DB token, re-runs `LicenseValidator.validate(token)` against the current public key.
- On success: `UPDATE license SET last_validated_at = now WHERE tenant_id = ?`.
- On failure (e.g. operator rotated the public key without reinstalling the license, or DB row was tampered with directly): transition state to `INVALID`, publish `LicenseChangedEvent` (so retention recomputes too), audit `revalidate_license` with the reason, log `ERROR`.
- Cheap (no I/O beyond one DB read + one DB write); safe to run frequently. 03:00 is chosen to coincide with off-peak so the WARN noise lands when humans aren't deploying.
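The success/failure split of a single job run can be sketched without Spring. The names below are hypothetical: `TokenValidator` stands in for `LicenseValidator.validate(token)`, and the two callbacks stand in for the DB update and for the `INVALID` transition with its event, audit row, and log line.

```java
import java.util.function.Consumer;

/** Hypothetical sketch of one revalidation run; scheduling and persistence are left out. */
final class RevalidationSketch {
    /** Stand-in for LicenseValidator.validate(token): throws on any validation failure. */
    interface TokenValidator {
        void validate(String token) throws Exception;
    }

    /**
     * Runs the validator once against the stored token.
     * On success calls touchLastValidatedAt; on failure passes the reason to onInvalid.
     * Returns true when the token validated.
     */
    static boolean runOnce(String dbToken, TokenValidator validator,
                           Runnable touchLastValidatedAt, Consumer<String> onInvalid) {
        try {
            validator.validate(dbToken);
        } catch (Exception e) {
            onInvalid.accept(e.getMessage());
            return false;
        }
        touchLastValidatedAt.run();
        return true;
    }
}
```

Note that `last_validated_at` is only touched on the success path, which is what makes a stale timestamp a useful signal that revalidation is failing.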
7. Minter
7.1 LicenseMinter (library)
Pure function, packaged in cameleer-license-minter:
public final class LicenseMinter {
public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
Serializes LicenseInfo to canonical JSON (sorted keys), signs the bytes with Ed25519, returns
base64(payload).base64(signature). cameleer-saas calls this directly to mint per-tenant tokens.
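A minimal sketch of the sign-and-verify round trip, using the JDK's built-in Ed25519 support (Java 15+). Several details are assumptions: the standard (not URL-safe) Base64 alphabet, the `MintSketch` class, and a toy `canonicalJson` that only handles flat maps of strings and numbers without escaping. The real `LicenseMinter` takes a `LicenseInfo` rather than a JSON string.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch of minting and verifying a base64(payload).base64(signature) token. */
final class MintSketch {
    /** Toy canonical JSON: keys sorted, no whitespace, flat string/number values, no escaping. */
    static String canonicalJson(Map<String, Object> fields) {
        return new TreeMap<>(fields).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + jsonValue(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String jsonValue(Object v) {
        return v instanceof Number ? v.toString() : "\"" + v + "\"";
    }

    static String mint(String payloadJson, PrivateKey ed25519PrivateKey) throws GeneralSecurityException {
        byte[] payload = payloadJson.getBytes(StandardCharsets.UTF_8);
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(ed25519PrivateKey);
        signer.update(payload);
        Base64.Encoder b64 = Base64.getEncoder(); // assumption: standard alphabet, which never emits '.'
        return b64.encodeToString(payload) + "." + b64.encodeToString(signer.sign());
    }

    static boolean verify(String token, PublicKey ed25519PublicKey) throws GeneralSecurityException {
        String[] parts = token.split("\\.");
        if (parts.length != 2) return false;
        byte[] payload = Base64.getDecoder().decode(parts[0]);
        byte[] signature = Base64.getDecoder().decode(parts[1]);
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(ed25519PublicKey);
        verifier.update(payload);
        return verifier.verify(signature);
    }
}
```

Sorting keys before signing is what makes the signed bytes stable across runs; the verifier checks the signature over the exact payload bytes in the token, so it never needs to re-canonicalise.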
7.2 LicenseMinterCli (CLI)
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
--private-key=/secure/vendor.key \
--public-key=/secure/vendor.pub \
--tenant=acme-corp \
--label="ACME prod 2026" \
--expires=2027-04-25 \
--grace-days=30 \
--max-apps=50 \
--max-agents=100 \
--max-total-cpu-millis=32000 \
--max-total-memory-mb=65536 \
--max-execution-retention-days=90 \
--output=acme-license.tok \
--verify
- `--private-key` reads a PEM-encoded Ed25519 private key (output of `openssl genpkey -algorithm ed25519`).
- `--public-key` (used only with `--verify`) reads the matching public key. Required when `--verify` is set; ignored otherwise.
- Unspecified `--max-*` flags are omitted from the payload; the license inherits the default for that key.
- Unknown flags fail fast.
- `--output` writes the token; if omitted, prints to stdout.
- `--verify` round-trips the freshly-minted token through `LicenseValidator` against `--public-key` after writing the output file. This catches:
  - corruption between `String → file` write,
  - wrong-key pairing (vendor accidentally pointed `--public-key` at a different keypair's public half),
  - signature mismatch from a buggy build of the minter.

  On verify failure the CLI exits non-zero, prints the validator error, and (if `--output` was written) deletes the output file so the bad token does not get shipped.
Keypair generation is out of band — vendor uses openssl and stores both halves in their
secret manager. We deliberately do not ship a --gen-keypair subcommand to keep the boundary
clean.
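The `--verify` behaviour (write, read back, validate, delete on failure) can be sketched as below. The class name and the `Predicate<String>` verifier are assumptions for illustration; the real CLI would pass the read-back token to `LicenseValidator` with the key from `--public-key`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

/** Hypothetical sketch of the verify-after-write step of the minter CLI. */
final class VerifyAfterWriteSketch {
    /** Returns a process exit code: 0 when the written token verifies, 1 otherwise. */
    static int writeAndVerify(String token, Path output, Predicate<String> verifier) throws IOException {
        Files.writeString(output, token, StandardCharsets.UTF_8);
        // Verify what actually landed on disk, not the in-memory string.
        String readBack = Files.readString(output, StandardCharsets.UTF_8);
        if (verifier.test(readBack)) {
            return 0;
        }
        Files.deleteIfExists(output); // do not leave a bad token behind to be shipped
        System.err.println("verification failed; removed " + output);
        return 1;
    }
}
```

Reading the file back before verifying is what lets this step catch corruption in the write itself, in addition to wrong-key pairing.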
8. Telemetry
Prometheus gauges scraped via /api/v1/prometheus:
| Metric | Labels | Notes |
|---|---|---|
| `cameleer_license_state` | `state`, one of `ABSENT`, `ACTIVE`, `GRACE`, `EXPIRED`, `INVALID` | `1` for the current state (see §3.1). |
| `cameleer_license_days_remaining` | (none) | Negative in `GRACE`/`EXPIRED`. |
| `cameleer_license_limit_utilisation` | `limit="max_apps"` etc. | `current / cap`, in [0, 1+]. |
| `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |
| `cameleer_license_last_validated_age_seconds` | (none) | `now - last_validated_at`. Spikes if the daily revalidation job is failing. |
State-transition log lines: INFO on install/ACTIVE, WARN on GRACE, ERROR on EXPIRED,
ERROR on INVALID, WARN on cap reject (sampled to avoid log spam).
Recommended alert (in cameleer-saas Grafana, not shipped with the server): page on
cameleer_license_state{state="INVALID"} == 1 for > 5 minutes.
9. Dead-code removal
Performed in the first commit of the implementation. Per the project's "no backwards compatibility shims" preference, no deprecated path or feature flag.
- Delete `Feature.java`.
- Delete `LicenseGate.isEnabled(Feature)`.
- Delete `LicenseInfo.features` field, `LicenseInfo.hasFeature(Feature)`.
- Delete `LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled` and `LicenseInfo.open()`'s `Set.of(Feature.values())` assertion.
- Update `LicenseValidator` to ignore `features` if present in old tokens (silently dropped, not an error).
10. Testing
| Layer | Tests |
|---|---|
| Core unit | LicenseValidatorTest — signature, expiry, tenant mismatch, missing required fields (tenantId, licenseId, iat, exp), unknown extra fields. |
| Core unit | LicenseStateMachineTest — all five transitions including grace boundary, replace from any state, invalid install routes to INVALID, valid install from INVALID recovers to ACTIVE. |
| Core unit | DefaultTierLimitsTest — every documented key has a default. |
| Minter unit | LicenseMinterTest — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
| Minter CLI | LicenseMinterCliTest — invokes main with --private-key=tmp and checks output token validates; --verify happy path; --verify failure path deletes the output file and exits non-zero. |
| App unit | LicenseEnforcerTest — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default, message text varies per state. |
| App unit | RetentionPolicyApplierTest — license-changed event recomputes effective TTL per env; failed ALTER logs WARN and does not throw. |
| App integration | LicenseLifecycleIT — install via env, replace via POST, restart restores from DB, public-key removal at runtime transitions to INVALID, daily revalidation job updates last_validated_at. Driven through REST. |
| App integration | LicenseEnforcementIT — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes cap_exceeded audit row check and verifies the 403 body's message field matches the state. |
| App integration | RetentionRuntimeRecomputeIT — install license with max_log_retention_days=30, observe logs TTL ALTER fires; replace with max_log_retention_days=7, observe TTL drops to 7 without restart. |
| Boot | SchemaBootstrapIT extension — license table exists with last_validated_at, environments retention columns exist, retention pinning honoured at boot. |
No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.
11. Open follow-ups (deliberately deferred)
- Ingestion-rate limits (`max_executions_per_minute`, `max_logs_per_minute`).
- Online revocation callback (the `revocation_check_url` envelope field).
- Concurrent debug session limit (`max_concurrent_debug_sessions` from the SaaS epic).
- A "license usage history" report for vendors to see growth over time.
- Open a tracking issue on `cameleer/cameleer-server` (Gitea); none exists today.
12. Risk register
| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-exp "trial" license at install time if needed. |
| Customer lowers `gracePeriodDays` field by editing token. | Token is signed; any edit invalidates the signature. |
| License removed from DB out of band, server lands in ABSENT and rejects new resources but old ones are above default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; documented as "redeploy with new key", since vendors are expected to rotate via redeployment. Daily revalidation job catches a rotation that wasn't paired with a reinstall (state → INVALID, alertable). |
| Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | Existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
| Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |
| `RetentionPolicyApplier` issues blocking ALTERs from the event listener thread. | Applier runs ALTERs serialised but on a separate executor (not the publisher thread) so a slow ClickHouse does not stall the install API call. License install API returns immediately with the new state; retention recompute completes asynchronously and is observable via metrics. |