From e0be6a069fce24a70a3f4f3af3beec5c28dffa8d Mon Sep 17 00:00:00 2001
From: hsiegeln <37154749+hsiegeln@users.noreply.github.com>
Date: Sun, 26 Apr 2026 09:42:16 +0200
Subject: [PATCH] docs(license): apply review feedback to enforcement design
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add INVALID state to FSM (signature/tenant/parse failure ≠ ABSENT) with
  loud UI/audit/metric severity; ABSENT stays a calm state.
- Make tenantId required in the license envelope (it's already inside the
  signed payload, so a self-hosted customer cannot strip it).
- Move ClickHouse TTL recompute from boot-only to a RetentionPolicyApplier
  @EventListener(LicenseChangedEvent), so a long-running server that lands
  in EXPIRED tightens TTL automatically.
- Add LicenseRevalidationJob (daily) that re-runs signature check against
  the DB row and updates last_validated_at; transitions to INVALID on
  failure (catches public-key rotation drift).
- Add last_validated_at column to the license table, surfaced on the
  /usage endpoint and as cameleer_license_last_validated_age_seconds.
- Enrich enforcement-failure responses and the /usage endpoint with a
  per-state human-readable message so 403s and the UI both explain WHY
  caps changed.
- Add --verify (with --public-key) to the minter CLI to round-trip a
  freshly-minted token through LicenseValidator before shipping it,
  deleting the output file on verify failure.
- Add corresponding tests, telemetry gauge, and a runtime-recompute IT.

Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-25-license-enforcement-design.md | 224 ++++++++++++++---- 1 file changed, 172 insertions(+), 52 deletions(-) diff --git a/docs/superpowers/specs/2026-04-25-license-enforcement-design.md b/docs/superpowers/specs/2026-04-25-license-enforcement-design.md index fafbe1a9..5c8e1d4e 100644 --- a/docs/superpowers/specs/2026-04-25-license-enforcement-design.md +++ b/docs/superpowers/specs/2026-04-25-license-enforcement-design.md @@ -47,16 +47,18 @@ cameleer-server-core/ (existing — pure domain, no Spring) ├── LicenseLimits (typed wrapper over the limits map) ├── LicenseValidator (existing, payload schema updated) ├── LicenseGate (existing, gutted: no Feature; getLimits() only) - ├── LicenseStateMachine (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED) - └── DefaultTierLimits (constant — §5 numbers) + ├── LicenseStateMachine (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED / INVALID) + └── DefaultTierLimits (constant — §3.2 numbers) cameleer-server-app/ (existing — Spring, web, persistence) ├── license/ │ ├── LicenseRepository (NEW — PostgreSQL persistence) -│ ├── LicenseService (NEW — load/save/replace; emits state events) +│ ├── LicenseService (NEW — load/save/replace; publishes LicenseChangedEvent) │ ├── LicenseEnforcer (NEW — assertWithinCap entry point) │ ├── LicenseUsageReader (NEW — counts current usage for /usage endpoint) │ ├── LicenseCapExceededException (NEW — mapped to 403 by ControllerAdvice) +│ ├── LicenseRevalidationJob (NEW — @Scheduled daily; updates last_validated_at) +│ ├── RetentionPolicyApplier (NEW — @EventListener(LicenseChangedEvent); recomputes ClickHouse TTL + per-env caps) │ └── LicenseMetrics (NEW — Prometheus gauges) ├── controller/ │ ├── LicenseAdminController (existing — extended; persists, audited) @@ -67,7 +69,7 @@ cameleer-server-app/ (existing — Spring, web, persistence) cameleer-license-minter/ (NEW — top-level Maven module) ├── pom.xml (depends on cameleer-server-core) ├── 
LicenseMinter (signing primitive; takes private key + LicenseInfo) -└── cli/LicenseMinterCli (CLI main class) +└── cli/LicenseMinterCli (CLI main class, supports --verify) ``` ### 1.2 Why a separate `cameleer-license-minter` module @@ -100,7 +102,7 @@ Wire format unchanged: `base64(payload).base64(ed25519_signature)`. Payload sche { "licenseId": "550e8400-e29b-41d4-a716-446655440000", "tenantId": "acme-corp", - "label": "ACME prod 2026", + "label": "ACME prod 2026 — site:hamburg", "iat": 1745539200, "exp": 1777075200, "gracePeriodDays": 30, @@ -127,7 +129,7 @@ Wire format unchanged: `base64(payload).base64(ed25519_signature)`. Payload sche | Field | Required | Notes | |---|---|---| | `licenseId` | yes | UUID. Used in audit + future revocation. | -| `tenantId` | optional | If present and `CAMELEER_SERVER_TENANT_ID` differs, treat as no license + log error. Air-gapped customers may omit. | +| `tenantId` | **yes** | Must match `CAMELEER_SERVER_TENANT_ID`. Mismatch = `INVALID` state (see §3). The field is inside the signed payload, so a self-hosted customer cannot strip it to make a license portable across tenants — any edit invalidates the signature. Air-gapped customers receive a license bound to a vendor-issued tenant id (not necessarily a UUID — any non-empty slug). | | `label` | optional | Free-form human description. Surfaced in UI. | | `iat` | yes | Unix seconds. | | `exp` | yes | Unix seconds. | @@ -149,23 +151,42 @@ Wire format unchanged: `base64(payload).base64(ed25519_signature)`. 
Payload sche │ ABSENT │ ───────────────▶│ ACTIVE │──────▶│ GRACE │ │ EXPIRED │ └─────────┘ └────────┘ └────────┘ └─────────┘ ▲ │ │ ▲ │ - │ install invalid │ replace │ │ replace valid │ replace - │ (sig/tenant/parse) ▼ │ │ ▼ - └────────────────────────────┴──────────────┴─┴───────────────────┘ + │ │ replace │ │ replace valid │ replace + │ ▼ │ │ ▼ + │ ┌─────────┐ └──────────────┴─┴───────────────────┘ + └──┤ INVALID │ ──── replace valid ────────────────────────────────▶ ACTIVE + └─────────┘ + ▲ + │ install fails (signature / tenant / parse / public-key-missing) all transitions persist + audit-log ``` ### 3.1 State semantics -| State | Effective limits | Trigger | -|---|---|---| -| `ABSENT` | `DefaultTierLimits` | No DB row, or signature/tenant/parse failure. | -| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | -| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI banner. | -| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. Distinct UI label vs ABSENT. | +| State | Effective limits | Trigger | Severity | +|--- |--- |--- |--- | +| `ABSENT` | `DefaultTierLimits` | No DB row. Clean install with no license configured. | INFO | +| `ACTIVE` | `merge(default, license.limits)` | License loaded, `now < exp`. | INFO | +| `GRACE` | Same as `ACTIVE` | `exp ≤ now < exp + gracePeriodDays`. UI warning banner. | WARN | +| `EXPIRED` | `DefaultTierLimits` | `now ≥ exp + gracePeriodDays`. UI label distinct from ABSENT. | ERROR | +| `INVALID` | `DefaultTierLimits` | Signature failure, tenant mismatch, parse error, or public key not configured but a token is present. | **ERROR — loud** | -State is recomputed on every limit check (clock comparison only) — no scheduler needed for -transitions. The only "background" behaviour is the Prometheus gauge refresh. +`ABSENT` and `INVALID` produce the same enforcement (default tier) but are surfaced very +differently: + +- **`ABSENT`** is a clean state — fresh install, no license yet. 
UI shows a calm "Install a + license to lift the default-tier caps" call to action. No audit row beyond the boot log line. +- **`INVALID`** is an active error — tampering, wrong public key, or a paste that lost + characters. UI shows a red banner with the validator's error message + (e.g. "License signature verification failed", "License tenantId 'acme-corp' does not match + server tenant 'beta-corp'"). Audit row written under + `AuditCategory.LICENSE` action `reject_license`. Prometheus + `cameleer_license_state{state="INVALID"} = 1` so an alert can fire. + +State is recomputed on every limit check (clock comparison only against parsed in-memory +`LicenseInfo`) — no scheduler needed for `ACTIVE → GRACE → EXPIRED` transitions. A separate +**daily revalidation job** (§6.6) re-runs the signature check against the DB row to catch slow +failures like public-key rotation drift. ### 3.2 Default tier (the "no license" caps) @@ -199,8 +220,31 @@ void assertWithinCap(String limitKey, long currentUsage, long requestedDelta); ``` Throws `LicenseCapExceededException(limitKey, current, cap)` when `currentUsage + requestedDelta > cap`. -A `@ControllerAdvice` maps it to `403` with body -`{"error":"license cap reached","limit":"max_apps","current":3,"cap":3}`. +A `@ControllerAdvice` maps it to `403` with a body that explains the "why" so operators can act +without grepping logs: + +```json +{ + "error": "license cap reached", + "limit": "max_apps", + "current": 3, + "cap": 3, + "state": "EXPIRED", + "message": "License expired 5 days ago: system reverted to default tier (3 apps). Current usage is 3. Install or renew the license to create more apps." +} +``` + +The `message` field is rendered server-side from a small template per state: + +| State | Message template | +|--- |---| +| `ABSENT` | "No license installed: default tier applies (cap = N for {limit}). Install a license to raise this." | +| `ACTIVE` | "License cap reached: {limit} = {cap}. Current usage is {current}. 
Contact your vendor to raise the cap." | +| `GRACE` | "License expired {n} day(s) ago and is in its grace period (ends in {m} days). Cap unchanged at {cap}. Renew before grace ends." | +| `EXPIRED`| "License expired {n} days ago: system reverted to default tier (cap = N for {limit}). Current usage is {current}. Renew the license to lift the cap." | +| `INVALID`| "License rejected ({reason}): default tier applies (cap = N for {limit}). Fix the license to raise this." | + +### 4.1 Per-limit call sites | Limit | Call site | Failure response | |---|---|---| @@ -213,12 +257,12 @@ A `@ControllerAdvice` maps it to `403` with body | `max_total_cpu_millis` | `DeploymentExecutor.PRE_FLIGHT` (sum across non-stopped deploys + new) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row | | `max_total_memory_mb` | same | same | | `max_total_replicas` | same | same | -| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.1) + `ClickHouseSchemaInitializer.applyRetention()` at boot | 422 on update; boot pins effective TTL = `min(licenseCap, configured)` | +| `max_execution_retention_days` | `EnvironmentService.update` (per-env field, see §4.2) + `RetentionPolicyApplier` (see §4.3) | 422 on update; ClickHouse TTL recomputed on every license change | | `max_log_retention_days` | same | same | | `max_metric_retention_days` | same | same | | `max_jar_retention_count` | `EnvironmentAdminController.PUT /jar-retention` | 422 | -### 4.1 Per-environment retention fields +### 4.2 Per-environment retention fields Three new columns on `environments` (Flyway V2): @@ -230,11 +274,30 @@ ALTER TABLE environments ``` These are the configured per-env values. The effective ClickHouse TTL is -`min(licenseCap, configured)`, applied at startup by `ClickHouseSchemaInitializer`. Admin UI -surfaces the configured values; `EnvironmentService.update` rejects values above the license cap -with 422. +`min(licenseCap, configured)`. 
Admin UI surfaces the configured values; +`EnvironmentService.update` rejects values above the license cap with 422. -### 4.2 Boot-time invariant +### 4.3 Runtime retention recompute + +`RetentionPolicyApplier` is `@EventListener(LicenseChangedEvent)`: + +- Triggered on every `LicenseService.replace(...)` (boot install, env-var override, file + override, POST `/admin/license`) **and** on every state transition the revalidation job + detects (e.g. license becomes `EXPIRED`, caps drop to default). +- Recomputes the effective TTL per env (`min(licenseCap, configured)`), then issues + `ALTER TABLE … MODIFY TTL …` on the affected ClickHouse tables (executions, processors, + logs, metrics, route_diagrams, agent_events). One ALTER per table per affected env. +- Errors are logged WARN; a failed ALTER does not block the license install — the operator can + retry by reposting the license. The previous TTL keeps applying until the next successful + ALTER. +- At boot, `LicenseService.loadInitial(...)` publishes one `LicenseChangedEvent` after the + load order in §6.2 settles, so the boot path goes through the same applier as runtime + changes. + +Result: a server that stays up for months and lands in `EXPIRED` will see ClickHouse TTLs +collapse to default-tier values automatically — no restart needed. + +### 4.4 Boot-time invariant If a license is added that *lowers* a cap below current usage (10 apps, license now allows 5), the server logs one WARN per limit at boot. **No deletion**. New creates reject; existing resources @@ -254,6 +317,8 @@ keep working. "gracePeriodDays": 30, "tenantId": "acme-corp", "label": "ACME prod 2026", + "lastValidatedAt": "2026-04-26T03:14:07Z", + "message": "License active. 365 days remaining.", "limits": [ {"key": "max_apps", "current": 7, "cap": 50, "source": "license"}, {"key": "max_agents", "current": 12, "cap": 100, "source": "license"}, @@ -267,12 +332,20 @@ keep working. 
 key, or there is no license), and `"license"` when the cap is explicit in the license. Drives the
 SaaS UI's "free tier" badge.

+`message` carries the same human-readable explanation that the 403 body uses, varying by state:
+
+- `ABSENT` — "No license installed. Default tier applies."
+- `ACTIVE` — "License active. {n} days remaining."
+- `GRACE` — "License expired {n} days ago. Grace period ends in {m} days. Renew now to avoid degradation."
+- `EXPIRED`— "License expired {n} days ago. System reverted to default tier."
+- `INVALID`— "License rejected: {reason}. Default tier applies. Fix the license to recover."
+
 `LicenseUsageReader` issues one cheap aggregate per limit (`SELECT COUNT(*)` per entity table; a
 single grouped `SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas)`
 over non-stopped deployments).

-`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope}` with the raw token
-omitted from the response.
+`GET /api/v1/admin/license` (existing) is extended to return `{state, envelope, lastValidatedAt}`
+with the raw token omitted from the response.
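
The `state` and `message` fields returned by `/usage` follow from the clock comparison in §3.1 and the per-state strings listed above. The following is only a minimal sketch of that derivation, not the project's `LicenseStateMachine`: the class name `UsageMessageSketch`, both method signatures, and the whole-day floor rounding are assumptions made for illustration.

```java
/** Hypothetical sketch: derives the /usage state and message from the clock. */
final class UsageMessageSketch {

    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    private static final long SECONDS_PER_DAY = 86_400L;

    /** Clock comparison from §3.1. A non-null rejectReason means validation failed. */
    static State classify(boolean tokenPresent, String rejectReason,
                          long expEpochSec, int gracePeriodDays, long nowEpochSec) {
        if (!tokenPresent) return State.ABSENT;
        if (rejectReason != null) return State.INVALID;
        if (nowEpochSec < expEpochSec) return State.ACTIVE;
        long graceEnd = expEpochSec + gracePeriodDays * SECONDS_PER_DAY;
        return nowEpochSec < graceEnd ? State.GRACE : State.EXPIRED;
    }

    /** Renders the per-state text. Rounding to whole days (floor) is an assumption. */
    static String message(State state, String rejectReason,
                          long expEpochSec, int gracePeriodDays, long nowEpochSec) {
        long remaining = Math.floorDiv(expEpochSec - nowEpochSec, SECONDS_PER_DAY);
        long ago = Math.floorDiv(nowEpochSec - expEpochSec, SECONDS_PER_DAY);
        switch (state) {
            case ABSENT:
                return "No license installed. Default tier applies.";
            case ACTIVE:
                return "License active. " + remaining + " days remaining.";
            case GRACE:
                return "License expired " + ago + " days ago. Grace period ends in "
                        + (gracePeriodDays - ago) + " days. Renew now to avoid degradation.";
            case EXPIRED:
                return "License expired " + ago + " days ago. System reverted to default tier.";
            case INVALID:
                return "License rejected: " + rejectReason
                        + ". Default tier applies. Fix the license to recover.";
            default:
                throw new IllegalStateException("unknown state " + state);
        }
    }

    public static void main(String[] args) {
        long exp = 1777075200L;  // `exp` from the example payload
        long now = 1745539200L;  // `iat` from the example payload, 365 days earlier
        State state = classify(true, null, exp, 30, now);
        System.out.println(state + ": " + message(state, null, exp, 30, now));
    }
}
```

With the example payload's `iat` used as "now", the sketch reproduces the "License active. 365 days remaining." message shown in the sample `/usage` response.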

 ---

@@ -284,25 +357,35 @@ Flyway V2 migration:

 ```sql
 CREATE TABLE license (
-    tenant_id     TEXT PRIMARY KEY,        -- one row per server (= one tenant)
-    token         TEXT NOT NULL,           -- full signed token
-    license_id    UUID NOT NULL,
-    installed_at  TIMESTAMPTZ NOT NULL,
-    installed_by  TEXT NOT NULL,           -- users.user_id (bare) or 'system' for env/file boot
-    expires_at    TIMESTAMPTZ NOT NULL
+    tenant_id          TEXT PRIMARY KEY,        -- one row per server (= one tenant)
+    token              TEXT NOT NULL,           -- full signed token
+    license_id         UUID NOT NULL,
+    installed_at       TIMESTAMPTZ NOT NULL,
+    installed_by       TEXT NOT NULL,           -- users.user_id (bare) or 'system' for env/file boot
+    expires_at         TIMESTAMPTZ NOT NULL,
+    last_validated_at  TIMESTAMPTZ NOT NULL     -- updated by boot, install, and revalidation job
 );
 ```

+`last_validated_at` is the timestamp of the most recent **successful** signature/parse round-trip
+against the current public key. Useful for troubleshooting "why did my license stop working" — a
+stale `last_validated_at` next to a recent `now` is a strong signal that revalidation is failing
+and the operator should check the public key.
+
 ### 6.2 Boot order

 `LicenseBeanConfig`:

-1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite) →
-   load.
+1. If `CAMELEER_SERVER_LICENSE_TOKEN` env var is set → validate → write to DB (overwrite,
+   sets `last_validated_at = now`) → load.
 2. Else if `CAMELEER_SERVER_LICENSE_FILE` is set → read file → validate → write to DB → load.
-3. Else read `license` row from DB → validate → load.
+3. Else read `license` row from DB → validate → on success update `last_validated_at = now` →
+   load.
 4. Else `ABSENT`.

+After step 1–3 the service publishes one `LicenseChangedEvent` so the retention applier and
+metrics gauges initialise off the same code path as runtime changes.
+
 Env-var / file act as **idempotent overrides** — they always win and replace the DB row, so the
 operator's last action survives reboots.

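
The load order in §6.2 amounts to a first-match lookup across three sources. The sketch below shows only that precedence, under simplifying assumptions: the class `LicenseBootOrderSketch` and its methods are hypothetical, each source is represented by an already-read token string, and a null or blank string is treated as "not set". Validation, persistence, and event publishing are left out.

```java
/** Hypothetical sketch of the §6.2 load order; not the actual LicenseBeanConfig. */
final class LicenseBootOrderSketch {

    enum Source { ENV, FILE, DB, NONE }

    /** Picks the first configured source: env var, then file, then DB row. */
    static Source pickSource(String envToken, String fileToken, String dbToken) {
        if (isSet(envToken)) return Source.ENV;
        if (isSet(fileToken)) return Source.FILE;
        if (isSet(dbToken)) return Source.DB;
        return Source.NONE; // no token anywhere: the server starts in ABSENT
    }

    /** Env var and file are overrides: a token from either replaces the DB row. */
    static boolean overwritesDbRow(Source source) {
        return source == Source.ENV || source == Source.FILE;
    }

    // Assumption: null or blank means the variable or file is not configured.
    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }

    public static void main(String[] args) {
        // The env var wins even though a DB row already exists.
        Source source = pickSource("tok-env", null, "tok-db");
        System.out.println(source + " overwrites DB row: " + overwritesDbRow(source));
    }
}
```

Because the env var and file win on every boot, a token left in either place keeps overriding whatever was last installed through the API; that is the trade-off behind the "idempotent override" wording.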
@@ -310,15 +393,17 @@ operator's last action survives reboots.
 `POST /api/v1/admin/license { "token": "..." }` (existing):

 - Validates against the configured public key.
-- On success, persists to `license` table (`installed_by = user_id`), updates the in-memory
-  `LicenseGate`, audits.
+- On success, persists to `license` table (`installed_by = user_id`, `last_validated_at = now`),
+  updates the in-memory `LicenseGate`, publishes `LicenseChangedEvent`, audits.
 - On failure, returns 400 with the validator error message and audits the rejection.
+  Server transitions to `INVALID` state if a previously-loaded license was replaced; otherwise
+  remains in its prior state (the rejected token is *not* written to DB).

 ### 6.4 Public key custody

 `CAMELEER_SERVER_LICENSE_PUBLICKEY` (existing) remains the only verification key. Build- /
 deploy-time secret bound to the vendor distribution. **Not stored in DB.** If unset *and* a
-license is present → reject all licenses (existing behaviour).
+license is present → reject all licenses (existing behaviour) → `INVALID` state.

 ### 6.5 Audit trail

@@ -329,7 +414,23 @@ New `AuditCategory.LICENSE`. Actions:
 | `install_license` | First successful install in an empty state | `{licenseId, expiresAt, installedBy, source}` (`source` = `env`/`file`/`api`) |
 | `replace_license` | Successful install over an existing license | same + `previousLicenseId` |
 | `reject_license` | Validation failed (signature, tenant, parse, public key missing) | `{reason, source}` |
-| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy}` |
+| `revalidate_license` | Daily job result, on **failure only** | `{licenseId, reason}` |
+| `cap_exceeded` | Any `LicenseCapExceededException` | `{limit, current, cap, requestedBy, state}` |
+
+### 6.6 Daily revalidation job
+
+`LicenseRevalidationJob`:
+- `@Scheduled(cron = "0 0 3 * * *")` (03:00 server local time) plus an immediate run 60s
+  after boot.
+- Reads the DB token, re-runs `LicenseValidator.validate(token)` against the current public
+  key.
+- On success: `UPDATE license SET last_validated_at = now WHERE tenant_id = ?`.
+- On failure (e.g. operator rotated the public key without reinstalling the license, or DB
+  row was tampered with directly): transition state to `INVALID`, publish
+  `LicenseChangedEvent` (so retention recomputes too), audit `revalidate_license` with the
+  reason, log `ERROR`.
+- Cheap (no I/O beyond one DB read + one DB write); safe to run frequently. 03:00 is chosen
+  to coincide with off-peak so the WARN noise lands when humans aren't deploying.

 ---

@@ -353,6 +454,7 @@ Serializes `LicenseInfo` to canonical JSON (sorted keys), signs the bytes with E
 ```bash
 java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
   --private-key=/secure/vendor.key \
+  --public-key=/secure/vendor.pub \
   --tenant=acme-corp \
   --label="ACME prod 2026" \
   --expires=2027-04-25 \
@@ -362,15 +464,26 @@ java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
   --max-total-cpu-millis=32000 \
   --max-total-memory-mb=65536 \
   --max-execution-retention-days=90 \
-  --output=acme-license.tok
+  --output=acme-license.tok \
+  --verify
 ```

 - `--private-key` reads a PEM-encoded Ed25519 private key (output of
   `openssl genpkey -algorithm ed25519`).
+- `--public-key` *(used only with `--verify`)* reads the matching public key. Required when
+  `--verify` is set; ignored otherwise.
 - Unspecified `--max-*` flags are omitted from the payload — the license inherits the default for
   that key.
 - Unknown flags fail fast.
 - `--output` writes the token; if omitted, prints to stdout.
+- `--verify` round-trips the freshly-minted token through `LicenseValidator` against
+  `--public-key` *after* writing the output file. This catches:
+  - corruption between `String → file` write,
+  - wrong-key pairing (vendor accidentally pointed `--public-key` at a different keypair's
+    public half),
+  - signature mismatch from a buggy build of the minter.
+  On verify failure the CLI exits non-zero, prints the validator error, and (if `--output` was
+  written) deletes the output file so the bad token does not get shipped.

 Keypair generation is **out of band** — vendor uses `openssl` and stores both halves in their
 secret manager. We deliberately do not ship a `--gen-keypair` subcommand to keep the boundary
@@ -384,13 +497,17 @@ Prometheus gauges scraped via `/api/v1/prometheus`:

 | Metric | Labels | Notes |
 |---|---|---|
-| `cameleer_license_state` | `state="ABSENT|ACTIVE|GRACE|EXPIRED"` | Boolean — exactly one is 1. |
+| `cameleer_license_state` | `state="ABSENT|ACTIVE|GRACE|EXPIRED|INVALID"` | Boolean — exactly one is 1. |
 | `cameleer_license_days_remaining` | (none) | Negative in GRACE/EXPIRED. |
 | `cameleer_license_limit_utilisation`| `limit="max_apps"` etc. | `current / cap`, in `[0, 1+]`. |
 | `cameleer_license_cap_rejections_total` | `limit="..."` | Counter. |
+| `cameleer_license_last_validated_age_seconds` | (none) | `now - last_validated_at`. Spikes if the daily revalidation job is failing. |

-State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED, `WARN`
-on cap reject (sampled to avoid log spam).
+State-transition log lines: `INFO` on install/ACTIVE, `WARN` on GRACE, `ERROR` on EXPIRED,
+`ERROR` on INVALID, `WARN` on cap reject (sampled to avoid log spam).
+
+Recommended alert (in cameleer-saas Grafana, not shipped with the server): page on
+`cameleer_license_state{state="INVALID"} == 1` for > 5 minutes.

 ---

@@ -413,15 +530,17 @@ compatibility shims" preference, no deprecated path or feature flag.

 | Layer | Tests |
 |---|---|
-| Core unit | `LicenseValidatorTest` — signature, expiry, tenant mismatch, missing required fields, unknown extra fields. |
-| Core unit | `LicenseStateMachineTest` — all four transitions including grace boundary, replace from any state, invalid install. |
+| Core unit | `LicenseValidatorTest` — signature, expiry, tenant mismatch, missing required fields (`tenantId`, `licenseId`, `iat`, `exp`), unknown extra fields. |
+| Core unit | `LicenseStateMachineTest` — all five transitions including grace boundary, replace from any state, invalid install routes to `INVALID`, valid install from `INVALID` recovers to `ACTIVE`. |
 | Core unit | `DefaultTierLimitsTest` — every documented key has a default. |
 | Minter unit | `LicenseMinterTest` — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
-| Minter CLI | `LicenseMinterCliTest` — invokes `main` with `--private-key=tmp` and checks output token validates. |
-| App unit | `LicenseEnforcerTest` — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default. |
-| App integration | `LicenseLifecycleIT` — install via env, replace via POST, restart restores from DB. Driven through REST. |
-| App integration | `LicenseEnforcementIT` — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes `cap_exceeded` audit row check. |
-| Boot | `SchemaBootstrapIT` extension — `license` table exists, `environments` retention columns exist, retention pinning honoured at boot. |
+| Minter CLI | `LicenseMinterCliTest` — invokes `main` with `--private-key=tmp` and checks output token validates; `--verify` happy path; `--verify` failure path deletes the output file and exits non-zero. |
+| App unit | `LicenseEnforcerTest` — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default, message text varies per state. |
+| App unit | `RetentionPolicyApplierTest` — license-changed event recomputes effective TTL per env; failed ALTER logs WARN and does not throw. |
+| App integration | `LicenseLifecycleIT` — install via env, replace via POST, restart restores from DB, public-key removal at runtime transitions to `INVALID`, daily revalidation job updates `last_validated_at`. Driven through REST. |
+| App integration | `LicenseEnforcementIT` — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes `cap_exceeded` audit row check and verifies the 403 body's `message` field matches the state. |
+| App integration | `RetentionRuntimeRecomputeIT` — install license with `max_log_retention_days=30`, observe `logs` TTL ALTER fires; replace with `max_log_retention_days=7`, observe TTL drops to 7 without restart. |
+| Boot | `SchemaBootstrapIT` extension — `license` table exists with `last_validated_at`, `environments` retention columns exist, retention pinning honoured at boot. |

 No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.

@@ -444,6 +563,7 @@ No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or e
 | Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-`exp` "trial" license at install time if needed. |
 | Customer lowers `gracePeriodDays` field by editing token. | Token is signed; any edit invalidates the signature. |
 | License removed from DB out of band, server lands in ABSENT and rejects new resources but old ones are above default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
-| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. |
+| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. Daily revalidation job catches a rotation that wasn't paired with a reinstall (state → `INVALID`, alertable). |
 | Compute cap arithmetic relies on `cpuLimit` and `memoryLimitMb` being set on every container. | Existing `ResolvedContainerConfig` already enforces these; `DeploymentExecutor.PRE_FLIGHT` rejects deploys with unset compute fields. |
 | Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |
+| `RetentionPolicyApplier` issues blocking ALTERs from the event listener thread. | Applier runs ALTERs serialised but on a separate executor (not the publisher thread) so a slow ClickHouse does not stall the install API call. License install API returns immediately with the new state; retention recompute completes asynchronously and is observable via metrics. |
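
The retention rows above rest on two rules from §4.2 and §4.3: the effective ClickHouse TTL is `min(licenseCap, configured)`, and a configured value above the license cap is rejected with 422. A minimal sketch of both rules follows. The class `RetentionCapSketch` and its method names are hypothetical, and the sketch covers only the arithmetic, not the `ALTER TABLE … MODIFY TTL …` statements or the executor hand-off.

```java
/** Hypothetical sketch of the retention rules in §4.2/§4.3; names are illustrative. */
final class RetentionCapSketch {

    /** Effective TTL in days: the smaller of the license cap and the per-env value. */
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** Mirrors the 422 rule: a per-env value above the license cap is not accepted. */
    static boolean acceptsConfigured(int licenseCapDays, int requestedDays) {
        return requestedDays <= licenseCapDays;
    }

    public static void main(String[] args) {
        int configured = 30; // per-env value stored on the `environments` row
        // Scenario from RetentionRuntimeRecomputeIT: the cap drops from 30 to 7 days.
        System.out.println(effectiveDays(30, configured)); // prints 30
        System.out.println(effectiveDays(7, configured));  // prints 7
    }
}
```

The configured value stays at 30 when the cap drops. Only the effective TTL shrinks, so a later license with a higher cap restores the longer retention without the operator re-entering the per-env setting.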