diff --git a/cameleer-license-minter/README.md b/cameleer-license-minter/README.md new file mode 100644 index 00000000..dea3ef50 --- /dev/null +++ b/cameleer-license-minter/README.md @@ -0,0 +1,287 @@ +# cameleer-license-minter + +Standalone vendor-side tool for producing signed Ed25519 license tokens consumed by `cameleer-server`. The minter is intentionally **not** a runtime or compile-scope dependency of the server — the server only ships with the matching public key and validates tokens via `LicenseValidator`. The private signing key never leaves the vendor's environment. + +- Module GAV: `com.cameleer:cameleer-license-minter:1.0-SNAPSHOT` +- Maven coordinates of the runtime server (does **not** transitively pull this module): `com.cameleer:cameleer-server-app:1.0-SNAPSHOT` +- Build artifacts (after `mvn -pl cameleer-license-minter package`): + - `target/cameleer-license-minter-1.0-SNAPSHOT.jar` — plain library JAR (consumable as a Maven `test` dependency or via the `LicenseMinter` API in custom tooling) + - `target/cameleer-license-minter-1.0-SNAPSHOT-cli.jar` — fat CLI JAR with main class `com.cameleer.license.minter.cli.LicenseMinterCli` + +## Table of contents + +- Audience +- Build +- Public Java API +- CLI usage +- Token format +- LicenseInfo schema +- Limits dictionary +- Generating an Ed25519 key pair +- Worked example +- Security guidance +- Compatibility / runtime separation + +--- + +## Audience + +Vendors / SaaS operators issuing licenses to customers who run `cameleer-server`. End-customer operators looking for *how to install* a token should read `docs/license-enforcement.md` instead. + +## Build + +```bash +# From the repo root +mvn -pl cameleer-license-minter package +``` + +Two JARs land in `cameleer-license-minter/target/`: + +| Artifact | Purpose | +|---|---| +| `cameleer-license-minter-1.0-SNAPSHOT.jar` | Plain library (the `repackage` execution for the main artifact is disabled; see `pom.xml:50-54`). 
Use this when embedding the minter inside your own tooling or a unit test that needs a fresh signed token. | +| `cameleer-license-minter-1.0-SNAPSHOT-cli.jar` | Fat CLI JAR. Repackaged by Spring Boot's `spring-boot-maven-plugin` with classifier `cli`; main class is `com.cameleer.license.minter.cli.LicenseMinterCli`. | + +## Public Java API + +`com.cameleer.license.minter.LicenseMinter` is the only entry point for the library. It is a final, stateless utility class: + +```java +import com.cameleer.license.minter.LicenseMinter; +import com.cameleer.server.core.license.LicenseInfo; + +LicenseInfo info = new LicenseInfo( + java.util.UUID.randomUUID(), + "acme-prod", // tenantId — must match server's CAMELEER_SERVER_TENANT_ID + "Acme Production (Tier B)", // human label, optional + java.util.Map.of( + "max_environments", 3, + "max_apps", 25, + "max_agents", 50, + "max_users", 20, + "max_total_replicas", 30 + ), + java.time.Instant.now(), // issuedAt + java.time.Instant.parse("2027-01-01T00:00:00Z"), // expiresAt + 7 // gracePeriodDays +); + +String token = LicenseMinter.mint(info, ed25519PrivateKey); +``` + +Source: `cameleer-license-minter/src/main/java/com/cameleer/license/minter/LicenseMinter.java:20`. + +The method is thread-safe; the underlying Jackson `ObjectMapper` is configured once with `ORDER_MAP_ENTRIES_BY_KEYS` so canonical-JSON serialization is deterministic across runs and process boundaries. + +`LicenseMinter.mint` will throw `IllegalStateException` if the JCE provider rejects the private key or the payload cannot be serialized. + +## CLI usage + +The CLI entry point is `com.cameleer.license.minter.cli.LicenseMinterCli`. 
Run it from the fat JAR produced by the build: + +```bash +java -jar cameleer-license-minter/target/cameleer-license-minter-1.0-SNAPSHOT-cli.jar \ + --private-key=/secure/keys/cameleer-license-priv.pem \ + --tenant=acme-prod \ + --label="Acme Production (Tier B)" \ + --expires=2027-01-01 \ + --grace-days=7 \ + --max-environments=3 \ + --max-apps=25 \ + --max-agents=50 \ + --max-users=20 \ + --max-total-replicas=30 \ + --output=/secure/out/acme-prod.lic \ + --public-key=/secure/keys/cameleer-license-pub.b64 \ + --verify +``` + +### Flag reference + +Source of truth: `cameleer-license-minter/src/main/java/com/cameleer/license/minter/cli/LicenseMinterCli.java:26`. + +| Flag | Required | Meaning | +|---|---|---| +| `--private-key=` | yes | Path to a PKCS#8-encoded Ed25519 private key. Both PEM (`-----BEGIN PRIVATE KEY-----`) and raw base64 are accepted (`LicenseMinterCli.readEd25519PrivateKey`). | +| `--tenant=` | yes | The exact `tenantId` the server will compare against `CAMELEER_SERVER_TENANT_ID`. Mismatch causes the validator to throw at install / revalidation. | +| `--expires=` | yes | Expiration date interpreted as midnight UTC. The validator considers tokens expired once `now > exp + gracePeriodDays`. | +| `--label=` | no | Human-readable label, surfaced via `GET /api/v1/admin/license` and `/api/v1/admin/license/usage`. | +| `--grace-days=` | no | Number of days the license stays usable after `--expires`. Defaults to `0`. | +| `--max-=` | no, repeatable | Each `--max-foo-bar` flag becomes the limit key `max_foo_bar`. See the limits dictionary below. Unknown keys are accepted by the minter (the server side ignores keys it does not understand and falls through to defaults). | +| `--output=` | no | Write the token to a file. When omitted, the token is printed to stdout. On `--verify` failure the file is deleted. | +| `--public-key=` | no, required for `--verify` | Path to the matching base64 X.509 SPKI public key file (one line, no PEM markers). 
| +| `--verify` | no | After minting, parse + signature-check the token using `--public-key` and `--tenant`. Exits non-zero if verification fails. | + +Exit codes: `0` on success, `1` on minting / IO failure, `2` on argument validation failure, `3` on `--verify` failure. + +## Token format + +A token is the concatenation of two **standard** base64 segments joined by a literal `.`: + +``` +base64(canonicalJson) + "." + base64(ed25519Signature) +``` + +- The canonical JSON payload is produced by `LicenseMinter.canonicalPayload(...)` with keys sorted lexicographically and `limits` rendered as a sorted object. This makes the byte sequence deterministic given a fixed `LicenseInfo`. +- The signature is computed with `Signature.getInstance("Ed25519")` over the canonical payload bytes (not over the base64-encoded form). +- Encoding is `Base64.getEncoder()` (RFC 4648 §4 — *not* base64url). The validator decodes with the matching `Base64.getDecoder()`. + +`LicenseValidator.validate(...)` (`cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseValidator.java:42`) splits on the first `.`, decodes both halves, verifies the signature, then deserializes the payload. + +## LicenseInfo schema + +Source: `cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseInfo.java`. Field-by-field: + +| Field | Type | Required | Semantics | +|---|---|---|---| +| `licenseId` | `UUID` | yes | Stable identifier for this token. The server's audit trail records install/replace transitions by license id; renewals must use a fresh UUID so audit history is non-ambiguous. | +| `tenantId` | `String` | yes | Must equal the server's `CAMELEER_SERVER_TENANT_ID`. The validator throws `IllegalArgumentException` on mismatch. Blank values are rejected by the canonical record constructor. | +| `label` | `String` | no | Free-form human label. Surfaced on the admin/usage endpoints and the operator UI. Has no enforcement semantics. 
| +| `limits` | `Map` | yes (may be empty) | License-specific overrides. Any key that appears here is unioned over `DefaultTierLimits.DEFAULTS` to form the effective caps in `ACTIVE` / `GRACE` states. Keys not present fall through to defaults. | +| `issuedAt` | `Instant` (epoch seconds in JSON `iat`) | yes | Stamped by the minter; not currently consulted by the validator beyond informational logging. | +| `expiresAt` | `Instant` (epoch seconds in JSON `exp`) | yes | The validator throws if `now > expiresAt + gracePeriodDays * 86400` at install or revalidation. | +| `gracePeriodDays` | `int` | yes (>= 0) | Window after `expiresAt` during which the gate transitions to `GRACE` (license still grants its caps) before flipping to `EXPIRED`. Negative values are rejected at construction. | + +## Limits dictionary + +Canonical key set: `cameleer-server-core/src/main/java/com/cameleer/server/core/license/DefaultTierLimits.java`. Any key not listed here is silently ignored by the server's `LicenseGate.getEffectiveLimits()`. + +| CLI flag | Key | Default | What the server enforces | +|---|---|---|---| +| `--max-environments` | `max_environments` | 1 | `EnvironmentService.create(...)` consults `LicenseEnforcer.assertWithinCap("max_environments", currentCount, 1)`. | +| `--max-apps` | `max_apps` | 3 | `AppService.createApp(...)` checks total app count across all envs. | +| `--max-agents` | `max_agents` | 5 | `AgentRegistryService.register(...)` checks live agent count. | +| `--max-users` | `max_users` | 3 | User creation paths (`UserAdminController`, `UiAuthController` self-signup, `OidcAuthController` first-login). | +| `--max-outbound-connections` | `max_outbound_connections` | 1 | `OutboundConnectionServiceImpl.create(...)`. | +| `--max-alert-rules` | `max_alert_rules` | 2 | `AlertRuleController.create(...)`. | +| `--max-total-cpu-millis` | `max_total_cpu_millis` | 2000 | `DeploymentExecutor` PRE_FLIGHT compute cap (sum of `replicas * cpuLimit` over non-stopped deployments). 
| +| `--max-total-memory-mb` | `max_total_memory_mb` | 2048 | `DeploymentExecutor` PRE_FLIGHT compute cap (sum of `replicas * memoryLimitMb`). | +| `--max-total-replicas` | `max_total_replicas` | 5 | `DeploymentExecutor` PRE_FLIGHT compute cap (sum of `replicas`). | +| `--max-execution-retention-days` | `max_execution_retention_days` | 1 | ClickHouse TTL cap for `executions`, `processor_executions`. Effective TTL = `min(cap, env.executionRetentionDays)`. | +| `--max-log-retention-days` | `max_log_retention_days` | 1 | ClickHouse TTL cap for `logs`. | +| `--max-metric-retention-days` | `max_metric_retention_days` | 1 | ClickHouse TTL cap for `agent_metrics`, `agent_events`. | +| `--max-jar-retention-count` | `max_jar_retention_count` | 3 | `EnvironmentAdminController` PUT `/{envSlug}/jar-retention` rejects requests above this cap. Also bounds the daily `JarRetentionJob`. | + +## Generating an Ed25519 key pair + +The minter and validator both rely on the JCE `Ed25519` algorithm shipped with JDK 17+. No external crypto library is needed. + +```java +import java.security.KeyPair; +import java.security.KeyPairGenerator; +import java.util.Base64; + +KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair(); + +// 32-byte public key, X.509 SubjectPublicKeyInfo wrapped — this is what the server expects. +String publicKeyB64 = Base64.getEncoder().encodeToString(kp.getPublic().getEncoded()); + +// PKCS#8 private key — the CLI's --private-key reader accepts this either as raw base64 +// or PEM-wrapped (`-----BEGIN PRIVATE KEY-----`). +String privateKeyB64 = Base64.getEncoder().encodeToString(kp.getPrivate().getEncoded()); +``` + +A one-liner using the JDK's `keytool` is **not** sufficient — `keytool` cannot produce raw Ed25519 PKCS#8 in a directly-usable shape for our reader. Generating via the API above (or `openssl genpkey -algorithm ed25519`) is the supported path. 
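
Before handing a freshly generated pair to the CLI, it can be worth checking that both file encodings load and that the two halves actually match. The following is a minimal sketch, assuming the encodings described above (PEM or raw base64 PKCS#8 for the private key, one-line base64 X.509 SPKI for the public key). `Ed25519KeyFiles` and its methods are hypothetical names for illustration; they are not part of the minter module and do not reproduce `LicenseMinterCli.readEd25519PrivateKey`.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of cameleer-license-minter.
final class Ed25519KeyFiles {

    // PEM-wrap the PKCS#8 encoding using 64-character base64 lines.
    static String toPem(PrivateKey key) {
        String body = Base64.getMimeEncoder(64, new byte[] {'\n'})
                .encodeToString(key.getEncoded());
        return "-----BEGIN PRIVATE KEY-----\n" + body + "\n-----END PRIVATE KEY-----\n";
    }

    // Accept PEM-wrapped or raw base64 PKCS#8 text (assumed reader behaviour).
    static PrivateKey parsePrivateKey(String text) throws GeneralSecurityException {
        String b64 = text
                .replace("-----BEGIN PRIVATE KEY-----", "")
                .replace("-----END PRIVATE KEY-----", "")
                .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(b64);
        return KeyFactory.getInstance("Ed25519").generatePrivate(new PKCS8EncodedKeySpec(der));
    }

    // Parse the one-line base64 X.509 SPKI form used for the server's public key.
    static PublicKey parsePublicKey(String b64) throws GeneralSecurityException {
        byte[] der = Base64.getDecoder().decode(b64.trim());
        return KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(der));
    }

    // Generate a pair, serialize both halves, reload them, then sign and verify.
    static boolean roundTrip() {
        try {
            KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
            String pem = toPem(kp.getPrivate());
            String pubB64 = Base64.getEncoder().encodeToString(kp.getPublic().getEncoded());
            byte[] message = "smoke-test".getBytes(StandardCharsets.UTF_8);

            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(parsePrivateKey(pem));
            signer.update(message);
            byte[] signature = signer.sign();

            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(parsePublicKey(pubB64));
            verifier.update(message);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip verified: " + roundTrip());
    }
}
```

The same parse methods can be pointed at key files produced by another tool to confirm that a private key and a public key belong together before any license is minted.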
+ +For OpenSSL: + +```bash +openssl genpkey -algorithm ed25519 -out cameleer-license-priv.pem +openssl pkey -in cameleer-license-priv.pem -pubout -outform DER \ + | base64 -w0 > cameleer-license-pub.b64 +``` + +The resulting `cameleer-license-pub.b64` is the value to put into `CAMELEER_SERVER_LICENSE_PUBLICKEY`. + +## Worked example + +End-to-end: generate a key pair, mint a license, install it on a running server, verify enforcement. + +```bash +# 1. Vendor side — generate the keypair +openssl genpkey -algorithm ed25519 -out /secrets/cameleer-priv.pem +openssl pkey -in /secrets/cameleer-priv.pem -pubout -outform DER \ + | base64 -w0 > /secrets/cameleer-pub.b64 + +# 2. Vendor side — distribute the public key (commit to deployment config / Vault / k8s Secret) +cat /secrets/cameleer-pub.b64 +# MCowBQYDK2VwAyEAxxxxx... + +# 3. Vendor side — mint a license for a customer tenant +mvn -pl cameleer-license-minter package -DskipTests +java -jar cameleer-license-minter/target/cameleer-license-minter-1.0-SNAPSHOT-cli.jar \ + --private-key=/secrets/cameleer-priv.pem \ + --public-key=/secrets/cameleer-pub.b64 \ + --tenant=acme-prod \ + --label="Acme Production" \ + --expires=2027-01-01 \ + --grace-days=14 \ + --max-environments=3 \ + --max-apps=25 \ + --max-agents=50 \ + --max-users=20 \ + --max-total-replicas=30 \ + --max-total-cpu-millis=15000 \ + --max-total-memory-mb=16384 \ + --max-execution-retention-days=30 \ + --max-log-retention-days=14 \ + --max-metric-retention-days=14 \ + --max-jar-retention-count=10 \ + --output=/tmp/acme.lic \ + --verify + +# 4. Customer side — server boots with public key + tenant id matching the mint +export CAMELEER_SERVER_TENANT_ID=acme-prod +export CAMELEER_SERVER_LICENSE_PUBLICKEY=$(cat /secrets/cameleer-pub.b64) + +# 5. 
Customer side — install via the admin API after boot +curl -X POST https://server.example.com/api/v1/admin/license \ + -H "Authorization: Bearer ${ADMIN_JWT}" \ + -H "Content-Type: application/json" \ + -d "{\"token\": \"$(cat /tmp/acme.lic)\"}" + +# 6. Customer side — verify it was accepted +curl https://server.example.com/api/v1/admin/license \ + -H "Authorization: Bearer ${ADMIN_JWT}" +# {"state":"ACTIVE","invalidReason":null,"envelope":{...},"lastValidatedAt":"..."} + +curl https://server.example.com/api/v1/admin/license/usage \ + -H "Authorization: Bearer ${ADMIN_JWT}" +# Shows current/cap/source per limit key +``` + +For boot-time installation (preferred for SaaS-managed deployments), set `CAMELEER_SERVER_LICENSE_TOKEN` instead of POSTing — see `docs/license-enforcement.md`. + +## Security guidance + +- **The Ed25519 private key is the trust root.** Anyone who holds it can mint licenses for any tenant. Treat it like a code-signing key. +- **Storage.** Production private keys belong in an HSM, KMS (e.g. AWS KMS / GCP KMS with non-exportable signing), or a sealed Vault transit backend. A sealed file on a laptop is acceptable for low-volume / pre-production minting only and should never be committed to git or shared via chat. +- **Rotation.** Rotation is destructive: every customer running with the *old* public key will reject all new tokens signed with the *new* private key. The pragmatic procedure is: + 1. Generate the new keypair. + 2. Distribute the new public key (`CAMELEER_SERVER_LICENSE_PUBLICKEY`) to every tenant's server config. + 3. Once tenants confirm they are running with the new public key, re-mint and re-issue every active license under the new key. + 4. Decommission the old private key. + Practical revocation flows through expiry — keep license terms short enough (12 months or less) that planned rotations stay aligned with renewal cadence. +- **Auditing.** The server records every install/replace/reject under `AuditCategory.LICENSE`. 
The minter itself does not write audit rows; if you need a vendor-side audit trail of mint operations, wrap `LicenseMinter.mint(...)` in your own ticketing pipeline. +- **Never commit private keys.** `.gitignore` does not block them by name — use a `secrets/` directory excluded by your repository's policy, or store them entirely outside the working tree. + +## Compatibility / runtime separation + +The minter is intentionally absent from `cameleer-server-app`'s production classpath. To verify after a build: + +```bash +mvn -pl cameleer-server-app dependency:tree | grep license-minter +# expected: empty output (or, in development branches, a single line scoped 'test') +``` + +`cameleer-license-minter/pom.xml` depends on `cameleer-server-core` for `LicenseInfo` and the validator round-trip used by `--verify`. The server app intentionally does not depend on the minter — vendors mint outside the customer-deployed runtime, and a compromised customer cannot leverage server code to forge tokens. diff --git a/docs/handoff/2026-04-26-license-saas-handoff.md b/docs/handoff/2026-04-26-license-saas-handoff.md new file mode 100644 index 00000000..10a719e1 --- /dev/null +++ b/docs/handoff/2026-04-26-license-saas-handoff.md @@ -0,0 +1,377 @@ +# License Enforcement — SaaS Handoff (2026-04-26) + +Handoff for the cameleer-saas team and customer-success engineers operating customer-facing cameleer-server deployments. Covers issuing, renewing, revoking, and operationally observing licenses. + +For end-customer operator docs, see `docs/license-enforcement.md`. For minting tooling, see `cameleer-license-minter/README.md`. 
For the original design + plan, see: + +- `docs/superpowers/specs/2026-04-25-license-enforcement-design.md` +- `docs/superpowers/plans/2026-04-25-license-enforcement.md` + +## Table of contents + +- Session context +- What this delivers +- Trust model architecture +- Operational playbook +- Key management +- Cap matrix (plan tiers) +- Telemetry the SaaS team can observe +- Failure modes & runbook +- Edge cases the SaaS team should know +- Testing guidance +- Pointers + +--- + +## Session context + +- **Branch:** `feature/runtime-hardening` +- **Commit range:** `ec51aef8..140ea884` — 40 commits delivering the full feature (3 doc/spec/plan commits + 14 implementation commits + 23 follow-ons covering enforcement, retention, metrics, REST surface, integration tests, and rules updates). +- **Plan tasks:** 36 of 36 complete. Tests green: core (122), minter (7), app unit (230), key ITs (`PostgresLicenseRepositoryIT`, `LicenseLifecycleIT`, `LicenseEnforcementIT`, `RetentionRuntimeRecomputeIT`, `SchemaBootstrapIT`). +- **Persisted state:** Flyway migration **V5** — adds the `license` table and three retention columns on `environments` (`execution_retention_days`, `log_retention_days`, `metric_retention_days`). 
+ +### Key SHAs + +| SHA | Subject | +|---|---| +| `ec51aef8` | start of plan (above this is unrelated runtime-hardening work) | +| `551a7f12` | refactor(license): remove dead Feature enum and isEnabled scaffolding | +| `2ebe4989..0499a54e` | LicenseInfo / Validator / Limits / Gate redesign | +| `896b7e6e..f6657f81` | Standalone `cameleer-license-minter` module | +| `20aefd5b..b95e80a2` | PG schema, repository, service, boot wiring | +| `2bad9c3e..e198c13e` | Enforcement points, retention applier, REST surface, metrics, ITs | +| `140ea884` | docs(rules): document license enforcement classes + endpoints (head) | + +## What this delivers + +- **Cap enforcement** at 8 surfaces (env/app/agent/user/outbound/alert-rule creation, deploy-time compute caps, jar retention). +- **License lifecycle**: install (env > file > DB > API), daily revalidation cron + 60s post-startup tick, grace period, full state machine (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID). +- **Retention enforcement**: ClickHouse TTL recomputed on every license change for `executions`, `processor_executions`, `logs`, `agent_metrics`, `agent_events`. Effective TTL = `min(licenseCap, env.configured)`. +- **Standalone `cameleer-license-minter` Maven module** for vendor-side license generation. **Not** in the server runtime/compile classpath. +- **Audit trail**: every install/replace/cap_exceeded/revalidate event under `AuditCategory.LICENSE`. +- **Observability**: 3 Prometheus gauges + 1 counter (see [Telemetry](#telemetry-the-saas-team-can-observe)). +- **Default tier**: small fixed caps when no license is installed; intentionally restrictive. 
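
The time-driven part of the lifecycle and the retention rule listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the server's `LicenseGate` code: the class and method names are hypothetical, `ABSENT` and `INVALID` are left out because they depend on token presence and signature checks rather than on time, and the exact boundary at the instant of `expiresAt` is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names do not come from the cameleer-server code base.
final class LicenseStateSketch {

    enum State { ACTIVE, GRACE, EXPIRED }

    // ACTIVE up to expiresAt, GRACE for gracePeriodDays afterwards, EXPIRED beyond that.
    static State stateAt(Instant now, Instant expiresAt, int gracePeriodDays) {
        if (gracePeriodDays < 0) {
            throw new IllegalArgumentException("gracePeriodDays must be >= 0");
        }
        if (!now.isAfter(expiresAt)) {
            return State.ACTIVE;
        }
        Instant graceEnd = expiresAt.plus(Duration.ofDays(gracePeriodDays));
        return now.isAfter(graceEnd) ? State.EXPIRED : State.GRACE;
    }

    // Effective TTL = min(licenseCap, env.configured), as stated in the list above.
    static int effectiveRetentionDays(int licenseCapDays, int envConfiguredDays) {
        return Math.min(licenseCapDays, envConfiguredDays);
    }

    public static void main(String[] args) {
        Instant expiresAt = Instant.parse("2027-01-01T00:00:00Z");
        System.out.println(stateAt(Instant.parse("2027-01-05T00:00:00Z"), expiresAt, 14));
        System.out.println(effectiveRetentionDays(30, 90));
    }
}
```

With a grace period of 0 the sketch moves straight from `ACTIVE` to `EXPIRED`, which matches the advice elsewhere in this handoff to set a non-zero grace period for paid plans.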
+ +## Trust model architecture + +``` + VENDOR / SaaS CUSTOMER (cameleer-server) + +-------------------------+ +------------------------------------+ + | cameleer-license- | | CAMELEER_SERVER_LICENSE_PUBLICKEY | + | minter (CLI/Java) | | CAMELEER_SERVER_TENANT_ID | + | | | | + | Ed25519 PRIVATE key | | Ed25519 PUBLIC key (matching) | + | (HSM / KMS / Vault) | | | + | | | | ^ | + | v | | | validate | + | LicenseMinter.mint | | | | + | | | token (HTTPS) | LicenseValidator | + | +-----token----+----------------->+ | | + | | env-var or POST | v | + +-------------------------+ | LicenseGate (state + limits) | + | | | + | v | + | LicenseEnforcer (cap checks) | + +------------------------------------+ +``` + +The vendor holds the **only** copy of the private key. Customers receive only the public key (over deployment-config channels) and the signed token. A compromised customer can read tokens but cannot forge new ones. + +The minter module physically lives in the cameleer-server repo for shared `LicenseInfo` types but is intentionally absent from the runtime classpath of the server. Verify with: + +```bash +mvn dependency:tree -pl cameleer-server-app | grep license-minter +# expected: empty (or test-scope only on dev branches) +``` + +## Operational playbook + +### Onboarding a new tenant + +1. Choose the tenant id (must match the customer's `CAMELEER_SERVER_TENANT_ID`; lowercase alphanumeric + dashes; immutable). +2. Decide whether to use the shared SaaS signing key or a dedicated per-tenant key. Shared is simpler and standard; per-tenant only if a customer has compliance requirements that mandate isolation. +3. 
Mint the initial license: + ```bash + java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \ + --private-key=/cameleer-license-priv.pem \ + --tenant= \ + --label=" ()" \ + --expires=2027-04-26 \ + --grace-days=14 \ + --max-environments= \ + --max-apps= \ + --max-agents= \ + --max-users= \ + --max-outbound-connections= \ + --max-alert-rules= \ + --max-total-cpu-millis= \ + --max-total-memory-mb= \ + --max-total-replicas= \ + --max-execution-retention-days= \ + --max-log-retention-days= \ + --max-metric-retention-days= \ + --max-jar-retention-count= \ + --output=/tmp/.lic \ + --public-key=/cameleer-license-pub.b64 \ + --verify + ``` +4. Deliver to the customer's server via either: + - **Container env var** (preferred for SaaS-managed deployments): `CAMELEER_SERVER_LICENSE_TOKEN=` set on the deploy descriptor. Activates at next boot. + - **Admin REST POST** (for hot install on a running server): `POST /api/v1/admin/license` with `{"token": "..."}`. The response body confirms successful installation. +5. Confirm acceptance: `GET /api/v1/admin/license` returns `state=ACTIVE`, the audit log shows `install_license`/`SUCCESS`, and `cameleer_license_state{state="ACTIVE"} == 1.0` in Prometheus. + +### Renewing a license + +1. Mint a new token with a later `--expires`. Use a **fresh `licenseId`** so the audit trail clearly distinguishes the renewal from the prior license. +2. Install via admin POST. The PG `license` row is updated in place (one row per tenant, upserted on `tenant_id`); the audit row records `replace_license` with `previousLicenseId`. +3. Confirm `lastValidatedAt` advances on the next 03:00 cron tick (or trigger by restart / `POST /api/v1/admin/license`). + +### Adjusting caps mid-term + +Same as renewal: mint a new token with the new limits and install. The `limits` map of the new license replaces the prior one entirely (no merging — only `DefaultTierLimits` provides fallback for keys the new license omits). 
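
The replace-not-merge behaviour is easy to get wrong when adjusting a single cap, so a small sketch may help. It assumes the semantics described in this handoff (license keys override defaults, omitted keys fall back to defaults, keys outside the default set are ignored). `EffectiveLimitsSketch` is a hypothetical name, and its `DEFAULTS` map is only a four-key excerpt of the default tier, not the real `DefaultTierLimits` class.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the server's LicenseGate implementation.
final class EffectiveLimitsSketch {

    // Excerpt of the default-tier values quoted in this handoff.
    static final Map<String, Integer> DEFAULTS = Map.of(
            "max_environments", 1,
            "max_apps", 3,
            "max_agents", 5,
            "max_users", 3);

    // Overlay the CURRENT license's limits on the defaults; nothing from a prior license survives.
    static Map<String, Integer> effective(Map<String, Integer> licenseLimits) {
        Map<String, Integer> result = new TreeMap<>(DEFAULTS);
        licenseLimits.forEach((key, value) -> {
            if (DEFAULTS.containsKey(key)) { // unknown keys are ignored
                result.put(key, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> previous = Map.of("max_apps", 50, "max_agents", 100);
        Map<String, Integer> adjusted = Map.of("max_apps", 25); // max_agents omitted
        System.out.println("before: " + effective(previous));
        System.out.println("after:  " + effective(adjusted));
    }
}
```

In the `main` example the adjusted license omits `max_agents`, so that cap returns to the default of 5 rather than staying at 100. When changing one cap mid-term, re-state every cap the customer is entitled to in the new token.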
+ +If the customer is **lowering** caps below current usage, there is no automatic enforcement against existing entities — only future creates are rejected. Communicate the implication clearly. The `/api/v1/admin/license/usage` endpoint after install will show `current > cap` rows, which is the operator's signal to clean up. + +### Revoking a license + +There is no remote revocation. Practical options: + +1. **Wait for expiry.** Short license terms (12 months max) keep this honest. +2. **Rotate the public key.** Push a new `CAMELEER_SERVER_LICENSE_PUBLICKEY` to the customer's server config and restart. All existing tokens become `INVALID` because the signature no longer verifies. This is destructive (all customers sharing this signing key need a re-issue), so reserve for true compromise scenarios. +3. **Deploy a corrupted token.** If the customer cooperates, set `CAMELEER_SERVER_LICENSE_TOKEN` to garbage; the boot loader marks it `INVALID`, default-tier caps apply. + +In all cases the customer falls to default-tier caps (1 env, 3 apps, 5 agents). They can continue running for evaluation; new creates fail with 403. + +### Migrating a license between server instances + +Tokens are bound to `tenantId`, not to a particular server instance. A token works on any server configured for the same tenant. To migrate: + +1. Provision the new server with `CAMELEER_SERVER_TENANT_ID=` and `CAMELEER_SERVER_LICENSE_PUBLICKEY=`. +2. Install the existing token on the new server (env var or POST). PG state is fresh on the new instance — usage starts at zero. +3. Decommission the old server. + +If both run simultaneously they both pass validation (same token, same key, same tenant id) and both apply the caps independently against their own local state — usage is **not** federated. + +## Key management + +### Where the signing key lives + +The SaaS team's Ed25519 private key is the trust root. 
Place it in: + +- **Production:** AWS KMS, GCP KMS, Azure Key Vault (with a non-exportable signing key) **or** HashiCorp Vault Transit. The minter API supports signing via a `PrivateKey` instance, so a custom integration that asks the KMS to sign canonicalized payload bytes is straightforward to build on top of `LicenseMinter.canonicalPayload(...)` (it's `static`-accessible for that purpose). +- **Pre-production / dev:** sealed file in a single privileged operator's home directory. Never on a CI server, never in the repo. + +For high-security environments, the minter CLI's `--private-key=` is the wrong fit — it requires the key bytes to be readable. Use the Java API directly: + +```java +PrivateKey kmsKey = kmsClient.getSigningKey("cameleer-license-prod"); +String token = LicenseMinter.mint(info, kmsKey); +``` + +The JCE provider for the KMS handles signing; the private bytes never leave the KMS. + +### Public key distribution + +Each tenant's server reads the public key from `CAMELEER_SERVER_LICENSE_PUBLICKEY` (base64-encoded X.509 SPKI). Distribute via: + +- **Helm values / Kubernetes Secret** for k8s-orchestrated tenants. +- **Docker compose env file** for self-hosted tenants. +- **Bare environment variable on the host** for VM tenants. + +A typo or whitespace difference will cause every license to be rejected. Build a smoke test that boots a sandbox server with the candidate public key and POSTs a known-good test token. + +### Rotation playbook + +Rotation is the trickiest part. The validator does not support multiple public keys — exactly one is configured. Procedure: + +1. **Generate the new keypair** in production storage (KMS / Vault). +2. **Coordinate downtime windows** with each customer running on the old key. There is no overlap-period mechanism; you must: + - Push the new public key to all tenants (config rollout, restart). + - Re-mint and re-deliver every active license under the new key. 
+ - Each customer's server is `INVALID` between the public-key change and the new token install. +3. **Decommission the old private key** only after every active license has been re-issued. + +To avoid emergency rotations, sign with a **fresh** keypair every 24 months on a planned schedule. License terms shorter than the rotation interval keep customer impact bounded — at most one re-issue per customer per rotation. + +## Cap matrix (plan tiers) + +These are suggested values — adjust to your pricing model. Caps not listed fall through to defaults. + +| Limit key | Default (no license) | Starter | Team | Business | Enterprise | +|---|---|---|---|---|---| +| `max_environments` | 1 | 2 | 5 | 10 | 50 | +| `max_apps` | 3 | 10 | 50 | 200 | 1000 | +| `max_agents` | 5 | 20 | 100 | 500 | 5000 | +| `max_users` | 3 | 5 | 25 | 100 | 1000 | +| `max_outbound_connections` | 1 | 5 | 25 | 100 | 500 | +| `max_alert_rules` | 2 | 10 | 50 | 200 | 1000 | +| `max_total_cpu_millis` | 2000 | 8000 | 32000 | 128000 | 512000 | +| `max_total_memory_mb` | 2048 | 8192 | 32768 | 131072 | 524288 | +| `max_total_replicas` | 5 | 25 | 100 | 500 | 2000 | +| `max_execution_retention_days` | 1 | 7 | 30 | 90 | 365 | +| `max_log_retention_days` | 1 | 7 | 30 | 90 | 180 | +| `max_metric_retention_days` | 1 | 7 | 30 | 90 | 180 | +| `max_jar_retention_count` | 3 | 5 | 10 | 25 | 50 | + +## Telemetry the SaaS team can observe + +### Audit log + +Every license event lives in `audit_log` with `category=LICENSE`. 
Useful queries: + +```sql +-- Last 30 license events for tenant X +SELECT timestamp, username, action, target, result, detail +FROM audit_log +WHERE category = 'LICENSE' +ORDER BY timestamp DESC +LIMIT 30; + +-- Customers hitting caps in the last 24h +SELECT target AS limit, COUNT(*) AS rejections +FROM audit_log +WHERE category = 'LICENSE' AND action = 'cap_exceeded' + AND timestamp > now() - INTERVAL '24 hours' +GROUP BY target +ORDER BY rejections DESC; + +-- Customers running with rejected licenses +SELECT timestamp, detail->>'reason' AS reason, detail->>'source' AS source +FROM audit_log +WHERE category = 'LICENSE' AND action = 'reject_license' +ORDER BY timestamp DESC; +``` + +### Prometheus metrics + +| Metric | Type | Labels | Use | +|---|---|---|---| +| `cameleer_license_state` | gauge | `state` | Dashboard tile: which state is each tenant in. One-hot per state. | +| `cameleer_license_days_remaining` | gauge | (none) | Renewal alerting. Recommended thresholds: warn at 30 days, page at 7 days, critical at 1 day. `-1.0` means no license. | +| `cameleer_license_last_validated_age_seconds` | gauge | (none) | Detect stuck schedulers. Alert at >86400. | +| `cameleer_license_cap_rejections_total` | counter | `limit` | Account-management signal — customers consistently hitting caps are upgrade prospects. | + +### REST API + +`/api/v1/admin/license/usage` returns the per-limit current/cap/source table — wire this into your SaaS-side admin UI for at-a-glance per-tenant view. The endpoint requires an ADMIN-role JWT; SaaS-side automation can mint short-lived ADMIN tokens scoped per tenant or use a shared service account. + +## Failure modes & runbook + +### "Customer reports 403s after upgrade" + +1. Pull `/api/v1/admin/license/usage`. Identify which `limit` row has `current >= cap`. +2. If `state = ACTIVE` and a higher-tier license is owed, mint and install it. +3. 
If `state = EXPIRED`/`INVALID`/`ABSENT`, fix the license-state issue first — the cap rejection is downstream of that. +4. Confirm by replaying the failing operation; the 403 should clear. + +### "Customer reports state=INVALID" + +1. Pull `/api/v1/admin/license` — note `invalidReason`. +2. Most likely causes: + - Public-key mismatch — the customer's `CAMELEER_SERVER_LICENSE_PUBLICKEY` differs from the key used to mint. Diff the two values byte-for-byte. + - Tenant mismatch — `CAMELEER_SERVER_TENANT_ID` on the server differs from the `--tenant` used when minting. The customer must restart with the correct tenant id (it's immutable for the lifetime of the deployment because it appears in PG schema names and CH partition keys — coordinate carefully). + - Token tampering — base64-decode the payload portion (`.`), confirm the JSON looks well-formed. +3. Re-mint or fix config; re-install. + +### "License will expire in N days" + +1. Alert on `cameleer_license_days_remaining < 30`. +2. Mint a renewal license (new `licenseId`, later `expiresAt`). +3. Install via the customer's preferred channel (env-var on next deploy, or hot via POST). + +### "Audit table fills up with cap_exceeded rows" + +Customer is hammering a creation path. Either: +- They genuinely outgrew their tier — upgrade conversation. +- Their automation has a runaway loop creating environments/apps. Coordinate with the customer to throttle and clean up. + +The `cameleer_license_cap_rejections_total{limit=...}` counter is more efficient for monitoring this than scanning audit; use audit only for forensic detail. + +### "TTL recompute logs WARN: Failed to apply TTL" + +`RetentionPolicyApplier` could not run `ALTER TABLE ... MODIFY TTL` on ClickHouse. The license install itself succeeded; only the retention update failed. Check: +- ClickHouse user has `ALTER` privilege on the cameleer DB. +- ClickHouse version is >= 22.3 (required for `WHERE` predicate on TTL). +- ClickHouse cluster health. 
+ +## Edge cases the SaaS team should know + +- **Default tier is restrictive on purpose.** A customer on default tier cannot stand up a real production workload (1 env, 3 apps, 5 agents, 1-day retention). Onboarding should always include license install before the customer adds any real workload. +- **Grace period defaults to 0.** If you want a buffer between `expiresAt` and capability loss, set `--grace-days=N` at mint time. We recommend 14 days for paid plans so a slipped renewal doesn't immediately drop the customer to default-tier caps. +- **Public key change invalidates all installed tokens immediately on next revalidation.** Daily revalidation runs at 03:00 server-local time, with a 60-second post-startup tick. A surprise public-key rollout will surface as `state=INVALID` for every customer running on the old key on the next tick or restart. +- **Caps reduce on revalidation, not just install.** A token whose `expiresAt` lapses will, at the next revalidation, transition `ACTIVE → GRACE → EXPIRED` automatically, dropping caps to default-tier on the EXPIRED transition. The state change is announced via `LicenseChangedEvent` and triggers TTL recompute. +- **Compute caps are evaluated at deploy time, not at runtime.** A deployment that successfully started under a high-tier license will keep running unchanged when the license downgrades. Only the *next* deploy attempt will see the new cap. +- **Agent count is in-memory.** `max_agents` is enforced against the `AgentRegistryService.liveCount()` (LIVE state agents). Restarts reset the count to zero until agents re-register; this is by design — DEAD agents shouldn't pin a license slot. +- **License id changes on every renewal.** Always use a fresh `UUID.randomUUID()` when minting a renewal. The audit `previousLicenseId` field then tells you which token superseded which. + +## Testing guidance + +Three approaches for dry-running licenses without touching a customer server: + +### 1. 
Pure unit test — `LicenseMinter` round-trip with `LicenseValidator` + +```java +KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair(); +String pubB64 = Base64.getEncoder().encodeToString(kp.getPublic().getEncoded()); + +LicenseInfo info = new LicenseInfo( + UUID.randomUUID(), "test-tenant", "Test", Map.of("max_apps", 50), + Instant.now(), Instant.now().plus(365, ChronoUnit.DAYS), 0 +); + +String token = LicenseMinter.mint(info, kp.getPrivate()); + +LicenseValidator validator = new LicenseValidator(pubB64, "test-tenant"); +LicenseInfo parsed = validator.validate(token); +assertEquals(info.licenseId(), parsed.licenseId()); +``` + +This is the model already used in `LicenseMinterTest` and `LicenseValidatorTest` in the repo — copy from there. + +### 2. CLI dry-run — mint and self-verify + +```bash +java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \ + --private-key=test-priv.pem \ + --public-key=test-pub.b64 \ + --tenant=test-tenant \ + --expires=2027-12-31 \ + --max-apps=50 \ + --output=/tmp/test.lic \ + --verify +``` + +`--verify` runs the full `LicenseValidator.validate(...)` round-trip and exits 3 on failure. Useful for shaking out wrong-key / wrong-tenant before sending to a customer. + +### 3. Test server with a test public key + +Spin up a sandbox cameleer-server (docker-compose or k8s-test-namespace) with: + +```yaml +environment: + CAMELEER_SERVER_TENANT_ID: test-tenant + CAMELEER_SERVER_LICENSE_PUBLICKEY: +``` + +Install the test license, exercise the customer's reported scenario, observe `state` transitions and audit rows. The `LicenseLifecycleIT` and `LicenseEnforcementIT` integration tests in `cameleer-server-app/src/test/java/.../license/` are good templates for full-stack reproduction. 
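To see what the wrong-key and tampering checks rest on, independent of the project's classes, the sketch below does an Ed25519 sign/verify round trip using only `java.security` (JDK 15 or later). `Ed25519RoundTrip` and its method names are hypothetical, and the payload is arbitrary bytes rather than the real token format.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

/** Hypothetical helper showing the JDK primitives only; not the project's LicenseValidator. */
final class Ed25519RoundTrip {

    static KeyPair newKeyPair() throws GeneralSecurityException {
        return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
    }

    /** Base64 of the X.509 SubjectPublicKeyInfo encoding of the public key. */
    static String publicKeyBase64(PublicKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    /** Inverse of publicKeyBase64: rebuilds the key from its base64 X.509 form. */
    static PublicKey decodePublicKey(String b64) throws GeneralSecurityException {
        byte[] der = Base64.getDecoder().decode(b64);
        return KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(der));
    }

    static byte[] sign(byte[] payload, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(key);
        signer.update(payload);
        return signer.sign();
    }

    /** Returns false when the payload was altered or the key does not match the signer. */
    static boolean verify(byte[] payload, byte[] signature, PublicKey key) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(key);
        verifier.update(payload);
        return verifier.verify(signature);
    }
}
```

Verifying with a second, unrelated key pair or with one flipped payload byte should make `verify` return `false`, which is the same class of failure the sandbox approaches above are meant to surface before a token reaches a customer.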
+ +## Pointers + +| Document | Audience | +|---|---| +| `cameleer-license-minter/README.md` | Vendor-side mint operations | +| `docs/license-enforcement.md` | End-customer operators (install, monitor, troubleshoot) | +| `docs/superpowers/specs/2026-04-25-license-enforcement-design.md` | Original design rationale | +| `docs/superpowers/plans/2026-04-25-license-enforcement.md` | Implementation plan (36 tasks) | +| `.claude/rules/core-classes.md` `# license/` section | License domain class map | +| `.claude/rules/app-classes.md` `# license/` section | Server license-app class map + endpoint surface | diff --git a/docs/license-enforcement.md b/docs/license-enforcement.md new file mode 100644 index 00000000..2224313d --- /dev/null +++ b/docs/license-enforcement.md @@ -0,0 +1,367 @@ +# License Enforcement + +Operator documentation for the cameleer-server license subsystem. Audience: operators running their own cameleer-server instance who need to install, monitor, or troubleshoot a license. + +For *issuing* licenses, see `cameleer-license-minter/README.md`. For SaaS-team operational playbooks, see `docs/handoff/2026-04-26-license-saas-handoff.md`. + +## Table of contents + +## Overview + +## What gets enforced + +## Install paths and priority + +## Public-key configuration + +## REST API + +## License state machine + +## Default tier caps + +## Cap-exceeded behavior + +## Retention semantics + +## Daily revalidation + +## Audit categories + +## Prometheus metrics + +## Troubleshooting + +--- + +## Overview + +cameleer-server can run in one of two postures: + +- **Default tier (no license installed).** A small fixed cap-set applies (1 environment, 3 apps, 5 agents, 1 day retention, etc.). Suitable for evaluation and self-host single-instance use. The default tier engages automatically when no license is configured. +- **Licensed (token installed).** Caps from the signed token override the default tier on a per-key basis. 
Any limit key the token does not specify falls through to the default value, so a partial license that only raises `max_environments` and `max_apps` keeps default retention. + +A signed Ed25519 license token carries the customer's `tenantId`, an `expiresAt` timestamp, an optional `gracePeriodDays`, and a `limits` map. The server's `LicenseValidator` (`cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseValidator.java`) checks the signature against `CAMELEER_SERVER_LICENSE_PUBLICKEY`, verifies the tenant matches `CAMELEER_SERVER_TENANT_ID`, and rejects expired tokens (past `expiresAt + gracePeriodDays`). + +The license posture is summarized as a `LicenseState`: + +- `ABSENT` — no license configured. Default-tier caps apply. +- `ACTIVE` — valid token, current time is at or before `expiresAt`. License caps apply. +- `GRACE` — past `expiresAt` but within `gracePeriodDays`. License caps still apply; the operator should renew. +- `EXPIRED` — past `expiresAt + gracePeriodDays`. Default-tier caps apply. +- `INVALID` — signature, tenant, or schema validation failed. Default-tier caps apply. + +## What gets enforced + +License caps are enforced through a single component, `LicenseEnforcer.assertWithinCap(limitKey, currentUsage, requestedDelta)`, called from each creation path. + +| Limit key | Enforcement point | Effect when exceeded | +|---|---|---| +| `max_environments` | `EnvironmentService.create(...)` | HTTP 403 from `EnvironmentAdminController.create`. | +| `max_apps` | `AppService.createApp(...)` | HTTP 403 from `AppController.create`. | +| `max_agents` | `AgentRegistryService.register(...)` | HTTP 403 from `AgentRegistrationController.register`. Counted against the in-memory live agent registry. | +| `max_users` | User creation paths in `UserAdminController`, `UiAuthController`, `OidcAuthController` | HTTP 403 (REST) or rejection during OIDC first-login. | +| `max_outbound_connections` | `OutboundConnectionServiceImpl.create(...)` | HTTP 403. 
| +| `max_alert_rules` | `AlertRuleController.create(...)` | HTTP 403. | +| `max_total_cpu_millis` | `DeploymentExecutor` `PRE_FLIGHT` stage | Deployment fails before pulling images; row is marked FAILED with the cap message in `deployments.error_message`. | +| `max_total_memory_mb` | same | same | +| `max_total_replicas` | same | same | +| `max_jar_retention_count` | `EnvironmentAdminController` PUT `/{envSlug}/jar-retention` | HTTP 403 if requested value > cap. The daily `JarRetentionJob` is also bounded by this cap. | +| `max_execution_retention_days`, `max_log_retention_days`, `max_metric_retention_days` | Not a creation cap; clamps ClickHouse TTL to `min(cap, env.configured)` — see [Retention semantics](#retention-semantics). | + +Note that the three compute caps are checked together at deploy time, after `ConfigMerger.resolve(...)` produces the final `ResolvedContainerConfig` but before the image is pulled. The current usage figure is computed by `LicenseUsageReader.computeUsage()` over non-stopped deployments. + +## Install paths and priority + +Tokens can be installed by four mechanisms; resolution at boot is highest-priority-first: + +1. **`CAMELEER_SERVER_LICENSE_TOKEN` environment variable.** Highest priority. The raw token is read on `@PostConstruct` from `LicenseBeanConfig.LicenseBootLoader`. +2. **`cameleer.server.license.file` Spring property** (or `CAMELEER_SERVER_LICENSE_FILE`). Path to a file containing the token. Read at boot if no env-var token is present. +3. **PostgreSQL `license` table.** Set via the admin REST POST. Loaded at boot if the env var and file both miss. +4. **None of the above.** State is `ABSENT`, default-tier caps apply, the boot loader publishes a `LicenseChangedEvent(ABSENT, null)` so listeners (Prometheus gauges, retention applier) settle on default values. + +If a higher-priority source rejects (signature failure, tenant mismatch, expired) the loader logs the reason and **does not** fall through to a lower-priority source. 
This is deliberate: an operator who set `CAMELEER_SERVER_LICENSE_TOKEN` expects that token to be the active one, not a silently-stale DB row. + +Any token loaded at boot also flows through `LicenseService.install(...)` so audit, persistence, and `LicenseChangedEvent` publishing are uniform across paths. + +## Public-key configuration + +```bash +export CAMELEER_SERVER_LICENSE_PUBLICKEY="$(cat cameleer-license-pub.b64)" +``` + +The value is the base64 encoding of the Ed25519 public key in X.509 SubjectPublicKeyInfo form (see `cameleer-license-minter/README.md` for generation). + +When `CAMELEER_SERVER_LICENSE_PUBLICKEY` is **unset**: + +- `LicenseBeanConfig.licenseValidator()` (line 62) logs a WARN: `CAMELEER_SERVER_LICENSE_PUBLICKEY not set — all licenses will be rejected as INVALID`. +- The bean is constructed against a throwaway public key whose private counterpart no one holds. The override's `validate(...)` always throws `IllegalStateException("license public key not configured")`. +- Any token loaded from any source routes through `LicenseService.install(...)`, fails validation, marks the gate `INVALID`, and writes a `reject_license` audit row with the failure reason. +- The state will be `INVALID`, default-tier caps apply, and the operator must set the variable and restart (or hot-install via POST after restart). + +## REST API + +All endpoints require an ADMIN-role JWT. Source-of-truth controllers: `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LicenseAdminController.java`, `LicenseUsageController.java`. 
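The endpoints in this section can also be driven from Java automation. The sketch below only builds the requests with the JDK's `java.net.http` API and does not send them. `LicenseApiRequests` and the `baseUrl` parameter are hypothetical, and the plain string concatenation assumes the token contains no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical request builders for the admin license endpoints. */
final class LicenseApiRequests {

    /** Builds the install/replace request: POST with a {"token": "..."} body and a bearer JWT. */
    static HttpRequest install(String baseUrl, String adminJwt, String token) {
        // Assumption: the token needs no JSON escaping, so concatenation is safe here.
        String body = "{\"token\": \"" + token + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/admin/license"))
                .header("Authorization", "Bearer " + adminJwt)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Builds the usage query: GET with a bearer JWT. */
    static HttpRequest usage(String baseUrl, String adminJwt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/admin/license/usage"))
                .header("Authorization", "Bearer " + adminJwt)
                .GET()
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.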
+ +### `GET /api/v1/admin/license` + +```json +{ + "state": "ACTIVE", + "invalidReason": null, + "envelope": { + "licenseId": "fd3a8f2a-1c44-4eac-aa07-1a5d1ce9c4a4", + "tenantId": "acme-prod", + "label": "Acme Production", + "limits": { "max_apps": 25, "max_environments": 3 }, + "issuedAt": "2026-04-26T10:00:00Z", + "expiresAt": "2027-01-01T00:00:00Z", + "gracePeriodDays": 14 + }, + "lastValidatedAt": "2026-04-26T03:00:00Z" +} +``` + +The raw token string is **deliberately not** returned — only the parsed envelope. `lastValidatedAt` is omitted when no DB row exists yet (env-var or file source on first boot before the next revalidation tick). + +### `POST /api/v1/admin/license` + +```bash +curl -X POST https://server.example.com/api/v1/admin/license \ + -H "Authorization: Bearer ${ADMIN_JWT}" \ + -H "Content-Type: application/json" \ + -d '{"token": "eyJ...long.base64.string..."}' +``` + +Body shape: `{"token": ""}`. On success returns `{"state": "ACTIVE", "envelope": {...}}`. On failure returns HTTP 400 with `{"error": ""}`. + +The handler delegates to `LicenseService.install(token, userId, "api")`. Acting `userId` comes from the authenticated principal stripped of the `user:` prefix (see `app-classes.md` user-id convention). + +This endpoint installs *or replaces* — there is one row per tenant in the `license` table, so a successful POST upserts and supersedes any prior token. The previous license id is captured in the `replace_license` audit detail. + +### `GET /api/v1/admin/license/usage` + +```json +{ + "state": "ACTIVE", + "expiresAt": "2027-01-01T00:00:00Z", + "daysRemaining": 250, + "gracePeriodDays": 14, + "tenantId": "acme-prod", + "label": "Acme Production", + "lastValidatedAt": "2026-04-26T03:00:00Z", + "message": "License active. 
250 days remaining.", + "limits": [ + {"key": "max_environments", "current": 2, "cap": 3, "source": "license"}, + {"key": "max_apps", "current": 12, "cap": 25, "source": "license"}, + {"key": "max_agents", "current": 38, "cap": 50, "source": "license"}, + {"key": "max_users", "current": 4, "cap": 3, "source": "default"} + ] +} +``` + +For each effective-limits key: +- `current` — current usage. `max_agents` is read from the in-memory `AgentRegistryService.liveCount()`; everything else comes from `LicenseUsageReader.snapshot()` (PostgreSQL counts, plus deployment compute aggregates from `deployed_config_snapshot`). Limits the server does not measure return `0`. +- `cap` — effective cap (license override or default-tier value). +- `source` — `"license"` if the cap came from the token's `limits` map, `"default"` if it fell through. + +## License state machine + +``` + +---------------+ + | ABSENT | (no token configured) + +-------+-------+ + | + | install via env / file / DB / POST + v + +-------+-------+ + +-------------- | ACTIVE | --------------+ + | +-------+-------+ | + | revalidate | now > expiresAt + | fails sig/tenant/ | + | parse v + | +-------+-------+ + | | GRACE | + | +-------+-------+ + | | + | | now > exp + gracePeriodDays + | v + | +-------+-------+ + | | EXPIRED | + | +-------+-------+ + v ++-------+-------+ +| INVALID | (signature mismatch, tenant mismatch, ++---------------+ missing public key, malformed payload) +``` + +Classification logic: `LicenseStateMachine.classify(license, invalidReason)` (`cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseStateMachine.java`). + +- `INVALID` and `EXPIRED` revert to **default-tier caps**. The license envelope is dropped from the gate (`getCurrent()` returns null in `INVALID`; the gate retains the parsed info in `EXPIRED` but `getEffectiveLimits()` returns defaults-only). +- `GRACE` keeps **license caps**. 
This is the only state in which the server still applies license caps while the operator should already be working on renewal. + +## Default tier caps + +Source: `cameleer-server-core/src/main/java/com/cameleer/server/core/license/DefaultTierLimits.java`. + +| Key | Default | Semantics | +|---|---|---| +| `max_environments` | 1 | Total environments across the tenant. | +| `max_apps` | 3 | Total apps across all environments. | +| `max_agents` | 5 | Live agents in the in-memory registry (LIVE state). | +| `max_users` | 3 | Local + OIDC users in the `users` table. | +| `max_outbound_connections` | 1 | Rows in `outbound_connections`. | +| `max_alert_rules` | 2 | Rows in `alert_rules`. | +| `max_total_cpu_millis` | 2000 | Sum of `replicas * cpuLimit` over non-stopped deployments. `cpuLimit` is in millicores; 1000 = one core. | +| `max_total_memory_mb` | 2048 | Sum of `replicas * memoryLimitMb` over non-stopped deployments. | +| `max_total_replicas` | 5 | Sum of `replicas` over non-stopped deployments. | +| `max_execution_retention_days` | 1 | Cap on TTL applied to `executions` and `processor_executions`. | +| `max_log_retention_days` | 1 | Cap on TTL applied to `logs`. | +| `max_metric_retention_days` | 1 | Cap on TTL applied to `agent_metrics` and `agent_events`. | +| `max_jar_retention_count` | 3 | Maximum JAR retention count per environment. | + +The default tier is intentionally restrictive — it is sized for evaluation, single-developer demos, and "I forgot to install my license" recovery, not production. New customers should install a license at first onboarding. + +## Cap-exceeded behavior + +When a creation path exceeds its cap, `LicenseEnforcer.assertWithinCap(...)` throws `LicenseCapExceededException(limitKey, current, cap)`.
`LicenseExceptionAdvice` (`@ControllerAdvice`) maps it to: + +```http +HTTP/1.1 403 Forbidden +Content-Type: application/json + +{ + "error": "license cap reached", + "limit": "max_apps", + "current": 3, + "cap": 3, + "state": "ABSENT", + "message": "License absent. Default tier limits apply. Cap reached for max_apps (3 of 3 used)." +} +``` + +Concurrently: +- The Prometheus counter `cameleer_license_cap_rejections_total{limit=...}` increments. +- An audit row is written: `category=LICENSE`, `action=cap_exceeded`, `target=`, `result=FAILURE`, `detail` carries `{limit, current, requested, cap, state}`. If audit storage fails, the 403 still surfaces (audit is best-effort here). + +The `message` field is rendered by `LicenseMessageRenderer.forCap(...)` and varies per state — under `EXPIRED` it nudges the operator to renew; under `INVALID` it cites `invalidReason`. + +## Retention semantics + +The license caps `max_execution_retention_days`, `max_log_retention_days`, `max_metric_retention_days`, and `max_jar_retention_count` define **maximums**. Per-environment configuration (`environments.execution_retention_days`, `log_retention_days`, `metric_retention_days`, `jar_retention_count`) defines the **operator preference**. The effective TTL applied to ClickHouse tables is: + +``` +effective = min(licenseCap, env.configuredRetentionDays) +``` + +When `LicenseChangedEvent` fires (any install/replace/revalidate/boot transition), `RetentionPolicyApplier` (`@EventListener @Async`) recomputes TTL for every (table, env) pair using: + +```sql +ALTER TABLE + MODIFY TTL toDateTime() + INTERVAL DAY DELETE + WHERE environment = '' +``` + +Tables affected: `executions`, `processor_executions`, `logs`, `agent_metrics`, `agent_events`. Excluded: +- `route_diagrams` — content-addressed `ReplacingMergeTree`, no time-based TTL. +- `server_metrics` — server-wide, no `environment` column. Its 90-day cap is fixed in the schema.
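The min rule and the per-(table, env) fan-out described above can be sketched as follows. `RetentionPlan`, `EnvRetention`, and `TtlChange` are hypothetical names rather than the classes behind `RetentionPolicyApplier`. The table-to-limit mapping follows the default tier table, and `caps` is assumed to hold the already merged effective limits for all three retention keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical planner: computes the retention days to apply per (table, env) pair. */
final class RetentionPlan {

    /** Operator preference for one environment, in days. */
    record EnvRetention(String slug, int executionDays, int logDays, int metricDays) {}

    /** One planned TTL update. */
    record TtlChange(String table, String env, int days) {}

    private record TableRule(String table, String capKey, ToIntFunction<EnvRetention> configured) {}

    // The five affected tables, each tied to its limit key and its per-environment setting.
    private static final List<TableRule> RULES = List.of(
            new TableRule("executions", "max_execution_retention_days", EnvRetention::executionDays),
            new TableRule("processor_executions", "max_execution_retention_days", EnvRetention::executionDays),
            new TableRule("logs", "max_log_retention_days", EnvRetention::logDays),
            new TableRule("agent_metrics", "max_metric_retention_days", EnvRetention::metricDays),
            new TableRule("agent_events", "max_metric_retention_days", EnvRetention::metricDays));

    /** effective = min(licenseCap, env.configuredRetentionDays) */
    static int effectiveDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** Produces one TtlChange per (table, env) pair; caps must contain every retention key. */
    static List<TtlChange> plan(Map<String, Integer> caps, List<EnvRetention> envs) {
        List<TtlChange> changes = new ArrayList<>();
        for (EnvRetention env : envs) {
            for (TableRule rule : RULES) {
                int days = effectiveDays(caps.get(rule.capKey()), rule.configured().applyAsInt(env));
                changes.add(new TtlChange(rule.table(), env.slug(), days));
            }
        }
        return changes;
    }
}
```

With a cap of 30 days and an environment configured for 90, the plan yields 30 days for `executions`; a configured value below the cap is kept as-is.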
+ +ClickHouse failures are logged (WARN) but do not fail the originating license install — TTL recompute is best-effort. + +## Daily revalidation + +`LicenseRevalidationJob` (`@Scheduled(cron = "0 0 3 * * *")`) re-runs `LicenseService.revalidate()` against the persisted token at 03:00 server-local time. It also fires once 60 seconds after `ApplicationReadyEvent` to catch the case where a license was installed via SQL between server starts. + +Each revalidation: +- Re-reads the token from `license` table. +- Runs `LicenseValidator.validate(...)` again — same checks as install (signature, tenant, expiry). +- On success: bumps `last_validated_at`, reloads the gate, publishes `LicenseChangedEvent`. +- On failure: marks the gate `INVALID`, writes an audit row `revalidate_license` / `FAILURE`, publishes `LicenseChangedEvent(INVALID, null)`. + +A token transitioning `ACTIVE → GRACE → EXPIRED` will surface as a state change at the next revalidation tick (or on the next license-touching admin action). + +## Audit categories + +All license lifecycle events use `AuditCategory.LICENSE`. Action codes: + +| Action | Result | Detail keys | +|---|---|---| +| `install_license` | SUCCESS | `licenseId, expiresAt, installedBy, source` | +| `replace_license` | SUCCESS | same plus `previousLicenseId` | +| `reject_license` | FAILURE | `reason, source` | +| `revalidate_license` | FAILURE | `licenseId, reason` | +| `cap_exceeded` | FAILURE | `limit, current, requested, cap, state` | + +The `source` value is one of `env`, `file`, `db`, `api` — corresponds to the install path. + +## Prometheus metrics + +Scraped at `/api/v1/prometheus`. Source: `LicenseMetrics` (`cameleer-server-app/src/main/java/com/cameleer/server/app/license/LicenseMetrics.java`). + +| Metric | Type | Labels | Semantics | +|---|---|---|---| +| `cameleer_license_state` | gauge | `state=` | One-hot per state — exactly one tag value carries `1.0` at any time, others are `0.0`. 
| +| `cameleer_license_days_remaining` | gauge | (none) | Whole days until `expiresAt`. `-1.0` when no license is loaded (ABSENT/INVALID). Suitable alert thresholds: warn at 30, page at 7. | +| `cameleer_license_last_validated_age_seconds` | gauge | (none) | Seconds since the persisted `last_validated_at`. `0` when there is no DB row. Alerts at >86400 (revalidation hasn't run for >24h) detect a stuck scheduler or a misconfigured server. | +| `cameleer_license_cap_rejections_total` | counter | `limit=` | Incremented every time `LicenseEnforcer` rejects a creation due to a cap. A non-zero rate indicates customers hitting their plan ceiling. | + +Gauges refresh on every `LicenseChangedEvent` and on a 60-second `@Scheduled(fixedDelay)` so values stay current even without state changes. + +## Troubleshooting + +### My license shows `INVALID` — why? + +Check `invalidReason` from `GET /api/v1/admin/license`. Common causes: + +| `invalidReason` substring | Cause | Fix | +|---|---|---| +| `License signature verification failed` | Public key on the server does not match the private key the token was signed with. | Confirm `CAMELEER_SERVER_LICENSE_PUBLICKEY` matches the keypair used to mint the token. | +| `License tenantId 'X' does not match server tenant 'Y'` | Token minted for a different `tenantId`. | Re-mint with `--tenant=` matching `CAMELEER_SERVER_TENANT_ID`. | +| `licenseId is required` / `tenantId is required` / `exp is required` | Malformed token (missing required field). | Re-mint via the supported minter — fields are mandatory. | +| `License expired at <...>` | Past `expiresAt + gracePeriodDays`. | Issue a renewal license. | +| `license public key not configured` | `CAMELEER_SERVER_LICENSE_PUBLICKEY` is unset. | Set the env var and either restart or POST the token again. | + +### I'm getting 403s on creates — which cap is biting? 
+ +```bash +curl https://server.example.com/api/v1/admin/license/usage \ + -H "Authorization: Bearer ${ADMIN_JWT}" +``` + +The `limits[]` array shows current/cap per limit key. Any row with `current >= cap` is a candidate. The 403 response body itself names the limit: + +```json +{"error":"license cap reached","limit":"max_apps","current":3,"cap":3,"state":"ABSENT", ...} +``` + +If `state` is `ABSENT` or `EXPIRED`/`INVALID`, the fix is to install a license. If `state` is `ACTIVE` and you are at the license cap, you need a higher-tier license re-issued. + +### My new license didn't take effect + +1. Check the audit log: + ```bash + curl 'https://server.example.com/api/v1/admin/audit?category=LICENSE&limit=10' \ + -H "Authorization: Bearer ${ADMIN_JWT}" + ``` + You should see an `install_license` or `replace_license` row at `SUCCESS`. A `reject_license` `FAILURE` row carries the reason. +2. Confirm the public key matches the private key used to mint: + - Vendor side: `openssl pkey -in -pubout -outform DER | base64 -w0` + - Server side: `echo $CAMELEER_SERVER_LICENSE_PUBLICKEY` + - These must be byte-identical. +3. Confirm `CAMELEER_SERVER_TENANT_ID` matches the `tenantId` in the token envelope (`GET /api/v1/admin/license`). +4. If the env var token disagrees with what's in the DB (e.g. you POSTed but a stale env var remains): the env var wins on next boot. Either remove the env var or update it before restarting. + +### Cap rejections spiking but no licensed customer should be hitting the cap + +Inspect `cameleer_license_cap_rejections_total{limit=...}`. If a tenant is on default tier (state = `ABSENT`/`EXPIRED`/`INVALID`) the very low default caps will trip immediately on routine activity. Install a license to restore expected behavior. + +### Retention TTL didn't change after installing a license + +`RetentionPolicyApplier` runs on `LicenseChangedEvent` asynchronously (`@Async`). 
Look for the log line: + +``` +License changed (state=ACTIVE) — recomputing TTL across N environment(s) and 5 table(s) +Applied TTL: table=executions env=prod days=30 (cap=30, configured=90) +``` + +If the log shows `Failed to apply TTL` warnings, ClickHouse rejected the `ALTER TABLE ... MODIFY TTL` statement — most often because of a permissions issue or a ClickHouse version below 22.3. The license install itself still succeeded; the TTL change just didn't land.