License Enforcement — SaaS Handoff (2026-04-26)
Handoff for the cameleer-saas team and customer-success engineers operating customer-facing cameleer-server deployments. Covers issuing, renewing, revoking, and operationally observing licenses.
For end-customer operator docs, see docs/license-enforcement.md. For minting tooling, see cameleer-license-minter/README.md. For the original design + plan, see:
- `docs/superpowers/specs/2026-04-25-license-enforcement-design.md`
- `docs/superpowers/plans/2026-04-25-license-enforcement.md`
Table of contents
Session context
What this delivers
Trust model architecture
Operational playbook
Key management
Cap matrix (plan tiers)
Telemetry the SaaS team can observe
Failure modes & runbook
Edge cases the SaaS team should know
Testing guidance
Pointers
Session context
- Branch: `feature/runtime-hardening`
- Commit range: `ec51aef8..140ea884`: 40 commits delivering the full feature (3 doc/spec/plan commits + 14 implementation commits + 23 follow-ons covering enforcement, retention, metrics, REST surface, integration tests, and rules updates).
- Plan tasks: 36 of 36 complete. Tests green: core (122), minter (7), app unit (230), key ITs (`PostgresLicenseRepositoryIT`, `LicenseLifecycleIT`, `LicenseEnforcementIT`, `RetentionRuntimeRecomputeIT`, `SchemaBootstrapIT`).
- Persisted state: Flyway migration V5, which adds the `license` table and three retention columns on `environments` (`execution_retention_days`, `log_retention_days`, `metric_retention_days`).
Key SHAs
| SHA | Subject |
|---|---|
| `ec51aef8` | start of plan (above this is unrelated runtime-hardening work) |
| `551a7f12` | refactor(license): remove dead Feature enum and isEnabled scaffolding |
| `2ebe4989..0499a54e` | LicenseInfo / Validator / Limits / Gate redesign |
| `896b7e6e..f6657f81` | Standalone cameleer-license-minter module |
| `20aefd5b..b95e80a2` | PG schema, repository, service, boot wiring |
| `2bad9c3e..e198c13e` | Enforcement points, retention applier, REST surface, metrics, ITs |
| `140ea884` | docs(rules): document license enforcement classes + endpoints (head) |
What this delivers
- Cap enforcement at 8 surfaces (env/app/agent/user/outbound/alert-rule creation, deploy-time compute caps, jar retention).
- License lifecycle: install (env > file > DB > API), daily revalidation cron + 60s post-startup tick, grace period, full state machine (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID).
- Retention enforcement: ClickHouse TTL recomputed on every license change for `executions`, `processor_executions`, `logs`, `agent_metrics`, `agent_events`. Effective TTL = `min(licenseCap, env.configured)`.
- Standalone `cameleer-license-minter` Maven module for vendor-side license generation. Not in the server runtime/compile classpath.
- Audit trail: every install/replace/cap_exceeded/revalidate event under `AuditCategory.LICENSE`.
- Observability: 3 Prometheus gauges + 1 counter (see Telemetry).
- Default tier: small fixed caps when no license is installed; intentionally restrictive.
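As a worked instance of the retention rule above, the following minimal sketch applies `min(licenseCap, env.configured)` to a few sample values. The class and method names are illustrative only and are not taken from the server code.

```java
public class EffectiveRetention {

    /**
     * Effective TTL in days: the smaller of the license cap and the
     * value configured on the environment.
     */
    static int effectiveDays(int licenseCapDays, int envConfiguredDays) {
        return Math.min(licenseCapDays, envConfiguredDays);
    }

    public static void main(String[] args) {
        // License allows 30 days, environment asks for 90: the license cap wins.
        System.out.println(effectiveDays(30, 90));
        // License allows 30 days, environment asks for 7: the environment value wins.
        System.out.println(effectiveDays(30, 7));
        // Default tier (1 day) caps an environment configured for 30 days.
        System.out.println(effectiveDays(1, 30));
    }
}
```

A license downgrade therefore shortens retention only for environments whose configured value exceeds the new cap.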
Trust model architecture
VENDOR / SaaS CUSTOMER (cameleer-server)
+-------------------------+ +------------------------------------+
| cameleer-license- | | CAMELEER_SERVER_LICENSE_PUBLICKEY |
| minter (CLI/Java) | | CAMELEER_SERVER_TENANT_ID |
| | | |
| Ed25519 PRIVATE key | | Ed25519 PUBLIC key (matching) |
| (HSM / KMS / Vault) | | |
| | | | ^ |
| v | | | validate |
| LicenseMinter.mint | | | |
| | | token (HTTPS) | LicenseValidator |
| +-----token----+----------------->+ | |
| | env-var or POST | v |
+-------------------------+ | LicenseGate (state + limits) |
| | |
| v |
| LicenseEnforcer (cap checks) |
+------------------------------------+
The vendor holds the only copy of the private key. Customers receive only the public key (over deployment-config channels) and the signed token. A compromised customer can read tokens but cannot forge new ones.
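The asymmetry described above can be illustrated with plain JDK classes (Java 15 or newer for Ed25519). This is a sketch of the general sign/verify pattern, not the actual `LicenseMinter` or `LicenseValidator` code. It assumes the `<base64payload>.<base64sig>` layout with standard base64, and the class name, method names, and sample payloads are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

public class TrustModelSketch {

    /** Generates a throwaway Ed25519 key pair for the demonstration. */
    static KeyPair newKeyPair() {
        try {
            return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }

    /** Vendor side: sign the payload and emit <base64payload>.<base64sig>. */
    static String sign(byte[] payload, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(privateKey);
            signer.update(payload);
            Base64.Encoder b64 = Base64.getEncoder();
            return b64.encodeToString(payload) + "." + b64.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    /** Customer side: true only if the signature matches the payload under the public key. */
    static boolean verify(String token, PublicKey publicKey) {
        String[] parts = token.split("\\.");
        if (parts.length != 2) return false;
        try {
            byte[] payload = Base64.getDecoder().decode(parts[0]);
            byte[] signature = Base64.getDecoder().decode(parts[1]);
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKey);
            verifier.update(payload);
            return verifier.verify(signature);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false; // malformed base64 or malformed signature bytes
        }
    }

    public static void main(String[] args) {
        KeyPair vendor = newKeyPair();
        byte[] payload = "{\"tenantId\":\"acme\"}".getBytes(StandardCharsets.UTF_8);
        String token = sign(payload, vendor.getPrivate());

        // A customer who edits the payload cannot produce a matching signature.
        String forgedPayload = Base64.getEncoder()
                .encodeToString("{\"tenantId\":\"other\"}".getBytes(StandardCharsets.UTF_8));
        String forged = forgedPayload + token.substring(token.indexOf('.'));

        System.out.println("genuine token verifies: " + verify(token, vendor.getPublic()));
        System.out.println("edited token verifies:  " + verify(forged, vendor.getPublic()));
    }
}
```

The verifier needs only the public key, which is why distributing it through ordinary deployment config does not weaken the model.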
The minter module physically lives in the cameleer-server repo for shared LicenseInfo types but is intentionally absent from the runtime classpath of the server. Verify with:
mvn dependency:tree -pl cameleer-server-app | grep license-minter
# expected: empty (or test-scope only on dev branches)
Operational playbook
Onboarding a new tenant
- Choose the tenant id (must match the customer's `CAMELEER_SERVER_TENANT_ID`; lowercase alphanumeric + dashes; immutable).
- Decide whether to use the shared SaaS signing key or a dedicated per-tenant key. Shared is simpler and standard; per-tenant only if a customer has compliance requirements that mandate isolation.
- Mint the initial license:

      java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \
        --private-key=<vault path>/cameleer-license-priv.pem \
        --tenant=<tenant id> \
        --label="<Customer Name> (<Plan>)" \
        --expires=2027-04-26 \
        --grace-days=14 \
        --max-environments=<plan> \
        --max-apps=<plan> \
        --max-agents=<plan> \
        --max-users=<plan> \
        --max-outbound-connections=<plan> \
        --max-alert-rules=<plan> \
        --max-total-cpu-millis=<plan> \
        --max-total-memory-mb=<plan> \
        --max-total-replicas=<plan> \
        --max-execution-retention-days=<plan> \
        --max-log-retention-days=<plan> \
        --max-metric-retention-days=<plan> \
        --max-jar-retention-count=<plan> \
        --output=/tmp/<tenant>.lic \
        --public-key=<vault path>/cameleer-license-pub.b64 \
        --verify

- Deliver to the customer's server via either:
  - Container env var (preferred for SaaS-managed deployments): `CAMELEER_SERVER_LICENSE_TOKEN=<token>` set on the deploy descriptor. Activates at next boot.
  - Admin REST POST (for hot install on a running server): `POST /api/v1/admin/license` with `{"token": "..."}`. The response body confirms successful installation.
- Confirm acceptance: `GET /api/v1/admin/license` returns `state=ACTIVE`, the audit log shows `install_license` / `SUCCESS`, and `cameleer_license_state{state="ACTIVE"} == 1.0` in Prometheus.
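For SaaS-side automation, the hot-install POST can be built with the JDK HTTP client. The sketch below only constructs the request. The `Authorization: Bearer` scheme, the 10-second timeout, and the class and method names are assumptions for illustration; the path and the `{"token": "..."}` body come from the step above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class LicenseInstallRequest {

    /** Builds the {"token": "..."} body, escaping backslashes and quotes defensively. */
    static String jsonBody(String token) {
        String escaped = token.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"token\": \"" + escaped + "\"}";
    }

    /**
     * Builds the install request for one tenant's server.
     * Assumption: the admin endpoint accepts the ADMIN-role JWT as a Bearer token.
     */
    static HttpRequest build(String baseUrl, String adminJwt, String token) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/admin/license"))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + adminJwt)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(token)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and placeholder credentials, for illustration only.
        HttpRequest request = build("https://tenant.example.test", "<admin jwt>", "<token>");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it is a single call to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, after which the acceptance checks in the last step still apply.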
Renewing a license
- Mint a new token with a later `--expires`. Use a fresh `licenseId` so the audit trail clearly distinguishes the renewal from the prior license.
- Install via admin POST. The PG `license` row is updated in place (one row per tenant, upserted on `tenant_id`); the audit row records `replace_license` with `previousLicenseId`.
- Confirm `lastValidatedAt` advances on the next 03:00 cron tick (or trigger by restart / `POST /api/v1/admin/license`).
Adjusting caps mid-term
Same as renewal: mint a new token with the new limits and install. The limits map of the new license replaces the prior one entirely (no merging — only DefaultTierLimits provides fallback for keys the new license omits).
If the customer is lowering caps below current usage, there is no automatic enforcement against existing entities — only future creates are rejected. Communicate the implication clearly. The /api/v1/admin/license/usage endpoint after install will show current > cap rows, which is the operator's signal to clean up.
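The replace-not-merge behaviour can be sketched as follows. This is an illustration of the rule rather than the real `DefaultTierLimits` class: the default values are the subset quoted in this document (1 env, 3 apps, 5 agents), and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class EffectiveLimits {

    // Subset of the default-tier caps quoted in this document.
    static final Map<String, Integer> DEFAULTS = Map.of(
            "max_environments", 1,
            "max_apps", 3,
            "max_agents", 5);

    /**
     * Resolves the limits in force for the currently installed license.
     * Only the defaults and the new license are consulted; the previous
     * license's limits are never an input.
     */
    static Map<String, Integer> resolve(Map<String, Integer> licenseLimits) {
        Map<String, Integer> effective = new TreeMap<>(DEFAULTS);
        effective.putAll(licenseLimits);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Integer> previous = Map.of("max_apps", 50, "max_agents", 100);
        Map<String, Integer> replacement = Map.of("max_apps", 200);

        System.out.println("before: " + resolve(previous));
        // max_agents falls back to the default of 5, not the previous 100.
        System.out.println("after:  " + resolve(replacement));
    }
}
```

The practical consequence is that a replacement token should restate every cap the customer is entitled to, not just the ones that changed.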
Revoking a license
There is no remote revocation. Practical options:
- Wait for expiry. Short license terms (12 months max) keep this honest.
- Rotate the public key. Push a new `CAMELEER_SERVER_LICENSE_PUBLICKEY` to the customer's server config and restart. All existing tokens become `INVALID` because the signature no longer verifies. This is destructive (all customers sharing this signing key need a re-issue), so reserve for true compromise scenarios.
- Deploy a corrupted token. If the customer cooperates, set `CAMELEER_SERVER_LICENSE_TOKEN` to garbage; the boot loader marks it `INVALID`, default-tier caps apply.
In all cases the customer falls to default-tier caps (1 env, 3 apps, 5 agents). They can continue running for evaluation; creates that would exceed those caps fail with 403.
Migrating a license between server instances
Tokens are bound to tenantId, not to a particular server instance. A token works on any server configured for the same tenant. To migrate:
- Provision the new server with `CAMELEER_SERVER_TENANT_ID=<same id>` and `CAMELEER_SERVER_LICENSE_PUBLICKEY=<same key>`.
- Install the existing token on the new server (env var or POST). PG state is fresh on the new instance, so usage starts at zero.
- Decommission the old server.
If both run simultaneously they both pass validation (same token, same key, same tenant id) and both apply the caps independently against their own local state — usage is not federated.
Key management
Where the signing key lives
The SaaS team's Ed25519 private key is the trust root. Place it in:
- Production: AWS KMS, GCP KMS, Azure Key Vault (with a non-exportable signing key) or HashiCorp Vault Transit. The minter API supports signing via a `PrivateKey` instance, so a custom integration that asks the KMS to sign canonicalized payload bytes is straightforward to build on top of `LicenseMinter.canonicalPayload(...)` (it's `static`-accessible for that purpose).
- Pre-production / dev: sealed file in a single privileged operator's home directory. Never on a CI server, never in the repo.
For high-security environments, the minter CLI's --private-key=<path> is the wrong fit — it requires the key bytes to be readable. Use the Java API directly:
PrivateKey kmsKey = kmsClient.getSigningKey("cameleer-license-prod");
String token = LicenseMinter.mint(info, kmsKey);
The JCE provider for the KMS handles signing; the private bytes never leave the KMS.
Public key distribution
Each tenant's server reads the public key from CAMELEER_SERVER_LICENSE_PUBLICKEY (base64-encoded X.509 SPKI). Distribute via:
- Helm values / Kubernetes Secret for k8s-orchestrated tenants.
- Docker compose env file for self-hosted tenants.
- Bare environment variable on the host for VM tenants.
A typo or whitespace difference will cause every license to be rejected. Build a smoke test that boots a sandbox server with the candidate public key and POSTs a known-good test token.
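Before booting a sandbox, a cheap offline preflight can catch the most common distribution mistakes. The sketch below checks that a candidate value is clean standard base64 and parses as an Ed25519 X.509 SPKI key, using only JDK classes (Java 15 or newer). It is a hypothetical helper, not part of the server. It also does not prove the key matches the signing key, so the known-good-token smoke test remains necessary.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPairGenerator;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import java.util.Optional;

public class PublicKeyPreflight {

    /** Returns a problem description, or empty if the value parses as an Ed25519 public key. */
    static Optional<String> check(String configured) {
        if (configured == null || configured.isEmpty()) {
            return Optional.of("value is empty");
        }
        if (!configured.equals(configured.strip())) {
            return Optional.of("leading or trailing whitespace");
        }
        byte[] der;
        try {
            der = Base64.getDecoder().decode(configured);
        } catch (IllegalArgumentException e) {
            return Optional.of("not valid standard base64: " + e.getMessage());
        }
        try {
            KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(der));
        } catch (GeneralSecurityException e) {
            return Optional.of("not an Ed25519 X.509 SPKI key: " + e.getMessage());
        }
        return Optional.empty();
    }

    /** Generates a sample value in the expected encoding, for trying the check. */
    static String generateSampleKey() {
        try {
            byte[] spki = KeyPairGenerator.getInstance("Ed25519")
                    .generateKeyPair().getPublic().getEncoded();
            return Base64.getEncoder().encodeToString(spki);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available", e);
        }
    }

    public static void main(String[] args) {
        String good = generateSampleKey();
        System.out.println(check(good).orElse("ok"));
        System.out.println(check(good + "\n").orElse("ok"));
        System.out.println(check(good.substring(4)).orElse("ok"));
    }
}
```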
Rotation playbook
Rotation is the trickiest part. The validator does not support multiple public keys — exactly one is configured. Procedure:
- Generate the new keypair in production storage (KMS / Vault).
- Coordinate downtime windows with each customer running on the old key. There is no overlap-period mechanism; you must:
  - Push the new public key to all tenants (config rollout, restart).
  - Re-mint and re-deliver every active license under the new key.
  - Expect each customer's server to be `INVALID` between the public-key change and the new token install.
- Decommission the old private key only after every active license has been re-issued.
To avoid emergency rotations, sign with a fresh keypair every 24 months on a planned schedule. License terms shorter than the rotation interval keep customer impact bounded — at most one re-issue per customer per rotation.
Cap matrix (plan tiers)
These are suggested values — adjust to your pricing model. Caps not listed fall through to defaults.
| Limit key | Default (no license) | Starter | Team | Business | Enterprise |
|---|---|---|---|---|---|
| `max_environments` | 1 | 2 | 5 | 10 | 50 |
| `max_apps` | 3 | 10 | 50 | 200 | 1000 |
| `max_agents` | 5 | 20 | 100 | 500 | 5000 |
| `max_users` | 3 | 5 | 25 | 100 | 1000 |
| `max_outbound_connections` | 1 | 5 | 25 | 100 | 500 |
| `max_alert_rules` | 2 | 10 | 50 | 200 | 1000 |
| `max_total_cpu_millis` | 2000 | 8000 | 32000 | 128000 | 512000 |
| `max_total_memory_mb` | 2048 | 8192 | 32768 | 131072 | 524288 |
| `max_total_replicas` | 5 | 25 | 100 | 500 | 2000 |
| `max_execution_retention_days` | 1 | 7 | 30 | 90 | 365 |
| `max_log_retention_days` | 1 | 7 | 30 | 90 | 180 |
| `max_metric_retention_days` | 1 | 7 | 30 | 90 | 180 |
| `max_jar_retention_count` | 3 | 5 | 10 | 25 | 50 |
Telemetry the SaaS team can observe
Audit log
Every license event lives in audit_log with category=LICENSE. Useful queries:
-- Last 30 license events for tenant X
SELECT timestamp, username, action, target, result, detail
FROM audit_log
WHERE category = 'LICENSE'
ORDER BY timestamp DESC
LIMIT 30;
-- Customers hitting caps in the last 24h
SELECT target AS limit, COUNT(*) AS rejections
FROM audit_log
WHERE category = 'LICENSE' AND action = 'cap_exceeded'
AND timestamp > now() - INTERVAL '24 hours'
GROUP BY target
ORDER BY rejections DESC;
-- Customers running with rejected licenses
SELECT timestamp, detail->>'reason' AS reason, detail->>'source' AS source
FROM audit_log
WHERE category = 'LICENSE' AND action = 'reject_license'
ORDER BY timestamp DESC;
Prometheus metrics
| Metric | Type | Labels | Use |
|---|---|---|---|
| `cameleer_license_state` | gauge | `state` | Dashboard tile: which state is each tenant in. One-hot per state. |
| `cameleer_license_days_remaining` | gauge | (none) | Renewal alerting. Recommended thresholds: warn at 30 days, page at 7 days, critical at 1 day. -1.0 means no license. |
| `cameleer_license_last_validated_age_seconds` | gauge | (none) | Detect stuck schedulers. Alert at >86400. |
| `cameleer_license_cap_rejections_total` | counter | `limit` | Account-management signal: customers consistently hitting caps are upgrade prospects. |
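If the renewal thresholds are evaluated in SaaS-side tooling rather than in alert rules, the mapping might look like the sketch below. The enum and class names are hypothetical. Treating the boundaries as inclusive and treating any negative value as "no license" are assumptions, since the table only specifies -1.0.

```java
public class RenewalAlert {

    enum Severity { NO_LICENSE, OK, WARN, PAGE, CRITICAL }

    /**
     * Maps a cameleer_license_days_remaining sample to an alert severity,
     * following the recommended thresholds (30 / 7 / 1 days).
     */
    static Severity classify(double daysRemaining) {
        if (daysRemaining < 0) return Severity.NO_LICENSE; // gauge reports -1.0 when no license is installed
        if (daysRemaining <= 1) return Severity.CRITICAL;
        if (daysRemaining <= 7) return Severity.PAGE;
        if (daysRemaining <= 30) return Severity.WARN;
        return Severity.OK;
    }

    public static void main(String[] args) {
        for (double sample : new double[] {-1.0, 0.5, 6, 21, 200}) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```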
REST API
/api/v1/admin/license/usage returns the per-limit current/cap/source table — wire this into your SaaS-side admin UI for at-a-glance per-tenant view. The endpoint requires an ADMIN-role JWT; SaaS-side automation can mint short-lived ADMIN tokens scoped per tenant or use a shared service account.
Failure modes & runbook
"Customer reports 403s after upgrade"
- Pull `/api/v1/admin/license/usage`. Identify which `limit` row has `current >= cap`.
- If `state = ACTIVE` and a higher-tier license is owed, mint and install it.
- If `state = EXPIRED/INVALID/ABSENT`, fix the license-state issue first, since the cap rejection is downstream of that.
- Confirm by replaying the failing operation; the 403 should clear.
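The first step of this runbook can be automated once the usage response has been parsed. The sketch below assumes each row carries `limit`, `current`, `cap`, and `source` fields, mirroring the per-limit table the endpoint is described as returning. The record shape, the sample `source` values, and the class name are guesses for illustration, not the actual response schema.

```java
import java.util.List;

public class UsageTriage {

    /** Assumed shape of one row of the usage table. */
    record UsageRow(String limit, long current, long cap, String source) {}

    /** Returns the limit keys whose current usage has reached or passed the cap. */
    static List<String> limitsAtOrOverCap(List<UsageRow> rows) {
        return rows.stream()
                .filter(row -> row.current() >= row.cap())
                .map(UsageRow::limit)
                .toList();
    }

    public static void main(String[] args) {
        List<UsageRow> rows = List.of(
                new UsageRow("max_environments", 1, 2, "license"),
                new UsageRow("max_apps", 10, 10, "license"),
                new UsageRow("max_agents", 7, 5, "default"));
        // Rows at the cap block the next create; rows over the cap also need cleanup.
        System.out.println(limitsAtOrOverCap(rows));
    }
}
```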
"Customer reports state=INVALID"
- Pull `/api/v1/admin/license` and note `invalidReason`.
- Most likely causes:
  - Public-key mismatch: the customer's `CAMELEER_SERVER_LICENSE_PUBLICKEY` differs from the key used to mint. Diff the two values byte-for-byte.
  - Tenant mismatch: `CAMELEER_SERVER_TENANT_ID` on the server differs from the `--tenant` used when minting. The customer must restart with the correct tenant id (it's immutable for the lifetime of the deployment because it appears in PG schema names and CH partition keys, so coordinate carefully).
  - Token tampering: base64-decode the payload portion (`<base64payload>.<base64sig>`) and confirm the JSON looks well-formed.
- Re-mint or fix config; re-install.
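For the token-tampering check, the payload can be decoded for inspection without any key material. This is a small sketch assuming standard (not URL-safe) base64 and a UTF-8 JSON payload; the class and method names are hypothetical. It only displays the payload and does not verify the signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TokenInspector {

    /** Decodes the payload half of a <base64payload>.<base64sig> token for manual inspection. */
    static String payloadJson(String token) {
        int dot = token.indexOf('.');
        if (dot <= 0 || dot != token.lastIndexOf('.') || dot == token.length() - 1) {
            throw new IllegalArgumentException("expected <base64payload>.<base64sig>");
        }
        // Throws IllegalArgumentException if the payload is not valid standard base64.
        byte[] payload = Base64.getDecoder().decode(token.substring(0, dot));
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a stand-in token; the signature half is a dummy value here.
        String payload = Base64.getEncoder()
                .encodeToString("{\"tenantId\":\"acme\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(payloadJson(payload + ".c2ln"));
    }
}
```

A payload that fails to decode, or decodes to truncated JSON, usually points to copy-paste damage in transit rather than deliberate tampering.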
"License will expire in N days"
- Alert on `cameleer_license_days_remaining < 30`.
- Mint a renewal license (new `licenseId`, later `expiresAt`).
- Install via the customer's preferred channel (env-var on next deploy, or hot via POST).
"Audit table fills up with cap_exceeded rows"
Customer is hammering a creation path. Either:
- They genuinely outgrew their tier — upgrade conversation.
- Their automation has a runaway loop creating environments/apps. Coordinate with the customer to throttle and clean up.
The cameleer_license_cap_rejections_total{limit=...} counter is more efficient for monitoring this than scanning audit; use audit only for forensic detail.
"TTL recompute logs WARN: Failed to apply TTL"
RetentionPolicyApplier could not run ALTER TABLE ... MODIFY TTL on ClickHouse. The license install itself succeeded; only the retention update failed. Check:
- ClickHouse user has `ALTER` privilege on the cameleer DB.
- ClickHouse version is >= 22.3 (required for `WHERE` predicate on TTL).
- ClickHouse cluster health.
Edge cases the SaaS team should know
- Default tier is restrictive on purpose. A customer on default tier cannot stand up a real production workload (1 env, 3 apps, 5 agents, 1-day retention). Onboarding should always include license install before the customer adds any real workload.
- Grace period defaults to 0. If you want a buffer between `expiresAt` and capability loss, set `--grace-days=N` at mint time. We recommend 14 days for paid plans so a slipped renewal doesn't immediately drop the customer to default-tier caps.
- Public key change invalidates all installed tokens at the next revalidation. Daily revalidation runs at 03:00 server-local time, with a 60-second post-startup tick. A surprise public-key rollout will surface as `state=INVALID` for every customer running on the old key on the next tick or restart.
- Caps reduce on revalidation, not just install. A token whose `expiresAt` lapses will, at the next revalidation, transition `ACTIVE → GRACE → EXPIRED` automatically, dropping caps to default-tier on the EXPIRED transition. The state change is announced via `LicenseChangedEvent` and triggers TTL recompute.
- Compute caps are evaluated at deploy time, not at runtime. A deployment that successfully started under a high-tier license will keep running unchanged when the license downgrades. Only the next deploy attempt will see the new cap.
- Agent count is in-memory. `max_agents` is enforced against `AgentRegistryService.liveCount()` (LIVE state agents). Restarts reset the count to zero until agents re-register; this is by design, since DEAD agents shouldn't pin a license slot.
- License id changes on every renewal. Always use a fresh `UUID.randomUUID()` when minting a renewal. The audit `previousLicenseId` field then tells you which token superseded which.
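The time-driven part of the state machine (`ACTIVE`, `GRACE`, `EXPIRED`) can be reasoned about with the sketch below. It is a guess at the rule implied by `expiresAt` and the grace days, not the actual `LicenseGate` logic: in particular, treating the exact `expiresAt` instant as already expired is an assumption, and `ABSENT` and `INVALID` are left out because they do not depend on time.

```java
import java.time.Duration;
import java.time.Instant;

public class LicenseStateSketch {

    enum State { ACTIVE, GRACE, EXPIRED }

    /** Derives the time-based state of a valid token at the given instant. */
    static State stateAt(Instant now, Instant expiresAt, int graceDays) {
        if (now.isBefore(expiresAt)) {
            return State.ACTIVE;
        }
        Instant graceEnd = expiresAt.plus(Duration.ofDays(graceDays));
        return now.isBefore(graceEnd) ? State.GRACE : State.EXPIRED;
    }

    public static void main(String[] args) {
        Instant expiresAt = Instant.parse("2027-04-26T00:00:00Z");

        // With 14 grace days the customer keeps licensed caps for two more weeks.
        System.out.println(stateAt(Instant.parse("2027-04-25T03:00:00Z"), expiresAt, 14));
        System.out.println(stateAt(Instant.parse("2027-05-01T03:00:00Z"), expiresAt, 14));
        System.out.println(stateAt(Instant.parse("2027-05-11T03:00:00Z"), expiresAt, 14));

        // With the default of 0 grace days there is no GRACE window at all.
        System.out.println(stateAt(Instant.parse("2027-04-26T03:00:00Z"), expiresAt, 0));
    }
}
```

Because the server only re-evaluates at revalidation, the observed transition can lag the computed one by up to a day.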
Testing guidance
Three approaches for dry-running licenses without touching a customer server:
1. Pure unit test — LicenseMinter round-trip with LicenseValidator
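import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Base64;
import java.util.Map;
import java.util.UUID;
// LicenseInfo, LicenseMinter and LicenseValidator come from the repo; assertEquals from the test framework.

// Throwaway Ed25519 key pair; the public half is base64-encoded as the server expects it.
KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
String pubB64 = Base64.getEncoder().encodeToString(kp.getPublic().getEncoded());
LicenseInfo info = new LicenseInfo(
    UUID.randomUUID(), "test-tenant", "Test", Map.of("max_apps", 50),
    Instant.now(), Instant.now().plus(365, ChronoUnit.DAYS), 0
);
String token = LicenseMinter.mint(info, kp.getPrivate());
LicenseValidator validator = new LicenseValidator(pubB64, "test-tenant");
LicenseInfo parsed = validator.validate(token);
assertEquals(info.licenseId(), parsed.licenseId());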
This is the model already used in LicenseMinterTest and LicenseValidatorTest in the repo — copy from there.
2. CLI dry-run — mint and self-verify
java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \
--private-key=test-priv.pem \
--public-key=test-pub.b64 \
--tenant=test-tenant \
--expires=2027-12-31 \
--max-apps=50 \
--output=/tmp/test.lic \
--verify
--verify runs the full LicenseValidator.validate(...) round-trip and exits 3 on failure. Useful for shaking out wrong-key / wrong-tenant before sending to a customer.
3. Test server with a test public key
Spin up a sandbox cameleer-server (docker-compose or k8s-test-namespace) with:
environment:
CAMELEER_SERVER_TENANT_ID: test-tenant
CAMELEER_SERVER_LICENSE_PUBLICKEY: <test public key base64>
Install the test license, exercise the customer's reported scenario, observe state transitions and audit rows. The LicenseLifecycleIT and LicenseEnforcementIT integration tests in cameleer-server-app/src/test/java/.../license/ are good templates for full-stack reproduction.
Pointers
| Document | Audience |
|---|---|
| `cameleer-license-minter/README.md` | Vendor-side mint operations |
| `docs/license-enforcement.md` | End-customer operators (install, monitor, troubleshoot) |
| `docs/superpowers/specs/2026-04-25-license-enforcement-design.md` | Original design rationale |
| `docs/superpowers/plans/2026-04-25-license-enforcement.md` | Implementation plan (36 tasks) |
| `.claude/rules/core-classes.md` (`license/` section) | License domain class map |
| `.claude/rules/app-classes.md` (`license/` section) | Server license-app class map + endpoint surface |