hsiegeln 5864553fed docs(license): minter README + operator guide + SaaS handoff
cameleer-license-minter/README.md — vendor-side guide: build, public
LicenseMinter API, CLI usage with all flags, token format (standard
base64, not url-safe), LicenseInfo schema, Ed25519 key generation,
worked example, security guidance, runtime-separation verification.

docs/license-enforcement.md — operator guide: install paths and
priority (env > file > DB > none), public-key config, REST API,
state machine (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID), default tier
caps, 403 envelope semantics, retention TTL recompute, daily
revalidation, audit + Prometheus surfaces, troubleshooting.

docs/handoff/2026-04-26-license-saas-handoff.md — SaaS playbook:
trust model, onboarding/renewal/revocation runbooks, key management,
cap matrix per plan tier, telemetry, failure modes, testing guidance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:33:12 +02:00


License Enforcement — SaaS Handoff (2026-04-26)

Handoff for the cameleer-saas team and customer-success engineers operating customer-facing cameleer-server deployments. Covers issuing, renewing, revoking, and operationally observing licenses.

For end-customer operator docs, see docs/license-enforcement.md. For minting tooling, see cameleer-license-minter/README.md. For the original design + plan, see:

  • docs/superpowers/specs/2026-04-25-license-enforcement-design.md
  • docs/superpowers/plans/2026-04-25-license-enforcement.md

Table of contents

  • Session context
  • What this delivers
  • Trust model architecture
  • Operational playbook
  • Key management
  • Cap matrix (plan tiers)
  • Telemetry the SaaS team can observe
  • Failure modes & runbook
  • Edge cases the SaaS team should know
  • Testing guidance
  • Pointers


Session context

  • Branch: feature/runtime-hardening
  • Commit range: ec51aef8..140ea884 — 40 commits delivering the full feature (3 doc/spec/plan commits + 14 implementation commits + 23 follow-ons covering enforcement, retention, metrics, REST surface, integration tests, and rules updates).
  • Plan tasks: 36 of 36 complete. Tests green: core (122), minter (7), app unit (230), key ITs (PostgresLicenseRepositoryIT, LicenseLifecycleIT, LicenseEnforcementIT, RetentionRuntimeRecomputeIT, SchemaBootstrapIT).
  • Persisted state: Flyway migration V5 — adds the license table and three retention columns on environments (execution_retention_days, log_retention_days, metric_retention_days).

Key SHAs

SHA | Subject
ec51aef8 | start of plan; commits before it are unrelated runtime-hardening work
551a7f12 | refactor(license): remove dead Feature enum and isEnabled scaffolding
2ebe4989..0499a54e | LicenseInfo / Validator / Limits / Gate redesign
896b7e6e..f6657f81 | Standalone cameleer-license-minter module
20aefd5b..b95e80a2 | PG schema, repository, service, boot wiring
2bad9c3e..e198c13e | Enforcement points, retention applier, REST surface, metrics, ITs
140ea884 | docs(rules): document license enforcement classes + endpoints (head)

What this delivers

  • Cap enforcement at 8 surfaces (env/app/agent/user/outbound/alert-rule creation, deploy-time compute caps, jar retention).
  • License lifecycle: install from env var, file, DB, or the admin REST API (boot-time precedence env > file > DB), daily revalidation cron + 60s post-startup tick, grace period, full state machine (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID).
  • Retention enforcement: ClickHouse TTL recomputed on every license change for executions, processor_executions, logs, agent_metrics, agent_events. Effective TTL = min(licenseCap, env.configured).
  • Standalone cameleer-license-minter Maven module for vendor-side license generation. Not in the server runtime/compile classpath.
  • Audit trail: every install/replace/reject/cap_exceeded/revalidate event is recorded under AuditCategory.LICENSE.
  • Observability: 3 Prometheus gauges + 1 counter (see Telemetry).
  • Default tier: small fixed caps when no license is installed; intentionally restrictive.
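The retention rule in the list above can be sketched as a single function. This is a minimal illustration, not RetentionPolicyApplier's code: the class and method names are hypothetical, and treating a null environment value as "not configured, inherit the cap" is an assumption.

```java
/** Hypothetical helper: effective retention = min(license cap, environment setting). */
final class EffectiveRetention {
    private EffectiveRetention() {}

    /**
     * @param licenseCapDays    cap from the active license or the default tier, e.g. max_log_retention_days
     * @param envConfiguredDays the environment's own setting; null is ASSUMED to mean "not configured"
     * @return the retention (in days) that would end up in the ClickHouse TTL
     */
    static int effectiveDays(int licenseCapDays, Integer envConfiguredDays) {
        if (envConfiguredDays == null) {
            return licenseCapDays; // assumption: an unconfigured environment simply inherits the cap
        }
        return Math.min(licenseCapDays, envConfiguredDays);
    }
}
```

For example, a downgrade from Business (90 days) to Starter (7 days) turns an environment configured for 30 days into 7, which is why every license change triggers a TTL recompute.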

Trust model architecture

            VENDOR / SaaS                              CUSTOMER (cameleer-server)
   +-------------------------+                  +------------------------------------+
   |  cameleer-license-      |                  |  CAMELEER_SERVER_LICENSE_PUBLICKEY |
   |  minter (CLI/Java)      |                  |  CAMELEER_SERVER_TENANT_ID         |
   |                         |                  |                                    |
   |   Ed25519 PRIVATE key   |                  |   Ed25519 PUBLIC key (matching)    |
   |   (HSM / KMS / Vault)   |                  |                                    |
   |          |              |                  |          ^                         |
   |          v              |                  |          | validate                |
   |   LicenseMinter.mint    |                  |          |                         |
   |          |              |   token (HTTPS)  |   LicenseValidator                 |
   |          +-----token----+----------------->+          |                         |
   |                         |  env-var or POST |          v                         |
   +-------------------------+                  |   LicenseGate (state + limits)     |
                                                |          |                         |
                                                |          v                         |
                                                |   LicenseEnforcer (cap checks)     |
                                                +------------------------------------+

The vendor holds the only copy of the private key. Customers receive only the public key (over deployment-config channels) and the signed token. A compromised customer can read tokens but cannot forge new ones.
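The asymmetry in the trust model can be demonstrated with nothing but the JDK's built-in Ed25519 support (Java 15+). This sketch is illustrative only: it signs an arbitrary JSON string rather than the minter's canonical payload, and TrustModelDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** JDK-only illustration: the vendor signs with the private key, the customer verifies with the public key. */
final class TrustModelDemo {
    private TrustModelDemo() {}

    static byte[] sign(PrivateKey key, byte[] payload) throws GeneralSecurityException {
        Signature s = Signature.getInstance("Ed25519");
        s.initSign(key);
        s.update(payload);
        return s.sign();
    }

    static boolean verify(PublicKey key, byte[] payload, byte[] sig) throws GeneralSecurityException {
        Signature s = Signature.getInstance("Ed25519");
        s.initVerify(key);
        s.update(payload);
        return s.verify(sig); // false (not an exception) when the payload was altered
    }

    /** Returns {genuine payload verifies, tampered payload verifies}. */
    static boolean[] demo() {
        try {
            KeyPair vendor = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
            byte[] payload = "{\"tenantId\":\"acme\",\"max_apps\":50}".getBytes(StandardCharsets.UTF_8);
            byte[] sig = sign(vendor.getPrivate(), payload);
            // A customer edits the cap but cannot re-sign without the private key.
            byte[] tampered = "{\"tenantId\":\"acme\",\"max_apps\":5000}".getBytes(StandardCharsets.UTF_8);
            return new boolean[] {
                verify(vendor.getPublic(), payload, sig),
                verify(vendor.getPublic(), tampered, sig)
            };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

demo() returns {true, false}: the genuine payload verifies, while the edited one fails against the same signature, so holding the public key is enough to check a token but not to raise one's own caps.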

The minter module physically lives in the cameleer-server repo for shared LicenseInfo types but is intentionally absent from the runtime classpath of the server. Verify with:

mvn dependency:tree -pl cameleer-server-app | grep license-minter
# expected: empty (or test-scope only on dev branches)

Operational playbook

Onboarding a new tenant

  1. Choose the tenant id (must match the customer's CAMELEER_SERVER_TENANT_ID; lowercase alphanumeric + dashes; immutable).
  2. Decide whether to use the shared SaaS signing key or a dedicated per-tenant key. Shared is simpler and standard; per-tenant only if a customer has compliance requirements that mandate isolation.
  3. Mint the initial license:
    java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \
        --private-key=<vault path>/cameleer-license-priv.pem \
        --tenant=<tenant id> \
        --label="<Customer Name> (<Plan>)" \
        --expires=2027-04-26 \
        --grace-days=14 \
        --max-environments=<plan> \
        --max-apps=<plan> \
        --max-agents=<plan> \
        --max-users=<plan> \
        --max-outbound-connections=<plan> \
        --max-alert-rules=<plan> \
        --max-total-cpu-millis=<plan> \
        --max-total-memory-mb=<plan> \
        --max-total-replicas=<plan> \
        --max-execution-retention-days=<plan> \
        --max-log-retention-days=<plan> \
        --max-metric-retention-days=<plan> \
        --max-jar-retention-count=<plan> \
        --output=/tmp/<tenant>.lic \
        --public-key=<vault path>/cameleer-license-pub.b64 \
        --verify
    
  4. Deliver to the customer's server via either:
    • Container env var (preferred for SaaS-managed deployments): CAMELEER_SERVER_LICENSE_TOKEN=<token> set on the deploy descriptor. Activates at next boot.
    • Admin REST POST (for hot install on a running server): POST /api/v1/admin/license with {"token": "..."}. Confirms successful installation in the response body.
  5. Confirm acceptance: GET /api/v1/admin/license returns state=ACTIVE, the audit log shows install_license/SUCCESS, and cameleer_license_state{state="ACTIVE"} == 1.0 in Prometheus.
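Steps 4 and 5 over the admin REST API can be sketched with the JDK's java.net.http client. The endpoint path and the {"token": "..."} body come from this playbook; passing the ADMIN-role JWT as an Authorization: Bearer header, the base URL, and the helper names are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helpers for the hot-install path: POST the token, then GET the license status. */
final class LicenseInstallRequests {
    private LicenseInstallRequests() {}

    /** A token is two standard-base64 segments joined by '.', so it needs no JSON escaping; reject anything else. */
    static String installBody(String token) {
        String t = token.strip();
        if (!t.matches("[A-Za-z0-9+/=]+\\.[A-Za-z0-9+/=]+")) {
            throw new IllegalArgumentException("not a <base64payload>.<base64sig> token");
        }
        return "{\"token\": \"" + t + "\"}";
    }

    static HttpRequest install(URI baseUrl, String adminJwt, String token) {
        return HttpRequest.newBuilder(baseUrl.resolve("/api/v1/admin/license"))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", "Bearer " + adminJwt) // assumption: bearer-token auth
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(installBody(token)))
                .build();
    }

    static HttpRequest status(URI baseUrl, String adminJwt) {
        return HttpRequest.newBuilder(baseUrl.resolve("/api/v1/admin/license"))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", "Bearer " + adminJwt)
                .GET()
                .build();
    }
}
```

Send each request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and check that the status response reports state=ACTIVE; the exact response schema is not assumed here.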

Renewing a license

  1. Mint a new token with a later --expires. Use a fresh licenseId so the audit trail clearly distinguishes the renewal from the prior license.
  2. Install via admin POST. The PG license row is updated in place (one row per tenant, upserted on tenant_id); the audit row records replace_license with previousLicenseId.
  3. Confirm lastValidatedAt advances on the next 03:00 cron tick (or trigger revalidation sooner with a restart or POST /api/v1/admin/license).

Adjusting caps mid-term

Same as renewal: mint a new token with the new limits and install. The limits map of the new license replaces the prior one entirely (no merging — only DefaultTierLimits provides fallback for keys the new license omits).

If the customer is lowering caps below current usage, there is no automatic enforcement against existing entities; only future creates are rejected. Communicate the implication clearly. After install, /api/v1/admin/license/usage shows rows with current > cap, which is the operator's signal to clean up.
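The replace-not-merge rule and the post-downgrade check can be modeled in a few lines. This is a hypothetical model, not DefaultTierLimits itself: DEFAULT_TIER holds only a subset of the Default (no license) column from the cap matrix, and usage is assumed to arrive as a map of limit key to current count (for example, parsed from /api/v1/admin/license/usage).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model: license keys win, omitted keys fall back to the default tier, nothing carries over. */
final class EffectiveLimits {
    private EffectiveLimits() {}

    // Subset of the "Default (no license)" column of the cap matrix.
    static final Map<String, Integer> DEFAULT_TIER = Map.of(
            "max_environments", 1, "max_apps", 3, "max_agents", 5, "max_users", 3);

    /** The new license's map replaces the previous one entirely; the prior license is not consulted. */
    static Map<String, Integer> resolve(Map<String, Integer> licenseLimits) {
        Map<String, Integer> out = new TreeMap<>(DEFAULT_TIER);
        out.putAll(licenseLimits);
        return out;
    }

    /** Limit keys where current usage exceeds the cap, mapped to the overage the operator must clean up. */
    static Map<String, Integer> overCap(Map<String, Integer> caps, Map<String, Integer> usage) {
        Map<String, Integer> over = new LinkedHashMap<>();
        caps.forEach((key, cap) -> {
            int current = usage.getOrDefault(key, 0);
            if (current > cap) {
                over.put(key, current - cap);
            }
        });
        return over;
    }
}
```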

Revoking a license

There is no remote revocation. Practical options:

  1. Wait for expiry. Short license terms (12 months max) keep this honest.
  2. Rotate the public key. Push a new CAMELEER_SERVER_LICENSE_PUBLICKEY to the customer's server config and restart. All existing tokens become INVALID because the signature no longer verifies. This is destructive (all customers sharing this signing key need a re-issue), so reserve for true compromise scenarios.
  3. Deploy a corrupted token. If the customer cooperates, set CAMELEER_SERVER_LICENSE_TOKEN to garbage; the boot loader marks it INVALID, default-tier caps apply.

In all cases the customer falls to default-tier caps (1 env, 3 apps, 5 agents). They can continue running for evaluation; new creates fail with 403.

Migrating a license between server instances

Tokens are bound to tenantId, not to a particular server instance. A token works on any server configured for the same tenant. To migrate:

  1. Provision the new server with CAMELEER_SERVER_TENANT_ID=<same id> and CAMELEER_SERVER_LICENSE_PUBLICKEY=<same key>.
  2. Install the existing token on the new server (env var or POST). PG state is fresh on the new instance — usage starts at zero.
  3. Decommission the old server.

If both run simultaneously they both pass validation (same token, same key, same tenant id) and both apply the caps independently against their own local state — usage is not federated.

Key management

Where the signing key lives

The SaaS team's Ed25519 private key is the trust root. Place it in:

  • Production: AWS KMS, GCP KMS, Azure Key Vault (with a non-exportable signing key) or HashiCorp Vault Transit. The minter API supports signing via a PrivateKey instance, so a custom integration that asks the KMS to sign the canonicalized payload bytes is straightforward to build on top of LicenseMinter.canonicalPayload(...), which is exposed as a static method for that purpose.
  • Pre-production / dev: sealed file in a single privileged operator's home directory. Never on a CI server, never in the repo.

For high-security environments, the minter CLI's --private-key=<path> is the wrong fit — it requires the key bytes to be readable. Use the Java API directly:

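// Illustrative: kmsClient.getSigningKey(...) stands in for your KMS vendor's
// JCE-provider key lookup; substitute the provider's actual call.
PrivateKey kmsKey = kmsClient.getSigningKey("cameleer-license-prod");
String token = LicenseMinter.mint(info, kmsKey);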

The JCE provider for the KMS handles signing; the private bytes never leave the KMS.

Public key distribution

Each tenant's server reads the public key from CAMELEER_SERVER_LICENSE_PUBLICKEY (base64-encoded X.509 SPKI). Distribute via:

  • Helm values / Kubernetes Secret for k8s-orchestrated tenants.
  • Docker compose env file for self-hosted tenants.
  • Bare environment variable on the host for VM tenants.

A typo or whitespace difference will cause every license to be rejected. Build a smoke test that boots a sandbox server with the candidate public key and POSTs a known-good test token.
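Part of that smoke test can run before any server boots: parse the candidate value as a standard-base64 X.509 SPKI Ed25519 key. A minimal sketch using only the JDK (Java 15+); flagging leading or trailing whitespace as an error follows the warning above, and PublicKeyPreflight is a hypothetical name. It does not replace the sandbox POST of a known-good token, which is what proves the key actually matches the signing key.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

/** Hypothetical pre-flight check for a CAMELEER_SERVER_LICENSE_PUBLICKEY value. */
final class PublicKeyPreflight {
    private PublicKeyPreflight() {}

    /** Returns the parsed key, or throws IllegalArgumentException with a readable reason. */
    static PublicKey parse(String envValue) {
        if (envValue == null || envValue.isBlank()) {
            throw new IllegalArgumentException("public key is empty");
        }
        if (!envValue.equals(envValue.strip())) {
            throw new IllegalArgumentException("public key has leading/trailing whitespace");
        }
        try {
            byte[] spki = Base64.getDecoder().decode(envValue); // standard alphabet, not URL-safe
            return KeyFactory.getInstance("Ed25519").generatePublic(new X509EncodedKeySpec(spki));
        } catch (IllegalArgumentException | GeneralSecurityException e) {
            throw new IllegalArgumentException("not a base64 X.509 Ed25519 public key: " + e.getMessage(), e);
        }
    }
}
```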

Rotation playbook

Rotation is the trickiest part. The validator does not support multiple public keys — exactly one is configured. Procedure:

  1. Generate the new keypair in production storage (KMS / Vault).
  2. Coordinate downtime windows with each customer running on the old key. There is no overlap-period mechanism; you must:
    • Push the new public key to all tenants (config rollout, restart).
    • Re-mint and re-deliver every active license under the new key.
    • Each customer's server is INVALID between the public-key change and the new token install.
  3. Decommission the old private key only after every active license has been re-issued.

To reduce the likelihood of emergency rotations, rotate to a fresh keypair on a planned schedule (every 24 months). License terms shorter than the rotation interval keep customer impact bounded: at most one re-issue per customer per rotation.

Cap matrix (plan tiers)

These are suggested values; adjust to your pricing model. Any limit key a license omits falls back to its Default (no license) value.

Limit key | Default (no license) | Starter | Team | Business | Enterprise
max_environments | 1 | 2 | 5 | 10 | 50
max_apps | 3 | 10 | 50 | 200 | 1000
max_agents | 5 | 20 | 100 | 500 | 5000
max_users | 3 | 5 | 25 | 100 | 1000
max_outbound_connections | 1 | 5 | 25 | 100 | 500
max_alert_rules | 2 | 10 | 50 | 200 | 1000
max_total_cpu_millis | 2000 | 8000 | 32000 | 128000 | 512000
max_total_memory_mb | 2048 | 8192 | 32768 | 131072 | 524288
max_total_replicas | 5 | 25 | 100 | 500 | 2000
max_execution_retention_days | 1 | 7 | 30 | 90 | 365
max_log_retention_days | 1 | 7 | 30 | 90 | 180
max_metric_retention_days | 1 | 7 | 30 | 90 | 180
max_jar_retention_count | 3 | 5 | 10 | 25 | 50
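Because each limit key maps onto a minter flag by swapping underscores for dashes (max_total_cpu_millis becomes --max-total-cpu-millis, as in the onboarding command), a plan column can be turned into CLI arguments mechanically. A hypothetical helper, with the Team column copied from the matrix above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: turn one plan column of the cap matrix into cameleer-license-minter flags. */
final class PlanFlags {
    private PlanFlags() {}

    // "Team" column of the suggested cap matrix; adjust to your pricing model.
    static final Map<String, Integer> TEAM = Map.ofEntries(
            Map.entry("max_environments", 5),
            Map.entry("max_apps", 50),
            Map.entry("max_agents", 100),
            Map.entry("max_users", 25),
            Map.entry("max_outbound_connections", 25),
            Map.entry("max_alert_rules", 50),
            Map.entry("max_total_cpu_millis", 32000),
            Map.entry("max_total_memory_mb", 32768),
            Map.entry("max_total_replicas", 100),
            Map.entry("max_execution_retention_days", 30),
            Map.entry("max_log_retention_days", 30),
            Map.entry("max_metric_retention_days", 30),
            Map.entry("max_jar_retention_count", 10));

    /** max_apps=50 becomes --max-apps=50; keys are emitted in sorted order for reproducible commands. */
    static List<String> toFlags(Map<String, Integer> plan) {
        List<String> flags = new ArrayList<>();
        new TreeMap<>(plan).forEach((key, value) -> flags.add("--" + key.replace('_', '-') + "=" + value));
        return flags;
    }
}
```

Emitting every key explicitly means no Team limit silently falls back to a default-tier value.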

Telemetry the SaaS team can observe

Audit log

Every license event lives in audit_log with category=LICENSE. Useful queries:

-- Last 30 license events (run against tenant X's database)
SELECT timestamp, username, action, target, result, detail
FROM audit_log
WHERE category = 'LICENSE'
ORDER BY timestamp DESC
LIMIT 30;

-- Limits hit in the last 24h, by number of rejections
SELECT target AS limit, COUNT(*) AS rejections
FROM audit_log
WHERE category = 'LICENSE' AND action = 'cap_exceeded'
  AND timestamp > now() - INTERVAL '24 hours'
GROUP BY target
ORDER BY rejections DESC;

-- Rejected license installs, newest first
SELECT timestamp, detail->>'reason' AS reason, detail->>'source' AS source
FROM audit_log
WHERE category = 'LICENSE' AND action = 'reject_license'
ORDER BY timestamp DESC;

Prometheus metrics

Metric | Type | Labels | Use
cameleer_license_state | gauge | state | Dashboard tile: which state each tenant is in. One-hot per state.
cameleer_license_days_remaining | gauge | (none) | Renewal alerting. Recommended thresholds: warn at 30 days, page at 7 days, critical at 1 day. -1.0 means no license.
cameleer_license_last_validated_age_seconds | gauge | (none) | Detect stuck schedulers. Alert at >86400 (24 h).
cameleer_license_cap_rejections_total | counter | limit | Account-management signal: customers consistently hitting caps are upgrade prospects.
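In practice these thresholds belong in Prometheus alerting rules; the hypothetical Java mapping below only pins down the intended levels, for example for a SaaS-side admin UI. Treating the boundaries as inclusive and any negative value as the no-license sentinel are assumptions.

```java
/** Hypothetical mapping of cameleer_license_days_remaining to the recommended alert levels. */
enum RenewalAlert {
    NO_LICENSE, CRITICAL, PAGE, WARN, OK;

    static RenewalAlert forDaysRemaining(double days) {
        if (days < 0) return NO_LICENSE; // assumption: -1.0 (or any negative) = no license installed
        if (days <= 1) return CRITICAL;  // critical at 1 day
        if (days <= 7) return PAGE;      // page at 7 days
        if (days <= 30) return WARN;     // warn at 30 days
        return OK;
    }
}
```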

REST API

/api/v1/admin/license/usage returns the per-limit current/cap/source table — wire this into your SaaS-side admin UI for at-a-glance per-tenant view. The endpoint requires an ADMIN-role JWT; SaaS-side automation can mint short-lived ADMIN tokens scoped per tenant or use a shared service account.

Failure modes & runbook

"Customer reports 403s after upgrade"

  1. Pull /api/v1/admin/license/usage. Identify which limit row has current >= cap.
  2. If state = ACTIVE and a higher-tier license is owed, mint and install it.
  3. If state = EXPIRED/INVALID/ABSENT, fix the license-state issue first — the cap rejection is downstream of that.
  4. Confirm by replaying the failing operation; the 403 should clear.

"Customer reports state=INVALID"

  1. Pull /api/v1/admin/license — note invalidReason.
  2. Most likely causes:
    • Public-key mismatch — the customer's CAMELEER_SERVER_LICENSE_PUBLICKEY differs from the key used to mint. Diff the two values byte-for-byte.
    • Tenant mismatch: CAMELEER_SERVER_TENANT_ID on the server differs from the --tenant used when minting. The tenant id is effectively immutable for the lifetime of a deployment because it appears in PG schema names and CH partition keys, so prefer re-minting with the server's tenant id. Only restart the server with a different tenant id if the server itself was provisioned with the wrong one, and coordinate that change carefully.
    • Token tampering — base64-decode the payload portion (<base64payload>.<base64sig>), confirm the JSON looks well-formed.
  3. Re-mint or fix config; re-install.
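For the token-tampering check, the payload segment can be decoded locally. A minimal sketch, assuming the two-segment <base64payload>.<base64sig> layout with the standard (not URL-safe) base64 alphabet; TokenInspector is a hypothetical name. It only decodes and does not verify the signature, so a readable payload proves nothing about authenticity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: decode the payload segment of a license token for inspection (no verification). */
final class TokenInspector {
    private TokenInspector() {}

    static String payloadJson(String token) {
        String t = token.strip();
        int dot = t.indexOf('.');
        if (dot <= 0 || dot != t.lastIndexOf('.') || dot == t.length() - 1) {
            throw new IllegalArgumentException("expected exactly two segments: <base64payload>.<base64sig>");
        }
        // Standard alphabet: a URL-safe token ('-' or '_') fails here, which is itself a useful signal.
        byte[] payload = Base64.getDecoder().decode(t.substring(0, dot));
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```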

"License will expire in N days"

  1. Alert on cameleer_license_days_remaining < 30.
  2. Mint a renewal license (new licenseId, later expiresAt).
  3. Install via the customer's preferred channel (env-var on next deploy, or hot via POST).

"Audit table fills up with cap_exceeded rows"

Customer is hammering a creation path. Either:

  • They genuinely outgrew their tier — upgrade conversation.
  • Their automation has a runaway loop creating environments/apps. Coordinate with the customer to throttle and clean up.

The cameleer_license_cap_rejections_total{limit=...} counter is more efficient for monitoring this than scanning audit; use audit only for forensic detail.

"TTL recompute logs WARN: Failed to apply TTL"

RetentionPolicyApplier could not run ALTER TABLE ... MODIFY TTL on ClickHouse. The license install itself succeeded; only the retention update failed. Check:

  • ClickHouse user has ALTER privilege on the cameleer DB.
  • ClickHouse version is >= 22.3 (required for WHERE predicate on TTL).
  • ClickHouse cluster health.

Edge cases the SaaS team should know

  • Default tier is restrictive on purpose. A customer on default tier cannot stand up a real production workload (1 env, 3 apps, 5 agents, 1-day retention). Onboarding should always include license install before the customer adds any real workload.
  • Grace period defaults to 0. If you want a buffer between expiresAt and capability loss, set --grace-days=N at mint time. We recommend 14 days for paid plans so a slipped renewal doesn't immediately drop the customer to default-tier caps.
  • A public-key change invalidates all installed tokens at the next revalidation. Daily revalidation runs at 03:00 server-local time, plus a 60-second post-startup tick. A surprise public-key rollout will surface as state=INVALID for every customer running on the old key at the next tick or restart.
  • Caps reduce on revalidation, not just install. A token whose expiresAt lapses will, at the next revalidation, transition ACTIVE → GRACE → EXPIRED automatically, dropping caps to default-tier on the EXPIRED transition. The state change is announced via LicenseChangedEvent and triggers TTL recompute.
  • Compute caps are evaluated at deploy time, not at runtime. A deployment that successfully started under a high-tier license will keep running unchanged when the license downgrades. Only the next deploy attempt will see the new cap.
  • Agent count is in-memory. max_agents is enforced against the AgentRegistryService.liveCount() (LIVE state agents). Restarts reset the count to zero until agents re-register; this is by design — DEAD agents shouldn't pin a license slot.
  • License id changes on every renewal. Always use a fresh UUID.randomUUID() when minting a renewal. The audit previousLicenseId field then tells you which token superseded which.
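The time-driven transitions described above can be sketched as a pure function. This is a hypothetical model, not LicenseGate's implementation: the evaluate(...) signature and the boundary handling (a license stops being ACTIVE at exactly expiresAt) are assumptions, while the rule that only EXPIRED, ABSENT and INVALID fall back to default-tier caps follows from the edge cases above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of the ABSENT/ACTIVE/GRACE/EXPIRED/INVALID state machine. */
final class LicenseStateSketch {
    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    private LicenseStateSketch() {}

    /**
     * @param tokenPresent         whether any token was found (env, file or DB)
     * @param signatureAndTenantOk result of the signature and tenant-id checks
     */
    static State evaluate(boolean tokenPresent, boolean signatureAndTenantOk,
                          Instant expiresAt, int graceDays, Instant now) {
        if (!tokenPresent) return State.ABSENT;
        if (!signatureAndTenantOk) return State.INVALID;
        if (now.isBefore(expiresAt)) return State.ACTIVE;
        // With the default graceDays = 0, graceEnd == expiresAt and the license goes straight to EXPIRED.
        Instant graceEnd = expiresAt.plus(Duration.ofDays(graceDays));
        return now.isBefore(graceEnd) ? State.GRACE : State.EXPIRED;
    }

    /** GRACE keeps the licensed caps; every other non-ACTIVE state drops to default-tier caps. */
    static boolean usesDefaultTier(State s) {
        return s != State.ACTIVE && s != State.GRACE;
    }
}
```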

Testing guidance

Three approaches for dry-running licenses without touching a customer server:

1. Pure unit test — LicenseMinter round-trip with LicenseValidator

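import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Base64;
import java.util.Map;
import java.util.UUID;
import static org.junit.jupiter.api.Assertions.assertEquals; // JUnit 5 assumed
// plus LicenseInfo, LicenseMinter and LicenseValidator from the cameleer license modules

// Throwaway Ed25519 keypair; the public key is X.509 SPKI in standard base64,
// the same encoding CAMELEER_SERVER_LICENSE_PUBLICKEY expects.
KeyPair kp = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
String pubB64 = Base64.getEncoder().encodeToString(kp.getPublic().getEncoded());

LicenseInfo info = new LicenseInfo(
    UUID.randomUUID(), "test-tenant", "Test", Map.of("max_apps", 50),
    Instant.now(), Instant.now().plus(365, ChronoUnit.DAYS), 0
);

String token = LicenseMinter.mint(info, kp.getPrivate());

// The validator is bound to one public key and one tenant id, mirroring the server config.
LicenseValidator validator = new LicenseValidator(pubB64, "test-tenant");
LicenseInfo parsed = validator.validate(token);
assertEquals(info.licenseId(), parsed.licenseId());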

This is the model already used in LicenseMinterTest and LicenseValidatorTest in the repo — copy from there.

2. CLI dry-run — mint and self-verify

java -jar cameleer-license-minter-1.0-SNAPSHOT-cli.jar \
    --private-key=test-priv.pem \
    --public-key=test-pub.b64 \
    --tenant=test-tenant \
    --expires=2027-12-31 \
    --max-apps=50 \
    --output=/tmp/test.lic \
    --verify

--verify runs the full LicenseValidator.validate(...) round-trip and exits 3 on failure. Useful for shaking out wrong-key / wrong-tenant before sending to a customer.

3. Test server with a test public key

Spin up a sandbox cameleer-server (docker-compose or k8s-test-namespace) with:

environment:
  CAMELEER_SERVER_TENANT_ID: test-tenant
  CAMELEER_SERVER_LICENSE_PUBLICKEY: <test public key base64>

Install the test license, exercise the customer's reported scenario, observe state transitions and audit rows. The LicenseLifecycleIT and LicenseEnforcementIT integration tests in cameleer-server-app/src/test/java/.../license/ are good templates for full-stack reproduction.

Pointers

Document | Audience
cameleer-license-minter/README.md | Vendor-side mint operations
docs/license-enforcement.md | End-customer operators (install, monitor, troubleshoot)
docs/superpowers/specs/2026-04-25-license-enforcement-design.md | Original design rationale
docs/superpowers/plans/2026-04-25-license-enforcement.md | Implementation plan (36 tasks)
.claude/rules/core-classes.md (license/ section) | License domain class map
.claude/rules/app-classes.md (license/ section) | Server license-app class map + endpoint surface