cameleer-server/docs/license-enforcement.md
hsiegeln 5864553fed docs(license): minter README + operator guide + SaaS handoff
cameleer-license-minter/README.md — vendor-side guide: build, public
LicenseMinter API, CLI usage with all flags, token format (standard
base64, not url-safe), LicenseInfo schema, Ed25519 key generation,
worked example, security guidance, runtime-separation verification.

docs/license-enforcement.md — operator guide: install paths and
priority (env > file > DB > none), public-key config, REST API,
state machine (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID), default tier
caps, 403 envelope semantics, retention TTL recompute, daily
revalidation, audit + Prometheus surfaces, troubleshooting.

docs/handoff/2026-04-26-license-saas-handoff.md — SaaS playbook:
trust model, onboarding/renewal/revocation runbooks, key management,
cap matrix per plan tier, telemetry, failure modes, testing guidance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:33:12 +02:00

License Enforcement

Operator documentation for the cameleer-server license subsystem. Audience: operators running their own cameleer-server instance who need to install, monitor, or troubleshoot a license.

For issuing licenses, see cameleer-license-minter/README.md. For SaaS-team operational playbooks, see docs/handoff/2026-04-26-license-saas-handoff.md.

Table of contents

Overview

What gets enforced

Install paths and priority

Public-key configuration

REST API

License state machine

Default tier caps

Cap-exceeded behavior

Retention semantics

Daily revalidation

Audit categories

Prometheus metrics

Troubleshooting


Overview

cameleer-server can run in one of two postures:

  • Default tier (no license installed). A small fixed cap-set applies (1 environment, 3 apps, 5 agents, 1 day retention, etc.). Suitable for evaluation and self-host single-instance use. The default tier engages automatically when no license is configured.
  • Licensed (token installed). Caps from the signed token override the default tier on a per-key basis. Any limit key the token does not specify falls through to the default value, so a partial license that only raises max_environments and max_apps keeps default retention.
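The per-key fallthrough described above can be sketched as follows. This is a minimal illustration, not the server's code: the function name effective_limits and the (cap, source) return shape are assumptions, and the default values are copied from the "Default tier caps" table in this guide. Ignoring token keys the server does not know is also an assumption.

```python
# Default-tier values as listed in the "Default tier caps" table of this guide.
DEFAULT_TIER_LIMITS = {
    "max_environments": 1,
    "max_apps": 3,
    "max_agents": 5,
    "max_users": 3,
    "max_outbound_connections": 1,
    "max_alert_rules": 2,
    "max_total_cpu_millis": 2000,
    "max_total_memory_mb": 2048,
    "max_total_replicas": 5,
    "max_execution_retention_days": 1,
    "max_log_retention_days": 1,
    "max_metric_retention_days": 1,
    "max_jar_retention_count": 3,
}


def effective_limits(license_limits):
    """Return {key: (cap, source)}; a license value overrides the default per key."""
    overrides = license_limits or {}
    merged = {}
    for key, default_cap in DEFAULT_TIER_LIMITS.items():
        if key in overrides:
            merged[key] = (overrides[key], "license")
        else:
            # Key not present in the token: fall through to the default tier.
            merged[key] = (default_cap, "default")
    return merged


# A partial license that only raises two caps keeps default retention.
partial = effective_limits({"max_environments": 3, "max_apps": 25})
print(partial["max_apps"], partial["max_execution_retention_days"])
```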

A signed Ed25519 license token carries the customer's tenantId, an expiresAt timestamp, an optional gracePeriodDays, and a limits map. The server's LicenseValidator (cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseValidator.java) checks the signature against CAMELEER_SERVER_LICENSE_PUBLICKEY, verifies the tenant matches CAMELEER_SERVER_TENANT_ID, and rejects expired tokens (past expiresAt + gracePeriodDays).

The license posture is summarized as a LicenseState:

  • ABSENT — no license configured. Default-tier caps apply.
  • ACTIVE — valid token, current time is at or before expiresAt. License caps apply.
  • GRACE — past expiresAt but within gracePeriodDays. License caps still apply; the operator should renew.
  • EXPIRED — past expiresAt + gracePeriodDays. Default-tier caps apply.
  • INVALID — signature, tenant, or schema validation failed. Default-tier caps apply.

What gets enforced

License caps are enforced through a single component, LicenseEnforcer.assertWithinCap(limitKey, currentUsage, requestedDelta), called from each creation path.

| Limit key | Enforcement point | Effect when exceeded |
| --- | --- | --- |
| max_environments | EnvironmentService.create(...) | HTTP 403 from EnvironmentAdminController.create. |
| max_apps | AppService.createApp(...) | HTTP 403 from AppController.create. |
| max_agents | AgentRegistryService.register(...) | HTTP 403 from AgentRegistrationController.register. Counted against the in-memory live agent registry. |
| max_users | User creation paths in UserAdminController, UiAuthController, OidcAuthController | HTTP 403 (REST) or rejection during OIDC first-login. |
| max_outbound_connections | OutboundConnectionServiceImpl.create(...) | HTTP 403. |
| max_alert_rules | AlertRuleController.create(...) | HTTP 403. |
| max_total_cpu_millis | DeploymentExecutor PRE_FLIGHT stage | Deployment fails before pulling images; row is marked FAILED with the cap message in deployments.error_message. |
| max_total_memory_mb | same | same |
| max_total_replicas | same | same |
| max_jar_retention_count | EnvironmentAdminController PUT /{envSlug}/jar-retention | HTTP 403 if requested value > cap. The daily JarRetentionJob is also bounded by this cap. |
| max_execution_retention_days, max_log_retention_days, max_metric_retention_days | (none) | Not a creation cap; clamps ClickHouse TTL to min(cap, env.configured). See Retention semantics. |

Note that the three compute caps are checked together at deploy time, after ConfigMerger.resolve(...) produces the final ResolvedContainerConfig but before the image is pulled. The current usage figure is computed by LicenseUsageReader.computeUsage() over non-stopped deployments.
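A rough sketch of how the three compute aggregates and the joint pre-flight check might fit together. The Deployment type, its field names, and the rule "reject when current + requested > cap" are assumptions made for illustration; only the aggregate formulas (replicas times the per-replica limit, over non-stopped deployments) come from this guide.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical shape; the real ResolvedContainerConfig is not shown in this guide.
    replicas: int
    cpu_limit_millis: int
    memory_limit_mb: int
    stopped: bool = False


def compute_usage(deployments):
    """Aggregate compute usage over non-stopped deployments."""
    live = [d for d in deployments if not d.stopped]
    return {
        "max_total_cpu_millis": sum(d.replicas * d.cpu_limit_millis for d in live),
        "max_total_memory_mb": sum(d.replicas * d.memory_limit_mb for d in live),
        "max_total_replicas": sum(d.replicas for d in live),
    }


def preflight_violations(current, requested, caps):
    """Return the limit keys the requested deployment would push over its cap."""
    delta = compute_usage([requested])
    # Assumed rule: usage after the deployment must not exceed the cap.
    return [key for key in caps if current[key] + delta[key] > caps[key]]
```

With the default-tier caps, 1000 millicores already in use, and a request for three replicas at 500 millicores each, only max_total_cpu_millis would be reported, because 1000 + 1500 exceeds 2000.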

Install paths and priority

Tokens can be installed by four mechanisms; resolution at boot is highest-priority-first:

  1. CAMELEER_SERVER_LICENSE_TOKEN environment variable. Highest priority. The raw token is read on @PostConstruct from LicenseBeanConfig.LicenseBootLoader.
  2. cameleer.server.license.file Spring property (or CAMELEER_SERVER_LICENSE_FILE). Path to a file containing the token. Read at boot if no env-var token is present.
  3. PostgreSQL license table. Set via the admin REST POST. Loaded at boot if the env var and file both miss.
  4. None of the above. State is ABSENT, default-tier caps apply, the boot loader publishes a LicenseChangedEvent(ABSENT, null) so listeners (Prometheus gauges, retention applier) settle on default values.

If a higher-priority source rejects (signature failure, tenant mismatch, expired) the loader logs the reason and does not fall through to a lower-priority source. This is deliberate: an operator who set CAMELEER_SERVER_LICENSE_TOKEN expects that token to be the active one, not a silently-stale DB row.

Any token loaded at boot also flows through LicenseService.install(...) so audit, persistence, and LicenseChangedEvent publishing are uniform across paths.
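The boot-time resolution, including the deliberate lack of fallthrough, can be sketched like this. The function names and the validate callable (returning True when a token is accepted) are hypothetical stand-ins for the real boot loader and LicenseValidator; the sketch also collapses ACTIVE and GRACE into "ACTIVE" for brevity.

```python
def pick_boot_source(env_token=None, file_token=None, db_token=None):
    """Return (source, token) for the highest-priority configured source."""
    for source, token in (("env", env_token), ("file", file_token), ("db", db_token)):
        if token:
            return source, token
    return None, None


def boot_state(validate, **tokens):
    """Validate only the chosen source; a rejection never falls through."""
    source, token = pick_boot_source(**tokens)
    if token is None:
        return "ABSENT", None
    if not validate(token):  # hypothetical validator: True means accepted
        # The lower-priority sources are intentionally not consulted.
        return "INVALID", source
    return "ACTIVE", source


# A rejected env-var token wins over a valid DB row: the result is INVALID.
print(boot_state(lambda t: t == "good", env_token="stale", db_token="good"))
```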

Public-key configuration

export CAMELEER_SERVER_LICENSE_PUBLICKEY="$(cat cameleer-license-pub.b64)"

The value is the base64 encoding of the Ed25519 public key in X.509 SubjectPublicKeyInfo form (see cameleer-license-minter/README.md for generation).
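Before setting the variable, the value can be sanity-checked offline. The sketch below is an assumption-labelled helper (looks_like_ed25519_spki is not part of cameleer-server): it only checks that the value decodes as standard base64 into the 44-byte DER structure an Ed25519 SubjectPublicKeyInfo has, a fixed 12-byte header followed by the 32 raw key bytes. It does not prove the key matches any particular private key.

```python
import base64

# DER header that precedes the 32 raw key bytes in an Ed25519 SubjectPublicKeyInfo.
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def looks_like_ed25519_spki(b64_value):
    """Return True if the value is base64 of a 44-byte Ed25519 SPKI structure."""
    try:
        der = base64.b64decode(b64_value.strip(), validate=True)
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
    return len(der) == 44 and der.startswith(ED25519_SPKI_PREFIX)
```

A raw 32-byte key without the SPKI header, or a PEM file pasted with its BEGIN/END lines, would fail this check.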

When CAMELEER_SERVER_LICENSE_PUBLICKEY is unset:

  • LicenseBeanConfig.licenseValidator() (line 62) logs a WARN: CAMELEER_SERVER_LICENSE_PUBLICKEY not set — all licenses will be rejected as INVALID.
  • The validator bean is constructed against a throwaway public key whose private counterpart no one holds, and its overridden validate(...) always throws IllegalStateException("license public key not configured").
  • Any token loaded from any source routes through LicenseService.install(...), fails validation, marks the gate INVALID, and writes a reject_license audit row with the failure reason.
  • The state will be INVALID, default-tier caps apply, and the operator must set the variable and restart (or hot-install via POST after restart).

REST API

All endpoints require an ADMIN-role JWT. Source-of-truth controllers: cameleer-server-app/src/main/java/com/cameleer/server/app/controller/LicenseAdminController.java, LicenseUsageController.java.

GET /api/v1/admin/license

{
  "state": "ACTIVE",
  "invalidReason": null,
  "envelope": {
    "licenseId": "fd3a8f2a-1c44-4eac-aa07-1a5d1ce9c4a4",
    "tenantId": "acme-prod",
    "label": "Acme Production",
    "limits": { "max_apps": 25, "max_environments": 3 },
    "issuedAt": "2026-04-26T10:00:00Z",
    "expiresAt": "2027-01-01T00:00:00Z",
    "gracePeriodDays": 14
  },
  "lastValidatedAt": "2026-04-26T03:00:00Z"
}

The raw token string is deliberately not returned — only the parsed envelope. lastValidatedAt is omitted when no DB row exists yet (env-var or file source on first boot before the next revalidation tick).

POST /api/v1/admin/license

curl -X POST https://server.example.com/api/v1/admin/license \
     -H "Authorization: Bearer ${ADMIN_JWT}" \
     -H "Content-Type: application/json" \
     -d '{"token": "eyJ...long.base64.string..."}'

Body shape: {"token": "<minted token>"}. On success returns {"state": "ACTIVE", "envelope": {...}}. On failure returns HTTP 400 with {"error": "<reason>"}.

The handler delegates to LicenseService.install(token, userId, "api"). The acting userId comes from the authenticated principal, stripped of the user: prefix (see app-classes.md user-id convention).

This endpoint installs or replaces — there is one row per tenant in the license table, so a successful POST upserts and supersedes any prior token. The previous license id is captured in the replace_license audit detail.
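For scripted installs, the same request as the curl example can be built with the Python standard library. This is a sketch: build_install_request is a hypothetical helper name, and the base URL and JWT are placeholders supplied by the caller.

```python
import json
import urllib.request


def build_install_request(base_url, admin_jwt, token):
    """Build the POST /api/v1/admin/license request without sending it."""
    body = json.dumps({"token": token}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/admin/license",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + admin_jwt,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen(...) sends it; an HTTP 400 rejection then surfaces as urllib.error.HTTPError, whose body carries the {"error": "<reason>"} payload.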

GET /api/v1/admin/license/usage

{
  "state": "ACTIVE",
  "expiresAt": "2027-01-01T00:00:00Z",
  "daysRemaining": 250,
  "gracePeriodDays": 14,
  "tenantId": "acme-prod",
  "label": "Acme Production",
  "lastValidatedAt": "2026-04-26T03:00:00Z",
  "message": "License active. 250 days remaining.",
  "limits": [
    {"key": "max_environments", "current": 2,  "cap": 3,  "source": "license"},
    {"key": "max_apps",         "current": 12, "cap": 25, "source": "license"},
    {"key": "max_agents",       "current": 38, "cap": 50, "source": "license"},
    {"key": "max_users",        "current": 4,  "cap": 3,  "source": "default"}
  ]
}

For each effective-limits key:

  • current: current usage. max_agents is read from the in-memory AgentRegistryService.liveCount(); everything else comes from LicenseUsageReader.snapshot() (PostgreSQL counts, plus deployment compute aggregates from deployed_config_snapshot). Limits the server does not measure return 0.
  • cap: effective cap (license override or default-tier value).
  • source: "license" if the cap came from the token's limits map, "default" if it fell through.

License state machine

                        +---------------+
                        |    ABSENT     |  (no token configured)
                        +-------+-------+
                                |
                                | install via env / file / DB / POST
                                v
                        +-------+-------+
        +-------------- |    ACTIVE     | --------------+
        |               +-------+-------+               |
        | revalidate                                    | now > expiresAt
        | fails sig/tenant/                             |
        | parse                                         v
        |                                       +-------+-------+
        |                                       |    GRACE      |
        |                                       +-------+-------+
        |                                               |
        |                                               | now > exp + gracePeriodDays
        |                                               v
        |                                       +-------+-------+
        |                                       |    EXPIRED    |
        |                                       +-------+-------+
        v
+-------+-------+
|    INVALID    |  (signature mismatch, tenant mismatch,
+---------------+   missing public key, malformed payload)

Classification logic: LicenseStateMachine.classify(license, invalidReason) (cameleer-server-core/src/main/java/com/cameleer/server/core/license/LicenseStateMachine.java).

  • INVALID and EXPIRED revert to default-tier caps. In INVALID the license envelope is dropped from the gate (getCurrent() returns null); in EXPIRED the gate retains the parsed info, but getEffectiveLimits() returns defaults only.
  • GRACE keeps license caps. It is the only state in which the server still runs on license caps while the operator should be actively working on renewal.
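The classification boundaries can be written down compactly. This is a sketch of the rules as stated in this guide, not the body of LicenseStateMachine.classify: the dict-based license shape is hypothetical, and checking invalidReason before the absent case is an assumption.

```python
from datetime import datetime, timedelta, timezone


def classify(license_info, invalid_reason, now):
    """Map a parsed license (or None) and an optional failure reason to a state."""
    if invalid_reason is not None:
        return "INVALID"
    if license_info is None:
        return "ABSENT"
    expires_at = license_info["expiresAt"]
    grace = timedelta(days=license_info.get("gracePeriodDays") or 0)
    if now <= expires_at:
        return "ACTIVE"   # at or before expiresAt
    if now <= expires_at + grace:
        return "GRACE"    # past expiresAt, still within the grace period
    return "EXPIRED"      # past expiresAt + gracePeriodDays


lic = {"expiresAt": datetime(2027, 1, 1, tzinfo=timezone.utc), "gracePeriodDays": 14}
print(classify(lic, None, datetime(2027, 1, 10, tzinfo=timezone.utc)))
```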

Default tier caps

Source: cameleer-server-core/src/main/java/com/cameleer/server/core/license/DefaultTierLimits.java.

| Key | Default | Semantics |
| --- | --- | --- |
| max_environments | 1 | Total environments across the tenant. |
| max_apps | 3 | Total apps across all environments. |
| max_agents | 5 | Live agents in the in-memory registry (LIVE state). |
| max_users | 3 | Local + OIDC users in the users table. |
| max_outbound_connections | 1 | Rows in outbound_connections. |
| max_alert_rules | 2 | Rows in alert_rules. |
| max_total_cpu_millis | 2000 | Sum of replicas * cpuLimit over non-stopped deployments. cpuLimit is millicores; 1000 = one core. |
| max_total_memory_mb | 2048 | Sum of replicas * memoryLimitMb over non-stopped deployments. |
| max_total_replicas | 5 | Sum of replicas over non-stopped deployments. |
| max_execution_retention_days | 1 | Cap on TTL applied to executions and processor_executions. |
| max_log_retention_days | 1 | Cap on TTL applied to logs. |
| max_metric_retention_days | 1 | Cap on TTL applied to agent_metrics and agent_events. |
| max_jar_retention_count | 3 | Maximum JAR retention count per environment. |

The default tier is intentionally restrictive — it is sized for evaluation, single-developer demos, and "I forgot to install my license" recovery, not production. New customers should install a license at first onboarding.

Cap-exceeded behavior

When a creation path exceeds its cap, LicenseEnforcer.assertWithinCap(...) throws LicenseCapExceededException(limitKey, current, cap). LicenseExceptionAdvice (@ControllerAdvice) maps it to:

HTTP/1.1 403 Forbidden
Content-Type: application/json

{
  "error": "license cap reached",
  "limit": "max_apps",
  "current": 3,
  "cap": 3,
  "state": "ABSENT",
  "message": "License absent. Default tier limits apply. Cap reached for max_apps (3 of 3 used)."
}

Concurrently:

  • The Prometheus counter cameleer_license_cap_rejections_total{limit=...} increments.
  • An audit row is written: category=LICENSE, action=cap_exceeded, target=<limit key>, result=FAILURE, detail carries {limit, current, requested, cap, state}. If audit storage fails, the 403 still surfaces (audit is best-effort here).

The message field is rendered by LicenseMessageRenderer.forCap(...) and varies per state — under EXPIRED it nudges the operator to renew; under INVALID it cites invalidReason.
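The check-then-map flow can be sketched as below. The names mirror the Java components only loosely; the rejection rule (current + requested > cap) is an assumption inferred from the "3 of 3 used" example, and the message text is passed in rather than rendered, since LicenseMessageRenderer's exact wording per state is not reproduced here.

```python
class LicenseCapExceeded(Exception):
    """Carries the same three values as LicenseCapExceededException."""

    def __init__(self, limit, current, cap):
        super().__init__(f"{limit}: {current} of {cap} used")
        self.limit, self.current, self.cap = limit, current, cap


def assert_within_cap(limit, current, requested, cap):
    # Assumed rule: the creation is rejected if it would push usage past the cap.
    if current + requested > cap:
        raise LicenseCapExceeded(limit, current, cap)


def to_403_body(exc, state, message):
    """Build the JSON body of the 403 envelope shown above."""
    return {
        "error": "license cap reached",
        "limit": exc.limit,
        "current": exc.current,
        "cap": exc.cap,
        "state": state,
        "message": message,
    }
```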

Retention semantics

The license caps max_execution_retention_days, max_log_retention_days, max_metric_retention_days, and max_jar_retention_count define maximums. Per-environment configuration (environments.execution_retention_days, log_retention_days, metric_retention_days, jar_retention_count) defines the operator preference. The effective TTL applied to ClickHouse tables is:

effective = min(licenseCap, env.configuredRetentionDays)

When LicenseChangedEvent fires (any install/replace/revalidate/boot transition), RetentionPolicyApplier (@EventListener @Async) recomputes TTL for every (table, env) pair using:

ALTER TABLE <table>
   MODIFY TTL toDateTime(<time_col>) + INTERVAL <effective> DAY DELETE
   WHERE environment = '<env_slug>'

Tables affected: executions, processor_executions, logs, agent_metrics, agent_events. Excluded:

  • route_diagrams — content-addressed ReplacingMergeTree, no time-based TTL.
  • server_metrics — server-wide, no environment column. Its 90-day cap is fixed in the schema.

ClickHouse failures are logged (WARN) but do not fail the originating license install — TTL recompute is best-effort.
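A small sketch of the clamp applied per table for one environment. The table-to-cap mapping follows the "Default tier caps" table in this guide; the function name and the dict shapes for the caps and the environment row are assumptions.

```python
# Which retention cap governs which ClickHouse table (per the default-tier table).
CAP_KEY_BY_TABLE = {
    "executions": "max_execution_retention_days",
    "processor_executions": "max_execution_retention_days",
    "logs": "max_log_retention_days",
    "agent_metrics": "max_metric_retention_days",
    "agent_events": "max_metric_retention_days",
}

# Matching per-environment preference column for each cap key.
ENV_FIELD_BY_CAP_KEY = {
    "max_execution_retention_days": "execution_retention_days",
    "max_log_retention_days": "log_retention_days",
    "max_metric_retention_days": "metric_retention_days",
}


def effective_ttl_days(effective_caps, env_config):
    """Return {table: days}, where days = min(license cap, env preference)."""
    result = {}
    for table, cap_key in CAP_KEY_BY_TABLE.items():
        configured = env_config[ENV_FIELD_BY_CAP_KEY[cap_key]]
        result[table] = min(effective_caps[cap_key], configured)
    return result
```

For an environment configured for 90 days of execution retention under a 30-day cap, the executions table ends up at 30 days; a 7-day log preference under the same cap stays at 7.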

Daily revalidation

LicenseRevalidationJob (@Scheduled(cron = "0 0 3 * * *")) re-runs LicenseService.revalidate() against the persisted token at 03:00 server-local time. It also fires once 60 seconds after ApplicationReadyEvent to catch the case where a license was installed via SQL between server starts.

Each revalidation:

  • Re-reads the token from the license table.
  • Runs LicenseValidator.validate(...) again — same checks as install (signature, tenant, expiry).
  • On success: bumps last_validated_at, reloads the gate, publishes LicenseChangedEvent.
  • On failure: marks the gate INVALID, writes an audit row revalidate_license / FAILURE, publishes LicenseChangedEvent(INVALID, null).

A token transitioning ACTIVE → GRACE → EXPIRED will surface as a state change at the next revalidation tick (or on the next license-touching admin action).

Audit categories

All license lifecycle events use AuditCategory.LICENSE. Action codes:

| Action | Result | Detail keys |
| --- | --- | --- |
| install_license | SUCCESS | licenseId, expiresAt, installedBy, source |
| replace_license | SUCCESS | same plus previousLicenseId |
| reject_license | FAILURE | reason, source |
| revalidate_license | FAILURE | licenseId, reason |
| cap_exceeded | FAILURE | limit, current, requested, cap, state |

The source value is one of env, file, db, api — corresponds to the install path.

Prometheus metrics

Scraped at /api/v1/prometheus. Source: LicenseMetrics (cameleer-server-app/src/main/java/com/cameleer/server/app/license/LicenseMetrics.java).

| Metric | Type | Labels | Semantics |
| --- | --- | --- | --- |
| cameleer_license_state | gauge | state=<ABSENT\|ACTIVE\|GRACE\|EXPIRED\|INVALID> | One-hot per state: exactly one tag value carries 1.0 at any time, others are 0.0. |
| cameleer_license_days_remaining | gauge | (none) | Whole days until expiresAt. -1.0 when no license is loaded (ABSENT/INVALID). Suitable alert thresholds: warn at 30, page at 7. |
| cameleer_license_last_validated_age_seconds | gauge | (none) | Seconds since the persisted last_validated_at. 0 when there is no DB row. Alerts at >86400 (revalidation hasn't run for >24h) detect a stuck scheduler or a misconfigured server. |
| cameleer_license_cap_rejections_total | counter | limit=<limit_key> | Incremented every time LicenseEnforcer rejects a creation due to a cap. A non-zero rate indicates customers hitting their plan ceiling. |

Gauges refresh on every LicenseChangedEvent and on a 60-second @Scheduled(fixedDelay) so values stay current even without state changes.
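The gauge values described above can be illustrated with a short sketch. The function names are hypothetical, and rounding down to whole days is an assumption; the guide only says "whole days".

```python
from datetime import datetime, timezone

STATES = ("ABSENT", "ACTIVE", "GRACE", "EXPIRED", "INVALID")


def state_gauge(state):
    """One-hot encoding: the current state carries 1.0, every other state 0.0."""
    return {s: (1.0 if s == state else 0.0) for s in STATES}


def days_remaining(expires_at, now):
    """Whole days until expires_at, or -1.0 when no license is loaded."""
    if expires_at is None:
        return -1.0
    # timedelta.days rounds toward negative infinity (assumed rounding rule).
    return float((expires_at - now).days)


print(days_remaining(datetime(2027, 1, 1, tzinfo=timezone.utc),
                     datetime(2026, 4, 26, tzinfo=timezone.utc)))
```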

Troubleshooting

My license shows INVALID — why?

Check invalidReason from GET /api/v1/admin/license. Common causes:

| invalidReason substring | Cause | Fix |
| --- | --- | --- |
| License signature verification failed | Public key on the server does not match the private key the token was signed with. | Confirm CAMELEER_SERVER_LICENSE_PUBLICKEY matches the keypair used to mint the token. |
| License tenantId 'X' does not match server tenant 'Y' | Token minted for a different tenantId. | Re-mint with --tenant=<correct id> matching CAMELEER_SERVER_TENANT_ID. |
| licenseId is required / tenantId is required / exp is required | Malformed token (missing required field). | Re-mint via the supported minter; these fields are mandatory. |
| License expired at <...> | Past expiresAt + gracePeriodDays. | Issue a renewal license. |
| license public key not configured | CAMELEER_SERVER_LICENSE_PUBLICKEY is unset. | Set the env var and either restart or POST the token again. |

I'm getting 403s on creates — which cap is biting?

curl https://server.example.com/api/v1/admin/license/usage \
     -H "Authorization: Bearer ${ADMIN_JWT}"

The limits[] array shows current/cap per limit key. Any row with current >= cap is a candidate. The 403 response body itself names the limit:

{"error":"license cap reached","limit":"max_apps","current":3,"cap":3,"state":"ABSENT", ...}

If state is ABSENT or EXPIRED/INVALID, the fix is to install a license. If state is ACTIVE and you are at the license cap, you need a higher-tier license re-issued.
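When the usage response is long, the candidate rows can be filtered with a few lines of code. A minimal sketch, assuming the parsed JSON body of GET /api/v1/admin/license/usage is already available as a dict; limits_at_cap is a hypothetical helper name.

```python
def limits_at_cap(usage):
    """Return the limit keys whose current usage has reached or passed the cap."""
    return [row["key"] for row in usage.get("limits", []) if row["current"] >= row["cap"]]


# Rows taken from the example usage response earlier in this guide.
example = {
    "state": "ACTIVE",
    "limits": [
        {"key": "max_environments", "current": 2, "cap": 3, "source": "license"},
        {"key": "max_apps", "current": 12, "cap": 25, "source": "license"},
        {"key": "max_agents", "current": 38, "cap": 50, "source": "license"},
        {"key": "max_users", "current": 4, "cap": 3, "source": "default"},
    ],
}
print(limits_at_cap(example))
```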

My new license didn't take effect

  1. Check the audit log:
    curl 'https://server.example.com/api/v1/admin/audit?category=LICENSE&limit=10' \
         -H "Authorization: Bearer ${ADMIN_JWT}"
    
    You should see an install_license or replace_license row at SUCCESS. A reject_license FAILURE row carries the reason.
  2. Confirm the public key matches the private key used to mint:
    • Vendor side: openssl pkey -in <priv> -pubout -outform DER | base64 -w0
    • Server side: echo $CAMELEER_SERVER_LICENSE_PUBLICKEY
    • These must be byte-identical.
  3. Confirm CAMELEER_SERVER_TENANT_ID matches the tenantId in the token envelope (GET /api/v1/admin/license).
  4. If the env var token disagrees with what's in the DB (e.g. you POSTed but a stale env var remains): the env var wins on next boot. Either remove the env var or update it before restarting.

Cap rejections spiking but no licensed customer should be hitting the cap

Inspect cameleer_license_cap_rejections_total{limit=...}. If a tenant is on default tier (state = ABSENT/EXPIRED/INVALID) the very low default caps will trip immediately on routine activity. Install a license to restore expected behavior.

Retention TTL didn't change after installing a license

RetentionPolicyApplier runs on LicenseChangedEvent asynchronously (@Async). Look for log lines like:

License changed (state=ACTIVE) — recomputing TTL across N environment(s) and 5 table(s)
Applied TTL: table=executions env=prod days=30 (cap=30, configured=90)

If the log shows Failed to apply TTL warnings, ClickHouse rejected the ALTER TABLE ... MODIFY TTL statement — most often because of a permissions issue or a ClickHouse version below 22.3. The license install itself still succeeded; the TTL change just didn't land.