Captures the agreed design for enforcing licensing on cameleer-server:
- Default tier with hard caps when no license is configured
- Arbitrary per-customer limits in signed Ed25519 license tokens
- Standalone cameleer-license-minter module (vendor-only)
- DB-persisted license with env/file override paths
- ABSENT/ACTIVE/GRACE/EXPIRED state machine; offline expiry only
- Removes the dead Feature enum scaffolding

Pending writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
License Enforcement — Design
Date: 2026-04-25
Status: Approved (brainstorm); pending writing-plans
Related: cameleer-saas#7 (Epic: License & Feature Gating), cameleer-saas#42 (vendor minting), cameleer-saas#50 (customer license view)
Problem
cameleer-server ships a license skeleton (LicenseValidator, LicenseGate, admin endpoint) but
nothing enforces anything. Open mode (no license configured) currently grants all features and
imposes no limits, which is the opposite of what we want for a self-hosted distribution that
needs to gate scale behind a paid license.
We want:
- A self-hosted server with no license to operate within a small, hard-coded "default tier" that is enough to evaluate the product but not enough to run it in production.
- Licenses to express arbitrary per-customer limits (no fixed tiers) on a vendor-defined set of resources: entity counts, compute footprint, retention.
- A standalone minter owned by the vendor that signs licenses with an Ed25519 private key the customer never sees.
- Licenses to be persisted on the server, installable via env var, file, or admin POST, and renewable by replacement.
- Revocation handled out of band (vendor suspends the SaaS tenant, or issues short-exp licenses); no online revocation callback in v1.
Non-goals
- Feature flags. The current Feature enum (topology/lineage/correlation/debugger/replay) is dead scaffolding and gets removed; this design is about quantitative limits only.
- Ingestion-rate limits (executions/minute, logs/minute). Defer to a follow-up.
- Online revocation. Vendor uses shorter exp + reissue; SaaS suspension is independent.
- Auto-deletion of resources when caps are lowered. Existing rows stay; only new creates reject.
- Minter keypair generation tooling. Vendor uses standard openssl genpkey -algorithm ed25519 out of band.
1. Architecture
1.1 Module layout
cameleer-server-core/ (existing — pure domain, no Spring)
└── license/
├── LicenseInfo (record — see §2)
├── LicenseLimits (typed wrapper over the limits map)
├── LicenseValidator (existing, payload schema updated)
├── LicenseGate (existing, gutted: no Feature; getLimits() only)
├── LicenseStateMachine (NEW — pure FSM: ABSENT / ACTIVE / GRACE / EXPIRED)
└── DefaultTierLimits (constant, §3.2 numbers)
cameleer-server-app/ (existing — Spring, web, persistence)
├── license/
│ ├── LicenseRepository (NEW — PostgreSQL persistence)
│ ├── LicenseService (NEW — load/save/replace; emits state events)
│ ├── LicenseEnforcer (NEW — assertWithinCap entry point)
│ ├── LicenseUsageReader (NEW — counts current usage for /usage endpoint)
│ ├── LicenseCapExceededException (NEW — mapped to 403 by ControllerAdvice)
│ └── LicenseMetrics (NEW — Prometheus gauges)
├── controller/
│ ├── LicenseAdminController (existing — extended; persists, audited)
│ └── LicenseUsageController (NEW — GET /admin/license/usage)
└── config/
└── LicenseBeanConfig (existing — extended for DB load order)
cameleer-license-minter/ (NEW — top-level Maven module)
├── pom.xml (depends on cameleer-server-core)
├── LicenseMinter (signing primitive; takes private key + LicenseInfo)
└── cli/LicenseMinterCli (CLI main class)
1.2 Why a separate cameleer-license-minter module
Not shipped in the runtime JAR. Vendor distributes it independently or builds it from source on a trusted machine. Customers never receive it.
This is module hygiene plus a smaller runtime attack surface, not a cryptographic protection: forging a license requires the vendor's private key, and the public key in the server is enough to reject forged tokens regardless of where the minter code lives.
1.3 Dependency graph
cameleer-license-minter ──▶ cameleer-server-core (LicenseInfo schema only)
cameleer-server-app ──▶ cameleer-server-core (validator, gate, FSM, defaults)
cameleer-saas ──▶ cameleer-license-minter (for SaaS-mode minting)
cameleer-saas ──▶ cameleer-server-core (transitive)
cameleer-server-app has no dependency on cameleer-license-minter.
2. License envelope
Wire format unchanged: base64(payload).base64(ed25519_signature). Payload schema:
{
  "licenseId": "550e8400-e29b-41d4-a716-446655440000",
  "tenantId": "acme-corp",
  "label": "ACME prod 2026",
  "iat": 1745539200,
  "exp": 1777075200,
  "gracePeriodDays": 30,
  "limits": {
    "max_environments": 5,
    "max_apps": 50,
    "max_agents": 100,
    "max_users": 25,
    "max_outbound_connections": 10,
    "max_alert_rules": 200,
    "max_total_cpu_millis": 32000,
    "max_total_memory_mb": 65536,
    "max_total_replicas": 100,
    "max_execution_retention_days": 90,
    "max_log_retention_days": 30,
    "max_metric_retention_days": 365,
    "max_jar_retention_count": 10
  }
}
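As a rough illustration of how a verifier might unpack this wire format, the sketch below splits a token on the dot, decodes both halves, and checks the Ed25519 signature with the JDK's built-in provider (Java 15 or later). The class and method names are hypothetical and this is not the existing LicenseValidator; it also assumes the standard (not URL-safe) Base64 alphabet, which the design does not pin down.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Optional;

/** Hypothetical sketch; not the actual LicenseValidator. */
final class TokenVerifySketch {

    /** Returns the payload JSON if the signature checks out, otherwise empty. */
    static Optional<String> verifiedPayload(String token, PublicKey publicKey) {
        String[] parts = token.split("\\.", -1);
        if (parts.length != 2) {
            return Optional.empty();                 // not of the form payload.signature
        }
        try {
            // Assumption: standard Base64 alphabet, which never contains '.'
            byte[] payload = Base64.getDecoder().decode(parts[0]);
            byte[] signature = Base64.getDecoder().decode(parts[1]);

            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKey);
            verifier.update(payload);
            if (!verifier.verify(signature)) {
                return Optional.empty();             // forged or edited token
            }
            return Optional.of(new String(payload, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException | GeneralSecurityException e) {
            return Optional.empty();                 // bad Base64 or malformed signature
        }
    }
}
```

Because the signature covers the payload bytes exactly as transmitted, the verifier never needs to re-serialize the JSON; it only parses the payload after the signature has been accepted.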
2.1 Field rules
| Field | Required | Notes |
|---|---|---|
| licenseId | yes | UUID. Used in audit + future revocation. |
| tenantId | optional | If present and CAMELEER_SERVER_TENANT_ID differs, treat as no license + log error. Air-gapped customers may omit. |
| label | optional | Free-form human description. Surfaced in UI. |
| iat | yes | Unix seconds. |
| exp | yes | Unix seconds. |
| gracePeriodDays | optional, default 0 | Days exp may be in the past while limits still apply. |
| limits.* | each optional | Missing key inherits from DefaultTierLimits. A license can lift any subset. |
2.2 Removed from the current envelope
- tier (string): was a non-functional label. Folded into label.
- features (array): out of scope. Feature enum deleted.
3. License state machine
exp + grace passes
┌─────────┐ install valid ┌────────┐ exp ┌────────┐ ────────► ┌─────────┐
│ ABSENT │ ───────────────▶│ ACTIVE │──────▶│ GRACE │ │ EXPIRED │
└─────────┘ └────────┘ └────────┘ └─────────┘
▲ │ │ ▲ │
│ install invalid │ replace │ │ replace valid │ replace
│ (sig/tenant/parse) ▼ │ │ ▼
└────────────────────────────┴──────────────┴─┴───────────────────┘
all transitions persist + audit-log
3.1 State semantics
| State | Effective limits | Trigger |
|---|---|---|
| ABSENT | DefaultTierLimits | No DB row, or signature/tenant/parse failure. |
| ACTIVE | merge(default, license.limits) | License loaded, now < exp. |
| GRACE | Same as ACTIVE | exp ≤ now < exp + gracePeriodDays. UI banner. |
| EXPIRED | DefaultTierLimits | now ≥ exp + gracePeriodDays. Distinct UI label vs ABSENT. |
State is recomputed on every limit check (clock comparison only) — no scheduler needed for transitions. The only "background" behaviour is the Prometheus gauge refresh.
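The clock-only recomputation could look roughly like the following. This is a minimal sketch with hypothetical names (LicenseStateSketch, compute), not the planned LicenseStateMachine; it models "no valid license" as a null expiry, which is an assumption made for brevity.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the pure state computation from §3.1. */
final class LicenseStateSketch {

    enum State { ABSENT, ACTIVE, GRACE, EXPIRED }

    /**
     * @param exp             license expiry, or null when no valid license is loaded (assumption)
     * @param gracePeriodDays days after exp during which license limits still apply
     * @param now             instant to evaluate, injected so tests can control the clock
     */
    static State compute(Instant exp, int gracePeriodDays, Instant now) {
        if (exp == null) {
            return State.ABSENT;               // no DB row, or validation failed
        }
        if (now.isBefore(exp)) {
            return State.ACTIVE;               // now < exp
        }
        Instant graceEnd = exp.plus(Duration.ofDays(gracePeriodDays));
        if (now.isBefore(graceEnd)) {
            return State.GRACE;                // exp <= now < exp + grace
        }
        return State.EXPIRED;                  // now >= exp + grace
    }

    public static void main(String[] args) {
        Instant exp = Instant.parse("2027-04-25T00:00:00Z");
        // Six days past expiry with a 30-day grace period.
        System.out.println(compute(exp, 30, Instant.parse("2027-05-01T00:00:00Z")));
    }
}
```

Passing the clock in as a parameter keeps the function pure, which is what lets the grace boundary be tested without sleeping or mocking system time.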
3.2 Default tier (the "no license" caps)
| Limit | Default |
|---|---|
| max_environments | 1 |
| max_apps | 3 |
| max_agents | 5 |
| max_users | 3 |
| max_outbound_connections | 1 |
| max_alert_rules | 2 |
| max_total_cpu_millis | 2000 (2 cores) |
| max_total_memory_mb | 2048 (2 GB) |
| max_total_replicas | 5 |
| max_execution_retention_days | 1 |
| max_log_retention_days | 1 |
| max_metric_retention_days | 1 |
| max_jar_retention_count | 3 |
Encoded as public static final Map<String, Integer> DEFAULTS in DefaultTierLimits. Keys
match the license payload exactly.
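For illustration, the defaults map and the merge(default, license.limits) step from §3.1 might be sketched as below. The class name and the effectiveLimits helper are hypothetical, and ignoring license keys that have no default is an assumption this design does not spell out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of DefaultTierLimits plus the merge used in ACTIVE/GRACE. */
final class DefaultTierLimitsSketch {

    // Values copied from the default tier table in §3.2.
    static final Map<String, Integer> DEFAULTS = Map.ofEntries(
        Map.entry("max_environments", 1),
        Map.entry("max_apps", 3),
        Map.entry("max_agents", 5),
        Map.entry("max_users", 3),
        Map.entry("max_outbound_connections", 1),
        Map.entry("max_alert_rules", 2),
        Map.entry("max_total_cpu_millis", 2000),
        Map.entry("max_total_memory_mb", 2048),
        Map.entry("max_total_replicas", 5),
        Map.entry("max_execution_retention_days", 1),
        Map.entry("max_log_retention_days", 1),
        Map.entry("max_metric_retention_days", 1),
        Map.entry("max_jar_retention_count", 3));

    /** Each default key takes the license value when present, else the default. */
    static Map<String, Integer> effectiveLimits(Map<String, Integer> licenseLimits) {
        Map<String, Integer> merged = new TreeMap<>(DEFAULTS);
        for (String key : DEFAULTS.keySet()) {
            Integer fromLicense = licenseLimits.get(key);
            if (fromLicense != null) {
                merged.put(key, fromLicense);   // license lifts (or lowers) this cap
            }
            // Assumption: license keys without a default are ignored.
        }
        return Collections.unmodifiableMap(merged);
    }
}
```

With a license that only sets max_apps to 50, effectiveLimits returns 50 for max_apps and the default 5 for max_agents; an empty map yields the plain default tier.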
4. Enforcement map
Every limit check goes through one method on LicenseEnforcer:
void assertWithinCap(String limitKey, long currentUsage, long requestedDelta);
Throws LicenseCapExceededException(limitKey, current, cap) when currentUsage + requestedDelta > cap.
A @ControllerAdvice maps it to 403 with body
{"error":"license cap reached","limit":"max_apps","current":3,"cap":3}.
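A minimal sketch of that check, assuming the enforcer is handed an already-merged limits map. The constructor shape and the handling of unknown keys are assumptions; only the method signature, the exception's (limitKey, current, cap) arguments, and the comparison come from the text above.

```java
import java.util.Map;

/** Sketch of the exception described above; field names are illustrative. */
final class LicenseCapExceededException extends RuntimeException {
    final String limitKey;
    final long current;
    final long cap;

    LicenseCapExceededException(String limitKey, long current, long cap) {
        super("license cap reached: " + limitKey + " current=" + current + " cap=" + cap);
        this.limitKey = limitKey;
        this.current = current;
        this.cap = cap;
    }
}

/** Hypothetical sketch of the single enforcement entry point. */
final class LicenseEnforcerSketch {

    private final Map<String, Integer> effectiveLimits;

    LicenseEnforcerSketch(Map<String, Integer> effectiveLimits) {
        this.effectiveLimits = Map.copyOf(effectiveLimits);
    }

    void assertWithinCap(String limitKey, long currentUsage, long requestedDelta) {
        Integer cap = effectiveLimits.get(limitKey);
        if (cap == null) {
            // Assumption: an unknown key is a programming error, not a license rejection.
            throw new IllegalArgumentException("unknown limit key: " + limitKey);
        }
        if (currentUsage + requestedDelta > cap) {
            throw new LicenseCapExceededException(limitKey, currentUsage, cap);
        }
    }
}
```

With a cap of 3 on max_apps, creating a third app (current 2, delta 1) passes, while a fourth (current 3, delta 1) throws and would surface as the 403 body shown above.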
| Limit | Call site | Failure response |
|---|---|---|
| max_environments | EnvironmentService.create (start) | 403 |
| max_apps | AppService.createApp | 403 |
| max_agents | AgentRegistryService.register | 403; agent treated as unregistered (no SSE, no commands) |
| max_users | UserAdminController.createUser and OidcAuthController.callback (auto-signup) | 403 / OIDC login failure |
| max_outbound_connections | OutboundConnectionServiceImpl.create | 403 |
| max_alert_rules | AlertRuleController.create | 403 |
| max_total_cpu_millis | DeploymentExecutor.PRE_FLIGHT (sum across non-stopped deploys + new) | Deploy fails fast at PRE_FLIGHT, status FAILED, audit row |
| max_total_memory_mb | same | same |
| max_total_replicas | same | same |
| max_execution_retention_days | EnvironmentService.update (per-env field, see §4.1) + ClickHouseSchemaInitializer.applyRetention() at boot | 422 on update; boot pins effective TTL = min(licenseCap, configured) |
| max_log_retention_days | same | same |
| max_metric_retention_days | same | same |
| max_jar_retention_count | EnvironmentAdminController.PUT /jar-retention | 422 |
4.1 Per-environment retention fields
Three new columns on environments (Flyway V2):
ALTER TABLE environments
ADD COLUMN execution_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN log_retention_days INTEGER NOT NULL DEFAULT 1,
ADD COLUMN metric_retention_days INTEGER NOT NULL DEFAULT 1;
These are the configured per-env values. The effective ClickHouse TTL is
min(licenseCap, configured), applied at startup by ClickHouseSchemaInitializer. Admin UI
surfaces the configured values; EnvironmentService.update rejects values above the license cap
with 422.
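The two retention rules can be stated in a few lines. The sketch below uses hypothetical names, and the lower bound of one day on updates is an assumption rather than something this design states.

```java
/** Hypothetical sketch of the retention arithmetic in §4.1. */
final class RetentionSketch {

    /** TTL applied to ClickHouse at startup: the stricter of cap and configured value. */
    static int effectiveTtlDays(int licenseCapDays, int configuredDays) {
        return Math.min(licenseCapDays, configuredDays);
    }

    /** Whether an update would be accepted; a false result corresponds to the 422. */
    static boolean isAcceptableUpdate(int licenseCapDays, int requestedDays) {
        // Assumption: retention must be at least one day.
        return requestedDays >= 1 && requestedDays <= licenseCapDays;
    }
}
```

For example, with a cap of 90 days an environment configured for 365 days gets an effective TTL of 90, which is the case that arises when a license expires or is replaced by a stricter one after the value was configured.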
4.2 Boot-time invariant
If a license is added that lowers a cap below current usage (10 apps, license now allows 5), the server logs one WARN per limit at boot. No deletion. New creates reject; existing resources keep working.
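One way to picture the boot-time check is a function that compares current usage with the effective caps and returns one message per violated limit. The names and the message format below are hypothetical; the real server would log these at WARN rather than return them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the boot-time over-cap scan. */
final class OverCapCheckSketch {

    /** Returns one warning per limit whose current usage exceeds its cap. */
    static List<String> overCapWarnings(Map<String, Long> currentUsage, Map<String, Integer> caps) {
        List<String> warnings = new ArrayList<>();
        // TreeMap gives a stable, alphabetical order for the log output.
        for (Map.Entry<String, Long> usage : new TreeMap<>(currentUsage).entrySet()) {
            Integer cap = caps.get(usage.getKey());
            if (cap != null && usage.getValue() > cap) {
                warnings.add(usage.getKey() + ": current=" + usage.getValue() + " exceeds cap=" + cap);
            }
        }
        return warnings;
    }
}
```

Nothing is deleted or disabled as a result of this scan; it is purely informational, and subsequent creates are rejected by the normal cap check.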
5. Usage endpoint
GET /api/v1/admin/license/usage (ADMIN only):
{
"state": "ACTIVE",
"expiresAt": "2027-04-25T00:00:00Z",
"daysRemaining": 365,
"gracePeriodDays": 30,
"tenantId": "acme-corp",
"label": "ACME prod 2026",
"limits": [
{"key": "max_apps", "current": 7, "cap": 50, "source": "license"},
{"key": "max_agents", "current": 12, "cap": 100, "source": "license"},
{"key": "max_total_cpu_millis", "current": 8500, "cap": 32000, "source": "license"},
{"key": "max_outbound_connections", "current": 0, "cap": 1, "source": "default"}
]
}
source is "default" when the cap comes from DefaultTierLimits (i.e. the license omits this
key, or there is no license), and "license" when the cap is explicit in the license. Drives the
SaaS UI's "free tier" badge.
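The source rule can be sketched as a small lookup. The record and method names are hypothetical, and the caller is assumed to pass an empty license map when the state is ABSENT or EXPIRED.

```java
import java.util.Map;

/** Hypothetical sketch of how one entry of the "limits" array could be built. */
final class UsageSourceSketch {

    record LimitUsage(String key, long current, int cap, String source) {}

    static LimitUsage describe(String key, long current,
                               Map<String, Integer> defaults,
                               Map<String, Integer> licenseLimits) {
        Integer fromLicense = licenseLimits.get(key);
        if (fromLicense != null) {
            return new LimitUsage(key, current, fromLicense, "license");
        }
        Integer fromDefault = defaults.get(key);
        if (fromDefault == null) {
            throw new IllegalArgumentException("no default for limit key: " + key);
        }
        return new LimitUsage(key, current, fromDefault, "default");
    }
}
```

This mirrors the last row of the example response: a license that omits max_outbound_connections reports the default cap of 1 with source "default".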
LicenseUsageReader issues one cheap aggregate per limit (SELECT COUNT(*) per entity table; a
single grouped SELECT SUM(replicas * cpuMillis), SUM(replicas * memoryMb), SUM(replicas) over
non-stopped deployments).
GET /api/v1/admin/license (existing) is extended to return {state, envelope} with the raw token
omitted from the response.
6. Lifecycle, persistence, install paths
6.1 Storage
Flyway V2 migration:
CREATE TABLE license (
tenant_id TEXT PRIMARY KEY, -- one row per server (= one tenant)
token TEXT NOT NULL, -- full signed token
license_id UUID NOT NULL,
installed_at TIMESTAMPTZ NOT NULL,
installed_by TEXT NOT NULL, -- users.user_id (bare) or 'system' for env/file boot
expires_at TIMESTAMPTZ NOT NULL
);
6.2 Boot order
LicenseBeanConfig:
- If CAMELEER_SERVER_LICENSE_TOKEN env var is set → validate → write to DB (overwrite) → load.
- Else if CAMELEER_SERVER_LICENSE_FILE is set → read file → validate → write to DB → load.
- Else read license row from DB → validate → load.
- Else ABSENT.
Env-var / file act as idempotent overrides — they always win and replace the DB row, so the operator's last action survives reboots.
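The precedence alone can be sketched as follows. The names are hypothetical, blank values are treated as unset (an assumption), and the sketch deliberately covers only which source is consulted, not what happens when validation of that source fails.

```java
/** Hypothetical sketch of the boot-time source precedence in §6.2. */
final class BootOrderSketch {

    enum Source { ENV, FILE, DB, ABSENT }

    static Source pick(String envToken, String licenseFilePath, boolean dbRowPresent) {
        if (isSet(envToken)) {
            return Source.ENV;      // CAMELEER_SERVER_LICENSE_TOKEN wins
        }
        if (isSet(licenseFilePath)) {
            return Source.FILE;     // then CAMELEER_SERVER_LICENSE_FILE
        }
        if (dbRowPresent) {
            return Source.DB;       // then the persisted row
        }
        return Source.ABSENT;       // default tier
    }

    // Assumption: a blank value counts as "not set".
    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }
}
```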
6.3 Runtime install
POST /api/v1/admin/license { "token": "..." } (existing):
- Validates against the configured public key.
- On success, persists to license table (installed_by = user_id), updates the in-memory LicenseGate, audits.
- On failure, returns 400 with the validator error message and audits the rejection.
6.4 Public key custody
CAMELEER_SERVER_LICENSE_PUBLICKEY (existing) remains the only verification key. It is a
build-time or deploy-time secret bound to the vendor distribution and is not stored in the DB.
If it is unset and a license is present, all licenses are rejected (existing behaviour).
6.5 Audit trail
New AuditCategory.LICENSE. Actions:
| Action | When | Payload |
|---|---|---|
| install_license | First successful install in an empty state | {licenseId, expiresAt, installedBy, source} (source = env/file/api) |
| replace_license | Successful install over an existing license | same + previousLicenseId |
| reject_license | Validation failed (signature, tenant, parse, public key missing) | {reason, source} |
| cap_exceeded | Any LicenseCapExceededException | {limit, current, cap, requestedBy} |
7. Minter
7.1 LicenseMinter (library)
Pure function, packaged in cameleer-license-minter:
public final class LicenseMinter {
public static String mint(LicenseInfo info, PrivateKey ed25519PrivateKey);
}
Serializes LicenseInfo to canonical JSON (sorted keys), signs the bytes with Ed25519, returns
base64(payload).base64(signature). cameleer-saas calls this directly to mint per-tenant tokens.
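To make the serialize-then-sign step concrete, here is a minimal sketch that uses only the JDK (Java 15 or later for Ed25519). It takes a plain map instead of LicenseInfo, and the tiny canonicalJson helper handles only strings, numbers, and nested maps; both are simplifying assumptions, as is the standard Base64 alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical sketch; the real LicenseMinter takes a LicenseInfo. */
final class MinterSketch {

    /** Serializes strings, numbers, and nested maps with keys in sorted order. */
    static String canonicalJson(Map<String, ?> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(fields).entrySet()) {
            json.add(quote(e.getKey()) + ":" + render(e.getValue()));
        }
        return json.toString();
    }

    @SuppressWarnings("unchecked")
    private static String render(Object value) {
        if (value instanceof Number) return value.toString();
        if (value instanceof String) return quote((String) value);
        if (value instanceof Map) return canonicalJson((Map<String, ?>) value);
        throw new IllegalArgumentException("unsupported value type: " + value);
    }

    // Minimal escaping (backslash and double quote only); enough for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Signs the canonical bytes and returns base64(payload).base64(signature). */
    static String mint(Map<String, ?> payloadFields, PrivateKey ed25519PrivateKey) {
        byte[] payload = canonicalJson(payloadFields).getBytes(StandardCharsets.UTF_8);
        try {
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(ed25519PrivateKey);
            signer.update(payload);
            Base64.Encoder b64 = Base64.getEncoder();
            return b64.encodeToString(payload) + "." + b64.encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 signing failed", e);
        }
    }
}
```

Sorting keys before serializing means the same logical license always produces the same payload bytes, which is what the "canonical JSON is stable across runs" test in §10 relies on.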
7.2 LicenseMinterCli (CLI)
java -jar cameleer-license-minter-1.0-SNAPSHOT.jar \
--private-key=/secure/vendor.key \
--tenant=acme-corp \
--label="ACME prod 2026" \
--expires=2027-04-25 \
--grace-days=30 \
--max-apps=50 \
--max-agents=100 \
--max-total-cpu-millis=32000 \
--max-total-memory-mb=65536 \
--max-execution-retention-days=90 \
--output=acme-license.tok
- --private-key reads a PEM-encoded Ed25519 private key (output of openssl genpkey -algorithm ed25519).
- Unspecified --max-* flags are omitted from the payload; the license inherits the default for that key.
- Unknown flags fail fast.
- --output writes the token; if omitted, prints to stdout.
Keypair generation is out of band — vendor uses openssl and stores both halves in their
secret manager. We deliberately do not ship a --gen-keypair subcommand to keep the boundary
clean.
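The flag rules listed above could be implemented along these lines. This is a sketch under assumptions: the class and method names are hypothetical, flags are expected in --name=value form as in the example invocation, and a --max-* flag maps to its limit key by replacing hyphens with underscores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of limit-flag parsing for the minter CLI. */
final class MinterCliFlagsSketch {

    // Non-limit flags taken from the example invocation in §7.2.
    private static final Set<String> OTHER_FLAGS =
        Set.of("private-key", "tenant", "label", "expires", "grace-days", "output");

    /** Collects only the limits that were explicitly passed; unknown flags fail fast. */
    static Map<String, Integer> parseLimits(String[] args, Set<String> knownLimitKeys) {
        Map<String, Integer> limits = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 0) {
                throw new IllegalArgumentException("expected --name=value, got: " + arg);
            }
            String name = arg.substring(2, eq);
            String value = arg.substring(eq + 1);
            if (OTHER_FLAGS.contains(name)) {
                continue;                               // handled elsewhere
            }
            String limitKey = name.replace('-', '_');   // --max-apps -> max_apps
            if (!knownLimitKeys.contains(limitKey)) {
                throw new IllegalArgumentException("unknown flag: --" + name);
            }
            limits.put(limitKey, Integer.parseInt(value));
        }
        return limits;
    }
}
```

Because unspecified limits never enter the map, they are absent from the signed payload and the server falls back to its defaults for those keys.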
8. Telemetry
Prometheus gauges scraped via /api/v1/prometheus:
| Metric | Labels | Notes |
|---|---|---|
| cameleer_license_state | state = ABSENT, ACTIVE, GRACE, or EXPIRED | |
| cameleer_license_days_remaining | (none) | Negative in GRACE/EXPIRED. |
| cameleer_license_limit_utilisation | limit="max_apps" etc. | current / cap, in [0, 1+]. |
| cameleer_license_cap_rejections_total | limit="..." | Counter. |
State-transition log lines: INFO on install/ACTIVE, WARN on GRACE, ERROR on EXPIRED, WARN
on cap reject (sampled to avoid log spam).
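The two computed gauge values from the table above reduce to simple arithmetic. The sketch below uses hypothetical names and no metrics library; whole-day truncation and the handling of a zero cap are assumptions.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch of the values behind two of the gauges in §8. */
final class LicenseGaugeSketch {

    /** Whole days until exp; negative once exp has passed (GRACE/EXPIRED). */
    static long daysRemaining(Instant exp, Instant now) {
        return ChronoUnit.DAYS.between(now, exp);
    }

    /** current / cap; values above 1.0 mean usage exceeds the cap. */
    static double utilisation(long current, int cap) {
        if (cap <= 0) {
            // Assumption: with a zero cap, any usage is reported as infinite utilisation.
            return current > 0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        return (double) current / cap;
    }
}
```

A utilisation above 1.0 is expected after a cap is lowered below existing usage, since existing resources are kept and only new creates are rejected.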
9. Dead-code removal
Performed in the first commit of the implementation. Per the project's "no backwards compatibility shims" preference, no deprecated path or feature flag.
- Delete Feature.java.
- Delete LicenseGate.isEnabled(Feature).
- Delete LicenseInfo.features field, LicenseInfo.hasFeature(Feature).
- Delete LicenseGateTest.withLicense_onlyLicensedFeaturesEnabled and LicenseInfo.open()'s Set.of(Feature.values()) assertion.
- Update LicenseValidator to ignore features if present in old tokens (silently dropped, not an error).
10. Testing
| Layer | Tests |
|---|---|
| Core unit | LicenseValidatorTest — signature, expiry, tenant mismatch, missing required fields, unknown extra fields. |
| Core unit | LicenseStateMachineTest — all four transitions including grace boundary, replace from any state, invalid install. |
| Core unit | DefaultTierLimitsTest — every documented key has a default. |
| Minter unit | LicenseMinterTest — round-trip with a throwaway Ed25519 keypair. Canonical JSON is stable across runs. |
| Minter CLI | LicenseMinterCliTest — invokes main with --private-key=tmp and checks output token validates. |
| App unit | LicenseEnforcerTest — for each limit: cap-reached, under-cap, default-tier with no license, missing-cap-inherits-default. |
| App integration | LicenseLifecycleIT — install via env, replace via POST, restart restores from DB. Driven through REST. |
| App integration | LicenseEnforcementIT — REST-driven, hit each cap end-to-end (per the project's "REST-API-driven ITs" preference). Includes cap_exceeded audit row check. |
| Boot | SchemaBootstrapIT extension — license table exists, environments retention columns exist, retention pinning honoured at boot. |
No raw-SQL seeding of caps in ITs. All caps installed via the REST endpoint or env var.
11. Open follow-ups (deliberately deferred)
- Ingestion-rate limits (max_executions_per_minute, max_logs_per_minute).
- Online revocation callback (the revocation_check_url envelope field).
- Concurrent debug session limit (max_concurrent_debug_sessions from the SaaS epic).
- A "license usage history" report for vendors to see growth over time.
- Open a tracking issue on cameleer/cameleer-server (Gitea); none exists today.
12. Risk register
| Risk | Mitigation |
|---|---|
| Default tier so tight that an honest evaluator cannot try the product. | Defaults documented; vendor can ship a longer-exp "trial" license at install time if needed. |
| Customer lowers gracePeriodDays field by editing token. | Token is signed; any edit invalidates the signature. |
| License removed from DB out of band: server lands in ABSENT and rejects new resources, while existing ones remain above the default tier. | Boot-time WARN per over-cap limit. UI banner in the admin license page. No auto-deletion. |
| Public key rotation. | Out of scope for v1; documented as "redeploy with new key" — vendors are expected to rotate via redeployment. |
| Compute cap arithmetic relies on cpuLimit and memoryLimitMb being set on every container. | Existing ResolvedContainerConfig already enforces these; DeploymentExecutor.PRE_FLIGHT rejects deploys with unset compute fields. |
| Per-env retention column added but old ClickHouse partitions retain longer. | Documented: TTL change is honoured by ClickHouse on its next merge cycle. New rows inserted always honour the new TTL. |