Final-review must-fixes:
- HOWTO.md: drop CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME; add the three new
artifact env vars (loaderimage / artifacttokenttlseconds / artifactbaseurl).
- DeploymentExecutor @PostConstruct WARN, handoff doc, and docker-orchestration
rule no longer claim the loader uses cameleer-traefik. The loader runs on
the PRIMARY Docker network only — additional networks are attached after
startContainer returns, by which time the loader has exited. SaaS still
works because the tenant's primary network hosts the tenant server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-pull the loader image at PULL_IMAGE so the implicit pull on first
createContainerCmd doesn't bypass the 120s loader-wait timeout.
Wrap createAndStartLoader in try/catch so a create/start failure cleans
up the just-created volume; same guard around createAndStartMain on
phase-2 failures. Fold the wait-error message into the rethrown
RuntimeException so the cause chain is visible.
Add a @PostConstruct WARN when neither artifactbaseurl nor serverurl is
set so the implicit cameleer-server DNS dependency is loud at boot, and
document the loader-to-server reachability contract in
.claude/rules/docker-orchestration.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tasks 9+10+11 of the init-container-jar-fetch plan, landed atomically because
9 alone leaves the orchestrator+executor referencing removed ContainerRequest
fields.
ContainerRequest (core) drops jarPath/jarVolumeName/jarVolumeMountPath; adds
appVersionId, artifactDownloadUrl, artifactExpectedSize, loaderImage.
DockerRuntimeOrchestrator (app):
- per-replica named volume "cameleer-jars-{containerName}"
- phase 1: loader container with the volume mounted RW at /app/jars,
ARTIFACT_URL + ARTIFACT_EXPECTED_SIZE env, full hardening contract
- block on waitContainerCmd().awaitStatusCode(120s); on non-zero exit
remove the loader, remove the volume, propagate RuntimeException so
DeploymentExecutor marks the deployment FAILED. main is never created.
- phase 2: main container with the same volume mounted RO at /app/jars
- withUsernsMode("host:1000:65536") on BOTH containers — closes the last
open hardening gap from issue #152
- main entrypoint paths point at /app/jars/app.jar
- extracted baseHardenedHostConfig() so loader and main share the
cap_drop / security_opt / readonly / pids / tmpfs contract
- removeContainer() also removes the per-replica volume so blue/green
doesn't leak volumes
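A minimal sketch of the two-phase flow listed above, assuming a
hypothetical ContainerRuntime interface in place of the real Docker
client. None of these type or method names come from the codebase; only
the ordering and cleanup rules follow the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Docker client calls; not the project's API. */
interface ContainerRuntime {
    void createVolume(String name);
    void removeVolume(String name);
    String createAndStart(String role, String volume, boolean readOnly);
    int awaitExit(String containerId, int timeoutSeconds);
    void removeContainer(String containerId);
}

final class TwoPhaseStart {
    /** Runs the loader first; main is only created after a zero exit code. */
    static String start(ContainerRuntime rt, String containerName) {
        String volume = "cameleer-jars-" + containerName;
        rt.createVolume(volume);
        try {
            // Phase 1: loader gets the volume read-write.
            String loader = rt.createAndStart("loader", volume, false);
            int exit;
            try {
                exit = rt.awaitExit(loader, 120);
            } finally {
                rt.removeContainer(loader); // the loader is removed on every path
            }
            if (exit != 0) {
                throw new RuntimeException("loader exited with code " + exit);
            }
            // Phase 2: main gets the same volume read-only.
            return rt.createAndStart("main", volume, true);
        } catch (RuntimeException e) {
            rt.removeVolume(volume); // no leaked volume when either phase fails
            throw e;
        }
    }
}

/** In-memory fake that records calls so the ordering can be inspected. */
final class RecordingRuntime implements ContainerRuntime {
    final List<String> events = new ArrayList<>();
    private final int loaderExitCode;

    RecordingRuntime(int loaderExitCode) { this.loaderExitCode = loaderExitCode; }

    public void createVolume(String name) { events.add("mkvol:" + name); }
    public void removeVolume(String name) { events.add("rmvol:" + name); }
    public String createAndStart(String role, String volume, boolean readOnly) {
        events.add("start:" + role + (readOnly ? ":ro" : ":rw"));
        return role;
    }
    public int awaitExit(String containerId, int timeoutSeconds) { return loaderExitCode; }
    public void removeContainer(String containerId) { events.add("rm:" + containerId); }
}
```

With RecordingRuntime(1) the recorded events end in a loader removal and
a volume removal, and no main container start is recorded.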
DeploymentExecutor (app):
- injects ArtifactDownloadTokenSigner; new @Value props loaderimage,
artifacttokenttlseconds, artifactbaseurl
- replaces the temporary getVersion(...).jarPath() bridge with a signed
URL ${artifactBaseUrl}/api/v1/artifacts/{id}?exp&sig
- drops the Files.exists pre-flight check; AppVersion.jarSizeBytes is
now the size of record
- drops jarDockerVolume / jarStoragePath @Value fields and the volume
plumbing in startReplica
- DeployCtx carries appVersionId / artifactUrl / artifactExpectedSize
in place of jarPath
Tests:
- DockerRuntimeOrchestratorHardeningTest updated for the new shape;
captures HostConfig on the MAIN container and asserts cap_drop ALL
+ no-new-privileges + apparmor + readonly + pids + tmpfs + the new
withUsernsMode("host:1000:65536")
- DockerRuntimeOrchestratorLoaderTest (new): verifies volume create →
loader create with RW bind → loader started → awaited → loader
removed → main create with RO bind → main started; verifies abort
+ cleanup on loader exit != 0 (loader removed, volume removed, main
NEVER created); verifies userns_mode applied to both containers.
Config:
- application.yml replaces jardockervolume with loaderimage,
artifacttokenttlseconds, artifactbaseurl
Rules updated: .claude/rules/docker-orchestration.md (loader pattern,
userns, no more bind-mount); .claude/rules/core-classes.md
(ContainerRequest field map).
Test counts after change:
- cameleer-server-core: 116/116 unit tests pass
- cameleer-server-app: 273/273 unit tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New permitAll endpoint GET /api/v1/artifacts/{appVersionId}?exp&sig that
the cameleer-runtime-loader init container hits to stream the deployed
JAR. Auth is the HMAC-signed URL (sig + exp) — no JWT, no bootstrap
token — so SecurityConfig permits the path and the controller does the
verification itself.
Also hardens ArtifactDownloadTokenSigner to reject null/blank jwtSecret
at construction (Task 6 review feedback I-3).
Wires the ArtifactDownloadTokenSigner bean in SecurityBeanConfig from
${cameleer.server.security.jwtsecret}, the same property the rest of
the security stack uses.
Test coverage: 200/401/404 paths via standalone-MockMvc unit test
(avoids dragging in WebConfig's audit + usage interceptors that pull
the full bean graph) plus the existing signer suite extended with a
null/blank-secret guard test.
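A rough sketch of the exp + sig scheme described above, for
illustration only. The payload layout (id, colon, exp), the HmacSHA256
algorithm, and the class and method names are assumptions; the real
ArtifactDownloadTokenSigner may differ in all of them.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical signer; names and payload format are illustrative only. */
final class SignedUrlSketch {
    private final byte[] key;

    SignedUrlSketch(String secret) {
        // Fail at construction rather than sign with an empty key later.
        if (secret == null || secret.isBlank()) {
            throw new IllegalArgumentException("secret must not be null or blank");
        }
        this.key = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** HMAC over "id:exp", encoded as unpadded base64url. */
    String sign(String appVersionId, long expEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] payload = (appVersionId + ":" + expEpochSeconds)
                    .getBytes(StandardCharsets.UTF_8);
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rejects expired links first, then compares signatures. */
    boolean verify(String appVersionId, long exp, String sig, long nowEpochSeconds) {
        if (sig == null || nowEpochSeconds > exp) {
            return false;
        }
        byte[] expected = sign(appVersionId, exp).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, sig.getBytes(StandardCharsets.US_ASCII));
    }

    /** Query string in the ?exp&sig shape used by the endpoint. */
    String query(String appVersionId, long exp) {
        return "?exp=" + exp + "&sig=" + sign(appVersionId, exp);
    }
}
```

Because the controller recomputes the signature from the path id and the
exp parameter, changing either one in the URL invalidates the link.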
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing rejectsTamperedSignature test uses a signature one byte too
long, which MessageDigest.isEqual rejects on the length mismatch. The
same-length tamper test forces the byte-by-byte compare so the
constant-time branch is exercised.
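One way to build such a same-length tamper, sketched with hypothetical
helper names (the actual test may construct it differently):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Illustrative helpers; not taken from the signer test suite. */
final class SameLengthTamper {
    /** Returns a copy of sig with its last character changed; length is unchanged. */
    static String flipLastChar(String sig) {
        char last = sig.charAt(sig.length() - 1);
        char replacement = (last == 'A') ? 'B' : 'A';
        return sig.substring(0, sig.length() - 1) + replacement;
    }

    /** Compares the two encoded signatures via MessageDigest.isEqual. */
    static boolean matches(String expected, String presented) {
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.US_ASCII),
                presented.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Since the tampered value keeps the original length, a rejection can only
come from the content comparison.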
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tactical filesystem-path read of the AppVersion locator survives until the
loader init-container lands — flagged inline so future readers don't read
the staging step as steady state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 4 of the init-container JAR fetch plan: migrate AppService.uploadJar
off direct filesystem writes onto the ArtifactStore abstraction so future
backends (OCI/Zot, S3) can swap in without touching service or controller
code.
- AppService constructor now takes (AppRepository, AppVersionRepository,
ArtifactStore, tenantId[, CreateGuard]). The store owns layout and the
locator string written into app_versions.jar_path.
- uploadJar buffers the request body once for hashing + storage, then
writes a scratch temp file solely for RuntimeDetector (which still
takes a Path); scratch is unconditionally deleted in finally.
- Add coordinatesFor(AppVersion) helper so downstream callers (Task 5+)
can derive ArtifactCoordinates without knowing the tenant binding.
- Remove resolveJarPath. DeploymentExecutor now reads jarPath directly
off the AppVersion record; the clean cut to download-URL delivery
lands in Task 11.
- RuntimeBeanConfig wires a FilesystemArtifactStore bean rooted at
cameleer.server.runtime.jarstoragepath and threads tenantId into the
AppService bean.
Bumps token_revoked_before by 1ms so a JWT issued in the same millisecond
as a logout call (Date.from(Instant.now()) quantises iat to ms) does not
survive the filter's strict isBefore check.
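The boundary case can be reproduced in isolation. This is a sketch with
hypothetical method names; only the strict isBefore comparison and the
1ms bump come from the description above.

```java
import java.time.Instant;

/** Illustrative model of the revocation check; not the filter's code. */
final class RevocationBoundary {
    /** Strict check: a token is revoked only if issued before the cutoff. */
    static boolean isRevoked(Instant issuedAt, Instant revokedBefore) {
        return revokedBefore != null && issuedAt.isBefore(revokedBefore);
    }

    /** Cutoff written at logout: the logout instant plus one millisecond. */
    static Instant cutoffFor(Instant logoutTime) {
        return logoutTime.plusMillis(1);
    }
}
```

With iat and the logout instant both equal to the same millisecond T,
isRevoked(T, T) is false, while isRevoked(T, cutoffFor(T)) is true.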
Also extends LogoutControllerIT @AfterEach to delete the audit_log row,
keeping reused Postgres containers clean for downstream ITs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps users.token_revoked_before = now() for the calling user, audited
under AuditCategory.AUTH. Best-effort: returns 204 even when the request
is unauthenticated, so the SPA can call it on every logout regardless of
token state. Token-rejection is enforced by the existing
JwtAuthenticationFilter revocation check (fixed in 7066795c).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds @AfterEach to delete the test users so Testcontainers reuse does
not leak an authenticated user with a future token_revoked_before into
the shared schema (visible to LicenseUsageReader.snapshot, user-admin
listing tests, etc.). Adds unrevokedUserTokenIsAccepted to pin the
revoked == null no-op branch as a first-class assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JwtAuthenticationFilter compared the JWT subject (user:alice) against
users.user_id (bare alice), so token_revoked_before was never read for
any user. Strips the prefix to match the convention documented in
CLAUDE.md. Adds JwtRevocationIT as a regression.
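The normalisation amounts to something like the following sketch (class
and method names are hypothetical):

```java
/** Illustrative helper; the filter's actual code may be inlined differently. */
final class SubjectNormalizer {
    private static final String USER_PREFIX = "user:";

    /** Maps a JWT subject such as "user:alice" to the bare user_id "alice". */
    static String toUserId(String jwtSubject) {
        if (jwtSubject == null) {
            return null;
        }
        return jwtSubject.startsWith(USER_PREFIX)
                ? jwtSubject.substring(USER_PREFIX.length())
                : jwtSubject;
    }
}
```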
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the pure license contract types (LicenseInfo, LicenseValidator,
LicenseState, LicenseStateMachine, LicenseLimits, DefaultTierLimits) into a
new cameleer-license-api module under package com.cameleer.license.
Why: cameleer-license-minter previously depended on cameleer-server-core for
these types, dragging cameleer-server-core + cameleer-common onto the
classpath of every minter consumer (notably cameleer-saas). The SaaS
management plane has no business carrying server-runtime types — it only
needs the license contract to mint and verify tokens.
After:
cameleer-license-minter -> cameleer-license-api (no server internals)
cameleer-server-core -> cameleer-license-api
cameleer-saas -> cameleer-license-minter -> cameleer-license-api
Verified: mvn -pl cameleer-license-minter dependency:tree shows the minter
no longer pulls cameleer-server-core or cameleer-common. Full reactor
verify (-DskipITs) green: 371 tests pass.
LicenseGate stays in server-core (server-runtime state holder, not contract).
Closes cameleer/cameleer-server#156
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new assertions: license table has tenant_id/license_id/token/
installed_at/installed_by/expires_at/last_validated_at columns with
expected types + NOT NULL constraints, PK on tenant_id; environments
has execution_retention_days/log_retention_days/metric_retention_days
all integer NOT NULL DEFAULT 1.
Note: V5 migration does not include an installed_via column; the
plan's spec was aspirational. Test asserts what the migration
actually creates (and what PostgresLicenseRepository reads/writes).
OpenAPI regen (Step 35.2) deferred to session end — requires running
backend + UI dev server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Install license with max_log_retention_days=30, env.configured=60 →
effective=30; verify ClickHouse logs table reflects toIntervalDay(30).
Replace with max=7 → effective=7; verify TTL recomputed. Polls
system.tables.create_table_query up to 5s for the @Async listener
to apply.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five @Nested cap surfaces (envs, apps, outbound, alert rules, users)
share a single synthetic license with cap=1 each. Each test pushes
just past the cap and verifies the standard 403 envelope plus a
cap_exceeded audit row. Per-limit ITs cover full per-cap behavior;
this IT catches accidental wire-rip regressions across all caps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end IT covering the full lifecycle: mint a token via
cameleer-license-minter (test-scope), POST it via /api/v1/admin/license,
verify state=ACTIVE, clear gate, revalidate from PG, verify state restored.
Plus: tampered signature -> 400 + LICENSE/FAILURE audit row, gate not
mutated to ACTIVE.
Adds cameleer-license-minter as a test-scope dep on cameleer-server-app
(verified absent from runtime/compile classpaths). Also disables the
default spring-boot:repackage execution on the minter pom so the main
artifact stays as a plain library JAR consumable as a Maven dependency
(the cli classifier still produces the executable jar).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cameleer_license_state{state=...} (one-hot per LicenseState),
cameleer_license_days_remaining (negative when ABSENT/INVALID),
cameleer_license_last_validated_age_seconds. Refreshed on
LicenseChangedEvent and every 60s.
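"One-hot" here means one series per state, with value 1 for the current
state and 0 for all others. A sketch of that mapping, without the
metrics library and with a locally declared enum standing in for
LicenseState:

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative only; the enum mirrors the states named elsewhere in this log. */
final class OneHotState {
    enum LicenseState { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    /** Returns a value per state: 1 for the current state, 0 for the rest. */
    static Map<LicenseState, Integer> oneHot(LicenseState current) {
        Map<LicenseState, Integer> values = new EnumMap<>(LicenseState.class);
        for (LicenseState state : LicenseState.values()) {
            values.put(state, state == current ? 1 : 0);
        }
        return values;
    }
}
```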
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returns state, expiresAt/daysRemaining, lastValidatedAt, message
(LicenseMessageRenderer.forState), and a limits[] array where each
entry carries key/current/cap/source ("license" vs "default"). Adds
public AgentRegistryService.liveCount() so max_agents can be reported
from the in-memory registry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GET returns {state, invalidReason, envelope, lastValidatedAt}. POST
delegates to licenseService.install(token, userId, "api") so install
goes through audit + persistence + event publish. Removes the inline
LicenseValidator construction from the controller.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Scheduled(cron = "0 0 3 * * *") triggers svc.revalidate() daily.
@EventListener(ApplicationReadyEvent.class) @Async fires once 60s
after boot to catch ABSENT->ACTIVE transitions if the license was
written to PG between server starts. Exceptions are logged but never
propagate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@EventListener fires on every license install/replace/expire. For each
environment, computes effective TTL = min(licenseCap, env.configured)
and emits one ALTER TABLE ... MODIFY TTL ... per (table, env). Tables
covered: executions, processor_executions, logs, agent_metrics,
agent_events. ClickHouse failures are logged but do not propagate
(listener is async-tolerant).
route_diagrams is intentionally excluded -- it has no TTL clause in
init.sql (ReplacingMergeTree keyed on content_hash, not time-series).
server_metrics is also excluded -- it has no environment column
(server straddles environments).
Per-environment TTL via WHERE requires ClickHouse 22.3+; the project's
current image (clickhouse/clickhouse-server:24.12) is well above that
floor.
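A sketch of the effective-TTL rule and a plausible statement shape. The
"timestamp" and "environment" column names, the DELETE WHERE form, and
the helper names are assumptions for illustration, not the listener's
actual SQL.

```java
/** Illustrative only; column names and statement shape are assumed. */
final class EffectiveRetention {
    /** Effective TTL in days: the smaller of the license cap and the env setting. */
    static int effectiveDays(int licenseCapDays, int envConfiguredDays) {
        return Math.min(licenseCapDays, envConfiguredDays);
    }

    /** Builds one statement per (table, environment) pair. */
    static String alterTtl(String table, String environment, int days) {
        // Only simple identifiers are accepted, so no quoting is needed.
        if (!table.matches("[A-Za-z0-9_]+") || !environment.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected table or environment name");
        }
        return "ALTER TABLE " + table
                + " MODIFY TTL timestamp + toIntervalDay(" + days + ")"
                + " DELETE WHERE environment = '" + environment + "'";
    }
}
```

For example, a license cap of 30 days and an environment configured for
60 gives an effective TTL of 30; replacing the license with a cap of 7
gives 7.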
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three int fields to the Environment record + repository row mapper,
matching the columns added in V5. Default value is 1 per the V5 NOT NULL
DEFAULT 1. Read DTO surfaces the fields via Jackson record serialization;
setter endpoint deferred to a follow-up that wires the corresponding
license cap checks.
The canonical constructor enforces >= 1 for each retention field — V5
guarantees this at the DB level, but the runtime guard catches in-memory
construction errors (e.g., test sites that pass 0).
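The guard can be expressed as a compact canonical constructor. The
record below is a cut-down, hypothetical stand-in carrying only the
three retention fields, not the real 12-component Environment record.

```java
/** Hypothetical three-field stand-in for the retention part of Environment. */
record RetentionSettings(int executionRetentionDays,
                         int logRetentionDays,
                         int metricRetentionDays) {

    /** Compact canonical constructor: runs before the fields are assigned. */
    RetentionSettings {
        requireAtLeastOne("executionRetentionDays", executionRetentionDays);
        requireAtLeastOne("logRetentionDays", logRetentionDays);
        requireAtLeastOne("metricRetentionDays", metricRetentionDays);
    }

    private static void requireAtLeastOne(String field, int value) {
        if (value < 1) {
            throw new IllegalArgumentException(field + " must be >= 1, got " + value);
        }
    }
}
```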
Test sites updated to the 12-arg signature with retention defaults of 1.
EnvironmentAdminControllerIT gains a regression test asserting the wire
shape exposes all three fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returns 422 UNPROCESSABLE_ENTITY when jarRetentionCount exceeds
license cap. Default tier cap = 3. The other three retention caps
(execution/log/metric retention days) are deferred to T26+ where
the corresponding fields are added to Environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ComputeUsage record + computeUsage() helper to LicenseUsageReader
that aggregates from PG. DeploymentExecutor.executeAsync runs three
assertWithinCap checks (max_total_cpu_millis, max_total_memory_mb,
max_total_replicas) right after config resolution. The existing
executor try/catch turns a LicenseCapExceededException into a FAILED
deployment with the cap message in the failure reason.
Adds ComputeCapEnforcementIT (HTTP-driven; @MockBean RuntimeOrchestrator,
since cap rejection short-circuits before any orchestrator call) plus
defensive license lifts in BlueGreenStrategyIT, RollingStrategyIT,
DeploymentSnapshotIT, and DeploymentControllerAuditIT so sequential
deploys under testcontainer reuse don't trip the new caps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds AlertRuleRepository.count() and a LicenseEnforcer.assertWithinCap
call at the top of the POST handler. Default cap = 2; the 3rd rule
gets the standard 403 envelope. Sibling alert ITs that legitimately
need more than 2 rules get the cap lifted via the test-license helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds LicenseEnforcer.assertWithinCap call at the top of create() using
repo.listByTenant(tenantId).size() as the current count. Lifts the cap
in OutboundConnectionAdminControllerIT (duplicateNameReturns409 needs
2 creates in one test). LicenseExceptionAdvice maps the rejection to
the standard 403 envelope; cap_exceeded audit row emitted via the
LicenseEnforcer 3-arg ctor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires LicenseEnforcer into UserAdminController.createUser and
OidcAuthController auto-signup. Cap fires before any validation so
over-cap creates short-circuit cheaply. Audit emission already
present (LicenseEnforcer 3-arg ctor from T16 emits cap_exceeded
under AuditCategory.LICENSE).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a CreateGuard to AgentRegistryService that fires only on NEW
registrations: re-registers of an existing agent bypass the cap (they
don't grow the registry, and rejecting them would orphan an agent that
already counts against the cap). Live-only count for cap enforcement —
STALE/DEAD/SHUTDOWN agents are excluded so the cap reflects the working
fleet, not historical residue.
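A simplified sketch of that rule. The CreateGuard signature, the state
enum, and the registry shape are assumptions; the real
AgentRegistryService is not reproduced here, and the sketch ignores the
race between the check and the insert.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard shape: throws when a new create would exceed the cap. */
@FunctionalInterface
interface CreateGuard {
    void check(long currentCount);
}

/** Illustrative registry; not the project's implementation. */
final class AgentRegistrySketch {
    enum State { LIVE, STALE, DEAD, SHUTDOWN }

    private final Map<String, State> agents = new ConcurrentHashMap<>();
    private final CreateGuard guard;

    AgentRegistrySketch(CreateGuard guard) {
        this.guard = guard;
    }

    /** Only LIVE agents count toward the cap. */
    long liveCount() {
        return agents.values().stream().filter(s -> s == State.LIVE).count();
    }

    void register(String agentId) {
        // The guard fires for new ids only; a re-register bypasses it.
        if (!agents.containsKey(agentId)) {
            guard.check(liveCount());
        }
        agents.put(agentId, State.LIVE);
    }

    void markState(String agentId, State state) {
        agents.replace(agentId, state);
    }
}
```

With a guard that rejects at two live agents, a third new id is refused
while a re-register of an existing id still succeeds.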
Reuses the CreateGuard pattern from T18-T19. The global
LicenseExceptionAdvice maps the resulting LicenseCapExceededException to
403 with the structured envelope — no AgentRegistrationController
changes needed.
AgentCapEnforcementIT exercises the HTTP path end-to-end: two registers
succeed at cap=2, a third returns 403 with the expected envelope, and a
re-register of an already-registered agent succeeds at-cap.
Sibling agent-registering ITs (Agent*ControllerIT, Diagram*IT,
Execution*IT, Search*IT, Protocol*IT, Backpressure*IT, JwtRefresh*IT,
Registration*IT, Security*IT, SseSigning*IT, IngestionSchemaIT) lift
max_agents in @BeforeEach and clear the synthetic license in @AfterEach
— the in-memory registry is shared across @SpringBootTest reuse
boundaries, so without the lift the default-tier max_agents=5 would be
exhausted by accumulated test residue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CreateGuard hook to AppService.createApp using the same pattern
as T18 (EnvironmentService). AppRepository.count() added; the bean
wires LicenseEnforcer.assertWithinCap("max_apps", current, 1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Makes the signature-bypass loud at every call site since T19-T25 will
copy this pattern 5+ more times. The helper still loads via
LicenseGate.load() directly (no signature check) — the new name
ensures any future caller has to acknowledge that.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CreateGuard functional interface to core (preserves the no-Spring
boundary between core and app) and wires LicenseEnforcer into the
EnvironmentService bean in RuntimeBeanConfig so POST
/api/v1/admin/environments rejects with the structured 403 envelope
(error/limit/cap/state/message) once the cap is reached. Default tier
max_environments=1; the V1 baseline seeds the default env, so the very
next create through the API is rejected unless a license lifts the cap.
Also adds EnvironmentRepository.count() (with PostgresEnvironmentRepository
impl), TestSecurityHelper.installTestLicenseWithCaps(...) so existing ITs
that POST envs keep working, and a defensive cleanup in
LicenseUsageReaderIT/EnvironmentAdminControllerIT to stay
order-independent under Testcontainer reuse (deletes deployments+apps
before envs to avoid FK violations).
Test: EnvironmentCapEnforcementIT (new) drives the rejection path
end-to-end and asserts the 403 body shape produced by
LicenseExceptionAdvice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One COUNT per entity table; one SUM-grouped query over non-stopped
deployments for compute caps. SQL traverses
deployed_config_snapshot->'containerConfig' (corrected from the
plan's top-level path; the snapshot record nests containerConfig
under that key). agentCount is fed in by the controller since it's
an in-memory registry value, not a DB row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups to LicenseEnforcer review:
- Add @Autowired to the 3-arg ctor so Spring picks it unambiguously
(the 2-arg test ctor is otherwise an equally-greedy candidate).
- Wrap audit.log() in try/catch + log.warn so a degraded audit DB
cannot mask a cap rejection: callers still see HTTP 403 even when
audit storage is unhealthy.
- Extract counter name to private static final.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
assertWithinCap consults LicenseGate.getEffectiveLimits, throws
LicenseCapExceededException on overflow, increments
cameleer_license_cap_rejections_total{limit=...} for telemetry, and
emits an AuditCategory.LICENSE cap_exceeded audit row when an
AuditService is wired (3-arg ctor; the test-only 2-arg ctor passes
null and the audit call short-circuits). Unknown limit keys are
programmer errors (IllegalArgumentException), not 403s.
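The control flow can be sketched as follows. The class names, the plain
long counter, and the Consumer-based audit sink are stand-ins chosen for
illustration; the real LicenseEnforcer uses LicenseGate, a metrics
counter, and AuditService.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for LicenseCapExceededException. */
final class CapExceededException extends RuntimeException {
    CapExceededException(String limit, long current, long cap) {
        super(limit + " cap exceeded: current=" + current + ", cap=" + cap);
    }
}

/** Illustrative enforcer; not the project's implementation. */
final class CapEnforcerSketch {
    private final Map<String, Long> effectiveLimits;
    private final Consumer<String> audit; // may be null, as with a test-only constructor
    private long rejections;              // stands in for the rejection counter

    CapEnforcerSketch(Map<String, Long> effectiveLimits, Consumer<String> audit) {
        this.effectiveLimits = effectiveLimits;
        this.audit = audit;
    }

    void assertWithinCap(String limitKey, long current, long requested) {
        Long cap = effectiveLimits.get(limitKey);
        if (cap == null) {
            // Unknown key: a programmer error, not a cap rejection.
            throw new IllegalArgumentException("unknown limit key: " + limitKey);
        }
        if (current + requested <= cap) {
            return;
        }
        rejections++;
        if (audit != null) {
            try {
                audit.accept("cap_exceeded:" + limitKey);
            } catch (RuntimeException e) {
                // A failing audit sink must not hide the rejection itself.
                System.err.println("audit failed: " + e.getMessage());
            }
        }
        throw new CapExceededException(limitKey, current, cap);
    }

    long rejections() {
        return rejections;
    }
}
```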
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LicenseCapExceededException + @ControllerAdvice mapping to 403 with a
body that includes state, limit, current, cap, and a per-state human
message templated by LicenseMessageRenderer (covers ABSENT/ACTIVE/
GRACE/EXPIRED/INVALID with day counts and reason). Adds the forState()
overload now (used by the /usage endpoint in Task 30) so both surfaces
share identical phrasing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LicenseBootLoader @PostConstruct calls LicenseService.loadInitial,
which delegates to install() so env-var/file/DB paths share a single
audit + event-publish code path. A missing public key now produces
an always-failing validator (constructed with a throwaway keypair so
the parent ctor accepts it) so loaded tokens route to INVALID
instead of being silently ignored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single mediation point for token install/replace/revalidate. Audits
under AuditCategory.LICENSE, persists to PG, mutates the LicenseGate,
and publishes LicenseChangedEvent so downstream listeners
(RetentionPolicyApplier, LicenseMetrics) react uniformly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>