Commit Graph

243 Commits

Author SHA1 Message Date
hsiegeln
9031533077 feat(auth): add POST /auth/logout that revokes all user tokens
Bumps users.token_revoked_before = now() for the calling user, audited
under AuditCategory.AUTH. Best-effort: returns 204 even when the request
is unauthenticated, so the SPA can call it on every logout regardless of
token state. Token-rejection is enforced by the existing
JwtAuthenticationFilter revocation check (fixed in 7066795c).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:21:47 +02:00
hsiegeln
b4c6e45d35 test(auth): JwtRevocationIT cleanup + unrevoked-token coverage
Adds @AfterEach to delete the test users so Testcontainers reuse does
not leak an authenticated user with a future token_revoked_before into
the shared schema (visible to LicenseUsageReader.snapshot, user-admin
listing tests, etc.). Adds unrevokedUserTokenIsAccepted to pin the
revoked == null no-op branch as a first-class assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:18:10 +02:00
hsiegeln
7066795c3c fix(auth): strip user: prefix before token-revocation lookup
JwtAuthenticationFilter compared the JWT subject (user:alice) against
users.user_id (bare alice), so token_revoked_before was never read for
any user. Strips the prefix to match the convention documented in
CLAUDE.md. Adds JwtRevocationIT as a regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 09:11:55 +02:00
hsiegeln
858975f03f refactor(license): extract cameleer-license-api module from server-core
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 2m57s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Splits the pure license contract types (LicenseInfo, LicenseValidator,
LicenseState, LicenseStateMachine, LicenseLimits, DefaultTierLimits) into a
new cameleer-license-api module under package com.cameleer.license.

Why: cameleer-license-minter previously depended on cameleer-server-core for
these types, dragging cameleer-server-core + cameleer-common onto the
classpath of every minter consumer (notably cameleer-saas). The SaaS
management plane has no business carrying server-runtime types — it only
needs the license contract to mint and verify tokens.

After:
  cameleer-license-minter -> cameleer-license-api  (no server internals)
  cameleer-server-core    -> cameleer-license-api
  cameleer-saas           -> cameleer-license-minter -> cameleer-license-api

Verified: mvn -pl cameleer-license-minter dependency:tree shows the minter
no longer pulls cameleer-server-core or cameleer-common. Full reactor
verify (-DskipITs) green: 371 tests pass.

LicenseGate stays in server-core (server-runtime state holder, not contract).

Closes cameleer/cameleer-server#156

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:06:52 +02:00
hsiegeln
45b5f473c9 refactor(auth): post-review tidy — drop @NotNull, refresh e2e comment, use oidc.primary
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 19:48:20 +02:00
hsiegeln
af53eca7f6 test(auth): tighten AuthCapabilitiesControllerIT — drop redundant stub, add coverage gaps 2026-04-26 19:17:05 +02:00
hsiegeln
4f6e7ea4dc feat(auth): AuthCapabilitiesController — GET /api/v1/auth/capabilities
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:10:17 +02:00
hsiegeln
2f7c6aa005 fix(auth): @NotNull on AuthCapabilitiesResponse.Oidc.providerName 2026-04-26 18:59:20 +02:00
hsiegeln
f945d10d48 feat(auth): AuthCapabilitiesResponse DTO 2026-04-26 18:57:09 +02:00
hsiegeln
ddb18c4f17 feat(auth): OidcProviderNameDeriver — issuer URI → display label
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 18:53:31 +02:00
hsiegeln
581dc1ad13 test(license): SchemaBootstrapIT — assert V5 license + retention columns
Two new assertions: license table has tenant_id/license_id/token/
installed_at/installed_by/expires_at/last_validated_at columns with
expected types + NOT NULL constraints, PK on tenant_id; environments
has execution_retention_days/log_retention_days/metric_retention_days
all integer NOT NULL DEFAULT 1.

Note: V5 migration does not include an installed_via column; the
plan's spec was aspirational. Test asserts what the migration
actually creates (and what PostgresLicenseRepository reads/writes).

OpenAPI regen (Step 35.2) deferred to session end — requires running
backend + UI dev server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:13:50 +02:00
hsiegeln
e198c13e8a test(license): RetentionRuntimeRecomputeIT — TTL recompute on license change
Install license with max_log_retention_days=30, env.configured=60 →
effective=30; verify ClickHouse logs table reflects toIntervalDay(30).
Replace with max=7 → effective=7; verify TTL recomputed. Polls
system.tables.create_table_query up to 5s for the @Async listener
to apply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:08:28 +02:00
hsiegeln
1e78439ddd test(license): LicenseEnforcementIT — cross-cap smoke regression net
Five @Nested cap surfaces (envs, apps, outbound, alert rules, users)
share a single synthetic license with cap=1 each. Each test pushes
just past the cap and verifies the standard 403 envelope plus a
cap_exceeded audit row. Per-limit ITs cover full per-cap behavior;
this IT catches accidental wire-rip regressions across all caps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:00:50 +02:00
hsiegeln
1a307da6b2 test(license): LicenseLifecycleIT — install/persist/revalidate/reject
End-to-end IT covering the full lifecycle: mint a token via
cameleer-license-minter (test-scope), POST it via /api/v1/admin/license,
verify state=ACTIVE, clear gate, revalidate from PG, verify state restored.
Plus: tampered signature -> 400 + LICENSE/FAILURE audit row, gate not
mutated to ACTIVE.

Adds cameleer-license-minter as a test-scope dep on cameleer-server-app
(verified absent from runtime/compile classpaths). Also disables the
default spring-boot:repackage execution on the minter pom so the main
artifact stays as a plain library JAR consumable as a Maven dependency
(the cli classifier still produces the executable jar).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:56:01 +02:00
hsiegeln
885f2be16b feat(license): Prometheus gauges for state + days remaining
cameleer_license_state{state=...} (one-hot per LicenseState),
cameleer_license_days_remaining (negative when ABSENT/INVALID),
cameleer_license_last_validated_age_seconds. Refreshed on
LicenseChangedEvent and every 60s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:43:54 +02:00
hsiegeln
945ecd78cf feat(license): LicenseUsageController GET /api/v1/admin/license/usage
Returns state, expiresAt/daysRemaining, lastValidatedAt, message
(LicenseMessageRenderer.forState), and a limits[] array where each
entry carries key/current/cap/source ("license" vs "default"). Adds
public AgentRegistryService.liveCount() so max_agents can be reported
from the in-memory registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:42:39 +02:00
hsiegeln
3f69e546e4 refactor(license): LicenseAdminController delegates to LicenseService
GET returns {state, invalidReason, envelope, lastValidatedAt}. POST
delegates to licenseService.install(token, userId, "api") so install
goes through audit + persistence + event publish. Removes the inline
LicenseValidator construction from the controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:34:07 +02:00
hsiegeln
340d954fed feat(license): LicenseRevalidationJob — daily cron + 60s post-startup
@Scheduled(cron = "0 0 3 * * *") triggers svc.revalidate() daily.
@EventListener(ApplicationReadyEvent.class) @Async fires once 60s
after boot to catch ABSENT->ACTIVE transitions if the license was
written to PG between server starts. Exceptions are logged but never
propagate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:32:33 +02:00
hsiegeln
484a55f4f4 feat(license): RetentionPolicyApplier listens on LicenseChangedEvent
@EventListener fires on every license install/replace/expire. For each
environment, computes effective TTL = min(licenseCap, env.configured)
and emits one ALTER TABLE ... MODIFY TTL ... per (table, env). Tables
covered: executions, processor_executions, logs, agent_metrics,
agent_events. ClickHouse failures are logged but do not propagate
(listener is async-tolerant).

route_diagrams is intentionally excluded -- it has no TTL clause in
init.sql (ReplacingMergeTree keyed on content_hash, not time-series).
server_metrics is also excluded -- it has no environment column
(server straddles environments).

Per-environment TTL via WHERE requires ClickHouse 22.3+; the project's
current image (clickhouse/clickhouse-server:24.12) is well above that
floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:28:42 +02:00
hsiegeln
cc5d88d708 feat(license): surface execution/log/metric retention days on Environment
Adds three int fields to the Environment record + repository row mapper,
matching the columns added in V5. Default value is 1 per the V5 NOT NULL
DEFAULT 1. Read DTO surfaces the fields via Jackson record serialization;
setter endpoint deferred to a follow-up that wires the corresponding
license cap checks.

The canonical constructor enforces >= 1 for each retention field — V5
guarantees this at the DB level, but the runtime guard catches in-memory
construction errors (e.g., test sites that pass 0).

Test sites updated to the 12-arg signature with retention defaults of 1.
EnvironmentAdminControllerIT gains a regression test asserting the wire
shape exposes all three fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:22:40 +02:00
hsiegeln
046f08fe87 feat(license): enforce max_jar_retention_count at PUT jar-retention
Returns 422 UNPROCESSABLE_ENTITY when jarRetentionCount exceeds
license cap. Default tier cap = 3. The other three retention caps
(execution/log/metric retention days) are deferred to T26+ where
the corresponding fields are added to Environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:16:04 +02:00
hsiegeln
56bddcc747 feat(license): enforce compute caps at DeploymentExecutor PRE_FLIGHT
Adds ComputeUsage record + computeUsage() helper to LicenseUsageReader
that aggregates from PG. DeploymentExecutor.executeAsync runs three
assertWithinCap checks (max_total_cpu_millis, max_total_memory_mb,
max_total_replicas) right after config resolution. The existing
executor try/catch turns a LicenseCapExceededException into a FAILED
deployment with the cap message in the failure reason.

Adds ComputeCapEnforcementIT (HTTP-driven; @MockBean RuntimeOrchestrator,
since cap rejection short-circuits before any orchestrator call) plus
defensive license lifts in BlueGreenStrategyIT, RollingStrategyIT,
DeploymentSnapshotIT, and DeploymentControllerAuditIT so sequential
deploys under testcontainer reuse don't trip the new caps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:09:39 +02:00
hsiegeln
71f3b70b86 feat(license): enforce max_alert_rules at AlertRuleController.create
Adds AlertRuleRepository.count() and a LicenseEnforcer.assertWithinCap
call at the top of the POST handler. Default cap = 2; the 3rd rule
gets the standard 403 envelope. Sibling alert ITs that legitimately
need more than 2 rules get the cap lifted via the test-license helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:50:59 +02:00
hsiegeln
5a579415a1 feat(license): enforce max_outbound_connections at OutboundConnectionServiceImpl.create
Adds LicenseEnforcer.assertWithinCap call at the top of create() using
repo.listByTenant(tenantId).size() as the current count. Lifts the cap
in OutboundConnectionAdminControllerIT (duplicateNameReturns409 needs
2 creates in one test). LicenseExceptionAdvice maps the rejection to
the standard 403 envelope; cap_exceeded audit row emitted via the
LicenseEnforcer 3-arg ctor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:40:12 +02:00
hsiegeln
1ff30905f7 feat(license): enforce max_users at user creation paths
Wires LicenseEnforcer into UserAdminController.createUser and
OidcAuthController auto-signup. Cap fires before any validation so
over-cap creates short-circuit cheaply. Audit emission already
present (LicenseEnforcer 3-arg ctor from T16 emits cap_exceeded
under AuditCategory.LICENSE).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:29:54 +02:00
hsiegeln
afdaee628b feat(license): enforce max_agents at AgentRegistryService.register
Adds a CreateGuard to AgentRegistryService that fires only on NEW
registrations: re-registers of an existing agent bypass the cap (they
don't grow the registry, and rejecting them would orphan an agent that
already counts against the cap). Live-only count for cap enforcement —
STALE/DEAD/SHUTDOWN agents are excluded so the cap reflects the working
fleet, not historical residue.

Reuses the CreateGuard pattern from T18-T19. The global
LicenseExceptionAdvice maps the resulting LicenseCapExceededException to
403 with the structured envelope — no AgentRegistrationController
changes needed.

AgentCapEnforcementIT exercises the HTTP path end-to-end: two registers
succeed at cap=2, a third returns 403 with the expected envelope, and a
re-register of an already-registered agent succeeds at-cap.

Sibling agent-registering ITs (Agent*ControllerIT, Diagram*IT,
Execution*IT, Search*IT, Protocol*IT, Backpressure*IT, JwtRefresh*IT,
Registration*IT, Security*IT, SseSigning*IT, IngestionSchemaIT) lift
max_agents in @BeforeEach and clear the synthetic license in @AfterEach
— the in-memory registry is shared across @SpringBootTest reuse
boundaries, so without the lift the default-tier max_agents=5 would be
exhausted by accumulated test residue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:19:08 +02:00
hsiegeln
80dafe685b feat(license): enforce max_apps at AppService.createApp
Adds CreateGuard hook to AppService.createApp using the same pattern
as T18 (EnvironmentService). AppRepository.count() added; the bean
wires LicenseEnforcer.assertWithinCap("max_apps", current, 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:36:34 +02:00
hsiegeln
198811b752 refactor(license-test): rename installTestLicenseWithCaps -> installSyntheticUnsignedLicense
Makes the signature-bypass loud at every call site since T19-T25 will
copy this pattern 5+ more times. The helper still loads via
LicenseGate.load() directly (no signature check) — the new name
ensures any future caller has to acknowledge that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:24:58 +02:00
hsiegeln
8a64a9e04c feat(license): enforce max_environments at EnvironmentService.create
Adds CreateGuard functional interface to core (preserves the no-Spring
boundary between core and app) and wires LicenseEnforcer into the
EnvironmentService bean in RuntimeBeanConfig so POST
/api/v1/admin/environments rejects with the structured 403 envelope
(error/limit/cap/state/message) once the cap is reached. Default tier
max_environments=1; the V1 baseline seeds the default env, so the very
next create through the API is rejected unless a license lifts the cap.

Also adds EnvironmentRepository.count() (with PostgresEnvironmentRepository
impl), TestSecurityHelper.installTestLicenseWithCaps(...) so existing ITs
that POST envs keep working, and a defensive cleanup in
LicenseUsageReaderIT/EnvironmentAdminControllerIT to stay
order-independent under Testcontainer reuse (deletes deployments+apps
before envs to avoid FK violations).

Test: EnvironmentCapEnforcementIT (new) drives the rejection path
end-to-end and asserts the 403 body shape produced by
LicenseExceptionAdvice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:16:41 +02:00
hsiegeln
f291d7c24d feat(license): LicenseUsageReader aggregates current usage
One COUNT per entity table; one SUM-grouped query over non-stopped
deployments for compute caps. SQL traverses
deployed_config_snapshot->'containerConfig' (corrected from the
plan's top-level path; the snapshot record nests containerConfig
under that key). agentCount is fed in by the controller since it's
an in-memory registry value, not a DB row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:47:59 +02:00
hsiegeln
9b9b56043c fix(license): explicit @Autowired ctor + tolerate audit failures
Two follow-ups to LicenseEnforcer review:
- Add @Autowired to the 3-arg ctor so Spring picks it unambiguously
  (the 2-arg test ctor is otherwise an equally-greedy candidate).
- Wrap audit.log() in try/catch + log.warn so a degraded audit DB
  cannot mask a cap rejection: callers still see HTTP 403 even when
  audit storage is unhealthy.
- Extract counter name to private static final.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:43:27 +02:00
hsiegeln
4985348827 feat(license): LicenseEnforcer single entry point
assertWithinCap consults LicenseGate.getEffectiveLimits, throws
LicenseCapExceededException on overflow, increments
cameleer_license_cap_rejections_total{limit=...} for telemetry, and
emits an AuditCategory.LICENSE cap_exceeded audit row when an
AuditService is wired (3-arg ctor; the test-only 2-arg ctor passes
null and the audit call short-circuits). Unknown limit keys are
programmer errors (IllegalArgumentException), not 403s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:36:58 +02:00
hsiegeln
2bad9c3e48 feat(license): cap-exceeded exception + state-aware message renderer
LicenseCapExceededException + @ControllerAdvice mapping to 403 with a
body that includes state, limit, current, cap, and a per-state human
message templated by LicenseMessageRenderer (covers ABSENT/ACTIVE/
GRACE/EXPIRED/INVALID with day counts and reason). Adds the forState()
overload now (used by the /usage endpoint in Task 30) so both surfaces
share identical phrasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:26:39 +02:00
hsiegeln
b95e80a24a feat(license): wire LicenseService into boot order (env > file > DB)
LicenseBootLoader @PostConstruct calls LicenseService.loadInitial,
which delegates to install() so env-var/file/DB paths share a single
audit + event-publish code path. A missing public key now produces
an always-failing validator (constructed with a throwaway keypair so
the parent ctor accepts it) so loaded tokens route to INVALID
instead of being silently ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 11:16:49 +02:00
hsiegeln
6fbcf10ee4 feat(license): LicenseService + LicenseChangedEvent
Single mediation point for token install/replace/revalidate. Audits
under AuditCategory.LICENSE, persists to PG, mutates the LicenseGate,
and publishes LicenseChangedEvent so downstream listeners
(RetentionPolicyApplier, LicenseMetrics) react uniformly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 11:11:48 +02:00
hsiegeln
2e51deb511 feat(license): PostgresLicenseRepository + LicenseRecord
JdbcTemplate-backed repo; upsert is ON CONFLICT (tenant_id), touch
updates only last_validated_at, delete is provided for future
operator-clear flow (not exposed as REST in v1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 11:05:35 +02:00
hsiegeln
20aefd5bf6 feat(license): Flyway V5 — license table + environments retention columns
Per-tenant license row stores the signed token, licenseId for audit,
installed/expires/last_validated timestamps. environments gains three
INTEGER NOT NULL DEFAULT 1 retention columns (execution, log, metric)
so existing rows land inside the default-tier cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 11:02:44 +02:00
hsiegeln
0499a54ebc feat(license): rewrite LicenseGate around state + effective limits
LicenseGate now exposes getState() (delegates to LicenseStateMachine),
getEffectiveLimits() (merged over DefaultTierLimits in ACTIVE/GRACE,
defaults-only in ABSENT/EXPIRED/INVALID), markInvalid(reason), and
clear(). Internal snapshot is an immutable record-like class swapped
atomically so concurrent reads see a consistent license+reason pair.

Removes the transient openSentinel() and getTier() introduced by
earlier tasks (no production consumers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:48:56 +02:00
hsiegeln
cf84d80de7 feat(license): require licenseId + tenantId in validator
Spec §2.1 — both fields are required and the validator rejects a
token whose tenantId does not match the server's configured tenant
(CAMELEER_SERVER_TENANT_ID). Self-hosted customers cannot strip
tenantId because the field is in the signed payload.

LicenseBeanConfig and LicenseAdminController updated to pass the
expected tenant to the validator constructor. The transient
placeholder/TODO from Task 2 is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:40:04 +02:00
hsiegeln
2ebe4989bb feat(license): expand LicenseInfo with licenseId, tenantId, grace period
Required fields per spec §2.1. tenantId is non-blank; gracePeriodDays
defines the post-exp window during which limits keep applying.
isExpired() now honours the grace; isAfterRawExpiry() distinguishes
ACTIVE from GRACE for the state machine in Task 4.

Validator and gate use placeholder values temporarily; Task 3 wires
the validator to read the new fields, Task 5 rewrites the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:33:16 +02:00
hsiegeln
551a7f12b5 refactor(license): remove dead Feature enum and isEnabled scaffolding
Spec §9 — feature flags are out of scope for license enforcement.
Drops Feature.java, LicenseGate.isEnabled, LicenseInfo.hasFeature,
and the corresponding test cases. LicenseValidator now silently
ignores any features array on the wire (no error).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 10:21:51 +02:00
hsiegeln
f6b76b2d5e docs(runtime): document hardening contract and runtime override (#152)
Surfaces the multi-tenant container hardening contract introduced in the
prior commit so operators and integrators know what is enforced and why.

- application.yml: declare `cameleer.server.runtime.dockerruntime`
  alongside the other runtime properties (empty = auto-detect runsc).
- HOWTO.md: add the override row to the Runtime config table.
- SERVER-CAPABILITIES.md: new "Multi-Tenant Runtime Sandboxing" section
  describing the cap_drop, no-new-privileges, AppArmor, read-only rootfs,
  pids_limit, /tmp tmpfs, and runsc auto-detect contract — plus the
  on-disk state caveat that motivates issue #153.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 21:06:10 +02:00
hsiegeln
8e9ad47077 feat(runtime): harden tenant containers + auto-detect gVisor (#152)
Tenant JARs are arbitrary user code: Camel ships components (camel-exec,
camel-bean, MVEL/Groovy templating) that turn a header into shell, and
Java 17 has no SecurityManager — the JVM is not a security boundary.
This applies an unconditional hardening contract to every tenant
container so a single runc CVE no longer equals host takeover.

DockerRuntimeOrchestrator.startContainer now sets:
- cap_drop ALL (Capability.values() — docker-java has no ALL constant)
- security_opt: no-new-privileges, apparmor=docker-default
  (default seccomp profile applies implicitly)
- read_only rootfs, pids_limit=512
- /tmp tmpfs rw,nosuid,size=256m — no noexec, since Netty/Snappy/LZ4/Zstd
  dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks

The orchestrator also probes `docker info` at construction and uses runsc
(gVisor) automatically when the daemon has it registered. Override via
cameleer.server.runtime.dockerruntime (e.g. "kata"); empty = auto.

Outbound TCP, DNS, and TLS are unaffected — caps/seccomp don't gate
those — so vanilla Camel-Kafka producers/consumers and REST integrations
keep working unchanged. Stateful tenants (Kafka Streams with on-disk
state stores, apps writing to /var/log/...) need explicit writeable
volumes; that's tracked in #153 as the natural follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 20:58:26 +02:00
hsiegeln
f27a0044f1 refactor(search): align ResponseStatusException imports + add wildcard HTTP test 2026-04-24 10:30:42 +02:00
hsiegeln
5c9323cfed feat(search): accept attr= multi-value query param on /executions GET
Add a repeatable attr query parameter to the GET /executions endpoint that
parses key-only (exists check) and key:value (exact or wildcard-via-*)
filters. Invalid keys are mapped to HTTP 400 via ResponseStatusException.
The POST /executions/search path already honoured attributeFilters from
the request body via the Jackson canonical ctor; an IT now proves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:23:52 +02:00
hsiegeln
2dcbd5a772 feat(search): push AttributeFilter list into ClickHouse WHERE clause
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 10:13:30 +02:00
hsiegeln
b5ee9e1d1f feat(ui): server metrics admin dashboard
Adds /admin/server-metrics page mirroring the Database/ClickHouse visibility
rules: sidebar entry gated on capabilities.infrastructureEndpoints, backend
controller now has @ConditionalOnProperty(infrastructureendpoints) and
class-level @PreAuthorize('hasRole(ADMIN)'). Dashboard panels are driven
from docs/server-self-metrics.md via the generic
/api/v1/admin/server-metrics/{catalog,instances,query} API — Server Health,
JVM, HTTP & DB pools, and conditionally Alerting + Deployments when their
metrics appear in the catalog. ThemedChart / Line / Area from the design
system; hooks in ui/src/api/queries/admin/serverMetrics.ts. Not yet
browser-verified against a running dev server — backend IT covers the API
end-to-end (8 tests), UI typecheck + production bundle both clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:00:14 +02:00
hsiegeln
d58c8cde2e feat(server): REST API over server_metrics for SaaS dashboards
Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control
planes can build the server-health dashboard without direct ClickHouse
access. One generic /query endpoint covers every panel in the
server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag,
filter-by-tag, counter-delta mode with per-server_instance_id rotation
handling, and a derived 'mean' statistic for timers. Regex-validated
identifiers, parameterised literals, 31-day range cap, 500-series response
cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated:
all 17 suggested panels now expressed as single-endpoint queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:41:02 +02:00
hsiegeln
48ce75bf38 feat(server): persist server self-metrics into ClickHouse
Snapshot the full Micrometer registry (cameleer business metrics, alerting
metrics, and Spring Boot Actuator defaults) every 60s into a new
server_metrics table so server health survives restarts without an external
Prometheus. Includes a dashboard-builder reference for the SaaS team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:20:45 +02:00
hsiegeln
f8e382c217 test(diagrams): add removed-route + point-in-time coverage
Store-level: assert findLatestContentHashForAppRoute picks the newest
hash across publishing instances (proves the lookup survives agent
removal), isolates by (app, env), and returns empty for blank inputs.

Controller-level: assert the env-scoped /routes/{routeId}/diagram
endpoint resolves without a registry prerequisite, 404s for unknown
routes, and that an execution's stored diagramContentHash stays pinned
to the point-in-time version after a newer diagram is stored — the
"latest" endpoint flips to v2, the by-hash render remains byte-stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:11:06 +02:00