Harden multi-tenant runtime: sandbox untrusted user JVMs (Docker + K8s) #152

claude · 2026-04-25T09:56:57+02:00

Threat model

Cameleer is marketed as an Apache Camel observability platform, but a Camel app is just a JVM — tenants can run arbitrary Java: Runtime.exec, Unsafe, JNI, reflection, dynamic class loading. Camel itself ships components that turn a single header into shell:

camel-exec — direct Runtime.exec (CVE-2025-27636 / CVE-2025-29891, Mar 2025)
camel-bean — header-driven reflective dispatch
camel-groovy, camel-joor, camel-mvel, camel-simple, camel-velocity, camel-mustache — Turing-complete or templating engines (CVE-2020-11994 SSTI→RCE)

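The JVM offers no sandbox to fall back on: SecurityManager has been deprecated for removal since Java 17 (JEP 411) and is permanently disabled as of JDK 24 (JEP 486). The JVM is not a security boundary. All meaningful isolation must live at the OS / container / network layers.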

When we run as cameleer-saas, every tenant container we launch is hostile by default. We need to keep tenants from attacking:

Each other
The platform (server, host, cluster)
The internet (mining, scanning, exfil, abuse of our IP rep)

Current gaps (from `DockerRuntimeOrchestrator`, `DeploymentExecutor`)

| Layer | Current state | Risk |
|---|---|---|
| Container runtime | `runc` only | One runc CVE = host takeover. CVE-2024-21626 (Leaky Vessels) and CVE-2025-31133 / 52565 / 52881 (Nov 2025) prove this is annual. |
| User namespace | None — tenant JVM runs as root | UID 0 in container = UID 0 on host on any escape |
| Read-only rootfs | Not set | Persistence + miner unpacking trivial |
| Capabilities | Full default set | `CAP_NET_RAW`, `CAP_SYS_PTRACE`, etc. all granted |
| Seccomp | Not applied (not even `RuntimeDefault` forced) | Whole syscall surface available |
| AppArmor / SELinux | None | No MAC layer |
| `--pids-limit` | None | Fork bomb crashes the host |
| Egress | Unrestricted | Mining, scanning, exfil, IMDS access |
| Cross-tenant network | Shared `cameleer-traefik` bridge | Any tenant can reach any other tenant's TCP ports |
| Per-tenant K8s | Not implemented | Greenfield — design isolation in from day one |

What's already OK:

✅ Memory + CPU limits configurable
✅ JAR bind-mounted read-only
✅ No Docker socket / no --privileged
✅ Per-tenant primary network exists (cameleer-tenant-{slug})
✅ Per-env scoping (cameleer-env-{tenantId}-{envSlug})
✅ No DB / JWT secret / K8s API token leaks into tenant containers
✅ Tenant agent token is narrowly scoped (per app, env)

P0 — ship before any external SaaS tenant runs untrusted code

1. Sandboxed container runtime

Install gVisor (runsc) on Docker hosts. Pass --runtime=runsc for tenant containers via withRuntime("runsc") on HostConfig.
On K8s: declare a RuntimeClass named gvisor, force it on tenant namespaces via Kyverno policy.
High-sensitivity tier (regulated workloads): Kata + Firecracker (~150-300ms cold start, full guest kernel — Fly.io runs Java this way).
Single biggest leverage point. Converts a runc-CVE-of-the-quarter from "total host takeover" to "tenant pod owned, nothing else."

2. Harden every tenant container

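Add to DockerRuntimeOrchestrator.HostConfig build (gated by cameleer.server.runtime.hardened=true, default true for SaaS):

hostConfig
  .withReadonlyRootfs(true)
  .withCapDrop(Capability.ALL)
  .withSecurityOpts(List.of(
      "no-new-privileges:true",
      // No seccomp entry on purpose: Docker applies its built-in default profile
      // when none is given. "seccomp=default" is not a recognized value over the API
      // (the daemon expects profile JSON or "unconfined").
      "apparmor=docker-default"
  ))
  .withPidsLimit(512L)
  // Writable scratch space, needed because the root filesystem is read-only.
  .withTmpFs(Map.of("/tmp", "rw,noexec,nosuid,size=64m"));

User-namespace remapping cannot be requested per container through withUsernsMode: Docker accepts only "host" there, which disables remapping. Enable it daemon-wide with userns-remap in /etc/docker/daemon.json, and run the tenant process as a non-root user in the image.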

Enforce cgroup v2 on hosts (systemd.unified_cgroup_hierarchy=1).

3. Operator-controlled JRE base image

Tenants upload only the JAR. Our CI builds the final image: pinned JRE 21, fixed entrypoint, JVM flags hard-coded:

-XX:+UseContainerSupport -XX:MaxRAMPercentage=75
-XX:ActiveProcessorCount=2 -XX:+ExitOnOutOfMemoryError
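-XX:-UsePerfData -Dsun.net.inetaddr.ttl=30

The DNS cache TTL is set through the sun.net.inetaddr.ttl system property because networkaddress.cache.ttl is a security property and has no effect when passed with -D.

Strip -javaagent / -agentlib / -agentpath from any user-supplied JVM options before launch (including JAVA_TOOL_OPTIONS and JDK_JAVA_OPTIONS), and reject JARs whose manifest declares Launcher-Agent-Class. Our agent javaagent must be the only one.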
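A minimal sketch of the option-stripping step, assuming tenant-supplied JVM options arrive as a list of strings. The class name `JvmOptionFilter` and its methods are hypothetical and not part of the existing codebase; it only covers command-line style options, so environment variables and manifest attributes would need separate handling.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: drops agent-loading options from tenant-supplied JVM arguments. */
final class JvmOptionFilter {

    // Option prefixes that load a Java agent or a native (JVMTI) agent into the JVM.
    private static final List<String> AGENT_PREFIXES =
            List.of("-javaagent:", "-agentlib:", "-agentpath:");

    private JvmOptionFilter() {}

    /** True if the option would load an agent. JVM options are case-sensitive. */
    static boolean isAgentOption(String option) {
        String trimmed = option.trim();
        return AGENT_PREFIXES.stream().anyMatch(trimmed::startsWith);
    }

    /** Returns the options in their original order, minus any agent-loading option. */
    static List<String> stripAgentOptions(List<String> tenantOptions) {
        return tenantOptions.stream()
                .filter(option -> !isAgentOption(option))
                .collect(Collectors.toList());
    }
}
```

The platform's own `-javaagent` would then be appended after filtering, so it is the only agent on the final command line.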

4. Default-deny egress per tenant

Today every tenant container can reach the internet and our control plane. Replace the shared cameleer-traefik bridge model:

Docker (interim): per-tenant user-defined bridge networks + iptables rules in the DOCKER-USER chain (bridged container traffic traverses FORWARD, not OUTPUT) that drop everything leaving a tenant bridge, including traffic to RFC1918 ranges, except connections to our egress proxy. Reverse the Traefik model: Traefik joins each tenant's network rather than exposing all tenants on a shared bridge.
K8s: Cilium CiliumClusterwideNetworkPolicy — default-deny ingress + egress, allow only kube-dns + per-tenant L7 egress proxy. Block 169.254.169.254 (cloud IMDS), control-plane CIDR, K8s API service CIDR.
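The interim Docker rules could be generated along these lines. This is a sketch under stated assumptions: the class `TenantEgressRules`, the bridge interface name, and the proxy address are all hypothetical, and the rules only cover traffic routed off the tenant bridge. It returns argument lists (suitable for `ProcessBuilder`) rather than shell strings, so a tenant-derived value cannot inject shell syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical generator for per-tenant default-deny egress rules in Docker's DOCKER-USER chain. */
final class TenantEgressRules {

    private static final String CHAIN = "DOCKER-USER";
    // Linux interface names are at most 15 characters.
    private static final Pattern IFACE = Pattern.compile("[A-Za-z0-9_.-]{1,15}");
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    private TenantEgressRules() {}

    /** Builds iptables invocations: allow replies, allow the egress proxy, drop the rest. */
    static List<List<String>> defaultDeny(String bridgeIf, String proxyIp, int proxyPort) {
        if (!IFACE.matcher(bridgeIf).matches()) {
            throw new IllegalArgumentException("invalid interface name: " + bridgeIf);
        }
        if (!IPV4.matcher(proxyIp).matches()) {
            throw new IllegalArgumentException("invalid IPv4 address: " + proxyIp);
        }
        if (proxyPort < 1 || proxyPort > 65535) {
            throw new IllegalArgumentException("invalid port: " + proxyPort);
        }
        List<List<String>> rules = new ArrayList<>();
        // 1. Replies on connections that were already allowed continue through Docker's own rules.
        rules.add(List.of("iptables", "-I", CHAIN, "1", "-i", bridgeIf,
                "-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED", "-j", "RETURN"));
        // 2. New connections are permitted only towards the egress proxy.
        rules.add(List.of("iptables", "-I", CHAIN, "2", "-i", bridgeIf,
                "-d", proxyIp, "-p", "tcp", "--dport", String.valueOf(proxyPort), "-j", "RETURN"));
        // 3. Everything else leaving the tenant bridge is dropped (RFC1918, IMDS, internet).
        rules.add(List.of("iptables", "-I", CHAIN, "3", "-i", bridgeIf, "-j", "DROP"));
        return rules;
    }
}
```

Rules are inserted at explicit positions because Docker's DOCKER-USER chain normally ends in a RETURN rule, so appended rules would never be reached. Rule ordering across multiple tenants and cleanup on container removal are left out of the sketch.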

5. Kyverno admission policies (when on K8s)

Reject privileged, hostPath, hostNetwork, hostPID, hostIPC, docker.sock mounts, dangerous capabilities.
Require runtimeClassName: gvisor on tenant namespaces.
Require automountServiceAccountToken: false on every tenant pod.
Require resource limits.
Pod Security Standards restricted enforced via namespace label pod-security.kubernetes.io/enforce: restricted.

6. Per-tenant K8s namespace

ResourceQuota + LimitRange per tenant — one tenant cannot exhaust cluster.
Dedicated ServiceAccount with no RBAC and automountServiceAccountToken: false.

7. runc / containerd patch monitoring

Subscribe to runc-security mailing list. Auto-deploy patches.

P1 — first quarter after launch

8. Falco + custom Camel rules

Stable Falco rule set plus our additions:

Java process spawning sh|bash|curl|wget|nc|nmap|python|perl → catches camel-exec abuse instantly
Read of /proc/cpuinfo from tenant container → mining recon
Outbound TCP rate >50/min → port scanning
Sustained pod CPU >90% for >5 min on non-CPU-tier tenants → mining economic signal
DNS for known mining-pool / C2 domains

9. CoreDNS sinkhole

NXDOMAIN for mining-pool wildcards (*.minexmr.com, *.nanopool.org, ethermine, f2pool, supportxmr) and a published C2 feed.

10. L7 egress proxy per tenant

Squid / Envoy with domain allowlist the tenant declares at deploy time. Default: deny all egress except curated platform allowlist (Maven Central, etc.). Gives us billable byte-rate per tenant.

11. Camel component allowlist at deploy admission

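Reject JARs whose META-INF/services/org/apache/camel/component/ or META-INF/services/org/apache/camel/language/ entries include exec, groovy, joor, mvel unless tenant has explicitly opted in (groovy and joor register as languages rather than components, so both directories need checking). ASM-scan bytecode for Runtime.exec, ProcessBuilder, Unsafe, JNI: fail-closed if found and tenant tier is "untrusted."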
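The service-file half of this check can be sketched with the JDK's `java.util.jar` API alone. The class `CamelComponentScanner` and its method names are hypothetical. The sketch assumes a flat JAR and checks both the component and language service directories. Dependency JARs nested inside a fat JAR (for example under BOOT-INF/lib/) would need to be unpacked and scanned recursively, which is not shown, and the ASM bytecode scan is a separate step.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical admission check: finds restricted Camel components/languages declared in a JAR. */
final class CamelComponentScanner {

    // Directories where Camel service descriptors live; each file name is a scheme/language name.
    static final List<String> SERVICE_DIRS = List.of(
            "META-INF/services/org/apache/camel/component/",
            "META-INF/services/org/apache/camel/language/");

    // Names taken from the issue text; tenants must opt in to use any of these.
    static final Set<String> RESTRICTED = Set.of("exec", "groovy", "joor", "mvel");

    private CamelComponentScanner() {}

    /** Returns every component or language name declared directly in the JAR. */
    static Set<String> declaredNames(Path jar) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                for (String dir : SERVICE_DIRS) {
                    if (entry.getName().startsWith(dir)) {
                        String simple = entry.getName().substring(dir.length());
                        if (!simple.isEmpty() && simple.indexOf('/') < 0) {
                            names.add(simple);
                        }
                    }
                }
            }
        }
        return names;
    }

    /** Restricted names present in the JAR that the tenant has not opted in to. */
    static Set<String> violations(Path jar, Set<String> optedIn) throws IOException {
        Set<String> found = declaredNames(jar);
        found.retainAll(RESTRICTED);
        found.removeAll(optedIn);
        return found;
    }
}
```

A deploy would be rejected when `violations(...)` is non-empty. Note that a tenant can still instantiate a component class directly, which is why the bytecode scan and the runtime sandbox remain necessary.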

12. Image scan + signing

Trivy in CI. Reject HIGH/CRITICAL CVEs at registry push gate. Cosign-sign every image. Kyverno verifyImages rejects unsigned at admission.

13. SBOM scan tenant JARs

Typosquatting detection (Dec 2025: a typosquatted org.fasterxml.jackson.core group impersonating the legitimate com.fasterxml.jackson.core dropped a Cobalt Strike beacon). Dependency-Track or Trivy SBOM mode. Levenshtein-close GAV coords from non-allowlisted groups → alert.
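The Levenshtein check on group IDs might look like the following sketch. The class `GavTyposquatCheck`, the allowlist contents, and the distance threshold are illustrative assumptions; a real scanner would also compare artifact IDs and take its allowlist from the SBOM tooling.

```java
import java.util.Collection;
import java.util.Optional;

/** Hypothetical check: flags Maven group IDs that are near-misses of an allowlisted group. */
final class GavTyposquatCheck {

    private GavTyposquatCheck() {}

    /** Classic edit distance (insert, delete, substitute), using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /**
     * Returns the allowlisted group that groupId appears to imitate, or empty when
     * groupId is itself allowlisted or is not within maxDistance of any allowlisted group.
     */
    static Optional<String> impersonatedGroup(String groupId,
                                              Collection<String> allowlistedGroups,
                                              int maxDistance) {
        if (allowlistedGroups.contains(groupId)) {
            return Optional.empty();
        }
        return allowlistedGroups.stream()
                .filter(known -> levenshtein(groupId, known) <= maxDistance)
                .findFirst();
    }
}
```

With a threshold of 3, `org.fasterxml.jackson.core` is flagged as a near-miss of `com.fasterxml.jackson.core` (three substitutions). The threshold trades false positives against missed squats and would need tuning against real dependency data.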

14. Camel CVE auto-patch policy

Refuse to deploy a JAR whose Camel version < our CVE-clean floor. Move floor monthly.
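A minimal sketch of the floor comparison, assuming the Camel version has already been extracted from the JAR as a string. The class `CamelVersionFloor` is hypothetical. It compares only the numeric major.minor.patch part and ignores qualifiers such as -SNAPSHOT; a single global floor is also a simplification, since separate release lines would likely need separate floors.

```java
import java.util.Arrays;

/** Hypothetical deploy gate: compares a JAR's Camel version against a configured floor. */
final class CamelVersionFloor {

    private CamelVersionFloor() {}

    /** Parses up to three leading numeric components; missing or non-numeric parts count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("[.-]");
        int[] numbers = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            if (!parts[i].matches("\\d+")) {
                break; // stop at the first qualifier, e.g. "SNAPSHOT"
            }
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** True when version is at or above floor, comparing components numerically in order. */
    static boolean meetsFloor(String version, String floor) {
        return Arrays.compare(parse(version), parse(floor)) >= 0;
    }
}
```

Numeric comparison matters here: a plain string comparison would rank 4.10.0 below 4.8.5.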

15. Per-tenant egress quota + flow logs

Cilium bandwidth manager. Hubble flow logs → ClickHouse for per-tenant byte-rate alerting (we already have ClickHouse — natural fit).

P2 — at scale or for regulated workloads

Kata + Firecracker universally — stop trusting host kernel
Tetragon in enforcing mode — kernel-level kill on policy violation
Per-tenant node pools (taints/tolerations) — Spectre / L1TF side-channel hardening
Per-tenant SNAT egress IP via Cilium egress gateway — IP-reputation isolation
Quarterly red-team — try to escape our own sandbox

Smallest first PR (deployable today, no K8s required)

Extend DockerRuntimeOrchestrator HostConfig with: read_only, cap_drop ALL, no-new-privileges, Docker's default seccomp profile (never unconfined), apparmor=docker-default, pids_limit, tmpfs /tmp, plus daemon-level userns-remap on the hosts. Behind feature flag cameleer.server.runtime.hardened so we can ramp per-tenant.
Install gVisor on Docker hosts. Add cameleer.server.runtime.dockerRuntime config; pass withRuntime("runsc"). One-line opt-in for migration.
Reverse the cameleer-traefik bridge model: Traefik joins per-tenant networks rather than tenants joining a shared bridge. Kills cross-tenant TCP today.
Add containerConfig.allowedEgress: List<String> to app config; default []. Wire to host iptables rules (interim) until Cilium / K8s lands.

These four are reversible, behind flags, no K8s required, and close the runc-host-takeover blast radius now.

Sub-issue tracking

This is an epic. File children for: gVisor rollout, container hardening flags, base-image control, Camel component allowlist, Kyverno policy bundle, Cilium NetworkPolicy bundle, Falco rule set, CoreDNS sinkhole, L7 egress proxy, Trivy/Cosign pipeline, JAR SBOM scanner, Tetragon enforcement, red-team exercise.

Key references

- [CVE-2024-21626 — Leaky Vessels](https://www.wiz.io/blog/leaky-vessels-container-escape-vulnerabilities)
- [CVE-2025-31133/52565/52881 — runc Nov 2025](https://www.sysdig.com/blog/runc-container-escape-vulnerabilities)
- [Camel CVEs 2025 — Akamai](https://www.akamai.com/blog/security-research/march-apache-camel-vulnerability-detections-and-mitigations)
- [Maven typosquat Dec 2025 — Aikido](https://www.aikido.dev/blog/maven-central-jackson-typosquatting-malware)
- [Fly.io sandboxing](https://fly.io/blog/sandboxing-and-workload-isolation/)
- [gVisor production at Ant Group](https://gvisor.dev/blog/2021/12/02/running-gvisor-in-production-at-scale-in-ant/)
- [Cilium tenant isolation](https://docs.cilium.io/en/stable/security/policy/kubernetes/)
- [K8s Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
- [Falco rules](https://github.com/falcosecurity/rules)

claude added the security pmf epic labels 2026-04-25 09:56:57 +02:00

claude referenced this issue

2026-04-25 20:51:56 +02:00

Per-app writeable volumes for stateful tenants (read-only rootfs follow-up) #153

Reference: cameleer/cameleer-server#152