# Handoff — Runtime Base Image Hardening (cameleer-saas)
Audience: cameleer-saas team.

Owner repo to change: `cameleer-saas` (`docker/runtime-base/`).

Owner repo of this handoff: `cameleer-server` (the multi-tenant orchestration consumer of the image).
## TL;DR

Replace `eclipse-temurin:21-jre-alpine` with **`cgr.dev/chainguard/jre:openjdk-21`** (Chainguard, Wolfi-based, glibc) and remove the dead `ENTRYPOINT` from `docker/runtime-base/Dockerfile`. One-line `FROM` change plus a deletion. Pin by digest in production. Net effect:

- **Smaller CVE surface** — Chainguard rebuilds daily; the baseline CVE count is near zero by design, with signed images and SBOMs.
- **glibc instead of musl** — removes a hidden compatibility risk (Netty tcnative, Snappy, LZ4, Zstd, RocksDB, oshi, and JNA-using libraries ship glibc-only natives and fail at load on Alpine/musl). Tenant apps haven't tripped this yet only because no one has tried.
- **Same operational shape** — non-root, has `sh` (Wolfi/busybox), works with the existing `DeploymentExecutor` `sh -c` entrypoint construction. No orchestrator change required.
## Why now

`cameleer-server`'s `DockerRuntimeOrchestrator` enforces a hardening contract for tenant containers (`cap_drop ALL`, `no-new-privileges`, `apparmor=docker-default`, `readonly` rootfs, per-container `/tmp` tmpfs `nosuid` 256 MB, `pids_limit=512`, `userns_mode=host:1000:65536`). The base image is the one piece *outside* that contract — and it is the largest source of CVEs in a tenant container's attack surface today. Switching the base is the highest-leverage remaining hardening move.
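For intuition, that contract roughly corresponds to the following `docker run` flags. This is an illustrative sketch only: the real settings are applied programmatically in `DockerRuntimeOrchestrator`, and the userns mapping and tenant image name are placeholders here.

```shell
# Illustrative docker-run rendering of the orchestrator's hardening contract.
# Real settings live in DockerRuntimeOrchestrator; userns mapping elided,
# <tenant-image> is a placeholder.
hardening="--cap-drop ALL \
  --security-opt no-new-privileges \
  --security-opt apparmor=docker-default \
  --read-only \
  --tmpfs /tmp:rw,nosuid,size=256m \
  --pids-limit 512"
echo "docker run -d $hardening <tenant-image>"
```

Whatever base image is chosen runs inside these constraints; the base swap changes only what's underneath them.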
## Current state

`cameleer-saas/docker/runtime-base/Dockerfile`:

```dockerfile
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app

COPY agent.jar /app/agent.jar
COPY cameleer-log-appender.jar /app/cameleer-log-appender.jar

ENTRYPOINT exec java \
    -Dcameleer.export.type=${CAMELEER_EXPORT_TYPE:-HTTP} \
    -Dcameleer.export.endpoint=${CAMELEER_SERVER_URL} \
    -Dcameleer.agent.name=${HOSTNAME} \
    -Dcameleer.agent.application=${CAMELEER_APPLICATION_ID:-default} \
    -Dcameleer.agent.environment=${CAMELEER_ENVIRONMENT_ID:-default} \
    -Dcameleer.routeControl.enabled=${CAMELEER_ROUTE_CONTROL_ENABLED:-false} \
    -Dcameleer.replay.enabled=${CAMELEER_REPLAY_ENABLED:-false} \
    -Dcameleer.health.enabled=true \
    -Dcameleer.health.port=9464 \
    -javaagent:/app/agent.jar \
    -jar /app/app.jar
```
Two issues, addressed together:

1. **Base = `eclipse-temurin:21-jre-alpine`** — Alpine + musl. Already small and non-root, but musl breaks any tenant pulling glibc-only JNI, and CVE refresh follows Eclipse's (slower) release cadence.
2. **The `ENTRYPOINT` is dead code.** `cameleer-server`'s `DeploymentExecutor` constructs its own per-runtime-type entrypoint at deploy time and passes it via `createContainerCmd().withCmd("sh", "-c", entrypoint)`, overriding whatever the base sets. The path `/app/app.jar` referenced here is also stale — actual deploys mount `/app/jars/app.jar` via the per-replica named volume populated by `cameleer-runtime-loader`. Keeping the dead `ENTRYPOINT` invites future maintainers to "fix" the wrong layer.
## Target state

```dockerfile
# Wolfi-based JRE: glibc, rebuilt daily with near-zero baseline CVEs,
# signed images + SBOM published, non-root by default. Pin by digest in
# production overlays — see "Pinning" below.
FROM cgr.dev/chainguard/jre:openjdk-21

WORKDIR /app

# Agent + log appender are baked in; the tenant JAR is delivered at deploy
# time by cameleer-runtime-loader into the RO-mounted /app/jars volume.
COPY agent.jar /app/agent.jar
COPY cameleer-log-appender.jar /app/cameleer-log-appender.jar

# No ENTRYPOINT here. cameleer-server's DeploymentExecutor builds the
# per-runtime-type entrypoint (spring-boot/quarkus: -jar; plain-java:
# -cp + main; native: exec) and overrides via withCmd("sh","-c",...).
# Setting one here only creates drift between this image and the actual
# runtime command.
```

That's it. No multi-stage needed, no extra packages.
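For review convenience, the substantive change reduces to this one-line diff (the `ENTRYPOINT` deletion aside):

```diff
-FROM eclipse-temurin:21-jre-alpine
+FROM cgr.dev/chainguard/jre:openjdk-21
```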
## Pinning (production)

Tag references (`:openjdk-21`) move when Chainguard rebuilds. That's the point — you get the CVE refresh — but for reproducible deploys, pin by digest in the production CI run:

```dockerfile
FROM cgr.dev/chainguard/jre:openjdk-21@sha256:<digest>
```

Resolve the current digest at build time:

```bash
crane digest cgr.dev/chainguard/jre:openjdk-21
# or:
docker buildx imagetools inspect cgr.dev/chainguard/jre:openjdk-21 \
  --format '{{json .Manifest.Digest}}'
```

Bump the pin on a regular cadence (monthly, or when a Chainguard advisory lands). The CI workflow that builds `cameleer-runtime-base` is the natural home for the bump — keep it as a deliberate commit so reviewers see the upgrade.
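The bump itself can be scripted. A minimal sketch, with hypothetical paths and a placeholder digest (in CI the digest would come from `crane digest cgr.dev/chainguard/jre:openjdk-21`):

```shell
# Hypothetical pin-bump helper: rewrite the FROM line to the resolved digest.
# /tmp/Dockerfile and the all-zero digest are placeholders for illustration.
cat > /tmp/Dockerfile <<'EOF'
FROM cgr.dev/chainguard/jre:openjdk-21
WORKDIR /app
EOF
digest="sha256:0000000000000000000000000000000000000000000000000000000000000000"  # placeholder
# Matches both unpinned and already-pinned FROM lines, so re-runs are idempotent.
sed -i -E "s|^(FROM cgr\.dev/chainguard/jre:openjdk-21)(@sha256:[0-9a-f]+)?\$|\1@${digest}|" /tmp/Dockerfile
grep '^FROM' /tmp/Dockerfile
```

Committing the resulting one-line change keeps the upgrade visible in review, as suggested above.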
## Verification

Before merging in `cameleer-saas`:

1. **Build smoke:**

   ```bash
   docker build -t cameleer-runtime-base:test docker/runtime-base/
   docker run --rm cameleer-runtime-base:test java -version
   docker run --rm cameleer-runtime-base:test sh -c 'id'
   ```

   Expect the Java 21 banner and a non-root id (`uid=65532` or similar — Chainguard's default `nonroot` user).

2. **End-to-end deploy through `cameleer-server`:**
   - Build the new `cameleer-runtime-base` image.
   - Push to the dev registry.
   - Trigger a deployment of any tenant Spring Boot app via the cameleer-server UI / API.
   - Watch the deploy progress through `PRE_FLIGHT → PULL_IMAGE → CREATE_NETWORK → START_REPLICAS → HEALTH_CHECK → SWAP_TRAFFIC → COMPLETE`.
   - Confirm: the container starts, `/api/v1/health` returns UP on the tenant, and the agent registers and heartbeats appear in cameleer-server logs.

3. **Negative test (compatibility win, optional but recommended):**
   - Build a tiny Camel app that uses `camel-netty` (which bundles `netty-tcnative-boringssl-static`).
   - Deploy it on the new base. On Alpine/musl this fails at native lib load; on Chainguard it should start clean.
   - This is the test that demonstrates the *real* user-visible win, not just the CVE numbers.

4. **Rollback plan:** revert the `Dockerfile` change, rebuild + push, and retag deployments. The runtime contract on the cameleer-server side is unchanged — no migration, no data-shape change, no orchestrator behaviour change. Failure at the base layer is reversible at the same speed as a normal deploy.
## What you're NOT changing

- **`cameleer-server` orchestrator code** — no changes. The runtime base is opaque to it; only env vars, entrypoint construction, and the loader-volume mount matter, and none of those depend on the base.
- **`cameleer-runtime-loader` image** — separate and already minimal (`busybox:1.37-musl`, ~2.6 MB, runs only at deploy time, exits 0 on success). The loader runs `wget` once and is gone before the main container starts. Don't bundle it with the base.
- **Hardening contract** — orchestrator-side, unchanged. `cap_drop ALL`, readonly rootfs, `/tmp` tmpfs, etc. continue to apply on top of whatever base image is used.
## Optional follow-ups (NOT required for this handoff)

These are deeper investments worth tracking, but they don't block the Chainguard switch:

1. **Distroless** — `gcr.io/distroless/java21-debian12:nonroot` is even smaller (~200 MB) and has the smallest attack surface of any pre-built option (no shell, no package manager). Adopting it requires `cameleer-server`'s `DeploymentExecutor` to refactor its entrypoint construction from `withCmd("sh", "-c", "<string>")` to exec (JSON-array) form (`withCmd("java", "-javaagent:/app/agent.jar", "-jar", "/app/jars/app.jar", ...)`). That's a small but non-trivial change, because the orchestrator currently splices `customArgs` (a freeform string) into the shell command — doable safely with a tokeniser, but worth discussing as its own ticket. Trade-off: you lose `docker exec -it sh` for live debugging.
2. **jlink-based custom JRE** — explicitly *not recommended* for this base. `jlink` works when you control the app's JDK module set; tenant apps can use any standard module (AWT, JFR, `sun.misc.Unsafe`, etc.). A custom JRE base would silently break tenant code on JDK upgrades. Keep `jlink` for single-purpose images, not multi-tenant runtime bases.
3. **From scratch** — same reasoning as #2, plus you take on the burden of glibc, libfontconfig, libfreetype, and every CA-bundle update. The maintenance cost dwarfs the CVE win.
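To make the exec-form requirement in follow-up 1 concrete: without a shell in the image, `customArgs` must become discrete argv entries. A hypothetical sketch (naive whitespace split; the real tokeniser would need quote handling, and all values here are illustrative):

```shell
# Without /bin/sh there is no command-string interpretation, so a freeform
# customArgs value has to be split into argv entries up front.
custom_args="-Xmx512m -Dcameleer.agent.application=demo"   # illustrative value
# Deliberately unquoted: whitespace splitting is the point of this sketch.
set -- java -javaagent:/app/agent.jar $custom_args -jar /app/jars/app.jar
argv_count=$#
first_arg=$1
printf '%s\n' "$@"
```

Quoted or escaped arguments (`-Dgreeting="hello world"`) are exactly where this naive split breaks, which is why the refactor deserves its own ticket rather than a drive-by change.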
## Cross-checks for the SaaS team

- `cameleer-saas/.gitea/workflows/ci.yml`, "Build and push runtime base image" step: no change needed. The same `docker buildx build --push docker/runtime-base/` invocation works against the new `FROM`.
- `cameleer-saas/docker/runtime-base/agent.jar` and `cameleer-log-appender.jar` are still pulled from the gitea Maven registry by the CI step. Unchanged.
- The `runtime-base:latest` tag consumers (cameleer-server's `CAMELEER_SERVER_RUNTIME_BASEIMAGE` env on tenant servers) keep pointing at the same logical image. Tenant servers pick up the new base on their next deploy because `pullImage()` runs at PRE_FLIGHT.
## Sign-off checklist for the implementing engineer

- [ ] `FROM` swapped to `cgr.dev/chainguard/jre:openjdk-21`.
- [ ] Dead `ENTRYPOINT` block deleted.
- [ ] Production overlay pins by digest.
- [ ] Local `docker build` smoke green.
- [ ] One end-to-end tenant deploy through `cameleer-server` green (deploy reaches RUNNING, agent registers, healthcheck UP).
- [ ] Optional: Netty-tcnative tenant smoke shows the glibc compatibility win.
- [ ] CI registry cleanup loop already covers `cameleer-runtime-base` — confirm tag retention isn't disrupted (no change expected, but check).
## Pointers

- `cameleer-server/.claude/rules/docker-orchestration.md` — the hardening contract on the cameleer-server side that this base sits underneath.
- `cameleer-server/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DockerRuntimeOrchestrator.java` — `baseHardenedHostConfig()` is the spec; the base image runs *inside* this contract.
- `cameleer-server/cameleer-server-app/src/main/java/com/cameleer/server/app/runtime/DeploymentExecutor.java` — entrypoint construction logic that overrides whatever `ENTRYPOINT` the base sets.
- Chainguard catalog: <https://images.chainguard.dev/directory/image/jre/versions>
- Chainguard image security model: <https://www.chainguard.dev/unchained/the-chainguard-images-security-model>