Two diagnostics-and-confidence follow-ups to the loader-init-container pattern.
1) DockerRuntimeOrchestrator now captures the last 50 lines of the loader's
stdout/stderr (capped at 4096 chars, 5s timeout) before the finally-block
removal and appends them to the thrown RuntimeException as
`. loader output: <text>`. Capture is best-effort: log-capture failures are
swallowed and never mask the original exit. This closes the visibility gap
that turned a simple "wget: Permission denied" into an opaque "Loader exited 1".
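The class and method names below are hypothetical (the real capture goes through docker-java's log commands inside DockerRuntimeOrchestrator); this is a minimal stdlib sketch of the cap-and-swallow behaviour described above:

```java
import java.util.function.Supplier;

// Hypothetical helper; sketches the capped append-and-swallow diagnostics behaviour.
final class LoaderDiagnostics {
    private static final int MAX_CHARS = 4096;

    /** Best-effort: appends captured loader output to the base error message.
     *  Capture failures are swallowed so diagnostics never mask the original exit. */
    static String withLoaderOutput(String baseMessage, Supplier<String> logCapture) {
        try {
            String logs = logCapture.get();
            if (logs == null || logs.isBlank()) {
                return baseMessage;
            }
            String capped = logs.length() > MAX_CHARS ? logs.substring(0, MAX_CHARS) : logs;
            return baseMessage + ". loader output: " + capped;
        } catch (RuntimeException swallowed) {
            return baseMessage; // never replace the real failure with a diagnostics failure
        }
    }
}
```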
2) New LoaderHardeningIT spins up a Testcontainers nginx serving a 1KB
fixture, builds the loader image fresh from cameleer-runtime-loader/,
and runs it under the exact baseHardenedHostConfig() shape (cap_drop ALL,
readonly rootfs, /tmp tmpfs, no-new-privileges, apparmor=docker-default,
pids=512) bound to a fresh named volume RW at /app/jars. Asserts exit 0.
This would have caught the volume-permission regression in CI.
GenericContainer + OneShotStartupCheckStrategy is used instead of raw
docker-java waitContainerCmd because the unshaded docker-java API version
in this project's pom and Testcontainers' shaded copy disagree on
WaitContainerCmd.getCondition(); going through GenericContainer keeps
the call inside Testcontainers' shaded executor.
Rules doc updated to point at the captured-output behaviour and the IT.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final-review must-fixes:
- HOWTO.md: drop CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME; add the three new
artifact env vars (loaderimage / artifacttokenttlseconds / artifactbaseurl).
- DeploymentExecutor @PostConstruct WARN, handoff doc, and docker-orchestration
rule no longer claim the loader uses cameleer-traefik. The loader runs on
the PRIMARY Docker network only — additional networks are attached after
startContainer returns, by which time the loader has exited. SaaS still
works because the tenant's primary network hosts the tenant server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-pull the loader image at PULL_IMAGE so the implicit pull on first
createContainerCmd doesn't bypass the 120s loader-wait timeout.
Wrap createAndStartLoader in try/catch so a create/start failure cleans
up the just-created volume; same guard around createAndStartMain on
phase-2 failures. Fold the wait-error message into the rethrown
RuntimeException so the cause chain stays visible.
Add a @PostConstruct WARN when neither artifactbaseurl nor serverurl is
set so the implicit cameleer-server DNS dependency is loud at boot, and
document the loader-to-server reachability contract in
.claude/rules/docker-orchestration.md.
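A stdlib sketch of that boot-time check; the helper name is hypothetical, and in the real code the result feeds a WARN log from a @PostConstruct hook rather than being returned:

```java
import java.util.Optional;

// Hypothetical helper; the real check runs in a @PostConstruct hook.
final class ArtifactConfigCheck {
    /** Returns the WARN to log at boot when neither artifactbaseurl nor
     *  serverurl is set, so the implicit cameleer-server DNS dependency
     *  is loud at startup instead of failing silently at deploy time. */
    static Optional<String> bootWarning(String artifactBaseUrl, String serverUrl) {
        if (isBlank(artifactBaseUrl) && isBlank(serverUrl)) {
            return Optional.of(
                "Neither artifactbaseurl nor serverurl is configured; the loader "
                + "will rely on resolving cameleer-server via Docker DNS");
        }
        return Optional.empty();
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```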
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tasks 9+10+11 of the init-container-jar-fetch plan, landed atomically because
9 alone leaves the orchestrator+executor referencing removed ContainerRequest
fields.
ContainerRequest (core) drops jarPath/jarVolumeName/jarVolumeMountPath; adds
appVersionId, artifactDownloadUrl, artifactExpectedSize, loaderImage.
DockerRuntimeOrchestrator (app):
- per-replica named volume "cameleer-jars-{containerName}"
- phase 1: loader container with the volume mounted RW at /app/jars,
ARTIFACT_URL + ARTIFACT_EXPECTED_SIZE env, full hardening contract
- block on waitContainerCmd().awaitStatusCode(120s); on non-zero exit
remove the loader, remove the volume, propagate RuntimeException so
DeploymentExecutor marks the deployment FAILED. main is never created.
- phase 2: main container with the same volume mounted RO at /app/jars
- withUsernsMode("host:1000:65536") on BOTH containers — closes the last
open hardening gap from issue #152
- main entrypoint paths point at /app/jars/app.jar
- extracted baseHardenedHostConfig() so loader and main share the
cap_drop / security_opt / readonly / pids / tmpfs contract
- removeContainer() also removes the per-replica volume so blue/green
doesn't leak volumes
DeploymentExecutor (app):
- injects ArtifactDownloadTokenSigner; new @Value props loaderimage,
artifacttokenttlseconds, artifactbaseurl
- replaces the temporary getVersion(...).jarPath() bridge with a signed
URL ${artifactBaseUrl}/api/v1/artifacts/{id}?exp&sig
- drops the Files.exists pre-flight check; AppVersion.jarSizeBytes is
the size-of-record check now
- drops jarDockerVolume / jarStoragePath @Value fields and the volume
plumbing in startReplica
- DeployCtx carries appVersionId / artifactUrl / artifactExpectedSize
in place of jarPath
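The signed-URL shape above can be sketched as follows. The source does not show ArtifactDownloadTokenSigner's actual payload or algorithm, so the HMAC-SHA256 construction, the `|` payload separator, and the class name here are assumptions for illustration only:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.HexFormat;

// Hypothetical signer; payload format and algorithm are assumed, not from the source.
final class ArtifactUrlSigner {
    private final byte[] secret;

    ArtifactUrlSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Builds ${baseUrl}/api/v1/artifacts/{id}?exp=...&sig=... with an
     *  HMAC-SHA256 over the path and expiry epoch-second. */
    String signedUrl(String baseUrl, long appVersionId, Instant expiry) {
        String path = "/api/v1/artifacts/" + appVersionId;
        long exp = expiry.getEpochSecond();
        String sig = hmacHex(path + "|" + exp); // assumed payload layout
        return baseUrl + path + "?exp=" + exp + "&sig=" + sig;
    }

    private String hmacHex(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```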
Tests:
- DockerRuntimeOrchestratorHardeningTest updated for the new shape;
captures HostConfig on the MAIN container and asserts cap_drop ALL
+ no-new-privileges + apparmor + readonly + pids + tmpfs + the new
withUsernsMode("host:1000:65536")
- DockerRuntimeOrchestratorLoaderTest (new): verifies volume create →
loader create with RW bind → loader started → awaited → loader
removed → main create with RO bind → main started; verifies abort
+ cleanup on loader exit != 0 (loader removed, volume removed, main
NEVER created); verifies userns_mode applied to both containers.
Config:
- application.yml replaces jardockervolume with loaderimage,
artifacttokenttlseconds, artifactbaseurl
Rules updated: .claude/rules/docker-orchestration.md (loader pattern,
userns, no more bind-mount); .claude/rules/core-classes.md
(ContainerRequest field map).
Test counts after change:
- cameleer-server-core: 116/116 unit tests pass
- cameleer-server-app: 273/273 unit tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenant JARs are arbitrary user code: Camel ships components (camel-exec,
camel-bean, MVEL/Groovy templating) that turn a header into shell, and
Java 17 has no SecurityManager — the JVM is not a security boundary.
This applies an unconditional hardening contract to every tenant
container so a single runc CVE no longer equals host takeover.
DockerRuntimeOrchestrator.startContainer now sets:
- cap_drop ALL (Capability.values() — docker-java has no ALL constant)
- security_opt: no-new-privileges, apparmor=docker-default
(default seccomp profile applies implicitly)
- read_only rootfs, pids_limit=512
- /tmp tmpfs rw,nosuid,size=256m — no noexec, since Netty/Snappy/LZ4/Zstd
dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks
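The contract can be modelled library-agnostically as below (a sketch, not the real code: in the orchestrator this is a docker-java HostConfig built by a method like baseHardenedHostConfig(), where cap_drop ALL is spelled Capability.values() since docker-java has no ALL constant):

```java
import java.util.List;
import java.util.Map;

/** Library-agnostic model of the shared hardening contract; the record
 *  name and string-based cap_drop are assumptions for illustration. */
record HardenedContract(
        List<String> capDrop,
        List<String> securityOpt,
        boolean readonlyRootfs,
        long pidsLimit,
        Map<String, String> tmpfs) {

    static HardenedContract base() {
        return new HardenedContract(
                List.of("ALL"), // cap_drop ALL
                List.of("no-new-privileges", "apparmor=docker-default"), // default seccomp applies implicitly
                true,  // read_only rootfs
                512L,  // pids_limit
                // rw,nosuid but NOT noexec: Netty/Snappy/LZ4/Zstd dlopen native
                // libs from /tmp via mmap(PROT_EXEC), which noexec would block
                Map.of("/tmp", "rw,nosuid,size=256m"));
    }
}
```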
The orchestrator also probes `docker info` at construction and uses runsc
(gVisor) automatically when the daemon has it registered. Override via
cameleer.server.runtime.dockerruntime (e.g. "kata"); empty = auto.
Outbound TCP, DNS, and TLS are unaffected — caps/seccomp don't gate
those — so vanilla Camel-Kafka producers/consumers and REST integrations
keep working unchanged. Stateful tenants (Kafka Streams with on-disk
state stores, apps writing to /var/log/...) need explicit writeable
volumes; that's tracked in #153 as the natural follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on
every router. That assumes a resolver literally named `default` exists
in the Traefik static config — true for ACME-backed installs, false for
dev/local installs that use a file-based TLS store. Traefik logs
"Router uses a nonexistent certificate resolver" for the bogus resolver
on every managed app, and any future attempt to define a differently-
named real resolver would silently skip these routers.
Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by
default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver`
into `ResolvedContainerConfig.certResolver`. When blank the
`tls.certresolver` label is omitted entirely; `tls=true` is still
emitted so Traefik serves the default TLS-store cert. When set, the
label is emitted with the configured resolver name.
Not per-app/per-env configurable: there is one Traefik per server
instance and one resolver config; app-level override would only let
users break their own routers.
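The blank-means-omit behaviour can be sketched as below; the helper name is hypothetical, though the `traefik.http.routers.<name>.tls` and `.tls.certresolver` label keys are standard Traefik Docker-provider labels:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper sketching TraefikLabelBuilder's TLS-label decision.
final class TlsLabels {
    /** tls=true is always emitted so Traefik serves the default TLS-store
     *  cert; tls.certresolver is emitted only when a resolver is configured. */
    static Map<String, String> tlsLabels(String routerName, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("traefik.http.routers." + routerName + ".tls", "true");
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put("traefik.http.routers." + routerName + ".tls.certresolver", certResolver);
        }
        return labels;
    }
}
```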
TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank).
Full unit suite 211/0/0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a boolean `externalRouting` flag (default `true`) on
ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only
the identity labels (`managed-by`, `cameleer.*`) and skips every
`traefik.*` label, so the container is not published by Traefik.
Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}`
can still reach it via Docker DNS on whatever port the app listens on.
TDD: new TraefikLabelBuilderTest covers enabled (default labels present),
disabled (zero traefik.* labels), and disabled (identity labels retained)
cases. Full module unit suite: 208/0/0.
Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form
state, Resources tab toggle, POST payload, and snapshot-to-form mapping.
Rule files updated.
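The flag's effect on the emitted label set can be sketched as follows; the class name and signature are hypothetical, since TraefikLabelBuilder's real API is not shown here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape; TraefikLabelBuilder's real signature differs.
final class RoutingLabels {
    /** With externalRouting=false only the identity labels survive, so Traefik
     *  never publishes the container; siblings still reach it via Docker DNS. */
    static Map<String, String> forContainer(Map<String, String> identityLabels,
                                            Map<String, String> traefikLabels,
                                            boolean externalRouting) {
        Map<String, String> labels = new LinkedHashMap<>(identityLabels);
        if (externalRouting) {
            labels.putAll(traefikLabels);
        }
        return labels;
    }
}
```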
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment.
STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty.
Now only FAILED rows are pruned; STOPPED deployments are retained as restorable
checkpoints (they still carry deployed_config_snapshot from their RUNNING window).
- UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only,
which excluded the main case — the previous blue/green deployment now in STOPPED).
- UI placement: Checkpoints disclosure now renders inside IdentitySection, matching
the design spec.
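The two rules above (checkpoint = has snapshot; prune = FAILED only) can be sketched as a pair of predicates; the enum and record names here are illustrative, not the real entity:

```java
// Hypothetical names; the real entity carries deployed_config_snapshot.
enum DeploymentStatus { RUNNING, DEGRADED, STOPPED, FAILED }

record DeploymentRow(DeploymentStatus status, String deployedConfigSnapshot) {

    /** A checkpoint is any deployment carrying a snapshot, regardless of status;
     *  the common case is the previous blue/green deployment, now STOPPED. */
    boolean isCheckpoint() {
        return deployedConfigSnapshot != null;
    }

    /** Redeploy pruning deletes only FAILED rows; STOPPED rows are retained. */
    boolean prunedOnRedeploy() {
        return status == DeploymentStatus.FAILED;
    }
}
```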
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refresh the three rules files to match the new executor behavior:
- docker-orchestration.md: rewrite DeploymentExecutor Details with
container naming scheme ({...}-{replica}-{generation}), strategy
dispatch (blue-green vs rolling), and the new DEGRADED semantics
(post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder
bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets
mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now
post-deploy-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>