cameleer-saas/docs/superpowers/specs/2026-04-10-certificate-management-design.md
hsiegeln 45bcc954ac
feat: certificate management with stage/activate/restore lifecycle
Provider-based architecture (Docker now, K8s later):
- CertificateManager interface + DockerCertificateManager (file-based)
- Atomic swap via .wip files for safe cert replacement
- Stage -> Activate -> Archive lifecycle with one-deep rollback
- Bootstrap supports user-supplied certs via CERT_FILE/KEY_FILE/CA_FILE
- CA bundle aggregates platform + tenant CAs, distributed to containers
- Vendor UI: Certificates page with upload, activate, restore, discard
- Stale tenant tracking (ca_applied_at) with restart banner
- Conditional TLS skip removal when CA bundle exists

Includes design spec, migration V012, service + controller tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:29:02 +02:00


Certificate Management Design

Problem

The platform currently generates a self-signed TLS certificate at bootstrap time via an Alpine init container. There is no way to supply a real certificate at bootstrap, replace it at runtime, or manage CA trust bundles for tenant enterprise SSO providers. Internal services bypass TLS verification with hardcoded flags (CAMELEER_OIDC_TLS_SKIP_VERIFY=true, NODE_TLS_REJECT_UNAUTHORIZED=0).

Goals

  1. Supply a cert+key at bootstrap time (env vars pointing to files)
  2. Replace the platform TLS certificate at runtime via vendor UI
  3. Manage a CA trust bundle (ca.pem) aggregating platform CA + tenant enterprise CAs
  4. Stage certificates before activation (shadow certs)
  5. Roll back to the previous certificate if activation causes issues
  6. Flag tenants that need restart after CA bundle changes
  7. Provider-based architecture: Docker now, K8s later

Non-Goals

  • ACME/Let's Encrypt integration (separate future work)
  • Per-tenant TLS certificates (all tenants share the platform cert via Traefik)
  • Client certificate authentication (mTLS)

Architecture

Provider Interface

package net.siegeln.cameleer.saas.certificate;

public interface CertificateManager {
    boolean isAvailable();

    CertificateInfo getActive();
    CertificateInfo getStaged();
    CertificateInfo getArchived();

    CertValidationResult stage(byte[] certPem, byte[] keyPem, byte[] caBundlePem);
    void activate();
    void restore();
    void discardStaged();

    void generateSelfSigned(String hostname);
    byte[] getCaBundle();
}

The interface lives in net.siegeln.cameleer.saas.certificate. The Docker implementation lives in net.siegeln.cameleer.saas.provisioning, alongside DockerTenantProvisioner.

DockerCertificateManager writes to the Docker certs volume. A future K8sCertificateManager would manage Kubernetes TLS Secrets and cert-manager CRDs.

Records

public record CertificateInfo(
    String subject, String issuer, Instant notBefore, Instant notAfter,
    boolean hasCaBundle, boolean selfSigned, String fingerprint
) {}

public record CertValidationResult(
    boolean valid, List<String> errors, CertificateInfo info
) {}

File Layout (Docker Volume)

/certs/
  cert.pem              <- ACTIVE platform cert (Traefik reads)
  key.pem               <- ACTIVE private key
  ca.pem                <- aggregated CA bundle (platform CA + tenant CAs)
  meta.json             <- bootstrap metadata for DB seeding
  staged/
    cert.pem            <- STAGED cert
    key.pem             <- STAGED key
    ca.pem              <- STAGED CA bundle
  prev/
    cert.pem            <- ARCHIVED (one previous)
    key.pem
    ca.pem

Atomic swap pattern: write to *.wip, validate, rename to final path.
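
A minimal sketch of the write, validate, rename sequence, assuming a POSIX filesystem where a rename within one directory replaces the target atomically. The class name AtomicFileSwap and the validator parameter are hypothetical and not part of the spec; the .wip name is derived by replacing the file extension (ca.pem becomes ca.wip).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

// Hypothetical helper illustrating the *.wip pattern; not the spec's implementation.
final class AtomicFileSwap {

    /**
     * Writes content to "<base>.wip" next to the target, validates it, then
     * renames it over the target. If validation fails, the .wip file is
     * deleted and the target is left untouched.
     */
    static boolean replace(Path target, byte[] content, Predicate<byte[]> validator)
            throws IOException {
        String name = target.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        Path wip = target.resolveSibling(base + ".wip");

        Files.write(wip, content);
        // Validate what actually landed on disk, not the in-memory copy.
        if (!validator.test(Files.readAllBytes(wip))) {
            Files.deleteIfExists(wip);
            return false;
        }
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the filesystem
        // cannot guarantee atomicity. Whether an existing target is replaced is
        // implementation-specific; on POSIX systems rename(2) replaces it.
        Files.move(wip, target, StandardCopyOption.ATOMIC_MOVE);
        return true;
    }
}
```

Readers such as Traefik then see either the old file or the new one, never a partially written file.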

Database

-- V011__certificates.sql
CREATE TABLE certificates (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status       VARCHAR(10) NOT NULL CHECK (status IN ('ACTIVE', 'STAGED', 'ARCHIVED')),
    subject      VARCHAR(500),
    issuer       VARCHAR(500),
    not_before   TIMESTAMPTZ,
    not_after    TIMESTAMPTZ,
    fingerprint  VARCHAR(128),
    has_ca       BOOLEAN NOT NULL DEFAULT FALSE,
    self_signed  BOOLEAN NOT NULL DEFAULT FALSE,
    uploaded_by  UUID,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    activated_at TIMESTAMPTZ,
    archived_at  TIMESTAMPTZ
);

The table holds at most three rows, one per status. On activate, in order: delete the ARCHIVED row, change the ACTIVE row to ARCHIVED, then change the STAGED row to ACTIVE.

Tenant staleness tracked via ca_applied_at column on tenants table:

-- in same migration
ALTER TABLE tenants ADD COLUMN ca_applied_at TIMESTAMPTZ;

Tenants with ca_applied_at < (active cert's activated_at) are stale.
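
The staleness rule can be sketched as a small predicate. The class name TenantStaleness is hypothetical, and the handling of null values is an assumption: the spec does not say how a tenant with no ca_applied_at should be treated, so this sketch counts it as stale once an activation time exists.

```java
import java.time.Instant;

// Hypothetical helper; the null handling is an assumption, not specified behavior.
final class TenantStaleness {

    /** Returns true if the tenant's CA bundle predates the active cert's activated_at. */
    static boolean isStale(Instant caAppliedAt, Instant activeActivatedAt) {
        if (activeActivatedAt == null) {
            return false; // no activation recorded, nothing to be stale against
        }
        if (caAppliedAt == null) {
            return true; // assumption: never applied counts as stale
        }
        return caAppliedAt.isBefore(activeActivatedAt);
    }
}
```

Note that a plain SQL comparison ca_applied_at < activated_at would not match rows where ca_applied_at is NULL, so a query-based version would need an explicit IS NULL branch to behave like this sketch.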

State Transitions

Upload -> STAGED -> activate -> ACTIVE -> (next activate) -> ARCHIVED
                                  ^                              |
                                  +------ restore ---------------+
  • Activate staged: delete ARCHIVED row+files, ACTIVE -> ARCHIVED (move files to prev/), STAGED -> ACTIVE (move files to root)
  • Restore archived: swap ACTIVE <-> ARCHIVED (swap files and DB statuses)
  • Discard staged: delete STAGED row + staged/ files
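
The transitions above can be modelled in memory as one slot per status. This is only a sketch: the class CertLifecycle is hypothetical, a String stands in for a certificate (for example its fingerprint), and file moves and DB updates are left out. Staging over an existing staged certificate replaces it, which is an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical in-memory model of the stage/activate/restore/discard lifecycle.
final class CertLifecycle {
    enum Status { ACTIVE, STAGED, ARCHIVED }

    // At most one certificate per status, mirroring the "at most 3 rows" rule.
    private final Map<Status, String> slots = new EnumMap<>(Status.class);

    void stage(String cert) {
        slots.put(Status.STAGED, cert); // assumption: replaces any previously staged cert
    }

    void activate() {
        String staged = slots.remove(Status.STAGED);
        if (staged == null) {
            throw new IllegalStateException("nothing staged");
        }
        slots.remove(Status.ARCHIVED);               // delete the old archive
        String active = slots.remove(Status.ACTIVE);
        if (active != null) {
            slots.put(Status.ARCHIVED, active);      // ACTIVE -> ARCHIVED
        }
        slots.put(Status.ACTIVE, staged);            // STAGED -> ACTIVE
    }

    void restore() {
        String archived = slots.remove(Status.ARCHIVED);
        if (archived == null) {
            throw new IllegalStateException("nothing archived");
        }
        String active = slots.put(Status.ACTIVE, archived); // swap ACTIVE <-> ARCHIVED
        if (active != null) {
            slots.put(Status.ARCHIVED, active);
        }
    }

    void discardStaged() {
        slots.remove(Status.STAGED);
    }

    String get(Status status) {
        return slots.get(status);
    }
}
```

Because restore is a swap, calling it twice returns to the original state, which matches the one-deep rollback described above.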

Bootstrap Flow

The traefik-certs init container gains env var support:

1. cert.pem + key.pem exist in volume?
   -> Yes: skip (idempotent)
   -> No: continue

2. CERT_FILE + KEY_FILE env vars set?
   -> Yes: copy to volume, validate (PEM parseable, key matches cert)
           If CA_FILE set, copy as ca.pem
   -> No: generate self-signed (current behavior)

3. Write /certs/meta.json with subject, fingerprint, self_signed flag

SaaS app reads meta.json on startup to seed the certificates DB table if no ACTIVE row exists.

REST API

All endpoints require the platform:admin scope:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/vendor/certificates | List active, staged, archived |
| POST | /api/vendor/certificates/stage | Upload cert+key+ca (multipart) |
| POST | /api/vendor/certificates/activate | Promote staged -> active |
| POST | /api/vendor/certificates/restore | Swap archived <-> active |
| DELETE | /api/vendor/certificates/staged | Discard staged |
| GET | /api/vendor/certificates/stale-tenants | Tenants needing restart for CA |

Service Layer

CertificateService orchestrates:

  • Validation (PEM parsing, key-cert match, chain building, expiry check)
  • Delegates file operations to CertificateManager (provider)
  • Manages DB metadata
  • Computes tenant CA staleness
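
As an illustration of the PEM parsing and expiry checks, here is a minimal sketch using only the JDK's java.security.cert API. The class name CertChecks and the error strings are hypothetical. Key-cert matching and chain building are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: covers parsing and validity-window checks only.
final class CertChecks {

    /** Returns a list of validation errors; an empty list means the checks passed. */
    static List<String> validate(byte[] certPem, Instant now) {
        List<String> errors = new ArrayList<>();
        X509Certificate cert;
        try {
            CertificateFactory factory = CertificateFactory.getInstance("X.509");
            cert = (X509Certificate) factory.generateCertificate(
                    new ByteArrayInputStream(certPem));
        } catch (CertificateException e) {
            errors.add("certificate is not parseable: " + e.getMessage());
            return errors; // nothing else can be checked without a parsed cert
        }
        if (now.isBefore(cert.getNotBefore().toInstant())) {
            errors.add("certificate is not yet valid");
        }
        if (now.isAfter(cert.getNotAfter().toInstant())) {
            errors.add("certificate has expired");
        }
        return errors;
    }
}
```

A service could map the returned list onto the errors field of CertValidationResult, with valid set to true when the list is empty.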

CA Bundle Management

ca.pem is a concatenation of:

  • Platform cert's CA (if from a private CA, supplied at bootstrap or upload)
  • Tenant-supplied CAs (for enterprise SSO with private IdPs)

On any CA change (platform cert upload with CA, tenant CA add/remove):

  1. Rebuild: concatenate all CAs into ca.wip
  2. Validate: parse all PEM entries, verify structure
  3. Atomic swap: mv ca.wip ca.pem
  4. Update activated_at on ACTIVE cert row
  5. Flag tenants as stale
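
Steps 1 and 2 can be sketched as a pure function over PEM strings. The class name CaBundleBuilder is hypothetical, and dropping exact duplicate certificates is an assumption the spec does not make. The validation here is structural only (BEGIN/END markers and a decodable Base64 body); it does not confirm that each block is a well-formed X.509 certificate.

```java
import java.util.Base64;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for rebuilding ca.pem content from several CA sources.
final class CaBundleBuilder {
    private static final String BEGIN = "-----BEGIN CERTIFICATE-----";
    private static final String END = "-----END CERTIFICATE-----";
    private static final Pattern BLOCK =
            Pattern.compile(BEGIN + "(.*?)" + END, Pattern.DOTALL);

    /**
     * Concatenates every certificate block found in the sources, in order,
     * skipping exact duplicates. Throws IllegalArgumentException if a source
     * contains no block or a block body is empty or not valid Base64.
     */
    static String build(List<String> pemSources) {
        Set<String> bodies = new LinkedHashSet<>();
        for (String source : pemSources) {
            Matcher matcher = BLOCK.matcher(source);
            int found = 0;
            while (matcher.find()) {
                String body = matcher.group(1).replaceAll("\\s", "");
                if (body.isEmpty()) {
                    throw new IllegalArgumentException("empty certificate block");
                }
                Base64.getDecoder().decode(body); // throws on invalid Base64
                bodies.add(body);
                found++;
            }
            if (found == 0) {
                throw new IllegalArgumentException("no certificate block found");
            }
        }
        StringBuilder out = new StringBuilder();
        for (String body : bodies) {
            out.append(BEGIN).append('\n');
            for (int i = 0; i < body.length(); i += 64) { // 64-char PEM lines
                out.append(body, i, Math.min(i + 64, body.length())).append('\n');
            }
            out.append(END).append('\n');
        }
        return out.toString();
    }
}
```

The returned string would then be written to ca.wip and renamed to ca.pem, as in step 3.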

Tenant CA Distribution

At provisioning time (DockerTenantProvisioner):

  • Mount certs volume read-only at /certs in tenant containers
  • Java servers: JVM truststore import at entrypoint or JAVA_OPTS with custom truststore
  • Node containers: NODE_EXTRA_CA_CERTS=/certs/ca.pem
  • Set ca_applied_at = now() on tenant record
  • Remove TLS skip flags when ca.pem exists
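
The conditional removal of the TLS skip flags might look like the following sketch. The class name TenantTlsEnv is hypothetical, and only the environment variables named in this document are used. JVM truststore handling is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks TLS-related env vars for a tenant container.
final class TenantTlsEnv {

    /**
     * With a CA bundle present, trust it explicitly and omit the skip flags.
     * Without one, fall back to the existing skip-verification flags.
     */
    static Map<String, String> build(boolean caBundleExists) {
        Map<String, String> env = new LinkedHashMap<>();
        if (caBundleExists) {
            env.put("NODE_EXTRA_CA_CERTS", "/certs/ca.pem");
        } else {
            env.put("CAMELEER_OIDC_TLS_SKIP_VERIFY", "true");
            env.put("NODE_TLS_REJECT_UNAUTHORIZED", "0");
        }
        return env;
    }
}
```

A provisioner could call this with the result of a file existence check on ca.pem in the certs volume, so that verification is only enforced once there is a bundle to verify against.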

On tenant restart (manual, after CA change):

  • Container picks up current ca.pem from volume mount
  • Update ca_applied_at on tenant

Vendor UI

New "Certificates" page in vendor sidebar:

  • Active cert card: subject, issuer, expiry, fingerprint, self-signed badge, activated date
  • Staged cert card (conditional): same metadata + Activate / Discard buttons, validation errors if any
  • Archived cert card (conditional): same metadata + Restore button (disabled if expired)
  • Upload area: file inputs for cert.pem (required), key.pem (required), ca.pem (optional)
  • Stale tenants banner: "CA bundle updated - N tenants need restart" with restart action

React Hooks

useVendorCertificates()      // GET /vendor/certificates
useStageCertificate()        // POST multipart
useActivateCertificate()     // POST activate
useRestoreCertificate()      // POST restore
useDiscardStaged()           // DELETE staged
useStaleTenants()            // GET stale-tenants

File Inventory

New Files

| File | Description |
| --- | --- |
| src/.../certificate/CertificateManager.java | Provider interface |
| src/.../certificate/CertificateInfo.java | Cert metadata record |
| src/.../certificate/CertValidationResult.java | Validation result record |
| src/.../certificate/CertificateEntity.java | JPA entity |
| src/.../certificate/CertificateRepository.java | Spring Data repo |
| src/.../certificate/CertificateService.java | Business logic |
| src/.../certificate/CertificateController.java | REST endpoints |
| src/.../provisioning/DockerCertificateManager.java | Docker volume implementation |
| src/main/resources/db/migration/V011__certificates.sql | Migration |
| ui/src/api/certificate-hooks.ts | React Query hooks |
| ui/src/pages/vendor/CertificatesPage.tsx | Vendor UI page |

Modified Files

| File | Change |
| --- | --- |
| docker-compose.yml | Add CERT_FILE/KEY_FILE/CA_FILE env vars to init container |
| traefik.yml | No change (already reads from /certs/) |
| src/.../provisioning/DockerTenantProvisioner.java | Mount certs volume, set CA env vars, remove TLS skip flags |
| ui/src/components/Layout.tsx | Add Certificates sidebar item |
| ui/src/router.tsx | Add certificates route |
| ui/src/api/vendor-hooks.ts | Add cert hooks here, or put them in a new file instead |