feat: certificate management with stage/activate/restore lifecycle

Provider-based architecture (Docker now, K8s later):
- CertificateManager interface + DockerCertificateManager (file-based)
- Atomic swap via .wip files for safe cert replacement
- Stage -> Activate -> Archive lifecycle with one-deep rollback
- Bootstrap supports user-supplied certs via CERT_FILE/KEY_FILE/CA_FILE
- CA bundle aggregates platform + tenant CAs, distributed to containers
- Vendor UI: Certificates page with upload, activate, restore, discard
- Stale tenant tracking (ca_applied_at) with restart banner
- Conditional TLS skip removal when CA bundle exists

Includes design spec, migration V012, service + controller tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hsiegeln
2026-04-10 18:29:02 +02:00
parent 51a1aef10e
commit 45bcc954ac
23 changed files with 2035 additions and 7 deletions


@@ -0,0 +1,242 @@
# Certificate Management Design
## Problem
The platform currently generates a self-signed TLS certificate at bootstrap time via an Alpine init container. There is no way to supply a real certificate at bootstrap, replace it at runtime, or manage CA trust bundles for tenant enterprise SSO providers. Internal services bypass TLS verification with hardcoded flags (`CAMELEER_OIDC_TLS_SKIP_VERIFY=true`, `NODE_TLS_REJECT_UNAUTHORIZED=0`).
## Goals
1. Supply a cert+key at bootstrap time (env vars pointing to files)
2. Replace the platform TLS certificate at runtime via vendor UI
3. Manage a CA trust bundle (`ca.pem`) aggregating platform CA + tenant enterprise CAs
4. Stage certificates before activation (shadow certs)
5. Roll back to the previous certificate if activation causes issues
6. Flag tenants that need restart after CA bundle changes
7. Provider-based architecture: Docker now, K8s later
## Non-Goals
- ACME/Let's Encrypt integration (separate future work)
- Per-tenant TLS certificates (all tenants share the platform cert via Traefik)
- Client certificate authentication (mTLS)
## Architecture
### Provider Interface
```java
package net.siegeln.cameleer.saas.certificate;

/** Provider abstraction for certificate storage (Docker now, K8s later). */
public interface CertificateManager {
    boolean isAvailable();

    // Metadata for each lifecycle slot
    CertificateInfo getActive();
    CertificateInfo getStaged();
    CertificateInfo getArchived();

    // Lifecycle operations
    CertValidationResult stage(byte[] certPem, byte[] keyPem, byte[] caBundlePem);
    void activate();
    void restore();
    void discardStaged();

    void generateSelfSigned(String hostname);
    byte[] getCaBundle();
}
```
The interface lives in `net.siegeln.cameleer.saas.certificate`. The implementation lives in `net.siegeln.cameleer.saas.provisioning`, alongside `DockerTenantProvisioner`.
`DockerCertificateManager` writes to the Docker `certs` volume. Future `K8sCertificateManager` would manage K8s TLS Secrets + cert-manager CRDs.
### Records
```java
import java.time.Instant;
import java.util.List;

public record CertificateInfo(
    String subject, String issuer, Instant notBefore, Instant notAfter,
    boolean hasCaBundle, boolean selfSigned, String fingerprint
) {}

public record CertValidationResult(
    boolean valid, List<String> errors, CertificateInfo info
) {}
```
### File Layout (Docker Volume)
```
/certs/
  cert.pem          <- ACTIVE platform cert (Traefik reads)
  key.pem           <- ACTIVE private key
  ca.pem            <- aggregated CA bundle (platform CA + tenant CAs)
  meta.json         <- bootstrap metadata for DB seeding
  staged/
    cert.pem        <- STAGED cert
    key.pem         <- STAGED key
    ca.pem          <- STAGED CA bundle
  prev/
    cert.pem        <- ARCHIVED (one previous)
    key.pem
    ca.pem
Atomic swap pattern: write to `*.wip`, validate, rename to final path.
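The pattern above can be sketched in Java as follows. This is a minimal illustration, not the actual `DockerCertificateManager` code: the class name `AtomicFileSwap`, the method names, and the validator callback are all hypothetical. It assumes the `.wip` file replaces the target's extension (`ca.pem` -> `ca.wip`) and that both paths sit on the same filesystem, which is what makes the rename atomic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

/** Hypothetical helper illustrating write -> validate -> rename. */
final class AtomicFileSwap {

    private AtomicFileSwap() {}

    /** Maps e.g. /certs/ca.pem to /certs/ca.wip (assumed naming scheme). */
    static Path wipPathFor(Path target) {
        String name = target.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return target.resolveSibling(base + ".wip");
    }

    /**
     * Writes content to the .wip file, validates what was written, then
     * renames it over the target. On validation failure the .wip file is
     * deleted and the target is left untouched.
     */
    static void replace(Path target, byte[] content, Predicate<byte[]> validator) throws IOException {
        Path wip = wipPathFor(target);
        Files.write(wip, content);

        byte[] written = Files.readAllBytes(wip);
        if (!validator.test(written)) {
            Files.delete(wip);
            throw new IllegalArgumentException("validation failed for " + target.getFileName());
        }

        // Whether an existing target is replaced is implementation specific
        // per the JDK docs; on POSIX filesystems this is a rename that replaces it.
        Files.move(wip, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Readers such as Traefik then see either the old file or the new one, never a partially written file.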
### Database
```sql
-- V011__certificates.sql
CREATE TABLE certificates (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status       VARCHAR(10) NOT NULL CHECK (status IN ('ACTIVE', 'STAGED', 'ARCHIVED')),
    subject      VARCHAR(500),
    issuer       VARCHAR(500),
    not_before   TIMESTAMPTZ,
    not_after    TIMESTAMPTZ,
    fingerprint  VARCHAR(128),
    has_ca       BOOLEAN NOT NULL DEFAULT FALSE,
    self_signed  BOOLEAN NOT NULL DEFAULT FALSE,
    uploaded_by  UUID,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    activated_at TIMESTAMPTZ,
    archived_at  TIMESTAMPTZ
);
```
At most 3 rows exist, one per status. On activate, the ARCHIVED row is deleted, then ACTIVE becomes ARCHIVED, then STAGED becomes ACTIVE.
Tenant staleness is tracked via a `ca_applied_at` column on the `tenants` table:
```sql
-- in same migration
ALTER TABLE tenants ADD COLUMN ca_applied_at TIMESTAMPTZ;
```
Tenants with `ca_applied_at < (active cert's activated_at)` are stale.
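A minimal sketch of that comparison in Java. The class and method names are hypothetical. How a `NULL` `ca_applied_at` should be treated is not stated in this spec; the sketch assumes a tenant that never had the CA applied counts as stale, whereas a plain SQL `<` comparison would not match `NULL` rows.

```java
import java.time.Instant;

/** Hypothetical helper; mirrors "ca_applied_at < active cert's activated_at". */
final class CaStaleness {

    private CaStaleness() {}

    static boolean isStale(Instant caAppliedAt, Instant activeActivatedAt) {
        if (activeActivatedAt == null) {
            return false; // no ACTIVE cert recorded, nothing to be stale against
        }
        if (caAppliedAt == null) {
            return true; // assumption: never applied counts as stale
        }
        return caAppliedAt.isBefore(activeActivatedAt);
    }
}
```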
### State Transitions
```
Upload -> STAGED -> activate -> ACTIVE -> (next activate) -> ARCHIVED
                                  ^                              |
                                  +------ restore ---------------+
```
- **Activate staged**: delete ARCHIVED row+files, ACTIVE -> ARCHIVED (move files to prev/), STAGED -> ACTIVE (move files to root)
- **Restore archived**: swap ACTIVE <-> ARCHIVED (swap files and DB statuses)
- **Discard staged**: delete STAGED row + staged/ files
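The three transitions can be modelled in memory as a sketch. The `CertSlots` class is hypothetical and stands in for both the DB rows and the files on the volume; each slot holds only a fingerprint string here. Rejecting `activate` and `restore` when the source slot is empty is an assumption, not something this spec states.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical in-memory model of the ACTIVE / STAGED / ARCHIVED slots. */
final class CertSlots {

    enum Status { ACTIVE, STAGED, ARCHIVED }

    private final Map<Status, String> slots = new EnumMap<>(Status.class);

    void stage(String fingerprint) {
        slots.put(Status.STAGED, fingerprint);
    }

    /** Delete ARCHIVED, move ACTIVE to ARCHIVED, move STAGED to ACTIVE. */
    void activate() {
        String staged = slots.get(Status.STAGED);
        if (staged == null) {
            throw new IllegalStateException("nothing staged");
        }
        slots.remove(Status.STAGED);
        slots.remove(Status.ARCHIVED);
        String previous = slots.remove(Status.ACTIVE);
        if (previous != null) {
            slots.put(Status.ARCHIVED, previous);
        }
        slots.put(Status.ACTIVE, staged);
    }

    /** Swap ACTIVE and ARCHIVED. */
    void restore() {
        String archived = slots.get(Status.ARCHIVED);
        if (archived == null) {
            throw new IllegalStateException("nothing archived");
        }
        String active = slots.get(Status.ACTIVE);
        slots.put(Status.ACTIVE, archived);
        if (active != null) {
            slots.put(Status.ARCHIVED, active);
        } else {
            slots.remove(Status.ARCHIVED);
        }
    }

    void discardStaged() {
        slots.remove(Status.STAGED);
    }

    String get(Status status) {
        return slots.get(status);
    }
}
```

Because activation drops the ARCHIVED slot first, rollback is only ever one level deep.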
### Bootstrap Flow
The `traefik-certs` init container gains env var support:
```
1. cert.pem + key.pem exist in volume?
   -> Yes: skip (idempotent)
   -> No:  continue
2. CERT_FILE + KEY_FILE env vars set?
   -> Yes: copy to volume, validate (PEM parseable, key matches cert)
           If CA_FILE set, copy as ca.pem
   -> No:  generate self-signed (current behavior)
3. Write /certs/meta.json with subject, fingerprint, self_signed flag
```
SaaS app reads `meta.json` on startup to seed the certificates DB table if no ACTIVE row exists.
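The branching in steps 1 and 2 of the bootstrap flow, written out as a Java sketch for clarity. The init container itself is not Java; the `BootstrapDecision` class and its `Action` enum are hypothetical and exist only to make the decision order explicit. Treating a blank env var as unset is an assumption.

```java
/** Hypothetical model of the init container's decision order. */
final class BootstrapDecision {

    enum Action { SKIP, COPY_SUPPLIED, GENERATE_SELF_SIGNED }

    private BootstrapDecision() {}

    static Action decide(boolean volumeHasCertAndKey, String certFile, String keyFile) {
        if (volumeHasCertAndKey) {
            return Action.SKIP; // idempotent: never overwrite an existing cert
        }
        if (isSet(certFile) && isSet(keyFile)) {
            return Action.COPY_SUPPLIED; // both CERT_FILE and KEY_FILE are required
        }
        return Action.GENERATE_SELF_SIGNED; // current behavior
    }

    // Assumption: a blank value is treated the same as an unset variable.
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```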
### REST API
All under `platform:admin` scope:
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/vendor/certificates` | List active, staged, archived |
| POST | `/api/vendor/certificates/stage` | Upload cert+key+ca (multipart) |
| POST | `/api/vendor/certificates/activate` | Promote staged -> active |
| POST | `/api/vendor/certificates/restore` | Swap archived <-> active |
| DELETE | `/api/vendor/certificates/staged` | Discard staged |
| GET | `/api/vendor/certificates/stale-tenants` | Tenants needing restart for CA |
### Service Layer
`CertificateService` orchestrates:
- Validation (PEM parsing, key-cert match, chain building, expiry check)
- Delegates file operations to `CertificateManager` (provider)
- Manages DB metadata
- Computes tenant CA staleness
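Two of the validation steps, key-cert match and expiry check, can be sketched with the JDK alone. This is one possible approach, not necessarily what `CertificateService` does: it signs a random challenge with the private key and verifies it with the certificate's public key. The `CertChecks` class is hypothetical, and support is limited to RSA and EC keys as an assumption.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.SignatureException;
import java.time.Instant;

/** Hypothetical validation helpers. */
final class CertChecks {

    private CertChecks() {}

    /** True if the private key belongs to the given public key (sign/verify round trip). */
    static boolean keyMatches(PublicKey certPublicKey, PrivateKey privateKey)
            throws GeneralSecurityException {
        String algorithm;
        if ("RSA".equals(privateKey.getAlgorithm())) {
            algorithm = "SHA256withRSA";
        } else if ("EC".equals(privateKey.getAlgorithm())) {
            algorithm = "SHA256withECDSA";
        } else {
            throw new IllegalArgumentException("unsupported key type: " + privateKey.getAlgorithm());
        }
        if (!privateKey.getAlgorithm().equals(certPublicKey.getAlgorithm())) {
            return false;
        }

        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);

        Signature signer = Signature.getInstance(algorithm);
        signer.initSign(privateKey);
        signer.update(challenge);
        byte[] signature = signer.sign();

        Signature verifier = Signature.getInstance(algorithm);
        verifier.initVerify(certPublicKey);
        verifier.update(challenge);
        try {
            return verifier.verify(signature);
        } catch (SignatureException e) {
            return false; // e.g. signature length does not fit the public key
        }
    }

    /** True if now lies within [notBefore, notAfter], both ends inclusive. */
    static boolean isWithinValidity(Instant notBefore, Instant notAfter, Instant now) {
        return !now.isBefore(notBefore) && !now.isAfter(notAfter);
    }
}
```

With a parsed `java.security.cert.X509Certificate`, the inputs would come from `getPublicKey()`, `getNotBefore().toInstant()` and `getNotAfter().toInstant()`.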
### CA Bundle Management
`ca.pem` is a concatenation of:
- Platform cert's CA (if from a private CA, supplied at bootstrap or upload)
- Tenant-supplied CAs (for enterprise SSO with private IdPs)
On any CA change (platform cert upload with CA, tenant CA add/remove):
1. Rebuild: concatenate all CAs into `ca.wip`
2. Validate: parse all PEM entries, verify structure
3. Atomic swap: `mv ca.wip ca.pem`
4. Update `activated_at` on ACTIVE cert row
5. Flag tenants as stale
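Steps 1 and 2 of the rebuild might look like the sketch below. It is a structural check only: it extracts `BEGIN/END CERTIFICATE` blocks and confirms the body is decodable Base64, but it does not parse X.509 (a real implementation would likely use `CertificateFactory` for that). The `CaBundleBuilder` class is hypothetical, and rejecting a source that contains no certificate is an assumption. The resulting text would then be written to `ca.wip` and renamed to `ca.pem`.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: concatenates CA PEM sources into one bundle. */
final class CaBundleBuilder {

    private static final Pattern PEM_CERT = Pattern.compile(
        "-----BEGIN CERTIFICATE-----\\s*([A-Za-z0-9+/=\\s]+?)\\s*-----END CERTIFICATE-----");

    private CaBundleBuilder() {}

    /** Returns every certificate block found in the text; text outside blocks is ignored. */
    static List<String> extractCertBlocks(String pem) {
        List<String> blocks = new ArrayList<>();
        Matcher m = PEM_CERT.matcher(pem);
        while (m.find()) {
            // Throws IllegalArgumentException if the body is not valid Base64.
            Base64.getMimeDecoder().decode(m.group(1));
            blocks.add(m.group().trim());
        }
        return blocks;
    }

    /** Concatenates all blocks from all sources, one block after another. */
    static String concatenate(List<String> sources) {
        StringBuilder out = new StringBuilder();
        for (String source : sources) {
            List<String> blocks = extractCertBlocks(source);
            if (blocks.isEmpty()) {
                throw new IllegalArgumentException("no certificate found in source");
            }
            for (String block : blocks) {
                out.append(block).append('\n');
            }
        }
        return out.toString();
    }
}
```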
### Tenant CA Distribution
At provisioning time (`DockerTenantProvisioner`):
- Mount `certs` volume read-only at `/certs` in tenant containers
- Java servers: JVM truststore import at entrypoint or `JAVA_OPTS` with custom truststore
- Node containers: `NODE_EXTRA_CA_CERTS=/certs/ca.pem`
- Set `ca_applied_at = now()` on tenant record
- Remove TLS skip flags when `ca.pem` exists
On tenant restart (manual, after CA change):
- Container picks up current `ca.pem` from volume mount
- Update `ca_applied_at` on tenant
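A sketch of the conditional environment for a tenant container. The `TenantTlsEnv` class is hypothetical; the variable names are the ones mentioned in this spec. Which variables apply to which container type, and how the JVM truststore is wired, is left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: TLS-related env vars for a tenant container. */
final class TenantTlsEnv {

    private TenantTlsEnv() {}

    /**
     * With a CA bundle present, point Node at it and omit the skip flags.
     * Without one, keep the existing skip flags so self-signed setups still work.
     */
    static Map<String, String> build(boolean caBundleExists) {
        Map<String, String> env = new LinkedHashMap<>();
        if (caBundleExists) {
            env.put("NODE_EXTRA_CA_CERTS", "/certs/ca.pem");
        } else {
            env.put("CAMELEER_OIDC_TLS_SKIP_VERIFY", "true");
            env.put("NODE_TLS_REJECT_UNAUTHORIZED", "0");
        }
        return env;
    }
}
```

A caller could pass `Files.exists(Path.of("/certs/ca.pem"))` as the flag at provisioning time.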
### Vendor UI
New "Certificates" page in vendor sidebar:
- **Active cert card**: subject, issuer, expiry, fingerprint, self-signed badge, activated date
- **Staged cert card** (conditional): same metadata + Activate / Discard buttons, validation errors if any
- **Archived cert card** (conditional): same metadata + Restore button (disabled if expired)
- **Upload area**: file inputs for cert.pem (required), key.pem (required), ca.pem (optional)
- **Stale tenants banner**: "CA bundle updated - N tenants need restart" with restart action
### React Hooks
```typescript
useVendorCertificates()   // GET /vendor/certificates
useStageCertificate()     // POST multipart
useActivateCertificate()  // POST activate
useRestoreCertificate()   // POST restore
useDiscardStaged()        // DELETE staged
useStaleTenants()         // GET stale-tenants
```
## File Inventory
### New Files
| File | Description |
|------|-------------|
| `src/.../certificate/CertificateManager.java` | Provider interface |
| `src/.../certificate/CertificateInfo.java` | Cert metadata record |
| `src/.../certificate/CertValidationResult.java` | Validation result record |
| `src/.../certificate/CertificateEntity.java` | JPA entity |
| `src/.../certificate/CertificateRepository.java` | Spring Data repo |
| `src/.../certificate/CertificateService.java` | Business logic |
| `src/.../certificate/CertificateController.java` | REST endpoints |
| `src/.../provisioning/DockerCertificateManager.java` | Docker volume implementation |
| `src/main/resources/db/migration/V011__certificates.sql` | Migration |
| `ui/src/api/certificate-hooks.ts` | React Query hooks |
| `ui/src/pages/vendor/CertificatesPage.tsx` | Vendor UI page |
### Modified Files
| File | Change |
|------|--------|
| `docker-compose.yml` | Add CERT_FILE/KEY_FILE/CA_FILE env vars to init container |
| `traefik.yml` | No change (already reads from /certs/) |
| `src/.../provisioning/DockerTenantProvisioner.java` | Mount certs volume, set CA env vars, remove TLS skip flags |
| `ui/src/components/Layout.tsx` | Add Certificates sidebar item |
| `ui/src/router.tsx` | Add certificates route |
| `ui/src/api/vendor-hooks.ts` | Add cert hooks here, or use the new `certificate-hooks.ts` file instead |