feat: certificate management with stage/activate/restore lifecycle

Provider-based architecture (Docker now, K8s later):
- CertificateManager interface + DockerCertificateManager (file-based)
- Atomic swap via .wip files for safe cert replacement
- Stage -> Activate -> Archive lifecycle with one-deep rollback
- Bootstrap supports user-supplied certs via CERT_FILE/KEY_FILE/CA_FILE
- CA bundle aggregates platform + tenant CAs, distributed to containers
- Vendor UI: Certificates page with upload, activate, restore, discard
- Stale tenant tracking (ca_applied_at) with restart banner
- Conditional TLS skip removal when CA bundle exists

Includes design spec, migration V012, service + controller tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,242 @@
# Certificate Management Design

## Problem

The platform currently generates a self-signed TLS certificate at bootstrap time via an Alpine init container. There is no way to supply a real certificate at bootstrap, replace it at runtime, or manage CA trust bundles for tenant enterprise SSO providers. Internal services bypass TLS verification with hardcoded flags (`CAMELEER_OIDC_TLS_SKIP_VERIFY=true`, `NODE_TLS_REJECT_UNAUTHORIZED=0`).

## Goals

1. Supply a cert+key at bootstrap time (env vars pointing to files)
2. Replace the platform TLS certificate at runtime via vendor UI
3. Manage a CA trust bundle (`ca.pem`) aggregating platform CA + tenant enterprise CAs
4. Stage certificates before activation (shadow certs)
5. Roll back to the previous certificate if activation causes issues
6. Flag tenants that need restart after CA bundle changes
7. Provider-based architecture: Docker now, K8s later

## Non-Goals

- ACME/Let's Encrypt integration (separate future work)
- Per-tenant TLS certificates (all tenants share the platform cert via Traefik)
- Client certificate authentication (mTLS)

## Architecture

### Provider Interface

```java
package net.siegeln.cameleer.saas.certificate;

public interface CertificateManager {
    boolean isAvailable();

    CertificateInfo getActive();
    CertificateInfo getStaged();
    CertificateInfo getArchived();

    CertValidationResult stage(byte[] certPem, byte[] keyPem, byte[] caBundlePem);
    void activate();
    void restore();
    void discardStaged();

    void generateSelfSigned(String hostname);
    byte[] getCaBundle();
}
```

The interface lives in `net.siegeln.cameleer.saas.certificate`. The implementation lives in `net.siegeln.cameleer.saas.provisioning`, alongside `DockerTenantProvisioner`.

`DockerCertificateManager` writes to the Docker `certs` volume. A future `K8sCertificateManager` would manage K8s TLS Secrets + cert-manager CRDs.

### Records

```java
import java.time.Instant;
import java.util.List;

public record CertificateInfo(
    String subject, String issuer, Instant notBefore, Instant notAfter,
    boolean hasCaBundle, boolean selfSigned, String fingerprint
) {}

public record CertValidationResult(
    boolean valid, List<String> errors, CertificateInfo info
) {}
```

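The format of the `fingerprint` field is not specified in this design. One plausible choice, shown purely as a sketch, is the lowercase hex SHA-256 digest of the certificate's DER encoding. `CertFingerprint` and `sha256Hex` are illustrative names and not part of the spec.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CertFingerprint {

    /** Returns the lowercase hex SHA-256 digest of the given bytes (assumed fingerprint format). */
    static String sha256Hex(byte[] derEncoded) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(derEncoded);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory for every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

With a parsed `java.security.cert.X509Certificate`, the input would be `cert.getEncoded()`. The 64-character result fits the `VARCHAR(128)` column defined in the Database section.
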
### File Layout (Docker Volume)

```
/certs/
  cert.pem        <- ACTIVE platform cert (Traefik reads)
  key.pem         <- ACTIVE private key
  ca.pem          <- aggregated CA bundle (platform CA + tenant CAs)
  meta.json       <- bootstrap metadata for DB seeding
  staged/
    cert.pem      <- STAGED cert
    key.pem       <- STAGED key
    ca.pem        <- STAGED CA bundle
  prev/
    cert.pem      <- ARCHIVED (one previous)
    key.pem
    ca.pem
```

Atomic swap pattern: write to `*.wip`, validate, rename to final path.

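The write/validate/rename pattern could look roughly like the following. This is a minimal sketch, not the `DockerCertificateManager` code: `AtomicPemWriter`, `writeAtomically`, and `looksLikePem` are hypothetical names, and the validator is a placeholder for real PEM parsing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;

final class AtomicPemWriter {

    /**
     * Writes content to a sibling "*.wip" file (ca.pem -> ca.wip), validates it,
     * then renames it over the target. Returns false and leaves the target
     * untouched when validation fails.
     */
    static boolean writeAtomically(Path target, byte[] content, Predicate<byte[]> validator) {
        String name = target.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        Path wip = target.resolveSibling(stem + ".wip");
        try {
            Files.write(wip, content);
            if (!validator.test(Files.readAllBytes(wip))) {
                Files.deleteIfExists(wip);
                return false;
            }
            // Same-directory rename. Whether ATOMIC_MOVE replaces an existing
            // target is implementation specific; on Linux it maps to rename(2),
            // which replaces it.
            Files.move(wip, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Placeholder check only: the bytes contain PEM BEGIN/END markers. */
    static boolean looksLikePem(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.US_ASCII);
        return text.contains("-----BEGIN ") && text.contains("-----END ");
    }
}
```

Because the `.wip` file lives in the same directory as the target, the rename stays on one file system, so a reader such as Traefik should see either the old file or the new one, never a partial write.
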
### Database

```sql
-- V011__certificates.sql
CREATE TABLE certificates (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status       VARCHAR(10) NOT NULL CHECK (status IN ('ACTIVE', 'STAGED', 'ARCHIVED')),
    subject      VARCHAR(500),
    issuer       VARCHAR(500),
    not_before   TIMESTAMPTZ,
    not_after    TIMESTAMPTZ,
    fingerprint  VARCHAR(128),
    has_ca       BOOLEAN NOT NULL DEFAULT FALSE,
    self_signed  BOOLEAN NOT NULL DEFAULT FALSE,
    uploaded_by  UUID,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    activated_at TIMESTAMPTZ,
    archived_at  TIMESTAMPTZ
);
```

At most 3 rows: one per status. On activate: delete ARCHIVED -> ACTIVE becomes ARCHIVED -> STAGED becomes ACTIVE.

Tenant staleness tracked via `ca_applied_at` column on `tenants` table:

```sql
-- in same migration
ALTER TABLE tenants ADD COLUMN ca_applied_at TIMESTAMPTZ;
```

Tenants with `ca_applied_at < (active cert's activated_at)` are stale.

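The staleness rule can be expressed as a small predicate. This is a sketch with hypothetical names (`TenantStaleness`, `isStale`). Treating a `NULL` `ca_applied_at` as stale is an assumption, not something the spec states. Note that in SQL a plain `ca_applied_at < activated_at` filter would not match `NULL` rows, because the comparison yields `NULL`.

```java
import java.time.Instant;

final class TenantStaleness {

    /** True when the tenant's CA snapshot predates the active cert's activated_at. */
    static boolean isStale(Instant caAppliedAt, Instant activeActivatedAt) {
        if (activeActivatedAt == null) {
            return false; // no ACTIVE row (or never activated): nothing to compare against
        }
        if (caAppliedAt == null) {
            return true; // assumption: a tenant that never applied the bundle counts as stale
        }
        return caAppliedAt.isBefore(activeActivatedAt);
    }
}
```
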
### State Transitions

```
Upload -> STAGED -> activate -> ACTIVE -> (next activate) -> ARCHIVED
                                  ^                              |
                                  +---------- restore -----------+
```

- **Activate staged**: delete ARCHIVED row+files, ACTIVE -> ARCHIVED (move files to prev/), STAGED -> ACTIVE (move files to root)
- **Restore archived**: swap ACTIVE <-> ARCHIVED (swap files and DB statuses)
- **Discard staged**: delete STAGED row + staged/ files

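The three transitions can be modelled in memory as one slot per status. The sketch below tracks only a fingerprint per slot and ignores files and DB rows. `CertLifecycle` and its methods are hypothetical, and replacing an existing staged cert on a second upload is an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

final class CertLifecycle {
    enum Status { ACTIVE, STAGED, ARCHIVED }

    // One slot per status mirrors the "at most 3 rows" rule; values are fingerprints.
    private final Map<Status, String> slots = new EnumMap<>(Status.class);

    void stage(String fingerprint) {
        slots.put(Status.STAGED, fingerprint); // assumption: a new upload replaces any staged cert
    }

    void activate() {
        String staged = slots.get(Status.STAGED);
        if (staged == null) {
            throw new IllegalStateException("nothing staged");
        }
        slots.remove(Status.ARCHIVED);               // delete ARCHIVED
        String active = slots.remove(Status.ACTIVE);
        if (active != null) {
            slots.put(Status.ARCHIVED, active);      // ACTIVE -> ARCHIVED
        }
        slots.put(Status.ACTIVE, staged);            // STAGED -> ACTIVE
        slots.remove(Status.STAGED);
    }

    void restore() {
        String archived = slots.get(Status.ARCHIVED);
        if (archived == null) {
            throw new IllegalStateException("nothing archived");
        }
        String active = slots.get(Status.ACTIVE);
        slots.put(Status.ACTIVE, archived);          // swap ACTIVE <-> ARCHIVED
        if (active != null) {
            slots.put(Status.ARCHIVED, active);
        } else {
            slots.remove(Status.ARCHIVED);
        }
    }

    void discardStaged() {
        slots.remove(Status.STAGED);
    }

    String get(Status status) {
        return slots.get(status);
    }
}
```

Since restore is a swap, calling it twice returns to the original state, which matches the one-deep rollback described above.
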
### Bootstrap Flow

The `traefik-certs` init container gains env var support:

```
1. cert.pem + key.pem exist in volume?
   -> Yes: skip (idempotent)
   -> No:  continue

2. CERT_FILE + KEY_FILE env vars set?
   -> Yes: copy to volume, validate (PEM parseable, key matches cert)
           If CA_FILE set, copy as ca.pem
   -> No:  generate self-signed (current behavior)

3. Write /certs/meta.json with subject, fingerprint, self_signed flag
```

SaaS app reads `meta.json` on startup to seed the certificates DB table if no ACTIVE row exists.

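The precedence in steps 1 and 2 can be modelled as a pure decision function. The init container itself is an Alpine container, so this Java sketch only illustrates the logic. `BootstrapDecision`, `Action`, and `decide` are hypothetical names, and falling back to self-signed when only one of `CERT_FILE`/`KEY_FILE` is set is an assumption.

```java
import java.util.Map;

final class BootstrapDecision {
    enum Action { SKIP, USE_SUPPLIED, GENERATE_SELF_SIGNED }

    static Action decide(boolean certAndKeyInVolume, Map<String, String> env) {
        if (certAndKeyInVolume) {
            return Action.SKIP; // step 1: idempotent, never overwrite an existing pair
        }
        if (isSet(env, "CERT_FILE") && isSet(env, "KEY_FILE")) {
            return Action.USE_SUPPLIED; // step 2: CA_FILE is optional and handled separately
        }
        // Step 2 fallback; assumed to also apply when only one variable is set.
        return Action.GENERATE_SELF_SIGNED;
    }

    private static boolean isSet(Map<String, String> env, String name) {
        String value = env.get(name);
        return value != null && !value.isBlank();
    }
}
```
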
### REST API

All under `platform:admin` scope:

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/vendor/certificates` | List active, staged, archived |
| POST | `/api/vendor/certificates/stage` | Upload cert+key+ca (multipart) |
| POST | `/api/vendor/certificates/activate` | Promote staged -> active |
| POST | `/api/vendor/certificates/restore` | Swap archived <-> active |
| DELETE | `/api/vendor/certificates/staged` | Discard staged |
| GET | `/api/vendor/certificates/stale-tenants` | Tenants needing restart for CA |

### Service Layer

`CertificateService`:
- Validates uploads (PEM parsing, key-cert match, chain building, expiry check)
- Delegates file operations to `CertificateManager` (provider)
- Manages DB metadata
- Computes tenant CA staleness

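One way to implement the key-cert match check, assuming RSA or EC keys, is to sign a random challenge with the private key and verify it with the certificate's public key (`X509Certificate.getPublicKey()` in real use). This is a sketch, not the service's code. `KeyMatch`, `matches`, and `demoRsaKeyPair` are hypothetical names, and `demoRsaKeyPair` exists only to exercise the check.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;

final class KeyMatch {

    /** True when a signature made with privateKey verifies under certPublicKey. */
    static boolean matches(PublicKey certPublicKey, PrivateKey privateKey) {
        String algorithm;
        if ("RSA".equals(privateKey.getAlgorithm())) {
            algorithm = "SHA256withRSA";
        } else if ("EC".equals(privateKey.getAlgorithm())) {
            algorithm = "SHA256withECDSA";
        } else {
            return false; // assumption: other key types are rejected
        }
        try {
            byte[] challenge = new byte[32];
            new SecureRandom().nextBytes(challenge);

            Signature signer = Signature.getInstance(algorithm);
            signer.initSign(privateKey);
            signer.update(challenge);
            byte[] signature = signer.sign();

            Signature verifier = Signature.getInstance(algorithm);
            verifier.initVerify(certPublicKey);
            verifier.update(challenge);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // mismatched key types or malformed keys count as "no match"
        }
    }

    /** Demo helper: generates a throwaway 2048-bit RSA key pair. */
    static KeyPair demoRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A sign/verify round trip avoids algorithm-specific comparisons such as matching RSA moduli, at the cost of one signature operation per upload.
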
### CA Bundle Management

`ca.pem` is a concatenation of:
- Platform cert's CA (if from a private CA, supplied at bootstrap or upload)
- Tenant-supplied CAs (for enterprise SSO with private IdPs)

On any CA change (platform cert upload with CA, tenant CA add/remove):
1. Rebuild: concatenate all CAs into `ca.wip`
2. Validate: parse all PEM entries, verify structure
3. Atomic swap: `mv ca.wip ca.pem`
4. Update `activated_at` on ACTIVE cert row (so tenants with an older `ca_applied_at` compare as stale)
5. Flag tenants as stale

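Steps 1 and 2 could be sketched as follows. The validation here is structural only (every `CERTIFICATE` block must hold valid Base64). A real implementation would likely also parse each entry as X.509, for example with `java.security.cert.CertificateFactory`. `CaBundle`, `concatenate`, and `parseEntries` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaBundle {
    private static final Pattern CERT_BLOCK = Pattern.compile(
            "-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", Pattern.DOTALL);

    /** Step 1: concatenates PEM documents, each terminated by exactly one newline. */
    static String concatenate(List<String> pemDocuments) {
        StringBuilder bundle = new StringBuilder();
        for (String pem : pemDocuments) {
            bundle.append(pem.strip()).append('\n');
        }
        return bundle.toString();
    }

    /**
     * Step 2: returns the decoded payload of every CERTIFICATE block.
     * Throws IllegalArgumentException when a block is not valid Base64.
     */
    static List<byte[]> parseEntries(String bundle) {
        List<byte[]> entries = new ArrayList<>();
        Matcher matcher = CERT_BLOCK.matcher(bundle);
        while (matcher.find()) {
            String base64 = matcher.group(1).replaceAll("\\s", "");
            entries.add(Base64.getDecoder().decode(base64));
        }
        return entries;
    }
}
```

Comparing the number of parsed entries with the number of input CAs is a cheap sanity check before the atomic swap in step 3.
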
### Tenant CA Distribution

At provisioning time (`DockerTenantProvisioner`):
- Mount `certs` volume read-only at `/certs` in tenant containers
- Java servers: JVM truststore import at entrypoint or `JAVA_OPTS` with custom truststore
- Node containers: `NODE_EXTRA_CA_CERTS=/certs/ca.pem`
- Set `ca_applied_at = now()` on tenant record
- Remove TLS skip flags when `ca.pem` exists

On tenant restart (manual, after CA change):
- Container picks up current `ca.pem` from volume mount
- Update `ca_applied_at` on tenant

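The conditional removal of the TLS skip flags might be expressed as below. This is a sketch under assumptions: `TenantTlsEnv` and `forTenant` are hypothetical names, the skip flags are assumed to remain in place while no `ca.pem` exists, and which container receives which variable is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TenantTlsEnv {

    /** Builds the TLS-related environment for a tenant container. */
    static Map<String, String> forTenant(boolean caBundleExists) {
        Map<String, String> env = new LinkedHashMap<>();
        if (caBundleExists) {
            // Trust the aggregated bundle instead of disabling verification.
            env.put("NODE_EXTRA_CA_CERTS", "/certs/ca.pem");
        } else {
            // Assumed fallback: keep the existing skip flags until a bundle exists.
            env.put("CAMELEER_OIDC_TLS_SKIP_VERIFY", "true");
            env.put("NODE_TLS_REJECT_UNAUTHORIZED", "0");
        }
        return env;
    }
}
```
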
### Vendor UI

New "Certificates" page in vendor sidebar:

- **Active cert card**: subject, issuer, expiry, fingerprint, self-signed badge, activated date
- **Staged cert card** (conditional): same metadata + Activate / Discard buttons, validation errors if any
- **Archived cert card** (conditional): same metadata + Restore button (disabled if expired)
- **Upload area**: file inputs for cert.pem (required), key.pem (required), ca.pem (optional)
- **Stale tenants banner**: "CA bundle updated - N tenants need restart" with restart action

### React Hooks

```typescript
useVendorCertificates()   // GET /vendor/certificates
useStageCertificate()     // POST multipart
useActivateCertificate()  // POST activate
useRestoreCertificate()   // POST restore
useDiscardStaged()        // DELETE staged
useStaleTenants()         // GET stale-tenants
```

## File Inventory

### New Files

| File | Description |
|------|-------------|
| `src/.../certificate/CertificateManager.java` | Provider interface |
| `src/.../certificate/CertificateInfo.java` | Cert metadata record |
| `src/.../certificate/CertValidationResult.java` | Validation result record |
| `src/.../certificate/CertificateEntity.java` | JPA entity |
| `src/.../certificate/CertificateRepository.java` | Spring Data repo |
| `src/.../certificate/CertificateService.java` | Business logic |
| `src/.../certificate/CertificateController.java` | REST endpoints |
| `src/.../provisioning/DockerCertificateManager.java` | Docker volume implementation |
| `src/main/resources/db/migration/V011__certificates.sql` | Migration |
| `ui/src/api/certificate-hooks.ts` | React Query hooks |
| `ui/src/pages/vendor/CertificatesPage.tsx` | Vendor UI page |

### Modified Files

| File | Change |
|------|--------|
| `docker-compose.yml` | Add CERT_FILE/KEY_FILE/CA_FILE env vars to init container |
| `traefik.yml` | No change (already reads from /certs/) |
| `src/.../provisioning/DockerTenantProvisioner.java` | Mount certs volume, set CA env vars, remove TLS skip flags |
| `ui/src/components/Layout.tsx` | Add Certificates sidebar item |
| `ui/src/router.tsx` | Add certificates route |
| `ui/src/api/vendor-hooks.ts` | Add cert hooks here, or use the new `certificate-hooks.ts` file instead |