Secret delivery option 3: Application-level encryption at rest + runtime decryption #131

Open
opened 2026-04-15 00:32:04 +02:00 by claude · 0 comments

Parent epic: #129

Overview

Encrypt secret values (AES-256-GCM) in the application layer before storing them in PostgreSQL. Decrypt at deploy time in DeploymentExecutor.buildEnvVars() before passing them to Docker containers. This follows the same general pattern as Heroku, Render, and Fly.io: secrets are encrypted at rest and delivered as env vars.


Current State

Secrets flow as plaintext today:

  1. Admin sets customEnvVars via API → stored as plaintext JSONB in apps.container_config
  2. Snapshotted as plaintext in deployments.resolved_config after ConfigMerger
  3. DeploymentExecutor.buildEnvVars() reads and passes to Docker as env vars

Existing key material: JWT secret (HMAC-SHA256), Ed25519 key (derived from JWT secret via HMAC). A new, independent encryption key is needed.


Encryption Approach Comparison

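| Approach | Protects DB dump | Protects vs DBA | Protects vs SQLi | Key in SQL logs | Needs PG extension | Complexity |
|----------|------------------|-----------------|------------------|-----------------|--------------------|------------|
| Application-level AES-256-GCM | Yes | Yes | Yes | No | No | Medium |
| pgcrypto | Yes | No | No | Yes | Yes | Low |
| TDE (Percona pg_tde) | Yes | No | No | N/A | Yes | High (ops) |
| Full-disk encryption (LUKS) | Yes (physical) | No | No | N/A | No | Low (ops) |
| Hybrid: app-level + FDE | Yes | Yes | Yes | No | No | Medium |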

Recommendation: Application-level encryption + infrastructure FDE as defense-in-depth. pgcrypto is unsuitable (the key is passed in SQL statements and can therefore appear in SQL logs). TDE only protects against disk-level theft.


Java Library Comparison

| Library | Recommendation | Notes |
|---------|----------------|-------|
| Google Tink | Recommended | Misuse-resistant API, built-in key rotation, AEAD first-class, envelope encryption primitives. No cloud KMS required (local keysets). |
| JCA/JCE (built-in) | Good | No dependency; Cipher.getInstance("AES/GCM/NoPadding"). Manual IV management, no key rotation. |
| Bouncy Castle | Good | More algorithms, larger dependency. Overkill for AES-256-GCM. |
| Spring Security Crypto | Acceptable | Encryptors.stronger() wraps JCA. Limited API. |
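
To make the "manual IV management" note on the JCA/JCE row concrete, here is a minimal sketch using only the built-in JCA classes. It is not the recommended Tink path and not project code: the class name JcaAesGcm and the iv || ciphertext || tag layout are assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper (name and byte layout are assumptions, not project code).
final class JcaAesGcm {
    private static final int IV_BYTES = 12;   // 96-bit IV, the usual GCM size
    private static final int TAG_BITS = 128;  // full-length authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns iv || ciphertext || tag; a fresh random IV is drawn on every call.
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_BYTES + sealed.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(sealed, 0, out, IV_BYTES, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reads the IV from the front of the blob, then decrypts and verifies the tag.
    static byte[] decrypt(byte[] key, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            return cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed: wrong key or modified data", e);
        }
    }
}
```

The caller has to guarantee that an IV is never reused under the same key and has to define its own storage layout and key versioning. These are the parts a higher-level library such as Tink would otherwise take care of.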

Key Management

Where to Store the Master Key (KEK)

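| Option | Security | Ops Complexity | Self-Hosted? | SaaS? |
|--------|----------|----------------|--------------|-------|
| Environment variable | Medium | Low | Yes | Yes |
| File on tmpfs / K8s Secret | Medium-High | Low | Yes | Yes |
| HashiCorp Vault Transit | High | High | Possible | Good |
| Cloud KMS | High | Medium | No | Cloud only |
| Derived from JWT secret | Not recommended | None | n/a | n/a |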

Recommendation: CAMELEER_SERVER_SECURITY_ENCRYPTIONKEY as env var, backed by a K8s Secret (tmpfs). This follows the same pattern as Heroku, Render, and Fly.io. The "circular problem" (a key held in an env var protecting secrets that themselves end up in env vars) is a well-understood trade-off: the encryption key protects against a different threat vector (DB dumps, backups, SQLi) than env var exposure.
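
A minimal sketch of how the server might read and validate this variable at startup. Only the env var name comes from this issue; the class EncryptionKeyLoader and its fail-fast behaviour are assumptions.

```java
import java.util.Base64;

// Hypothetical loader; only the env var name is taken from the issue text.
final class EncryptionKeyLoader {
    static final String ENV_NAME = "CAMELEER_SERVER_SECURITY_ENCRYPTIONKEY";

    // Decodes a Base64 string and checks that it holds exactly 256 bits.
    static byte[] parse(String base64) {
        if (base64 == null || base64.trim().isEmpty()) {
            throw new IllegalStateException(ENV_NAME + " is not set");
        }
        byte[] key;
        try {
            key = Base64.getDecoder().decode(base64.trim());
        } catch (IllegalArgumentException e) {
            throw new IllegalStateException(ENV_NAME + " is not valid Base64", e);
        }
        if (key.length != 32) {
            // Only the length is reported, never the key material itself.
            throw new IllegalStateException(
                    ENV_NAME + " must decode to 32 bytes, got " + key.length);
        }
        return key;
    }

    // Reads the variable from the process environment and validates it.
    static byte[] fromEnvironment() {
        return parse(System.getenv(ENV_NAME));
    }
}
```

Failing at startup when the key is missing or malformed avoids a situation where the server silently writes plaintext. A suitable value can be generated with, for example, `openssl rand -base64 32`.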

Why NOT Derive from JWT Secret

The Ed25519 key is already derived from JWT secret. While HMAC-SHA256 with distinct context strings is cryptographically sound (NIST SP 800-108), compromising the JWT secret would expose all derived keys simultaneously. Key independence is worth one extra env var.
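
The dependency can be illustrated with a simplified single-block HMAC derivation. This is a sketch of the rejected approach, not a full SP 800-108 construction, and not the way the existing Ed25519 derivation is necessarily implemented; the class name and the context labels in the usage note are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Hypothetical illustration of context-separated key derivation.
final class DerivedKeys {
    // Derives a 32-byte subkey as HMAC-SHA256(master, context).
    static byte[] derive(byte[] master, String context) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(master, "HmacSHA256"));
            return mac.doFinal(context.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key derivation failed", e);
        }
    }
}
```

Calling derive(jwtSecret, "signing") and derive(jwtSecret, "secret-encryption") yields two unrelated-looking keys, but both are fully determined by jwtSecret. Anyone who obtains the JWT secret can recompute every derived key, which is the reason for using an independent key instead.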

Envelope Encryption Pattern (DEK/KEK)

KEK (env var / K8s secret)
  ├── wraps DEK-1 (per-app, stored encrypted in apps table)
  │     └── encrypts app-1 customEnvVars
  └── wraps DEK-2 (per-app)
        └── encrypts app-2 customEnvVars

Benefits: KEK rotation doesn't require re-encrypting all data (only re-wrap DEKs). Per-app DEKs provide tenant isolation. In SaaS mode, per-tenant KEK is automatic (separate server env per tenant).
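
The DEK/KEK relationship above could look roughly like the following. This sketch uses plain JCA AES-GCM for both layers; the class name EnvelopeSketch, the method names, and the iv || ciphertext || tag layout are assumptions, and a Tink-based implementation would be structured differently.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical envelope-encryption sketch: the KEK seals DEKs, a DEK seals values.
final class EnvelopeSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a random 256-bit key, usable as either KEK or DEK.
    static byte[] newKey() {
        byte[] key = new byte[32];
        RANDOM.nextBytes(key);
        return key;
    }

    // AES-256-GCM with a fresh 96-bit IV; returns iv || ciphertext || tag.
    static byte[] seal(byte[] key, byte[] data) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] sealed = c.doFinal(data);
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("seal failed", e);
        }
    }

    // Inverse of seal; throws if the key is wrong or the blob was modified.
    static byte[] open(byte[] key, byte[] blob) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, blob, 0, 12));
            return c.doFinal(blob, 12, blob.length - 12);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("open failed", e);
        }
    }
}
```

In this sketch, seal(kek, dek) produces the wrapped DEK that would be stored per app, and seal(dek, value) produces the stored secret. Reading a secret takes two steps: open(kek, wrappedDek) to recover the DEK, then open(dek, storedValue).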

Key Rotation Strategy

  1. Generate new KEK (v2), keep old KEK (v1)
  2. On read: try KEK-v2 first, fall back to KEK-v1
  3. Background job re-wraps all DEKs with KEK-v2
  4. Remove KEK-v1 from configuration

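Store a kek_version tag alongside each encrypted DEK, so that a read can select the matching KEK directly and the background job can tell which DEKs still need re-wrapping.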
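
A sketch of the re-wrap step under these assumptions: KEKs are held in a map keyed by version, and each wrapped DEK carries its kek_version. The class KekRotationSketch, the WrappedDek type, and the blob layout are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;

// Hypothetical rotation sketch; KEKs are looked up by their version number.
final class KekRotationSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // A wrapped DEK plus the kek_version tag stored next to it.
    static final class WrappedDek {
        final int kekVersion;
        final byte[] blob; // iv || ciphertext || tag
        WrappedDek(int kekVersion, byte[] blob) {
            this.kekVersion = kekVersion;
            this.blob = blob;
        }
    }

    static WrappedDek wrap(Map<Integer, byte[]> keks, int version, byte[] dek) {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        byte[] sealed = aesGcm(Cipher.ENCRYPT_MODE, lookup(keks, version), iv, dek, 0, dek.length);
        byte[] blob = Arrays.copyOf(iv, iv.length + sealed.length);
        System.arraycopy(sealed, 0, blob, iv.length, sealed.length);
        return new WrappedDek(version, blob);
    }

    static byte[] unwrap(Map<Integer, byte[]> keks, WrappedDek w) {
        byte[] iv = Arrays.copyOfRange(w.blob, 0, 12);
        return aesGcm(Cipher.DECRYPT_MODE, lookup(keks, w.kekVersion), iv,
                w.blob, 12, w.blob.length - 12);
    }

    // Step 3 of the rotation: unwrap with the old KEK, wrap again with the new one.
    static WrappedDek rewrap(Map<Integer, byte[]> keks, WrappedDek old, int newVersion) {
        return wrap(keks, newVersion, unwrap(keks, old));
    }

    private static byte[] lookup(Map<Integer, byte[]> keks, int version) {
        byte[] kek = keks.get(version);
        if (kek == null) {
            throw new IllegalStateException("no KEK configured for version " + version);
        }
        return kek;
    }

    private static byte[] aesGcm(int mode, byte[] key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            return c.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM operation failed", e);
        }
    }
}
```

Only the small wrapped DEKs are rewritten; the encrypted customEnvVars values are untouched. Once no stored DEK carries the old version, the old KEK can be removed from the configuration.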


Threat Model

What This Defends Against

| Threat | Protected? | Detail |
|--------|------------|--------|
| Database backup stolen | Yes | #1 practical threat: attacker gets ciphertext only |
| SQL injection reading config | Yes | Query returns encrypted JSONB |
| Unauthorized DBA access | Yes | DBA sees ciphertext in pg_dump |
| PG log exposure | Yes | Key never appears in SQL (unlike pgcrypto) |
| Multi-tenant DB access | Yes | Encrypted per-tenant, even if schema isolation is bypassed |

What This Does NOT Defend Against

| Threat | Detail |
|--------|--------|
| Server memory dump | Decrypted values exist in JVM heap during deployment |
| Full server compromise (root) | Attacker reads env var, decrypts everything |
| Docker host compromise | docker inspect shows env vars |
| Malicious admin with API access | Can trigger deployment, observe container env vars |

This is expected and acceptable. Encryption at rest is defense-in-depth for the most common vectors (DB compromise, backup theft). Full server compromise requires different mitigations (network segmentation, IDS).


Compliance Value

| Framework | Requirement | Satisfied? |
|-----------|-------------|------------|
| SOC2 (CC6.1) | Logical access security, encryption at rest | Yes |
| GDPR (Art. 32) | "Encryption of personal data" as appropriate measure | Yes |
| PCI-DSS (Req. 3.4) | Render sensitive data unreadable in storage | Yes |
| NIST SP 800-57 | AES-256, key separation, rotation | Yes |

What SaaS Platforms Do (State of the Art)

| Platform | Encryption at Rest | Key Management | Delivery |
|----------|--------------------|----------------|----------|
| Heroku | AES-256 | Platform-managed, BYOK for Private Spaces | Env vars |
| Render | AES-128+ | Platform-managed | Env vars |
| Fly.io | Encrypted vault (API can only encrypt, not decrypt) | Split architecture | Env vars |
| Railway | Not documented | Not documented | Env vars |
| Kubernetes | etcd encryption (AES-CBC/GCM/KMS) | Configurable | Files or env vars |

Every major platform encrypts at rest and delivers as env vars. None use sidecar decryptors for standard config vars.


Implementation Plan

Storage Format

Encrypt only the values of customEnvVars, not the keys. JSONB remains queryable by key name:

{
  "DB_PASSWORD": "ENC:v1:base64ciphertext...",
  "API_KEY": "ENC:v1:base64ciphertext...",
  "APP_NAME": "my-app"
}

The ENC:v1: prefix marks encrypted values. Plaintext values (non-secret config) remain as-is.
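
One way the prefix convention could be handled in code is sketched below. The issue only specifies the ENC:v1: marker followed by Base64 ciphertext; the class EnvValueCodec and the payload layout (Base64 of iv || ciphertext || tag, produced with plain JCA) are assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical codec for single env var values using the ENC:v1: marker.
final class EnvValueCodec {
    static final String PREFIX = "ENC:v1:";
    private static final SecureRandom RANDOM = new SecureRandom();

    static boolean isEncrypted(String value) {
        return value.startsWith(PREFIX);
    }

    // Produces "ENC:v1:" + Base64(iv || ciphertext || tag).
    static String encode(byte[] key, String plaintext) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] sealed = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] blob = Arrays.copyOf(iv, iv.length + sealed.length);
            System.arraycopy(sealed, 0, blob, iv.length, sealed.length);
            return PREFIX + Base64.getEncoder().encodeToString(blob);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Values without the marker are returned unchanged (non-secret config).
    static String decode(byte[] key, String stored) {
        if (!isEncrypted(stored)) {
            return stored;
        }
        try {
            byte[] blob = Base64.getDecoder().decode(stored.substring(PREFIX.length()));
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(128, blob, 0, 12));
            return new String(c.doFinal(blob, 12, blob.length - 12), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

A plaintext value that happens to start with ENC:v1: would be treated as ciphertext by this sketch and fail to decrypt, so a real implementation would need a rule for that edge case.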

Steps

  1. Add CAMELEER_SERVER_SECURITY_ENCRYPTIONKEY env var (256-bit, Base64, generated at provisioning)
  2. Add Google Tink dependency to server-core POM
  3. Create SecretEncryptor service (encrypt/decrypt Map<String, String>)
  4. Modify PostgresAppRepository.updateContainerConfig() to encrypt customEnvVars values
  5. Modify PostgresAppRepository.mapRow() to decrypt on read
  6. Same for PostgresEnvironmentRepository (environment default config)
  7. Startup migration: detect plaintext values (no ENC: prefix), encrypt, write back. Idempotent.
  8. Decrypt in DeploymentExecutor.buildEnvVars() before passing to Docker
  9. Strip secrets from deployments.resolved_config (or encrypt there too)
  10. API responses: return decrypted values to ADMIN/OPERATOR, masked to VIEWER
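
Steps 7 and 10 can be sketched independently of the cipher by passing the encryption in as a function. The class EnvVarMigration, the UnaryOperator parameter, and the "********" mask are assumptions. The sketch follows step 7 literally and encrypts every value that lacks the ENC: prefix; how non-secret values would be exempted is not specified in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helpers for the startup migration (step 7) and masking (step 10).
final class EnvVarMigration {
    static final String PREFIX = "ENC:";
    static final String MASK = "********";

    // Encrypts every value without the ENC: prefix and keeps already-encrypted
    // values untouched, so running it a second time changes nothing.
    static Map<String, String> migrate(Map<String, String> envVars,
                                       UnaryOperator<String> encrypt) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : envVars.entrySet()) {
            String value = e.getValue();
            out.put(e.getKey(), value.startsWith(PREFIX) ? value : encrypt.apply(value));
        }
        return out;
    }

    // Replaces every value with a fixed mask; key names stay visible.
    static Map<String, String> maskForViewer(Map<String, String> envVars) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : envVars.keySet()) {
            out.put(name, MASK);
        }
        return out;
    }
}
```

Idempotency here rests entirely on the prefix check: the encrypt function passed in must always return a value that starts with ENC:, otherwise a second run would encrypt the same value again.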

Performance Impact

| Operation | Overhead | Frequency | Impact |
|-----------|----------|-----------|--------|
| Encrypt on save | ~0.1ms | App config update (rare) | Negligible |
| Decrypt on read | ~0.1ms | Deployment, config page | Negligible |
| Migration | One-time, seconds | Once | Negligible |

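AES-256-GCM throughput is >1 GB/s on CPUs with hardware AES support (AES-NI). Config vars are typically <1 KB, so the cipher itself needs on the order of a microsecond per value and the ~0.1ms figures above are conservative.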

Searchability Impact

  • Query by key name: Still works (keys are plaintext)
  • Query by value content: Lost (values are ciphertext)
  • pg_dump readable secrets: Fixed

Backup/Restore

  • Restore to same server: works (same KEK)
  • Restore to different server: the KEK must be migrated as well (transfer it separately; if it is stored inside the backup, a stolen backup includes the key)
  • KEK lost: all encrypted secrets irrecoverable — document in DR runbook

Recommendation

Verdict: ⭐⭐⭐⭐½ (4.5/5)

| Criterion | Rating | Notes |
|-----------|--------|-------|
| Security improvement | 4/5 | Eliminates #1 practical risk (DB/backup exposure) |
| Implementation complexity | 4/5 | 2-3 days, clean migration, minimal code changes |
| Operational overhead | 4/5 | One additional env var. KEK backup is main new concern |
| Compliance value | 5/5 | Directly satisfies SOC2, GDPR, PCI-DSS |
| Industry alignment | 5/5 | Matches the pattern used by Heroku, Render, Fly.io |

This should be implemented regardless of which delivery mechanism is chosen. It's a complementary layer that protects the data at rest — the delivery mechanism (env vars, file mount, callback) is a separate concern.

What NOT to Do

  • Do not use pgcrypto — key appears in SQL logs
  • Do not derive encryption key from JWT secret — maintain key independence
  • Do not implement sidecar decryptors — overengineered
  • Do not encrypt non-secret config (memoryLimitMb, etc.) — only customEnvVars

Sources

  • OWASP Cryptographic Storage Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Cryptographic_Storage_Cheat_Sheet.html
  • NIST SP 800-57 Part 1 Rev. 5: https://csrc.nist.gov/pubs/sp/800/57/pt1/r5/final
  • Google Tink Key Management: https://developers.google.com/tink/key-management-overview
  • Google Cloud Envelope Encryption: https://docs.cloud.google.com/kms/docs/envelope-encryption
  • Heroku Security & Compliance: https://devcenter.heroku.com/articles/security-and-compliance-resources-and-features
  • Fly.io Secrets: https://fly.io/docs/apps/secrets/
  • Render Secret Handling: https://render.com/articles/how-render-handles-secrets-and-environment-variables
  • Percona pg_tde 2.1.1: https://docs.percona.com/new/2026/01/22/percona-transparent-data-encryption-pg_tde-211-has-been-released
  • HashiCorp Vault Transit Engine: https://developer.hashicorp.com/vault/docs/secrets/transit
claude added the feature and security labels 2026-04-15 00:32:04 +02:00