Secret delivery option 6: Server-side bootstrap callback (one-time token fetch) #135

Open
opened 2026-04-15 00:37:17 +02:00 by claude · 0 comments
Parent epic: #129

Overview

Pass only a one-time bootstrap token to the container. On startup, the agent calls GET /api/v1/bootstrap/{token} to fetch its secrets over TLS. The token is short-lived, limited-use (strictly single-use when maxUses = 1), and scoped to a specific deployment. This mirrors HashiCorp Vault's cubbyhole response wrapping pattern without requiring Vault infrastructure.


How It Works

DeploymentExecutor                  Server Memory/PG             Container
      |                                    |                         |
      |--- mint token (CSPRNG 256-bit) --->|                         |
      |    store(token, secrets, ttl=5m)   |                         |
      |                                    |                         |
      |--- docker create(                  |                         |
      |      CAMELEER_BOOTSTRAP_TOKEN=     |                         |
      |      <token>)                      |                         |
      |                                    |                         |
      |                                    |<-- GET /bootstrap/{token}
      |                                    |--- 200 + secrets ------>|
      |                                    |    (token consumed)     |
      |                                    |                         |
      |                                    |<-- GET /bootstrap/{token}
      |                                    |    (second attempt)     |
      |                                    |--- 410 Gone ---------->|

Token Properties

| Property | Value | Rationale |
|----------|-------|-----------|
| Entropy | 256-bit CSPRNG (Base64url, 43 chars) | Brute-force infeasible (~10^60 years at 1B guesses/sec) |
| Use limit | Configurable maxUses (default: 3) | Covers restart policy retries; strict single-use is maxUses = 1 |
| TTL | 5 minutes (configurable) | Industry standard (Vault default). Covers JVM cold start. |
| Scope | Deployment ID + app + environment | Token cannot be used by wrong container |
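
As a minimal sketch of the entropy row, the token could be minted as follows. The class and method names (`BootstrapTokens`, `mint`) are hypothetical and not part of the existing codebase; only `SecureRandom` and `Base64` from the JDK are assumed.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical helper; names are illustrative, not the project's actual API. */
public final class BootstrapTokens {
    // SecureRandom is the JDK's CSPRNG; one shared instance is thread-safe.
    private static final SecureRandom RANDOM = new SecureRandom();

    private BootstrapTokens() {}

    /** Returns 32 random bytes (256 bits) encoded as unpadded Base64url. */
    public static String mint() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        // 32 bytes -> ceil(256 / 6) = 43 characters once padding is dropped.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

Unpadded Base64url keeps the token safe to embed in a URL path segment and in an environment variable without escaping.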

Token Storage

| Approach | Pros | Cons |
|----------|------|------|
| In-memory ConcurrentHashMap | Fast, no deps, auto-GC on restart | Lost on server restart |
| PostgreSQL table | Survives restarts, queryable audit | DB round-trip, needs cleanup |
| Hybrid (recommended) | In-memory primary + PG fallback | Slightly more complex |

Startup sequence loads unexpired tokens from PG into memory. @Scheduled cleanup every 60s removes expired tokens.
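
The in-memory half of the store might look like the sketch below. It is an assumption-laden illustration, not the planned `BootstrapTokenStore`: the class name, method signatures, and the explicit `now` parameter (used here to keep the example deterministic) are all hypothetical, and PG persistence is left out. It requires Java 16+ for records.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory store; names and signatures are illustrative only. */
public final class InMemoryBootstrapTokenStore {
    private record Entry(Map<String, String> secrets, Instant expiresAt, int usesRemaining) {}

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    public void store(String token, Map<String, String> secrets, Duration ttl, int maxUses, Instant now) {
        if (maxUses < 1) throw new IllegalArgumentException("maxUses must be >= 1");
        entries.put(token, new Entry(Map.copyOf(secrets), now.plus(ttl), maxUses));
    }

    /** Atomically uses up one fetch; empty if the token is unknown, expired, or exhausted. */
    public Optional<Map<String, String>> consume(String token, Instant now) {
        AtomicReference<Map<String, String>> granted = new AtomicReference<>();
        // compute() runs atomically per key, so concurrent fetches cannot both take the last use.
        entries.compute(token, (key, entry) -> {
            if (entry == null || !now.isBefore(entry.expiresAt())) {
                return null; // unknown or expired: nothing granted, mapping dropped
            }
            granted.set(entry.secrets());
            return entry.usesRemaining() > 1
                    ? new Entry(entry.secrets(), entry.expiresAt(), entry.usesRemaining() - 1)
                    : null; // last use: remove the entry
        });
        return Optional.ofNullable(granted.get());
    }

    /** Drops every expired entry; intended to be called from the scheduled cleanup job. */
    public void cleanup(Instant now) {
        entries.values().removeIf(entry -> !now.isBefore(entry.expiresAt()));
    }

    public int size() {
        return entries.size();
    }
}
```

A controller built on this would return 200 with the secrets when `consume` yields a value and 410 Gone otherwise. Using `compute` rather than a plain `remove()` is what allows a use count above one while keeping each decrement atomic.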


Security Analysis

One-Time Token Properties

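| Aspect | Detail |
|--------|--------|
| Replay attack | Blocked once the use limit is reached: each fetch atomically uses up one of the remaining uses, and the entry is removed on the last one (ConcurrentHashMap.remove() in the strict single-use case) |
| Tamper detection | If someone else exhausted the token first, the legitimate consumer gets 410 Gone (tamper evidence, same as Vault cubbyhole). With maxUses > 1 an earlier foreign fetch does not trigger a 410, so it is only visible in the per-use audit log. |
| Exposure window | 5 minutes max (vs entire container lifetime for env vars) |
| Brute-force | 256-bit = infeasible. Rate limit endpoint as defense-in-depth. |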
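
For reference, the brute-force estimate can be checked with a short calculation, assuming an attacker who tests $10^{9}$ tokens per second against a uniformly random 256-bit token and a year of roughly $3.16 \times 10^{7}$ seconds:

```latex
\begin{align*}
\text{time to exhaust the space} &= \frac{2^{256}}{10^{9}\ \text{s}^{-1}}
  \approx \frac{1.16 \times 10^{77}}{10^{9}}\ \text{s}
  = 1.16 \times 10^{68}\ \text{s} \\
&\approx \frac{1.16 \times 10^{68}\ \text{s}}{3.16 \times 10^{7}\ \text{s/yr}}
  \approx 3.7 \times 10^{60}\ \text{yr} \\
\text{expected time to a hit} &\approx \tfrac{1}{2} \cdot 3.7 \times 10^{60}\ \text{yr}
  \approx 1.8 \times 10^{60}\ \text{yr}
\end{align*}
```

The 5-minute TTL shrinks the practical attack window far below this bound, which is why rate limiting is only defense-in-depth here.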

Is the Bootstrap Token as Sensitive as the Secrets?

Yes, during its lifetime. But its risk profile is much better:

  • Extremely short-lived (5 min vs deployment lifetime)
  • Single/limited-use (vs secrets used repeatedly)
  • Scoped to one deployment
  • Visible in docker inspect for only 5 minutes (vs env var secrets for container lifetime)
  • Worthless after consumption

Comparison with Vault Cubbyhole

Our pattern is architecturally identical:

| Aspect | Vault Cubbyhole | Our Pattern |
|--------|-----------------|-------------|
| Token type | Single-use wrapping token | Single- or limited-use bootstrap token |
| TTL | Configurable (default 5m) | Configurable (default 5m) |
| Tamper detection | Yes (unwrap fails) | Yes (410 Gone) |
| Audit trail | Vault audit log | Server audit log + metrics |
| External dependency | Vault cluster | None |

Network Security

Agent-to-server communication uses Docker bridge networks. The bootstrap token's short TTL reduces the sniffing window. For SaaS, tenant-scoped networks (cameleer-tenant-{slug}) provide isolation. mTLS can be added later as hardening.

Container Restart Handling

| Strategy | How | Tradeoff |
|----------|-----|----------|
| maxUses = restart policy retry count + 1 (recommended) | With 2 retries, the token allows 3 fetches within TTL | Simple, slightly weaker than strict single-use |
| Server re-mints on restart event | DockerEventMonitor detects start, mints new token via docker exec | Complex but robust |
| Agent caches in memory | Survives JVM-level restarts | Not container-level restarts |

Server Restart Between Mint and Fetch

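With hybrid storage, PG still holds the token and the server reloads it into memory on startup. The container retries with exponential backoff (1s, 2s, 4s, 8s, capped at 30s), so the fetch succeeds once the server is back, provided the token's TTL has not expired in the meantime.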
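
The retry schedule can be sketched as a small pure function. The class name `BootstrapBackoff` is hypothetical, and the 16s step is an assumption: the schedule above lists 1s to 8s and the 30s cap, and plain doubling is assumed in between.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper computing the delay before each retry. */
public final class BootstrapBackoff {
    private static final long MAX_SECONDS = 30;

    private BootstrapBackoff() {}

    /** Delay before retry number {@code attempt} (0-based): 1s, 2s, 4s, ... capped at 30s. */
    public static Duration delayFor(int attempt) {
        if (attempt < 0) throw new IllegalArgumentException("attempt must be >= 0");
        // Clamp the shift so large attempt numbers cannot overflow.
        long seconds = 1L << Math.min(attempt, 5); // 1, 2, 4, 8, 16, 32
        return Duration.ofSeconds(Math.min(seconds, MAX_SECONDS));
    }

    /** Convenience view of the first {@code attempts} delays. */
    public static List<Duration> schedule(int attempts) {
        List<Duration> delays = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            delays.add(delayFor(i));
        }
        return delays;
    }
}
```

With a 5-minute TTL, the first seven delays sum to 91 seconds, so an agent following this schedule gets several attempts at the cap before the token expires.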


Industry Precedent

This is a well-established production pattern:

| System | Pattern | Mechanism |
|--------|---------|-----------|
| HashiCorp Vault | Cubbyhole response wrapping | Single-use wrapping token, 5m TTL, tamper-evident |
| AWS IMDSv2 | Session token via PUT | PUT returns token (TTL up to 6h), hop-limit=1 blocks SSRF |
| Fly.io | Encrypted vault + agent | Temporary auth token, agent decrypts secrets |
| Cloudflare PAL | Entrypoint decryption | PAL client talks to pald daemon, decrypts at startup |
| Kubernetes | Bound ServiceAccount tokens | Time-limited, audience-bound, projected volume |
| SPIFFE/SPIRE | Workload attestation | Identity from process attestation, no pre-shared secret |

Every system without infrastructure-level trust must solve the same bootstrap problem. Our pattern is in the same family.


Standards Alignment

| Source | Guidance | Our Pattern |
|--------|----------|-------------|
| OWASP Secrets Management | "Service should retrieve its own secrets at startup", ranked as best option | Matches exactly |
| NIST SP 800-204 | "Identity credentials provided securely at startup, not embedded in images" | Compliant |
| NIST SP 800-207 (Zero Trust) | "No resource inherently trusted; every access verified" | Token IS the verification |
| CNCF Security Whitepaper | "Never bake secrets into images; use secret managers" | Server acts as purpose-built secret manager |

OWASP explicitly ranks delivery approaches:

  1. Best: Service fetches own secrets at runtime ← this is our pattern
  2. Good: Sidecar injects secrets
  3. Acceptable: Orchestrator injects env vars
  4. Bad: Secrets baked into images

Platform Compatibility

| Platform | Network Path | Reachability |
|----------|--------------|--------------|
| Docker standalone | Bridge network, Docker DNS cameleer3-server:8081 | Works naturally |
| Docker Swarm | Overlay network | Works across nodes |
| Kubernetes | ClusterIP service | Works (but K8s Secrets may be preferred) |
| Docker-in-Docker | Tenant network (cameleer-tenant-{slug}) | Already supported |

vs K8s Secrets: Not Redundant

| Feature | K8s Secrets | Bootstrap Callback |
|---------|-------------|--------------------|
| Secrets visible in etcd | Yes (even encrypted, admin-readable) | No (server memory; encrypted PG rows when hybrid storage is used) |
| Audit trail | K8s audit log (if enabled) | Built-in server audit log |
| Dynamic per-deployment secrets | No (static) | Yes |
| Tenant isolation | Namespace-level | Token + deployment scoped |

Implementation Plan

Server-Side Components

| Component | Description | Effort |
|-----------|-------------|--------|
| BootstrapTokenStore | ConcurrentHashMap<String, TokenEntry> + PG table; mint(), consume(), cleanup() | New class |
| BootstrapSecretController | GET /api/v1/bootstrap/{token}, unauthenticated (token IS the auth) | New endpoint |
| BootstrapTokenCleanupJob | @Scheduled every 60s, remove expired tokens | New job |
| DeploymentExecutor change | Mint token, pass CAMELEER_BOOTSTRAP_TOKEN instead of secrets | Modify |
| SecurityConfig change | Permit /api/v1/bootstrap/** without JWT auth | Modify |
| Flyway V11 migration | bootstrap_tokens table: token_hash, deployment_id, secrets_encrypted, expires_at, consumed_at, uses_remaining | New migration |
| ServerMetrics | cameleer.bootstrap.{minted,consumed,expired,rejected} counters | Extend |

Estimated: 2-3 days server-side.
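
The `token_hash` column suggests that PG stores a digest rather than the raw token, so a database dump alone cannot be replayed against the endpoint. A minimal sketch, assuming SHA-256 (the plan does not name the hash function) and a hypothetical `TokenHasher` class:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; SHA-256 is an assumption, the plan only names a token_hash column. */
public final class TokenHasher {
    private TokenHasher() {}

    /** Returns the lowercase hex SHA-256 digest of the token's UTF-8 bytes. */
    public static String sha256Hex(String token) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every compliant Java platform.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A fast unsalted hash is reasonable here because the input is a 256-bit random value, not a human-chosen password. The lookup on fetch would then hash the presented token and query by `token_hash`.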

Agent-Side

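// In agent init (before SSE connect):
String token = System.getenv("CAMELEER_BOOTSTRAP_TOKEN");
if (token != null && !token.isBlank()) {
    // fetchSecrets (not shown) performs the GET and parses the JSON body into a map
    Map<String, String> secrets = fetchSecrets(serverUrl + "/api/v1/bootstrap/" + token);
    // Expose each secret as a JVM system property
    secrets.forEach(System::setProperty);
}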

Estimated: 0.5-1 day agent-side.
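
One possible shape for the fetch step, using the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is a sketch under assumptions: `BootstrapClient`, `bootstrapUri`, and `fetchSecretsJson` are hypothetical names, the timeouts are arbitrary, and JSON parsing is left to whatever library the agent already uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical agent-side client; names and timeouts are illustrative. */
public final class BootstrapClient {
    private BootstrapClient() {}

    /** Builds the bootstrap URL, tolerating a trailing slash on the server URL. */
    static URI bootstrapUri(String serverUrl, String token) {
        String base = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
        // Base64url tokens need no escaping, but encoding guards against malformed input.
        return URI.create(base + "/api/v1/bootstrap/" + URLEncoder.encode(token, StandardCharsets.UTF_8));
    }

    /** Returns the raw JSON body on 200; fails loudly on 410 so tampering is noticed. */
    static String fetchSecretsJson(String serverUrl, String token)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(bootstrapUri(serverUrl, token))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        switch (response.statusCode()) {
            case 200:
                return response.body();
            case 410:
                // Do not retry: the token is gone, which may indicate someone else fetched it.
                throw new IllegalStateException("Bootstrap token already consumed");
            default:
                throw new IOException("Bootstrap fetch failed with HTTP " + response.statusCode());
        }
    }
}
```

Treating 410 as a non-retryable error and other failures as retryable `IOException`s keeps the tamper signal distinct from a server that is merely restarting.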

Non-Java Containers

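if [ -n "$CAMELEER_BOOTSTRAP_TOKEN" ]; then
  # -f makes curl fail on HTTP errors such as 410 Gone
  SECRETS=$(curl -sf "$CAMELEER_SERVER_URL/api/v1/bootstrap/$CAMELEER_BOOTSTRAP_TOKEN")
  # @sh shell-quotes each value so spaces and metacharacters are not interpreted by eval
  eval "$(echo "$SECRETS" | jq -r '.secrets | to_entries[] | "export \(.key)=\(.value | @sh)"')"
  unset CAMELEER_BOOTSTRAP_TOKEN
fi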

Migration Path

  1. Implement endpoint; agent supports both (env var fallback if no bootstrap token)
  2. DeploymentExecutor uses tokens for new deployments; existing unaffected
  3. Remove env var secret injection from buildEnvVars()
  4. Deprecation period for old agents

Advantages Over Alternatives

| Criterion | Env Vars (current) | Swarm Secrets | Vault | Bootstrap Callback |
|-----------|--------------------|---------------|-------|--------------------|
| Secrets in docker inspect | Yes | No | No | No (only token, 5m TTL) |
| External dependency | None | Swarm mode | Vault cluster | None |
| All platforms | Yes | Swarm only | All (with Vault) | Yes |
| Audit trail | None | None | Full | Full |
| Tamper detection | None | None | Yes | Yes (410 on re-fetch) |
| Rotation | Redeploy | Swarm rotation | Dynamic | Auto on redeploy |
| Operational complexity | Minimal | Moderate | High | Low-Moderate |

Disadvantages & Mitigations

| Disadvantage | Severity | Mitigation |
|--------------|----------|------------|
| Startup dependency on server | Medium | Retry with backoff; server already SPOF for agents |
| 1-5s startup latency | Low | Fetch during JVM init, overlaps with other startup |
| Secrets in server memory | Medium | Already true (server reads from config); no new risk |
| Container restart re-issue | Medium | maxUses matching restart policy; logged per-use |
| Token visible in docker inspect | Low | Worthless after consumption; 5-min TTL |

Recommendation

Verdict: ⭐⭐⭐⭐½ (4.5/5), Strongly Recommended

| Criterion | Rating | Notes |
|-----------|--------|-------|
| Security improvement | 5/5 | Eliminates secrets from docker inspect, adds tamper detection |
| Implementation complexity | 4/5 | Clean, well-bounded; 2-3 days server + 0.5-1 day agent |
| Operational overhead | 4/5 | Server already SPOF for agents; no new failure mode |
| Industry alignment | 5/5 | Matches Vault cubbyhole, Fly.io, OWASP "best" ranking |
| Platform compatibility | 5/5 | Works on Docker, Swarm, K8s, DinD |

This is the recommended primary delivery mechanism for Cameleer3 because:

  1. Natural fit — agents already call back on startup (registration, SSE). One more HTTP call.
  2. Zero new dependencies — no Vault, no Swarm mode, no CSI driver.
  3. Eliminates the docker inspect problem — the #1 security gap today.
  4. Audit trail for free — every fetch logged with deployment ID, container ID, timestamp.
  5. Multi-platform — simple HTTP GET works everywhere.
  6. Incremental migration — dual-mode agent, gradual rollout.

Implementation Priority

| Priority | Item |
|----------|------|
| P0 | BootstrapTokenStore + endpoint |
| P0 | DeploymentExecutor integration |
| P1 | Agent-side fetch in cameleer3-common |
| P1 | Metrics + audit logging |
| P2 | PG persistence for restart resilience |
| P2 | Rate limiting on bootstrap endpoint |
| P3 | mTLS between agent and server |

Combines Well With

  • Option 3 (#131): Encrypt secrets at rest in PG (AES-256-GCM) — complementary layer
  • Option 2 (#TBD): Tmpfs file mount — callback delivers secrets, tmpfs stores them without env var exposure

What NOT to Do

  • Don't build a full secrets manager (that's Vault's job if ever needed)
  • Don't use bootstrap token for ongoing auth (agent already gets JWT at registration)
  • Don't encrypt the bootstrap token itself (worthless after consumption; adds complexity)

Sources

  • HashiCorp Vault Response Wrapping: https://developer.hashicorp.com/vault/docs/concepts/response-wrapping
  • Vault Cubbyhole Auth Principles: https://www.hashicorp.com/en/blog/cubbyhole-authentication-principles
  • OWASP Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
  • NIST SP 800-204 Microservice Security: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-204.pdf
  • NIST SP 800-207 Zero Trust Architecture: https://csrc.nist.gov/pubs/sp/800/207/final
  • CNCF Cloud Native Security Whitepaper: https://tag-security.cncf.io/community/resources/security-whitepaper/v1/cloud-native-security-whitepaper/
  • AWS IMDSv2 Security: https://aws.amazon.com/blogs/security/get-the-full-benefits-of-imdsv2-and-disable-imdsv1-across-your-aws-infrastructure/
  • Fly.io Secrets: https://fly.io/docs/apps/secrets/
  • Cloudflare PAL Container Identity Bootstrapping: https://blog.cloudflare.com/pal-a-container-identity-bootstrapping-tool/
  • K8s Bound ServiceAccount Tokens: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-bound-service-account-tokens
claude added the feature, security labels 2026-04-15 00:37:17 +02:00