Secret delivery option 6: Server-side bootstrap callback (one-time token fetch) #135


claude · 2026-04-15T00:37:17+02:00


Parent epic: #129

Overview

Pass only a one-time bootstrap token to the container. On startup, the agent calls GET /api/v1/bootstrap/{token} to fetch its secrets over TLS. The token is short-lived (5-minute TTL by default), scoped to a specific deployment, and single-use in its strictest configuration (the default `maxUses` of 3 leaves room for restart retries). This mirrors HashiCorp Vault's cubbyhole response wrapping pattern without requiring Vault infrastructure.

How It Works

DeploymentExecutor                  Server Memory/PG             Container
      |                                    |                         |
      |--- mint token (CSPRNG 256-bit) --->|                         |
      |    store(token, secrets, ttl=5m)   |                         |
      |                                    |                         |
      |--- docker create(                  |                         |
      |      CAMELEER_BOOTSTRAP_TOKEN=     |                         |
      |      <token>)                      |                         |
      |                                    |                         |
      |                                    |<-- GET /bootstrap/{token}
      |                                    |--- 200 + secrets ------>|
      |                                    |    (token consumed)     |
      |                                    |                         |
      |                                    |<-- GET /bootstrap/{token}
      |                                    |    (second attempt)     |
      |                                    |--- 410 Gone ---------->|

Token Properties

Property	Value	Rationale
Entropy	256-bit CSPRNG (Base64url, 43 chars)	Brute-force infeasible (~10^60 years to exhaust the keyspace at 1B guesses/sec)
Use limit	Configurable `maxUses` (default: 3; set to 1 for strict single-use)	Covers restart policy retries
TTL	5 minutes (configurable)	Industry standard (Vault default). Covers JVM cold start.
Scope	Deployment ID + app + environment	Token cannot be used by wrong container
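
As a minimal sketch of the entropy row above, the following shows one way to mint a 256-bit token with the JDK's `SecureRandom` and encode it as unpadded Base64url, which yields the 43 characters quoted in the table. The class and method names (`BootstrapTokens`, `mint`) are hypothetical and not taken from the Cameleer codebase.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical helper; the class and method names are illustrative only. */
final class BootstrapTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    private BootstrapTokens() {}

    /** Returns 32 random bytes (256 bits) encoded as unpadded Base64url. */
    static String mint() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        // 32 bytes -> ceil(32 * 4 / 3) = 43 characters once '=' padding is dropped.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

Base64url keeps the token safe to embed in the URL path segment of `GET /api/v1/bootstrap/{token}` without further escaping.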

Token Storage

Approach	Pros	Cons
In-memory `ConcurrentHashMap`	Fast, no deps, auto-GC on restart	Lost on server restart
PostgreSQL table	Survives restarts, queryable audit	DB round-trip, needs cleanup
Hybrid (recommended)	In-memory primary + PG fallback	Slightly more complex

On startup, the server loads unexpired tokens from PG into memory. A @Scheduled cleanup job removes expired tokens every 60s.
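
The in-memory half of the store could look roughly like the sketch below. It assumes a use counter per token (matching the `maxUses` property) and decrements it atomically inside `ConcurrentHashMap.computeIfPresent`, removing the entry once the last use is spent or the TTL has passed. All names here (`InMemoryTokenStore`, `Entry`, `consume`, `cleanup`) are illustrative. The sketch keys the map by the raw token for brevity, and the PG fallback is omitted.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory half of the hybrid store; names are illustrative. */
final class InMemoryTokenStore {
    record Entry(Map<String, String> secrets, Instant expiresAt, int usesRemaining) {}

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    void store(String token, Map<String, String> secrets, Instant expiresAt, int maxUses) {
        if (maxUses < 1) {
            throw new IllegalArgumentException("maxUses must be at least 1");
        }
        entries.put(token, new Entry(Map.copyOf(secrets), expiresAt, maxUses));
    }

    /** Atomically spends one use; empty if the token is unknown, expired, or exhausted. */
    Optional<Map<String, String>> consume(String token, Instant now) {
        Entry[] hit = new Entry[1];
        entries.computeIfPresent(token, (key, entry) -> {
            if (!now.isBefore(entry.expiresAt())) {
                return null; // expired: drop the entry without handing out secrets
            }
            hit[0] = entry;
            int left = entry.usesRemaining() - 1;
            // Returning null removes the mapping once the last use is spent.
            return left > 0 ? new Entry(entry.secrets(), entry.expiresAt(), left) : null;
        });
        return Optional.ofNullable(hit[0]).map(Entry::secrets);
    }

    /** Drops expired entries; the returned count is approximate under concurrent writes. */
    int cleanup(Instant now) {
        int before = entries.size();
        entries.values().removeIf(entry -> !now.isBefore(entry.expiresAt()));
        return before - entries.size();
    }
}
```

A controller would map an empty result to 410 Gone. Passing `now` explicitly keeps expiry testable without a real clock.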

Security Analysis

One-Time Token Properties

Aspect	Detail
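Replay attack	Bounded by `maxUses`: each fetch is consumed atomically (for example via `ConcurrentHashMap.compute()` decrementing the remaining uses, or `remove()` when `maxUses = 1`), so a replay beyond the limit fails
Tamper detection	With strict single-use (`maxUses = 1`), a legitimate consumer that gets 410 Gone knows someone else fetched first (tamper evidence, same as Vault cubbyhole); with `maxUses > 1`, detection relies on the per-use audit log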
Exposure window	5 minutes max (vs entire container lifetime for env vars)
Brute-force	256-bit = infeasible. Rate limit endpoint as defense-in-depth.
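
A quick way to sanity-check the brute-force figure is to divide the 2^256 keyspace by the number of guesses an attacker could make per year. The sketch below does that with `BigInteger`. The class name and the choice of a 365.25-day year are assumptions made for illustration.

```java
import java.math.BigInteger;

/** Back-of-envelope check of the brute-force estimate; illustrative only. */
final class BruteForceEstimate {
    private BruteForceEstimate() {}

    /** Whole years needed to enumerate all 2^bits values at the given guess rate. */
    static BigInteger yearsToExhaust(int bits, long guessesPerSecond) {
        BigInteger keyspace = BigInteger.TWO.pow(bits);
        // 31,557,600 seconds in a 365.25-day year.
        BigInteger guessesPerYear =
                BigInteger.valueOf(guessesPerSecond).multiply(BigInteger.valueOf(31_557_600L));
        return keyspace.divide(guessesPerYear);
    }
}
```

For 256 bits at one billion guesses per second the result is a 61-digit number, roughly 3.7 × 10^60 years to exhaust the keyspace and about half that on average.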

Is the Bootstrap Token as Sensitive as the Secrets?

Yes, during its lifetime. But its risk profile is much better:

Extremely short-lived (5 min vs deployment lifetime)
Single/limited-use (vs secrets used repeatedly)
Scoped to one deployment
Still visible in docker inspect, but usable for at most 5 minutes (vs env var secrets usable for the container lifetime)
Worthless after consumption

Comparison with Vault Cubbyhole

Our pattern is closely analogous:

Aspect	Vault Cubbyhole	Our Pattern
Token type	Single-use wrapping token	Single-use bootstrap token
TTL	Configurable (default 5m)	Configurable (default 5m)
Tamper detection	Yes (unwrap fails)	Yes (410 Gone)
Audit trail	Vault audit log	Server audit log + metrics
External dependency	Vault cluster	None

Network Security

Agent-to-server communication uses Docker bridge networks. The bootstrap token's short TTL reduces the sniffing window. For SaaS, tenant-scoped networks (cameleer-tenant-{slug}) provide isolation. mTLS can be added later as hardening.

Container Restart Handling

Strategy	How	Tradeoff
`maxUses = restartPolicy + 1` (recommended)	Token allows the initial fetch plus one per permitted restart within the TTL (3 fetches for a retry limit of 2)	Simple, slightly weaker than strict single-use
Server re-mints on restart event	`DockerEventMonitor` detects `start`, mints new token via `docker exec`	Complex but robust
Agent caches in memory	Survives JVM-level restarts	Not container-level restarts

Server Restart Between Mint and Fetch

With hybrid storage: PG has the token, startup loads it into memory. Container retries with exponential backoff (1s, 2s, 4s, 8s, max 30s). Seamless recovery.
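
The retry schedule above (1s, 2s, 4s, 8s, capped at 30s) can be sketched as a pure function, shown here under the assumption that retries are numbered from zero. The class and method names are hypothetical, and jitter is left out for brevity.

```java
import java.time.Duration;

/** Hypothetical helper computing the agent's retry delays; names are illustrative. */
final class BootstrapBackoff {
    private static final Duration BASE = Duration.ofSeconds(1);
    private static final Duration CAP = Duration.ofSeconds(30);

    private BootstrapBackoff() {}

    /** Delay before the given retry (0-based): 1s, 2s, 4s, 8s, 16s, then capped at 30s. */
    static Duration delayBeforeRetry(int retry) {
        if (retry < 0) {
            throw new IllegalArgumentException("retry must be non-negative");
        }
        // Clamp the shift so the doubling cannot overflow for large retry counts.
        long factor = 1L << Math.min(retry, 16);
        Duration delay = BASE.multipliedBy(factor);
        return delay.compareTo(CAP) > 0 ? CAP : delay;
    }
}
```

Because the token TTL is 5 minutes, the total time spent retrying should stay below that window, otherwise the token expires before the server is reachable again.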

Industry Precedent

This is a well-established production pattern:

System	Pattern	Mechanism
HashiCorp Vault	Cubbyhole response wrapping	Single-use wrapping token, 5m TTL, tamper-evident
AWS IMDSv2	Session token via PUT	PUT returns a session token (TTL up to 6h); the PUT requirement blunts SSRF, and a hop limit of 1 keeps the token response on the instance
Fly.io	Encrypted vault + agent	Temporary auth token, agent decrypts secrets
Cloudflare PAL	Entrypoint decryption	PAL client talks to pald daemon, decrypts at startup
Kubernetes	Bound ServiceAccount tokens	Time-limited, audience-bound, projected volume
SPIFFE/SPIRE	Workload attestation	Identity from process attestation, no pre-shared secret

Every system without infrastructure-level trust must solve the same bootstrap problem. Our pattern is in the same family.

Standards Alignment

Source	Guidance	Our Pattern
OWASP Secrets Management	"Service should retrieve its own secrets at startup" — ranked as best option	Matches exactly
NIST SP 800-204	"Identity credentials provided securely at startup, not embedded in images"	Compliant
NIST SP 800-207 (Zero Trust)	"No resource inherently trusted; every access verified"	Token IS the verification
CNCF Security Whitepaper	"Never bake secrets into images; use secret managers"	Server acts as purpose-built secret manager

OWASP explicitly ranks delivery approaches:

Best: Service fetches own secrets at runtime ← this is our pattern
Good: Sidecar injects secrets
Acceptable: Orchestrator injects env vars
Bad: Secrets baked into images

Platform Compatibility

Platform	Network Path	Reachability
Docker standalone	Bridge network, Docker DNS `cameleer3-server:8081`	Works naturally
Docker Swarm	Overlay network	Works across nodes
Kubernetes	ClusterIP service	Works (but K8s Secrets may be preferred)
Docker-in-Docker	Tenant network (`cameleer-tenant-{slug}`)	Already supported

vs K8s Secrets: Not Redundant

Feature	K8s Secrets	Bootstrap Callback
Secrets visible in etcd	Yes (even encrypted, admin-readable)	No (server memory, plus encrypted rows in PG when hybrid storage is used)
Audit trail	K8s audit log (if enabled)	Built-in server audit log
Dynamic per-deployment secrets	No (static)	Yes
Tenant isolation	Namespace-level	Token + deployment scoped

Implementation Plan

Server-Side Components

Component	Description	Effort
`BootstrapTokenStore`	`ConcurrentHashMap<String, TokenEntry>` + PG table; `mint()`, `consume()`, `cleanup()`	New class
`BootstrapSecretController`	`GET /api/v1/bootstrap/{token}` — unauthenticated (token IS the auth)	New endpoint
`BootstrapTokenCleanupJob`	`@Scheduled` every 60s, remove expired tokens	New job
`DeploymentExecutor` change	Mint token, pass `CAMELEER_BOOTSTRAP_TOKEN` instead of secrets	Modify
`SecurityConfig` change	Permit `/api/v1/bootstrap/**` without JWT auth	Modify
Flyway V11 migration	`bootstrap_tokens` table: `token_hash`, `deployment_id`, `secrets_encrypted`, `expires_at`, `consumed_at`, `uses_remaining`	New migration
`ServerMetrics`	`cameleer.bootstrap.{minted,consumed,expired,rejected}` counters	Extend

Estimated: 2-3 days server-side.
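
The `token_hash` column in the migration suggests that the database stores a digest of the token, not the token itself. A minimal sketch of that step follows. SHA-256 and hex encoding are assumptions (the issue does not name an algorithm), and `TokenHasher` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper for deriving the token_hash column value; names are illustrative. */
final class TokenHasher {
    private TokenHasher() {}

    /** Returns the lowercase hex SHA-256 digest of the token (64 characters). */
    static String sha256Hex(String token) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(token.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A plain, unsalted digest is adequate here only because the input is a 256-bit random value. With a hash in the table, a database dump does not reveal usable tokens, and lookup on fetch hashes the presented token first.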

Agent-Side

// In agent init (before SSE connect):
String token = System.getenv("CAMELEER_BOOTSTRAP_TOKEN");
if (token != null) {
    Map<String, String> secrets = fetchSecrets(serverUrl + "/api/v1/bootstrap/" + token);
    secrets.forEach(System::setProperty);
}

Estimated: 0.5-1 day agent-side.
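
The `fetchSecrets` helper used in the snippet above is not defined in this issue. One possible shape for its HTTP part, using the JDK's `java.net.http.HttpClient`, is sketched below. Everything here is an assumption made for illustration: the class name, the 5-second timeout, and treating every status other than 200 and 410 as retryable. JSON parsing of the body is left to the caller.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical agent-side client; names, timeout, and status handling are assumptions. */
final class BootstrapClient {
    enum Outcome { OK, ALREADY_CONSUMED, RETRYABLE }

    private BootstrapClient() {}

    /** Builds the bootstrap URL, tolerating a trailing slash on the server URL. */
    static URI bootstrapUri(String serverUrl, String token) {
        String base = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
        return URI.create(base + "/api/v1/bootstrap/" + token);
    }

    /** 200 carries secrets, 410 means the token was already spent, anything else is retried. */
    static Outcome classify(int status) {
        if (status == 200) return Outcome.OK;
        if (status == 410) return Outcome.ALREADY_CONSUMED;
        return Outcome.RETRYABLE;
    }

    /** Returns the raw response body on 200; throws so the caller can retry or abort. */
    static String fetchBody(HttpClient client, String serverUrl, String token)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(bootstrapUri(serverUrl, token))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        Outcome outcome = classify(response.statusCode());
        if (outcome == Outcome.OK) {
            return response.body();
        }
        if (outcome == Outcome.ALREADY_CONSUMED) {
            // Not retryable: either a restart exhausted maxUses or someone else fetched first.
            throw new IllegalStateException("bootstrap token already consumed (HTTP 410)");
        }
        throw new IOException("bootstrap fetch failed with HTTP " + response.statusCode());
    }
}
```

Separating 410 from other failures lets the agent stop immediately on a spent token and reserve backoff retries for cases such as a server restart.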

Non-Java Containers

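if [ -n "$CAMELEER_BOOTSTRAP_TOKEN" ]; then
  # Abort startup if the fetch fails rather than running without secrets.
  SECRETS=$(curl -sf "$CAMELEER_SERVER_URL/api/v1/bootstrap/$CAMELEER_BOOTSTRAP_TOKEN") || exit 1
  # @sh shell-quotes each value so spaces or metacharacters cannot break or hijack the eval.
  eval "$(printf '%s' "$SECRETS" | jq -r '.secrets | to_entries[] | "export \(.key)=\(.value | @sh)"')"
  unset CAMELEER_BOOTSTRAP_TOKEN SECRETS
fi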

Migration Path

Implement endpoint; agent supports both (env var fallback if no bootstrap token)
DeploymentExecutor uses tokens for new deployments; existing unaffected
Deprecation period for old agents that still rely on env var secrets
Remove env var secret injection from buildEnvVars() once the deprecation period ends

Advantages Over Alternatives

Criterion	Env Vars (current)	Swarm Secrets	Vault	Bootstrap Callback
Secrets in `docker inspect`	Yes	No	No	No (only token, 5m TTL)
External dependency	None	Swarm mode	Vault cluster	None
All platforms	Yes	Swarm only	All (with Vault)	Yes
Audit trail	None	None	Full	Full
Tamper detection	None	None	Yes	Yes (410 on re-fetch)
Rotation	Redeploy	Swarm rotation	Dynamic	Auto on redeploy
Operational complexity	Minimal	Moderate	High	Low-Moderate

Disadvantages & Mitigations

Disadvantage	Severity	Mitigation
Startup dependency on server	Medium	Retry with backoff; server already SPOF for agents
1-5s startup latency	Low	Fetch during JVM init, overlaps with other startup
Secrets in server memory	Medium	Already true (server reads from config); no new risk
Container restart re-issue	Medium	`maxUses` matching restart policy; logged per-use
Token visible in `docker inspect`	Low	Worthless after consumption; 5-min TTL

Recommendation

Verdict: ⭐⭐⭐⭐½ (4.5/5) — Strongly Recommended

Criterion	Rating	Notes
Security improvement	5/5	Eliminates secrets from docker inspect, adds tamper detection
Implementation complexity	4/5	Clean, well-bounded; ~3 days server + 1 day agent
Operational overhead	4/5	Server already SPOF for agents; no new failure mode
Industry alignment	5/5	Matches Vault cubbyhole, Fly.io, OWASP "best" ranking
Platform compatibility	5/5	Works on Docker, Swarm, K8s, DinD

This is the recommended primary delivery mechanism for Cameleer3 because:

Natural fit — agents already call back on startup (registration, SSE). One more HTTP call.
Zero new dependencies — no Vault, no Swarm mode, no CSI driver.
Eliminates the docker inspect problem — the #1 security gap today.
Audit trail for free — every fetch logged with deployment ID, container ID, timestamp.
Multi-platform — simple HTTP GET works everywhere.
Incremental migration — dual-mode agent, gradual rollout.

Implementation Priority

Priority	Item
P0	`BootstrapTokenStore` + endpoint
P0	`DeploymentExecutor` integration
P1	Agent-side fetch in `cameleer3-common`
P1	Metrics + audit logging
P2	PG persistence for restart resilience
P2	Rate limiting on bootstrap endpoint
P3	mTLS between agent and server

Combines Well With

Option 3 (#131): Encrypt secrets at rest in PG (AES-256-GCM) — complementary layer
Option 2 (#TBD): Tmpfs file mount — callback delivers secrets, tmpfs stores them without env var exposure

What NOT to Do

Don't build a full secrets manager (that's Vault's job if ever needed)
Don't use bootstrap token for ongoing auth (agent already gets JWT at registration)
Don't encrypt the bootstrap token itself (worthless after consumption; adds complexity)

Sources

HashiCorp Vault Response Wrapping: https://developer.hashicorp.com/vault/docs/concepts/response-wrapping
Vault Cubbyhole Auth Principles: https://www.hashicorp.com/en/blog/cubbyhole-authentication-principles
OWASP Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
NIST SP 800-204 Microservice Security: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-204.pdf
NIST SP 800-207 Zero Trust Architecture: https://csrc.nist.gov/pubs/sp/800/207/final
CNCF Cloud Native Security Whitepaper: https://tag-security.cncf.io/community/resources/security-whitepaper/v1/cloud-native-security-whitepaper/
AWS IMDSv2 Security: https://aws.amazon.com/blogs/security/get-the-full-benefits-of-imdsv2-and-disable-imdsv1-across-your-aws-infrastructure/
Fly.io Secrets: https://fly.io/docs/apps/secrets/
Cloudflare PAL Container Identity Bootstrapping: https://blog.cloudflare.com/pal-a-container-identity-bootstrapping-tool/
K8s Bound ServiceAccount Tokens: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-bound-service-account-tokens

claude added the feature security labels 2026-04-15 00:37:17 +02:00

claude referenced this issue

2026-04-15 00:37:56 +02:00

Epic: Secure secret delivery to provisioned containers #129


Reference: cameleer/cameleer-server#135