
Container Startup Log Capture

Capture Docker container stdout/stderr from the moment a container starts until the Cameleer agent inside fully registers (SSE connection established). Logs are stored in ClickHouse for display in the deployment view and in general log search.

Problem

When a deployed application crashes during startup — before the Cameleer agent can connect and send logs via the normal ingestion pipeline — all diagnostic output is lost. The container may be removed before anyone can inspect it, leaving operators blind to the root cause.

Solution

A ContainerLogForwarder component streams Docker log output in real time for each managed container, batches lines, and flushes them to the existing ClickHouse logs table with source = 'container'. Capture stops when the agent establishes its SSE connection, at which point the agent's own log pipeline takes over.

Architecture

Core Interface Extension

Extend RuntimeOrchestrator (core module) with three new methods:

// in RuntimeOrchestrator.java
void startLogCapture(String containerId, String appSlug, String envSlug, String tenantId);
void stopLogCapture(String containerId);
void stopLogCaptureByApp(String appSlug, String envSlug);

DisabledRuntimeOrchestrator implements these as no-ops. DockerRuntimeOrchestrator delegates to ContainerLogForwarder.

ContainerLogForwarder

Package: com.cameleer.server.app.runtime (Docker-specific, alongside DockerRuntimeOrchestrator, DockerEventMonitor, etc.)

Responsibilities:

  • Manages active capture sessions in a ConcurrentHashMap<String, CaptureSession> keyed by container ID
  • Each CaptureSession holds: containerId, appSlug, envSlug, tenantId, a Future<?> for the streaming thread, and a buffer of pending log lines
  • Uses a bounded thread pool (fixed size ~10 threads)

Streaming logic:

  • Calls dockerClient.logContainerCmd(containerId).withFollowStream(true).withStdOut(true).withStdErr(true).withTimestamps(true)
  • Callback onNext(Frame) appends to an in-memory buffer
  • Every ~2 seconds (or every 50 lines, whichever comes first), flushes the buffer to ClickHouse via ClickHouseLogStore.insertBufferedBatch() — constructs BufferedLogEntry records with source = "container", the deployment's app/env/tenant metadata, and container name as instanceId
  • On onComplete() (container stopped) or onError() — final flush, remove session from map
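The batching rule above (flush every ~2 seconds or every 50 lines, plus a final flush on completion) can be sketched as a small, self-contained buffer. The class and parameter names here are illustrative, not the actual ContainerLogForwarder internals, and the sink stands in for ClickHouseLogStore.insertBufferedBatch():

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the flush policy: hand buffered lines to the sink
// when maxLines accumulate or maxAge has passed since the last flush.
final class LineBuffer {
    private final int maxLines;
    private final Duration maxAge;
    private final Consumer<List<String>> sink; // e.g. ClickHouse batch insert
    private final List<String> pending = new ArrayList<>();
    private Instant lastFlush = Instant.now();

    LineBuffer(int maxLines, Duration maxAge, Consumer<List<String>> sink) {
        this.maxLines = maxLines;
        this.maxAge = maxAge;
        this.sink = sink;
    }

    // Called from the Docker callback's onNext(Frame).
    synchronized void add(String line) {
        pending.add(line);
        if (pending.size() >= maxLines || Instant.now().isAfter(lastFlush.plus(maxAge))) {
            flush();
        }
    }

    // Final flush on onComplete()/onError().
    synchronized void flush() {
        if (!pending.isEmpty()) {
            sink.accept(new ArrayList<>(pending));
            pending.clear();
        }
        lastFlush = Instant.now();
    }
}
```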

Safety:

  • Max capture duration: 5 minutes. A scheduled cleanup (every 30s) stops sessions exceeding this limit.
  • @PreDestroy cleanup: stop all active captures on server shutdown.
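The 5-minute safety sweep reduces to tracking a start time per container and periodically selecting sessions past the limit. A minimal sketch, with illustrative names (the real forwarder would also cancel the streaming Future and flush):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the scheduled cleanup: a task running every 30s
// calls expired() and stops the returned sessions.
final class CaptureTimeoutSweeper {
    static final Duration MAX_CAPTURE = Duration.ofMinutes(5);
    private final Map<String, Instant> startedAt = new ConcurrentHashMap<>();

    void started(String containerId) { startedAt.put(containerId, Instant.now()); }
    void stopped(String containerId) { startedAt.remove(containerId); }

    // Container IDs whose capture exceeded MAX_CAPTURE as of 'now'.
    List<String> expired(Instant now) {
        return startedAt.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(MAX_CAPTURE) > 0)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```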

ClickHouse Field Mapping

Uses the existing logs table. No schema changes required.

| Field | Value |
|---|---|
| source | 'container' |
| application | appSlug from deployment |
| environment | envSlug from deployment |
| tenant_id | tenantId from deployment |
| instance_id | containerName (e.g., prod-orderservice-0) |
| timestamp | Parsed from Docker timestamp prefix |
| message | Log line content (after timestamp) |
| level | Inferred by regex (see below) |
| logger_name | Empty string (not parseable from raw stdout) |
| thread_name | Empty string |
| stack_trace | Empty string (stack traces appear as consecutive message lines) |
| exchange_id | Empty string |
| mdc | Empty map |
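With withTimestamps(true), Docker prefixes each line with an RFC 3339 timestamp followed by a single space, which yields the timestamp and message fields above. A sketch of that split, assuming a fallback to ingest time when no prefix parses (helper names are illustrative):

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper splitting a Docker log line such as
// "2026-04-15T13:28:42.123456789Z Started OrderService" into its parts.
final class DockerLineParser {
    record ParsedLine(Instant timestamp, String message) {}

    static ParsedLine parse(String raw) {
        int space = raw.indexOf(' ');
        if (space > 0) {
            try {
                return new ParsedLine(Instant.parse(raw.substring(0, space)),
                                      raw.substring(space + 1));
            } catch (DateTimeParseException ignored) {
                // No valid timestamp prefix; fall through.
            }
        }
        return new ParsedLine(Instant.now(), raw); // assumption: fall back to ingest time
    }
}
```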

Log Level Inference

  • Regex scan for common Java log patterns: ERROR, WARN, INFO, DEBUG, TRACE
  • Stack trace continuation lines (starting with \tat or Caused by:) inherit ERROR level
  • Lines matching no pattern default to INFO
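The three inference rules above fit in a few lines. This is a sketch under the assumption of typical Logback/Log4j console output; the real implementation may use different patterns:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative level inference: continuation lines of a stack trace map to
// ERROR, known level tokens are extracted, everything else defaults to INFO.
final class LevelInference {
    private static final Pattern LEVEL =
            Pattern.compile("\\b(ERROR|WARN|INFO|DEBUG|TRACE)\\b");

    static String infer(String line) {
        if (line.startsWith("\tat ") || line.startsWith("Caused by:")) {
            return "ERROR"; // stack trace continuation
        }
        Matcher m = LEVEL.matcher(line);
        return m.find() ? m.group(1) : "INFO"; // default
    }
}
```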

Integration Points

Start Capture — DeploymentExecutor

After each replica container is started (inside the replica loop):

orchestrator.startLogCapture(containerId, appSlug, envSlug, tenantId);

Stop Capture — SseConnectionManager.connect()

When an agent connects SSE, look up its AgentInfo from the registry to get application + environmentId:

orchestrator.stopLogCaptureByApp(application, environmentId);

Best-effort call — no-op if no capture exists for that app+env (e.g., non-Docker agent).

Stop Capture — Container Death

DockerEventMonitor handles die/oom events. After updating replica state:

orchestrator.stopLogCapture(containerId);

Triggers final flush of buffered lines before cleanup.

Stop Capture — Deployment Failure Cleanup

No extra code needed. When DeploymentExecutor stops/removes containers on health check failure, the resulting Docker die event flows through DockerEventMonitor, which calls stopLogCapture.

UI Changes

1. Deployment Startup Log Panel

A collapsible log panel below the DeploymentProgress component in the deployment detail view.

Data source: Queries /api/v1/logs?application={appSlug}&environment={envSlug}&source=container&from={deployCreatedAt}

Polling behavior:

  • Auto-refreshes every 3 seconds while deployment status is STARTING
  • Stops polling when status reaches RUNNING or FAILED
  • Manual refresh button available in all states

Status indicator:

  • Green "live" badge + "polling every 3s" text while STARTING
  • Red "stopped" badge when FAILED
  • No badge when RUNNING (panel remains visible with historical startup logs)

Layout: Uses existing LogViewer component from @cameleer/design-system and shared log panel styles from ui/src/styles/log-panel.module.css.

2. Source Badge in Log Views

Everywhere logs are displayed (AgentInstance page, LogTab, general log search), each log line gets a small source badge:

  • container — slate/gray badge
  • app — green badge
  • agent — existing behavior

The source field already exists in LogEntryResponse. This is a rendering-only change in the LogViewer or its wrapper.

3. Source Filter Update

The log toolbar source filter (currently App vs Agent) adds Container as a third option. The backend /api/v1/logs endpoint already accepts source as a query parameter — no backend change needed for filtering.

Edge Cases

Multi-replica: Each replica gets its own capture session keyed by container ID. instance_id in ClickHouse is the container name (e.g., prod-orderservice-0). stopLogCaptureByApp() stops all sessions for that app+env pair.
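Because sessions are keyed by container ID, stopping by app+env is a scan over the session values. A minimal sketch, assuming illustrative names (the real ContainerLogForwarder would also cancel each session's Future and flush its buffer):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry showing the multi-replica stop path: every session
// matching the app+env pair is removed, regardless of replica count.
final class SessionRegistry {
    record Session(String containerId, String appSlug, String envSlug) {}

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    void register(Session s) { sessions.put(s.containerId(), s); }

    // Returns how many sessions were stopped for the app+env pair.
    int stopByApp(String appSlug, String envSlug) {
        List<Session> matching = sessions.values().stream()
                .filter(s -> s.appSlug().equals(appSlug) && s.envSlug().equals(envSlug))
                .toList();
        // Real code would cancel the streaming Future and do a final flush here.
        matching.forEach(s -> sessions.remove(s.containerId()));
        return matching.size();
    }
}
```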

Server restart during capture: Active sessions are in-memory and lost on restart. Not a problem — containers likely restart too (Docker restart policy), and new captures start when DeploymentExecutor runs again. Already-flushed logs survive in ClickHouse.

Container produces no output: Follow stream stays open but idle (parked thread, no CPU cost). Cleaned up by the 5-minute timeout or container death.

Rapid redeployment: Old container dies -> stopLogCapture(oldContainerId). New container starts -> startLogCapture(newContainerId, ...). Different container IDs, no conflict.

Log overlap: When the agent connects and starts sending source='app' logs, there may be a brief overlap with source='container' logs for the same timeframe. Both are shown with source badges. Users can filter by source if needed.

Files Changed

Backend — New

| File | Description |
|---|---|
| app/runtime/ContainerLogForwarder.java | Docker log streaming, buffering, ClickHouse flush |

Backend — Modified

| File | Change |
|---|---|
| core/runtime/RuntimeOrchestrator.java | Add 3 log capture methods to interface |
| app/runtime/DockerRuntimeOrchestrator.java | Implement log capture methods, delegate to ContainerLogForwarder |
| app/runtime/DisabledRuntimeOrchestrator.java | No-op implementations of new methods |
| app/runtime/DeploymentExecutor.java | Call startLogCapture() after container start |
| app/agent/SseConnectionManager.java | Call stopLogCaptureByApp() on SSE connect |
| app/runtime/DockerEventMonitor.java | Call stopLogCapture() on die/oom events |
| app/runtime/RuntimeOrchestratorAutoConfig.java | Wire ContainerLogForwarder into DockerRuntimeOrchestrator |

Frontend — Modified

| File | Change |
|---|---|
| ui/src/pages/AppsTab/AppsTab.tsx | Add startup log panel below DeploymentProgress |
| ui/src/api/queries/logs.ts | Hook for deployment startup logs query |
| Log display components | Add source badge rendering |
| Log toolbar | Add Container to source filter options |

No Changes

| File | Reason |
|---|---|
| ClickHouse init.sql | Existing logs table with source column is sufficient |
| LogQueryController.java | Already accepts source filter parameter |
| ClickHouseLogStore.java | Already writes source field from log entries |