Container Startup Log Capture
Capture Docker container stdout/stderr from the moment a container starts until the Cameleer agent inside fully registers (SSE connection established). Logs are stored in ClickHouse for display in the deployment view and in general log search.
Problem
When a deployed application crashes during startup — before the Cameleer agent can connect and send logs via the normal ingestion pipeline — all diagnostic output is lost. The container may be removed before anyone can inspect it, leaving operators blind to the root cause.
Solution
A ContainerLogForwarder component streams Docker log output in real-time for each managed container, batches lines, and flushes them to the existing ClickHouse logs table with source = 'container'. Capture stops when the agent establishes its SSE connection, at which point the agent's own log pipeline takes over.
Architecture
Core Interface Extension
Extend RuntimeOrchestrator (core module) with three new methods:
// in RuntimeOrchestrator.java
void startLogCapture(String containerId, String appSlug, String envSlug, String tenantId);
void stopLogCapture(String containerId);
void stopLogCaptureByApp(String appSlug, String envSlug);
DisabledRuntimeOrchestrator implements these as no-ops. DockerRuntimeOrchestrator delegates to ContainerLogForwarder.
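As a rough sketch of that split (class and field names below are illustrative stand-ins, not the real implementation; the counter merely models delegation to ContainerLogForwarder):

```java
// Hypothetical sketch: the Disabled variant is a pure no-op, while the
// Docker variant forwards every call onward. Names beyond the three
// interface methods are assumptions for illustration.
interface LogCaptureOps {
    void startLogCapture(String containerId, String appSlug, String envSlug, String tenantId);
    void stopLogCapture(String containerId);
    void stopLogCaptureByApp(String appSlug, String envSlug);
}

class DisabledSketch implements LogCaptureOps {
    public void startLogCapture(String c, String a, String e, String t) { /* no-op */ }
    public void stopLogCapture(String c) { /* no-op */ }
    public void stopLogCaptureByApp(String a, String e) { /* no-op */ }
}

class DockerSketch implements LogCaptureOps {
    int forwarded; // stands in for delegation to ContainerLogForwarder
    public void startLogCapture(String c, String a, String e, String t) { forwarded++; }
    public void stopLogCapture(String c) { forwarded++; }
    public void stopLogCaptureByApp(String a, String e) { forwarded++; }
}
```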
ContainerLogForwarder
Package: com.cameleer.server.app.runtime (Docker-specific, alongside DockerRuntimeOrchestrator, DockerEventMonitor, etc.)
Responsibilities:
- Manages active capture sessions in a `ConcurrentHashMap<String, CaptureSession>` keyed by container ID
- Each `CaptureSession` holds: containerId, appSlug, envSlug, tenantId, a `Future<?>` for the streaming thread, and a buffer of pending log lines
- Uses a bounded thread pool (fixed size ~10 threads)
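A minimal sketch of that session bookkeeping, with the field layout assumed from the bullets above (the real class may differ):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Future;

// Sketch of a capture session: identity fields, a handle on the streaming
// task, and a buffer of lines pending flush to ClickHouse.
class CaptureSession {
    final String containerId, appSlug, envSlug, tenantId;
    final long startedAtMillis;                                   // for the max-duration check
    volatile Future<?> streamTask;                                // streaming thread handle
    final Queue<String> pending = new ConcurrentLinkedQueue<>();  // buffered log lines

    CaptureSession(String containerId, String appSlug, String envSlug,
                   String tenantId, long startedAtMillis) {
        this.containerId = containerId;
        this.appSlug = appSlug;
        this.envSlug = envSlug;
        this.tenantId = tenantId;
        this.startedAtMillis = startedAtMillis;
    }
}
```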
Streaming logic:
- Calls `dockerClient.logContainerCmd(containerId).withFollowStream(true).withStdOut(true).withStdErr(true).withTimestamps(true)`
- Callback `onNext(Frame)` appends to an in-memory buffer
- Every ~2 seconds (or every 50 lines, whichever comes first), flushes the buffer to ClickHouse via `ClickHouseLogStore.insertBufferedBatch()`, constructing `BufferedLogEntry` records with `source = "container"`, the deployment's app/env/tenant metadata, and the container name as `instanceId`
- On `onComplete()` (container stopped) or `onError()`: final flush, remove session from map
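The flush cadence can be sketched independently of the Docker client. The thresholds come from the bullets above; everything else (class name, method shapes) is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching policy: flush when 50 lines accumulate or when
// 2 seconds have passed since the last flush, whichever comes first.
class LogBatcher {
    static final int MAX_LINES = 50;
    static final long MAX_AGE_MILLIS = 2_000;

    private final List<String> buffer = new ArrayList<>();
    private long lastFlushMillis;

    LogBatcher(long nowMillis) { this.lastFlushMillis = nowMillis; }

    /** Adds a line; returns a batch to flush, or null if no threshold is met yet. */
    synchronized List<String> add(String line, long nowMillis) {
        buffer.add(line);
        if (buffer.size() >= MAX_LINES || nowMillis - lastFlushMillis >= MAX_AGE_MILLIS) {
            return drain(nowMillis);
        }
        return null;
    }

    /** Drains unconditionally (the onComplete()/onError() final-flush path). */
    synchronized List<String> drain(long nowMillis) {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        lastFlushMillis = nowMillis;
        return batch;
    }
}
```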
Safety:
- Max capture duration: 5 minutes. A scheduled cleanup (every 30s) stops sessions exceeding this limit.
- `@PreDestroy` cleanup: stop all active captures on server shutdown.
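The expiry check the 30-second sweep would apply can be reduced to a small predicate (a sketch under the 5-minute limit stated above):

```java
import java.time.Duration;

// Sketch: a session is expired once its age exceeds the 5-minute cap;
// the periodic cleanup stops any session for which this returns true.
class CaptureLimits {
    static final Duration MAX_CAPTURE = Duration.ofMinutes(5);

    static boolean isExpired(long startedAtMillis, long nowMillis) {
        return nowMillis - startedAtMillis > MAX_CAPTURE.toMillis();
    }
}
```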
ClickHouse Field Mapping
Uses the existing logs table. No schema changes required.
| Field | Value |
|---|---|
| `source` | `'container'` |
| `application` | `appSlug` from deployment |
| `environment` | `envSlug` from deployment |
| `tenant_id` | `tenantId` from deployment |
| `instance_id` | `containerName` (e.g., `prod-orderservice-0`) |
| `timestamp` | Parsed from Docker timestamp prefix |
| `message` | Log line content (after timestamp) |
| `level` | Inferred by regex (see below) |
| `logger_name` | Empty string (not parseable from raw stdout) |
| `thread_name` | Empty string |
| `stack_trace` | Empty string (stack traces appear as consecutive message lines) |
| `exchange_id` | Empty string |
| `mdc` | Empty map |
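The timestamp/message split feeding the table can be sketched as follows. With `withTimestamps(true)`, Docker prefixes each line with an RFC3339Nano timestamp and a single space; the fallback behavior for unparseable prefixes is an assumption here, not something the design specifies:

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Sketch: split a raw Docker log line into the `timestamp` and `message`
// fields of the mapping above.
class DockerLineParser {
    final Instant timestamp;
    final String message;

    private DockerLineParser(Instant timestamp, String message) {
        this.timestamp = timestamp;
        this.message = message;
    }

    static DockerLineParser parse(String raw) {
        int space = raw.indexOf(' ');
        if (space > 0) {
            try {
                return new DockerLineParser(Instant.parse(raw.substring(0, space)),
                                            raw.substring(space + 1));
            } catch (DateTimeParseException ignored) {
                // no valid timestamp prefix; fall through
            }
        }
        return new DockerLineParser(Instant.EPOCH, raw); // assumed fallback
    }
}
```

`Instant.parse` accepts the nanosecond precision Docker emits, so no custom formatter is needed for the happy path.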
Log Level Inference
- Regex scan for common Java log patterns: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`
- Stack trace continuation lines (starting with `\tat` or `Caused by:`) inherit ERROR level
- Lines matching no pattern default to INFO
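The rules above can be sketched in a few lines (class name and exact regex are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the inference rules: scan for a level token, treat stack-trace
// continuation lines as ERROR, default everything else to INFO.
class LevelInference {
    private static final Pattern LEVEL =
            Pattern.compile("\\b(ERROR|WARN|INFO|DEBUG|TRACE)\\b");

    static String infer(String line) {
        if (line.startsWith("\tat ") || line.startsWith("Caused by:")) {
            return "ERROR"; // stack-trace continuation inherits ERROR
        }
        Matcher m = LEVEL.matcher(line);
        return m.find() ? m.group(1) : "INFO";
    }
}
```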
Integration Points
Start Capture — DeploymentExecutor
After each replica container is started (inside the replica loop):
orchestrator.startLogCapture(containerId, appSlug, envSlug, tenantId);
Stop Capture — SseConnectionManager.connect()
When an agent connects SSE, look up its AgentInfo from the registry to get application + environmentId:
orchestrator.stopLogCaptureByApp(application, environmentId);
Best-effort call — no-op if no capture exists for that app+env (e.g., non-Docker agent).
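The best-effort semantics fall out naturally if the forwarder simply removes matching sessions from its map; removing nothing is the no-op case. A sketch, assuming the container-ID-keyed map from the architecture section (session shape here is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: stopping by app+env removes every matching session; when no
// session matches (e.g., a non-Docker agent connects), nothing happens.
class ForwarderSketch {
    static class Session {
        final String appSlug, envSlug;
        Session(String appSlug, String envSlug) {
            this.appSlug = appSlug;
            this.envSlug = envSlug;
        }
    }

    final ConcurrentHashMap<String, Session> sessions = new ConcurrentHashMap<>();

    void stopLogCaptureByApp(String appSlug, String envSlug) {
        sessions.entrySet().removeIf(e ->
                e.getValue().appSlug.equals(appSlug)
                        && e.getValue().envSlug.equals(envSlug));
    }
}
```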
Stop Capture — Container Death
DockerEventMonitor handles die/oom events. After updating replica state:
orchestrator.stopLogCapture(containerId);
Triggers final flush of buffered lines before cleanup.
Stop Capture — Deployment Failure Cleanup
No extra code is needed: when DeploymentExecutor stops or removes containers on health-check failure, the Docker die event flows through DockerEventMonitor, which calls stopLogCapture(). The event monitor path handles it.
UI Changes
1. Deployment Startup Log Panel
A collapsible log panel below the DeploymentProgress component in the deployment detail view.
Data source: Queries /api/v1/logs?application={appSlug}&environment={envSlug}&source=container&from={deployCreatedAt}
Polling behavior:
- Auto-refreshes every 3 seconds while deployment status is STARTING
- Stops polling when status reaches RUNNING or FAILED
- Manual refresh button available in all states
Status indicator:
- Green "live" badge + "polling every 3s" text while STARTING
- Red "stopped" badge when FAILED
- No badge when RUNNING (panel remains visible with historical startup logs)
Layout: Uses existing LogViewer component from @cameleer/design-system and shared log panel styles from ui/src/styles/log-panel.module.css.
2. Source Badge in Log Views
Everywhere logs are displayed (AgentInstance page, LogTab, general log search), each log line gets a small source badge:
- `container`: slate/gray badge
- `app`: green badge
- `agent`: existing behavior
The source field already exists in LogEntryResponse. This is a rendering-only change in the LogViewer or its wrapper.
3. Source Filter Update
The log toolbar source filter (currently App vs Agent) adds Container as a third option. The backend /api/v1/logs endpoint already accepts source as a query parameter — no backend change needed for filtering.
Edge Cases
Multi-replica: Each replica gets its own capture session keyed by container ID. instance_id in ClickHouse is the container name (e.g., prod-orderservice-0). stopLogCaptureByApp() stops all sessions for that app+env pair.
Server restart during capture: Active sessions are in-memory and lost on restart. Not a problem — containers likely restart too (Docker restart policy), and new captures start when DeploymentExecutor runs again. Already-flushed logs survive in ClickHouse.
Container produces no output: Follow stream stays open but idle (parked thread, no CPU cost). Cleaned up by the 5-minute timeout or container death.
Rapid redeployment: Old container dies -> stopLogCapture(oldContainerId). New container starts -> startLogCapture(newContainerId, ...). Different container IDs, no conflict.
Log overlap: When the agent connects and starts sending source='app' logs, there may be a brief overlap with source='container' logs for the same timeframe. Both are shown with source badges. Users can filter by source if needed.
Files Changed
Backend — New
| File | Description |
|---|---|
| `app/runtime/ContainerLogForwarder.java` | Docker log streaming, buffering, ClickHouse flush |
Backend — Modified
| File | Change |
|---|---|
| `core/runtime/RuntimeOrchestrator.java` | Add 3 log capture methods to interface |
| `app/runtime/DockerRuntimeOrchestrator.java` | Implement log capture methods, delegate to ContainerLogForwarder |
| `app/runtime/DisabledRuntimeOrchestrator.java` | No-op implementations of new methods |
| `app/runtime/DeploymentExecutor.java` | Call startLogCapture() after container start |
| `app/agent/SseConnectionManager.java` | Call stopLogCaptureByApp() on SSE connect |
| `app/runtime/DockerEventMonitor.java` | Call stopLogCapture() on die/oom events |
| `app/runtime/RuntimeOrchestratorAutoConfig.java` | Wire ContainerLogForwarder into DockerRuntimeOrchestrator |
Frontend — Modified
| File | Change |
|---|---|
| `ui/src/pages/AppsTab/AppsTab.tsx` | Add startup log panel below DeploymentProgress |
| `ui/src/api/queries/logs.ts` | Hook for deployment startup logs query |
| Log display components | Add source badge rendering |
| Log toolbar | Add Container to source filter options |
No Changes
| File | Reason |
|---|---|
| ClickHouse `init.sql` | Existing logs table with source column is sufficient |
| `LogQueryController.java` | Already accepts source filter parameter |
| `ClickHouseLogStore.java` | Already writes source field from log entries |