Phase 4: Observability Pipeline + Inbound Routing
Date: 2026-04-04
Status: Draft
Depends on: Phase 3 (Runtime Orchestration + Environments)
Gitea issue: #28
Context
Phase 3 delivered the managed Camel runtime: customers upload a JAR, the platform builds a container with the cameleer3 agent injected, and deploys it. The agent connects to cameleer3-server and sends traces, metrics, diagrams, and logs to ClickHouse. But there is no way for the user to see this data yet, and customer apps that expose HTTP endpoints are not reachable.
Phase 4 completes the loop: deploy an app, hit its endpoint, see the traces in the dashboard.
cameleer3-server already has the complete observability stack — ClickHouse schemas with tenant_id partitioning, full search/stats/diagram/log REST APIs, and a React SPA dashboard. Phase 4 is a wiring phase, not a build-from-scratch phase.
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Observability UI | Serve existing cameleer3-server React SPA via Traefik | Already built. SaaS management UI is Phase 9 — observability UI is not SaaS-specific. |
| API access | Traefik routes directly to cameleer3-server with forward-auth | No proxy layer needed. Forward-auth validates user, injects headers. Server API works as-is. |
| Server changes | None | Single-tenant Docker mode works out of the box. CAMELEER_TENANT_ID env var already supported. |
| Agent changes | None | Agent already sends applicationId, environmentId, connects to CAMELEER_EXPORT_ENDPOINT. |
| Tenant ID | Set CAMELEER_TENANT_ID to tenant slug in Docker Compose | Tags ClickHouse data with the real tenant identity from day one. Avoids 'default' → real-id migration later. |
| Inbound routing | Optional exposedPort on deployment, Traefik labels on customer containers | Thin feature. {app}.{env}.{tenant}.{domain} routing via Traefik Host rule. |
What's Already Working (Phase 3)
- Customer containers on the cameleer bridge network
- Agent configured: CAMELEER_AUTH_TOKEN, CAMELEER_EXPORT_ENDPOINT=http://cameleer3-server:8081, CAMELEER_APPLICATION_ID, CAMELEER_ENVIRONMENT_ID
- cameleer3-server writes traces/metrics/diagrams/logs to ClickHouse
- Traefik routes /observe/* to cameleer3-server with forward-auth middleware
- Forward-auth endpoint at /auth/verify validates JWT, returns X-Tenant-Id, X-User-Id, X-User-Email headers
Component 1: Serve cameleer3-server Dashboard
Traefik Routing
Add Traefik labels to the cameleer3-server service in docker-compose.yml to serve the React SPA:
# Existing (Phase 3):
- traefik.http.routers.observe.rule=PathPrefix(`/observe`)
- traefik.http.routers.observe.middlewares=forward-auth
# New (Phase 4):
- traefik.http.routers.dashboard.rule=PathPrefix(`/dashboard`)
- traefik.http.routers.dashboard.middlewares=forward-auth
- traefik.http.services.dashboard.loadbalancer.server.port=8080
The cameleer3-server SPA is served from its own embedded web server. The SPA already calls the server's API endpoints at relative paths — the existing /observe/* Traefik route handles those requests with forward-auth.
Note: If the cameleer3-server SPA expects to be served from / rather than /dashboard, a Traefik StripPrefix middleware may be needed:
- traefik.http.middlewares.dashboard-strip.stripprefix.prefixes=/dashboard
- traefik.http.routers.dashboard.middlewares=forward-auth,dashboard-strip
This depends on how the cameleer3-server SPA is configured (base path). To be verified during implementation.
CAMELEER_TENANT_ID Configuration
Set CAMELEER_TENANT_ID on the cameleer3-server service so all ingested data is tagged with the real tenant slug:
cameleer3-server:
  environment:
    CAMELEER_TENANT_ID: ${CAMELEER_TENANT_SLUG:-default}
In the Docker single-tenant stack, this is set once during initial setup (e.g., CAMELEER_TENANT_SLUG=acme in .env). All ClickHouse data is then partitioned under this tenant ID.
Add CAMELEER_TENANT_SLUG to .env.example.
Component 2: Agent Connectivity Verification
New endpoint in cameleer-saas to check whether a deployed app's agent has successfully registered with cameleer3-server and is sending data.
API
GET /api/apps/{appId}/agent-status
Returns: 200 + AgentStatusResponse
AgentStatusResponse
public record AgentStatusResponse(
    boolean registered,
    String state,            // ACTIVE, STALE, DEAD, UNKNOWN
    Instant lastHeartbeat,
    List<String> routeIds,
    String applicationId,
    String environmentId
) {}
Implementation
AgentStatusService in cameleer-saas calls cameleer3-server's agent registry API:
GET http://cameleer3-server:8081/api/v1/agents
This returns the list of registered agents. The service filters by applicationId matching the app's slug and environmentId matching the environment's slug.
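The filtering step could look roughly like the sketch below. The `AgentInfo` record and its field names are assumptions made for illustration; the actual shape of the cameleer3-server agent registry response is not specified in this document.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical view of one registry entry; field names are assumed, not taken from the server API.
record AgentInfo(String applicationId, String environmentId, String state) {}

class AgentFilter {
    // Returns the first agent whose application and environment IDs match the given slugs.
    static Optional<AgentInfo> findAgent(List<AgentInfo> agents, String appSlug, String envSlug) {
        return agents.stream()
                .filter(a -> appSlug.equals(a.applicationId()))
                .filter(a -> envSlug.equals(a.environmentId()))
                .findFirst();
    }
}
```

An empty result would map to `registered: false` in AgentStatusResponse.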
If cameleer3-server does not expose a public agent listing endpoint, the alternative is to query ClickHouse directly for recent data:
SELECT max(timestamp) as last_seen
FROM container_logs
WHERE app_id = ? AND deployment_id = ?
LIMIT 1
The preferred approach is the agent registry API. If it requires authentication, cameleer-saas can use the shared CAMELEER_AUTH_TOKEN as a machine token.
Integration with Deployment Status
After a deployment reaches RUNNING status (container healthy), the platform can poll agent-status to confirm the agent has registered. This could be surfaced as a sub-status:
- RUNNING: container is healthy
- RUNNING_CONNECTED: container healthy + agent registered with server
- RUNNING_DISCONNECTED: container healthy but agent not yet registered (timeout: 30s)
This is a nice-to-have enhancement on top of the basic agent-status endpoint.
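One way to derive the sub-status is sketched below. It assumes the 30s timeout is measured from the moment the deployment reached RUNNING, and that an unregistered agent is reported as plain RUNNING until that timeout expires. Both are interpretations, and the class and method names are hypothetical.

```java
import java.time.Duration;

class DeploymentSubStatus {
    // Timeout from the list above; after it, an unregistered agent counts as disconnected.
    static final Duration REGISTRATION_TIMEOUT = Duration.ofSeconds(30);

    // agentRegistered: result of the agent-status check; sinceRunning: time since RUNNING was reached.
    static String resolve(boolean agentRegistered, Duration sinceRunning) {
        if (agentRegistered) {
            return "RUNNING_CONNECTED";
        }
        if (sinceRunning.compareTo(REGISTRATION_TIMEOUT) >= 0) {
            return "RUNNING_DISCONNECTED";
        }
        return "RUNNING"; // still within the grace period
    }
}
```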
Component 3: Inbound HTTP Routing for Customer Apps
Data Model
Add exposed_port column to the apps table:
ALTER TABLE apps ADD COLUMN exposed_port INTEGER;
This is the port the customer's Camel app listens on inside the container (e.g., 8080 for a Spring Boot REST app). When set, Traefik routes external traffic to this port.
API
PATCH /api/apps/{appId}/routing
Body: { "exposedPort": 8080 } // or null to disable routing
Returns: 200 + AppResponse
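The request body probably warrants a range check before it is persisted. The sketch below is an assumption about how that could be done; `ExposedPortValidator` is a hypothetical name, and the only rule it encodes is the valid TCP port range.

```java
class ExposedPortValidator {
    // null means "routing disabled"; any other value must be a valid TCP port (1..65535).
    static boolean isValid(Integer exposedPort) {
        return exposedPort == null || (exposedPort >= 1 && exposedPort <= 65535);
    }
}
```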
The routable URL is computed and included in AppResponse:
// In AppResponse, add:
String routeUrl // e.g., "http://order-svc.default.acme.localhost" or null if no routing
URL Pattern
{app-slug}.{env-slug}.{tenant-slug}.{domain}
Example: order-svc.default.acme.localhost
The {domain} comes from the DOMAIN env var (already in .env.example).
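A minimal sketch of how routeUrl might be computed from the pattern above. The `RouteUrls` helper is hypothetical, and the plain `http` scheme is assumed from the example URL rather than stated as a requirement.

```java
class RouteUrls {
    // Builds {app}.{env}.{tenant}.{domain}; returns null when no port is exposed (routing disabled).
    static String routeUrl(String appSlug, String envSlug, String tenantSlug,
                           String domain, Integer exposedPort) {
        if (exposedPort == null) {
            return null;
        }
        return "http://" + String.join(".", appSlug, envSlug, tenantSlug, domain);
    }
}
```

For example, `RouteUrls.routeUrl("order-svc", "default", "acme", "localhost", 8080)` yields `http://order-svc.default.acme.localhost`.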
DockerRuntimeOrchestrator Changes
When starting a container for an app that has exposedPort set, add Traefik labels:
var labels = new HashMap<String, String>();
labels.put("traefik.enable", "true");
labels.put("traefik.http.routers." + containerName + ".rule",
    "Host(`" + app.getSlug() + "." + env.getSlug() + "." + tenant.getSlug() + "." + domain + "`)");
labels.put("traefik.http.services." + containerName + ".loadbalancer.server.port",
    String.valueOf(app.getExposedPort()));
These labels are set on the Docker container via docker-java's withLabels() on the CreateContainerCmd.
Traefik auto-discovers labeled containers on the cameleer network (already configured in traefik.yml with exposedByDefault: false).
StartContainerRequest Changes
Add an optional labels field to StartContainerRequest:
public record StartContainerRequest(
    String imageRef,
    String containerName,
    String network,
    Map<String, String> envVars,
    long memoryLimitBytes,
    int cpuShares,
    int healthCheckPort,
    Map<String, String> labels  // NEW: Traefik routing labels
) {}
RuntimeConfig Addition
cameleer:
  runtime:
    domain: ${DOMAIN:localhost}
Component 4: End-to-End Connectivity Health
Startup Verification
On application startup, cameleer-saas verifies that cameleer3-server is reachable:
@EventListener(ApplicationReadyEvent.class)
public void verifyConnectivity() {
    // HTTP GET http://cameleer3-server:8081/actuator/health
    // Log result: "cameleer3-server connectivity: OK" or "FAILED: ..."
}
This is a best-effort check, not a hard dependency. If cameleer3-server is not yet running (e.g., starting up), the SaaS platform still starts. The check is logged for diagnostics.
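The best-effort behaviour could be sketched as follows, using the JDK's `java.net.http.HttpClient`. The `ConnectivityCheck` class, the timeout values, and the exact message wording are assumptions; the point is that every outcome, including an exception, becomes a log line and never aborts startup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Callable;

class ConnectivityCheck {
    // Runs the probe and turns any outcome into a log-friendly message; never throws.
    static String describe(Callable<Integer> probe) {
        try {
            int status = probe.call();
            return status == 200
                    ? "cameleer3-server connectivity: OK"
                    : "cameleer3-server connectivity: FAILED: HTTP " + status;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            return "cameleer3-server connectivity: FAILED: " + e;
        }
    }

    // Real probe: GET the health URL with short (assumed) timeouts and return the status code.
    static Callable<Integer> httpProbe(String url) {
        return () -> {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(2))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(3))
                    .GET()
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        };
    }
}
```

Inside verifyConnectivity(), the result of `ConnectivityCheck.describe(ConnectivityCheck.httpProbe("http://cameleer3-server:8081/actuator/health"))` would simply be logged.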
ClickHouse Data Verification
Add a lightweight endpoint for checking whether a deployed app is producing observability data:
GET /api/apps/{appId}/observability-status
Returns: 200 + ObservabilityStatusResponse
public record ObservabilityStatusResponse(
    boolean hasTraces,
    boolean hasMetrics,
    boolean hasDiagrams,
    Instant lastTraceAt,
    long traceCount24h
) {}
Implementation queries ClickHouse:
SELECT
    count() > 0 as has_traces,
    max(start_time) as last_trace,
    count() as trace_count_24h
FROM executions
WHERE tenant_id = ? AND application_id = ? AND environment = ?
    AND start_time > now() - INTERVAL 24 HOUR
This requires cameleer-saas to query ClickHouse directly (the clickHouseDataSource bean from Phase 3). The query is scoped by tenant + application + environment.
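Mapping the query result deserves a little care. With default settings, ClickHouse aggregate functions over an empty input return the column type's default value rather than NULL, so max(start_time) may come back as the epoch when there are no rows. The sketch below covers only the trace-related fields, since the query shown does not touch metrics or diagrams. The `TraceSummary` record and the mapper are hypothetical names.

```java
import java.time.Instant;

// Hypothetical holder for the trace-related columns of the query result.
record TraceSummary(boolean hasTraces, Instant lastTraceAt, long traceCount24h) {}

class TraceSummaryMapper {
    // traceCount24h: value of count(); maxStartTime: value of max(start_time) as read from the row.
    static TraceSummary fromRow(long traceCount24h, Instant maxStartTime) {
        boolean hasTraces = traceCount24h > 0;
        // With no matching rows the max() value carries no information, so report null instead.
        return new TraceSummary(hasTraces, hasTraces ? maxStartTime : null, traceCount24h);
    }
}
```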
Docker Compose Changes
cameleer3-server labels (add dashboard route)
cameleer3-server:
  environment:
    CAMELEER_TENANT_ID: ${CAMELEER_TENANT_SLUG:-default}
  labels:
    # Existing:
    - traefik.enable=true
    - traefik.http.routers.observe.rule=PathPrefix(`/observe`)
    - traefik.http.routers.observe.middlewares=forward-auth
    - traefik.http.services.observe.loadbalancer.server.port=8080
    # New:
    - traefik.http.routers.dashboard.rule=PathPrefix(`/dashboard`)
    - traefik.http.routers.dashboard.middlewares=forward-auth,dashboard-strip
    - traefik.http.middlewares.dashboard-strip.stripprefix.prefixes=/dashboard
    - traefik.http.services.dashboard.loadbalancer.server.port=8080
.env.example addition
CAMELEER_TENANT_SLUG=default
Database Migration
-- V010__add_exposed_port_to_apps.sql
ALTER TABLE apps ADD COLUMN exposed_port INTEGER;
New Configuration Properties
cameleer:
  runtime:
    domain: ${DOMAIN:localhost}
Verification Plan
- Deploy a sample Camel REST app with exposedPort: 8080
- curl http://order-svc.default.acme.localhost hits the Camel app
- The Camel route processes the request
- cameleer3 agent captures the trace and sends to cameleer3-server
- GET /api/apps/{appId}/agent-status shows registered: true, state: ACTIVE
- GET /api/apps/{appId}/observability-status shows hasTraces: true
- Open http://localhost/dashboard: cameleer3-server SPA loads
- Traces visible in the dashboard for the deployed app
- Route topology graph shows the Camel route structure
- CAMELEER_TENANT_ID is set to the tenant slug in ClickHouse data
What Phase 4 Does NOT Touch
- No changes to cameleer3-server code (works as-is for single-tenant Docker mode)
- No changes to the cameleer3 agent
- No new ClickHouse schemas (cameleer3-server manages its own)
- No SaaS management UI (Phase 9)
- No K8s-specific changes (Phase 5)