cameleer-saas/docs/superpowers/specs/2026-04-04-phase-4-observability-pipeline.md
hsiegeln 41629f3290
docs: add Phase 4 Observability Pipeline + Inbound Routing spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 20:47:51 +02:00


Phase 4: Observability Pipeline + Inbound Routing

Date: 2026-04-04
Status: Draft
Depends on: Phase 3 (Runtime Orchestration + Environments)
Gitea issue: #28

Context

Phase 3 delivered the managed Camel runtime: customers upload a JAR, and the platform builds a container with the cameleer3 agent injected and deploys it. The agent connects to cameleer3-server, which writes the traces, metrics, diagrams, and logs it receives to ClickHouse. However, users have no way to see this data yet, and customer apps that expose HTTP endpoints are not reachable.

Phase 4 completes the loop: deploy an app, hit its endpoint, see the traces in the dashboard.

cameleer3-server already has the complete observability stack — ClickHouse schemas with tenant_id partitioning, full search/stats/diagram/log REST APIs, and a React SPA dashboard. Phase 4 is a wiring phase, not a build-from-scratch phase.

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Observability UI | Serve the existing cameleer3-server React SPA via Traefik | Already built. The SaaS management UI is Phase 9; the observability UI is not SaaS-specific. |
| API access | Traefik routes directly to cameleer3-server with forward-auth | No proxy layer needed. Forward-auth validates the user and injects headers, so the server API works as-is. |
| Server changes | None | Single-tenant Docker mode works out of the box. The CAMELEER_TENANT_ID env var is already supported. |
| Agent changes | None | The agent already sends applicationId and environmentId and connects to CAMELEER_EXPORT_ENDPOINT. |
| Tenant ID | Set CAMELEER_TENANT_ID to the tenant slug in Docker Compose | Tags ClickHouse data with the real tenant identity from day one, avoiding a later 'default' → real-id migration. |
| Inbound routing | Optional exposedPort on the app, Traefik labels on customer containers | Thin feature: {app}.{env}.{tenant}.{domain} routing via a Traefik Host rule. |

What's Already Working (Phase 3)

  • Customer containers on the cameleer bridge network
  • Agent configured: CAMELEER_AUTH_TOKEN, CAMELEER_EXPORT_ENDPOINT=http://cameleer3-server:8081, CAMELEER_APPLICATION_ID, CAMELEER_ENVIRONMENT_ID
  • cameleer3-server writes traces/metrics/diagrams/logs to ClickHouse
  • Traefik routes /observe/* to cameleer3-server with forward-auth middleware
  • Forward-auth endpoint at /auth/verify validates JWT, returns X-Tenant-Id, X-User-Id, X-User-Email headers

Component 1: Serve cameleer3-server Dashboard

Traefik Routing

Add Traefik labels to the cameleer3-server service in docker-compose.yml to serve the React SPA:

# Existing (Phase 3):
- traefik.http.routers.observe.rule=PathPrefix(`/observe`)
- traefik.http.routers.observe.middlewares=forward-auth

# New (Phase 4):
- traefik.http.routers.dashboard.rule=PathPrefix(`/dashboard`)
- traefik.http.routers.dashboard.middlewares=forward-auth
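- traefik.http.services.dashboard.loadbalancer.server.port=8080
# With two services on one container, each router must name its service explicitly:
- traefik.http.routers.observe.service=observe
- traefik.http.routers.dashboard.service=dashboard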

The cameleer3-server SPA is served from its own embedded web server. The SPA already calls the server's API endpoints at relative paths — the existing /observe/* Traefik route handles those requests with forward-auth.

Note: If the cameleer3-server SPA expects to be served from / rather than /dashboard, a Traefik StripPrefix middleware may be needed:

- traefik.http.middlewares.dashboard-strip.stripprefix.prefixes=/dashboard
- traefik.http.routers.dashboard.middlewares=forward-auth,dashboard-strip

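This depends on how the cameleer3-server SPA is configured (base path) and must be verified during implementation. The second label replaces the dashboard middlewares label shown earlier. Note that StripPrefix only rewrites the incoming request path: if the built SPA references its assets with absolute paths (for example /assets/...), those requests never match the /dashboard router, so the SPA build must still emit asset URLs under /dashboard/.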

CAMELEER_TENANT_ID Configuration

Set CAMELEER_TENANT_ID on the cameleer3-server service so all ingested data is tagged with the real tenant slug:

cameleer3-server:
  environment:
    CAMELEER_TENANT_ID: ${CAMELEER_TENANT_SLUG:-default}

In the Docker single-tenant stack, this is set once during initial setup (e.g., CAMELEER_TENANT_SLUG=acme in .env). All ClickHouse data is then partitioned under this tenant ID.

Add CAMELEER_TENANT_SLUG to .env.example.

Component 2: Agent Connectivity Verification

New endpoint in cameleer-saas to check whether a deployed app's agent has successfully registered with cameleer3-server and is sending data.

API

GET /api/apps/{appId}/agent-status
Returns: 200 + AgentStatusResponse

AgentStatusResponse

import java.time.Instant;
import java.util.List;

public record AgentStatusResponse(
    boolean registered,
    String state,           // ACTIVE, STALE, DEAD, UNKNOWN
    Instant lastHeartbeat,
    List<String> routeIds,
    String applicationId,
    String environmentId
) {}

Implementation

AgentStatusService in cameleer-saas calls cameleer3-server's agent registry API:

GET http://cameleer3-server:8081/api/v1/agents

This returns the list of registered agents. The service filters by applicationId matching the app's slug and environmentId matching the environment's slug.
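A minimal sketch of the filtering step follows. It assumes the registry returns a JSON list whose entries map onto the hypothetical AgentInfo record below (the field names are a guess, not cameleer3-server's confirmed payload) and that JSON decoding happens elsewhere. When several agents match, the one with the most recent heartbeat wins; when none match, the response reports registered = false with state UNKNOWN.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

final class AgentStatusMapper {

    // Hypothetical shape of one entry from GET /api/v1/agents; the real
    // cameleer3-server payload may use different field names.
    record AgentInfo(String applicationId, String environmentId, String state,
                     Instant lastHeartbeat, List<String> routeIds) {}

    // Same components as the AgentStatusResponse record in this spec.
    record AgentStatusResponse(boolean registered, String state, Instant lastHeartbeat,
                               List<String> routeIds, String applicationId, String environmentId) {}

    private AgentStatusMapper() {}

    /** Selects the agent registered for this app and environment; the latest heartbeat wins. */
    static AgentStatusResponse toStatus(List<AgentInfo> agents, String appSlug, String envSlug) {
        return agents.stream()
                .filter(a -> appSlug.equals(a.applicationId()) && envSlug.equals(a.environmentId()))
                .max(Comparator.comparing(AgentInfo::lastHeartbeat,
                        Comparator.nullsFirst(Comparator.naturalOrder())))
                .map(a -> new AgentStatusResponse(true, a.state(), a.lastHeartbeat(),
                        a.routeIds() == null ? List.of() : a.routeIds(), appSlug, envSlug))
                // No matching agent: not registered yet, state unknown.
                .orElseGet(() -> new AgentStatusResponse(false, "UNKNOWN", null, List.of(),
                        appSlug, envSlug));
    }
}
```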

If the cameleer3-server doesn't expose a public agent listing endpoint, the alternative is to query ClickHouse directly for recent data:

SELECT maxOrNull(timestamp) AS last_seen  -- NULL rather than 1970-01-01 when no rows match
FROM container_logs
WHERE app_id = ? AND deployment_id = ?

The preferred approach is the agent registry API. If it requires authentication, cameleer-saas can use the shared CAMELEER_AUTH_TOKEN as a machine token.
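Assuming the registry endpoint accepts CAMELEER_AUTH_TOKEN as an HTTP Bearer token (the header scheme is an assumption to confirm against cameleer3-server), the request could be built with the JDK's java.net.http client. AgentRegistryRequests and listAgents are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class AgentRegistryRequests {

    private AgentRegistryRequests() {}

    /** Builds GET {serverBaseUrl}/api/v1/agents, adding a Bearer token when one is configured. */
    static HttpRequest listAgents(String serverBaseUrl, String authToken) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(serverBaseUrl + "/api/v1/agents"))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET();
        if (authToken != null && !authToken.isBlank()) {
            // Assumed scheme: the shared machine token sent as a Bearer credential.
            builder.header("Authorization", "Bearer " + authToken);
        }
        return builder.build();
    }
}
```

The request would then be sent with HttpClient.send(request, HttpResponse.BodyHandlers.ofString()) and the body decoded into the agent list.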

Integration with Deployment Status

After a deployment reaches RUNNING status (container healthy), the platform can poll agent-status to confirm the agent has registered. This could be surfaced as a sub-status:

  • RUNNING: container is healthy
  • RUNNING_CONNECTED: container healthy and agent registered with the server
  • RUNNING_DISCONNECTED: container healthy but agent not registered within 30 seconds

This is a nice-to-have enhancement on top of the basic agent-status endpoint.
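The sub-status rule can be expressed as a small pure function. This sketch assumes RUNNING is reported while the 30-second grace period is still running, and RUNNING_DISCONNECTED only once it has elapsed without a registration; DeploymentSubStatus and derive are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

final class DeploymentSubStatus {

    enum SubStatus { RUNNING, RUNNING_CONNECTED, RUNNING_DISCONNECTED }

    // Registration timeout taken from this spec.
    static final Duration AGENT_REGISTRATION_TIMEOUT = Duration.ofSeconds(30);

    private DeploymentSubStatus() {}

    /**
     * Derives the sub-status of a deployment whose container is healthy.
     * runningSince is when the container became healthy; now is the evaluation time.
     */
    static SubStatus derive(boolean agentRegistered, Instant runningSince, Instant now) {
        if (agentRegistered) {
            return SubStatus.RUNNING_CONNECTED;
        }
        Duration waited = Duration.between(runningSince, now);
        return waited.compareTo(AGENT_REGISTRATION_TIMEOUT) >= 0
                ? SubStatus.RUNNING_DISCONNECTED
                : SubStatus.RUNNING; // still inside the grace period
    }
}
```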

Component 3: Inbound HTTP Routing for Customer Apps

Data Model

Add exposed_port column to the apps table:

ALTER TABLE apps ADD COLUMN exposed_port INTEGER;

This is the port the customer's Camel app listens on inside the container (e.g., 8080 for a Spring Boot REST app). When set, Traefik routes external traffic to this port.

API

PATCH /api/apps/{appId}/routing
Body: { "exposedPort": 8080 }    // or null to disable routing
Returns: 200 + AppResponse
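A sketch of validating the PATCH body, assuming a hypothetical UpdateRoutingRequest record with a nullable Integer exposedPort: null disables routing, and any other value must be a valid TCP port (1 to 65535).

```java
final class RoutingRequestValidator {

    // Hypothetical request body for PATCH /api/apps/{appId}/routing.
    record UpdateRoutingRequest(Integer exposedPort) {}

    private RoutingRequestValidator() {}

    /** Returns the port to store in exposed_port, or null to disable routing. */
    static Integer validate(UpdateRoutingRequest request) {
        Integer port = request.exposedPort();
        if (port == null) {
            return null; // null disables routing, per the PATCH contract
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("exposedPort must be between 1 and 65535, got " + port);
        }
        return port;
    }
}
```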

The routable URL is computed and included in AppResponse:

// In AppResponse, add:
String routeUrl    // e.g., "http://order-svc.default.acme.localhost" or null if no routing

URL Pattern

{app-slug}.{env-slug}.{tenant-slug}.{domain}

Example: order-svc.default.acme.localhost

The {domain} comes from the DOMAIN env var (already in .env.example).
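The routeUrl computation could look like the sketch below. Because each slug becomes a hostname label, the sketch also assumes slugs must be lowercase RFC 1123 labels (letters, digits, and inner hyphens, at most 63 characters); RouteUrls and routeUrl are hypothetical names, and the http scheme matches the example above.

```java
import java.util.regex.Pattern;

final class RouteUrls {

    // Lowercase hostname label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
    private static final Pattern DNS_LABEL = Pattern.compile("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$");

    private RouteUrls() {}

    /** Builds http://{app}.{env}.{tenant}.{domain}, or returns null when routing is disabled. */
    static String routeUrl(String appSlug, String envSlug, String tenantSlug,
                           String domain, Integer exposedPort) {
        if (exposedPort == null) {
            return null;
        }
        for (String label : new String[] {appSlug, envSlug, tenantSlug}) {
            if (!DNS_LABEL.matcher(label).matches()) {
                throw new IllegalArgumentException("Slug is not a valid DNS label: " + label);
            }
        }
        return "http://" + appSlug + "." + envSlug + "." + tenantSlug + "." + domain;
    }
}
```

With the values from the example above, routeUrl("order-svc", "default", "acme", "localhost", 8080) returns "http://order-svc.default.acme.localhost".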

DockerRuntimeOrchestrator Changes

When starting a container for an app that has exposedPort set, add Traefik labels:

var labels = new HashMap<String, String>();
labels.put("traefik.enable", "true");
labels.put("traefik.http.routers." + containerName + ".rule",
    "Host(`" + app.getSlug() + "." + env.getSlug() + "." + tenant.getSlug() + "." + domain + "`)");
labels.put("traefik.http.services." + containerName + ".loadbalancer.server.port",
    String.valueOf(app.getExposedPort()));

These labels are set on the Docker container via docker-java's withLabels() on the CreateContainerCmd.

Traefik auto-discovers labeled containers on the cameleer network. Because traefik.yml already sets exposedByDefault: false, Traefik ignores any container without traefik.enable=true, which is why the label set above includes it.
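The label construction from the snippet above can be factored into a pure helper so that containers without exposedPort get no Traefik labels at all. This is a sketch: TraefikLabels and forApp are hypothetical names, and the dot check reflects the assumption that a router name containing dots would be misparsed, since dots separate label key segments.

```java
import java.util.Map;
import java.util.TreeMap;

final class TraefikLabels {

    private TraefikLabels() {}

    /** Builds the Traefik labels for a customer container; empty when routing is off. */
    static Map<String, String> forApp(String routerName, String host, Integer exposedPort) {
        Map<String, String> labels = new TreeMap<>(); // sorted for stable, readable output
        if (exposedPort == null) {
            return labels; // no traefik.enable, so exposedByDefault=false keeps it unrouted
        }
        if (routerName.contains(".")) {
            throw new IllegalArgumentException("Router name must not contain dots: " + routerName);
        }
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.routers." + routerName + ".rule", "Host(`" + host + "`)");
        labels.put("traefik.http.services." + routerName + ".loadbalancer.server.port",
                String.valueOf(exposedPort));
        return labels;
    }
}
```

The resulting map would be passed to docker-java via CreateContainerCmd.withLabels(labels) when the container is created.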

StartContainerRequest Changes

Add an optional labels field to StartContainerRequest:

import java.util.Map;

public record StartContainerRequest(
    String imageRef,
    String containerName,
    String network,
    Map<String, String> envVars,
    long memoryLimitBytes,
    int cpuShares,
    int healthCheckPort,
    Map<String, String> labels       // NEW: Traefik routing labels
) {}
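Since labels is optional, one way to keep the orchestrator free of null checks is a compact constructor that normalizes null to an empty map and takes a defensive copy. This sketch applies the same treatment to envVars, which is an assumption beyond this spec.

```java
import java.util.Map;

record StartContainerRequest(
        String imageRef,
        String containerName,
        String network,
        Map<String, String> envVars,
        long memoryLimitBytes,
        int cpuShares,
        int healthCheckPort,
        Map<String, String> labels) {

    StartContainerRequest {
        // Treat absent maps as empty and copy them so later caller mutations do not leak in.
        envVars = envVars == null ? Map.of() : Map.copyOf(envVars);
        labels = labels == null ? Map.of() : Map.copyOf(labels);
    }
}
```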

RuntimeConfig Addition

cameleer:
  runtime:
    domain: ${DOMAIN:localhost}

Component 4: End-to-End Connectivity Health

Startup Verification

On application startup, cameleer-saas verifies that cameleer3-server is reachable:

@EventListener(ApplicationReadyEvent.class)
public void verifyConnectivity() {
    // HTTP GET http://cameleer3-server:8081/actuator/health
    // Log result: "cameleer3-server connectivity: OK" or "FAILED: ..."
}

This is a best-effort check, not a hard dependency. If cameleer3-server is not yet running (e.g., starting up), the SaaS platform still starts. The check is logged for diagnostics.
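A minimal sketch of the best-effort probe using the JDK's java.net.http client. It never throws: any failure becomes a log line in the same "OK" / "FAILED: ..." form described above. The ServerConnectivityCheck name and the choice of HTTP/1.1 are assumptions; inside the @EventListener method, the returned string would simply be logged.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ServerConnectivityCheck {

    private ServerConnectivityCheck() {}

    /** Probes the health URL once and returns a log line; never throws. */
    static String check(String healthUrl, Duration timeout) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(timeout)
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(healthUrl))
                    .timeout(timeout)
                    .GET()
                    .build();
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200
                    ? "cameleer3-server connectivity: OK"
                    : "cameleer3-server connectivity: FAILED: HTTP " + response.statusCode();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return "cameleer3-server connectivity: FAILED: interrupted";
        } catch (Exception e) {
            // Covers connection refused, timeouts, and malformed URLs alike.
            return "cameleer3-server connectivity: FAILED: " + e;
        }
    }
}
```

For example, the listener could log check("http://cameleer3-server:8081/actuator/health", Duration.ofSeconds(3)); the short timeout keeps startup from stalling while cameleer3-server is still booting.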

ClickHouse Data Verification

Add a lightweight endpoint for checking whether a deployed app is producing observability data:

GET /api/apps/{appId}/observability-status
Returns: 200 + ObservabilityStatusResponse

ObservabilityStatusResponse

import java.time.Instant;

public record ObservabilityStatusResponse(
    boolean hasTraces,
    boolean hasMetrics,
    boolean hasDiagrams,
    Instant lastTraceAt,
    long traceCount24h
) {}

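Implementation queries ClickHouse for the trace fields (hasMetrics and hasDiagrams need analogous queries against the corresponding tables):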

SELECT
    count() > 0 AS has_traces,
    maxOrNull(start_time) AS last_trace,  -- NULL rather than 1970-01-01 when no rows match
    count() AS trace_count_24h
FROM executions
WHERE tenant_id = ? AND application_id = ? AND environment = ?
  AND start_time > now() - INTERVAL 24 HOUR

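This requires cameleer-saas to query ClickHouse directly (the clickHouseDataSource bean from Phase 3). The query is scoped by tenant + application + environment. Because of the 24-hour filter, hasTraces and lastTraceAt only reflect traces from the last 24 hours.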

Docker Compose Changes

cameleer3-server labels (add dashboard route)

cameleer3-server:
  environment:
    CAMELEER_TENANT_ID: ${CAMELEER_TENANT_SLUG:-default}
  labels:
    # Existing:
    - traefik.enable=true
    - traefik.http.routers.observe.rule=PathPrefix(`/observe`)
    - traefik.http.routers.observe.middlewares=forward-auth
    - traefik.http.services.observe.loadbalancer.server.port=8080
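    # New:
    - traefik.http.routers.dashboard.rule=PathPrefix(`/dashboard`)
    - traefik.http.routers.dashboard.middlewares=forward-auth,dashboard-strip
    - traefik.http.middlewares.dashboard-strip.stripprefix.prefixes=/dashboard
    - traefik.http.services.dashboard.loadbalancer.server.port=8080
    # Two services on one container: each router must reference its service explicitly
    - traefik.http.routers.observe.service=observe
    - traefik.http.routers.dashboard.service=dashboard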

.env.example addition

CAMELEER_TENANT_SLUG=default

Database Migration

-- V010__add_exposed_port_to_apps.sql
ALTER TABLE apps ADD COLUMN exposed_port INTEGER;

New Configuration Properties

cameleer:
  runtime:
    domain: ${DOMAIN:localhost}

Verification Plan

  1. Deploy a sample Camel REST app (e.g., order-svc in environment default for tenant acme) with exposedPort: 8080
  2. curl http://order-svc.default.acme.localhost hits the Camel app
  3. The Camel route processes the request
  4. The cameleer3 agent captures the trace and sends it to cameleer3-server
  5. GET /api/apps/{appId}/agent-status shows registered: true, state: ACTIVE
  6. GET /api/apps/{appId}/observability-status shows hasTraces: true
  7. Open http://localhost/dashboard — cameleer3-server SPA loads
  8. Traces visible in the dashboard for the deployed app
  9. Route topology graph shows the Camel route structure
  10. ClickHouse rows for the app carry the tenant slug (e.g., acme) as tenant_id rather than default

What Phase 4 Does NOT Touch

  • No changes to cameleer3-server code (works as-is for single-tenant Docker mode)
  • No changes to the cameleer3 agent
  • No new ClickHouse schemas (cameleer3-server manages its own)
  • No SaaS management UI (Phase 9)
  • No K8s-specific changes (Phase 5)