# Camel Operations Platform - System Design Document (MVP)

**Status:** Draft / MVP Definition

**Target Audience:** Enterprise IT, DevOps, Integration Architects

**Date:** 2026-02-27

---

## 1. Executive Summary

### Vision

To provide a unified, "Day 2 Operations" platform for Apache Camel that bridges the gap between modern cloud-native practices (GitOps, Kubernetes) and enterprise on-premise requirements (Zero Trust, Data Sovereignty).

### Problem Statement

Enterprises rely heavily on Apache Camel for integration but lack a cohesive operational layer. Existing options fall short: legacy ESBs are heavyweight, generic APMs lack deep Camel visibility, and DIY Kubernetes management demands significant operational expertise.

### Key Value Propositions

* **"Managed Appliance" Experience:** A single-binary installer that turns any Linux host into a managed Camel runtime (embedded K3s), removing K8s complexity from the developer.
* **Zero Trust Architecture:** The runtime connects outbound-only to the SaaS Control Plane via a reverse tunnel. No inbound firewall ports required.
* **Camel-Native Observability:** Deep introspection into Camel Routes, Exchanges, and Message bodies, superior to generic HTTP tracing.
* **GitOps from Day 0:** All configurations and deployments are driven by Git state, ensuring auditability and rollback capabilities.

---

## 2. High-Level Architecture
The architecture follows a hybrid model: a centralized SaaS **Control Plane** for management and visibility, and distributed **Runners** deployed in customer environments (On-Prem, Private Cloud, Edge) to execute workloads.

### Architecture Diagram Description

```mermaid
graph TD
    subgraph "SaaS Control Plane"
        UI[Web Console]
        API[API Gateway]
        TunnelServer[Tunnel Server]
        TSDB[(Time-Series DB)]
        RelDB[(PostgreSQL)]
    end

    subgraph "Customer Environment (The Runner)"
        TunnelClient[Tunnel Client]
        K3s[Embedded K3s Cluster]

        subgraph "Camel Workload Pod"
            CamelApp[Camel Application]
            Sidecar[Observability Agent]
        end

        Build["Build Controller (Kaniko)"]
        Registry[Local Registry]
    end

    User[User/DevOps] --> UI
    Git[Git Provider] -- Webhook --> API

    %% Connections
    TunnelClient -- "Outbound mTLS (WebSocket/gRPC)" --> TunnelServer
    TunnelServer --> API

    CamelApp -- Traces/Metrics --> Sidecar
    Sidecar -- Telemetry --> TunnelClient
    TunnelClient -- Telemetry --> TSDB
```
---
## 3. Component Deep Dive
### 3.1 The Runner (Managed Appliance)
The Runner is a self-contained runtime environment installed on customer infrastructure. It abstracts the complexity of Kubernetes.

* **Core Engine:** **K3s** (lightweight Kubernetes). Selected for its single-binary footprint and low resource usage.
* **Ingress Layer:** **Traefik.** Handles internal routing for deployed Camel services.
* **Connectivity:** **Reverse Tunnel Client.** Establishes a persistent, multiplexed connection (using technologies like WebSocket or HTTP/2) to the Control Plane. This tunnel carries:
    * Control commands (Deploy, Restart, Scale).
    * Telemetry data (Logs, Traces, Metrics).
    * Proxy traffic (viewing internal Camel endpoints from the SaaS UI).
* **Build System:**
    * **Kaniko:** Performs in-cluster container builds from source code without requiring a Docker daemon.
    * **Local Registry:** A lightweight internal container registry that stores built images before deployment.
* **Storage:** **Rancher Local Path Provisioner.** Uses node-local storage for ephemeral build artifacts and durable message buffering.
* **Security:**
    * **Namespace Isolation:** Each "Environment" (Dev, Prod) maps to a K8s Namespace.
    * **Network Policies:** Deny-all by default; only whitelisted egress is allowed.
### 3.2 The Control Plane (SaaS)
The central brain of the platform.

* **Tech Stack:**
    * **Backend:** Go (Golang) for high-performance concurrent handling of tunnel connections and telemetry ingestion.
    * **Frontend:** React / Next.js for a responsive, dashboard-like experience.
* **Data Stores:**
    * **Relational (PostgreSQL):** Users, Organizations, Projects, Environment configurations, RBAC policies.
    * **Telemetry (ClickHouse or TimescaleDB):** High-volume storage for Camel traces (Exchanges), logs, and metrics. ClickHouse is preferred for query performance on massive trace datasets.
* **GitOps Engine:**
    * Monitors connected Git repositories.
    * Generates Kubernetes manifests (Deployment, Service, ConfigMap) based on `camel-context.xml` or Route definitions.
    * Syncs desired state to the Runner via the Tunnel.
### 3.3 The Observability Stack
Tailored specifically for Apache Camel integration patterns.

* **Camel Tracer (Java Agent / Sidecar):**
    * Attaches to the Camel runtime (Quarkus, Spring Boot, Karaf).
    * Intercepts `ExchangeCreated`, `ExchangeCompleted`, and `ExchangeFailed` events.
    * **Smart Sampling:** Configurable sampling rates to balance overhead against visibility.
    * **Body Capture:** Secure redaction (regex masking) of sensitive PII in message bodies before transmission.
* **Message Replay Mechanism:**
    * The Control Plane stores metadata of failed exchanges (Headers, Body blobs).
    * **Action:** The user clicks "Replay" in the UI.
    * **Flow:** The Control Plane sends a "Replay Command" -> Tunnel -> Runner -> Observability Sidecar.
    * **Execution:** The Sidecar re-injects the message into the specific Camel Endpoint or Route start.
---
## 4. Data Flow
### 4.1 Deployment Flow (GitOps)
1. **Commit:** The developer pushes code to the Git repository.
2. **Webhook:** The Git provider notifies the Control Plane API.
3. **Instruction:** The Control Plane determines the target Runner and sends a "Build Job" instruction via the Tunnel.
4. **Pull & Build:** The Runner's Build Controller (Kaniko) pulls the source, builds the container image, and pushes it to the Local Registry.
5. **Deploy:** The Runner applies the updated K8s manifests. K3s pulls the image from the Local Registry and rolls out the new Pod.
6. **Status:** The Runner reports `DeploymentStatus: Ready` back to the Control Plane.
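
The hand-off in steps 2 and 3 can be sketched in Go. `BuildJob`, `routing`, and `onWebhook` are hypothetical names introduced for illustration; in the real platform the routing table would come from PostgreSQL, not a map literal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BuildJob is the instruction the Control Plane pushes down the tunnel
// after a Git webhook fires. Field names are illustrative.
type BuildJob struct {
	Repo     string `json:"repo"`
	Commit   string `json:"commit"`
	RunnerID string `json:"runnerId"`
	Env      string `json:"env"`
}

// routing maps a repository to the Runner that owns its environment.
var routing = map[string]string{
	"acme/orders-integration": "runner-dc-frankfurt",
}

// onWebhook resolves the target Runner for a push event and builds
// the job to enqueue on that Runner's tunnel connection.
func onWebhook(repo, commit string) (*BuildJob, error) {
	runner, ok := routing[repo]
	if !ok {
		return nil, fmt.Errorf("no runner registered for %s", repo)
	}
	return &BuildJob{Repo: repo, Commit: commit, RunnerID: runner, Env: "dev"}, nil
}

func main() {
	job, err := onWebhook("acme/orders-integration", "ab12cd3")
	if err != nil {
		panic(err)
	}
	payload, _ := json.Marshal(job) // this JSON travels down the mTLS tunnel
	fmt.Println(string(payload))
}
```

Because the instruction travels over the already-open tunnel, the Control Plane never needs to reach into the customer network to trigger a build.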
### 4.2 Telemetry Flow (Observability)
1. **Intercept:** The Camel App processes a message. The Sidecar captures the trace data (Route ID, Node ID, Duration, Failure/Success, Payload).
2. **Buffer:** The Sidecar buffers traces in memory (ring buffer) to absorb bursts.
3. **Transmit:** Batched traces are sent to the local Runner Agent (Tunnel Client).
4. **Tunnel:** Data flows upstream through the mTLS tunnel to the Control Plane Ingestor.
5. **Persist:** The Ingestor validates the data and writes it to ClickHouse/TimescaleDB.
6. **Visualize:** The user queries the "Route Diagram" in the UI; the backend fetches aggregations from the DB.
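
The buffering in step 2 can be sketched as a drop-oldest ring buffer in Go. This is a minimal sketch under assumptions (`TraceEvent` and the overflow policy are illustrative); the key property is that a telemetry burst overwrites old data instead of blocking the Camel route.

```go
package main

import "fmt"

// TraceEvent is a simplified stand-in for one captured Camel exchange span.
type TraceEvent struct {
	RouteID  string
	Duration int // milliseconds
}

// RingBuffer keeps the most recent N events; when full, the oldest
// event is overwritten so tracing never back-pressures the workload.
type RingBuffer struct {
	events []TraceEvent
	head   int // index of the next write
	size   int // number of valid events
}

func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{events: make([]TraceEvent, capacity)}
}

func (b *RingBuffer) Push(e TraceEvent) {
	b.events[b.head] = e
	b.head = (b.head + 1) % len(b.events)
	if b.size < len(b.events) {
		b.size++
	}
}

// Flush returns the buffered events oldest-first and resets the buffer,
// ready for the next batched transmission to the Tunnel Client.
func (b *RingBuffer) Flush() []TraceEvent {
	out := make([]TraceEvent, 0, b.size)
	start := (b.head - b.size + len(b.events)) % len(b.events)
	for i := 0; i < b.size; i++ {
		out = append(out, b.events[(start+i)%len(b.events)])
	}
	b.size, b.head = 0, 0
	return out
}

func main() {
	buf := NewRingBuffer(3)
	for i := 1; i <= 5; i++ { // burst of 5 into capacity 3: events 1 and 2 are dropped
		buf.Push(TraceEvent{RouteID: fmt.Sprintf("route-%d", i)})
	}
	for _, e := range buf.Flush() {
		fmt.Println(e.RouteID) // prints route-3, route-4, route-5
	}
}
```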
---
## 5. Security Model
### Zero Trust & Connectivity
* **No Inbound Ports:** The Runner requires strictly **outbound-only** HTTPS (443) access to the Control Plane.
* **Authentication:**
    * Runner registration uses a short-lived **One-Time Token (OTT)** generated in the UI.
    * Upon first connect, the Runner performs a certificate exchange (CSR) to obtain a unique mTLS client certificate.
* **mTLS Tunnel:** All traffic between the Runner and the Control Plane is encrypted and mutually authenticated.
### Secrets Management
* **At Rest:** Secrets (API keys, DB passwords) are encrypted in the Control Plane database (AES-256).
* **In Transit:** Delivered to the Runner only when needed for deployment.
* **On Runner:** Stored as K8s Secrets, mounted as environment variables or files into the Camel Pods.
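
The at-rest encryption can be sketched with Go's standard-library AES-GCM, an authenticated AES-256 mode. Whether the platform uses GCM specifically (versus another AES-256 construction) is an assumption; the round-trip and tamper-detection properties shown here are what matter.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// EncryptSecret seals a plaintext secret with AES-256-GCM. The random
// nonce is prepended to the ciphertext so decryption is self-contained.
func EncryptSecret(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptSecret reverses EncryptSecret on the Runner side; GCM's
// authentication tag rejects any tampered ciphertext.
func DecryptSecret(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // in production: sourced from a KMS, never all-zero
	sealed, _ := EncryptSecret(key, []byte("db-password"))
	plain, _ := DecryptSecret(key, sealed)
	fmt.Println(string(plain)) // prints db-password
}
```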
### Multi-Tenancy
* **Control Plane:** Logical isolation (Row-Level Security) ensures customers cannot see each other's data.
* **Runner:** Typically single-tenant per install, with multi-environment isolation via Namespaces when shared by multiple teams within one enterprise.
---
## 6. Future Proofing & Scalability
### High Availability (HA)
* **Control Plane:** Stateless microservices, autoscaled on public cloud (AWS/GCP/Azure). DBs run in clustered mode.
* **Runner (MVP):** Single-node K3s.
* **Runner (Future):** Multi-node K3s cluster support. The "Appliance" installer will support joining additional nodes for worker capacity and control plane redundancy.
### Scaling Strategy
* **Horizontal Pod Autoscaling (HPA):** The Runner will support defining HPA rules (CPU/Memory based) for Camel workloads.
* **Partitioning:** The Telemetry store (ClickHouse) will be partitioned by Time and Customer ID to support years of retention.
---
**Prepared by:** Subagent (OpenClaw)