Covers tenant isolation (1 tenant = 1 server instance), environment support (first-class agent property), ClickHouse partitioning (tenant → time → environment → application), PostgreSQL schema-per-tenant via JDBC currentSchema, and agent protocol changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multitenancy Architecture Design
Date: 2026-04-04 Status: Draft
Context
Cameleer3 Server is being integrated into a SaaS platform (cameleer-saas). The server must support multiple tenants sharing PostgreSQL and ClickHouse while guaranteeing strict data isolation. Each tenant gets its own cameleer3-server instance. Environments (dev/staging/prod) are a first-class concept within each tenant.
Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Tenant model | 1 customer = 1 tenant | SaaS customer isolation |
| Instance model | 1 tenant = 1 server instance | In-memory state (registry, catalog, SSE) is tenant-scoped |
| Environments | First-class, per-agent property | Agents belong to exactly 1 environment |
| PG isolation | Schema-per-tenant | No query changes needed; Flyway runs per-schema; JDBC currentSchema param |
| CH isolation | Shared DB, tenant_id column + partition key | Already partially in place; tenant in partition key enables pruning + TTL |
| Agent auth | Per-tenant bootstrap token | SaaS shell provisions tokens; JWT includes tenant_id |
| User scope | Single tenant per user | Logto organizations handle user↔tenant mapping |
| Migration | Fresh install | No backward-compatibility migration needed |
Data Hierarchy
Tenant (customer org)
└─ Environment (dev, staging, prod)
└─ Application (order-service, payment-gateway)
└─ Agent Instance (pod-1, pod-2)
Architecture
Tenant "Acme" ──► cameleer3-server (TENANT_ID=acme)
├─ PG schema: tenant_acme
├─ CH writes: tenant_id='acme'
├─ Agents: env=dev, env=prod
└─ In-memory: registry, catalog, SSE
Tenant "Beta" ──► cameleer3-server (TENANT_ID=beta)
├─ PG schema: tenant_beta
├─ CH writes: tenant_id='beta'
└─ ...
Shared: PostgreSQL (multiple schemas) + ClickHouse (single DB, tenant_id partitioning)
Each server instance reads CAMELEER_TENANT_ID from its environment (default: "default"). This value is used for all ClickHouse reads/writes. The PG schema is set via ?currentSchema=tenant_{id} on the JDBC URL.
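The resolution described above can be sketched in plain Java. This is a minimal illustration, not the server's actual startup code: the class and method names are hypothetical, and the restriction of tenant IDs to lowercase letters, digits, and underscores is an assumption made here so the ID is safe to embed in a schema name.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class TenantSchemaSketch {
    // Assumption: tenant IDs are limited to characters that are safe in a schema name.
    private static final Pattern SAFE_ID = Pattern.compile("[a-z0-9_]+");

    /** Resolves the tenant ID from an environment map, falling back to "default". */
    static String resolveTenantId(Map<String, String> env) {
        String id = env.getOrDefault("CAMELEER_TENANT_ID", "default");
        if (!SAFE_ID.matcher(id).matches()) {
            throw new IllegalArgumentException("Unsupported tenant id: " + id);
        }
        return id;
    }

    /** Schema naming convention from this document: tenant_{id}. */
    static String schemaFor(String tenantId) {
        return "tenant_" + tenantId;
    }

    /** Builds the JDBC URL with the currentSchema parameter set to the tenant schema. */
    static String jdbcUrl(String hostPort, String database, String tenantId) {
        return "jdbc:postgresql://" + hostPort + "/" + database
                + "?currentSchema=" + schemaFor(tenantId);
    }

    public static void main(String[] args) {
        String tenantId = resolveTenantId(System.getenv());
        System.out.println(jdbcUrl("pg:5432", "cameleer", tenantId));
    }
}
```

Rejecting unexpected characters early is one possible design choice; the design itself does not state which characters a tenant ID may contain.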
1. Agent Protocol Changes
Registration Payload
Add environmentId field:
{
"instanceId": "order-svc-pod-1",
"displayName": "order-svc-pod-1",
"applicationId": "order-service",
"environmentId": "dev",
"version": "1.0-SNAPSHOT",
"routeIds": ["route-orders"],
"capabilities": { "tracing": true, "replay": false }
}
environmentId defaults to "default" if omitted (backward compatibility with older agents).
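One way to apply this default is at the point where the payload is bound to a Java type. The sketch below assumes a hypothetical RegistrationRequest record mirroring the fields above; treating a blank value the same as a missing one is an additional assumption, not something the protocol specifies.

```java
import java.util.List;
import java.util.Map;

public class RegistrationDefaultsSketch {
    /** Hypothetical DTO mirroring the registration payload fields shown above. */
    record RegistrationRequest(
            String instanceId,
            String displayName,
            String applicationId,
            String environmentId,
            String version,
            List<String> routeIds,
            Map<String, Object> capabilities) {

        // Compact constructor: normalize a missing (or blank, by assumption) environmentId.
        RegistrationRequest {
            if (environmentId == null || environmentId.isBlank()) {
                environmentId = "default";
            }
        }
    }

    public static void main(String[] args) {
        // An older agent that does not send environmentId.
        RegistrationRequest legacy = new RegistrationRequest(
                "order-svc-pod-1", "order-svc-pod-1", "order-service",
                null, "1.0-SNAPSHOT", List.of("route-orders"),
                Map.of("tracing", true, "replay", false));
        System.out.println(legacy.environmentId());
    }
}
```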
Heartbeat Payload
Add environmentId (optional, for auto-heal after server restart):
{
"routeStates": { "route-orders": "Started" },
"capabilities": { "tracing": true },
"environmentId": "dev"
}
JWT Claims
Agent JWTs issued by the server include:
- tenant: tenant ID (from server config)
- env: environment ID (from registration)
- group: application ID (existing)
The SaaS shell uses tenant + env claims to route agent traffic to the correct server instance.
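As a rough sketch of the claim set, the following builds the three claims as a plain map, independent of any JWT library. The class name, the method names, and the matchesTenant check are illustrative assumptions; the design only states which claims exist and where their values come from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentClaimsSketch {
    /** Assembles the agent claims: tenant from server config, env from registration, group as before. */
    static Map<String, Object> agentClaims(String tenantId, String environmentId, String applicationId) {
        Map<String, Object> claims = new LinkedHashMap<>();
        claims.put("tenant", tenantId);
        claims.put("env", environmentId == null ? "default" : environmentId);
        claims.put("group", applicationId);
        return claims;
    }

    /** Assumed check: a server instance accepts only tokens whose tenant claim matches its own ID. */
    static boolean matchesTenant(Map<String, Object> claims, String serverTenantId) {
        return serverTenantId.equals(claims.get("tenant"));
    }

    public static void main(String[] args) {
        Map<String, Object> claims = agentClaims("acme", "dev", "order-service");
        System.out.println(claims);
        System.out.println(matchesTenant(claims, "beta"));
    }
}
```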
2. Server Configuration
New environment variables:
| Variable | Default | Purpose |
|---|---|---|
| CAMELEER_TENANT_ID | default | Tenant identifier for all CH data operations |
PG connection includes schema:
spring:
datasource:
url: jdbc:postgresql://pg:5432/cameleer?currentSchema=tenant_${CAMELEER_TENANT_ID:default}
Flyway runs against the configured schema automatically.
3. ClickHouse Schema Changes
Column Ordering Principle
All tables follow the ordering: tenant → time → environment → application → agent/route → specifics
This matches query patterns (most-filtered-first) and gives optimal sparse index data skipping.
Partitioning
All tables: PARTITION BY (tenant_id, toYYYYMM(timestamp)) (or toYYYYMM(bucket) for stats tables).
Benefits:
- Partition pruning by tenant (queries never scan other tenants' data)
- Partition pruning by month (time-range queries)
- Per-tenant TTL/retention (drop partitions)
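To illustrate the retention benefit, the sketch below builds the statement that would drop one tenant's partition for one month, given the (tenant_id, toYYYYMM(...)) partition key. The class and helper names are hypothetical, and the identifier validation is an assumption added because the values are concatenated into SQL.

```java
import java.time.YearMonth;

public class PartitionRetentionSketch {
    /** Mirrors the value produced by toYYYYMM, e.g. April 2026 becomes 202604. */
    static int yyyymm(YearMonth month) {
        return month.getYear() * 100 + month.getMonthValue();
    }

    /** True if the month falls outside a retention window of the given length in months. */
    static boolean isExpired(YearMonth month, YearMonth current, int retentionMonths) {
        return month.isBefore(current.minusMonths(retentionMonths - 1L));
    }

    /** Builds a DROP PARTITION statement for a (tenant_id, month) partition tuple. */
    static String dropPartitionSql(String table, String tenantId, YearMonth month) {
        // Assumption: table names and tenant IDs use only these characters.
        if (!table.matches("[a-z0-9_]+") || !tenantId.matches("[a-z0-9_]+")) {
            throw new IllegalArgumentException("Unsafe identifier");
        }
        return "ALTER TABLE " + table + " DROP PARTITION ('" + tenantId + "', " + yyyymm(month) + ")";
    }

    public static void main(String[] args) {
        System.out.println(dropPartitionSql("executions", "acme", YearMonth.of(2026, 1)));
    }
}
```

Because the tenant is part of the partition key, such a statement affects only that tenant's data for that month.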
Raw Tables
executions
CREATE TABLE executions (
tenant_id String DEFAULT 'default',
start_time DateTime64(3),
environment String DEFAULT 'default',
application_id String,
instance_id String,
-- ... existing columns ...
) ENGINE = ReplacingMergeTree()
PARTITION BY (tenant_id, toYYYYMM(start_time))
ORDER BY (tenant_id, start_time, environment, application_id, route_id, execution_id)
processor_executions
ORDER BY (tenant_id, start_time, environment, application_id, route_id, execution_id, seq)
PARTITION BY (tenant_id, toYYYYMM(start_time))
logs
ORDER BY (tenant_id, timestamp, environment, application, instance_id)
PARTITION BY (tenant_id, toYYYYMM(timestamp))
agent_metrics
ORDER BY (tenant_id, collected_at, environment, instance_id, metric_name)
PARTITION BY (tenant_id, toYYYYMM(collected_at))
route_diagrams
ORDER BY (tenant_id, created_at, environment, route_id, instance_id)
PARTITION BY (tenant_id, toYYYYMM(created_at))
agent_events
ORDER BY (tenant_id, timestamp, environment, instance_id)
PARTITION BY (tenant_id, toYYYYMM(timestamp))
usage_events (new column)
-- Add tenant_id (currently missing)
ORDER BY (tenant_id, timestamp, environment, username, normalized)
PARTITION BY (tenant_id, toYYYYMM(timestamp))
Materialized View Targets (stats_1m_*)
All follow: ORDER BY (tenant_id, bucket, environment, ...), PARTITION BY (tenant_id, toYYYYMM(bucket))
Example for stats_1m_route:
ORDER BY (tenant_id, bucket, environment, application_id, route_id)
PARTITION BY (tenant_id, toYYYYMM(bucket))
MV Source Queries
All materialized view SELECT statements include environment in GROUP BY:
SELECT
tenant_id,
toStartOfMinute(start_time) AS bucket,
environment,
application_id,
route_id,
countState() AS total_count,
...
FROM executions
GROUP BY tenant_id, bucket, environment, application_id, route_id
4. Java Code Changes
Configuration
New config class:
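@ConfigurationProperties(prefix = "cameleer.tenant")
public class TenantProperties {
    // Falls back to "default" when CAMELEER_TENANT_ID is not set
    private String id = "default";

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}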
Read from CAMELEER_TENANT_ID env var (Spring Boot relaxed binding: cameleer.tenant.id).
AgentInfo Record
Add environmentId field:
public record AgentInfo(
String instanceId,
String displayName,
String applicationId,
String environmentId, // NEW
String version,
List<String> routeIds,
Map<String, Object> capabilities,
AgentState state,
Instant registeredAt,
Instant lastHeartbeat,
Instant staleTransitionTime
) { ... }
ClickHouse Stores
All stores receive TenantProperties via constructor injection and use tenantProperties.getId() instead of hardcoded "default":
Pattern (applies to all stores):
// Before:
private static final String TENANT = "default";
// After:
private final String tenantId;
public ClickHouseStatsStore(JdbcTemplate jdbc, TenantProperties tenantProps) {
this.jdbc = jdbc;
this.tenantId = tenantProps.getId();
}
Files to update:
- ClickHouseExecutionStore: writes and reads
- ClickHouseLogStore: writes and reads
- ClickHouseMetricsStore: add tenant_id to INSERT
- ClickHouseMetricsQueryStore: add tenant_id filter to reads
- ClickHouseStatsStore: replace TENANT constant
- ClickHouseDiagramStore: replace TENANT constant
- ClickHouseSearchIndex: replace hardcoded 'default'
- ClickHouseAgentEventRepository: replace TENANT constant
- ClickHouseUsageTracker: add tenant_id to writes and reads
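For the stores that must add tenant_id to their INSERT statements, the write side might look roughly like the sketch below. It avoids Spring and JDBC so it can run standalone: TenantProperties is re-declared locally, the Statement record is a hypothetical stand-in for a JdbcTemplate call, and the metric_value column name is an assumption (the other columns come from the agent_metrics ordering key).

```java
import java.time.Instant;
import java.util.List;

public class TenantScopedInsertSketch {
    /** Local stand-in with the same shape as the TenantProperties config class. */
    static final class TenantProperties {
        private String id = "default";
        String getId() { return id; }
        void setId(String id) { this.id = id; }
    }

    /** SQL plus bind values, ready to hand to a JDBC-style executor. */
    record Statement(String sql, List<Object> args) {}

    static final class MetricsStoreSketch {
        private final String tenantId;

        MetricsStoreSketch(TenantProperties tenantProps) {
            // Captured once at construction: one server instance serves one tenant.
            this.tenantId = tenantProps.getId();
        }

        Statement insertMetric(Instant collectedAt, String environment,
                               String instanceId, String metricName, double value) {
            // metric_value is an assumed column name for illustration.
            String sql = "INSERT INTO agent_metrics "
                    + "(tenant_id, collected_at, environment, instance_id, metric_name, metric_value) "
                    + "VALUES (?, ?, ?, ?, ?, ?)";
            return new Statement(sql,
                    List.of(tenantId, collectedAt, environment, instanceId, metricName, value));
        }
    }

    public static void main(String[] args) {
        TenantProperties props = new TenantProperties();
        props.setId("acme");
        Statement stmt = new MetricsStoreSketch(props)
                .insertMetric(Instant.now(), "dev", "order-svc-pod-1", "heap.used", 42.0);
        System.out.println(stmt.args());
    }
}
```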
Environment in Write Path
The ChunkAccumulator extracts environmentId from the agent registry and includes it in MergedExecution and ProcessorBatch:
// ChunkAccumulator.toMergedExecution():
AgentInfo agent = registryService.findById(instanceId);
String environment = agent != null ? agent.environmentId() : "default";
// include environment in MergedExecution
Registration Controller
Pass environmentId from registration payload to AgentRegistryService.register(). Default to "default" if absent.
Heartbeat Controller
On auto-heal, use environmentId from heartbeat payload (if present).
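The registration, auto-heal, and write-path lookups can be sketched together with an in-memory map. This is an illustrative model only: the Agent record is a trimmed-down stand-in for AgentInfo, the method names are hypothetical, and passing applicationId into the heartbeat call assumes it is available from elsewhere (for example the agent's JWT group claim), since the heartbeat payload shown earlier does not carry it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AgentRegistrySketch {
    /** Trimmed-down stand-in for AgentInfo; only the fields this sketch needs. */
    record Agent(String instanceId, String applicationId, String environmentId) {}

    private final Map<String, Agent> agents = new ConcurrentHashMap<>();

    private static String orDefault(String environmentId) {
        return (environmentId == null || environmentId.isBlank()) ? "default" : environmentId;
    }

    /** Registration: store the agent with its (possibly defaulted) environment. */
    Agent register(String instanceId, String applicationId, String environmentId) {
        Agent agent = new Agent(instanceId, applicationId, orDefault(environmentId));
        agents.put(instanceId, agent);
        return agent;
    }

    /**
     * Heartbeat: if the server restarted and no longer knows the agent, re-create
     * the entry (auto-heal) using the environmentId carried by the heartbeat, if any.
     * A known agent is returned unchanged.
     */
    Agent heartbeat(String instanceId, String applicationId, String heartbeatEnvironmentId) {
        return agents.computeIfAbsent(instanceId,
                id -> new Agent(id, applicationId, orDefault(heartbeatEnvironmentId)));
    }

    /** Write path: resolve the environment for an instance, falling back to "default". */
    String environmentFor(String instanceId) {
        Agent agent = agents.get(instanceId);
        return agent != null ? agent.environmentId() : "default";
    }

    public static void main(String[] args) {
        AgentRegistrySketch registry = new AgentRegistrySketch();
        registry.heartbeat("order-svc-pod-1", "order-service", "dev"); // auto-heal after restart
        System.out.println(registry.environmentFor("order-svc-pod-1"));
    }
}
```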
5. PostgreSQL — Schema-per-Tenant
No table schema changes. Isolation via JDBC currentSchema:
spring:
datasource:
url: jdbc:postgresql://pg:5432/cameleer?currentSchema=tenant_${CAMELEER_TENANT_ID:default}
Flyway creates tables in the tenant's schema on first startup. Each server instance manages its own schema independently.
The schema itself is created in one of two ways:
- The SaaS shell creates the PG schema before starting a tenant's server instance
- The server creates it on startup via Flyway's CREATE SCHEMA IF NOT EXISTS
6. UI Changes
Environment Filter
Add an environment filter dropdown to the sidebar header (next to the time range picker). Persisted in URL query params.
All data queries (executions, stats, logs, catalog) include environment filter when set. "All environments" is the default.
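On the server side, "include the environment filter when set" could translate into a query builder along these lines. This is a sketch under assumptions: the class, record, and method names are hypothetical, and the selected columns are chosen for illustration from columns named in the ClickHouse section.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class EnvironmentFilterSketch {
    /** SQL plus bind values for a JDBC-style executor. */
    record Query(String sql, List<Object> params) {}

    /** Tenant and time range are always applied; environment only when one is selected. */
    static Query executions(String tenantId, Optional<String> environment, Instant from, Instant to) {
        StringBuilder sql = new StringBuilder(
                "SELECT execution_id, route_id, start_time FROM executions"
                + " WHERE tenant_id = ? AND start_time >= ? AND start_time < ?");
        List<Object> params = new ArrayList<>(List.of(tenantId, from, to));
        // "All environments" is modelled as Optional.empty(): no extra predicate.
        environment.ifPresent(env -> {
            sql.append(" AND environment = ?");
            params.add(env);
        });
        sql.append(" ORDER BY start_time DESC");
        return new Query(sql.toString(), List.copyOf(params));
    }

    public static void main(String[] args) {
        Instant to = Instant.now();
        Instant from = to.minusSeconds(3600);
        System.out.println(executions("acme", Optional.of("dev"), from, to).sql());
        System.out.println(executions("acme", Optional.empty(), from, to).sql());
    }
}
```

The predicates follow the tenant → time → environment order of the sort key, which is the access pattern the ordering principle is designed for.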
Catalog
The route catalog groups by environment → application → route. The sidebar tree becomes:
dev
└─ order-service
├─ route-orders (42)
└─ route-cbr (18)
prod
└─ order-service
├─ route-orders (1,204)
└─ route-cbr (890)
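The grouping behind this tree can be sketched as a nested map build. The RouteCount record and the method name are hypothetical; sorted maps are used here for deterministic output, so routes come out alphabetically rather than in the order shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CatalogGroupingSketch {
    /** Hypothetical flat row, e.g. one line of a per-route stats query result. */
    record RouteCount(String environment, String application, String routeId, long count) {}

    /** Groups flat rows into environment → application → route → count. */
    static Map<String, Map<String, Map<String, Long>>> group(List<RouteCount> rows) {
        Map<String, Map<String, Map<String, Long>>> tree = new TreeMap<>();
        for (RouteCount row : rows) {
            tree.computeIfAbsent(row.environment(), k -> new TreeMap<>())
                .computeIfAbsent(row.application(), k -> new TreeMap<>())
                .merge(row.routeId(), row.count(), Long::sum);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<RouteCount> rows = List.of(
                new RouteCount("dev", "order-service", "route-orders", 42),
                new RouteCount("dev", "order-service", "route-cbr", 18),
                new RouteCount("prod", "order-service", "route-orders", 1204),
                new RouteCount("prod", "order-service", "route-cbr", 890));
        System.out.println(group(rows));
    }
}
```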
7. What the SaaS Shell Must Do
The cameleer3-server does NOT manage tenants. The SaaS shell (cameleer-saas) is responsible for:
- Provisioning: Create PG schema tenant_{id}, generate per-tenant bootstrap token, start cameleer3-server container with CAMELEER_TENANT_ID={id} and PG URL pointing to the schema
- Routing: Route agent and UI traffic to the correct server instance (by tenant)
- Lifecycle: Start/stop/upgrade tenant server instances
- Auth: Issue JWTs with tenant claims (via Logto), configure ForwardAuth
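The provisioning step might be modelled in the SaaS shell roughly as follows. Everything here is a sketch under assumptions: the class and record names are hypothetical, CAMELEER_BOOTSTRAP_TOKEN is an invented variable name used only for illustration, and the token format (32 random bytes, URL-safe Base64) is one reasonable choice rather than a stated requirement. SPRING_DATASOURCE_URL is the standard Spring Boot environment-variable form of spring.datasource.url.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

public class TenantProvisioningSketch {
    /** What the shell needs to run: schema DDL, the token, and the container environment. */
    record Plan(String createSchemaSql, String bootstrapToken, Map<String, String> containerEnv) {}

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Assumed token format: 32 random bytes, URL-safe Base64 without padding. */
    static String newBootstrapToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    static Plan plan(String tenantId, String pgHostPort, String database) {
        // Assumption: tenant IDs are limited to characters safe in a schema name.
        if (!tenantId.matches("[a-z0-9_]+")) {
            throw new IllegalArgumentException("Unsupported tenant id: " + tenantId);
        }
        String schema = "tenant_" + tenantId;
        String token = newBootstrapToken();

        Map<String, String> env = new LinkedHashMap<>();
        env.put("CAMELEER_TENANT_ID", tenantId);
        env.put("SPRING_DATASOURCE_URL",
                "jdbc:postgresql://" + pgHostPort + "/" + database + "?currentSchema=" + schema);
        env.put("CAMELEER_BOOTSTRAP_TOKEN", token); // hypothetical variable name

        return new Plan("CREATE SCHEMA IF NOT EXISTS " + schema, token, env);
    }

    public static void main(String[] args) {
        Plan plan = plan("acme", "pg:5432", "cameleer");
        System.out.println(plan.createSchemaSql());
        System.out.println(plan.containerEnv().keySet());
    }
}
```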
8. Scope Summary
| Area | Change | Complexity |
|---|---|---|
| Agent protocol (cameleer3-common) | Add environmentId to registration + heartbeat | Low |
| Server config | TenantProperties bean, PG schema URL | Low |
| ClickHouse schema | Add environment column, update ORDER BY/PARTITION BY | Medium |
| ClickHouse stores (9 files) | Replace hardcoded "default" with injected tenant ID, add environment | Medium |
| AgentInfo + registry | Add environmentId field | Low |
| ChunkAccumulator + write pipeline | Include environment in data writes | Low |
| Controllers | Pass environment from registration/heartbeat | Low |
| UI | Environment filter dropdown, catalog grouping | Medium |
| PostgreSQL | No table changes (schema-per-tenant via JDBC URL) | None |
Verification
- Start server with CAMELEER_TENANT_ID=acme and PG currentSchema=tenant_acme
- Register agent with environmentId=dev
- Verify ClickHouse writes contain tenant_id='acme' and environment='dev'
- Start second server with CAMELEER_TENANT_ID=beta
- Verify data from tenant "beta" is not visible to tenant "acme" queries
- Verify UI environment filter shows only selected environment's data