Multitenancy: tenant isolation + environment support #123

Closed
opened 2026-04-04 14:37:38 +02:00 by claude · 0 comments

Context

Cameleer3 Server is being integrated into a SaaS platform (cameleer-saas). The server must support multiple tenants sharing PostgreSQL and ClickHouse while guaranteeing strict data isolation. Each tenant gets their own cameleer3-server instance. Environments (dev/staging/prod) are a first-class concept within each tenant.

Design spec: docs/superpowers/specs/2026-04-04-multitenancy-design.md

Data Hierarchy

Tenant (customer org)  →  1 server instance
  └─ Environment (dev, staging, prod)
       └─ Application (order-service, payment-gateway)
            └─ Agent Instance (pod-1, pod-2)

Key Decisions

Decision          Choice
----------------  -------------------------------------------------------
Instance model    1 tenant = 1 server instance
PG isolation      Schema-per-tenant (JDBC currentSchema)
CH isolation      Shared DB, tenant_id in partition key
Environments      First-class agent property, convention-based
Agent auth        Per-tenant bootstrap token, JWT includes tenant claim
Column ordering   tenant → time → environment → application → specifics

Scope

1. Server Configuration

  • TenantProperties config bean reading CAMELEER_TENANT_ID (default: "default")
  • PG JDBC URL with ?currentSchema=tenant_{id}
  • Flyway runs per-schema
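
A minimal sketch of the configuration step above, using only the JDK rather than the real config bean. The class name TenantConfigSketch, the tenant ID character restriction, and the base URL with database name "cameleer" are assumptions for illustration; only CAMELEER_TENANT_ID, the "default" fallback, and the ?currentSchema=tenant_{id} suffix come from this issue.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch: resolves the tenant ID and derives the PostgreSQL JDBC URL. */
final class TenantConfigSketch {

    // Assumption: tenant IDs are limited to lowercase letters, digits, and underscores
    // so they can be embedded in a schema name without quoting.
    private static final Pattern SAFE_ID = Pattern.compile("[a-z0-9_]+");

    /** Reads CAMELEER_TENANT_ID from the given environment map, falling back to "default". */
    static String tenantId(Map<String, String> env) {
        String id = env.getOrDefault("CAMELEER_TENANT_ID", "default");
        if (!SAFE_ID.matcher(id).matches()) {
            throw new IllegalArgumentException("Unsafe tenant id: " + id);
        }
        return id;
    }

    /** Appends ?currentSchema=tenant_{id} to a base URL that has no query string yet. */
    static String jdbcUrl(String baseUrl, String tenantId) {
        return baseUrl + "?currentSchema=tenant_" + tenantId;
    }

    public static void main(String[] args) {
        String id = tenantId(System.getenv());
        // The host, port, and database name below are placeholders.
        System.out.println(jdbcUrl("jdbc:postgresql://localhost:5432/cameleer", id));
    }
}
```

Validating the ID before it reaches the URL is a design choice of this sketch, not something the issue specifies; it keeps a malformed value from silently selecting the wrong schema.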

2. Agent Protocol (cameleer3-common)

  • Add environmentId to registration payload (default: "default")
  • Add environmentId to heartbeat payload (for auto-heal)
  • Add tenant + env claims to agent JWT
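
The protocol changes above could look roughly like the following. This is a sketch under assumptions: the record name RegistrationPayload, its agentId and applicationId fields, and the use of "sub" for the agent ID are hypothetical. Only environmentId (default "default") and the tenant and env claim names come from this issue. Token signing is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the registration payload and agent JWT claim set. */
final class AgentProtocolSketch {

    /** Only environmentId is taken from the issue; the other fields are placeholders. */
    record RegistrationPayload(String agentId, String applicationId, String environmentId) {
        RegistrationPayload {
            // Agents that predate the field send nothing; fall back to "default".
            if (environmentId == null || environmentId.isBlank()) {
                environmentId = "default";
            }
        }
    }

    /** Builds the claims for an agent token; a real implementation would sign them. */
    static Map<String, Object> agentClaims(String tenantId, RegistrationPayload reg) {
        Map<String, Object> claims = new LinkedHashMap<>();
        claims.put("sub", reg.agentId());        // assumption: agent ID as subject
        claims.put("tenant", tenantId);          // tenant claim named in the issue
        claims.put("env", reg.environmentId());  // env claim named in the issue
        return claims;
    }
}
```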

3. Agent Registry

  • Add environmentId to AgentInfo record
  • Pass environment through registration + heartbeat + auto-heal paths
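
As an illustration of carrying the environment through registration, heartbeat, and auto-heal, here is a small in-memory sketch. The AgentInfo fields other than environmentId, the method names, and the assumption that a heartbeat also carries the application ID are all hypothetical; the real registry may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry that tracks each agent's environment. */
final class AgentRegistrySketch {

    /** Only environmentId is taken from the issue; the other fields are placeholders. */
    record AgentInfo(String agentId, String applicationId, String environmentId) {}

    private final Map<String, AgentInfo> agents = new ConcurrentHashMap<>();

    void register(String agentId, String applicationId, String environmentId) {
        agents.put(agentId, new AgentInfo(agentId, applicationId, orDefault(environmentId)));
    }

    /** A heartbeat from an unknown agent re-creates its entry (auto-heal). */
    AgentInfo heartbeat(String agentId, String applicationId, String environmentId) {
        return agents.computeIfAbsent(agentId,
                id -> new AgentInfo(id, applicationId, orDefault(environmentId)));
    }

    /** Returns the registered environment, or "default" for unknown agents. */
    String environmentOf(String agentId) {
        AgentInfo info = agents.get(agentId);
        return info == null ? "default" : info.environmentId();
    }

    private static String orDefault(String environmentId) {
        return (environmentId == null || environmentId.isBlank()) ? "default" : environmentId;
    }
}
```

In this sketch a heartbeat never overwrites an existing entry, so the environment set at registration wins; whether that matches the intended auto-heal semantics is an open assumption.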

4. ClickHouse Schema (fresh install, no migration)

  • Add environment column to all tables (DEFAULT 'default')
  • Update ORDER BY: (tenant_id, timestamp, environment, application_id, ...)
  • Update PARTITION BY: (tenant_id, toYYYYMM(timestamp))
  • Update all materialized view GROUP BY to include environment
  • Add tenant_id column to usage_events table
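
To make the column ordering concrete, the DDL for one table might resemble the constant below. The table name executions, the column types, the execution_id column, and the plain MergeTree engine are assumptions for illustration. The PARTITION BY and ORDER BY clauses follow the bullets above.

```java
/** Hypothetical DDL holder; the table layout is illustrative, not the actual schema. */
final class ExecutionsDdlSketch {

    static final String CREATE_EXECUTIONS = """
            CREATE TABLE IF NOT EXISTS executions (
                tenant_id      LowCardinality(String) DEFAULT 'default',
                timestamp      DateTime64(3),
                environment    LowCardinality(String) DEFAULT 'default',
                application_id LowCardinality(String),
                execution_id   String
            )
            ENGINE = MergeTree
            PARTITION BY (tenant_id, toYYYYMM(timestamp))
            ORDER BY (tenant_id, timestamp, environment, application_id, execution_id)
            """;

    public static void main(String[] args) {
        // Print the statement so it can be pasted into a ClickHouse client.
        System.out.println(CREATE_EXECUTIONS);
    }
}
```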

5. ClickHouse Stores (9 files)

  • ClickHouseExecutionStore — replace hardcoded 'default' with injected tenant ID, add environment
  • ClickHouseLogStore — same
  • ClickHouseMetricsStore — add tenant_id to INSERT (currently missing)
  • ClickHouseMetricsQueryStore — add tenant_id filter to reads (currently missing)
  • ClickHouseStatsStore — replace TENANT constant with injected value, add environment filter
  • ClickHouseDiagramStore — replace TENANT constant
  • ClickHouseSearchIndex — replace hardcoded 'default'
  • ClickHouseAgentEventRepository — replace TENANT constant
  • ClickHouseUsageTracker — add tenant_id to writes and reads
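
The store changes above share one pattern: the tenant ID is injected once and bound as a parameter on every read, with the environment filter added only when set. A sketch of that pattern, assuming a hypothetical executions table and hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query builder showing an injected tenant ID and optional environment filter. */
final class TenantScopedQuerySketch {

    /** SQL text plus positional parameters, ready for a PreparedStatement. */
    record Query(String sql, List<Object> params) {}

    private final String tenantId;

    // The tenant ID is injected once instead of being hardcoded as 'default'.
    TenantScopedQuerySketch(String tenantId) {
        this.tenantId = tenantId;
    }

    /** A null environment means "all environments". */
    Query countExecutions(String environment) {
        StringBuilder sql =
                new StringBuilder("SELECT count() FROM executions WHERE tenant_id = ?");
        List<Object> params = new ArrayList<>();
        params.add(tenantId);
        if (environment != null) {
            sql.append(" AND environment = ?");
            params.add(environment);
        }
        return new Query(sql.toString(), List.copyOf(params));
    }
}
```

Binding both values as parameters rather than concatenating them keeps tenant and environment strings out of the SQL text.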

6. Write Pipeline

  • ChunkAccumulator — extract environmentId from agent registry, include in MergedExecution and ProcessorBatch
  • MergedExecution — add environment field
  • ProcessorBatch — add environment field
  • BufferedLogEntry — add environment field
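
A sketch of how the accumulator might stamp the environment onto a merged execution. The MergedExecution fields other than environment, the merge method, and the lookup function standing in for the agent registry are hypothetical.

```java
import java.util.function.Function;

/** Hypothetical sketch of environment enrichment in the write pipeline. */
final class WritePipelineSketch {

    /** Only the environment field is taken from the issue; the rest are placeholders. */
    record MergedExecution(String tenantId, String environment,
                           String applicationId, String executionId) {}

    /**
     * Resolves the agent's environment through the supplied lookup (standing in for
     * the agent registry) and falls back to "default" when the agent is unknown.
     */
    static MergedExecution merge(String tenantId, String agentId, String applicationId,
                                 String executionId,
                                 Function<String, String> environmentLookup) {
        String env = environmentLookup.apply(agentId);
        return new MergedExecution(
                tenantId, env == null ? "default" : env, applicationId, executionId);
    }
}
```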

7. UI

  • Environment filter dropdown in sidebar/header
  • Catalog grouped by environment → application → route
  • All data queries pass environment filter when set
  • "All environments" as default view

Current State (Audit)

What exists:

  • ClickHouse tables already have tenant_id columns (DEFAULT 'default')
  • Materialized views already GROUP BY tenant_id
  • Write paths for executions/processors pass tenantId (always "default")

What's missing:

  • No environment column anywhere
  • All reads hardcode tenant_id = 'default'
  • usage_events table has no tenant_id column
  • agent_metrics INSERT doesn't include tenant_id
  • PostgreSQL has no tenant isolation (solved by schema-per-tenant)
  • AgentInfo has no environmentId
  • No tenant configuration property

Verification

  1. Start server with CAMELEER_TENANT_ID=acme → PG connects to tenant_acme schema
  2. Register agent with environmentId=dev → CH writes contain tenant_id='acme', environment='dev'
  3. Start second server with CAMELEER_TENANT_ID=beta → data isolated
  4. UI environment filter shows only selected environment's data