Infrastructure Endpoint Visibility

Date: 2026-04-11
Status: Approved
Scope: cameleer-server + cameleer-saas


Problem

The server's admin section exposes PostgreSQL and ClickHouse diagnostic endpoints (connection strings, pool stats, active queries, table sizes, server versions). In standalone mode this is fine -- the admin user is the platform owner. In SaaS mode, tenant admins receive the ADMIN role via OIDC, which grants them access to infrastructure internals they should not see.

Worse, the current endpoints have cross-tenant data leaks in shared-infra deployments:

  • GET /admin/database/queries returns rows from pg_stat_activity, which covers every session on the PostgreSQL instance and is not schema-scoped. Tenant A can see and kill Tenant B's queries.
  • GET /admin/clickhouse/tables, /admin/clickhouse/performance, and /admin/clickhouse/queries read system.tables, system.parts, and system.processes globally, with no tenant_id filtering.

Solution

Two complementary changes:

  1. Server: an explicit flag disables infrastructure endpoints entirely when the server is provisioned by the SaaS platform.
  2. SaaS: the vendor console gains its own infrastructure dashboard that queries shared PostgreSQL and ClickHouse directly, with per-tenant breakdown.

Part 1: Server -- Disable Infrastructure Endpoints

New Property

cameleer:
  server:
    security:
      infrastructureendpoints: ${CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS:true}

The default is true (standalone mode -- endpoints available as today). The SaaS provisioner sets it to false on tenant server containers.
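The YAML placeholder reads the environment variable explicitly, and the variable name also matches what Spring Boot's documented relaxed-binding rule for environment variables would derive from the property name (replace periods with underscores, remove dashes, convert to uppercase). The following is a minimal sketch of that rule for illustration only; EnvNames and toEnvVar are hypothetical names, not part of the server or of Spring.

```java
import java.util.Locale;

/** Illustrative only: mirrors the documented Spring Boot naming rule for env vars. */
final class EnvNames {
    private EnvNames() {}

    /** Replace '.' with '_', drop '-', and convert to uppercase. */
    static String toEnvVar(String propertyName) {
        return propertyName
                .replace('.', '_')
                .replace("-", "")
                .toUpperCase(Locale.ROOT);
    }
}
```

Writing the last segment as infrastructureendpoints, without a dash or camel case, keeps the property name and the environment variable name visually aligned.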

Bean Removal via @ConditionalOnProperty

Add to both DatabaseAdminController and ClickHouseAdminController:

@ConditionalOnProperty(
    name = "cameleer.server.security.infrastructureendpoints",
    havingValue = "true",
    matchIfMissing = true
)

When false:

  • Controller beans are not registered by Spring
  • Requests to /api/v1/admin/database/** and /api/v1/admin/clickhouse/** return 404 Not Found (Spring's default for unmapped paths)
  • Controllers do not appear in the OpenAPI spec
  • No role, no interceptor, no filter -- the endpoints simply do not exist
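The effect of havingValue = "true" combined with matchIfMissing = true can be modelled as a small pure function. This is a sketch of the semantics, not Spring code; the class and method names are hypothetical, and the case-insensitive comparison is an assumption about how Spring evaluates havingValue.

```java
/** Illustrative model of the @ConditionalOnProperty settings above. Not Spring code. */
final class InfrastructureEndpointsCondition {
    private InfrastructureEndpointsCondition() {}

    /**
     * @param propertyValue the resolved property value, or null when the property is absent
     * @return true if the admin controller beans would be registered
     */
    static boolean controllersRegistered(String propertyValue) {
        if (propertyValue == null) {
            return true; // matchIfMissing = true: an absent property keeps the endpoints
        }
        // havingValue = "true"; assumed to be compared case-insensitively
        return "true".equalsIgnoreCase(propertyValue);
    }
}
```

Because application.yml always supplies a default, matchIfMissing only matters if that line is removed; it keeps standalone deployments on the current behavior either way.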

Health Endpoint Flag

Add an infrastructureEndpoints boolean to the health endpoint response (GET /api/v1/health). The value reflects the property. This is a public endpoint, and the flag itself is not sensitive -- it only tells the UI whether to render the Database/ClickHouse admin tabs.

Implementation: a custom HealthIndicator bean or a @RestControllerAdvice that enriches the health response. The exact mechanism is an implementation detail.
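Whatever mechanism is chosen, the enriched payload could look roughly like the map built below. This is a sketch only: HealthResponseSketch is a hypothetical helper, and the "status" field in the usage example is an assumption about the existing response, which this spec does not describe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: adds the capability flag to an existing health payload. */
final class HealthResponseSketch {
    private HealthResponseSketch() {}

    /** Returns a copy of the base payload with the infrastructureEndpoints flag appended. */
    static Map<String, Object> withInfrastructureFlag(Map<String, Object> base,
                                                      boolean infrastructureEndpoints) {
        Map<String, Object> out = new LinkedHashMap<>(base); // keep existing fields and order
        out.put("infrastructureEndpoints", infrastructureEndpoints);
        return out;
    }
}
```

For example, withInfrastructureFlag(Map.of("status", "UP"), false) yields a payload with both the assumed status field and infrastructureEndpoints set to false.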

UI Changes

buildAdminTreeNodes() in sidebar-utils.ts currently returns a static list including Database and ClickHouse nodes. Change it to accept a parameter (or read from a store) and omit those nodes when infrastructureEndpoints is false.

The flag is fetched once from the health endpoint at startup (the UI already calls health for connectivity). Store it in the auth store or a dedicated capabilities store.

Router: the /admin/database and /admin/clickhouse routes remain defined but are unreachable via navigation. If a user navigates directly, the API returns 404 and the page shows its existing error state.

Files Changed (Server)

| File | Change |
| --- | --- |
| application.yml | Add cameleer.server.security.infrastructureendpoints: true |
| DatabaseAdminController.java | Add @ConditionalOnProperty annotation |
| ClickHouseAdminController.java | Add @ConditionalOnProperty annotation |
| Health response | Add infrastructureEndpoints boolean |
| ui/src/components/sidebar-utils.ts | Filter admin tree nodes based on flag |
| ui/src/components/LayoutShell.tsx | Fetch and pass flag |

Part 2: SaaS -- Vendor Infrastructure Dashboard

Architecture

The SaaS platform sits on the same Docker network as PostgreSQL and ClickHouse. It already has their connection URLs in ProvisioningProperties (datasourceUrl for the cameleer PostgreSQL database, clickhouseUrl for ClickHouse). It already uses raw JDBC (DriverManager.getConnection()) for tenant data cleanup in TenantDataCleanupService. The infrastructure dashboard uses the same pattern.

The SaaS does NOT call the server's admin endpoints. It queries the shared infrastructure directly. This means:

  • No dependency on server endpoint availability
  • Cross-tenant aggregation is natural (the SaaS knows all tenants)
  • Per-tenant filtering is explicit (WHERE tenant_id = ? for ClickHouse, schema-scoped queries for PostgreSQL)
  • No new Logto scopes or roles needed

Backend

InfrastructureService.java -- new service class. Raw JDBC connections from ProvisioningProperties.datasourceUrl() and .clickhouseUrl(). Methods:

PostgreSQL:

  • getPostgresOverview() -- server version (SELECT version()), total DB size (pg_database_size), active connection count (pg_stat_activity)
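  • getPostgresTenantStats() -- per-tenant schema sizes, table counts, row counts. Query information_schema.tables joined with pg_stat_user_tables, grouped by table_schema, where the schema name matches tenant_%. In a LIKE pattern the underscore is a single-character wildcard, so it must be escaped to match literally. Row counts taken from pg_stat_user_tables are estimates, not exact counts.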
  • getPostgresTenantDetail(slug) -- single tenant: table-level breakdown (name, rows, data size, index size) from pg_stat_user_tables filtered to tenant_{slug} schema
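A minimal sketch of what getPostgresTenantStats() might look like with raw JDBC follows. It is one possible reading of the description above, not the implementation: it uses pg_stat_user_tables alone (which already carries schemaname and relid), the class and record names are hypothetical, and it assumes the credentials are embedded in the datasource URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: per-tenant PostgreSQL stats via raw JDBC. */
final class PostgresTenantStatsSketch {
    private PostgresTenantStatsSketch() {}

    /** Hypothetical result shape; estimatedRows comes from n_live_tup and is approximate. */
    record TenantStats(String schema, long tableCount, long estimatedRows, long totalBytes) {}

    // '!' escapes the underscore so it matches literally instead of acting as a wildcard.
    static final String TENANT_SCHEMA_PATTERN = "tenant!_%";

    static final String TENANT_STATS_SQL =
            "SELECT schemaname, "
          + "count(*) AS table_count, "
          + "CAST(sum(n_live_tup) AS bigint) AS estimated_rows, "
          + "CAST(sum(pg_total_relation_size(relid)) AS bigint) AS total_bytes "
          + "FROM pg_stat_user_tables "
          + "WHERE schemaname LIKE ? ESCAPE '!' "
          + "GROUP BY schemaname "
          + "ORDER BY total_bytes DESC";

    /** Assumption: datasourceUrl carries the credentials; otherwise pass user and password. */
    static List<TenantStats> getPostgresTenantStats(String datasourceUrl) throws SQLException {
        List<TenantStats> result = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(datasourceUrl);
             PreparedStatement ps = conn.prepareStatement(TENANT_STATS_SQL)) {
            ps.setString(1, TENANT_SCHEMA_PATTERN);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new TenantStats(
                            rs.getString("schemaname"),
                            rs.getLong("table_count"),
                            rs.getLong("estimated_rows"),
                            rs.getLong("total_bytes")));
                }
            }
        }
        return result;
    }
}
```

Binding the pattern as a parameter keeps the SQL text constant. The per-tenant detail query could take the same approach with WHERE schemaname = ? and the tenant_{slug} schema name as the bound value.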

ClickHouse:

  • getClickHouseOverview() -- server version, uptime (system.metrics), total disk size, total rows, compression ratio (system.parts aggregated)
  • getClickHouseTenantStats() -- per-tenant row counts and disk usage. Query actual data tables (executions, logs, etc.) with SELECT tenant_id, count(), sum(bytes) ... GROUP BY tenant_id
  • getClickHouseTenantDetail(slug) -- single tenant: per-table breakdown (table name, row count, disk size) filtered by WHERE tenant_id = ?

Note: ClickHouse system.parts does not have a tenant_id column (it is a system table). Per-tenant ClickHouse stats require querying the actual data tables. For the overview, system.parts provides aggregate stats across all tenants.
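Since the per-tenant numbers must come from the data tables themselves, the query can be assembled per table. The sketch below builds only the row-count part; disk usage is left out because this spec does not pin down how bytes are attributed to a tenant. ClickHouseTenantQuerySketch and its method are hypothetical names, and the table names are taken from the examples above.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: builds a per-tenant row-count query across the data tables. */
final class ClickHouseTenantQuerySketch {
    private ClickHouseTenantQuerySketch() {}

    /** One SELECT per table, combined with UNION ALL; each row is (table_name, tenant_id, row_count). */
    static String perTenantRowCountSql(List<String> tables) {
        if (tables.isEmpty()) {
            throw new IllegalArgumentException("at least one table is required");
        }
        return tables.stream()
                .map(ClickHouseTenantQuerySketch::selectForTable)
                .collect(Collectors.joining(" UNION ALL "));
    }

    private static String selectForTable(String table) {
        // Table names are concatenated into SQL, so accept plain identifiers only.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + table);
        }
        return "SELECT '" + table + "' AS table_name, tenant_id, count() AS row_count"
                + " FROM " + table
                + " GROUP BY tenant_id";
    }
}
```

Calling perTenantRowCountSql(List.of("executions", "logs")) produces two grouped SELECTs joined by UNION ALL. For the single-tenant detail view, each SELECT would additionally carry WHERE tenant_id = ? with the tenant bound as a parameter.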

InfrastructureController.java -- new REST controller at /api/vendor/infrastructure. All endpoints require platform:admin scope via @PreAuthorize("hasAuthority('SCOPE_platform:admin')").

| Method | Path | Returns |
| --- | --- | --- |
| GET | / | Combined PG + CH overview |
| GET | /postgres | PG overview + per-tenant breakdown |
| GET | /postgres/{slug} | Single tenant PG detail |
| GET | /clickhouse | CH overview + per-tenant breakdown |
| GET | /clickhouse/{slug} | Single tenant CH detail |

Frontend

New vendor sidebar entry: Infrastructure (icon: Server or Database from lucide-react) at /vendor/infrastructure.

Page layout:

  • Two section cards: PostgreSQL and ClickHouse
  • Each shows aggregate KPIs at top (version, total size, connections/queries)
  • Per-tenant table below: slug, schema size / row count, disk usage
  • Click tenant row to expand or navigate to detail view
  • Detail view: per-table breakdown for that tenant

The page follows the existing vendor console patterns (card layout, tables, KPI strips) using @cameleer/design-system components.

SaaS Provisioner Change

DockerTenantProvisioner.createServerContainer() adds one env var to the list passed to tenant server containers:

env.add("CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false");

Files Changed (SaaS)

| File | Change |
| --- | --- |
| DockerTenantProvisioner.java | Add env var to tenant server containers |
| InfrastructureService.java | New -- raw JDBC queries for PG + CH stats |
| InfrastructureController.java | New -- vendor-facing REST endpoints |
| Layout.tsx | Add Infrastructure to vendor sidebar |
| router.tsx | Add /vendor/infrastructure route |
| InfrastructurePage.tsx | New -- overview page with PG/CH cards |

What Does NOT Change

  • No new Logto scopes, roles, or API resources
  • No new Spring datasource beans (raw JDBC, same as TenantDataCleanupService)
  • No changes to SecurityConfig in either repo
  • No changes to existing tenant admin endpoints or RBAC
  • No changes to ServerApiClient (SaaS queries infra directly, not via server)
  • Standalone server deployments are unaffected (flag defaults to true)

Data Flow Summary

Standalone mode (no SaaS):

  1. Admin user logs into server UI
  2. Admin sidebar shows Database and ClickHouse tabs
  3. Tabs work as today -- full infrastructure visibility

SaaS managed mode:

  1. SaaS provisions tenant server with CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false
  2. Tenant admin logs into server UI via OIDC
  3. Admin sidebar shows Users & Roles, OIDC, Audit, Environments -- no Database or ClickHouse tabs
  4. Direct navigation to /admin/database still loads the route, but its API calls return 404 and the page shows its existing error state
  5. Vendor opens SaaS console -> Infrastructure page
  6. SaaS queries shared PG + CH directly with per-tenant filtering
  7. Vendor sees aggregate stats + per-tenant breakdown

Security Properties

  • Tenant isolation: tenant admins cannot see any infrastructure data. The endpoints do not exist on their server instance.
  • Cross-tenant prevention: the SaaS infrastructure dashboard is reachable only with the platform:admin scope and applies explicit tenant filtering in its queries. No tenant can see another tenant's data.
  • Blast radius: the flag is set at provisioning time via env var. A tenant admin cannot change it. Only someone with access to the Docker container config (platform operator) can toggle it.
  • Defense in depth: the vendor dashboard does not depend on the server's DB/CH admin endpoints, which expose pg_stat_activity and system.processes globally. Even if the flag were somehow bypassed on a tenant server, vendor-side visibility would still come from direct queries with tenant filtering, which is inherently safer.