docs: add infrastructure endpoint visibility design spec

Covers restricting DB/ClickHouse admin endpoints in SaaS-managed
server instances via @ConditionalOnProperty flag, and building a
vendor-facing infrastructure dashboard in the SaaS platform with
per-tenant PostgreSQL and ClickHouse visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hsiegeln
2026-04-11 22:56:45 +02:00
parent 7a3256f3f6
commit 01b268590d

@@ -0,0 +1,249 @@
# Infrastructure Endpoint Visibility
**Date:** 2026-04-11
**Status:** Approved
**Scope:** cameleer3-server + cameleer-saas
---
## Problem
The server's admin section exposes PostgreSQL and ClickHouse diagnostic
endpoints (connection strings, pool stats, active queries, table sizes, server
versions). In standalone mode this is fine -- the admin user is the platform
owner. In SaaS mode, tenant admins receive the `ADMIN` role via OIDC, which grants
them access to infrastructure internals they should not see.
Worse, the current endpoints have cross-tenant data leaks in shared-infra
deployments:
- `GET /admin/database/queries` returns `pg_stat_activity`, which is
server-wide, not schema-scoped. Tenant A can see and kill Tenant B's
queries.
- `GET /admin/clickhouse/tables`, `/performance`, `/queries` query
`system.tables`, `system.parts`, and `system.processes` globally -- no
`tenant_id` filtering.
## Solution
Two complementary changes:
1. **Server**: an explicit flag disables infrastructure endpoints entirely when
the server is provisioned by the SaaS platform.
2. **SaaS**: the vendor console gains its own infrastructure dashboard that
queries shared PostgreSQL and ClickHouse directly, with per-tenant breakdown.
---
## Part 1: Server -- Disable Infrastructure Endpoints
### New Property
```yaml
cameleer:
  server:
    security:
      infrastructureendpoints: ${CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS:true}
```
Default `true` (standalone mode -- endpoints available as today). The SaaS
provisioner sets `false` on tenant server containers.
### Bean Removal via @ConditionalOnProperty
Add to both `DatabaseAdminController` and `ClickHouseAdminController`:
```java
@ConditionalOnProperty(
    name = "cameleer.server.security.infrastructureendpoints",
    havingValue = "true",
    matchIfMissing = true
)
```
When `false`:
- Controller beans are not registered by Spring
- Requests to `/api/v1/admin/database/**` and `/api/v1/admin/clickhouse/**`
return **404 Not Found** (Spring's default for unmapped paths)
- Controllers do not appear in the OpenAPI spec
- No role, no interceptor, no filter -- the endpoints simply do not exist
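As a rough illustration of how the flag resolves, the plain-Java sketch below approximates the combination of the `${...:true}` placeholder default with `havingValue = "true"` and `matchIfMissing = true`. It is not Spring's implementation: the class and method names are hypothetical, and the case-insensitive comparison is an assumption about how the condition matches values.

```java
import java.util.Map;

// Hypothetical helper: mirrors the intended flag semantics outside of Spring.
final class InfraEndpointFlag {
    static final String ENV_VAR = "CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS";

    // Returns true when the DB/ClickHouse admin controllers would be registered.
    static boolean controllersRegistered(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        if (raw == null) {
            // Unset: the placeholder default (and matchIfMissing) keep endpoints on.
            return true;
        }
        // Only the value "true" enables the controllers; anything else removes them.
        return "true".equalsIgnoreCase(raw);
    }

    public static void main(String[] args) {
        System.out.println(controllersRegistered(System.getenv()));
    }
}
```

Under these assumptions, a standalone deployment with no env var keeps the endpoints, while a container started with the variable set to `false` (or to any value other than `true`) does not register the controllers.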
### Health Endpoint Flag
Add `infrastructureEndpoints` boolean to the health endpoint response
(`GET /api/v1/health`). The value reflects the property. This is a public
endpoint, and the flag itself is not sensitive -- it only tells the UI whether
to render the Database/ClickHouse admin tabs.
Implementation: a custom `HealthIndicator` bean or a `@RestControllerAdvice`
that enriches the health response. The exact mechanism is an implementation
detail.
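Whichever mechanism is chosen, the effect is one extra boolean in the response body. A minimal plain-Java sketch, assuming the health body can be treated as a map; the `status` key and the `enrich` helper are hypothetical and only illustrate the shape:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: adds the capability flag to an existing health body.
final class HealthBodySketch {

    // Returns a copy of the base body with the infrastructureEndpoints flag added.
    static Map<String, Object> enrich(Map<String, Object> base, boolean infrastructureEndpoints) {
        Map<String, Object> body = new LinkedHashMap<>(base);
        body.put("infrastructureEndpoints", infrastructureEndpoints);
        return body;
    }

    public static void main(String[] args) {
        // "status" is an assumed field of the existing health response.
        System.out.println(enrich(Map.of("status", "UP"), false));
    }
}
```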
### UI Changes
`buildAdminTreeNodes()` in `sidebar-utils.ts` currently returns a static list
including Database and ClickHouse nodes. Change it to accept a parameter
(or read from a store) and omit those nodes when `infrastructureEndpoints` is
`false`.
The flag is fetched once from the health endpoint at startup (the UI already
calls health for connectivity). Store it in the auth store or a dedicated
capabilities store.
Router: the `/admin/database` and `/admin/clickhouse` routes remain defined
but are unreachable via navigation. If a user navigates directly, the API
returns 404 and the page shows its existing error state.
### Files Changed (Server)
| File | Change |
|------|--------|
| `application.yml` | Add `cameleer.server.security.infrastructureendpoints: true` |
| `DatabaseAdminController.java` | Add `@ConditionalOnProperty` annotation |
| `ClickHouseAdminController.java` | Add `@ConditionalOnProperty` annotation |
| Health response | Add `infrastructureEndpoints` boolean |
| `ui/src/components/sidebar-utils.ts` | Filter admin tree nodes based on flag |
| `ui/src/components/LayoutShell.tsx` | Fetch and pass flag |
---
## Part 2: SaaS -- Vendor Infrastructure Dashboard
### Architecture
The SaaS platform sits on the same Docker network as PostgreSQL and ClickHouse.
It already has their connection URLs in `ProvisioningProperties` (`datasourceUrl`
for cameleer3 PostgreSQL, `clickhouseUrl` for ClickHouse). It already uses raw
JDBC (`DriverManager.getConnection()`) for tenant data cleanup in
`TenantDataCleanupService`. The infrastructure dashboard uses the same pattern.
The SaaS does NOT call the server's admin endpoints. It queries the shared
infrastructure directly. This means:
- No dependency on server endpoint availability
- Cross-tenant aggregation is natural (the SaaS knows all tenants)
- Per-tenant filtering is explicit (`WHERE tenant_id = ?` for ClickHouse,
schema-scoped queries for PostgreSQL)
- No new Logto scopes or roles needed
### Backend
**`InfrastructureService.java`** -- new service class. Raw JDBC connections
from `ProvisioningProperties.datasourceUrl()` and `.clickhouseUrl()`. Methods:
PostgreSQL:
- `getPostgresOverview()` -- server version (`SELECT version()`), total DB
size (`pg_database_size`), active connection count (`pg_stat_activity`)
- `getPostgresTenantStats()` -- per-tenant schema sizes, table counts, and
estimated row counts. Query `information_schema.tables` joined with
`pg_stat_user_tables`, grouped by `table_schema`, where the schema matches
`tenant_%` (escape the underscore if using `LIKE`, where `_` is a
single-character wildcard)
- `getPostgresTenantDetail(slug)` -- single tenant: table-level breakdown
(name, rows, data size, index size) from `pg_stat_user_tables` filtered to
`tenant_{slug}` schema
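The single-tenant PostgreSQL query could look roughly like the sketch below. It assumes a JDBC `Connection` opened against the shared database and uses `n_live_tup`, which is an estimate, for the row count. The class, record, and helper names are hypothetical, and the exact columns the real service selects are not specified here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of getPostgresTenantDetail(slug); not the actual service code.
final class PostgresTenantDetailSketch {

    record TableStats(String table, long estimatedRows, long dataBytes, long indexBytes) {}

    // schemaname is an ordinary column, so the tenant schema can be a bound parameter.
    static final String DETAIL_SQL = """
            SELECT relname,
                   n_live_tup,
                   pg_table_size(relid)   AS data_bytes,
                   pg_indexes_size(relid) AS index_bytes
            FROM pg_stat_user_tables
            WHERE schemaname = ?
            ORDER BY pg_total_relation_size(relid) DESC
            """;

    // Maps a tenant slug to its schema name, following the tenant_{slug} convention.
    static String schemaFor(String slug) {
        if (slug == null || slug.isBlank()) {
            throw new IllegalArgumentException("tenant slug must not be blank");
        }
        return "tenant_" + slug;
    }

    // Reads one row per table in the tenant's schema, largest table first.
    static List<TableStats> tenantDetail(Connection conn, String slug) throws SQLException {
        List<TableStats> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(DETAIL_SQL)) {
            ps.setString(1, schemaFor(slug));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new TableStats(rs.getString(1), rs.getLong(2),
                            rs.getLong(3), rs.getLong(4)));
                }
            }
        }
        return result;
    }
}
```

Binding the schema name as a parameter keeps the slug out of the SQL text, so the path variable never needs to be concatenated into the query.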
ClickHouse:
- `getClickHouseOverview()` -- server version, uptime (`SELECT uptime()`),
total disk size, total rows, compression ratio (`system.parts` aggregated)
- `getClickHouseTenantStats()` -- per-tenant row counts and disk usage. Query
actual data tables (executions, logs, etc.) with
`SELECT tenant_id, count(), sum(bytes) ... GROUP BY tenant_id`
- `getClickHouseTenantDetail(slug)` -- single tenant: per-table breakdown
(table name, row count, disk size) filtered by `WHERE tenant_id = ?`
Note: ClickHouse `system.parts` does not have a `tenant_id` column (it is a
system table). Per-tenant ClickHouse stats require querying the actual data
tables. For the overview, `system.parts` provides aggregate stats across all
tenants.
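A minimal sketch of the per-tenant ClickHouse queries, limited to row counts because the disk-usage expression is left open above. The table names `executions` and `logs` are taken from the examples in this section and may not match the real schema; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: builds per-tenant row-count queries for the data tables.
final class ClickHouseTenantStatsSketch {

    // Assumed table names; the spec only lists "executions, logs, etc."
    static final List<String> DATA_TABLES = List.of("executions", "logs");

    // Table names cannot be bound as JDBC parameters, so only allow-listed names are used.
    private static void requireKnownTable(String table) {
        if (!DATA_TABLES.contains(table)) {
            throw new IllegalArgumentException("unknown data table: " + table);
        }
    }

    // All tenants: one row per tenant_id with its row count in the given table.
    static String rowCountsByTenantSql(String table) {
        requireKnownTable(table);
        return "SELECT tenant_id, count() AS row_count FROM " + table + " GROUP BY tenant_id";
    }

    // Single tenant: the caller binds tenant_id as the only parameter.
    static String rowCountForTenantSql(String table) {
        requireKnownTable(table);
        return "SELECT count() AS row_count FROM " + table + " WHERE tenant_id = ?";
    }

    public static void main(String[] args) {
        DATA_TABLES.forEach(t -> System.out.println(rowCountsByTenantSql(t)));
    }
}
```

The allow-list keeps the table name, which has to be concatenated into the SQL text, from ever coming from request input; only `tenant_id` varies per request, and it is bound as a parameter.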
**`InfrastructureController.java`** -- new REST controller at
`/api/vendor/infrastructure`. All endpoints require `platform:admin` scope
via `@PreAuthorize("hasAuthority('SCOPE_platform:admin')")`.
| Method | Path | Returns |
|--------|------|---------|
| GET | `/` | Combined PG + CH overview |
| GET | `/postgres` | PG overview + per-tenant breakdown |
| GET | `/postgres/{slug}` | Single tenant PG detail |
| GET | `/clickhouse` | CH overview + per-tenant breakdown |
| GET | `/clickhouse/{slug}` | Single tenant CH detail |
### Frontend
New vendor sidebar entry: **Infrastructure** (icon: `Server` or `Database`
from lucide-react) at `/vendor/infrastructure`.
**Page layout:**
- Two section cards: PostgreSQL and ClickHouse
- Each shows aggregate KPIs at top (version, total size, connections/queries)
- Per-tenant table below: slug, schema size / row count, disk usage
- Click tenant row to expand or navigate to detail view
- Detail view: per-table breakdown for that tenant
The page follows the existing vendor console patterns (card layout, tables,
KPI strips) using `@cameleer/design-system` components.
### SaaS Provisioner Change
`DockerTenantProvisioner.createServerContainer()` adds one env var to the
list passed to tenant server containers:
```java
env.add("CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false");
```
### Files Changed (SaaS)
| File | Change |
|------|--------|
| `DockerTenantProvisioner.java` | Add env var to tenant server containers |
| `InfrastructureService.java` | New -- raw JDBC queries for PG + CH stats |
| `InfrastructureController.java` | New -- vendor-facing REST endpoints |
| `Layout.tsx` | Add Infrastructure to vendor sidebar |
| `router.tsx` | Add `/vendor/infrastructure` route |
| `InfrastructurePage.tsx` | New -- overview page with PG/CH cards |
---
## What Does NOT Change
- No new Logto scopes, roles, or API resources
- No new Spring datasource beans (raw JDBC, same as TenantDataCleanupService)
- No changes to SecurityConfig in either repo
- No changes to existing tenant admin endpoints or RBAC
- No changes to ServerApiClient (SaaS queries infra directly, not via server)
- Standalone server deployments are unaffected (flag defaults to `true`)
---
## Data Flow Summary
**Standalone mode (no SaaS):**
1. Admin user logs into server UI
2. Admin sidebar shows Database and ClickHouse tabs
3. Tabs work as today -- full infrastructure visibility
**SaaS managed mode:**
1. SaaS provisions tenant server with `INFRASTRUCTUREENDPOINTS=false`
2. Tenant admin logs into server UI via OIDC
3. Admin sidebar shows Users & Roles, OIDC, Audit, Environments -- no
Database or ClickHouse tabs
4. Direct navigation to `/admin/database` loads the page, but its API calls
return 404 and the page shows its existing error state
5. Vendor opens SaaS console -> Infrastructure page
6. SaaS queries shared PG + CH directly with per-tenant filtering
7. Vendor sees aggregate stats + per-tenant breakdown
---
## Security Properties
- **Tenant isolation**: tenant admins cannot see any infrastructure data.
The endpoints do not exist on their server instance.
- **Cross-tenant prevention**: the SaaS infrastructure dashboard is restricted
to the `platform:admin` scope and queries with explicit tenant filtering. No
tenant can see another tenant's data.
- **Blast radius**: the flag is set at provisioning time via env var. A tenant
admin cannot change it. Only someone with access to the Docker container
config (platform operator) can toggle it.
- **Defense in depth**: the flag is the only barrier on the server side. If it
were somehow bypassed, the server's DB/CH admin endpoints would still expose
`pg_stat_activity` and `system.processes` globally. The vendor dashboard does
not depend on those endpoints; it queries the infrastructure directly with
tenant filtering, which is inherently safer.