docs: add infrastructure endpoint visibility design spec

Covers restricting DB/ClickHouse admin endpoints in SaaS-managed server instances via @ConditionalOnProperty flag, and building a vendor-facing infrastructure dashboard in the SaaS platform with per-tenant PostgreSQL and ClickHouse visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,249 @@
# Infrastructure Endpoint Visibility

**Date:** 2026-04-11
**Status:** Approved
**Scope:** cameleer3-server + cameleer-saas

---

## Problem

The server's admin section exposes PostgreSQL and ClickHouse diagnostic
endpoints (connection strings, pool stats, active queries, table sizes, server
versions). In standalone mode this is fine -- the admin user is the platform
owner. In SaaS mode, tenant admins receive `ADMIN` role via OIDC, which grants
them access to infrastructure internals they should not see.

Worse, the current endpoints have cross-tenant data leaks in shared-infra
deployments:

- `GET /admin/database/queries` returns `pg_stat_activity` which is
  database-wide, not schema-scoped. Tenant A can see and kill Tenant B's
  queries.
- `GET /admin/clickhouse/tables`, `/performance`, `/queries` query
  `system.tables`, `system.parts`, and `system.processes` globally -- no
  `tenant_id` filtering.

## Solution

Two complementary changes:

1. **Server**: an explicit flag disables infrastructure endpoints entirely when
   the server is provisioned by the SaaS platform.
2. **SaaS**: the vendor console gains its own infrastructure dashboard that
   queries shared PostgreSQL and ClickHouse directly, with per-tenant breakdown.

---

## Part 1: Server -- Disable Infrastructure Endpoints

### New Property

```yaml
cameleer:
  server:
    security:
      infrastructureendpoints: ${CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS:true}
```

Default `true` (standalone mode -- endpoints available as today). The SaaS
provisioner sets `false` on tenant server containers.

### Bean Removal via @ConditionalOnProperty

Add to both `DatabaseAdminController` and `ClickHouseAdminController`:

```java
@ConditionalOnProperty(
    name = "cameleer.server.security.infrastructureendpoints",
    havingValue = "true",
    matchIfMissing = true
)
```

When `false`:
- Controller beans are not registered by Spring
- Requests to `/api/v1/admin/database/**` and `/api/v1/admin/clickhouse/**`
  return **404 Not Found** (Spring's default for unmapped paths)
- Controllers do not appear in the OpenAPI spec
- No role, no interceptor, no filter -- the endpoints simply do not exist

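
The way the annotation above evaluates can be modelled in a few lines of plain Java. This is only a sketch of the matching rule (a `havingValue` comparison that ignores case, plus `matchIfMissing`), not Spring code; the class and method names are hypothetical and exist only for illustration.

```java
import java.util.Map;

/** Plain-Java model of the condition above; hypothetical helper, not part of Spring. */
final class InfrastructureEndpointsCondition {

    static final String PROPERTY = "cameleer.server.security.infrastructureendpoints";

    /** Mirrors havingValue = "true" with matchIfMissing = true. */
    static boolean controllersRegistered(Map<String, String> properties) {
        String value = properties.get(PROPERTY);
        if (value == null) {
            return true; // property absent: matchIfMissing = true keeps the controllers
        }
        // Property present: it must equal havingValue, compared case-insensitively.
        return "true".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        System.out.println(controllersRegistered(Map.of()));                  // true
        System.out.println(controllersRegistered(Map.of(PROPERTY, "false"))); // false
        System.out.println(controllersRegistered(Map.of(PROPERTY, "TRUE")));  // true
        System.out.println(controllersRegistered(Map.of(PROPERTY, "yes")));   // false
    }
}
```

One consequence worth noting: any value other than `true` (including a typo such as `yes`) removes the controllers, so a misconfigured tenant container fails closed rather than open.
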
### Health Endpoint Flag

Add `infrastructureEndpoints` boolean to the health endpoint response
(`GET /api/v1/health`). The value reflects the property. This is a public
endpoint, and the flag itself is not sensitive -- it only tells the UI whether
to render the Database/ClickHouse admin tabs.

Implementation: a custom `HealthIndicator` bean or a `@RestControllerAdvice`
that enriches the health response. The exact mechanism is an implementation
detail.

### UI Changes

`buildAdminTreeNodes()` in `sidebar-utils.ts` currently returns a static list
including Database and ClickHouse nodes. Change it to accept a parameter
(or read from a store) and omit those nodes when `infrastructureEndpoints` is
`false`.

The flag is fetched once from the health endpoint at startup (the UI already
calls health for connectivity). Store it in the auth store or a dedicated
capabilities store.

Router: the `/admin/database` and `/admin/clickhouse` routes remain defined
but are unreachable via navigation. If a user navigates directly, the API
returns 404 and the page shows its existing error state.

### Files Changed (Server)

| File | Change |
|------|--------|
| `application.yml` | Add `cameleer.server.security.infrastructureendpoints: true` |
| `DatabaseAdminController.java` | Add `@ConditionalOnProperty` annotation |
| `ClickHouseAdminController.java` | Add `@ConditionalOnProperty` annotation |
| Health response | Add `infrastructureEndpoints` boolean |
| `ui/src/components/sidebar-utils.ts` | Filter admin tree nodes based on flag |
| `ui/src/components/LayoutShell.tsx` | Fetch and pass flag |

---

## Part 2: SaaS -- Vendor Infrastructure Dashboard

### Architecture

The SaaS platform sits on the same Docker network as PostgreSQL and ClickHouse.
It already has their connection URLs in `ProvisioningProperties` (`datasourceUrl`
for cameleer3 PostgreSQL, `clickhouseUrl` for ClickHouse). It already uses raw
JDBC (`DriverManager.getConnection()`) for tenant data cleanup in
`TenantDataCleanupService`. The infrastructure dashboard uses the same pattern.

The SaaS does NOT call the server's admin endpoints. It queries the shared
infrastructure directly. This means:

- No dependency on server endpoint availability
- Cross-tenant aggregation is natural (the SaaS knows all tenants)
- Per-tenant filtering is explicit (`WHERE tenant_id = ?` for ClickHouse,
  schema-scoped queries for PostgreSQL)
- No new Logto scopes or roles needed

### Backend

**`InfrastructureService.java`** -- new service class. Raw JDBC connections
from `ProvisioningProperties.datasourceUrl()` and `.clickhouseUrl()`. Methods:

PostgreSQL:
- `getPostgresOverview()` -- server version (`SELECT version()`), total DB
  size (`pg_database_size`), active connection count (`pg_stat_activity`)
- `getPostgresTenantStats()` -- per-tenant schema sizes, table counts, row
  counts (estimates, since `pg_stat_user_tables` reports `n_live_tup`). Query
  `information_schema.tables` joined with `pg_stat_user_tables`, grouped by
  `table_schema`, where the schema matches `tenant_%`
- `getPostgresTenantDetail(slug)` -- single tenant: table-level breakdown
  (name, rows, data size, index size) from `pg_stat_user_tables` filtered to
  the `tenant_{slug}` schema

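
As an illustration of the schema-scoped pattern, `getPostgresTenantDetail(slug)` might look roughly like the sketch below. It assumes the JDBC URL carries its own credentials, that slugs are lowercase alphanumerics with `_` or `-` (the real slug rules are not stated here), and that `pg_table_size` / `pg_indexes_size` are acceptable sources for the data and index sizes. The class, record, and helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of a schema-scoped PostgreSQL tenant query over raw JDBC. */
final class PostgresTenantDetailSketch {

    record TableStats(String table, long estimatedRows, long dataBytes, long indexBytes) {}

    // Assumed slug format; kept short enough that "tenant_" + slug fits a 63-char identifier.
    private static final Pattern SLUG = Pattern.compile("[a-z0-9_-]{1,56}");

    // n_live_tup is an estimate maintained by the statistics collector, not an exact count.
    static final String DETAIL_SQL = """
            SELECT relname,
                   n_live_tup,
                   pg_table_size(relid)   AS data_bytes,
                   pg_indexes_size(relid) AS index_bytes
            FROM pg_stat_user_tables
            WHERE schemaname = ?
            ORDER BY relname
            """;

    /** Validates the slug before it is turned into a schema name. */
    static String tenantSchema(String slug) {
        if (slug == null || !SLUG.matcher(slug).matches()) {
            throw new IllegalArgumentException("invalid tenant slug");
        }
        return "tenant_" + slug;
    }

    static List<TableStats> getPostgresTenantDetail(String datasourceUrl, String slug)
            throws SQLException {
        String schema = tenantSchema(slug);
        try (Connection conn = DriverManager.getConnection(datasourceUrl);
             PreparedStatement ps = conn.prepareStatement(DETAIL_SQL)) {
            ps.setString(1, schema); // schema name is bound as a value, never concatenated
            try (ResultSet rs = ps.executeQuery()) {
                List<TableStats> stats = new ArrayList<>();
                while (rs.next()) {
                    stats.add(new TableStats(rs.getString(1), rs.getLong(2),
                            rs.getLong(3), rs.getLong(4)));
                }
                return stats;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tenantSchema("acme")); // tenant_acme
    }
}
```

Because the schema name is compared as a bound value in the `WHERE` clause, the slug never becomes part of the SQL text; the validation step is an extra guard rather than the only one.
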
ClickHouse:
- `getClickHouseOverview()` -- server version, uptime (`uptime()` function),
  total disk size, total rows, compression ratio (`system.parts` aggregated)
- `getClickHouseTenantStats()` -- per-tenant row counts and disk usage. Query
  actual data tables (executions, logs, etc.) with
  `SELECT tenant_id, count(), sum(bytes) ... GROUP BY tenant_id`
- `getClickHouseTenantDetail(slug)` -- single tenant: per-table breakdown
  (table name, row count, disk size) filtered by `WHERE tenant_id = ?`

Note: ClickHouse `system.parts` does not have a `tenant_id` column (it is a
system table). Per-tenant ClickHouse stats require querying the actual data
tables. For the overview, `system.parts` provides aggregate stats across all
tenants.

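
A minimal sketch of how the per-tenant ClickHouse queries could be assembled is shown below. It covers row counts only, because the column that would supply per-tenant byte sizes is not specified above. The table list (`executions`, `logs`) is taken from the examples in this section and is otherwise an assumption, as are the class and method names.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical builder for per-tenant ClickHouse row-count queries. */
final class ClickHouseTenantQueries {

    // Fixed whitelist: identifiers cannot be bound as JDBC parameters, so table
    // names must come from constants and never from request input.
    static final List<String> DATA_TABLES = List.of("executions", "logs");

    /** One row per (table, tenant) with the row count, across all tenants. */
    static String tenantStatsSql() {
        return DATA_TABLES.stream()
                .map(t -> "SELECT '" + t + "' AS table_name, tenant_id, count() AS row_count"
                        + " FROM " + t + " GROUP BY tenant_id")
                .collect(Collectors.joining("\nUNION ALL\n"));
    }

    /**
     * One row per table for a single tenant. The statement contains one '?'
     * per table, so the tenant id must be bound DATA_TABLES.size() times.
     */
    static String tenantDetailSql() {
        return DATA_TABLES.stream()
                .map(t -> "SELECT '" + t + "' AS table_name, count() AS row_count"
                        + " FROM " + t + " WHERE tenant_id = ?")
                .collect(Collectors.joining("\nUNION ALL\n"));
    }

    public static void main(String[] args) {
        System.out.println(tenantStatsSql());
        System.out.println();
        System.out.println(tenantDetailSql());
    }
}
```

Keeping the tenant id as a bound parameter and the table names as constants means nothing from the request path is ever concatenated into the SQL text.
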
**`InfrastructureController.java`** -- new REST controller at
`/api/vendor/infrastructure`. All endpoints require `platform:admin` scope
via `@PreAuthorize("hasAuthority('SCOPE_platform:admin')")`.

| Method | Path | Returns |
|--------|------|---------|
| GET | `/` | Combined PG + CH overview |
| GET | `/postgres` | PG overview + per-tenant breakdown |
| GET | `/postgres/{slug}` | Single tenant PG detail |
| GET | `/clickhouse` | CH overview + per-tenant breakdown |
| GET | `/clickhouse/{slug}` | Single tenant CH detail |

### Frontend

New vendor sidebar entry: **Infrastructure** (icon: `Server` or `Database`
from lucide-react) at `/vendor/infrastructure`.

**Page layout:**
- Two section cards: PostgreSQL and ClickHouse
- Each shows aggregate KPIs at top (version, total size, connections/queries)
- Per-tenant table below: slug, schema size / row count, disk usage
- Click tenant row to expand or navigate to detail view
- Detail view: per-table breakdown for that tenant

The page follows the existing vendor console patterns (card layout, tables,
KPI strips) using `@cameleer/design-system` components.

### SaaS Provisioner Change

`DockerTenantProvisioner.createServerContainer()` adds one env var to the
list passed to tenant server containers:

```java
env.add("CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false");
```

### Files Changed (SaaS)

| File | Change |
|------|--------|
| `DockerTenantProvisioner.java` | Add env var to tenant server containers |
| `InfrastructureService.java` | New -- raw JDBC queries for PG + CH stats |
| `InfrastructureController.java` | New -- vendor-facing REST endpoints |
| `Layout.tsx` | Add Infrastructure to vendor sidebar |
| `router.tsx` | Add `/vendor/infrastructure` route |
| `InfrastructurePage.tsx` | New -- overview page with PG/CH cards |

---

## What Does NOT Change

- No new Logto scopes, roles, or API resources
- No new Spring datasource beans (raw JDBC, same as TenantDataCleanupService)
- No changes to SecurityConfig in either repo
- No changes to existing tenant admin endpoints or RBAC
- No changes to ServerApiClient (SaaS queries infra directly, not via server)
- Standalone server deployments are unaffected (flag defaults to `true`)

---

## Data Flow Summary

**Standalone mode (no SaaS):**
1. Admin user logs into server UI
2. Admin sidebar shows Database and ClickHouse tabs
3. Tabs work as today -- full infrastructure visibility

**SaaS managed mode:**
1. SaaS provisions tenant server with
   `CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false`
2. Tenant admin logs into server UI via OIDC
3. Admin sidebar shows Users & Roles, OIDC, Audit, Environments -- no
   Database or ClickHouse tabs
4. Direct navigation to `/admin/database` loads the page, but its API calls
   return 404 and the page shows its error state
5. Vendor opens SaaS console -> Infrastructure page
6. SaaS queries shared PG + CH directly with per-tenant filtering
7. Vendor sees aggregate stats + per-tenant breakdown

---

## Security Properties

- **Tenant isolation**: tenant admins cannot see any infrastructure data.
  The endpoints do not exist on their server instance.
- **Cross-tenant prevention**: the SaaS infrastructure dashboard queries
  with explicit tenant filtering. No tenant can see another tenant's data.
- **Blast radius**: the flag is set at provisioning time via env var. A tenant
  admin cannot change it. Only someone with access to the Docker container
  config (platform operator) can toggle it.
- **Defense in depth**: the dashboard never relies on the server's DB/CH admin
  endpoints, which expose `pg_stat_activity` and `system.processes` globally.
  Even if the flag were somehow bypassed on a tenant server, the vendor-facing
  data would still come from direct queries with tenant filtering, which is
  inherently safer.