From 01b268590da106dfb32f14821e87e4c7978ca29b Mon Sep 17 00:00:00 2001
From: hsiegeln <37154749+hsiegeln@users.noreply.github.com>
Date: Sat, 11 Apr 2026 22:56:45 +0200
Subject: [PATCH] docs: add infrastructure endpoint visibility design spec

Covers restricting DB/ClickHouse admin endpoints in SaaS-managed server
instances via @ConditionalOnProperty flag, and building a vendor-facing
infrastructure dashboard in the SaaS platform with per-tenant PostgreSQL
and ClickHouse visibility.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ...frastructure-endpoint-visibility-design.md | 249 ++++++++++++++++++
 1 file changed, 249 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-04-11-infrastructure-endpoint-visibility-design.md

diff --git a/docs/superpowers/specs/2026-04-11-infrastructure-endpoint-visibility-design.md b/docs/superpowers/specs/2026-04-11-infrastructure-endpoint-visibility-design.md
new file mode 100644
index 00000000..a20c763c
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-11-infrastructure-endpoint-visibility-design.md
@@ -0,0 +1,249 @@
+# Infrastructure Endpoint Visibility
+
+**Date:** 2026-04-11
+**Status:** Approved
+**Scope:** cameleer3-server + cameleer-saas
+
+---
+
+## Problem
+
+The server's admin section exposes PostgreSQL and ClickHouse diagnostic
+endpoints (connection strings, pool stats, active queries, table sizes, server
+versions). In standalone mode this is fine -- the admin user is the platform
+owner. In SaaS mode, tenant admins receive `ADMIN` role via OIDC, which grants
+them access to infrastructure internals they should not see.
+
+Worse, the current endpoints have cross-tenant data leaks in shared-infra
+deployments:
+
+- `GET /admin/database/queries` returns `pg_stat_activity` which is
+  database-wide, not schema-scoped. Tenant A can see and kill Tenant B's
+  queries.
+- `GET /admin/clickhouse/tables`, `/performance`, `/queries` query
+  `system.tables`, `system.parts`, and `system.processes` globally -- no
+  `tenant_id` filtering.
+
+## Solution
+
+Two complementary changes:
+
+1. **Server**: an explicit flag disables infrastructure endpoints entirely when
+   the server is provisioned by the SaaS platform.
+2. **SaaS**: the vendor console gains its own infrastructure dashboard that
+   queries shared PostgreSQL and ClickHouse directly, with per-tenant breakdown.
+
+---
+
+## Part 1: Server -- Disable Infrastructure Endpoints
+
+### New Property
+
+```yaml
+cameleer:
+  server:
+    security:
+      infrastructureendpoints: ${CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS:true}
+```
+
+Default `true` (standalone mode -- endpoints available as today). The SaaS
+provisioner sets `false` on tenant server containers.
+
+### Bean Removal via @ConditionalOnProperty
+
+Add to both `DatabaseAdminController` and `ClickHouseAdminController`:
+
+```java
+@ConditionalOnProperty(
+    name = "cameleer.server.security.infrastructureendpoints",
+    havingValue = "true",
+    matchIfMissing = true
+)
+```
+
+When `false`:
+- Controller beans are not registered by Spring
+- Requests to `/api/v1/admin/database/**` and `/api/v1/admin/clickhouse/**`
+  return **404 Not Found** (Spring's default for unmapped paths)
+- Controllers do not appear in the OpenAPI spec
+- No role, no interceptor, no filter -- the endpoints simply do not exist
+
+### Health Endpoint Flag
+
+Add `infrastructureEndpoints` boolean to the health endpoint response
+(`GET /api/v1/health`). The value reflects the property. This is a public
+endpoint, and the flag itself is not sensitive -- it only tells the UI whether
+to render the Database/ClickHouse admin tabs.
+
+Implementation: a custom `HealthIndicator` bean or a `@RestControllerAdvice`
+that enriches the health response. The exact mechanism is an implementation
+detail.
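
The property default and the `@ConditionalOnProperty` attributes in Part 1 combine into a small decision table. The sketch below restates it in plain Java without Spring so the expected outcomes are easy to check. The class `InfrastructureEndpointsFlag` and its methods are hypothetical and not part of the spec, and the case-insensitive comparison is an assumption about how the condition evaluates `havingValue`.

```java
import java.util.Map;

/**
 * Hypothetical helper (not in the spec): mirrors, without Spring, how the
 * infrastructure-endpoints flag is expected to resolve.
 */
final class InfrastructureEndpointsFlag {

    /** Environment variable named in the spec's YAML placeholder. */
    static final String ENV_VAR = "CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS";

    private InfrastructureEndpointsFlag() {}

    /**
     * Step 1: the placeholder ${ENV_VAR:true} yields the variable's value,
     * or "true" when the variable is not set at all.
     */
    static String resolveProperty(Map<String, String> env) {
        return env.getOrDefault(ENV_VAR, "true");
    }

    /**
     * Step 2: the condition with havingValue = "true", matchIfMissing = true.
     * A missing property matches; otherwise the value must equal "true".
     * Assumption: the comparison ignores case.
     */
    static boolean controllersRegistered(String propertyValue) {
        if (propertyValue == null) {
            return true; // matchIfMissing = true
        }
        return "true".equalsIgnoreCase(propertyValue);
    }

    /** Both steps combined: would the admin controllers exist for this environment? */
    static boolean enabled(Map<String, String> env) {
        return controllersRegistered(resolveProperty(env));
    }
}
```

With an empty environment (standalone mode) `enabled` returns `true`; with the variable set to `false`, as the SaaS provisioner does, it returns `false` and the controllers would not be registered.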
+
+### UI Changes
+
+`buildAdminTreeNodes()` in `sidebar-utils.ts` currently returns a static list
+including Database and ClickHouse nodes. Change it to accept a parameter
+(or read from a store) and omit those nodes when `infrastructureEndpoints` is
+`false`.
+
+The flag is fetched once from the health endpoint at startup (the UI already
+calls health for connectivity). Store it in the auth store or a dedicated
+capabilities store.
+
+Router: the `/admin/database` and `/admin/clickhouse` routes remain defined
+but are unreachable via navigation. If a user navigates directly, the API
+returns 404 and the page shows its existing error state.
+
+### Files Changed (Server)
+
+| File | Change |
+|------|--------|
+| `application.yml` | Add `cameleer.server.security.infrastructureendpoints: true` |
+| `DatabaseAdminController.java` | Add `@ConditionalOnProperty` annotation |
+| `ClickHouseAdminController.java` | Add `@ConditionalOnProperty` annotation |
+| Health response | Add `infrastructureEndpoints` boolean |
+| `ui/src/components/sidebar-utils.ts` | Filter admin tree nodes based on flag |
+| `ui/src/components/LayoutShell.tsx` | Fetch and pass flag |
+
+---
+
+## Part 2: SaaS -- Vendor Infrastructure Dashboard
+
+### Architecture
+
+The SaaS platform sits on the same Docker network as PostgreSQL and ClickHouse.
+It already has their connection URLs in `ProvisioningProperties` (`datasourceUrl`
+for cameleer3 PostgreSQL, `clickhouseUrl` for ClickHouse). It already uses raw
+JDBC (`DriverManager.getConnection()`) for tenant data cleanup in
+`TenantDataCleanupService`. The infrastructure dashboard uses the same pattern.
+
+The SaaS does NOT call the server's admin endpoints. It queries the shared
+infrastructure directly.
+This means:
+
+- No dependency on server endpoint availability
+- Cross-tenant aggregation is natural (the SaaS knows all tenants)
+- Per-tenant filtering is explicit (`WHERE tenant_id = ?` for ClickHouse,
+  schema-scoped queries for PostgreSQL)
+- No new Logto scopes or roles needed
+
+### Backend
+
+**`InfrastructureService.java`** -- new service class. Raw JDBC connections
+from `ProvisioningProperties.datasourceUrl()` and `.clickhouseUrl()`. Methods:
+
+PostgreSQL:
+- `getPostgresOverview()` -- server version (`SELECT version()`), total DB
+  size (`pg_database_size`), active connection count (`pg_stat_activity`)
+- `getPostgresTenantStats()` -- per-tenant schema sizes, table counts, row
+  counts. Query `information_schema.tables` joined with `pg_stat_user_tables`
+  grouped by `table_schema` where schema matches `tenant_%`
+- `getPostgresTenantDetail(slug)` -- single tenant: table-level breakdown
+  (name, rows, data size, index size) from `pg_stat_user_tables` filtered to
+  `tenant_{slug}` schema
+
+ClickHouse:
+- `getClickHouseOverview()` -- server version, uptime (`system.metrics`),
+  total disk size, total rows, compression ratio (`system.parts` aggregated)
+- `getClickHouseTenantStats()` -- per-tenant row counts and disk usage. Query
+  actual data tables (executions, logs, etc.) with
+  `SELECT tenant_id, count(), sum(bytes) ... GROUP BY tenant_id`
+- `getClickHouseTenantDetail(slug)` -- single tenant: per-table breakdown
+  (table name, row count, disk size) filtered by `WHERE tenant_id = ?`
+
+Note: ClickHouse `system.parts` does not have a `tenant_id` column (it is a
+system table). Per-tenant ClickHouse stats require querying the actual data
+tables. For the overview, `system.parts` provides aggregate stats across all
+tenants.
+
+**`InfrastructureController.java`** -- new REST controller at
+`/api/vendor/infrastructure`. All endpoints require `platform:admin` scope
+via `@PreAuthorize("hasAuthority('SCOPE_platform:admin')")`.
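
As an illustration of the raw-JDBC pattern described for `InfrastructureService`, here is a minimal sketch of what `getPostgresTenantDetail(slug)` might look like. The class name, the `TableStats` record, the slug format, and the exact SQL are assumptions made for the example; the spec only fixes the data source (`pg_stat_user_tables`) and the `tenant_{slug}` schema naming. Credentials are assumed to be carried in the JDBC URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the per-tenant PostgreSQL detail query (not the actual service). */
final class PostgresTenantDetailSketch {

    /** One row of the table-level breakdown: name, rows, data size, index size. */
    record TableStats(String table, long rows, long dataBytes, long indexBytes) {}

    // Assumption: slugs are lowercase alphanumerics plus '-' and '_'.
    // The spec does not define the slug format.
    private static final Pattern SLUG = Pattern.compile("[a-z0-9][a-z0-9_-]*");

    // n_live_tup is PostgreSQL's estimate of live rows, not an exact count.
    static final String DETAIL_SQL =
        "SELECT relname, n_live_tup, "
      + "       pg_table_size(relid) AS data_bytes, "
      + "       pg_indexes_size(relid) AS index_bytes "
      + "  FROM pg_stat_user_tables "
      + " WHERE schemaname = ? "
      + " ORDER BY relname";

    private PostgresTenantDetailSketch() {}

    /** Maps a tenant slug to its schema name, rejecting anything unexpected. */
    static String schemaFor(String slug) {
        if (slug == null || !SLUG.matcher(slug).matches()) {
            throw new IllegalArgumentException("invalid tenant slug");
        }
        return "tenant_" + slug;
    }

    /** Opens a short-lived connection and reads the table breakdown for one tenant schema. */
    static List<TableStats> tenantDetail(String jdbcUrl, String slug) throws SQLException {
        String schema = schemaFor(slug);
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(DETAIL_SQL)) {
            ps.setString(1, schema); // schema name is bound, never concatenated
            try (ResultSet rs = ps.executeQuery()) {
                List<TableStats> out = new ArrayList<>();
                while (rs.next()) {
                    out.add(new TableStats(
                        rs.getString(1), rs.getLong(2), rs.getLong(3), rs.getLong(4)));
                }
                return out;
            }
        }
    }
}
```

Binding the schema name as a parameter keeps the tenant filter explicit and avoids building SQL from the slug; the slug check is an extra guard rather than the primary defense.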
+
+| Method | Path | Returns |
+|--------|------|---------|
+| GET | `/` | Combined PG + CH overview |
+| GET | `/postgres` | PG overview + per-tenant breakdown |
+| GET | `/postgres/{slug}` | Single tenant PG detail |
+| GET | `/clickhouse` | CH overview + per-tenant breakdown |
+| GET | `/clickhouse/{slug}` | Single tenant CH detail |
+
+### Frontend
+
+New vendor sidebar entry: **Infrastructure** (icon: `Server` or `Database`
+from lucide-react) at `/vendor/infrastructure`.
+
+**Page layout:**
+- Two section cards: PostgreSQL and ClickHouse
+- Each shows aggregate KPIs at top (version, total size, connections/queries)
+- Per-tenant table below: slug, schema size / row count, disk usage
+- Click tenant row to expand or navigate to detail view
+- Detail view: per-table breakdown for that tenant
+
+The page follows the existing vendor console patterns (card layout, tables,
+KPI strips) using `@cameleer/design-system` components.
+
+### SaaS Provisioner Change
+
+`DockerTenantProvisioner.createServerContainer()` adds one env var to the
+list passed to tenant server containers:
+
+```java
+env.add("CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false");
+```
+
+### Files Changed (SaaS)
+
+| File | Change |
+|------|--------|
+| `DockerTenantProvisioner.java` | Add env var to tenant server containers |
+| `InfrastructureService.java` | New -- raw JDBC queries for PG + CH stats |
+| `InfrastructureController.java` | New -- vendor-facing REST endpoints |
+| `Layout.tsx` | Add Infrastructure to vendor sidebar |
+| `router.tsx` | Add `/vendor/infrastructure` route |
+| `InfrastructurePage.tsx` | New -- overview page with PG/CH cards |
+
+---
+
+## What Does NOT Change
+
+- No new Logto scopes, roles, or API resources
+- No new Spring datasource beans (raw JDBC, same as TenantDataCleanupService)
+- No changes to SecurityConfig in either repo
+- No changes to existing tenant admin endpoints or RBAC
+- No changes to ServerApiClient (SaaS queries infra directly, not via
+  server)
+- Standalone server deployments are unaffected (flag defaults to `true`)
+
+---
+
+## Data Flow Summary
+
+**Standalone mode (no SaaS):**
+1. Admin user logs into server UI
+2. Admin sidebar shows Database and ClickHouse tabs
+3. Tabs work as today -- full infrastructure visibility
+
+**SaaS managed mode:**
+1. SaaS provisions tenant server with `INFRASTRUCTUREENDPOINTS=false`
+2. Tenant admin logs into server UI via OIDC
+3. Admin sidebar shows Users & Roles, OIDC, Audit, Environments -- no
+   Database or ClickHouse tabs
+4. Direct navigation to `/admin/database` returns 404
+5. Vendor opens SaaS console -> Infrastructure page
+6. SaaS queries shared PG + CH directly with per-tenant filtering
+7. Vendor sees aggregate stats + per-tenant breakdown
+
+---
+
+## Security Properties
+
+- **Tenant isolation**: tenant admins cannot see any infrastructure data.
+  The endpoints do not exist on their server instance.
+- **Cross-tenant prevention**: the SaaS infrastructure dashboard queries
+  with explicit tenant filtering. No tenant can see another tenant's data.
+- **Blast radius**: the flag is set at provisioning time via env var. A tenant
+  admin cannot change it. Only someone with access to the Docker container
+  config (platform operator) can toggle it.
+- **Defense in depth**: even if the flag were somehow bypassed, the server's
+  DB/CH admin endpoints expose `pg_stat_activity` and `system.processes`
+  globally. The SaaS approach of querying directly with tenant filtering is
+  inherently safer.