cameleer-server/docs/superpowers/specs/2026-03-17-infrastructure-overview-design.md
hsiegeln 2bcbff3ee6 docs: add infrastructure overview design spec
Covers admin navigation restructuring, database/OpenSearch monitoring pages,
configurable thresholds, database-backed audit log (SOC2), and phased
implementation plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 14:55:47 +01:00

Infrastructure Overview — Admin Pages Design

Date: 2026-03-17
Status: Approved
Scope: Phase 1 implementation; full vision documented with Phase 2+ sections marked

Overview

Add Database and OpenSearch admin pages to the Cameleer3 Server UI, allowing administrators to monitor subsystem health, inspect metrics, and perform basic maintenance actions. Restructure admin navigation from a single OIDC page to a sidebar sub-menu with dedicated pages per concern.

Goals

  • Give admins real-time visibility into PostgreSQL and OpenSearch health, performance, and storage
  • Enable basic maintenance actions (kill queries, delete indices) without SSH/kubectl access
  • Provide configurable thresholds for visual status indicators (green/yellow/red)
  • Establish a database-backed audit log for all admin actions (SOC2 compliance foundation)
  • Design for future expansion (VACUUM, reindex, OPERATOR role) without requiring restructuring

Non-Goals (Phase 1)

  • Database maintenance actions (VACUUM ANALYZE, Reindex)
  • OpenSearch bulk operations (Force Reindex All, Flush)
  • OPERATOR role with restricted permissions
  • TimescaleDB-specific features (hypertable stats, continuous aggregate status)
  • Alerting or notifications beyond visual indicators

1. Admin Navigation Restructuring

Current State

Single gear icon at bottom of AppSidebar linking directly to /admin/oidc.

New Structure

The gear icon expands/collapses an admin sub-menu in the sidebar:

── Apps ──────────────
  app-1
  app-2
── Admin (gear icon) ─
  Database              → /admin/database
  OpenSearch            → /admin/opensearch
  Audit Log             → /admin/audit
  OIDC                  → /admin/oidc
  Users                 → (future)

  • Admin section visible only to users with ADMIN role
  • Section collapsed by default; state persisted in localStorage
  • Active sub-item highlighted
  • /admin redirects to /admin/database
  • Existing OidcAdminPage is functionally unchanged; it moves from being the sole admin page to a sub-page at /admin/oidc
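The collapse-state and visibility rules above could be sketched as follows. This is only an illustration: the storage key `cameleer.adminMenu.expanded` and the `KeyValueStore` interface are hypothetical names, not part of the spec. In the browser, `window.localStorage` would satisfy the interface.

```typescript
// Minimal storage contract; window.localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key; the spec does not name one.
const ADMIN_MENU_KEY = "cameleer.adminMenu.expanded";

// True only when the user previously expanded the section. A missing or
// unrecognised value falls back to collapsed, which is the spec's default.
function isAdminMenuExpanded(store: KeyValueStore): boolean {
  return store.getItem(ADMIN_MENU_KEY) === "true";
}

// Persists the current state so it survives a page reload.
function setAdminMenuExpanded(store: KeyValueStore, expanded: boolean): void {
  store.setItem(ADMIN_MENU_KEY, String(expanded));
}

// The admin section is rendered only for users holding the ADMIN role.
function showAdminSection(roles: readonly string[]): boolean {
  return roles.indexOf("ADMIN") !== -1;
}
```

Injecting the store rather than reading `localStorage` directly keeps the logic testable outside a browser.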

2. Database Page (/admin/database)

Header

  • Connection status badge (green/red)
  • PostgreSQL version (with TimescaleDB extension noted if present)
  • Host and schema name
  • Manual refresh button (refreshes all sections)

Connection Pool Section

  • Visual bar showing active connections vs. max pool size
  • Metrics: active, idle, pending, max wait time
  • Status badge based on configurable threshold (% of pool in use)
  • Source: HikariCP pool MXBean
  • Auto-refreshes every 15 seconds
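A minimal sketch of how a status badge might be derived from the configurable thresholds. The function names are hypothetical, and treating a value equal to a threshold as having reached it is an assumption; the spec does not say whether the comparison is inclusive.

```typescript
type Status = "green" | "yellow" | "red";

// Maps a measured value to a badge colour. A value at or above a threshold
// counts as having reached it (assumed; the spec does not specify).
function statusFor(value: number, warning: number, critical: number): Status {
  if (value >= critical) return "red";
  if (value >= warning) return "yellow";
  return "green";
}

// Pool usage as a percentage of the maximum pool size.
function poolUsagePercent(active: number, max: number): number {
  if (max <= 0) return 0; // avoid division by zero for an unconfigured pool
  return (active * 100) / max;
}
```

With the example thresholds from the payload in Section 5 (warning 80, critical 95), 8 of 10 active connections would render yellow and 19 of 20 would render red. The same `statusFor` helper could serve the query-duration, queue-depth, and JVM-heap indicators.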

Table Sizes Section

  • Table with columns: Table, Rows, Size, Index Size
  • All application tables listed (executions, processor_executions, route_diagrams, agent_metrics, users, oidc_config, admin_thresholds)
  • Summary row: total data size, total index size
  • Source: pg_stat_user_tables + pg_relation_size
  • Manual refresh only (expensive query)

Active Queries Section

  • Table with columns: PID, Duration, State, Query (truncated), Action
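  • Queries running longer than the warning threshold are highlighted yellow; those exceeding the critical threshold are highlighted red
  • Kill button per row → calls pg_terminate_backend(pid), which terminates the backend session running the query (not only the current statement)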
  • Kill requires confirmation dialog
  • After kill, query list refreshes automatically
  • Source: pg_stat_activity
  • Auto-refreshes every 15 seconds

Maintenance Section (Phase 2 — Visible but Disabled)

  • Buttons: Run VACUUM ANALYZE, Reindex Tables
  • Greyed out with tooltip: "Available in a future release"

Thresholds Section

  • Collapsible, collapsed by default
  • Configurable values:
    • Connection pool usage: warning % and critical %
    • Query duration: warning seconds and critical seconds
  • Save button persists to database

3. OpenSearch Page (/admin/opensearch)

Header

  • Cluster health badge (green/yellow/red — maps directly to OpenSearch cluster health)
  • OpenSearch version
  • Node count
  • Host URL
  • Manual refresh button

Indexing Pipeline Section

  • Visual bar showing queue depth vs. max queue size
  • Metrics: queue depth, failed document count, debounce interval, indexing rate (docs/s), time since last indexed
  • Status badge based on configurable thresholds
  • Source: SearchIndexer internal stats (exposed via SearchIndexerStats interface)
  • Auto-refreshes every 15 seconds

Indices Section

  • Search/filter by index name pattern (text input)
  • Filter by health — All / Green / Yellow / Red dropdown
  • Sortable columns — Name, Docs, Size, Health, Shards (click column header)
  • Pagination — 10 per page, server-side
  • Summary row above table — total index count, total docs, total storage
  • Delete button (trash icon) per row:
    • Confirmation dialog: "Delete index {name}? This cannot be undone."
    • User must type the index name to confirm
    • After deletion, table and summary refresh
  • Table columns: Index, Docs, Size, Health, Shards (primary/replica)
  • Source: OpenSearch _cat/indices API
  • Manual refresh only
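The filter, sort, and pagination behaviour described above could be sketched as a pure function over the index list. Several details are assumptions rather than spec: the name filter is a case-insensitive substring match, health sorts in the order GREEN, YELLOW, RED, and the summary totals are computed over the filtered set rather than over all indices. The type and function names are hypothetical.

```typescript
type Health = "GREEN" | "YELLOW" | "RED";

interface IndexInfo { name: string; docs: number; sizeBytes: number; health: Health; }

interface IndexQuery {
  search?: string;
  health?: Health | "ALL";
  sort?: "name" | "docs" | "size" | "health";
  order?: "asc" | "desc";
  page?: number; // zero-based
  size?: number;
}

interface IndexPage {
  items: IndexInfo[];
  totalIndices: number; // summary row: counts after filtering, before paging
  totalDocs: number;
  totalSizeBytes: number;
}

// Assumed severity order for sorting by health.
const HEALTH_RANK: Record<Health, number> = { GREEN: 0, YELLOW: 1, RED: 2 };

function queryIndices(all: readonly IndexInfo[], q: IndexQuery = {}): IndexPage {
  const search = (q.search ?? "").toLowerCase();
  const health = q.health ?? "ALL";
  const sort = q.sort ?? "name";
  const direction = q.order === "desc" ? -1 : 1;
  const page = q.page ?? 0;
  const size = q.size ?? 10;

  const matching = all.filter(
    (i) => i.name.toLowerCase().indexOf(search) !== -1 && (health === "ALL" || i.health === health)
  );

  // Sort a copy so the caller's array is left untouched.
  const sorted = matching.slice().sort((a, b) => {
    let cmp: number;
    switch (sort) {
      case "docs": cmp = a.docs - b.docs; break;
      case "size": cmp = a.sizeBytes - b.sizeBytes; break;
      case "health": cmp = HEALTH_RANK[a.health] - HEALTH_RANK[b.health]; break;
      default: cmp = a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
    }
    return cmp * direction;
  });

  return {
    items: sorted.slice(page * size, page * size + size),
    totalIndices: matching.length,
    totalDocs: matching.reduce((sum, i) => sum + i.docs, 0),
    totalSizeBytes: matching.reduce((sum, i) => sum + i.sizeBytes, 0),
  };
}
```

Because pagination is server-side in the spec, logic of this shape would live behind the indices endpoint; it is shown in TypeScript here only to keep the examples in one language.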

Performance Section

  • Metrics: query cache hit rate, request cache hit rate, average search latency, average indexing latency, JVM heap used (visual bar with used/max)
  • Source: OpenSearch _nodes/stats API
  • Auto-refreshes every 15 seconds

Operations Section (Phase 2 — Visible but Disabled)

  • Buttons: Force Reindex All, Flush Index, Delete Index (bulk via checkbox selection)
  • Greyed out with tooltip: "Available in a future release"

Thresholds Section

  • Collapsible, collapsed by default
  • Configurable values:
    • Cluster health: warning level, critical level
    • Queue depth: warning count, critical count
    • JVM heap usage: warning %, critical %
    • Failed docs: warning count, critical count
  • Save button persists to database

4. Audit Log Page (/admin/audit)

Purpose

Database-backed audit trail of all administrative actions across the system. Provides SOC2-compliant evidence of who did what, when, and from where. The audit log is append-only — entries cannot be modified or deleted through the UI or API.

Header

  • Total event count
  • Date range selector (default: last 7 days)

Audit Log Table

┌─ Audit Log ────────────────────────────────────────────────┐
│ Date range: [2026-03-10] to [2026-03-17]                   │
│ [User: All ▾]  [Category: All ▾]  [Search: ________]      │
│                                                            │
│ Timestamp            User     Category   Action     Target │
│ 2026-03-17 14:32:01  admin    INFRA      kill_query PID 42 │
│ 2026-03-17 14:28:15  admin    INFRA      delete_idx exec-… │
│ 2026-03-17 12:01:44  admin    CONFIG     update     oidc   │
│ 2026-03-17 09:15:22  jdoe     AUTH       login             │
│ 2026-03-16 18:45:00  admin    USER_MGMT  update_roles u:5  │
│ ...                                                        │
│                                                            │
│ ◀ 1  2  3  ...  12 ▶              Showing 1-25 of 294     │
└────────────────────────────────────────────────────────────┘

  • Filterable by user, category, date range
  • Searchable by free text (matches action, target, detail)
  • Sortable by timestamp (default: newest first)
  • Pagination — 25 per page, server-side
  • Detail expansion — click a row to expand and show full detail JSON
  • Read-only — no edit or delete actions available (compliance requirement)
  • Export (Phase 2) — CSV/JSON download for auditors
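The pager in the mockup ("Showing 1-25 of 294", 12 pages) follows directly from zero-based pages of 25 entries. A small sketch of that arithmetic, with hypothetical function names:

```typescript
// Number of pages needed to show `total` entries at `size` entries per page.
function pageCount(total: number, size: number): number {
  return Math.ceil(total / size);
}

// Label for a zero-based page, in the style used by the mockup.
function showingLabel(page: number, size: number, total: number): string {
  if (total === 0) return "Showing 0 of 0";
  const first = page * size + 1;
  const last = Math.min((page + 1) * size, total);
  return `Showing ${first}-${last} of ${total}`;
}
```

For 294 entries at 25 per page this yields 12 pages, and the final page covers entries 276 to 294.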

Audit Categories

| Category | Actions Logged |
| --- | --- |
| INFRA | kill_query, delete_index, update_thresholds |
| AUTH | login, login_oidc, logout, login_failed |
| USER_MGMT | create_user, update_roles, delete_user |
| CONFIG | update_oidc, delete_oidc, test_oidc |

What Gets Logged

Every admin action across the system, not just infrastructure pages:

  • Infrastructure: kill query, delete OpenSearch index, save thresholds
  • OIDC: save config, delete config, test connection
  • User management: update roles, delete user
  • Authentication: login (success and failure), OIDC login, logout

Audit Record Fields

| Field | Description |
| --- | --- |
| timestamp | When the action occurred (server time, UTC) |
| username | Authenticated user who performed the action |
| action | Machine-readable action name (e.g., kill_query, delete_index) |
| category | Grouping: INFRA, AUTH, USER_MGMT, CONFIG |
| target | What was acted on (e.g., PID, index name, user ID) |
| detail | JSONB with action-specific context (e.g., query text for killed query, old/new roles for role change) |
| result | SUCCESS or FAILURE |
| ip_address | Client IP address from the request |
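On the frontend, the record fields above might be typed roughly as follows. The camelCase `ipAddress` name and the ISO-8601 string encoding of `timestamp` are assumptions about the JSON representation, which the spec does not define; `formatAuditRow` and `isAuditCategory` are hypothetical helpers.

```typescript
type AuditCategory = "INFRA" | "AUTH" | "USER_MGMT" | "CONFIG";
type AuditResult = "SUCCESS" | "FAILURE";

interface AuditRecord {
  timestamp: string; // ISO-8601 in UTC (assumed wire format)
  username: string;
  action: string;
  category: AuditCategory;
  target: string | null; // nullable, matching the audit_log schema
  detail: Record<string, unknown> | null;
  result: AuditResult;
  ipAddress: string | null; // assumed camelCase form of ip_address
}

const AUDIT_CATEGORIES: readonly string[] = ["INFRA", "AUTH", "USER_MGMT", "CONFIG"];

// Type guard for values arriving from a filter dropdown or query string.
function isAuditCategory(value: string): value is AuditCategory {
  return AUDIT_CATEGORIES.indexOf(value) !== -1;
}

// Renders one table row in the column order of the mockup:
// timestamp, user, category, action, target.
function formatAuditRow(r: AuditRecord): string {
  const when = r.timestamp.slice(0, 19).replace("T", " ");
  return [when, r.username, r.category, r.action, r.target ?? ""].join("  ").trim();
}
```

The `detail` field is left as an open record because its shape varies per action, which is the reason the spec stores it as JSONB.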

Backend Implementation

  • AuditService — central service injected into all admin controllers
  • Single method: log(action, category, target, detail, result)
  • Extracts username and IP from SecurityContextHolder and HttpServletRequest
  • Writes to both the audit_log table AND SLF4J (belt and suspenders)
  • Async write option not used — audit must be synchronous for compliance guarantees

5. Backend API

All endpoints under /api/v1/admin/ — secured by existing Spring Security filter chain (ROLE_ADMIN required). Controllers additionally annotated with @PreAuthorize("hasRole('ADMIN')") for defense-in-depth.

Database Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/database/status | Version, host, schema, connection state |
| GET | /admin/database/pool | Active, idle, pending, max wait (HikariCP) |
| GET | /admin/database/tables | Table names, row counts, data sizes, index sizes |
| GET | /admin/database/queries | Active queries: pid, duration, state, SQL |
| POST | /admin/database/queries/{pid}/kill | Terminate query via pg_terminate_backend |

OpenSearch Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/opensearch/status | Version, host, cluster health, node count |
| GET | /admin/opensearch/pipeline | Queue depth, failed count, debounce, rate, last indexed |
| GET | /admin/opensearch/indices | Paginated, sortable, filterable index list |
| DELETE | /admin/opensearch/indices/{name} | Delete specific index (with audit log) |
| GET | /admin/opensearch/performance | Cache rates, latencies, JVM heap |

Indices Query Parameters

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| search | string | | Filter by index name pattern |
| health | enum | ALL | Filter by health: ALL, GREEN, YELLOW, RED |
| sort | string | name | Sort field: name, docs, size, health |
| order | enum | asc | Sort direction: asc, desc |
| page | int | 0 | Page number (zero-based) |
| size | int | 10 | Page size |
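A client could assemble the request URL from these parameters as sketched below. The full path `/api/v1/admin/opensearch/indices` is inferred from the `/api/v1/admin/` prefix stated at the top of this section; the function and interface names are hypothetical. Sending defaults explicitly is a choice made here for clarity, not a requirement of the spec.

```typescript
interface IndicesParams {
  search?: string;
  health?: "ALL" | "GREEN" | "YELLOW" | "RED";
  sort?: "name" | "docs" | "size" | "health";
  order?: "asc" | "desc";
  page?: number;
  size?: number;
}

// Builds the indices list URL, filling in the documented defaults.
function indicesUrl(p: IndicesParams = {}): string {
  const pairs: [string, string][] = [
    ["health", p.health ?? "ALL"],
    ["sort", p.sort ?? "name"],
    ["order", p.order ?? "asc"],
    ["page", String(p.page ?? 0)],
    ["size", String(p.size ?? 10)],
  ];
  // The search filter has no default, so it is sent only when non-empty.
  if (p.search) pairs.unshift(["search", p.search]);
  const query = pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `/api/v1/admin/opensearch/indices?${query}`;
}
```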

Audit Log Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/audit | Paginated, filterable audit log entries |

Audit Log Query Parameters

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| username | string | | Filter by username |
| category | enum | | Filter by category: INFRA, AUTH, USER_MGMT, CONFIG |
| search | string | | Free text search across action, target, detail |
| from | ISO date | 7 days ago | Start of date range |
| to | ISO date | now | End of date range |
| sort | string | timestamp | Sort field |
| order | enum | desc | Sort direction: asc, desc |
| page | int | 0 | Page number (zero-based) |
| size | int | 25 | Page size |
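The default date range (from 7 days ago until now) can be computed as in the sketch below. Encoding `from` and `to` as full ISO-8601 timestamps is an assumption; the spec says only "ISO date". The function name is hypothetical.

```typescript
// Default audit log range: the 7 days ending at `now`.
// Arithmetic is done on epoch milliseconds, so it is unaffected by
// local time zones or daylight-saving changes.
function defaultAuditRange(now: Date): { from: string; to: string } {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const from = new Date(now.getTime() - 7 * MS_PER_DAY);
  return { from: from.toISOString(), to: now.toISOString() };
}
```

Passing `now` in as an argument, rather than calling `new Date()` inside, keeps the function deterministic. For 2026-03-17 this produces a range starting on 2026-03-10, matching the dates shown in the Section 4 mockup.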

Thresholds Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /admin/thresholds | All configured thresholds |
| PUT | /admin/thresholds | Save thresholds (database + OpenSearch in one payload) |

Thresholds Payload

{
  "database": {
    "connectionPoolWarning": 80,
    "connectionPoolCritical": 95,
    "queryDurationWarning": 1.0,
    "queryDurationCritical": 10.0
  },
  "opensearch": {
    "clusterHealthWarning": "YELLOW",
    "clusterHealthCritical": "RED",
    "queueDepthWarning": 100,
    "queueDepthCritical": 500,
    "jvmHeapWarning": 75,
    "jvmHeapCritical": 90,
    "failedDocsWarning": 1,
    "failedDocsCritical": 10
  }
}
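A typed view of this payload, with a basic sanity check before saving, might look like the sketch below. The validation rules (warning strictly below critical, percentages capped at 100) are assumptions; the spec does not define validation. All names outside the payload keys are hypothetical.

```typescript
type ClusterHealth = "GREEN" | "YELLOW" | "RED";

interface Thresholds {
  database: {
    connectionPoolWarning: number; connectionPoolCritical: number; // percent
    queryDurationWarning: number; queryDurationCritical: number;   // seconds
  };
  opensearch: {
    clusterHealthWarning: ClusterHealth; clusterHealthCritical: ClusterHealth;
    queueDepthWarning: number; queueDepthCritical: number;         // count
    jvmHeapWarning: number; jvmHeapCritical: number;               // percent
    failedDocsWarning: number; failedDocsCritical: number;         // count
  };
}

const HEALTH_LEVEL: Record<ClusterHealth, number> = { GREEN: 0, YELLOW: 1, RED: 2 };

// Returns a list of problems; an empty list means the payload passed.
function validateThresholds(t: Thresholds): string[] {
  const errors: string[] = [];
  const ordered = (name: string, warning: number, critical: number): void => {
    if (warning >= critical) errors.push(`${name}: warning must be below critical`);
  };
  const percent = (name: string, value: number): void => {
    if (value < 0 || value > 100) errors.push(`${name}: must be between 0 and 100`);
  };
  const db = t.database;
  const os = t.opensearch;
  ordered("connectionPool", db.connectionPoolWarning, db.connectionPoolCritical);
  ordered("queryDuration", db.queryDurationWarning, db.queryDurationCritical);
  ordered("clusterHealth", HEALTH_LEVEL[os.clusterHealthWarning], HEALTH_LEVEL[os.clusterHealthCritical]);
  ordered("queueDepth", os.queueDepthWarning, os.queueDepthCritical);
  ordered("jvmHeap", os.jvmHeapWarning, os.jvmHeapCritical);
  ordered("failedDocs", os.failedDocsWarning, os.failedDocsCritical);
  percent("connectionPoolCritical", db.connectionPoolCritical);
  percent("jvmHeapCritical", os.jvmHeapCritical);
  return errors;
}
```

The example payload above passes these checks. A client-side check of this kind would complement, not replace, validation in the PUT endpoint.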

6. Security

Enforcement Layers

  1. Spring Security filter chain: /api/v1/admin/** requires ROLE_ADMIN (existing configuration)
  2. Controller annotation: @PreAuthorize("hasRole('ADMIN')") on each controller class (defense-in-depth)
  3. UI role check: sidebar admin section hidden for non-admin users (cosmetic only, not a security boundary)

Audit Logging

All admin actions are persisted to the audit_log database table (see Section 4 and Section 7 — Data Storage) AND logged via SLF4J at INFO level. The database record is the source of truth for compliance; the SLF4J log provides operational visibility.

The AuditService is injected into all admin controllers (infrastructure, OIDC, user management) and the authentication flow. See Section 4 (Audit Log Page) for full details on what is logged and the record structure.

Future: OPERATOR Role (Phase 2+)

Design anticipates a read-only OPERATOR role:

  • Can view all monitoring data
  • Cannot perform destructive actions (kill, delete)
  • Implementation: method-level @PreAuthorize on action endpoints, UI conditionally disables buttons based on role

7. Data Storage

New Flyway Migration: V9

CREATE TABLE admin_thresholds (
    id          INTEGER PRIMARY KEY DEFAULT 1,
    config      JSONB NOT NULL DEFAULT '{}',
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_by  TEXT NOT NULL,
    CONSTRAINT  single_row CHECK (id = 1)
);

CREATE TABLE audit_log (
    id          BIGSERIAL PRIMARY KEY,
    timestamp   TIMESTAMPTZ NOT NULL DEFAULT now(),
    username    TEXT NOT NULL,
    action      TEXT NOT NULL,
    category    TEXT NOT NULL,
    target      TEXT,
    detail      JSONB,
    result      TEXT NOT NULL,
    ip_address  TEXT
);

CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp DESC);
CREATE INDEX idx_audit_log_username ON audit_log (username);
CREATE INDEX idx_audit_log_category ON audit_log (category);

admin_thresholds:

  • Single-row table (same pattern as oidc_config)
  • JSONB column for flexibility; adding new thresholds doesn't require schema changes
  • Tracks who last updated and when

audit_log:

  • Append-only table — no UPDATE or DELETE exposed via API
  • Indexed on timestamp (primary query axis), username, and category for filtered views
  • JSONB detail column holds action-specific context without schema changes
  • No foreign key to users table — username is denormalized so audit records survive user deletion

8. Frontend Architecture

New Files

| File | Purpose |
| --- | --- |
| pages/admin/DatabaseAdminPage.tsx | Database monitoring and management |
| pages/admin/OpenSearchAdminPage.tsx | OpenSearch monitoring and management |
| pages/admin/AuditLogPage.tsx | Audit log viewer |
| api/queries/admin/database.ts | React Query hooks for database endpoints |
| api/queries/admin/opensearch.ts | React Query hooks for OpenSearch endpoints |
| api/queries/admin/thresholds.ts | React Query hooks for threshold endpoints |
| api/queries/admin/audit.ts | React Query hooks for audit log endpoint |
| components/admin/StatusBadge.tsx | Color-coded status indicator (green/yellow/red) |
| components/admin/RefreshableCard.tsx | Card with manual refresh button + optional auto-refresh |
| components/admin/ConfirmDeleteDialog.tsx | Confirmation dialog requiring name input for destructive actions |

Modified Files

| File | Change |
| --- | --- |
| components/layout/AppSidebar.tsx | Refactor admin section to collapsible sub-menu with multiple items |
| router.tsx | Add routes for /admin/database, /admin/opensearch, /admin/audit, redirect /admin |
| SpaForwardController.java | Ensure /admin/* forwarding covers new routes |

Auto-Refresh Strategy

  • React Query refetchInterval: 15000 on lightweight endpoints (pool, queries, pipeline, performance)
  • Heavy endpoints (tables, indices) use refetchInterval: false — manual refresh only
  • Refresh button calls queryClient.invalidateQueries for all queries on that page
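The split between auto-refreshing and manual-only endpoints could be captured in one place, as sketched below. The endpoint names mirror the API paths in Section 5; the `["admin", endpoint]` query key shape and the function names are hypothetical. The returned object deliberately omits `queryFn`, which a real hook would add before passing the options to React Query's `useQuery`.

```typescript
type AdminEndpoint = "pool" | "queries" | "tables" | "pipeline" | "indices" | "performance";

const AUTO_REFRESH_MS = 15000;

// Endpoints the spec marks as expensive: refreshed only on demand.
const HEAVY_ENDPOINTS: readonly AdminEndpoint[] = ["tables", "indices"];

// React Query accepts a number of milliseconds or `false` (disabled)
// for its refetchInterval option.
function refetchIntervalFor(endpoint: AdminEndpoint): number | false {
  return HEAVY_ENDPOINTS.indexOf(endpoint) !== -1 ? false : AUTO_REFRESH_MS;
}

// Partial options object for a query hook (queryFn omitted in this sketch).
function adminQueryOptions(endpoint: AdminEndpoint): {
  queryKey: string[];
  refetchInterval: number | false;
} {
  return { queryKey: ["admin", endpoint], refetchInterval: refetchIntervalFor(endpoint) };
}
```

Sharing a common key prefix would let the page-level refresh button invalidate related queries in a single `invalidateQueries` call, though the exact key layout is left to the implementation.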

9. Implementation Phases

Phase 1 (Current Scope)

  1. Admin sidebar restructuring
  2. Database page — all monitoring sections + kill query
  3. OpenSearch page — all monitoring sections + delete index
  4. Threshold configuration (both pages)
  5. Audit log — database-backed audit trail + admin viewer page
  6. Retrofit audit logging into existing admin controllers (OIDC, user management) and auth flow
  7. Backend endpoints with RBAC enforcement
  8. Flyway migration V9 for thresholds + audit_log tables

Phase 2

  • Database maintenance actions (VACUUM ANALYZE, Reindex)
  • OpenSearch operations (Force Reindex All, Flush)
  • Bulk index operations (checkbox selection)
  • Audit log CSV/JSON export for auditors
  • OPERATOR role with view-only permissions

Phase 3

  • TimescaleDB-aware metrics (hypertable chunks, continuous aggregate status, compression)
  • Historical trend charts for key metrics
  • Alerting/notification system