docs: add infrastructure overview design spec
Covers admin navigation restructuring, database/OpenSearch monitoring pages, configurable thresholds, database-backed audit log (SOC2), and phased implementation plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Infrastructure Overview — Admin Pages Design

**Date:** 2026-03-17
**Status:** Approved
**Scope:** Phase 1 implementation; full vision documented with Phase 2+ sections marked

## Overview

Add Database and OpenSearch admin pages to the Cameleer3 Server UI, allowing administrators to monitor subsystem health, inspect metrics, and perform basic maintenance actions. Restructure admin navigation from a single OIDC page to a sidebar sub-menu with dedicated pages per concern.

## Goals

- Give admins real-time visibility into PostgreSQL and OpenSearch health, performance, and storage
- Enable basic maintenance actions (kill queries, delete indices) without SSH/kubectl access
- Provide configurable thresholds for visual status indicators (green/yellow/red)
- Establish a database-backed audit log for all admin actions (SOC2 compliance foundation)
- Design for future expansion (VACUUM, reindex, OPERATOR role) without requiring restructuring

## Non-Goals (Phase 1)

- Database maintenance actions (VACUUM ANALYZE, Reindex)
- OpenSearch bulk operations (Force Reindex All, Flush)
- OPERATOR role with restricted permissions
- TimescaleDB-specific features (hypertable stats, continuous aggregate status)
- Alerting or notifications beyond visual indicators

---
## 1. Admin Navigation Restructuring

### Current State

A single gear icon at the bottom of `AppSidebar` links directly to `/admin/oidc`.

### New Structure

The gear icon expands/collapses an admin sub-menu in the sidebar:

```
── Apps ──────────────
  app-1
  app-2
── Admin (gear icon) ─
  Database    → /admin/database
  OpenSearch  → /admin/opensearch
  Audit Log   → /admin/audit
  OIDC        → /admin/oidc
  Users       → (future)
```

- Admin section visible only to users with `ADMIN` role
- Section collapsed by default; state persisted in localStorage
- Active sub-item highlighted
- `/admin` redirects to `/admin/database`
- Existing `OidcAdminPage` is functionally unchanged; it moves from being the sole admin page to one sub-page among several
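
The sidebar rules above are small enough to express as pure functions. This is a minimal TypeScript sketch, not the actual `AppSidebar` code. The storage key `cameleer.adminMenuExpanded`, the `KeyValueStore` interface, and all function names are hypothetical. Storage sits behind an interface so the logic runs outside a browser; `window.localStorage` already provides the required `getItem`/`setItem` shape.

```typescript
// Minimal key/value interface; window.localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key, chosen for illustration only.
const ADMIN_MENU_KEY = "cameleer.adminMenuExpanded";

interface AdminNavItem {
  label: string;
  path: string;
}

// Phase 1 sub-items in sidebar order ("Users" is still future work).
const ADMIN_NAV_ITEMS: readonly AdminNavItem[] = [
  { label: "Database", path: "/admin/database" },
  { label: "OpenSearch", path: "/admin/opensearch" },
  { label: "Audit Log", path: "/admin/audit" },
  { label: "OIDC", path: "/admin/oidc" },
];

// Visibility only; the backend still enforces ROLE_ADMIN on every request.
function showAdminSection(roles: readonly string[]): boolean {
  return roles.indexOf("ADMIN") !== -1;
}

// Collapsed by default: only an explicit "true" in storage expands the section.
function isAdminMenuExpanded(store: KeyValueStore): boolean {
  return store.getItem(ADMIN_MENU_KEY) === "true";
}

// Flips the stored state and returns the new value.
function toggleAdminMenu(store: KeyValueStore): boolean {
  const expanded = !isAdminMenuExpanded(store);
  store.setItem(ADMIN_MENU_KEY, String(expanded));
  return expanded;
}

// Bare /admin lands on the Database page.
function resolveAdminPath(pathname: string): string {
  return pathname === "/admin" || pathname === "/admin/" ? "/admin/database" : pathname;
}

// The highlighted sub-item is the one whose path matches the resolved location.
function activeAdminItem(pathname: string): AdminNavItem | undefined {
  const resolved = resolveAdminPath(pathname);
  return ADMIN_NAV_ITEMS.find(
    (item) => resolved === item.path || resolved.startsWith(item.path + "/"),
  );
}
```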
---

## 2. Database Page (`/admin/database`)

### Header

- Connection status badge (green/red)
- PostgreSQL version (with TimescaleDB extension noted if present)
- Host and schema name
- Manual refresh button (refreshes all sections)

### Connection Pool Section

- Visual bar showing active connections vs. max pool size
- Metrics: active, idle, pending, max wait time
- Status badge based on configurable thresholds (% of pool in use)
- Source: HikariCP pool MXBeans. `HikariPoolMXBean` provides active, idle, and pending (threads awaiting a connection) counts, and `HikariConfigMXBean` provides the max pool size. Neither reports measured wait times, so max wait time must come from elsewhere, such as HikariCP's Micrometer metrics
- **Auto-refreshes every 15 seconds**
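
The pool badge reduces to a percentage compared against two thresholds. The following is a minimal TypeScript sketch. It assumes a hypothetical `PoolStats` response shape for `GET /admin/database/pool`. It also treats a value equal to a threshold as crossing it; the spec does not say which comparison to use. The threshold keys match the thresholds payload defined in Section 5.

```typescript
// Assumed response shape for GET /admin/database/pool (field names are illustrative).
interface PoolStats {
  active: number;
  idle: number;
  pending: number;
  maxPoolSize: number;
}

// Same keys as the "database" block of the thresholds payload (values in percent).
interface PoolThresholds {
  connectionPoolWarning: number;
  connectionPoolCritical: number;
}

type BadgeStatus = "GREEN" | "YELLOW" | "RED";

// Share of the pool in use, as a percentage clamped to 0..100.
function poolUsagePercent(stats: PoolStats): number {
  if (stats.maxPoolSize <= 0) return 0;
  const pct = (stats.active * 100) / stats.maxPoolSize;
  return Math.min(100, Math.max(0, pct));
}

// Assumption: reaching a threshold counts as crossing it.
function poolStatus(stats: PoolStats, t: PoolThresholds): BadgeStatus {
  const pct = poolUsagePercent(stats);
  if (pct >= t.connectionPoolCritical) return "RED";
  if (pct >= t.connectionPoolWarning) return "YELLOW";
  return "GREEN";
}
```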
### Table Sizes Section

- Table with columns: Table, Rows, Size, Index Size
- All application tables listed (executions, processor_executions, route_diagrams, agent_metrics, users, oidc_config, admin_thresholds, audit_log)
- Summary row: total data size, total index size
- Source: `pg_stat_user_tables` (row counts from `n_live_tup`, which is an estimate) + `pg_relation_size` and `pg_indexes_size`
- **Manual refresh only** (expensive query)

### Active Queries Section

- Table with columns: PID, Duration, State, Query (truncated), Action
- Queries exceeding the warning threshold are highlighted yellow; those exceeding the critical threshold, red
- Kill button per row → calls `pg_terminate_backend(pid)`, which ends the entire backend session, not only the running statement (`pg_cancel_backend` would cancel just the query)
- Kill requires confirmation dialog
- After kill, query list refreshes automatically
- Source: `pg_stat_activity` (duration computed as `now() - query_start`)
- **Auto-refreshes every 15 seconds**
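
Row highlighting for long-running queries follows the same pattern. This TypeScript sketch assumes a hypothetical `ActiveQuery` row shape, with the duration already converted to seconds. It uses strict "greater than" comparisons to follow the wording above. The 80-character truncation limit is an arbitrary illustrative choice.

```typescript
// Assumed row shape for GET /admin/database/queries (illustrative field names).
interface ActiveQuery {
  pid: number;
  durationSeconds: number;
  state: string;
  query: string;
}

// Same keys as the "database" block of the thresholds payload (values in seconds).
interface QueryThresholds {
  queryDurationWarning: number;
  queryDurationCritical: number;
}

type RowHighlight = "none" | "yellow" | "red";

// Strictly greater than, following the "> threshold" wording of the spec.
function highlightFor(q: ActiveQuery, t: QueryThresholds): RowHighlight {
  if (q.durationSeconds > t.queryDurationCritical) return "red";
  if (q.durationSeconds > t.queryDurationWarning) return "yellow";
  return "none";
}

// Collapse whitespace and cut long SQL for the table cell, ending with an ellipsis.
function truncateSql(sql: string, max = 80): string {
  const oneLine = sql.replace(/\s+/g, " ").trim();
  return oneLine.length <= max ? oneLine : oneLine.slice(0, max - 1) + "\u2026";
}
```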
### Maintenance Section (Phase 2 — Visible but Disabled)

- Buttons: Run VACUUM ANALYZE, Reindex Tables
- Greyed out with tooltip: "Available in a future release"

### Thresholds Section

- Collapsible, collapsed by default
- Configurable values:
  - Connection pool usage: warning % and critical %
  - Query duration: warning seconds and critical seconds
- Save button persists to database

---

## 3. OpenSearch Page (`/admin/opensearch`)

### Header

- Cluster health badge (green/yellow/red — maps directly to OpenSearch cluster health)
- OpenSearch version
- Node count
- Host URL
- Manual refresh button

### Indexing Pipeline Section

- Visual bar showing queue depth vs. max queue size
- Metrics: queue depth, failed document count, debounce interval, indexing rate (docs/s), time since last indexed
- Status badge based on configurable thresholds
- Source: `SearchIndexer` internal stats (exposed via `SearchIndexerStats` interface)
- **Auto-refreshes every 15 seconds**
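
The section badge draws on more than one threshold. One plausible reading, sketched in TypeScript, evaluates queue depth and failed documents separately and shows the worse of the two results. The `PipelineStats` field names and units (`debounceMs`, and `lastIndexedAt` as an ISO-8601 string) are assumptions. They are not the actual `SearchIndexerStats` interface.

```typescript
// Assumed response shape for GET /admin/opensearch/pipeline (illustrative names and units).
interface PipelineStats {
  queueDepth: number;
  maxQueueSize: number;
  failedCount: number;
  debounceMs: number;
  indexingRate: number;         // docs/s
  lastIndexedAt: string | null; // ISO-8601, null before the first document
}

// Same keys as the "opensearch" block of the thresholds payload.
interface PipelineThresholds {
  queueDepthWarning: number;
  queueDepthCritical: number;
  failedDocsWarning: number;
  failedDocsCritical: number;
}

type Level = "GREEN" | "YELLOW" | "RED";
const SEVERITY: Record<Level, number> = { GREEN: 0, YELLOW: 1, RED: 2 };

// Reaching a threshold counts as crossing it, so failedDocsWarning = 1 flags any failure.
function levelFor(value: number, warning: number, critical: number): Level {
  if (value >= critical) return "RED";
  if (value >= warning) return "YELLOW";
  return "GREEN";
}

// The section badge shows the worst individual check.
function pipelineStatus(s: PipelineStats, t: PipelineThresholds): Level {
  const checks: Level[] = [
    levelFor(s.queueDepth, t.queueDepthWarning, t.queueDepthCritical),
    levelFor(s.failedCount, t.failedDocsWarning, t.failedDocsCritical),
  ];
  return checks.reduce<Level>((worst, c) => (SEVERITY[c] > SEVERITY[worst] ? c : worst), "GREEN");
}

// Whole seconds since the last indexed document, or null if unknown.
function secondsSinceLastIndexed(s: PipelineStats, now: Date = new Date()): number | null {
  if (s.lastIndexedAt === null) return null;
  const then = Date.parse(s.lastIndexedAt);
  if (Number.isNaN(then)) return null;
  return Math.max(0, Math.floor((now.getTime() - then) / 1000));
}
```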
### Indices Section

- **Search/filter** by index name pattern (text input)
- **Filter by health** — All / Green / Yellow / Red dropdown
- **Sortable columns** — Name, Docs, Size, Health, Shards (click column header)
- **Pagination** — 10 per page, server-side
- **Summary row** above table — total index count, total docs, total storage
- **Delete button** (trash icon) per row:
  - Confirmation dialog: "Delete index `{name}`? This cannot be undone."
  - User must type the index name to confirm
  - After deletion, table and summary refresh
- Table columns: Index, Docs, Size, Health, Shards (primary/replica)
- Source: OpenSearch `_cat/indices` API
- **Manual refresh only**
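
This sketch covers the server-side step behind search, health filtering, sorting, and paging. It assumes the server fetches the full `_cat/indices?format=json&bytes=b` list and processes it in memory. The `bytes=b` parameter makes sizes arrive as byte counts, and cat APIs return numbers as strings. The sketch is written in TypeScript for consistency with the other examples; the real endpoint lives in the Java backend. The following are all illustrative choices:

- Search is a case-insensitive substring match rather than an OpenSearch wildcard pattern.
- Shards sort by primary count, then replica count.
- The summary row is computed over the filtered set.

```typescript
// One row of `GET _cat/indices?format=json&bytes=b`; cat APIs return numbers as strings.
interface CatIndexRow {
  health: string; // "green" | "yellow" | "red"
  index: string;
  pri: string;
  rep: string;
  "docs.count": string | null;
  "store.size": string | null;
}

type IndexSortField = "name" | "docs" | "size" | "health" | "shards";

// Mirrors the documented query parameters (search, health, sort, order, page, size).
interface IndexQuery {
  search?: string;
  health: "ALL" | "GREEN" | "YELLOW" | "RED";
  sort: IndexSortField;
  order: "asc" | "desc";
  page: number; // zero-based
  size: number;
}

interface IndexInfo {
  name: string;
  docs: number;
  sizeBytes: number;
  health: string;
  primaries: number;
  replicas: number;
}

interface IndexPage {
  items: IndexInfo[];
  totalIndices: number; // summary row, computed over the filtered set
  totalDocs: number;
  totalBytes: number;
}

const HEALTH_RANK: Record<string, number> = { green: 0, yellow: 1, red: 2 };

function toIndexInfo(r: CatIndexRow): IndexInfo {
  return {
    name: r.index,
    docs: Number(r["docs.count"] ?? 0),
    sizeBytes: Number(r["store.size"] ?? 0),
    health: r.health,
    primaries: Number(r.pri),
    replicas: Number(r.rep),
  };
}

function compareBy(field: IndexSortField, a: IndexInfo, b: IndexInfo): number {
  switch (field) {
    case "docs":
      return a.docs - b.docs;
    case "size":
      return a.sizeBytes - b.sizeBytes;
    case "health":
      return (HEALTH_RANK[a.health] ?? 3) - (HEALTH_RANK[b.health] ?? 3);
    case "shards":
      return a.primaries - b.primaries || a.replicas - b.replicas;
    default:
      return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  }
}

// Filter, sort, and page the full list in memory.
function queryIndices(rows: readonly CatIndexRow[], q: IndexQuery): IndexPage {
  const needle = (q.search ?? "").toLowerCase();
  const filtered = rows
    .map(toIndexInfo)
    .filter((i) => needle === "" || i.name.toLowerCase().includes(needle))
    .filter((i) => q.health === "ALL" || i.health === q.health.toLowerCase());
  const dir = q.order === "desc" ? -1 : 1;
  filtered.sort((a, b) => dir * compareBy(q.sort, a, b));
  const start = q.page * q.size;
  return {
    items: filtered.slice(start, start + q.size),
    totalIndices: filtered.length,
    totalDocs: filtered.reduce((sum, i) => sum + i.docs, 0),
    totalBytes: filtered.reduce((sum, i) => sum + i.sizeBytes, 0),
  };
}
```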
### Performance Section

- Metrics: query cache hit rate, request cache hit rate, average search latency, average indexing latency, JVM heap used (visual bar with used/max)
- Source: OpenSearch `_nodes/stats` API
- **Auto-refreshes every 15 seconds**
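
The performance figures are derived values. Hit rates are hits divided by hits plus misses, and average latencies are total time divided by operation count. The TypeScript sketch below works over a subset of the `_nodes/stats` response; the summary type and function names are hypothetical. Summing heap across nodes is one choice among several; showing the fullest node is another. The counters are cumulative since each node started, so these are lifetime averages. A recent-window value would need the difference between two samples, for example 15 seconds apart.

```typescript
// Subset of one node's entry in the OpenSearch `GET _nodes/stats` response.
interface NodeStats {
  indices: {
    query_cache: { hit_count: number; miss_count: number };
    request_cache: { hit_count: number; miss_count: number };
    search: { query_total: number; query_time_in_millis: number };
    indexing: { index_total: number; index_time_in_millis: number };
  };
  jvm: { mem: { heap_used_in_bytes: number; heap_max_in_bytes: number } };
}

interface NodesStatsResponse {
  nodes: Record<string, NodeStats>;
}

interface PerformanceSummary {
  queryCacheHitRate: number;   // 0..1
  requestCacheHitRate: number; // 0..1
  avgSearchLatencyMs: number;
  avgIndexingLatencyMs: number;
  heapUsedBytes: number;
  heapMaxBytes: number;
}

// Division that yields 0 instead of NaN when nothing has been counted yet.
function safeRatio(num: number, den: number): number {
  return den > 0 ? num / den : 0;
}

// Aggregates across all nodes; counters are cumulative since each node started.
function summarizePerformance(res: NodesStatsResponse): PerformanceSummary {
  const nodes = Object.keys(res.nodes).map((id) => res.nodes[id] as NodeStats);
  const total = (pick: (n: NodeStats) => number): number =>
    nodes.reduce((acc, n) => acc + pick(n), 0);

  const qcHits = total((n) => n.indices.query_cache.hit_count);
  const qcMisses = total((n) => n.indices.query_cache.miss_count);
  const rcHits = total((n) => n.indices.request_cache.hit_count);
  const rcMisses = total((n) => n.indices.request_cache.miss_count);

  return {
    queryCacheHitRate: safeRatio(qcHits, qcHits + qcMisses),
    requestCacheHitRate: safeRatio(rcHits, rcHits + rcMisses),
    avgSearchLatencyMs: safeRatio(
      total((n) => n.indices.search.query_time_in_millis),
      total((n) => n.indices.search.query_total),
    ),
    avgIndexingLatencyMs: safeRatio(
      total((n) => n.indices.indexing.index_time_in_millis),
      total((n) => n.indices.indexing.index_total),
    ),
    heapUsedBytes: total((n) => n.jvm.mem.heap_used_in_bytes),
    heapMaxBytes: total((n) => n.jvm.mem.heap_max_in_bytes),
  };
}
```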
### Operations Section (Phase 2 — Visible but Disabled)

- Buttons: Force Reindex All, Flush Index, Delete Index (bulk via checkbox selection)
- Greyed out with tooltip: "Available in a future release"

### Thresholds Section

- Collapsible, collapsed by default
- Configurable values:
  - Cluster health: warning level, critical level
  - Queue depth: warning count, critical count
  - JVM heap usage: warning %, critical %
  - Failed docs: warning count, critical count
- Save button persists to database
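
Cluster health thresholds are expressed as levels rather than numbers, so the comparison needs an explicit ordering (GREEN < YELLOW < RED). The following is a TypeScript sketch with hypothetical names. With this ordering, an admin could set the warning level to `RED` to stop yellow clusters from raising a warning. The heap check uses the same at-or-above rule assumed for the other numeric thresholds.

```typescript
type HealthColor = "GREEN" | "YELLOW" | "RED";
const HEALTH_SEVERITY: Record<HealthColor, number> = { GREEN: 0, YELLOW: 1, RED: 2 };

// Subset of the "opensearch" block of the thresholds payload.
interface ClusterThresholds {
  clusterHealthWarning: HealthColor;
  clusterHealthCritical: HealthColor;
  jvmHeapWarning: number;  // percent
  jvmHeapCritical: number; // percent
}

// OpenSearch reports cluster health in lowercase ("green", "yellow", "red").
function parseHealth(raw: string): HealthColor {
  switch (raw.toLowerCase()) {
    case "green":
      return "GREEN";
    case "yellow":
      return "YELLOW";
    case "red":
      return "RED";
    default:
      throw new Error(`Unknown cluster health: ${raw}`);
  }
}

// A health level at or beyond the configured level triggers that badge.
function clusterHealthBadge(health: HealthColor, t: ClusterThresholds): HealthColor {
  if (HEALTH_SEVERITY[health] >= HEALTH_SEVERITY[t.clusterHealthCritical]) return "RED";
  if (HEALTH_SEVERITY[health] >= HEALTH_SEVERITY[t.clusterHealthWarning]) return "YELLOW";
  return "GREEN";
}

// Heap usage as a percentage of max, compared against the percent thresholds.
function heapBadge(usedBytes: number, maxBytes: number, t: ClusterThresholds): HealthColor {
  const pct = maxBytes > 0 ? (usedBytes * 100) / maxBytes : 0;
  if (pct >= t.jvmHeapCritical) return "RED";
  if (pct >= t.jvmHeapWarning) return "YELLOW";
  return "GREEN";
}
```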
---

## 4. Audit Log Page (`/admin/audit`)

### Purpose

Database-backed audit trail of all administrative actions across the system. Provides evidence for SOC2 audits: who did what, when, and from where. The audit log is append-only — entries cannot be modified or deleted through the UI or API.

### Header

- Total event count
- Date range selector (default: last 7 days)

### Audit Log Table

```
┌─ Audit Log ───────────────────────────────────────────────────┐
│ Date range: [2026-03-10] to [2026-03-17]                      │
│ [User: All ▾]  [Category: All ▾]  [Search: ________]          │
│                                                               │
│ Timestamp            User   Category   Action        Target   │
│ 2026-03-17 14:32:01  admin  INFRA      kill_query    PID 42   │
│ 2026-03-17 14:28:15  admin  INFRA      delete_index  exec-…   │
│ 2026-03-17 12:01:44  admin  CONFIG     update_oidc   oidc     │
│ 2026-03-17 09:15:22  jdoe   AUTH       login                  │
│ 2026-03-16 18:45:00  admin  USER_MGMT  update_roles  u:5      │
│ ...                                                           │
│                                                               │
│ ◀ 1 2 3 ... 12 ▶                          Showing 1-25 of 294 │
└───────────────────────────────────────────────────────────────┘
```

- **Filterable** by user, category, date range
- **Searchable** by free text (matches action, target, detail)
- **Sortable** by timestamp (default: newest first)
- **Pagination** — 25 per page, server-side
- **Detail expansion** — click a row to expand and show full `detail` JSON
- **Read-only** — no edit or delete actions available (compliance requirement)
- **Export** (Phase 2) — CSV/JSON download for auditors
### Audit Categories

| Category | Actions Logged |
|----------|---------------|
| `INFRA` | kill_query, delete_index, update_thresholds |
| `AUTH` | login, login_oidc, logout, login_failed |
| `USER_MGMT` | create_user, update_roles, delete_user |
| `CONFIG` | update_oidc, delete_oidc, test_oidc |

### What Gets Logged

Every admin action across the system (not just those on the infrastructure pages), plus authentication events:

- **Infrastructure:** kill query, delete OpenSearch index, save thresholds
- **OIDC:** save config, delete config, test connection
- **User management:** create user, update roles, delete user
- **Authentication:** login (success and failure), OIDC login, logout

### Audit Record Fields

| Field | Description |
|-------|-------------|
| `timestamp` | When the action occurred (server time, UTC) |
| `username` | Authenticated user who performed the action (for `login_failed`, the username that was submitted) |
| `action` | Machine-readable action name (e.g., `kill_query`, `delete_index`) |
| `category` | Grouping: `INFRA`, `AUTH`, `USER_MGMT`, `CONFIG` |
| `target` | What was acted on (e.g., PID, index name, user ID) |
| `detail` | JSONB with action-specific context (e.g., query text of a killed query, old/new roles for a role change) |
| `result` | `SUCCESS` or `FAILURE` |
| `ip_address` | Client IP address from the request |

### Backend Implementation

- `AuditService` — central service injected into all admin controllers
- Single method: `log(action, category, target, detail, result)`
- Extracts username and IP from `SecurityContextHolder` and `HttpServletRequest`; behind a reverse proxy, the IP is the real client address only if forwarded headers are honored (e.g., via Spring Boot's `server.forward-headers-strategy`)
- Writes to both the `audit_log` table AND SLF4J (belt and suspenders)
- Writes are synchronous, with no async option: for the compliance guarantee, an action must not complete before its audit record is persisted

---
## 5. Backend API

All endpoints live under `/api/v1/admin/` (paths in the tables below are relative to `/api/v1`) and are secured by the existing Spring Security filter chain (`ROLE_ADMIN` required). Controllers are additionally annotated with `@PreAuthorize("hasRole('ADMIN')")` for defense-in-depth.

### Database Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/admin/database/status` | Version, host, schema, connection state |
| `GET` | `/admin/database/pool` | Active, idle, pending, max wait (HikariCP) |
| `GET` | `/admin/database/tables` | Table names, row counts, data sizes, index sizes |
| `GET` | `/admin/database/queries` | Active queries: pid, duration, state, SQL |
| `POST` | `/admin/database/queries/{pid}/kill` | Terminate the query's backend via `pg_terminate_backend` |

### OpenSearch Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/admin/opensearch/status` | Version, host, cluster health, node count |
| `GET` | `/admin/opensearch/pipeline` | Queue depth, failed count, debounce, rate, last indexed |
| `GET` | `/admin/opensearch/indices` | Paginated, sortable, filterable index list |
| `DELETE` | `/admin/opensearch/indices/{name}` | Delete specific index (with audit log) |
| `GET` | `/admin/opensearch/performance` | Cache rates, latencies, JVM heap |

#### Indices Query Parameters

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `search` | string | — | Filter by index name pattern |
| `health` | enum | `ALL` | Filter by health: ALL, GREEN, YELLOW, RED |
| `sort` | string | `name` | Sort field: name, docs, size, health, shards |
| `order` | enum | `asc` | Sort direction: asc, desc |
| `page` | int | `0` | Page number (zero-based) |
| `size` | int | `10` | Page size |

### Audit Log Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/admin/audit` | Paginated, filterable audit log entries |

#### Audit Log Query Parameters

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `username` | string | — | Filter by username |
| `category` | enum | — | Filter by category: INFRA, AUTH, USER_MGMT, CONFIG |
| `search` | string | — | Free text search across action, target, detail |
| `from` | ISO date | 7 days ago | Start of date range |
| `to` | ISO date | now | End of date range |
| `sort` | string | `timestamp` | Sort field |
| `order` | enum | `desc` | Sort direction: asc, desc |
| `page` | int | `0` | Page number (zero-based) |
| `size` | int | `25` | Page size |
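
Applying the audit endpoint's defaults on the client as well keeps the UI and server in agreement on the visible date range. The following TypeScript sketch builds the query string for `GET /api/v1/admin/audit`. The `AuditQuery` type and the function name are hypothetical. `sort` is fixed to `timestamp` because that is the only sortable column in the UI.

```typescript
type AuditCategory = "INFRA" | "AUTH" | "USER_MGMT" | "CONFIG";

// Optional filters; omitted values fall back to the documented defaults.
interface AuditQuery {
  username?: string;
  category?: AuditCategory;
  search?: string;
  from?: Date;
  to?: Date;
  order?: "asc" | "desc";
  page?: number;
  size?: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function auditQueryString(q: AuditQuery, now: Date = new Date()): string {
  const parts: string[] = [];
  const add = (key: string, value: string): void => {
    parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  };
  // Filters are only sent when set.
  if (q.username) add("username", q.username);
  if (q.category) add("category", q.category);
  if (q.search) add("search", q.search);
  // Defaults from the parameter table: last 7 days, newest first, 25 per page.
  add("from", (q.from ?? new Date(now.getTime() - 7 * DAY_MS)).toISOString());
  add("to", (q.to ?? now).toISOString());
  add("sort", "timestamp");
  add("order", q.order ?? "desc");
  add("page", String(q.page ?? 0));
  add("size", String(q.size ?? 25));
  return parts.join("&");
}
```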
### Thresholds Endpoints

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/admin/thresholds` | All configured thresholds |
| `PUT` | `/admin/thresholds` | Save thresholds (database + OpenSearch in one payload) |

### Thresholds Payload

Units follow the threshold settings on each page:

- Connection pool and JVM heap usage: percent
- Query duration: seconds
- Queue depth and failed documents: counts
- Cluster health: OpenSearch health levels

```json
{
  "database": {
    "connectionPoolWarning": 80,
    "connectionPoolCritical": 95,
    "queryDurationWarning": 1.0,
    "queryDurationCritical": 10.0
  },
  "opensearch": {
    "clusterHealthWarning": "YELLOW",
    "clusterHealthCritical": "RED",
    "queueDepthWarning": 100,
    "queueDepthCritical": 500,
    "jvmHeapWarning": 75,
    "jvmHeapCritical": 90,
    "failedDocsWarning": 1,
    "failedDocsCritical": 10
  }
}
```
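
The payload carries warning/critical pairs, so a natural save-time check is that no warning value exceeds its critical value. The spec does not say where, or whether, such validation runs. The TypeScript types and `validateThresholds` below are a hypothetical sketch. The same rules would also belong in the `PUT` handler, since client-side checks can be bypassed.

```typescript
type HealthLevel = "GREEN" | "YELLOW" | "RED";

// Mirrors the "database" block of the payload.
interface DatabaseThresholds {
  connectionPoolWarning: number; // percent of pool in use
  connectionPoolCritical: number;
  queryDurationWarning: number;  // seconds
  queryDurationCritical: number;
}

// Mirrors the "opensearch" block of the payload.
interface OpenSearchThresholds {
  clusterHealthWarning: HealthLevel;
  clusterHealthCritical: HealthLevel;
  queueDepthWarning: number;     // count
  queueDepthCritical: number;
  jvmHeapWarning: number;        // percent
  jvmHeapCritical: number;
  failedDocsWarning: number;     // count
  failedDocsCritical: number;
}

interface ThresholdsPayload {
  database: DatabaseThresholds;
  opensearch: OpenSearchThresholds;
}

const HEALTH_ORDER: Record<HealthLevel, number> = { GREEN: 0, YELLOW: 1, RED: 2 };

// Returns readable problems; an empty array means the payload is safe to PUT.
function validateThresholds(p: ThresholdsPayload): string[] {
  const errors: string[] = [];
  const checkPair = (label: string, warning: number, critical: number, upperBound?: number): void => {
    if (warning < 0 || critical < 0) errors.push(`${label}: values must not be negative`);
    if (warning > critical) errors.push(`${label}: warning must not exceed critical`);
    if (upperBound !== undefined && critical > upperBound) {
      errors.push(`${label}: critical must be at most ${upperBound}`);
    }
  };
  const db = p.database;
  const os = p.opensearch;
  checkPair("Connection pool %", db.connectionPoolWarning, db.connectionPoolCritical, 100);
  checkPair("Query duration (s)", db.queryDurationWarning, db.queryDurationCritical);
  checkPair("Queue depth", os.queueDepthWarning, os.queueDepthCritical);
  checkPair("JVM heap %", os.jvmHeapWarning, os.jvmHeapCritical, 100);
  checkPair("Failed docs", os.failedDocsWarning, os.failedDocsCritical);
  if (HEALTH_ORDER[os.clusterHealthWarning] > HEALTH_ORDER[os.clusterHealthCritical]) {
    errors.push("Cluster health: warning level must not be worse than critical level");
  }
  return errors;
}
```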
---

## 6. Security

### Enforcement Layers

1. **Spring Security filter chain** — `/api/v1/admin/**` requires `ROLE_ADMIN` (existing configuration)
2. **Controller annotation** — `@PreAuthorize("hasRole('ADMIN')")` on each controller class (defense-in-depth; only effective when method security is enabled, e.g., with `@EnableMethodSecurity`)
3. **UI role check** — sidebar admin section hidden for non-admin users (cosmetic only, not a security boundary)

### Audit Logging

All admin actions are persisted to the `audit_log` database table (see Section 4 and Section 7 — Data Storage) AND logged via SLF4J at INFO level. The database record is the source of truth for compliance; the SLF4J log provides operational visibility.

The `AuditService` is injected into all admin controllers (infrastructure, OIDC, user management) and the authentication flow. See Section 4 (Audit Log Page) for full details on what is logged and the record structure.

### Future: OPERATOR Role (Phase 2+)

Design anticipates a read-only `OPERATOR` role:
- Can view all monitoring data
- Cannot perform destructive actions (kill, delete)
- Implementation: method-level `@PreAuthorize("hasRole('ADMIN')")` on action endpoints, with the filter-chain rule and class-level annotations (both currently `ADMIN`-only) relaxed to admit `OPERATOR` on read endpoints; the UI conditionally disables buttons based on role

---
## 7. Data Storage

### New Flyway Migration: V9

```sql
-- Single-row settings table; the CHECK constraint pins id to 1.
CREATE TABLE admin_thresholds (
    id          INTEGER PRIMARY KEY DEFAULT 1,
    config      JSONB NOT NULL DEFAULT '{}',
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_by  TEXT NOT NULL,
    CONSTRAINT single_row CHECK (id = 1)
);

-- Append-only audit trail; username is stored as text, with no foreign key to users.
CREATE TABLE audit_log (
    id          BIGSERIAL PRIMARY KEY,
    timestamp   TIMESTAMPTZ NOT NULL DEFAULT now(),
    username    TEXT NOT NULL,
    action      TEXT NOT NULL,
    category    TEXT NOT NULL,
    target      TEXT,
    detail      JSONB,
    result      TEXT NOT NULL,
    ip_address  TEXT
);

CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp DESC);
CREATE INDEX idx_audit_log_username ON audit_log (username);
CREATE INDEX idx_audit_log_category ON audit_log (category);
```

**admin_thresholds:**
- Single-row table (same pattern as `oidc_config`)
- JSONB column for flexibility — adding new thresholds doesn't require schema changes
- Tracks who last updated and when

**audit_log:**
- Append-only table — no UPDATE or DELETE exposed via API (the table itself does not block them at the database level)
- Indexed on timestamp (primary query axis), username, and category for filtered views
- JSONB `detail` column holds action-specific context without schema changes
- No foreign key to `users` table — username is denormalized so audit records survive user deletion

---
## 8. Frontend Architecture

### New Files

| File | Purpose |
|------|---------|
| `pages/admin/DatabaseAdminPage.tsx` | Database monitoring and management |
| `pages/admin/OpenSearchAdminPage.tsx` | OpenSearch monitoring and management |
| `pages/admin/AuditLogPage.tsx` | Audit log viewer |
| `api/queries/admin/database.ts` | React Query hooks for database endpoints |
| `api/queries/admin/opensearch.ts` | React Query hooks for OpenSearch endpoints |
| `api/queries/admin/thresholds.ts` | React Query hooks for threshold endpoints |
| `api/queries/admin/audit.ts` | React Query hooks for audit log endpoint |
| `components/admin/StatusBadge.tsx` | Color-coded status indicator (green/yellow/red) |
| `components/admin/RefreshableCard.tsx` | Card with manual refresh button + optional auto-refresh |
| `components/admin/ConfirmDeleteDialog.tsx` | Confirmation dialog requiring name input for destructive actions |

### Modified Files

| File | Change |
|------|--------|
| `components/layout/AppSidebar.tsx` | Refactor admin section to collapsible sub-menu with multiple items |
| `router.tsx` | Add routes for `/admin/database`, `/admin/opensearch`, `/admin/audit`, redirect `/admin` |
| `SpaForwardController.java` | Ensure `/admin/*` forwarding covers new routes |

### Auto-Refresh Strategy

- React Query `refetchInterval: 15000` on lightweight endpoints (pool, queries, pipeline, performance)
- Heavy endpoints (tables, indices) use `refetchInterval: false` — manual refresh only
- Refresh button calls `queryClient.invalidateQueries` for all queries on that page
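
One way to express the split between polled and on-demand endpoints is a per-page table of query definitions whose keys share a page prefix. This is a hypothetical TypeScript sketch. Each entry would feed `useQuery({ queryKey, queryFn, refetchInterval })`. The page refresh button would call `queryClient.invalidateQueries({ queryKey: pageKeyPrefix(page) })`, which in TanStack Query v4/v5 matches every key that starts with that prefix. A few choices here are assumptions rather than spec:

- The key layout itself.
- The `status` endpoints are left unpolled, because the spec lists only the four lightweight endpoints for auto-refresh.
- In practice the indices key would also carry its filter, sort, and page parameters.

```typescript
type AdminPage = "database" | "opensearch";

interface AdminQueryDef {
  queryKey: readonly string[];
  path: string;                    // relative to /api/v1
  refetchInterval: number | false; // value for React Query's refetchInterval option
}

const AUTO_REFRESH_MS = 15000;

// Hypothetical key layout ["admin", page, endpoint].
const ADMIN_QUERIES: Record<AdminPage, readonly AdminQueryDef[]> = {
  database: [
    { queryKey: ["admin", "database", "status"], path: "/admin/database/status", refetchInterval: false },
    { queryKey: ["admin", "database", "pool"], path: "/admin/database/pool", refetchInterval: AUTO_REFRESH_MS },
    { queryKey: ["admin", "database", "tables"], path: "/admin/database/tables", refetchInterval: false },
    { queryKey: ["admin", "database", "queries"], path: "/admin/database/queries", refetchInterval: AUTO_REFRESH_MS },
  ],
  opensearch: [
    { queryKey: ["admin", "opensearch", "status"], path: "/admin/opensearch/status", refetchInterval: false },
    { queryKey: ["admin", "opensearch", "pipeline"], path: "/admin/opensearch/pipeline", refetchInterval: AUTO_REFRESH_MS },
    { queryKey: ["admin", "opensearch", "indices"], path: "/admin/opensearch/indices", refetchInterval: false },
    { queryKey: ["admin", "opensearch", "performance"], path: "/admin/opensearch/performance", refetchInterval: AUTO_REFRESH_MS },
  ],
};

// Prefix handed to invalidateQueries by the page-level refresh button.
function pageKeyPrefix(page: AdminPage): readonly string[] {
  return ["admin", page];
}

// Prefix test mirroring React Query's default (non-exact) matching for these string keys.
function matchesPrefix(key: readonly string[], prefix: readonly string[]): boolean {
  return prefix.length <= key.length && prefix.every((part, i) => key[i] === part);
}
```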
---

## 9. Implementation Phases

### Phase 1 (Current Scope)

1. Admin sidebar restructuring
2. Database page — all monitoring sections + kill query
3. OpenSearch page — all monitoring sections + delete index
4. Threshold configuration (both pages)
5. Audit log — database-backed audit trail + admin viewer page
6. Retrofit audit logging into existing admin controllers (OIDC, user management) and auth flow
7. Backend endpoints with RBAC enforcement
8. Flyway migration V9 for thresholds + audit_log tables

### Phase 2

- Database maintenance actions (VACUUM ANALYZE, Reindex)
- OpenSearch operations (Force Reindex All, Flush)
- Bulk index operations (checkbox selection)
- Audit log CSV/JSON export for auditors
- OPERATOR role with view-only permissions

### Phase 3

- TimescaleDB-aware metrics (hypertable chunks, continuous aggregate status, compression)
- Historical trend charts for key metrics
- Alerting/notification system