Infrastructure Overview — Admin Pages Design
Date: 2026-03-17
Status: Approved
Scope: Phase 1 implementation; full vision documented with Phase 2+ sections marked
Overview
Add Database and OpenSearch admin pages to the Cameleer3 Server UI, allowing administrators to monitor subsystem health, inspect metrics, and perform basic maintenance actions. Restructure admin navigation from a single OIDC page to a sidebar sub-menu with dedicated pages per concern.
Goals
- Give admins real-time visibility into PostgreSQL and OpenSearch health, performance, and storage
- Enable basic maintenance actions (kill queries, delete indices) without SSH/kubectl access
- Provide configurable thresholds for visual status indicators (green/yellow/red)
- Establish a database-backed audit log for all admin actions (SOC2 compliance foundation)
- Design for future expansion (VACUUM, reindex, OPERATOR role) without requiring restructuring
Non-Goals (Phase 1)
- Database maintenance actions (VACUUM ANALYZE, Reindex)
- OpenSearch bulk operations (Force Reindex All, Flush)
- OPERATOR role with restricted permissions
- TimescaleDB-specific features (hypertable stats, continuous aggregate status)
- Alerting or notifications beyond visual indicators
1. Admin Navigation Restructuring
Current State
Single gear icon at bottom of AppSidebar linking directly to /admin/oidc.
New Structure
The gear icon expands/collapses an admin sub-menu in the sidebar:
── Apps ──────────────
app-1
app-2
── Admin (gear icon) ─
Database → /admin/database
OpenSearch → /admin/opensearch
Audit Log → /admin/audit
OIDC → /admin/oidc
Users → /admin/users (backend exists, UI page is future scope)
- Admin section visible only to users with `ADMIN` role
- Section collapsed by default; state persisted in localStorage
- Active sub-item highlighted
- `/admin` redirects to `/admin/database`
- Existing `OidcAdminPage` unchanged functionally, re-routed from being the sole admin page to a sub-page
2. Database Page (/admin/database)
Header
- Connection status badge (green/red)
- PostgreSQL version (with TimescaleDB extension noted if present)
- Host and schema name
- Manual refresh button (refreshes all sections)
Connection Pool Section
- Visual bar showing active connections vs. max pool size
- Metrics: active, idle, pending, max wait time
- Status badge based on configurable threshold (% of pool in use)
- Source: HikariCP pool MXBean
- Auto-refreshes every 15 seconds
Table Sizes Section
- Table with columns: Table, Rows, Size, Index Size
- All application tables listed (executions, processor_executions, route_diagrams, agent_metrics, users, oidc_config, admin_thresholds)
- Summary row: total data size, total index size
- Source: `pg_stat_user_tables` + `pg_relation_size`
- Manual refresh only (expensive query)
Active Queries Section
- Table with columns: PID, Duration, State, Query (truncated), Action
- Queries > warning threshold highlighted yellow, > critical threshold highlighted red
- Kill button per row → calls `pg_terminate_backend(pid)`
- Kill requires confirmation dialog
- After kill, query list refreshes automatically
- Source: `pg_stat_activity`
- Auto-refreshes every 15 seconds
Maintenance Section (Phase 2 — Visible but Disabled)
- Buttons: Run VACUUM ANALYZE, Reindex Tables
- Greyed out with tooltip: "Available in a future release"
Thresholds Section
- Collapsible, collapsed by default
- Configurable values:
  - Connection pool usage: warning % and critical %
  - Query duration: warning seconds and critical seconds
- Save button persists to database
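The mapping from a metric and its warning/critical thresholds to a badge color can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Status` enum and `ThresholdEvaluator` class are hypothetical names, and the strictly-greater-than comparison is an assumption taken from the wording of the Active Queries section.

```java
/** Hypothetical status levels matching the green/yellow/red badges. */
enum Status { GREEN, YELLOW, RED }

/** Hypothetical helper; not part of the existing codebase. */
final class ThresholdEvaluator {
    private ThresholdEvaluator() {}

    /** RED above critical, YELLOW above warning, otherwise GREEN (assumed strict comparison). */
    static Status classify(double value, double warning, double critical) {
        if (value > critical) return Status.RED;
        if (value > warning) return Status.YELLOW;
        return Status.GREEN;
    }

    /** Pool usage as a percentage of the maximum pool size. */
    static double poolUsagePercent(int active, int maxPoolSize) {
        if (maxPoolSize <= 0) {
            throw new IllegalArgumentException("maxPoolSize must be positive");
        }
        return 100.0 * active / maxPoolSize;
    }
}
```

With the default thresholds from the payload in Section 5 (80 / 95), nine active connections out of ten would classify as `YELLOW`. The same `classify` call would serve query duration in seconds.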
3. OpenSearch Page (/admin/opensearch)
Header
- Cluster health badge (green/yellow/red — maps directly to OpenSearch cluster health)
- OpenSearch version
- Node count
- Host URL
- Manual refresh button
Indexing Pipeline Section
- Visual bar showing queue depth vs. max queue size
- Metrics: queue depth, failed document count, debounce interval, indexing rate (docs/s), time since last indexed
- Status badge based on configurable thresholds
- Source: `SearchIndexer` internal stats, exposed via a new `SearchIndexerStats` interface in `cameleer3-server-core`
- Auto-refreshes every 15 seconds

Implementation note: `SearchIndexer` currently has no stats API. This requires adding:
- `AtomicLong failedCount`: incremented on indexing errors
- `AtomicLong indexedCount`: incremented on successful index operations
- `volatile Instant lastIndexedAt`: updated after each successful batch
- Rate calculation via a sliding window counter (e.g., count delta over last 15s interval)
- `SearchIndexerStats` interface in core module with getters for all above, implemented by `SearchIndexer`
- `queueDepth` and `maxQueueSize` already derivable from the internal `BlockingQueue`
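The instrumentation listed above could look roughly like the sketch below. It is written under assumptions: the getter names, the `InstrumentedIndexer` stand-in class, and the explicit `sampleRate(nowNanos)` hook are all illustrative, and the real `SearchIndexer` may organize its queue and batching differently. The rate is computed as the count delta divided by the elapsed time between two samples, which is one simple reading of the "sliding window counter" mentioned above.

```java
import java.time.Instant;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Read-only view of indexer state; getter names are illustrative. */
interface SearchIndexerStats {
    int queueDepth();
    int maxQueueSize();
    long failedCount();
    long indexedCount();
    Instant lastIndexedAt();   // null until the first successful batch
    double indexingRate();     // docs/s over the last sampled interval
}

/** Stand-in for the real SearchIndexer, showing only the stats plumbing. */
final class InstrumentedIndexer implements SearchIndexerStats {
    private final int capacity;
    private final BlockingQueue<String> queue;
    private final AtomicLong failedCount = new AtomicLong();
    private final AtomicLong indexedCount = new AtomicLong();
    private volatile Instant lastIndexedAt;
    private volatile double rate;
    private long lastSampleCount;   // guarded by the sampleRate monitor
    private long lastSampleNanos;   // guarded by the sampleRate monitor

    InstrumentedIndexer(int capacity, long nowNanos) {
        this.capacity = capacity;
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.lastSampleNanos = nowNanos;
    }

    boolean enqueue(String docId) { return queue.offer(docId); }

    void onBatchIndexed(int docs, Instant when) {
        indexedCount.addAndGet(docs);
        lastIndexedAt = when;
    }

    void onIndexingError() { failedCount.incrementAndGet(); }

    /** Call on a fixed schedule (e.g. every 15 s): rate = count delta / elapsed seconds. */
    synchronized void sampleRate(long nowNanos) {
        long count = indexedCount.get();
        double seconds = (nowNanos - lastSampleNanos) / 1_000_000_000.0;
        if (seconds > 0) {
            rate = (count - lastSampleCount) / seconds;
        }
        lastSampleCount = count;
        lastSampleNanos = nowNanos;
    }

    @Override public int queueDepth() { return queue.size(); }
    @Override public int maxQueueSize() { return capacity; }
    @Override public long failedCount() { return failedCount.get(); }
    @Override public long indexedCount() { return indexedCount.get(); }
    @Override public Instant lastIndexedAt() { return lastIndexedAt; }
    @Override public double indexingRate() { return rate; }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a production version would more likely read `System.nanoTime()` from a scheduled task.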
Indices Section
- Search/filter by index name pattern (text input)
- Filter by health — All / Green / Yellow / Red dropdown
- Sortable columns — Name, Docs, Size, Health, Shards (click column header)
- Pagination — 10 per page, server-side
- Summary row above table — total index count, total docs, total storage
- Delete button (trash icon) per row:
  - Confirmation dialog: "Delete index `{name}`? This cannot be undone."
  - User must type the index name to confirm
  - After deletion, table and summary refresh
- Table columns: Index, Docs, Size, Health, Shards (primary/replica)
- Source: OpenSearch `_cat/indices` API
- Manual refresh only
Performance Section
- Metrics: query cache hit rate, request cache hit rate, average search latency, average indexing latency, JVM heap used (visual bar with used/max)
- Source: OpenSearch `_nodes/stats` API
- Auto-refreshes every 15 seconds
Operations Section (Phase 2 — Visible but Disabled)
- Buttons: Force Reindex All, Flush Index, Delete Index (bulk via checkbox selection)
- Greyed out with tooltip: "Available in a future release"
Thresholds Section
- Collapsible, collapsed by default
- Configurable values:
  - Cluster health: warning level, critical level
  - Queue depth: warning count, critical count
  - JVM heap usage: warning %, critical %
  - Failed docs: warning count, critical count
- Save button persists to database
4. Audit Log Page (/admin/audit)
Purpose
Database-backed audit trail of all administrative actions across the system. Provides SOC2-compliant evidence of who did what, when, and from where. The audit log is append-only — entries cannot be modified or deleted through the UI or API.
Header
- Total event count
- Date range selector (default: last 7 days)
Audit Log Table
┌─ Audit Log ────────────────────────────────────────────────┐
│ Date range: [2026-03-10] to [2026-03-17] │
│ [User: All ▾] [Category: All ▾] [Search: ________] │
│ │
│ Timestamp User Category Action Target │
│ 2026-03-17 14:32:01 admin INFRA kill_query PID 42 │
│ 2026-03-17 14:28:15 admin INFRA delete_idx exec-… │
│ 2026-03-17 12:01:44 admin CONFIG update oidc │
│ 2026-03-17 09:15:22 jdoe AUTH login │
│ 2026-03-16 18:45:00 admin USER_MGMT update_roles u:5 │
│ ... │
│ │
│ ◀ 1 2 3 ... 12 ▶ Showing 1-25 of 294 │
└────────────────────────────────────────────────────────────┘
- Filterable by user, category, date range
- Searchable by free text (matches `action` and `target` columns only; JSONB `detail` excluded from text search for performance)
- Sortable by timestamp (default: newest first)
- Pagination: 25 per page, server-side
- Detail expansion: click a row to expand and show full `detail` JSON
- Read-only: no edit or delete actions available (compliance requirement)
- Export (Phase 2) — CSV/JSON download for auditors
Audit Categories
| Category | Actions Logged |
|---|---|
| `INFRA` | `kill_query`, `delete_index`, `update_thresholds` |
| `AUTH` | `login`, `login_oidc`, `logout`, `login_failed` |
| `USER_MGMT` | `create_user`, `update_roles`, `delete_user` |
| `CONFIG` | `update_oidc`, `delete_oidc`, `test_oidc` |
What Gets Logged
Every admin action across the system, not just infrastructure pages:
- Infrastructure: kill query, delete OpenSearch index, save thresholds
- OIDC: save config, delete config, test connection
- User management: update roles, delete user
- Authentication: login (success and failure), OIDC login, logout
Audit Record Fields
| Field | Description |
|---|---|
| `timestamp` | When the action occurred (server time, UTC) |
| `username` | Authenticated user who performed the action |
| `action` | Machine-readable action name (e.g., `kill_query`, `delete_index`) |
| `category` | Grouping: `INFRA`, `AUTH`, `USER_MGMT`, `CONFIG` |
| `target` | What was acted on (e.g., PID, index name, user ID) |
| `detail` | JSONB with action-specific context (e.g., query text for killed query, old/new roles for role change) |
| `result` | `SUCCESS` or `FAILURE` |
| `ip_address` | Client IP address from the request |
| `user_agent` | Browser/client identification from request header (SOC2 forensics) |
Backend Implementation
- `AuditService`: central service in `cameleer3-server-core`, injected into all admin controllers and called directly (no AOP/interceptor, consistent with existing controller style)
- Primary method: `log(action, category, target, detail, result)`, which extracts username and IP from `SecurityContextHolder` and `HttpServletRequest`
- Overloaded method for pre-auth contexts: `log(username, action, category, target, detail, result, request)`, used by auth controllers where `SecurityContext` is not yet populated (login success/failure)
- Captures `user_agent` from the `User-Agent` header of the `HttpServletRequest`
- Writes to both the `audit_log` table AND SLF4J (belt and suspenders)
- Async write option not used: audit must be synchronous for compliance guarantees
- Retrofit into existing controllers: add `auditService.log(...)` calls to `OidcConfigAdminController` (save, delete, test), `UserAdminController` (update roles, delete user), and the auth controllers (login, OIDC login, logout, failed login)
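A minimal sketch of the pre-auth overload described above, using only the JDK so that it stays self-contained. Several pieces are assumptions made for illustration: `RequestInfo` stands in for the values read from `HttpServletRequest`, `AuditStore` stands in for the JDBC insert into `audit_log`, `java.util.logging` replaces SLF4J, and the method returns the record only to make the example easy to inspect.

```java
import java.time.Instant;
import java.util.Map;
import java.util.logging.Logger;

enum AuditCategory { INFRA, AUTH, USER_MGMT, CONFIG }
enum AuditResult { SUCCESS, FAILURE }

/** Mirrors the audit record fields listed in this section. */
record AuditRecord(Instant timestamp, String username, String action, AuditCategory category,
                   String target, Map<String, Object> detail, AuditResult result,
                   String ipAddress, String userAgent) {}

/** Hypothetical stand-in for the values read from HttpServletRequest. */
record RequestInfo(String remoteAddr, String userAgent) {}

/** Hypothetical persistence port; the real service would insert into audit_log. */
interface AuditStore { void append(AuditRecord record); }

final class AuditService {
    // java.util.logging is used here only to avoid a dependency; the design calls for SLF4J.
    private static final Logger LOG = Logger.getLogger("audit");
    private final AuditStore store;

    AuditService(AuditStore store) { this.store = store; }

    /** Pre-auth overload: the caller supplies the username explicitly. */
    AuditRecord log(String username, String action, AuditCategory category, String target,
                    Map<String, Object> detail, AuditResult result, RequestInfo request) {
        AuditRecord record = new AuditRecord(Instant.now(), username, action, category, target,
                detail, result, request.remoteAddr(), request.userAgent());
        store.append(record);   // synchronous: a failed write propagates to the caller
        LOG.info(() -> String.format("AUDIT user=%s category=%s action=%s target=%s result=%s",
                username, category, action, target, result));
        return record;
    }
}
```

A failed login would then be recorded with something like `auditService.log("jdoe", "login_failed", AuditCategory.AUTH, null, Map.of("reason", "bad_password"), AuditResult.FAILURE, requestInfo)`. Writing to the store before logging means the database row, which is the compliance source of truth, is never missing when a log line exists.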
5. Backend API
All endpoints live under /api/v1/admin/ and are secured by the existing Spring Security filter chain (ROLE_ADMIN required). Controllers are additionally annotated with @PreAuthorize("hasRole('ADMIN')") for defense-in-depth. Paths in the tables below are shown relative to /api/v1.
Database Endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/admin/database/status` | Version, host, schema, connection state |
| `GET` | `/admin/database/pool` | Active, idle, pending, max wait (HikariCP) |
| `GET` | `/admin/database/tables` | Table names, row counts, data sizes, index sizes |
| `GET` | `/admin/database/queries` | Active queries: pid, duration, state, SQL |
| `POST` | `/admin/database/queries/{pid}/kill` | Terminate query via pg_terminate_backend |
OpenSearch Endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/admin/opensearch/status` | Version, host, cluster health, node count |
| `GET` | `/admin/opensearch/pipeline` | Queue depth, failed count, debounce, rate, last indexed |
| `GET` | `/admin/opensearch/indices` | Paginated, sortable, filterable index list |
| `DELETE` | `/admin/opensearch/indices/{name}` | Delete specific index (with audit log) |
| `GET` | `/admin/opensearch/performance` | Cache rates, latencies, JVM heap |
OpenSearch client: All admin OpenSearch endpoints reuse the existing OpenSearchClient bean configured in OpenSearchConfig.java. No separate client or credentials needed — the admin endpoints call cluster-level APIs (_cluster/health, _cat/indices, _nodes/stats) using the same connection.
Indices Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| `search` | string | - | Filter by index name pattern |
| `health` | enum | `ALL` | Filter by health: ALL, GREEN, YELLOW, RED |
| `sort` | string | `name` | Sort field: name, docs, size, health |
| `order` | enum | `asc` | Sort direction: asc, desc |
| `page` | int | `0` | Page number (zero-based) |
| `size` | int | `10` | Page size |
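How these parameters might combine on the server can be sketched with plain collections. The types below (`IndexInfo`, `IndexPage`, `IndexListing`) are hypothetical, the name filter is assumed to be a case-insensitive substring match, and `health == null` is used to represent `ALL`; the actual controller may resolve these differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

enum Health { GREEN, YELLOW, RED }

/** Hypothetical row type for one entry of the indices listing. */
record IndexInfo(String name, long docs, long sizeBytes, Health health) {}

/** Hypothetical page envelope; totalItems counts matches before paging. */
record IndexPage(List<IndexInfo> items, int page, int size, long totalItems) {}

final class IndexListing {
    private IndexListing() {}

    static IndexPage query(List<IndexInfo> all, String search, Health health,
                           String sort, boolean descending, int page, int size) {
        // Unknown sort fields fall back to the documented default (name).
        Comparator<IndexInfo> cmp = switch (sort) {
            case "docs" -> Comparator.comparingLong(IndexInfo::docs);
            case "size" -> Comparator.comparingLong(IndexInfo::sizeBytes);
            case "health" -> Comparator.comparing(IndexInfo::health);
            default -> Comparator.comparing(IndexInfo::name);
        };
        if (descending) {
            cmp = cmp.reversed();
        }
        String needle = search == null ? "" : search.toLowerCase(Locale.ROOT);
        List<IndexInfo> matches = all.stream()
                .filter(i -> i.name().toLowerCase(Locale.ROOT).contains(needle))
                .filter(i -> health == null || i.health() == health)
                .sorted(cmp)
                .toList();
        // Clamp the page size to the documented maximum of 100.
        int pageSize = Math.max(1, Math.min(size, 100));
        int pageNo = Math.max(page, 0);
        int from = (int) Math.min((long) pageNo * pageSize, matches.size());
        int to = Math.min(from + pageSize, matches.size());
        return new IndexPage(matches.subList(from, to), pageNo, pageSize, matches.size());
    }
}
```

Filtering and sorting happen before slicing so that `totalItems` reflects the filtered set, which is what the pager in the UI needs.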
Audit Log Endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/admin/audit` | Paginated, filterable audit log entries |
Audit Log Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| `username` | string | - | Filter by username |
| `category` | enum | - | Filter by category: INFRA, AUTH, USER_MGMT, CONFIG |
| `search` | string | - | Free text search across action and target columns (not JSONB detail) |
| `from` | ISO date | 7 days ago | Start of date range |
| `to` | ISO date | now | End of date range |
| `sort` | string | `timestamp` | Sort field |
| `order` | enum | `desc` | Sort direction: asc, desc |
| `page` | int | `0` | Page number (zero-based) |
| `size` | int | `25` | Page size |
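One way to translate these parameters into a parameterized query against `audit_log` is sketched below. The `AuditQuery`, `SqlStatement`, and `AuditQueryBuilder` types are hypothetical, the sketch assumes `from` and `to` have already been defaulted to non-null values, and it supports only the default `timestamp` sort. User input is always bound as a parameter, and LIKE wildcards in the search term are escaped so they match literally.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the already-parsed request parameters. */
record AuditQuery(String username, String category, String search,
                  Instant from, Instant to, boolean ascending, int page, int size) {}

/** SQL text plus positional parameters, ready for a PreparedStatement. */
record SqlStatement(String sql, List<Object> params) {}

final class AuditQueryBuilder {
    private AuditQueryBuilder() {}

    static SqlStatement build(AuditQuery q) {
        StringBuilder sql = new StringBuilder(
                "SELECT * FROM audit_log WHERE timestamp >= ? AND timestamp <= ?");
        List<Object> params = new ArrayList<>(List.of(q.from(), q.to()));
        if (q.username() != null) {
            sql.append(" AND username = ?");
            params.add(q.username());
        }
        if (q.category() != null) {
            sql.append(" AND category = ?");
            params.add(q.category());
        }
        if (q.search() != null && !q.search().isBlank()) {
            // Free text covers action and target only; the JSONB detail column is excluded.
            sql.append(" AND (action ILIKE ? OR target ILIKE ?)");
            String pattern = "%" + escapeLike(q.search()) + "%";
            params.add(pattern);
            params.add(pattern);
        }
        int size = Math.max(1, Math.min(q.size(), 100));   // documented maximum page size
        sql.append(" ORDER BY timestamp ").append(q.ascending() ? "ASC" : "DESC")
           .append(" LIMIT ? OFFSET ?");
        params.add(size);
        params.add((long) Math.max(q.page(), 0) * size);
        return new SqlStatement(sql.toString(), params);
    }

    /** Escapes backslash, % and _ so they are matched literally by LIKE/ILIKE. */
    static String escapeLike(String s) {
        return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
    }
}
```

The sort direction is appended from a boolean rather than from the raw `order` string, so no request text ever reaches the SQL itself.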
Thresholds Endpoints
| Method | Path | Description |
|---|---|---|
| `GET` | `/admin/thresholds` | All configured thresholds |
| `PUT` | `/admin/thresholds` | Save thresholds (database + OpenSearch in one payload) |
Thresholds Payload
{
  "database": {
    "connectionPoolWarning": 80,
    "connectionPoolCritical": 95,
    "queryDurationWarning": 1.0,
    "queryDurationCritical": 10.0
  },
  "opensearch": {
    "clusterHealthWarning": "YELLOW",
    "clusterHealthCritical": "RED",
    "queueDepthWarning": 100,
    "queueDepthCritical": 500,
    "jvmHeapWarning": 75,
    "jvmHeapCritical": 90,
    "failedDocsWarning": 1,
    "failedDocsCritical": 10
  }
}
Thresholds Validation Rules
- Warning must be <= critical for all numeric threshold pairs
- Percentage values must be 0–100
- Duration values must be > 0
- `clusterHealthWarning` must be less severe than `clusterHealthCritical` (GREEN < YELLOW < RED)
- Backend returns 400 Bad Request with field-level error messages on validation failure
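The rules above can be sketched as a validator that returns a map of field path to message, which a controller could place in the body of the 400 response. The record and class names are hypothetical, as are the dotted field paths and message texts; only the rules themselves come from this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Declared in severity order so that compareTo gives GREEN < YELLOW < RED. */
enum ClusterHealth { GREEN, YELLOW, RED }

record DatabaseThresholds(int connectionPoolWarning, int connectionPoolCritical,
                          double queryDurationWarning, double queryDurationCritical) {}

record OpenSearchThresholds(ClusterHealth clusterHealthWarning, ClusterHealth clusterHealthCritical,
                            int queueDepthWarning, int queueDepthCritical,
                            int jvmHeapWarning, int jvmHeapCritical,
                            int failedDocsWarning, int failedDocsCritical) {}

final class ThresholdValidator {
    private ThresholdValidator() {}

    /** Returns field path -> message; an empty map means the payload is valid. */
    static Map<String, String> validate(DatabaseThresholds db, OpenSearchThresholds os) {
        Map<String, String> errors = new LinkedHashMap<>();
        pair(errors, "database.connectionPool",
                db.connectionPoolWarning(), db.connectionPoolCritical(), true);
        pair(errors, "database.queryDuration",
                db.queryDurationWarning(), db.queryDurationCritical(), false);
        pair(errors, "opensearch.queueDepth",
                os.queueDepthWarning(), os.queueDepthCritical(), false);
        pair(errors, "opensearch.jvmHeap", os.jvmHeapWarning(), os.jvmHeapCritical(), true);
        pair(errors, "opensearch.failedDocs",
                os.failedDocsWarning(), os.failedDocsCritical(), false);
        if (db.queryDurationWarning() <= 0) {
            errors.putIfAbsent("database.queryDurationWarning", "must be > 0");
        }
        if (db.queryDurationCritical() <= 0) {
            errors.putIfAbsent("database.queryDurationCritical", "must be > 0");
        }
        if (os.clusterHealthWarning().compareTo(os.clusterHealthCritical()) >= 0) {
            errors.putIfAbsent("opensearch.clusterHealthWarning",
                    "must be less severe than clusterHealthCritical");
        }
        return errors;
    }

    /** Checks the 0-100 range for percentages and warning <= critical for every pair. */
    private static void pair(Map<String, String> errors, String prefix,
                             double warning, double critical, boolean percent) {
        if (percent && (warning < 0 || warning > 100)) {
            errors.putIfAbsent(prefix + "Warning", "must be between 0 and 100");
        }
        if (percent && (critical < 0 || critical > 100)) {
            errors.putIfAbsent(prefix + "Critical", "must be between 0 and 100");
        }
        if (warning > critical) {
            errors.putIfAbsent(prefix + "Warning", "must be <= critical");
        }
    }
}
```

The example payload shown earlier passes this validator with no errors. Collecting all failures in one pass, rather than stopping at the first, lets the Thresholds form highlight every invalid field at once.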
Pagination Limits
- All paginated endpoints enforce a maximum page size of 100 (`min(requested, 100)`)
- Applies to: indices listing, audit log
Error Responses
All new endpoints return errors in a consistent shape:
{
  "status": 404,
  "error": "Not Found",
  "message": "No active query with PID 12345"
}
Specific error cases:
- `POST /admin/database/queries/{pid}/kill`: 404 if PID not found, 500 if `pg_terminate_backend` fails
- `DELETE /admin/opensearch/indices/{name}`: 404 if index not found, 502 if OpenSearch unreachable
- `GET /admin/database/status`: returns 200 with `"connected": false` if database is unreachable (not 503), so the frontend can render a red status badge rather than an error state
- `GET /admin/opensearch/status`: returns 200 with `"clusterHealth": "UNREACHABLE"` if OpenSearch is down
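The "200 with `connected: false`" behavior of the status endpoints amounts to catching the probe failure and turning it into data rather than an error response. A minimal sketch, assuming a hypothetical `DatabaseStatus` record and a `Supplier` standing in for the real version query:

```java
import java.util.function.Supplier;

/** Hypothetical response body; the real endpoint also reports host and schema. */
record DatabaseStatus(boolean connected, String version) {}

final class DatabaseStatusProbe {
    private DatabaseStatusProbe() {}

    /**
     * Runs the version query and reports an unreachable database as
     * connected=false, so the controller can still answer with HTTP 200.
     */
    static DatabaseStatus probe(Supplier<String> versionQuery) {
        try {
            return new DatabaseStatus(true, versionQuery.get());
        } catch (RuntimeException e) {
            return new DatabaseStatus(false, null);
        }
    }
}
```

The OpenSearch status endpoint could follow the same shape, substituting `"UNREACHABLE"` for the cluster health value when the call fails.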
6. Security
Enforcement Layers
- Spring Security filter chain: `/api/v1/admin/**` requires `ROLE_ADMIN` (existing configuration)
- Controller annotation: `@PreAuthorize("hasRole('ADMIN')")` on each controller class (defense-in-depth). This is a new convention; existing controllers (`OidcConfigAdminController`, `UserAdminController`) must be retrofitted with this annotation as part of Phase 1.
- `@EnableMethodSecurity`: must be added to `SecurityConfig.java` to activate `@PreAuthorize` processing (prerequisite for the controller annotation layer)
- UI role check: sidebar admin section hidden for non-admin users (cosmetic only, not a security boundary)
Audit Logging
All admin actions are persisted to the audit_log database table (see Section 4 and Section 7 — Data Storage) AND logged via SLF4J at INFO level. The database record is the source of truth for compliance; the SLF4J log provides operational visibility.
The AuditService is injected into all admin controllers (infrastructure, OIDC, user management) and the authentication flow. See Section 4 (Audit Log Page) for full details on what is logged and the record structure.
Future: OPERATOR Role (Phase 2+)
Design anticipates a read-only OPERATOR role:
- Can view all monitoring data
- Cannot perform destructive actions (kill, delete)
- Implementation: method-level `@PreAuthorize` on action endpoints, UI conditionally disables buttons based on role
7. Data Storage
Flyway Migration V9: Admin Thresholds
CREATE TABLE admin_thresholds (
    id         INTEGER PRIMARY KEY DEFAULT 1,
    config     JSONB NOT NULL DEFAULT '{}',
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_by TEXT NOT NULL,
    CONSTRAINT single_row CHECK (id = 1)
);
- Single-row table using a `CHECK (id = 1)` constraint, which is stricter than the `oidc_config` pattern (a text PK defaulting to `'default'` without a constraint). The CHECK approach is preferred going forward as it explicitly prevents multiple rows.
- JSONB column for flexibility: adding new thresholds doesn't require schema changes
- Tracks who last updated and when
Flyway Migration V10: Audit Log
CREATE TABLE audit_log (
    id         BIGSERIAL PRIMARY KEY,
    timestamp  TIMESTAMPTZ NOT NULL DEFAULT now(),
    username   TEXT NOT NULL,
    action     TEXT NOT NULL,
    category   TEXT NOT NULL,
    target     TEXT,
    detail     JSONB,
    result     TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT
);
CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp DESC);
CREATE INDEX idx_audit_log_username ON audit_log (username);
CREATE INDEX idx_audit_log_category ON audit_log (category);
CREATE INDEX idx_audit_log_action ON audit_log (action);
CREATE INDEX idx_audit_log_target ON audit_log (target);
- Separate migration from thresholds so they can be developed and tested independently
- Append-only table — no UPDATE or DELETE exposed via API
- Indexed on timestamp (primary query axis), username, category, action, and target for filtered views. Free-text search runs `ILIKE` against the `action` and `target` text columns; note that plain B-tree indexes do not accelerate `ILIKE` substring patterns, so a trigram (`pg_trgm`) index is an option if search latency becomes a concern.
- JSONB `detail` column holds action-specific context without schema changes (not searched via free text; use row expansion for detail inspection)
- `user_agent` field captures client identification for forensic analysis (SOC2)
- No foreign key to `users` table: username is denormalized so audit records survive user deletion
- Retention: unbounded in Phase 1. Phase 2+ should add a retention/archival strategy (e.g., TimescaleDB hypertable with retention policy, or periodic archive to cold storage). Typical SOC2 retention is 7 years.
8. Frontend Architecture
New Files
| File | Purpose |
|---|---|
| `pages/admin/DatabaseAdminPage.tsx` | Database monitoring and management |
| `pages/admin/OpenSearchAdminPage.tsx` | OpenSearch monitoring and management |
| `pages/admin/AuditLogPage.tsx` | Audit log viewer |
| `api/queries/admin/database.ts` | React Query hooks for database endpoints |
| `api/queries/admin/opensearch.ts` | React Query hooks for OpenSearch endpoints |
| `api/queries/admin/thresholds.ts` | React Query hooks for threshold endpoints |
| `api/queries/admin/audit.ts` | React Query hooks for audit log endpoint |
| `components/admin/StatusBadge.tsx` | Color-coded status indicator (green/yellow/red) |
| `components/admin/RefreshableCard.tsx` | Card with manual refresh button + optional auto-refresh |
| `components/admin/ConfirmDeleteDialog.tsx` | Confirmation dialog requiring name input for destructive actions |
Modified Files
| File | Change |
|---|---|
| `components/layout/AppSidebar.tsx` | Refactor admin section to collapsible sub-menu with multiple items |
| `router.tsx` | Add routes for /admin/database, /admin/opensearch, /admin/audit, redirect /admin |
| `SpaForwardController.java` | Existing /admin/{path:[^\\.]*} pattern already covers single-segment routes; no change needed unless deeper routes are added |
Auto-Refresh Strategy
- React Query `refetchInterval: 15000` on lightweight endpoints (pool, queries, pipeline, performance)
- Heavy endpoints (tables, indices) use `refetchInterval: false` (manual refresh only)
- Refresh button calls `queryClient.invalidateQueries` for all queries on that page
9. Implementation Phases
Phase 1 (Current Scope)
- Admin sidebar restructuring
- Database page — all monitoring sections + kill query
- OpenSearch page — all monitoring sections + delete index
- Threshold configuration (both pages)
- Audit log — database-backed audit trail + admin viewer page
- Retrofit audit logging into existing admin controllers (OIDC, user management) and auth flow
- Backend endpoints with RBAC enforcement
- Flyway migrations V9 (thresholds) and V10 (audit_log)
- `SearchIndexerStats` interface and `SearchIndexer` stats instrumentation
- `@EnableMethodSecurity` + `@PreAuthorize` retrofit on existing admin controllers
Phase 2
- Database maintenance actions (VACUUM ANALYZE, Reindex)
- OpenSearch operations (Force Reindex All, Flush)
- Bulk index operations (checkbox selection)
- Audit log CSV/JSON export for auditors
- Audit log retention/archival strategy (7-year SOC2 requirement)
- OPERATOR role with view-only permissions
Phase 3
- TimescaleDB-aware metrics (hypertable chunks, continuous aggregate status, compression)
- Historical trend charts for key metrics
- Alerting/notification system