docs: address spec review feedback for infrastructure overview

- Document SearchIndexerStats interface and required SearchIndexer changes
- Add @EnableMethodSecurity prerequisite and retrofit of existing controllers
- Limit audit log free-text search to indexed text columns (not JSONB)
- Split migrations into V9 (thresholds) and V10 (audit_log)
- Add user_agent field to audit records for SOC2 forensics
- Add thresholds validation rules, pagination limits, error response shapes
- Clarify SPA forwarding, single-row pattern, OpenSearch client reuse
- Add audit log retention note for Phase 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hsiegeln
2026-03-17 15:01:53 +01:00
parent 2bcbff3ee6
commit a634bf9f9d


@@ -45,7 +45,7 @@ The gear icon expands/collapses an admin sub-menu in the sidebar:
OpenSearch → /admin/opensearch
Audit Log → /admin/audit
OIDC → /admin/oidc
Users → (future)
Users → /admin/users (backend exists, UI page is future scope)
```
- Admin section visible only to users with `ADMIN` role
@@ -121,9 +121,17 @@ The gear icon expands/collapses an admin sub-menu in the sidebar:
- Visual bar showing queue depth vs. max queue size
- Metrics: queue depth, failed document count, debounce interval, indexing rate (docs/s), time since last indexed
- Status badge based on configurable thresholds
- Source: `SearchIndexer` internal stats (exposed via `SearchIndexerStats` interface)
- Source: `SearchIndexer` internal stats, exposed via a new `SearchIndexerStats` interface in `cameleer3-server-core`
- **Auto-refreshes every 15 seconds**
**Implementation note:** `SearchIndexer` currently has no stats API. This requires adding:
- `AtomicLong failedCount` — incremented on indexing errors
- `AtomicLong indexedCount` — incremented on successful index operations
- `volatile Instant lastIndexedAt` — updated after each successful batch
- Rate calculation via a sliding window counter (e.g., count delta over last 15s interval)
- `SearchIndexerStats` interface in core module with getters for all above, implemented by `SearchIndexer`
- `queueDepth` and `maxQueueSize` already derivable from the internal `BlockingQueue`
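
The instrumentation listed above could look roughly like the following sketch. It uses only the JDK; `InstrumentedIndexer`, the getter names, and the `sampleRate` hook are assumptions made for illustration and are not the actual `SearchIndexer` API. The rate is computed as the count delta over one sampling interval, as the note suggests.

```java
import java.time.Instant;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Read-only stats view; getter names are illustrative, not taken from the codebase. */
interface SearchIndexerStats {
    int getQueueDepth();
    int getMaxQueueSize();
    long getFailedCount();
    long getIndexedCount();
    Instant getLastIndexedAt();   // null until the first successful batch
    double getIndexingRate();     // docs/s over the most recent sampling interval
}

/** Hypothetical stand-in for the counters SearchIndexer would maintain. */
class InstrumentedIndexer implements SearchIndexerStats {
    private final BlockingQueue<String> queue;
    private final int maxQueueSize;
    private final AtomicLong failedCount = new AtomicLong();
    private final AtomicLong indexedCount = new AtomicLong();
    private volatile Instant lastIndexedAt;
    private volatile double indexingRate;
    private long lastSampleCount;   // only touched by sampleRate()

    InstrumentedIndexer(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
    }

    /** Returns false when the queue is full. */
    boolean enqueue(String docId) { return queue.offer(docId); }

    /** Called after each successful batch. */
    void onBatchIndexed(int docs) {
        indexedCount.addAndGet(docs);
        lastIndexedAt = Instant.now();
    }

    /** Called on each indexing error. */
    void onIndexingError() { failedCount.incrementAndGet(); }

    /** Intended to be called by one scheduler thread, once per interval (e.g. 15 s). */
    void sampleRate(double intervalSeconds) {
        long current = indexedCount.get();
        indexingRate = (current - lastSampleCount) / intervalSeconds;
        lastSampleCount = current;
    }

    @Override public int getQueueDepth() { return queue.size(); }
    @Override public int getMaxQueueSize() { return maxQueueSize; }
    @Override public long getFailedCount() { return failedCount.get(); }
    @Override public long getIndexedCount() { return indexedCount.get(); }
    @Override public Instant getLastIndexedAt() { return lastIndexedAt; }
    @Override public double getIndexingRate() { return indexingRate; }
}
```

Keeping the interface in the core module means the admin controller would depend only on `SearchIndexerStats`, not on the indexer implementation.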
### Indices Section
- **Search/filter** by index name pattern (text input)
@@ -193,7 +201,7 @@ Database-backed audit trail of all administrative actions across the system. Pro
```
- **Filterable** by user, category, date range
- **Searchable** by free text (matches action, target, detail)
- **Searchable** by free text (matches `action` and `target` columns only — JSONB `detail` excluded from text search for performance)
- **Sortable** by timestamp (default: newest first)
- **Pagination** — 25 per page, server-side
- **Detail expansion** — click a row to expand and show full `detail` JSON
@@ -230,14 +238,17 @@ Every admin action across the system, not just infrastructure pages:
| `detail` | JSONB with action-specific context (e.g., query text for killed query, old/new roles for role change) |
| `result` | `SUCCESS` or `FAILURE` |
| `ip_address` | Client IP address from the request |
| `user_agent` | Browser/client identification from request header (SOC2 forensics) |
### Backend Implementation
- `AuditService` — central service injected into all admin controllers
- Single method: `log(action, category, target, detail, result)`
- Extracts username and IP from `SecurityContextHolder` and `HttpServletRequest`
- `AuditService` — central service in `cameleer3-server-core`, injected into all admin controllers via direct method calls (no AOP/interceptor — consistent with existing controller style)
- Primary method: `log(action, category, target, detail, result)` — extracts username and IP from `SecurityContextHolder` and `HttpServletRequest`
- Overloaded method for pre-auth contexts: `log(username, action, category, target, detail, result, request)` — used by auth controllers where `SecurityContext` is not yet populated (login success/failure)
- Captures `user_agent` from `HttpServletRequest` header
- Writes to both the `audit_log` table AND SLF4J (belt and suspenders)
- Async write option not used — audit must be synchronous for compliance guarantees
- Retrofit into existing controllers: add `auditService.log(...)` calls to `OidcConfigAdminController` (save, delete, test) and `UserAdminController` (update roles, delete user), and auth controllers (login, OIDC login, logout, failed login)
---
@@ -265,6 +276,8 @@ All endpoints under `/api/v1/admin/` — secured by existing Spring Security fil
| `DELETE` | `/admin/opensearch/indices/{name}` | Delete specific index (with audit log) |
| `GET` | `/admin/opensearch/performance` | Cache rates, latencies, JVM heap |
**OpenSearch client:** All admin OpenSearch endpoints reuse the existing `OpenSearchClient` bean configured in `OpenSearchConfig.java`. No separate client or credentials needed — the admin endpoints call cluster-level APIs (`_cluster/health`, `_cat/indices`, `_nodes/stats`) using the same connection.
#### Indices Query Parameters
| Param | Type | Default | Description |
@@ -288,7 +301,7 @@ All endpoints under `/api/v1/admin/` — secured by existing Spring Security fil
|-------|------|---------|-------------|
| `username` | string | — | Filter by username |
| `category` | enum | — | Filter by category: INFRA, AUTH, USER_MGMT, CONFIG |
| `search` | string | — | Free text search across action, target, detail |
| `search` | string | — | Free text search across `action` and `target` columns (not JSONB `detail`) |
| `from` | ISO date | 7 days ago | Start of date range |
| `to` | ISO date | now | End of date range |
| `sort` | string | `timestamp` | Sort field |
@@ -326,6 +339,37 @@ All endpoints under `/api/v1/admin/` — secured by existing Spring Security fil
}
```
### Thresholds Validation Rules
- Warning must be <= critical for all numeric threshold pairs
- Percentage values must be in the range 0-100
- Duration values must be > 0
- `clusterHealthWarning` must be less severe than `clusterHealthCritical` (GREEN < YELLOW < RED)
- Backend returns 400 Bad Request with field-level error messages on validation failure
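
As a rough illustration of the rules above, the checks could be collected into a small validator that accumulates field-level messages. This is a sketch under assumptions: `ThresholdsValidator` and the field names passed to it are hypothetical, and only `clusterHealthWarning` / `clusterHealthCritical` come from the spec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Declared in increasing severity so compareTo() reflects GREEN < YELLOW < RED. */
enum ClusterHealth { GREEN, YELLOW, RED }

/** Hypothetical validator that collects one message per offending field. */
final class ThresholdsValidator {
    private final Map<String, String> errors = new LinkedHashMap<>();

    /** Warning must be <= critical for a numeric threshold pair. */
    ThresholdsValidator pair(String field, double warning, double critical) {
        if (warning > critical) errors.put(field, "warning must be <= critical");
        return this;
    }

    /** Percentage values must lie in 0..100 inclusive. */
    ThresholdsValidator percentage(String field, double value) {
        if (value < 0 || value > 100) errors.put(field, "must be between 0 and 100");
        return this;
    }

    /** Duration values must be strictly positive. */
    ThresholdsValidator duration(String field, long millis) {
        if (millis <= 0) errors.put(field, "must be > 0");
        return this;
    }

    /** The warning level must be strictly less severe than the critical level. */
    ThresholdsValidator health(ClusterHealth warning, ClusterHealth critical) {
        if (warning.compareTo(critical) >= 0) {
            errors.put("clusterHealthWarning", "must be less severe than clusterHealthCritical");
        }
        return this;
    }

    /** Empty map means valid; otherwise the caller would respond 400 with these entries. */
    Map<String, String> errors() { return errors; }
}
```

A controller could chain the checks for each submitted field and return 400 Bad Request with the resulting map whenever it is non-empty.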
### Pagination Limits
- All paginated endpoints enforce a maximum page size of 100 (`min(requested, 100)`)
- Applies to: indices listing, audit log
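
The clamp itself is a one-liner. The sketch below wraps it in a hypothetical `PageLimits` helper; falling back to 25 for missing or non-positive sizes is an assumption borrowed from the audit log default, not something the spec defines for every endpoint.

```java
/** Hypothetical helper shared by the paginated admin endpoints. */
final class PageLimits {
    static final int MAX_PAGE_SIZE = 100;
    static final int DEFAULT_PAGE_SIZE = 25;   // assumed fallback, see lead-in

    private PageLimits() {}

    /** Returns min(requested, 100), or the default when the input is absent or < 1. */
    static int clamp(Integer requested) {
        if (requested == null || requested < 1) {
            return DEFAULT_PAGE_SIZE;
        }
        return Math.min(requested, MAX_PAGE_SIZE);
    }
}
```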
### Error Responses
All new endpoints return errors in a consistent shape:
```json
{
"status": 404,
"error": "Not Found",
"message": "No active query with PID 12345"
}
```
Specific error cases:
- `POST /admin/database/queries/{pid}/kill` — 404 if PID not found, 500 if `pg_terminate_backend` fails
- `DELETE /admin/opensearch/indices/{name}` — 404 if index not found, 502 if OpenSearch unreachable
- `GET /admin/database/status` — returns 200 with `"connected": false` if database is unreachable (not 503), so the frontend can render a red status badge rather than an error state
- `GET /admin/opensearch/status` — returns 200 with `"clusterHealth": "UNREACHABLE"` if OpenSearch is down
---
## 6. Security
@@ -333,8 +377,9 @@ All endpoints under `/api/v1/admin/` — secured by existing Spring Security fil
### Enforcement Layers
1. **Spring Security filter chain** — `/api/v1/admin/**` requires `ROLE_ADMIN` (existing configuration)
2. **Controller annotation** — `@PreAuthorize("hasRole('ADMIN')")` on each controller class (defense-in-depth)
3. **UI role check** — sidebar admin section hidden for non-admin users (cosmetic only, not a security boundary)
2. **Controller annotation** — `@PreAuthorize("hasRole('ADMIN')")` on each controller class (defense-in-depth). This is a new convention — existing controllers (`OidcConfigAdminController`, `UserAdminController`) must be retrofitted with this annotation as part of Phase 1.
3. **`@EnableMethodSecurity`** — must be added to `SecurityConfig.java` to activate `@PreAuthorize` processing (prerequisite for layer 2)
4. **UI role check** — sidebar admin section hidden for non-admin users (cosmetic only, not a security boundary)
### Audit Logging
@@ -353,7 +398,7 @@ Design anticipates a read-only `OPERATOR` role:
## 7. Data Storage
### New Flyway Migration: V9
### Flyway Migration V9: Admin Thresholds
```sql
CREATE TABLE admin_thresholds (
@@ -363,7 +408,15 @@ CREATE TABLE admin_thresholds (
updated_by TEXT NOT NULL,
CONSTRAINT single_row CHECK (id = 1)
);
```
- Single-row table using `CHECK (id = 1)` constraint — stricter than the `oidc_config` pattern (which uses a text PK defaulting to `'default'` without a constraint). The CHECK approach is preferred going forward as it explicitly prevents multiple rows.
- JSON column for flexibility — adding new thresholds doesn't require schema changes
- Tracks who last updated and when
### Flyway Migration V10: Audit Log
```sql
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
@@ -373,24 +426,24 @@ CREATE TABLE audit_log (
target TEXT,
detail JSONB,
result TEXT NOT NULL,
ip_address TEXT
ip_address TEXT,
user_agent TEXT
);
CREATE INDEX idx_audit_log_timestamp ON audit_log (timestamp DESC);
CREATE INDEX idx_audit_log_username ON audit_log (username);
CREATE INDEX idx_audit_log_category ON audit_log (category);
CREATE INDEX idx_audit_log_action ON audit_log (action);
CREATE INDEX idx_audit_log_target ON audit_log (target);
```
**admin_thresholds:**
- Single-row table (same pattern as `oidc_config`)
- JSON column for flexibility — adding new thresholds doesn't require schema changes
- Tracks who last updated and when
**audit_log:**
- Separate migration from thresholds so they can be developed and tested independently
- Append-only table — no UPDATE or DELETE exposed via API
- Indexed on timestamp (primary query axis), username, and category for filtered views
- JSONB `detail` column holds action-specific context without schema changes
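- Indexed on timestamp (primary query axis), username, category, action, and target for filtered views and free-text search via `ILIKE` on text columns. Note that plain B-tree indexes speed up equality filters but are not used for `ILIKE '%term%'` patterns; if free-text search becomes slow, a `pg_trgm` GIN index on `action` and `target` is the usual remedy.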
- JSONB `detail` column holds action-specific context without schema changes (not searched via free text — use row expansion for detail inspection)
- `user_agent` field captures client identification for forensic analysis (SOC2)
- No foreign key to `users` table — username is denormalized so audit records survive user deletion
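- **Retention:** unbounded in Phase 1. Phase 2+ should add a retention/archival strategy (e.g., TimescaleDB hypertable with retention policy, or periodic archive to cold storage). The working target is 7 years; SOC2 itself does not fix a retention period, so the exact figure should be confirmed against the organization's compliance policy.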
---
@@ -417,7 +470,7 @@ CREATE INDEX idx_audit_log_category ON audit_log (category);
|------|--------|
| `components/layout/AppSidebar.tsx` | Refactor admin section to collapsible sub-menu with multiple items |
| `router.tsx` | Add routes for `/admin/database`, `/admin/opensearch`, `/admin/audit`, redirect `/admin` |
| `SpaForwardController.java` | Ensure `/admin/*` forwarding covers new routes |
| `SpaForwardController.java` | Existing `/admin/{path:[^\\.]*}` pattern already covers single-segment routes — no change needed unless deeper routes are added |
### Auto-Refresh Strategy
@@ -438,7 +491,9 @@ CREATE INDEX idx_audit_log_category ON audit_log (category);
5. Audit log — database-backed audit trail + admin viewer page
6. Retrofit audit logging into existing admin controllers (OIDC, user management) and auth flow
7. Backend endpoints with RBAC enforcement
8. Flyway migration V9 for thresholds + audit_log tables
8. Flyway migrations V9 (thresholds) and V10 (audit_log)
9. `SearchIndexerStats` interface and `SearchIndexer` stats instrumentation
10. `@EnableMethodSecurity` + `@PreAuthorize` retrofit on existing admin controllers
### Phase 2
@@ -446,6 +501,7 @@ CREATE INDEX idx_audit_log_category ON audit_log (category);
- OpenSearch operations (Force Reindex All, Flush)
- Bulk index operations (checkbox selection)
- Audit log CSV/JSON export for auditors
- Audit log retention/archival strategy (7-year SOC2 requirement)
- OPERATOR role with view-only permissions
### Phase 3