Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control
planes can build the server-health dashboard without direct ClickHouse
access. One generic /query endpoint covers every panel in the
server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag,
filter-by-tag, counter-delta mode with per-server_instance_id rotation
handling, and a derived 'mean' statistic for timers. Regex-validated
identifiers, parameterised literals, 31-day range cap, 500-series response
cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated:
all 17 suggested panels now expressed as single-endpoint queries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
211 lines
28 KiB
Markdown
211 lines
28 KiB
Markdown
---
|
|
paths:
|
|
- "cameleer-server-app/**"
|
|
---
|
|
|
|
# App Module Key Classes
|
|
|
|
`cameleer-server-app/src/main/java/com/cameleer/server/app/`
|
|
|
|
## URL taxonomy
|
|
|
|
User-facing data and config endpoints live under `/api/v1/environments/{envSlug}/...`. Env is a path segment, never a query param. The `envSlug` is resolved to an `Environment` bean via the `@EnvPath` argument resolver (`web/EnvironmentPathResolver.java`) — 404 on unknown slug.
|
|
|
|
**Slugs are immutable after creation** for both environments and apps. Slug regex: `^[a-z0-9][a-z0-9-]{0,63}$`. Validated in `EnvironmentService.create` and `AppService.createApp`. Update endpoints (`PUT`) do not accept a slug field; Jackson drops it as an unknown property.
|
|
|
|
### Flat-endpoint allow-list
|
|
|
|
These paths intentionally stay flat (no `/environments/{envSlug}` prefix). Every new endpoint should be env-scoped unless it appears here and the reason is documented.
|
|
|
|
| Path prefix | Why flat |
|
|
|---|---|
|
|
| `/api/v1/data/**` | Agent ingestion. JWT `env` claim is authoritative; URL-embedded env would invite spoofing. |
|
|
| `/api/v1/agents/register`, `/refresh`, `/{id}/heartbeat`, `/{id}/events` (SSE), `/{id}/deregister`, `/{id}/commands`, `/{id}/commands/{id}/ack`, `/{id}/replay` | Agent self-service; JWT-bound. |
|
|
| `/api/v1/agents/commands`, `/api/v1/agents/groups/{group}/commands` | Operator fan-out; target scope is explicit in query params. |
|
|
| `/api/v1/agents/config` | Agent-authoritative config read; JWT → registry → (app, env). |
|
|
| `/api/v1/admin/{users,roles,groups,oidc,license,audit,rbac/stats,claim-mappings,thresholds,sensitive-keys,usage,clickhouse,database,environments,outbound-connections}` | Truly cross-env admin. Env CRUD URLs use `{envSlug}`, not UUID. |
|
|
| `/api/v1/catalog`, `/api/v1/catalog/{applicationId}` | Cross-env discovery is the purpose. Env is an optional filter via `?environment=`. |
|
|
| `/api/v1/executions/{execId}`, `/processors/**` | Exchange IDs are globally unique; permalinks. |
|
|
| `/api/v1/diagrams/{contentHash}/render`, `POST /api/v1/diagrams/render` | Content-addressed or stateless. |
|
|
| `/api/v1/alerts/notifications/{id}/retry` | Notification IDs are globally unique; no env routing needed. |
|
|
| `/api/v1/auth/**` | Pre-auth; no env context exists. |
|
|
| `/api/v1/health`, `/prometheus`, `/api-docs/**`, `/swagger-ui/**` | Server metadata. |
|
|
|
|
## Tenant isolation invariant
|
|
|
|
ClickHouse is shared across tenants. Every ClickHouse query must filter by `tenant_id` (from `CAMELEER_SERVER_TENANT_ID` env var, resolved via `TenantContext`/config) in addition to `environment`. New controllers added under `/environments/{envSlug}/...` must preserve this — the env filter from the path does not replace the tenant filter.
|
|
|
|
## User ID conventions
|
|
|
|
`users.user_id` stores the **bare** identifier:
|
|
- Local users: `<username>` (e.g. `admin`, `alice`)
|
|
- OIDC users: `oidc:<sub>` (e.g. `oidc:c7a93b…`)
|
|
|
|
JWT subjects carry a `user:` namespace prefix (`user:admin`, `user:oidc:<sub>`) so `JwtAuthenticationFilter` can distinguish user tokens from agent tokens. All three write paths upsert the **bare** form:
|
|
|
|
- `UiAuthController.login` — computes `userId = request.username()`, signs with `subject = "user:" + userId`.
|
|
- `OidcAuthController.callback` — `userId = "oidc:" + oidcUser.subject()`, signs with `subject = "user:" + userId`.
|
|
- `UserAdminController.createUser` — `userId = request.username()`.
|
|
|
|
Env-scoped read-path controllers (`AlertController`, `AlertRuleController`, `AlertSilenceController`, `OutboundConnectionAdminController`) strip `"user:"` from `SecurityContextHolder.authentication.name` before using it as an FK. All FKs to `users(user_id)` (e.g. `alert_rules.created_by`, `outbound_connections.created_by`, `alert_reads.user_id`, `user_roles.user_id`, `user_groups.user_id`) therefore reference the bare form. If you add a new controller that needs the acting user id for an FK insert, follow the same strip pattern.
|
|
|
|
## controller/ — REST endpoints
|
|
|
|
### Env-scoped (user-facing data & config)
|
|
|
|
- `AppController` — `/api/v1/environments/{envSlug}/apps`. GET list / POST create / GET `{appSlug}` / DELETE `{appSlug}` / GET `{appSlug}/versions` / POST `{appSlug}/versions` (JAR upload) / PUT `{appSlug}/container-config` / GET `{appSlug}/dirty-state` (returns `DirtyStateResponse{dirty, lastSuccessfulDeploymentId, differences}` — compares current JAR+config against last RUNNING deployment snapshot; dirty=true when no snapshot exists). App slug uniqueness is per-env (`(env, app_slug)` is the natural key). `CreateAppRequest` body has no env (path), validates slug regex. Injects `DirtyStateCalculator` bean (registered in `RuntimeBeanConfig`, requires `ObjectMapper` with `JavaTimeModule`).
|
|
- `DeploymentController` — `/api/v1/environments/{envSlug}/apps/{appSlug}/deployments`. GET list / POST create (body `{ appVersionId }`) / POST `{id}/stop` / POST `{id}/promote` (body `{ targetEnvironment: slug }` — target app slug must exist in target env) / GET `{id}/logs`. All lifecycle ops (`POST /` deploy, `POST /{id}/stop`, `POST /{id}/promote`) audited under `AuditCategory.DEPLOYMENT`. Action codes: `deploy_app`, `stop_deployment`, `promote_deployment`. Acting user resolved via the `user:` prefix-strip convention; both SUCCESS and FAILURE branches write audit rows. `created_by` (TEXT, nullable) populated from `SecurityContextHolder` and surfaced on the `Deployment` DTO.
|
|
- `ApplicationConfigController` — `/api/v1/environments/{envSlug}`. GET `/config` (list), GET/PUT `/apps/{appSlug}/config`, GET `/apps/{appSlug}/processor-routes`, POST `/apps/{appSlug}/config/test-expression`. PUT accepts `?apply=staged|live` (default `live`). `live` saves to DB and pushes `CONFIG_UPDATE` SSE to live agents in this env (existing behavior); `staged` saves to DB only, skipping the SSE push — used by the unified app deployment page. Audit action is `stage_app_config` for staged writes, `update_app_config` for live. Invalid `apply` values return 400.
|
|
- `AppSettingsController` — `/api/v1/environments/{envSlug}`. GET `/app-settings` (list), GET/PUT/DELETE `/apps/{appSlug}/settings`. ADMIN/OPERATOR only.
|
|
- `SearchController` — `/api/v1/environments/{envSlug}`. GET `/executions`, POST `/executions/search`, GET `/stats`, `/stats/timeseries`, `/stats/timeseries/by-app`, `/stats/timeseries/by-route`, `/stats/punchcard`, `/attributes/keys`, `/errors/top`.
|
|
- `LogQueryController` — GET `/api/v1/environments/{envSlug}/logs` (filters: source (multi, comma-split, OR-joined), level (multi, comma-split, OR-joined), application, agentId, exchangeId, logger, q, time range, instanceIds (multi, comma-split, AND-joined as WHERE instance_id IN (...) — used by the Checkpoint detail drawer to scope logs to a deployment's replicas); sort asc/desc). Cursor-paginated, returns `{ data, nextCursor, hasMore, levelCounts }`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — same-millisecond tiebreak via the `insert_id` UUID column on `logs`.
|
|
- `RouteCatalogController` — GET `/api/v1/environments/{envSlug}/routes` (merged route catalog from registry + ClickHouse; env filter unconditional).
|
|
- `RouteMetricsController` — GET `/api/v1/environments/{envSlug}/routes/metrics`, GET `/api/v1/environments/{envSlug}/routes/metrics/processors`.
|
|
- `AgentListController` — GET `/api/v1/environments/{envSlug}/agents` (registered agents with runtime metrics, filtered to env).
|
|
- `AgentEventsController` — GET `/api/v1/environments/{envSlug}/agents/events` (lifecycle events; cursor-paginated, returns `{ data, nextCursor, hasMore }`; order `(timestamp DESC, insert_id DESC)`; cursor is base64url of `"{timestampIso}|{insert_id_uuid}"` — `insert_id` is a stable UUID column used as a same-millisecond tiebreak).
|
|
- `AgentMetricsController` — GET `/api/v1/environments/{envSlug}/agents/{agentId}/metrics` (JVM/Camel metrics). Rejects cross-env agents (404) as defence-in-depth.
|
|
- `DiagramRenderController` — GET `/api/v1/environments/{envSlug}/apps/{appSlug}/routes/{routeId}/diagram` returns the most recent diagram for (app, env, route) via `DiagramStore.findLatestContentHashForAppRoute`. Registry-independent — routes whose publishing agents were removed still resolve. Also GET `/api/v1/diagrams/{contentHash}/render` (flat — content hashes are globally unique), the point-in-time path consumed by the exchange viewer via `ExecutionDetail.diagramContentHash`.
|
|
- `AlertRuleController` — `/api/v1/environments/{envSlug}/alerts/rules`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/enable` / POST `{id}/disable` / POST `{id}/render-preview` / POST `{id}/test-evaluate`. OPERATOR+ for mutations, VIEWER+ for reads. CRITICAL: attribute keys in `ExchangeMatchCondition.filter.attributes` are validated at rule-save time against `^[a-zA-Z0-9._-]+$` — they are later inlined into ClickHouse SQL. `AgentLifecycleCondition` is allowlist-only — the `AgentLifecycleEventType` enum (REGISTERED / RE_REGISTERED / DEREGISTERED / WENT_STALE / WENT_DEAD / RECOVERED) plus the record compact ctor (non-empty `eventTypes`, `withinSeconds ≥ 1`) do the validation; custom agent-emitted event types are tracked in backlog issue #145. Webhook validation: verifies `outboundConnectionId` exists and `isAllowedInEnvironment`. Null notification templates default to `""` (NOT NULL constraint). Audit: `ALERT_RULE_CHANGE`.
|
|
- `AlertController` — `/api/v1/environments/{envSlug}/alerts`. GET list (inbox filtered by userId/groupIds/roleNames via `InAppInboxQuery`; optional multi-value `state`, `severity`, tri-state `acked`, tri-state `read` query params; soft-deleted rows always excluded) / GET `/unread-count` / GET `{id}` / POST `{id}/ack` / POST `{id}/read` / POST `/bulk-read` / POST `/bulk-ack` (VIEWER+) / DELETE `{id}` (OPERATOR+, soft-delete) / POST `/bulk-delete` (OPERATOR+) / POST `{id}/restore` (OPERATOR+, clears `deleted_at`). `requireLiveInstance` helper returns 404 on soft-deleted rows; `restore` explicitly fetches regardless of `deleted_at`. `BulkIdsRequest` is the shared body for bulk-read/ack/delete (`{ instanceIds }`). `AlertDto` includes `readAt`; `deletedAt` is intentionally NOT on the wire. Inbox SQL: `? = ANY(target_user_ids) OR target_group_ids && ? OR target_role_names && ?` — requires at least one matching target (no broadcast concept).
|
|
- `AlertSilenceController` — `/api/v1/environments/{envSlug}/alerts/silences`. GET list / POST create / DELETE `{id}`. 422 if `endsAt <= startsAt`. OPERATOR+ for mutations, VIEWER+ for list. Audit: `ALERT_SILENCE_CHANGE`.
|
|
- `AlertNotificationController` — Dual-path (no class-level prefix). GET `/api/v1/environments/{envSlug}/alerts/{alertId}/notifications` (VIEWER+); POST `/api/v1/alerts/notifications/{id}/retry` (OPERATOR+, flat — notification IDs globally unique). Retry resets attempts to 0 and sets `nextAttemptAt = now`.
|
|
|
|
### Env admin (env-slug-parameterized, not env-scoped data)
|
|
|
|
- `EnvironmentAdminController` — `/api/v1/admin/environments`. GET list / POST create / GET `{envSlug}` / PUT `{envSlug}` / DELETE `{envSlug}` / PUT `{envSlug}/default-container-config` / PUT `{envSlug}/jar-retention`. Slug immutable — PUT body has no slug field; any slug supplied is dropped by Jackson. Slug validated on POST. `UpdateEnvironmentRequest` carries `color` (nullable); unknown values rejected with 400 via `EnvironmentColor.isValid`. Null/absent color preserves the existing value.
|
|
|
|
### Agent-only (JWT-authoritative, intentionally flat)
|
|
|
|
- `AgentRegistrationController` — POST `/register` (requires `environmentId` in body; 400 if missing), POST `/{id}/refresh` (rejects tokens with no `env` claim), POST `/{id}/heartbeat` (env from body preferred, JWT fallback; 400 if neither), POST `/{id}/deregister`.
|
|
- `AgentSseController` — GET `/{id}/events` (SSE connection).
|
|
- `AgentCommandController` — POST `/{agentId}/commands`, POST `/groups/{group}/commands`, POST `/commands` (broadcast), POST `/{agentId}/commands/{commandId}/ack`, POST `/{agentId}/replay`.
|
|
- `AgentConfigController` — GET `/api/v1/agents/config`. Agent-authoritative config read: resolves (app, env) from JWT subject → registry (registry miss falls back to JWT env claim; no registry entry → 404 since application can't be derived).
|
|
|
|
### Ingestion (agent-only, JWT-authoritative)
|
|
|
|
- `LogIngestionController` — POST `/api/v1/data/logs` (accepts `List<LogEntry>`; WARNs on missing identity, unregistered agents, empty payloads, buffer-full drops).
|
|
- `EventIngestionController` — POST `/api/v1/data/events`.
|
|
- `ChunkIngestionController` — POST `/api/v1/data/executions`. Accepts a single `ExecutionChunk` or an array (fields include `exchangeId`, `applicationId`, `instanceId`, `routeId`, `status`, `startTime`, `endTime`, `durationMs`, `chunkSeq`, `final`, `processors: FlatProcessorRecord[]`). The accumulator merges non-final chunks by exchangeId and emits the merged envelope on the final chunk or on stale timeout. Legacy `ExecutionController` / `RouteExecution` shape is retired.
|
|
- `MetricsController` — POST `/api/v1/data/metrics`.
|
|
- `DiagramController` — POST `/api/v1/data/diagrams` (resolves applicationId + environment from the agent registry keyed on JWT subject; stamps both on the stored `TaggedDiagram`).
|
|
|
|
### Cross-env discovery (flat)
|
|
|
|
- `CatalogController` — GET `/api/v1/catalog` (merges managed apps + in-memory agents + CH stats; optional `?environment=` filter). DELETE `/api/v1/catalog/{applicationId}` (ADMIN: dismiss app, purge all CH data + PG record).
|
|
|
|
### Admin (cross-env, flat)
|
|
|
|
- `UserAdminController` — CRUD `/api/v1/admin/users`, POST `/{id}/roles`, POST `/{id}/set-password`.
|
|
- `RoleAdminController` — CRUD `/api/v1/admin/roles`.
|
|
- `GroupAdminController` — CRUD `/api/v1/admin/groups`.
|
|
- `OidcConfigAdminController` — GET/POST `/api/v1/admin/oidc`, POST `/test`.
|
|
- `OutboundConnectionAdminController` — `/api/v1/admin/outbound-connections`. GET list / POST create / GET `{id}` / PUT `{id}` / DELETE `{id}` / POST `{id}/test` / GET `{id}/usage`. RBAC: list/get/usage ADMIN|OPERATOR; mutations + test ADMIN.
|
|
- `SensitiveKeysAdminController` — GET/PUT `/api/v1/admin/sensitive-keys`. GET returns 200 or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true`. Fan-out iterates every distinct `(application, environment)` slice — intentional global baseline + per-env overrides.
|
|
- `ClaimMappingAdminController` — CRUD `/api/v1/admin/claim-mappings`, POST `/test`.
|
|
- `LicenseAdminController` — GET/POST `/api/v1/admin/license`.
|
|
- `ThresholdAdminController` — CRUD `/api/v1/admin/thresholds`.
|
|
- `AuditLogController` — GET `/api/v1/admin/audit`.
|
|
- `RbacStatsController` — GET `/api/v1/admin/rbac/stats`.
|
|
- `UsageAnalyticsController` — GET `/api/v1/admin/usage` (ClickHouse `usage_events`).
|
|
- `ClickHouseAdminController` — GET `/api/v1/admin/clickhouse/**` (conditional on `infrastructureendpoints` flag).
|
|
- `DatabaseAdminController` — GET `/api/v1/admin/database/**` (conditional on `infrastructureendpoints` flag).
|
|
- `ServerMetricsAdminController` — `/api/v1/admin/server-metrics/**`. GET `/catalog`, GET `/instances`, POST `/query`. Generic read API over the `server_metrics` ClickHouse table so SaaS dashboards don't need direct CH access. Delegates to `ServerMetricsQueryStore` (impl `ClickHouseServerMetricsQueryStore`). Validation: metric/tag regex `^[a-zA-Z0-9._]+$`, statistic regex `^[a-z_]+$`, `to - from ≤ 31 days`, stepSeconds ∈ [10, 3600], response capped at 500 series. `IllegalArgumentException` → 400. `/query` supports `raw` + `delta` modes (delta does per-`server_instance_id` positive-clipped differences, then aggregates across instances). Derived `statistic=mean` for timers computes `sum(total|total_time)/sum(count)` per bucket.
|
|
|
|
### Other (flat)
|
|
|
|
- `DetailController` — GET `/api/v1/executions/{executionId}` + processor snapshot endpoints.
|
|
- `MetricsController` — exposes `/api/v1/metrics` and `/api/v1/prometheus` (server-side Prometheus scrape endpoint).
|
|
|
|
## runtime/ — Docker orchestration
|
|
|
|
- `DockerRuntimeOrchestrator` — implements RuntimeOrchestrator; Docker Java client (zerodep transport), container lifecycle
|
|
- `DeploymentExecutor` — @Async staged deploy: PRE_FLIGHT -> PULL_IMAGE -> CREATE_NETWORK -> START_REPLICAS -> HEALTH_CHECK -> SWAP_TRAFFIC -> COMPLETE. Container names are `{tenantId}-{envSlug}-{appSlug}-{replicaIndex}-{generation}`, where `generation` is the first 8 chars of the deployment UUID — old and new replicas coexist during a blue/green swap. Per-replica `CAMELEER_AGENT_INSTANCEID` env var is `{envSlug}-{appSlug}-{replicaIndex}-{generation}`. Branches on `DeploymentStrategy.fromWire(config.deploymentStrategy())`: **blue-green** (default) starts all N → waits for all healthy → stops old (partial health = FAILED, preserves old untouched); **rolling** replaces replicas one at a time with rollback only for in-flight new containers (already-replaced old stay stopped; un-replaced old keep serving). DEGRADED is now only set by `DockerEventMonitor` post-deploy, never by the executor.
|
|
- `DockerNetworkManager` — ensures bridge networks (cameleer-traefik, cameleer-env-{slug}), connects containers
|
|
- `DockerEventMonitor` — persistent Docker event stream listener (die, oom, start, stop), updates deployment status
|
|
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Per-container identity labels: `cameleer.replica` (index), `cameleer.generation` (deployment-scoped 8-char id — for Prometheus/Grafana deploy-boundary annotations), `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}-{generation}`). Router/service label keys are generation-agnostic so load balancing spans old + new replicas during a blue/green overlap.
|
|
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
|
|
- `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
|
|
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
|
|
|
|
## metrics/ — Prometheus observability
|
|
|
|
- `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
|
|
- `ServerInstanceIdConfig` — `@Configuration`, exposes `@Bean("serverInstanceId") String`. Resolution precedence: `cameleer.server.instance-id` property → `HOSTNAME` env → `InetAddress.getLocalHost()` → random UUID. Fixed at boot; rotates across restarts so counters restart cleanly.
|
|
- `ServerMetricsSnapshotScheduler` — `@Scheduled(fixedDelayString = "${cameleer.server.self-metrics.interval-ms:60000}")`. Walks `MeterRegistry.getMeters()` each tick, emits one `ServerMetricSample` per `Measurement` (Timer/DistributionSummary produce multiple rows per meter — one per Micrometer `Statistic`). Skips non-finite values; logs and swallows store failures. Disabled via `cameleer.server.self-metrics.enabled=false` (`@ConditionalOnProperty`). Write-only — no query endpoint yet; inspect via `/api/v1/admin/clickhouse/query`.
|
|
|
|
## storage/ — PostgreSQL repositories (JdbcTemplate)
|
|
|
|
- `PostgresAppRepository`, `PostgresAppVersionRepository`, `PostgresEnvironmentRepository`
|
|
- `PostgresDeploymentRepository` — includes JSONB replica_states, deploy_stage, findByContainerId. Also carries `deployed_config_snapshot` JSONB (Flyway V3) populated by `DeploymentExecutor` via `saveDeployedConfigSnapshot(UUID, DeploymentConfigSnapshot)` on successful RUNNING transition. Consumed by `DirtyStateCalculator` for the `/apps/{slug}/dirty-state` endpoint and by the UI for checkpoint restore.
|
|
- `PostgresUserRepository`, `PostgresRoleRepository`, `PostgresGroupRepository`
|
|
- `PostgresAuditRepository`, `PostgresOidcConfigRepository`, `PostgresClaimMappingRepository`, `PostgresSensitiveKeysRepository`
|
|
- `PostgresAppSettingsRepository`, `PostgresApplicationConfigRepository`, `PostgresThresholdRepository`. Both `app_settings` and `application_config` are env-scoped (PK `(app_id, environment)` / `(application, environment)`); finders take `(app, env)` — no env-agnostic variants.
|
|
|
|
## storage/ — ClickHouse stores
|
|
|
|
- `ClickHouseExecutionStore`, `ClickHouseMetricsStore`, `ClickHouseMetricsQueryStore`
|
|
- `ClickHouseStatsStore` — pre-aggregated stats, punchcard
|
|
- `ClickHouseDiagramStore`, `ClickHouseAgentEventRepository`
|
|
- `ClickHouseUsageTracker` — usage_events for billing
|
|
- `ClickHouseRouteCatalogStore` — persistent route catalog with first_seen cache, warm-loaded on startup
|
|
- `ClickHouseServerMetricsStore` — periodic dumps of the server's own Micrometer registry into the `server_metrics` table. Tenant-stamped (bound at the scheduler, not the bean); no `environment` column (server straddles envs). Batch-insert via `JdbcTemplate.batchUpdate` with `Map(String, String)` tag binding. Written by `ServerMetricsSnapshotScheduler`.
|
|
- `ClickHouseServerMetricsQueryStore` — read side of `server_metrics` for dashboards. Implements `ServerMetricsQueryStore`. `catalog(from,to)` returns name+type+statistics+tagKeys, `listInstances(from,to)` returns server_instance_ids with first/last seen, `query(request)` builds bucketed time-series with `raw` or `delta` mode and supports a derived `mean` statistic for timers. All identifier inputs regex-validated; tenant_id always bound; max range 31 days; series count capped at 500. Exposed via `ServerMetricsAdminController`.
|
|
|
|
## search/ — ClickHouse search and log stores
|
|
|
|
- `ClickHouseLogStore` — log storage and query, MDC-based exchange/processor correlation
|
|
- `ClickHouseSearchIndex` — full-text search
|
|
|
|
## security/ — Spring Security
|
|
|
|
- `SecurityConfig` — WebSecurityFilterChain, JWT filter, CORS, OIDC conditional. `/api/v1/admin/outbound-connections/**` GETs permit OPERATOR in addition to ADMIN (defense-in-depth at controller level); mutations remain ADMIN-only. Alerting matchers: GET `/environments/*/alerts/**` VIEWER+; POST/PUT/DELETE rules and silences OPERATOR+; ack/read/bulk-read VIEWER+; POST `/alerts/notifications/*/retry` OPERATOR+.
|
|
- `JwtAuthenticationFilter` — OncePerRequestFilter, validates Bearer tokens
|
|
- `JwtServiceImpl` — HMAC-SHA256 JWT (Nimbus JOSE)
|
|
- `UiAuthController` — `/api/v1/auth` (login, refresh, me). Upserts `users.user_id = request.username()` (bare); signs JWTs with `subject = "user:" + userId`. `refresh`/`me` strip the `"user:"` prefix from incoming subjects via `stripSubjectPrefix()` before any DB/RBAC lookup.
|
|
- `OidcAuthController` — `/api/v1/auth/oidc` (login-uri, token-exchange, logout). Upserts `users.user_id = "oidc:" + oidcUser.subject()` (no `user:` prefix); signs JWTs with `subject = "user:oidc:" + oidcUser.subject()`. `applyClaimMappings` + `getSystemRoleNames` calls all use the bare `oidc:<sub>` form.
|
|
- `OidcTokenExchanger` — code -> tokens, role extraction from access_token then id_token
|
|
- `OidcProviderHelper` — OIDC discovery, JWK source cache
|
|
|
|
## agent/ — Agent lifecycle
|
|
|
|
- `SseConnectionManager` — manages per-agent SSE connections, delivers commands
|
|
- `AgentLifecycleMonitor` — @Scheduled 10s, LIVE->STALE->DEAD transitions
|
|
- `SsePayloadSigner` — Ed25519 signs SSE payloads for agent verification
|
|
|
|
## retention/ — JAR cleanup
|
|
|
|
- `JarRetentionJob` — @Scheduled 03:00 daily, per-environment retention, skips deployed versions
|
|
|
|
## alerting/eval/ — Rule evaluation
|
|
|
|
- `AlertEvaluatorJob` — @Scheduled tick driver; per-rule claim/release via `AlertRuleRepository`, dispatches to per-kind `ConditionEvaluator`, persists advanced cursor on release via `AlertRule.withEvalState`.
|
|
- `BatchResultApplier` — `@Component` that wraps a single rule's tick outcome (`EvalResult.Batch` = `firings` + `nextEvalState`) in one `@Transactional` boundary: instance upserts + notification enqueues + cursor advance commit atomically or roll back together. This is the exactly-once-per-exchange guarantee for `PER_EXCHANGE` fire mode.
|
|
- `ConditionEvaluator` — interface; per-kind implementations: `ExchangeMatchEvaluator`, `AgentLifecycleEvaluator`, `AgentStateEvaluator`, `DeploymentStateEvaluator`, `JvmMetricEvaluator`, `LogPatternEvaluator`, `RouteMetricEvaluator`.
|
|
- `AlertStateTransitions` — PER_EXCHANGE vs rule-level FSM helpers (fire/resolve/ack).
|
|
- `PerKindCircuitBreaker` — trips noisy per-kind evaluators; `TickCache` — per-tick shared lookups (apps, envs, silences).
|
|
|
|
## http/ — Outbound HTTP client implementation
|
|
|
|
- `SslContextBuilder` — composes SSL context from `OutboundHttpProperties` + `OutboundHttpRequestContext`. Supports SYSTEM_DEFAULT (JDK roots + configured CA extras), TRUST_ALL (short-circuit no-op TrustManager), TRUST_PATHS (JDK roots + system extras + per-request extras). Throws `IllegalArgumentException("CA file not found: ...")` on missing PEM.
|
|
- `ApacheOutboundHttpClientFactory` — Apache HttpClient 5 impl of `OutboundHttpClientFactory`. Memoizes clients per `CacheKey(trustAll, caPaths, mode, connectTimeout, readTimeout)`. Applies `NoopHostnameVerifier` when trust-all is active.
|
|
- `config/OutboundHttpConfig` — `@ConfigurationProperties("cameleer.server.outbound-http")`. Exposes beans: `OutboundHttpProperties`, `SslContextBuilder`, `OutboundHttpClientFactory`. `@PostConstruct` logs WARN on trust-all and throws if configured CA paths don't exist.
|
|
|
|
## outbound/ — Admin-managed outbound connections (implementation)
|
|
|
|
- `crypto/SecretCipher` — AES-GCM symmetric cipher with key derived via HMAC-SHA256(jwtSecret, "cameleer-outbound-secret-v1"). Ciphertext format: base64(IV(12 bytes) || GCM output with 128-bit tag). `encrypt` throws `IllegalStateException`; `decrypt` throws `IllegalArgumentException` on tamper/wrong-key/malformed.
|
|
- `storage/PostgresOutboundConnectionRepository` — JdbcTemplate impl. `save()` upserts by id; JSONB serialization via ObjectMapper; UUID arrays via `ConnectionCallback`. Reads `created_by`/`updated_by` as String (= users.user_id TEXT).
|
|
- `OutboundConnectionServiceImpl` — service layer. Tenant bound at construction via `cameleer.server.tenant.id` property. Uniqueness check via `findByName`. Narrowing-envs guard: rejects update that removes envs while rules reference the connection (rulesReferencing stubbed in Plan 01, wired in Plan 02). Delete guard: rejects if referenced by rules.
|
|
- `controller/OutboundConnectionAdminController` — REST controller. Class-level `@PreAuthorize("hasRole('ADMIN')")` defaults; GETs relaxed to ADMIN|OPERATOR. Resolves acting user id via the user-id convention (strip `"user:"` from `authentication.name` → matches `users.user_id` FK). Audit via `AuditCategory.OUTBOUND_CONNECTION_CHANGE`.
|
|
- `dto/OutboundConnectionRequest` — Bean Validation: `@NotBlank` name, `@Pattern("^https://.+")` url, `@NotNull` method/tlsTrustMode/auth. Compact ctor throws `IllegalArgumentException` if TRUST_PATHS with empty paths list.
|
|
- `dto/OutboundConnectionDto` — response DTO. `hmacSecretSet: boolean` instead of the ciphertext; `authKind: OutboundAuthKind` instead of the full auth config.
|
|
- `dto/OutboundConnectionTestResult` — result of POST `/{id}/test`: status, latencyMs, responseSnippet (first 512 chars), tlsProtocol/cipherSuite/peerCertSubject (protocol is "TLS" stub; enriched in Plan 02 follow-up), error (nullable).
|
|
- `config/OutboundBeanConfig` — registers `OutboundConnectionRepository`, `SecretCipher`, `OutboundConnectionService` beans.
|
|
|
|
## config/ — Spring beans
|
|
|
|
- `RuntimeOrchestratorAutoConfig` — conditional Docker/Disabled orchestrator + NetworkManager + EventMonitor
|
|
- `RuntimeBeanConfig` — DeploymentExecutor, AppService, EnvironmentService
|
|
- `SecurityBeanConfig` — JwtService, Ed25519, BootstrapTokenValidator
|
|
- `StorageBeanConfig` — all repositories
|
|
- `ClickHouseConfig` — ClickHouse JdbcTemplate, schema initializer
|