Files
cameleer-server/.claude/rules/core-classes.md

116 lines
10 KiB
Markdown
Raw Normal View History

---
paths:
- "cameleer-server-core/**"
---
# Core Module Key Classes
`cameleer-server-core/src/main/java/com/cameleer/server/core/`
## agent/ — Agent lifecycle and commands
- `AgentRegistryService` — in-memory registry (ConcurrentHashMap), register/heartbeat/lifecycle
- `AgentInfo` — record: id, name, application, environmentId, version, routeIds, capabilities, state
- `AgentCommand` — record: id, type, targetAgent, payload, createdAt, expiresAt
- `AgentEventService` — records agent state changes, heartbeats
- `AgentState` — enum: LIVE, STALE, DEAD, SHUTDOWN
- `CommandType` — enum for command types (config-update, deep-trace, replay, route-control, etc.)
- `CommandStatus` — enum for command acknowledgement states
- `CommandReply` — record: command execution result from agent
feat(alerting): AGENT_LIFECYCLE condition kind with per-subject fire mode Allows alert rules to fire on agent-lifecycle events — REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather than only on current state. Each matching `(agent, eventType, timestamp)` becomes its own ackable AlertInstance, so outages on distinct agents are independently routable. Core: - New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record (scope, eventTypes, withinSeconds). Compact ctor rejects empty eventTypes and withinSeconds<1. - Strict allowlist enum `AgentLifecycleEventType` (six entries matching the server-emitted types in `AgentRegistrationController` and `AgentLifecycleMonitor`). Custom agent-emitted event types tracked in backlog issue #145. - `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` — new read path ordered `(timestamp ASC, insert_id ASC)` used by the evaluator. Implemented on `ClickHouseAgentEventRepository` with tenant + env filter mandatory. App: - `AgentLifecycleEvaluator` queries events in the last `withinSeconds` window and returns `EvalResult.Batch` with one `Firing` per row. Every Firing carries a canonical `_subjectFingerprint` of `"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event` subtrees for Mustache templating. - `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}`. - Validation is delegated to the record compact ctor + enum at Jackson deserialization time — matches the existing policy of keeping controller validators focused on env-scoped / SQL-injection concerns. Schema: - V16 migration generalises the V15 per-exchange discriminator on `alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a fallback to the legacy `exchange.id` expression. Scalar kinds still resolve to `''` and keep one-open-per-rule. Duplicate-key path in `PostgresAlertInstanceRepository.save` is unchanged — the index is the deduper. UI: - New `AgentLifecycleForm.tsx` wizard form with multi-select chips for the six allowed event types + `withinSeconds` input. Wired into `ConditionStep`, `form-state` (validation + defaults: WENT_DEAD, 300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the new option array. - `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind, and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`. Tests (all passing): - 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive + empty/zero/unknown-type rejection). - 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty window, multi-agent fingerprint shape, scope forwarding, missing env). - `NotificationContextBuilderTest` switch now covers the new kind. - 119 alerting unit tests + 71 UI tests green. Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
2026-04-21 14:52:08 +02:00
- `AgentEventRecord`, `AgentEventRepository` — event persistence. `AgentEventRepository.queryPage(...)` is cursor-paginated (`AgentEventPage{data, nextCursor, hasMore}`); the legacy non-paginated `query(...)` path is gone. `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` returns matching events ordered by `(timestamp ASC, insert_id ASC)` — consumed by `AgentLifecycleEvaluator`.
- `AgentEventPage` — record: `(List<AgentEventRecord> data, String nextCursor, boolean hasMore)` returned by `AgentEventRepository.queryPage`
- `AgentEventListener` — callback interface for agent events
- `RouteStateRegistry` — tracks per-agent route states
## runtime/ — App/Environment/Deployment domain
- `App` — record: id, environmentId, slug, displayName, containerConfig (JSONB)
- `AppVersion` — record: id, appId, version, jarPath, detectedRuntimeType, detectedMainClass
- `Environment` — record: id, slug, displayName, production, enabled, defaultContainerConfig, jarRetentionCount, color, createdAt. `color` is one of the 8 preset palette values validated by `EnvironmentColor.VALUES` and CHECK-constrained in PostgreSQL (V2 migration).
- `EnvironmentColor` — constants: `DEFAULT = "slate"`, `VALUES = {slate,red,amber,green,teal,blue,purple,pink}`, `isValid(String)`.
- `Deployment` — record: id, appId, appVersionId, environmentId, status, targetState, deploymentStrategy, replicaStates (JSONB), deployStage, containerId, containerName
- `DeploymentStatus` — enum: STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED
- `DeployStage` — enum: PRE_FLIGHT, PULL_IMAGE, CREATE_NETWORK, START_REPLICAS, HEALTH_CHECK, SWAP_TRAFFIC, COMPLETE
- `DeploymentService` — createDeployment (deletes terminal deployments first), markRunning, markFailed, markStopped
- `RuntimeType` — enum: AUTO, SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE
- `RuntimeDetector` — probes JAR files at upload time: detects runtime from manifest Main-Class (Spring Boot loader, Quarkus entry point, plain Java) or native binary (non-ZIP magic bytes)
- `ContainerRequest` — record: 20 fields for Docker container creation (includes runtimeType, customArgs, mainClass)
- `ContainerStatus` — record: state, running, exitCode, error
- `ResolvedContainerConfig` — record: typed config with memoryLimitMb, memoryReserveMb, cpuRequest, cpuLimit, appPort, exposedPorts, customEnvVars, stripPathPrefix, sslOffloading, routingMode, routingDomain, serverUrl, replicas, deploymentStrategy, routeControlEnabled, replayEnabled, runtimeType, customArgs, extraNetworks
- `RoutingMode` — enum for routing strategies
- `ConfigMerger` — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig
- `RuntimeOrchestrator` — interface: startContainer, stopContainer, getContainerStatus, getLogs, startLogCapture, stopLogCapture
- `AppRepository`, `AppVersionRepository`, `EnvironmentRepository`, `DeploymentRepository` — repository interfaces
- `AppService`, `EnvironmentService` — domain services
## search/ — Execution search and stats
- `SearchService` — search, count, stats, statsForApp, statsForRoute, timeseries, timeseriesForApp, timeseriesForRoute, timeseriesGroupedByApp, timeseriesGroupedByRoute, slaCompliance, slaCountsByApp, slaCountsByRoute, topErrors, activeErrorTypes, punchcard, distinctAttributeKeys. `statsForRoute`/`timeseriesForRoute` take `(routeId, applicationId)` — app filter is applied to `stats_1m_route`.
- `SearchRequest` / `SearchResult` — search DTOs
- `ExecutionStats`, `ExecutionSummary` — stats aggregation records
- `StatsTimeseries`, `TopError` — timeseries and error DTOs
- `LogSearchRequest` / `LogSearchResponse` — log search DTOs. `LogSearchRequest.sources` / `levels` are `List<String>` (null-normalized, multi-value OR); `cursor` + `limit` + `sort` drive keyset pagination. Response carries `nextCursor` + `hasMore` + per-level `levelCounts`.
## storage/ — Storage abstractions
- `ExecutionStore`, `MetricsStore`, `MetricsQueryStore`, `StatsStore`, `DiagramStore`, `RouteCatalogStore`, `SearchIndex`, `LogIndex` — interfaces
- `RouteCatalogEntry` — record: applicationId, routeId, environment, firstSeen, lastSeen
- `LogEntryResult` — log query result record
- `model/``ExecutionDocument`, `MetricTimeSeries`, `MetricsSnapshot`
## rbac/ — Role-based access control
- `RbacService` — interface: role/group CRUD, assignRoleToUser, removeRoleFromUser, addUserToGroup, removeUserFromGroup, getDirectRolesForUser, getEffectiveRolesForUser, clearManagedAssignments, assignManagedRole, addUserToManagedGroup, getStats, listUsers
- `SystemRole` — enum: AGENT, VIEWER, OPERATOR, ADMIN; `normalizeScope()` maps scopes
- `UserDetail`, `RoleDetail`, `GroupDetail` — records
- `UserSummary`, `RoleSummary`, `GroupSummary` — lightweight list records
- `RbacStats` — aggregate stats record
- `AssignmentOrigin` — enum: DIRECT, CLAIM_MAPPING (tracks how roles were assigned)
- `ClaimMappingRule` — record: OIDC claim-to-role mapping rule
- `ClaimMappingService` — interface: CRUD for claim mapping rules
- `ClaimMappingRepository` — persistence interface
- `RoleRepository`, `GroupRepository` — persistence interfaces
## admin/ — Server-wide admin config
- `SensitiveKeysConfig` — record: keys (List<String>, immutable)
- `SensitiveKeysRepository` — interface: find(), save()
- `SensitiveKeysMerger` — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs null.
feat!: scope per-app config and settings by environment BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes. Agents must now send environmentId on registration (400 if missing). Two tables previously keyed on app name alone caused cross-environment data bleed: writing config for (app=X, env=dev) would overwrite the row used by (app=X, env=prod) agents, and agent startup fetches ignored env entirely. - V1 schema: application_config and app_settings are now PK (app, env). - Repositories: env-keyed finders/saves; env is the authoritative column, stamped on the stored JSON so the row agrees with itself. - ApplicationConfigController.getConfig is dual-mode — AGENT role uses JWT env claim (agents cannot spoof env); non-agent callers provide env via ?environment= query param. - AppSettingsController endpoints now require ?environment=. - SensitiveKeysAdminController fan-out iterates (app, env) slices so each env gets its own merged keys. - DiagramController ingestion stamps env on TaggedDiagram; ClickHouse route_diagrams INSERT + findProcessorRouteMapping are env-scoped. - AgentRegistrationController: environmentId is required on register; removed all "default" fallbacks from register/refresh/heartbeat auto-heal. - UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings, useAllAppSettings, useUpdateAppSettings) take env, wired to useEnvironmentStore at all call sites. - New ConfigEnvIsolationIT covers env-isolation for both repositories. Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
- `AppSettings`, `AppSettingsRepository` — per-app-per-env settings config and persistence. Record carries `(applicationId, environment, …)`; repository methods are `findByApplicationAndEnvironment`, `findByEnvironment`, `save`, `delete(appId, env)`. `AppSettings.defaults(appId, env)` produces a default instance scoped to an environment.
- `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
- `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory` (enum: `INFRA, AUTH, USER_MGMT, CONFIG, RBAC, AGENT, OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE`), `AuditRepository` — audit trail records and persistence
## http/ — Outbound HTTP primitives (cross-cutting)
- `OutboundHttpClientFactory` — interface: `clientFor(context)` returns memoized `CloseableHttpClient`
- `OutboundHttpProperties` — record: `trustAll, trustedCaPemPaths, defaultConnectTimeout, defaultReadTimeout, proxyUrl, proxyUsername, proxyPassword`
- `OutboundHttpRequestContext` — record of per-call TLS/timeout overrides; `systemDefault()` static factory
- `TrustMode` — enum: `SYSTEM_DEFAULT | TRUST_ALL | TRUST_PATHS`
## outbound/ — Admin-managed outbound connections
- `OutboundConnection` — record: id, tenantId, name, description, url, method, defaultHeaders, defaultBodyTmpl, tlsTrustMode, tlsCaPemPaths, hmacSecretCiphertext, auth, allowedEnvironmentIds, createdAt, createdBy (String user_id), updatedAt, updatedBy (String user_id). `isAllowedInEnvironment(envId)` returns true when allowed-envs list is empty OR contains the env.
- `OutboundAuth` — sealed interface + records: `None | Bearer(tokenCiphertext) | Basic(username, passwordCiphertext)`. Jackson `@JsonTypeInfo(use = DEDUCTION)` — wire shape has no discriminator, subtype inferred from fields.
- `OutboundAuthKind`, `OutboundMethod` — enums
- `OutboundConnectionRepository` — CRUD by (tenantId, id): save/findById/findByName/listByTenant/delete
- `OutboundConnectionService` — create/update/delete/get/list with uniqueness + narrow-envs + delete-if-referenced guards. `rulesReferencing(id)` stubbed in Plan 01 (returns `[]`); populated in Plan 02 against `AlertRuleRepository`.
## security/ — Auth
- `JwtService` — interface: createAccessToken, createRefreshToken, validateAccessToken, validateRefreshToken
- `Ed25519SigningService` — interface: sign, getPublicKeyBase64 (config signing)
- `OidcConfig` — record: enabled, issuerUri, clientId, clientSecret, rolesClaim, defaultRoles, autoSignup, displayNameClaim, userIdClaim, audience, additionalScopes
- `OidcConfigRepository` — persistence interface
- `PasswordPolicyValidator` — min 12 chars, 3-of-4 character classes, no username match
- `UserInfo`, `UserRepository` — user identity records and persistence
- `InvalidTokenException` — thrown on revoked/expired tokens
## ingestion/ — Buffered data pipeline
refactor(ingestion): drop dead legacy execution-ingestion path ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class), and ChunkAccumulator is registered unconditionally — the legacy controller never bound in any profile. Even if it had, IngestionService.ingestExecution called executionStore.upsert(), and the only ExecutionStore impl (ClickHouseExecutionStore) threw UnsupportedOperationException from upsert and upsertProcessors. The entire RouteExecution → upsert path was dead code carrying four transitive dependencies (RouteExecution import, eventPublisher wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook). Removed: - cameleer-server-app/.../controller/ExecutionController.java (whole file) - ExecutionStore.upsert + upsertProcessors (interface methods) - ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides) - IngestionService.ingestExecution + toExecutionRecord + flattenProcessors + hasAnyTraceData + truncateBody + toJson/toJsonObject helpers - IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>); dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit - StorageBeanConfig.ingestionService(...) simplified accordingly Untouched because still in use: - ExecutionRecord / ProcessorRecord records (findById / findProcessors / SearchIndexer / DetailController) - SearchIndexer (its onExecutionUpdated never fires now since no-one publishes ExecutionUpdatedEvent, but SearchIndexerStats is still referenced by ClickHouseAdminController — separate cleanup) - TaggedExecution record has no remaining callers after this change — flagged in core-classes.md as a leftover; separate cleanup. Rule docs updated: - .claude/rules/app-classes.md: retired ExecutionController bullet, fixed stale URL for ChunkIngestionController (it owns /api/v1/data/executions, not /api/v1/ingestion/chunk/executions). - .claude/rules/core-classes.md: IngestionService surface + note the dead TaggedExecution. Full IT suite post-removal: 560 tests run, 11 F + 1 E — same 12 failures in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 22:50:51 +02:00
- `IngestionService` — diagram + metrics facade (`ingestDiagram`, `acceptMetrics`, `getMetricsBuffer`). Execution ingestion went through here via the legacy `RouteExecution` shape until `ChunkAccumulator` took over writes from the chunked pipeline — the `ingestExecution` path plus its `ExecutionStore.upsert` / `upsertProcessors` dependencies were removed.
- `ChunkAccumulator` — batches data for efficient flush; owns the execution write path (chunks → buffers → flush scheduler → `ClickHouseExecutionStore.insertExecutionBatch`).
- `WriteBuffer` — bounded ring buffer for async flush
- `BufferedLogEntry` — log entry wrapper with metadata
- `MergedExecution`, `TaggedDiagram` — tagged ingestion records. `TaggedDiagram` carries `(instanceId, applicationId, environment, graph)` — env is resolved from the agent registry in the controller and stamped on the ClickHouse `route_diagrams` row.