cameleer-server

Author	SHA1	Message	Date
hsiegeln	9aad2f3871	docs(rules): document AttributeFilter + SearchController attr param Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 11:22:27 +02:00
hsiegeln	f049a0a6a0	docs(rules): capture new DiagramStore method and registry-free lookup - app-classes: DiagramRenderController by-route endpoint no longer depends on the agent registry; points at findLatestContentHashForAppRoute and cross-refs the exchange viewer's content-hash path. - core-classes: document the new DiagramStore method and note why the agent-scoped findContentHashForRoute stays for the ingest path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:11:45 +02:00
hsiegeln	21db92ff00	fix(traefik): make TLS cert resolver configurable, omit when unset Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on every router. That assumes a resolver literally named `default` exists in the Traefik static config — true for ACME-backed installs, false for dev/local installs that use a file-based TLS store. Traefik logs "Router uses a nonexistent certificate resolver" for the bogus resolver on every managed app, and any future attempt to define a differently-named real resolver would silently skip these routers. A server-wide setting, `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by default), flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver` into `ResolvedContainerConfig.certResolver`. When blank, the `tls.certresolver` label is omitted entirely; `tls=true` is still emitted so Traefik serves the default TLS-store cert. When set, the label is emitted with the configured resolver name. Not per-app/per-env configurable: there is one Traefik per server instance and one resolver config; an app-level override would only let users break their own routers. TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank). Full unit suite 211/0/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:47 +02:00
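A minimal sketch of the omit-when-blank rule, assuming one router per app. `TlsLabelSketch` and `tlsLabels` are hypothetical names, not the real `TraefikLabelBuilder`; the `traefik.http.routers.<name>.tls` and `.tls.certresolver` keys are standard Traefik Docker labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the real TraefikLabelBuilder: only the TLS labels of one router.
final class TlsLabelSketch {
    private TlsLabelSketch() {}

    static Map<String, String> tlsLabels(String routerName, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        String tlsKey = "traefik.http.routers." + routerName + ".tls";
        // tls=true is always emitted so Traefik can serve the default TLS-store cert.
        labels.put(tlsKey, "true");
        // The resolver label is emitted only when a resolver name is actually configured.
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put(tlsKey + ".certresolver", certResolver);
        }
        return labels;
    }
}
```

Null and blank are treated the same, which matches the "resolver set, null, blank" test cases the commit lists.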
hsiegeln	165c9f10e3	feat(deploy): externalRouting toggle to keep apps off Traefik Adds a boolean `externalRouting` flag (default `true`) on ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only the identity labels (`managed-by`, `cameleer.*`) and skips every `traefik.*` label, so the container is not published by Traefik. Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}` can still reach it via Docker DNS on whatever port the app listens on. TDD: new TraefikLabelBuilderTest covers enabled (default labels present), disabled (zero traefik.* labels), and disabled (identity labels retained) cases. Full module unit suite: 208/0/0. Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form state, Resources tab toggle, POST payload, and snapshot-to-form mapping. Rule files updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:03:48 +02:00
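The toggle can be pictured as an early return after the identity labels. This is a hedged sketch: `RoutingLabelSketch`, the `managed-by` value, and the `cameleer.app` key are illustrative assumptions; `traefik.enable`, `...routers.<r>.rule`, and `...services.<s>.loadbalancer.server.port` are standard Traefik Docker labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the externalRouting switch; label values are illustrative.
final class RoutingLabelSketch {
    private RoutingLabelSketch() {}

    static Map<String, String> labels(String app, String host, int port, boolean externalRouting) {
        Map<String, String> labels = new LinkedHashMap<>();
        // Identity labels are always present (exact keys/values assumed).
        labels.put("managed-by", "cameleer-server");
        labels.put("cameleer.app", app);
        if (!externalRouting) {
            // No traefik.* labels at all: Traefik ignores the container,
            // but siblings on the same Docker network can still reach it by DNS name.
            return labels;
        }
        labels.put("traefik.enable", "true");
        labels.put("traefik.http.routers." + app + ".rule", "Host(`" + host + "`)");
        labels.put("traefik.http.services." + app + ".loadbalancer.server.port", String.valueOf(port));
        return labels;
    }
}
```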
hsiegeln	d192f6b57c	docs(rules): deployment audit + checkpoints table + SideDrawer + log instanceIds Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 13:51:22 +02:00
hsiegeln	c6aef5ab35	fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement - Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment. STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty. Now only FAILED rows are pruned; STOPPED deployments are retained as restorable checkpoints (they still carry deployed_config_snapshot from their RUNNING window). - UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only, which excluded the main case — the previous blue/green deployment now in STOPPED). - UI placement: Checkpoints disclosure now renders inside IdentitySection, matching the design spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:26:46 +02:00
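The two rule changes (prune FAILED only; a checkpoint is anything with a snapshot) reduce to two filters. A sketch under assumptions: `Deployment`, `DeploymentStatus`, and `CheckpointSketch` are simplified stand-ins, and the snapshot is modelled as a nullable string.

```java
import java.util.List;

// Simplified stand-ins for the real deployment model.
enum DeploymentStatus { RUNNING, DEGRADED, STOPPED, FAILED }

record Deployment(String id, DeploymentStatus status, String deployedConfigSnapshot) {}

final class CheckpointSketch {
    private CheckpointSketch() {}

    // Any deployment that still carries a snapshot is a restorable checkpoint,
    // including STOPPED ones left behind by a blue/green switch.
    static List<Deployment> checkpoints(List<Deployment> history) {
        return history.stream()
                .filter(d -> d.deployedConfigSnapshot() != null)
                .toList();
    }

    // Redeploy pruning removes FAILED rows only; STOPPED rows survive.
    static List<Deployment> pruneFailed(List<Deployment> history) {
        return history.stream()
                .filter(d -> d.status() != DeploymentStatus.FAILED)
                .toList();
    }
}
```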
hsiegeln	007597715a	docs(rules): deployment strategies + generation suffix Refresh the three rule files to match the new executor behavior: - docker-orchestration.md: rewrite DeploymentExecutor Details with container naming scheme ({...}-{replica}-{generation}), strategy dispatch (blue-green vs rolling), and the new DEGRADED semantics (post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder bullets for the generation suffix + new cameleer.generation label. - app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets mirror the same. - core-classes.md: add DeploymentStrategy enum; note DEGRADED is now post-deploy-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:02:51 +02:00
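A sketch of the replica/generation suffix, assuming both are integers. The leading `{...}` part of the scheme is not spelled out in the commit, so it stays an opaque `base` parameter here; `ContainerNameSketch` is a hypothetical name and the label value format is assumed.

```java
import java.util.Map;

// Hypothetical sketch; `base` stands in for the elided "{...}" prefix of the naming scheme.
final class ContainerNameSketch {
    private ContainerNameSketch() {}

    // {...}-{replica}-{generation}: a new generation never collides with the old one,
    // so old and new containers can coexist during a blue-green switch.
    static String containerName(String base, int replica, int generation) {
        return base + "-" + replica + "-" + generation;
    }

    // The commit adds a cameleer.generation label; the value format here is assumed.
    static Map<String, String> generationLabel(int generation) {
        return Map.of("cameleer.generation", Integer.toString(generation));
    }
}
```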
hsiegeln	c2eab71a31	env(admin): per-environment color field + V2 migration - V2__add_environment_color.sql adds a CHECK-constrained VARCHAR color column (default 'slate'); existing rows backfill to slate. - Environment record + EnvironmentColor constants (8 preset values) flow through repository, service, and admin API. - UpdateEnvironmentRequest.color nullable: null preserves existing; unknown values → 400. - ITs cover valid / invalid / null-preserves behaviour; existing Environment constructor call-sites updated with the new color arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 19:24:30 +02:00
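The update semantics for `UpdateEnvironmentRequest.color` (null preserves, unknown rejected) can be sketched as below. Only `slate` is named in the commit; the preset set is passed in as a parameter, and `ColorUpdateSketch`/`UnknownColorException` are hypothetical names, with the exception standing in for the 400 response.

```java
import java.util.Set;

// Hypothetical sketch of the color update rule; the real preset list lives in EnvironmentColor.
final class ColorUpdateSketch {
    private ColorUpdateSketch() {}

    // A controller would map this to HTTP 400.
    static final class UnknownColorException extends RuntimeException {
        UnknownColorException(String color) {
            super("unknown environment color: " + color);
        }
    }

    static String resolve(String current, String requested, Set<String> presets) {
        if (requested == null) {
            return current; // null preserves the stored color
        }
        if (!presets.contains(requested)) {
            throw new UnknownColorException(requested);
        }
        return requested;
    }
}
```

In the test, `teal` is a placeholder preset, not necessarily one of the real eight.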
hsiegeln	06c6f53bbc	refactor(ingestion): remove unused TaggedExecution record No callers after the legacy PG ingestion path was retired in `0f635576`. core-classes.md updated to drop the leftover note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 23:33:26 +02:00
hsiegeln	0f635576a3	refactor(ingestion): drop dead legacy execution-ingestion path ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class), and ChunkAccumulator is registered unconditionally — the legacy controller never bound in any profile. Even if it had, IngestionService.ingestExecution called executionStore.upsert(), and the only ExecutionStore impl (ClickHouseExecutionStore) threw UnsupportedOperationException from upsert and upsertProcessors. The entire RouteExecution → upsert path was dead code carrying four transitive dependencies (RouteExecution import, eventPublisher wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook). Removed: - cameleer-server-app/.../controller/ExecutionController.java (whole file) - ExecutionStore.upsert + upsertProcessors (interface methods) - ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides) - IngestionService.ingestExecution + toExecutionRecord + flattenProcessors + hasAnyTraceData + truncateBody + toJson/toJsonObject helpers - IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>); dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit - StorageBeanConfig.ingestionService(...) simplified accordingly Untouched because still in use: - ExecutionRecord / ProcessorRecord records (findById / findProcessors / SearchIndexer / DetailController) - SearchIndexer (its onExecutionUpdated never fires now since no-one publishes ExecutionUpdatedEvent, but SearchIndexerStats is still referenced by ClickHouseAdminController — separate cleanup) - TaggedExecution record has no remaining callers after this change — flagged in core-classes.md as a leftover; separate cleanup. Rule docs updated: - .claude/rules/app-classes.md: retired ExecutionController bullet, fixed stale URL for ChunkIngestionController (it owns /api/v1/data/executions, not /api/v1/ingestion/chunk/executions). 
- .claude/rules/core-classes.md: IngestionService surface + note the dead TaggedExecution. Full IT suite post-removal: 560 tests run, 11 F + 1 E — same 12 failures in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 22:50:51 +02:00
hsiegeln	414f7204bf	feat(alerting): AGENT_LIFECYCLE condition kind with per-subject fire mode Allows alert rules to fire on agent-lifecycle events — REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather than only on current state. Each matching `(agent, eventType, timestamp)` becomes its own ackable AlertInstance, so outages on distinct agents are independently routable. Core: - New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record (scope, eventTypes, withinSeconds). Compact ctor rejects empty eventTypes and withinSeconds<1. - Strict allowlist enum `AgentLifecycleEventType` (six entries matching the server-emitted types in `AgentRegistrationController` and `AgentLifecycleMonitor`). Custom agent-emitted event types tracked in backlog issue #145. - `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` — new read path ordered `(timestamp ASC, insert_id ASC)` used by the evaluator. Implemented on `ClickHouseAgentEventRepository` with tenant + env filter mandatory. App: - `AgentLifecycleEvaluator` queries events in the last `withinSeconds` window and returns `EvalResult.Batch` with one `Firing` per row. Every Firing carries a canonical `_subjectFingerprint` of `"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event` subtrees for Mustache templating. - `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}`. - Validation is delegated to the record compact ctor + enum at Jackson deserialization time — matches the existing policy of keeping controller validators focused on env-scoped / SQL-injection concerns. Schema: - V16 migration generalises the V15 per-exchange discriminator on `alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a fallback to the legacy `exchange.id` expression. Scalar kinds still resolve to `''` and keep one-open-per-rule. 
Duplicate-key path in `PostgresAlertInstanceRepository.save` is unchanged — the index is the deduper. UI: - New `AgentLifecycleForm.tsx` wizard form with multi-select chips for the six allowed event types + `withinSeconds` input. Wired into `ConditionStep`, `form-state` (validation + defaults: WENT_DEAD, 300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the new option array. - `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind, and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`. Tests (all passing): - 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive + empty/zero/unknown-type rejection). - 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty window, multi-agent fingerprint shape, scope forwarding, missing env). - `NotificationContextBuilderTest` switch now covers the new kind. - 119 alerting unit tests + 71 UI tests green. Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.	2026-04-21 14:52:08 +02:00
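The condition record, the per-event fingerprint, and the V16 dedupe discriminator can be sketched together. The six event types and the `"<agentId>:<eventType>:<tsMillis>"` shape come from the commit; `scope` is simplified to a `String`, and `LifecycleFingerprints` (including `discriminator`, a Java analogue of the index expression) is hypothetical.

```java
import java.time.Instant;
import java.util.List;

enum AgentLifecycleEventType { REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED }

// `scope` is a String here for brevity; the real record's scope type is not shown.
record AgentLifecycleCondition(String scope, List<AgentLifecycleEventType> eventTypes, int withinSeconds) {
    AgentLifecycleCondition {
        if (eventTypes == null || eventTypes.isEmpty()) {
            throw new IllegalArgumentException("eventTypes must not be empty");
        }
        if (withinSeconds < 1) {
            throw new IllegalArgumentException("withinSeconds must be at least 1");
        }
        eventTypes = List.copyOf(eventTypes); // immutable defensive copy
    }
}

final class LifecycleFingerprints {
    private LifecycleFingerprints() {}

    // One fingerprint per matching event row, so each row becomes its own AlertInstance.
    static String subjectFingerprint(String agentId, AgentLifecycleEventType type, Instant timestamp) {
        return agentId + ":" + type.name() + ":" + timestamp.toEpochMilli();
    }

    // Prefer the subject fingerprint, fall back to the exchange id,
    // else '' so scalar kinds keep one open instance per rule.
    static String discriminator(String subjectFingerprint, String exchangeId) {
        if (subjectFingerprint != null) return subjectFingerprint;
        if (exchangeId != null) return exchangeId;
        return "";
    }
}
```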
hsiegeln	1dd1f10c0e	docs(rules): document http/ and outbound/ packages + admin controller Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 17:02:09 +02:00
hsiegeln	d40833b96a	docs(rules): refresh for insert_id UUID cursor + AgentEventPage - LogQueryController: note response shape, sort param, and that the cursor tiebreak is the insert_id UUID column (not exchange/instance) - AgentEventsController: cursor now carries insert_id UUID (was instanceId); order is (timestamp DESC, insert_id DESC) - core-classes: add AgentEventPage record; note that the non-paginated AgentEventRepository.query(...) path has been removed - core-classes: note LogSearchRequest.sources/levels are now List<String> with multi-value OR semantics Keeps the rule files in sync with the cursor-pagination + multi-select filter work on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:43:25 +02:00
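Keyset pagination over `(timestamp DESC, insert_id DESC)` needs a cursor predicate that agrees with the sort order. A sketch over an in-memory list: `AgentEvent`, `Cursor`, and `KeysetPageSketch` are hypothetical, and Java's `UUID.compareTo` ordering may differ from the database's UUID ordering; the sketch only relies on sort and predicate using the same comparison.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

record AgentEvent(Instant timestamp, UUID insertId, String type) {}

// The cursor carries the (timestamp, insertId) of the last row on the previous page.
record Cursor(Instant timestamp, UUID insertId) {}

final class KeysetPageSketch {
    private KeysetPageSketch() {}

    // Newest first; insertId breaks ties between events with identical timestamps.
    static final Comparator<AgentEvent> NEWEST_FIRST =
            Comparator.comparing(AgentEvent::timestamp)
                    .thenComparing(AgentEvent::insertId)
                    .reversed();

    // Strictly "older" than the cursor tuple, i.e. (ts, id) < (cursorTs, cursorId).
    static boolean isAfterCursor(AgentEvent e, Cursor c) {
        int byTs = e.timestamp().compareTo(c.timestamp());
        return byTs < 0 || (byTs == 0 && e.insertId().compareTo(c.insertId()) < 0);
    }

    static List<AgentEvent> page(List<AgentEvent> all, Cursor cursor, int limit) {
        return all.stream()
                .filter(e -> cursor == null || isAfterCursor(e, cursor))
                .sorted(NEWEST_FIRST)
                .limit(limit)
                .toList();
    }
}
```

Because the tiebreak is a unique per-row UUID, two events with the same timestamp can no longer be skipped or repeated across a page boundary.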
hsiegeln	9b1ef51d77	feat!: scope per-app config and settings by environment BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes. Agents must now send environmentId on registration (400 if missing). Two tables previously keyed on app name alone caused cross-environment data bleed: writing config for (app=X, env=dev) would overwrite the row used by (app=X, env=prod) agents, and agent startup fetches ignored env entirely. - V1 schema: application_config and app_settings are now PK (app, env). - Repositories: env-keyed finders/saves; env is the authoritative column, stamped on the stored JSON so the row agrees with itself. - ApplicationConfigController.getConfig is dual-mode — AGENT role uses JWT env claim (agents cannot spoof env); non-agent callers provide env via ?environment= query param. - AppSettingsController endpoints now require ?environment=. - SensitiveKeysAdminController fan-out iterates (app, env) slices so each env gets its own merged keys. - DiagramController ingestion stamps env on TaggedDiagram; ClickHouse route_diagrams INSERT + findProcessorRouteMapping are env-scoped. - AgentRegistrationController: environmentId is required on register; removed all "default" fallbacks from register/refresh/heartbeat auto-heal. - UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings, useAllAppSettings, useUpdateAppSettings) take env, wired to useEnvironmentStore at all call sites. - New ConfigEnvIsolationIT covers env-isolation for both repositories. Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 22:25:21 +02:00
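The fix has two parts: a composite `(app, env)` key instead of app alone, and dual-mode env resolution in `getConfig`. An in-memory sketch: `ConfigKey`, `EnvScopedConfigSketch`, and `resolveEnv` are hypothetical, and rejecting a missing `?environment=` for non-agent callers is an assumption (the commit states it explicitly only for AppSettingsController).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Composite key mirroring the new (app, env) primary key.
record ConfigKey(String app, String env) {}

final class EnvScopedConfigSketch {
    private final Map<ConfigKey, String> rows = new HashMap<>();

    void save(String app, String env, String json) {
        rows.put(new ConfigKey(app, env), json); // dev and prod rows no longer overwrite each other
    }

    Optional<String> find(String app, String env) {
        return Optional.ofNullable(rows.get(new ConfigKey(app, env)));
    }

    // Agents: the JWT env claim wins and the query param is ignored, so env cannot be spoofed.
    // Other callers: ?environment= is required (rejection on absence is assumed here).
    static String resolveEnv(boolean isAgent, String jwtEnvClaim, String environmentParam) {
        if (isAgent) {
            return jwtEnvClaim;
        }
        if (environmentParam == null || environmentParam.isBlank()) {
            throw new IllegalArgumentException("environment query parameter is required");
        }
        return environmentParam;
    }
}
```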
hsiegeln	e2d9428dff	fix: drop stale instance_id filter from search and scope route stats by app The exchange search silently filtered by the in-memory agent registry's current instance IDs on top of application_id. Historical exchanges written by previous agent instances (or any instance not currently registered, e.g. after a server restart before agents heartbeat back) were hidden from results even though they matched the application filter. Fix: drop the applicationId -> instanceIds resolution in SearchController. Rely on application_id = ? in ClickHouseSearchIndex; keep explicit instanceIds filtering only when a client passes them. Related cleanup: the agentIds parameter on StatsStore.statsForRoute / timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so per-route stats aggregated across any apps sharing a routeId. Replace with String applicationId and add application_id to the stats_1m_route filters so per-route stats are correctly scoped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:49:55 +02:00
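After the fix, the search filter is built only from what the client sent; nothing is expanded from the live agent registry. A hedged sketch that builds a parameterized WHERE clause: `Where` and `SearchFilterSketch` are hypothetical, while the `application_id` and `instance_id` column names come from the commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

record Where(String sql, List<Object> params) {}

final class SearchFilterSketch {
    private SearchFilterSketch() {}

    static Where build(String applicationId, List<String> instanceIds) {
        List<String> clauses = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        if (applicationId != null) {
            // Historical exchanges from any past instance still match on application_id.
            clauses.add("application_id = ?");
            params.add(applicationId);
        }
        // Instance filtering only when the client explicitly passes instance IDs.
        if (instanceIds != null && !instanceIds.isEmpty()) {
            String placeholders = String.join(", ", Collections.nCopies(instanceIds.size(), "?"));
            clauses.add("instance_id IN (" + placeholders + ")");
            params.addAll(instanceIds);
        }
        String sql = clauses.isEmpty() ? "" : "WHERE " + String.join(" AND ", clauses);
        return new Where(sql, params);
    }
}
```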
hsiegeln	b77968bb2d	docs: update rule files with RouteCatalogStore classes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 18:50:39 +02:00
hsiegeln	810f493639	chore: track .claude/rules/ and add self-maintenance instruction Un-ignore .claude/rules/ so path-scoped rule files are shared via git. Add instruction in CLAUDE.md to update rule files when modifying classes, controllers, endpoints, or metrics — keeps rules current as part of normal workflow rather than requiring separate maintenance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 09:26:53 +02:00

17 Commits