cameleer-server

Author	SHA1	Message	Date
hsiegeln	945ecd78cf	feat(license): LicenseUsageController GET /api/v1/admin/license/usage Returns state, expiresAt/daysRemaining, lastValidatedAt, message (LicenseMessageRenderer.forState), and a limits[] array where each entry carries key/current/cap/source ("license" vs "default"). Adds public AgentRegistryService.liveCount() so max_agents can be reported from the in-memory registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:42:39 +02:00
hsiegeln	cc5d88d708	feat(license): surface execution/log/metric retention days on Environment Adds three int fields to the Environment record + repository row mapper, matching the columns added in V5. Default value is 1 per the V5 NOT NULL DEFAULT 1. Read DTO surfaces the fields via Jackson record serialization; setter endpoint deferred to a follow-up that wires the corresponding license cap checks. The canonical constructor enforces >= 1 for each retention field — V5 guarantees this at the DB level, but the runtime guard catches in-memory construction errors (e.g., test sites that pass 0). Test sites updated to the 12-arg signature with retention defaults of 1. EnvironmentAdminControllerIT gains a regression test asserting the wire shape exposes all three fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:22:40 +02:00
hsiegeln	71f3b70b86	feat(license): enforce max_alert_rules at AlertRuleController.create Adds AlertRuleRepository.count() and a LicenseEnforcer.assertWithinCap call at the top of the POST handler. Default cap = 2; the 3rd rule gets the standard 403 envelope. Sibling alert ITs that legitimately need more than 2 rules get the cap lifted via the test-license helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:50:59 +02:00
hsiegeln	1ff30905f7	feat(license): enforce max_users at user creation paths Wires LicenseEnforcer into UserAdminController.createUser and OidcAuthController auto-signup. Cap fires before any validation so over-cap creates short-circuit cheaply. Audit emission already present (LicenseEnforcer 3-arg ctor from T16 emits cap_exceeded under AuditCategory.LICENSE). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:29:54 +02:00
hsiegeln	afdaee628b	feat(license): enforce max_agents at AgentRegistryService.register Adds a CreateGuard to AgentRegistryService that fires only on NEW registrations: re-registers of an existing agent bypass the cap (they don't grow the registry, and rejecting them would orphan an agent that already counts against the cap). Live-only count for cap enforcement — STALE/DEAD/SHUTDOWN agents are excluded so the cap reflects the working fleet, not historical residue. Reuses the CreateGuard pattern from T18-T19. The global LicenseExceptionAdvice maps the resulting LicenseCapExceededException to 403 with the structured envelope — no AgentRegistrationController changes needed. AgentCapEnforcementIT exercises the HTTP path end-to-end: two registers succeed at cap=2, a third returns 403 with the expected envelope, and a re-register of an already-registered agent succeeds at-cap. Sibling agent-registering ITs (AgentControllerIT, DiagramIT, ExecutionIT, SearchIT, ProtocolIT, BackpressureIT, JwtRefreshIT, RegistrationIT, SecurityIT, SseSigningIT, IngestionSchemaIT) lift max_agents in @BeforeEach and clear the synthetic license in @AfterEach — the in-memory registry is shared across @SpringBootTest reuse boundaries, so without the lift the default-tier max_agents=5 would be exhausted by accumulated test residue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:19:08 +02:00
hsiegeln	80dafe685b	feat(license): enforce max_apps at AppService.createApp Adds CreateGuard hook to AppService.createApp using the same pattern as T18 (EnvironmentService). AppRepository.count() added; the bean wires LicenseEnforcer.assertWithinCap("max_apps", current, 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:36:34 +02:00
hsiegeln	8a64a9e04c	feat(license): enforce max_environments at EnvironmentService.create Adds CreateGuard functional interface to core (preserves the no-Spring boundary between core and app) and wires LicenseEnforcer into the EnvironmentService bean in RuntimeBeanConfig so POST /api/v1/admin/environments rejects with the structured 403 envelope (error/limit/cap/state/message) once the cap is reached. Default tier max_environments=1; the V1 baseline seeds the default env, so the very next create through the API is rejected unless a license lifts the cap. Also adds EnvironmentRepository.count() (with PostgresEnvironmentRepository impl), TestSecurityHelper.installTestLicenseWithCaps(...) so existing ITs that POST envs keep working, and a defensive cleanup in LicenseUsageReaderIT/EnvironmentAdminControllerIT to stay order-independent under Testcontainer reuse (deletes deployments+apps before envs to avoid FK violations). Test: EnvironmentCapEnforcementIT (new) drives the rejection path end-to-end and asserts the 403 body shape produced by LicenseExceptionAdvice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:16:41 +02:00
hsiegeln	2f75b2865b	feat(license): add AuditCategory.LICENSE Tasks downstream (LicenseService, LicenseEnforcer) audit under this category for install_license / replace_license / reject_license / revalidate_license / cap_exceeded actions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:06:07 +02:00
hsiegeln	0499a54ebc	feat(license): rewrite LicenseGate around state + effective limits LicenseGate now exposes getState() (delegates to LicenseStateMachine), getEffectiveLimits() (merged over DefaultTierLimits in ACTIVE/GRACE, defaults-only in ABSENT/EXPIRED/INVALID), markInvalid(reason), and clear(). Internal snapshot is an immutable record-like class swapped atomically so concurrent reads see a consistent license+reason pair. Removes the transient openSentinel() and getTier() introduced by earlier tasks (no production consumers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:48:56 +02:00
hsiegeln	ddc0b686c3	feat(license): add LicenseLimits, DefaultTierLimits, LicenseStateMachine Pure-domain FSM (ABSENT/ACTIVE/GRACE/EXPIRED/INVALID) and the default-tier constants per spec §3. invalidReason wins over any loaded license so signature failures surface as INVALID rather than masking as ABSENT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:47:10 +02:00
hsiegeln	cf84d80de7	feat(license): require licenseId + tenantId in validator Spec §2.1 — both fields are required and the validator rejects a token whose tenantId does not match the server's configured tenant (CAMELEER_SERVER_TENANT_ID). Self-hosted customers cannot strip tenantId because the field is in the signed payload. LicenseBeanConfig and LicenseAdminController updated to pass the expected tenant to the validator constructor. The transient placeholder/TODO from Task 2 is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:40:04 +02:00
hsiegeln	2ebe4989bb	feat(license): expand LicenseInfo with licenseId, tenantId, grace period Required fields per spec §2.1. tenantId is non-blank; gracePeriodDays defines the post-exp window during which limits keep applying. isExpired() now honours the grace; isAfterRawExpiry() distinguishes ACTIVE from GRACE for the state machine in Task 4. Validator and gate use placeholder values temporarily; Task 3 wires the validator to read the new fields, Task 5 rewrites the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:33:16 +02:00
hsiegeln	551a7f12b5	refactor(license): remove dead Feature enum and isEnabled scaffolding Spec §9 — feature flags are out of scope for license enforcement. Drops Feature.java, LicenseGate.isEnabled, LicenseInfo.hasFeature, and the corresponding test cases. LicenseValidator now silently ignores any features array on the wire (no error). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:21:51 +02:00
hsiegeln	c5b6f2bbad	fix(dirty-state): exclude live-pushed fields from deploy diff Live-pushed config fields (taps, tapVersion, tracedProcessors, routeRecording) apply via SSE CONFIG_UPDATE — they take effect on running agents without a redeploy and are fetched on agent restart from application_config. They must not contribute to the "pending deploy" diff against the last-successful-deployment snapshot. Before this fix, applying a tap from the process diagram correctly rolled out in real time but then marked the app "Pending Deploy (1)" because DirtyStateCalculator compared every agentConfig field. This also contradicted the UI rule (ui.md) that the live tabs "never mark dirty". Adds taps, tapVersion, tracedProcessors, routeRecording to AGENT_CONFIG_IGNORED_KEYS. Updates the nested-path test to use a staged field (sensitiveKeys) and adds a new test asserting that divergent live-push fields keep dirty=false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 14:42:07 +02:00
hsiegeln	f9b5f235cc	feat(search): extend SearchRequest with attributeFilters (legacy ctor preserved)	2026-04-24 09:59:05 +02:00
hsiegeln	0b419db9f1	feat(search): add AttributeFilter record with key regex + wildcard pattern translation	2026-04-24 09:51:28 +02:00
hsiegeln	d58c8cde2e	feat(server): REST API over server_metrics for SaaS dashboards Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control planes can build the server-health dashboard without direct ClickHouse access. One generic /query endpoint covers every panel in the server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter-delta mode with per-server_instance_id rotation handling, and a derived 'mean' statistic for timers. Regex-validated identifiers, parameterised literals, 31-day range cap, 500-series response cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated: all 17 suggested panels now expressed as single-endpoint queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:41:02 +02:00
hsiegeln	48ce75bf38	feat(server): persist server self-metrics into ClickHouse Snapshot the full Micrometer registry (cameleer business metrics, alerting metrics, and Spring Boot Actuator defaults) every 60s into a new server_metrics table so server health survives restarts without an external Prometheus. Includes a dashboard-builder reference for the SaaS team. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:20:45 +02:00
hsiegeln	c7e5c7fa2d	refactor(diagrams): retire findContentHashForRouteByAgents All production callers migrated to findLatestContentHashForAppRoute in the preceding commits. The agent-scoped lookup adds no coverage beyond the latest-per-(app,env,route) resolver, so the dead API is removed along with its test coverage and unused imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:02:47 +02:00
hsiegeln	d3ce5e861b	feat(diagrams): add findLatestContentHashForAppRoute with app-route cache Agent-scoped lookups miss diagrams from routes whose publishing agents have been redeployed or removed. The new method resolves by (applicationId, environment, routeId) + created_at DESC, independent of the agent registry. An in-memory cache mirrors the existing hashCache pattern, warm-loaded at startup via argMax. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:58:49 +02:00
hsiegeln	21db92ff00	fix(traefik): make TLS cert resolver configurable, omit when unset Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on every router. That assumes a resolver literally named `default` exists in the Traefik static config — true for ACME-backed installs, false for dev/local installs that use a file-based TLS store. Traefik logs "Router uses a nonexistent certificate resolver" for the bogus resolver on every managed app, and any future attempt to define a differently-named real resolver would silently skip these routers. Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver` into `ResolvedContainerConfig.certResolver`. When blank the `tls.certresolver` label is omitted entirely; `tls=true` is still emitted so Traefik serves the default TLS-store cert. When set, the label is emitted with the configured resolver name. Not per-app/per-env configurable: there is one Traefik per server instance and one resolver config; app-level override would only let users break their own routers. TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank). Full unit suite 211/0/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:47 +02:00
hsiegeln	165c9f10e3	feat(deploy): externalRouting toggle to keep apps off Traefik Adds a boolean `externalRouting` flag (default `true`) on ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only the identity labels (`managed-by`, `cameleer.`) and skips every `traefik.` label, so the container is not published by Traefik. Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}` can still reach it via Docker DNS on whatever port the app listens on. TDD: new TraefikLabelBuilderTest covers enabled (default labels present), disabled (zero traefik.* labels), and disabled (identity labels retained) cases. Full module unit suite: 208/0/0. Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form state, Resources tab toggle, POST payload, and snapshot-to-form mapping. Rule files updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:03:48 +02:00
hsiegeln	382e1801a7	feat(logs): add instanceIds multi-value filter to /logs endpoint Adds List<String> instanceIds to LogSearchRequest (null-normalized to List.of() in compact ctor) and generates an IN clause in both ClickHouseLogStore.search() and countLogs(), mirroring the existing sources pattern. LogQueryController parses ?instanceIds= as a comma-split list. All existing LogSearchRequest call sites updated. New ClickHouseLogStoreInstanceIdsIT covers: multi-value filter, empty filter (all rows), null filter (all rows), single-value filter, and coexistence with the singular instanceId field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:41:09 +02:00
hsiegeln	a141e99a07	feat(deploy): cascade createdBy through Deployment record + service + repo Appends String createdBy to the Deployment record (after createdAt), updates both with-er methods to pass it through, threads the parameter through DeploymentRepository.create, DeploymentService.createDeployment/promote, and PostgresDeploymentRepository (INSERT + SELECT_COLS + mapRow). DeploymentController passes null as placeholder (Task 4 will resolve from SecurityContextHolder). Covers with PostgresDeploymentRepositoryCreatedByIT verifying round-trip via both createDeployment and promote. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:04:15 +02:00
hsiegeln	15d00f039c	feat(audit): add DEPLOYMENT audit category	2026-04-23 11:51:28 +02:00
hsiegeln	c6aef5ab35	fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement - Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment. STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty. Now only FAILED rows are pruned; STOPPED deployments are retained as restorable checkpoints (they still carry deployed_config_snapshot from their RUNNING window). - UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only, which excluded the main case — the previous blue/green deployment now in STOPPED). - UI placement: Checkpoints disclosure now renders inside IdentitySection, matching the design spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:26:46 +02:00
hsiegeln	5304c8ee01	core(deploy): DeploymentStrategy enum with safe wire conversion Typed enum (BLUE_GREEN, ROLLING) with fromWire/toWire kebab-case translation. fromWire falls back to BLUE_GREEN for unknown or null input so the executor dispatch site never null-checks and no misconfigured container-config can throw at runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:42:35 +02:00
hsiegeln	f8dccaae2b	fix(deploy): stop previous active deployment before START_REPLICAS (fixes 409) Container names are deterministic: {tenant}-{envSlug}-{appSlug}-{replica}. The prior code did the stop-existing step at SWAP_TRAFFIC, after START_REPLICAS had already tried to create containers with the same names — so a redeploy against a RUNNING app consistently failed with Docker 409 "container name already in use". Move the stop-existing block to run right after CREATE_NETWORK and before START_REPLICAS. SWAP_TRAFFIC becomes a label-only marker (traffic is swapped implicitly by Traefik labels once new replicas are healthy). Also: add `findActiveByAppIdAndEnvironmentIdExcluding` so the SQL excludes the current deployment by id — previously the Java-side `!id.equals(me)` guard failed because the newly-inserted row has status=STARTING (DB default) and ORDER BY created_at DESC LIMIT 1 picked the new row, hiding the actual previous deployment. Trade-off: this is destroy-then-start rather than true blue/green — brief downtime during the swap. Matches the pre-unified-page behavior and is what users reasonably expect. True blue/green would require per-deployment container names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 01:01:00 +02:00
hsiegeln	d33c039a17	fix(deploy): address final review — sensitiveKeys snapshot, dirty scrubbing, transition race, refetch invalidations - Issue 1: add List<String> sensitiveKeys as 4th field to DeploymentConfigSnapshot; populate from agentConfig.getSensitiveKeys() in DeploymentExecutor; handleRestore hydrates from snap.sensitiveKeys directly; Deployment type in apps.ts gains sensitiveKeys field - Issue 2: after createApp succeeds, refetchQueries(['apps', envSlug]) before navigate so the new app is in cache before the router renders the deployed view (eliminates transient Save-disabled flash) - Issue 3: useDeploymentPageState useEffect now uses prevServerStateRef to detect local edits; background refetches only overwrite form when no local changes are present - Issue 5: handleRedeploy invalidates dirty-state + versions queries after createDeployment resolves; handleSave invalidates dirty-state after staged save - Issue 10: DirtyStateCalculator strips volatile agentConfig keys (version, updatedAt, updatedBy, environment, application) before JSON comparison via scrubAgentConfig(); adds versionBumpDoesNotMarkDirty test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:29:01 +02:00
hsiegeln	97f25b4c7e	test(deploy): register JavaTimeModule in DirtyStateCalculator unit test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:38:57 +02:00
hsiegeln	6591f2fde3	api(apps): GET /apps/{slug}/dirty-state returns desired-vs-deployed diff Wires DirtyStateCalculator behind an HTTP endpoint on AppController. Adds findLatestSuccessfulByAppAndEnv to PostgresDeploymentRepository, registers DirtyStateCalculator as a Spring bean (with ObjectMapper for JavaTimeModule support), and covers all three scenarios with IT. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:35:35 +02:00
hsiegeln	24464c0772	core(deploy): recurse into nested diffs + unquote scalar values in DirtyStateCalculator - compareJson now recurses when both nodes are ObjectNode, so nested maps (tracedProcessors, routeRecording, routeSamplingRates) produce deep paths like agentConfig.tracedProcessors.proc-1 instead of a blob diff - Extract nodeToString helper: value nodes use asText() (strips JSON quotes), null becomes "(none)", arrays/objects get compact JSON - Apply nodeToString in both diff-emission paths (top-level mismatch + leaf) - Add three new tests: nullAgentConfigInSnapshot, nestedAgentField_reportsDeepPath, stringField_differenceValueIsUnquoted (8 tests total, all pass) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:25:04 +02:00
hsiegeln	e4ccce1e3b	core(deploy): add DirtyStateCalculator + DirtyStateResult Pure-logic dirty-state detection: compares desired JAR + agent config + container config against the DeploymentConfigSnapshot from the last successful deployment. Returns a structured DirtyStateResult with per-field differences. 5 unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:20:49 +02:00
hsiegeln	7f9cfc7f18	core(deploy): add deployedConfigSnapshot field to Deployment model Appends DeploymentConfigSnapshot deployedConfigSnapshot to the Deployment record and adds a matching withDeployedConfigSnapshot wither. All positional call sites (repository mapper, test fixture) updated to pass null; Task 1.4 will wire real persistence and Task 1.5 will populate the field on RUNNING transition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:31:48 +02:00
hsiegeln	06fa7d832f	core(deploy): type jarVersionId as UUID (match domain convention) All other FKs to app_versions.id (e.g. Deployment.appVersionId) use UUID; DeploymentConfigSnapshot.jarVersionId was incorrectly typed as String. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:29:26 +02:00
hsiegeln	d580b6e90c	core(deploy): add DeploymentConfigSnapshot record Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:26:30 +02:00
hsiegeln	c2eab71a31	env(admin): per-environment color field + V2 migration - V2__add_environment_color.sql adds a CHECK-constrained VARCHAR color column (default 'slate'); existing rows backfill to slate. - Environment record + EnvironmentColor constants (8 preset values) flow through repository, service, and admin API. - UpdateEnvironmentRequest.color nullable: null preserves existing; unknown values → 400. - ITs cover valid / invalid / null-preserves behaviour; existing Environment constructor call-sites updated with the new color arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 19:24:30 +02:00
hsiegeln	e483e52eee	alerting(core): drop unused perExchangeLingerSeconds from ExchangeMatchCondition Dead field — was enforced by compact ctor as required for PER_EXCHANGE, but never read anywhere in the codebase. Removal tightens the API surface and is precondition for the Task 3.3 cross-field validator. Pre-prod; no shim / migration.	2026-04-22 17:10:53 +02:00
hsiegeln	0bad014811	core(alerting): AlertRule.withEvalState wither for cursor threading	2026-04-22 16:04:55 +02:00
hsiegeln	b41f34c090	search: SearchRequest.afterExecutionId — composite (startTime, execId) predicate Adds an optional afterExecutionId field to SearchRequest. When combined with a non-null timeFrom, ClickHouseSearchIndex applies a strictly-after tuple predicate (start_time > ts OR (start_time = ts AND execution_id > id)) so same-millisecond exchanges can be consumed exactly once across ticks. When afterExecutionId is null, timeFrom keeps its existing >= semantics — no behaviour change for any current caller. Also adds the SearchRequest.withCursor(ts, id) wither. Threads the field through existing withInstanceIds / withEnvironment withers. All existing positional call-sites (SearchController, ExchangeMatchEvaluator, ClickHouseSearchIndexIT, ClickHouseChunkPipelineIT) pass null for the new slot. Task 1.2 of docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md. The evaluator-side wiring that actually supplies the cursor is Task 1.5.	2026-04-22 15:49:05 +02:00
hsiegeln	06c6f53bbc	refactor(ingestion): remove unused TaggedExecution record No callers after the legacy PG ingestion path was retired in `0f635576`. core-classes.md updated to drop the leftover note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 23:33:26 +02:00
hsiegeln	98cbf8f3fc	refactor(search): drop dead SearchIndexer subsystem After the ExecutionController removal (`0f635576`), SearchIndexer subscribed to ExecutionUpdatedEvent but nothing publishes that event. Every SearchIndexerStats metric returned always-zero, and the admin /api/v1/admin/clickhouse/pipeline endpoint that surfaced those stats carried no signal. Backend removed: - core: SearchIndexer, SearchIndexerStats, ExecutionUpdatedEvent - app: IndexerPipelineResponse DTO, /pipeline endpoint on ClickHouseAdminController (field + ctor param) - StorageBeanConfig.searchIndexer bean UI removed: - IndexerPipeline type + useIndexerPipeline hook in api/queries/admin/clickhouse.ts - Indexer Pipeline card in ClickHouseAdminPage.tsx (plus ProgressBar import and pipeline* CSS classes) OpenAPI schema.d.ts + openapi.json regenerated (stale /pipeline path and IndexerPipelineResponse schema removed). SearchIndex interface + ClickHouseSearchIndex impl kept — those are live and used by SearchService + ExchangeMatchEvaluator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 23:32:49 +02:00
hsiegeln	0f635576a3	refactor(ingestion): drop dead legacy execution-ingestion path ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class), and ChunkAccumulator is registered unconditionally — the legacy controller never bound in any profile. Even if it had, IngestionService.ingestExecution called executionStore.upsert(), and the only ExecutionStore impl (ClickHouseExecutionStore) threw UnsupportedOperationException from upsert and upsertProcessors. The entire RouteExecution → upsert path was dead code carrying four transitive dependencies (RouteExecution import, eventPublisher wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook). Removed: - cameleer-server-app/.../controller/ExecutionController.java (whole file) - ExecutionStore.upsert + upsertProcessors (interface methods) - ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides) - IngestionService.ingestExecution + toExecutionRecord + flattenProcessors + hasAnyTraceData + truncateBody + toJson/toJsonObject helpers - IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>); dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit - StorageBeanConfig.ingestionService(...) simplified accordingly Untouched because still in use: - ExecutionRecord / ProcessorRecord records (findById / findProcessors / SearchIndexer / DetailController) - SearchIndexer (its onExecutionUpdated never fires now since no-one publishes ExecutionUpdatedEvent, but SearchIndexerStats is still referenced by ClickHouseAdminController — separate cleanup) - TaggedExecution record has no remaining callers after this change — flagged in core-classes.md as a leftover; separate cleanup. Rule docs updated: - .claude/rules/app-classes.md: retired ExecutionController bullet, fixed stale URL for ChunkIngestionController (it owns /api/v1/data/executions, not /api/v1/ingestion/chunk/executions). - .claude/rules/core-classes.md: IngestionService surface + note the dead TaggedExecution. Full IT suite post-removal: 560 tests run, 11 F + 1 E — same 12 failures in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 22:50:51 +02:00
hsiegeln	fb54f9cbd2	fix(agent): revive DEAD agents on heartbeat (not just STALE) Reproduction: pause a container long enough to cross both the stale and dead thresholds, then unpause. The agent resumes sending heartbeats but the server keeps it shown as DEAD. Only a full container restart (which re-registers) fixes it. Root cause: AgentRegistryService.heartbeat() only revived STALE → LIVE. A DEAD agent's heartbeat updated lastHeartbeat but left state unchanged. checkLifecycle() never downgrades DEAD either (no-op in that branch), so the agent was permanently stuck in DEAD until a register() call. Fix: extend the revival branch to also cover DEAD. Same process; a heartbeat is proof of liveness regardless of the previous state. Also: AgentLifecycleMonitor.mapTransitionEvent() now emits RECOVERED for DEAD → LIVE, mirroring its behavior for STALE → LIVE, so the lifecycle timeline captures the transition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 20:55:47 +02:00
hsiegeln	99b739d946	fix(alerts): backend hardening + complete ACKNOWLEDGED migration - new AlertInstanceRepository.filterInEnvLive(ids, env): single-query bulk ID validation - AlertController.inEnvLiveIds now one SQL round-trip instead of N - bulkMarkRead SQL: defense-in-depth AND deleted_at IS NULL - bulkAck SQL already had deleted_at IS NULL guard — no change needed - PostgresAlertInstanceRepositoryIT: add filterInEnvLive_excludes_other_env_and_soft_deleted - V12MigrationIT: remove alert_reads assertion (table dropped by V17) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 18:48:57 +02:00
hsiegeln	da2819332c	feat(alerts): Postgres repo — read_at/deleted_at columns, filter params, new mutations - save/rowMapper read+write read_at and deleted_at - listForInbox: tri-state acked/read filters; always excludes deleted - countUnreadBySeverity: rewire without alert_reads join, preserve zero-fill - new: markRead/bulkMarkRead/softDelete/bulkSoftDelete/bulkAck/restore - delete PostgresAlertReadRepository + its bean - restore zero-fill Javadoc on interface - mechanical compile-fixes in AlertController, InAppInboxQuery, AlertControllerIT, InAppInboxQueryTest; Task 6 owns the rewrite - PostgresAlertReadRepositoryIT stubbed @Disabled; Task 7 owns migration Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:56:06 +02:00
hsiegeln	55b2a00458	feat(alerts): core repo — filter params + markRead/softDelete/bulkAck/restore; drop AlertReadRepository - listForInbox gains tri-state acked/read filter params - countUnreadBySeverityForUser(envId, userId) → countUnreadBySeverity(envId, userId, groupIds, roleNames) - new methods: markRead, bulkMarkRead, softDelete, bulkSoftDelete, bulkAck, restore - delete AlertReadRepository — read is now global on alert_instances.read_at Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:38:10 +02:00
hsiegeln	82e82350f9	refactor(alerts): drop ACKNOWLEDGED from AlertState, add readAt/deletedAt to AlertInstance - AlertState: remove ACKNOWLEDGED case (V17 migration already dropped it from DB enum) - AlertInstance: insert readAt + deletedAt Instant fields after lastNotifiedAt; add withReadAt/withDeletedAt withers; update all existing withers to pass both fields positionally - AlertStateTransitions: add null,null for readAt/deletedAt in newInstance ctor call; collapse FIRING,ACKNOWLEDGED switch arm to just FIRING - AlertScopeTest: update AlertState.values() assertion to 3 values; fix stale ConditionKind.hasSize(6) to 7 (JVM_METRIC was added earlier) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 17:12:37 +02:00
hsiegeln	414f7204bf	feat(alerting): AGENT_LIFECYCLE condition kind with per-subject fire mode Allows alert rules to fire on agent-lifecycle events — REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather than only on current state. Each matching `(agent, eventType, timestamp)` becomes its own ackable AlertInstance, so outages on distinct agents are independently routable. Core: - New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record (scope, eventTypes, withinSeconds). Compact ctor rejects empty eventTypes and withinSeconds<1. - Strict allowlist enum `AgentLifecycleEventType` (six entries matching the server-emitted types in `AgentRegistrationController` and `AgentLifecycleMonitor`). Custom agent-emitted event types tracked in backlog issue #145. - `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes, from, to, limit)` — new read path ordered `(timestamp ASC, insert_id ASC)` used by the evaluator. Implemented on `ClickHouseAgentEventRepository` with tenant + env filter mandatory. App: - `AgentLifecycleEvaluator` queries events in the last `withinSeconds` window and returns `EvalResult.Batch` with one `Firing` per row. Every Firing carries a canonical `_subjectFingerprint` of `"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event` subtrees for Mustache templating. - `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}`. - Validation is delegated to the record compact ctor + enum at Jackson deserialization time — matches the existing policy of keeping controller validators focused on env-scoped / SQL-injection concerns. Schema: - V16 migration generalises the V15 per-exchange discriminator on `alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a fallback to the legacy `exchange.id` expression. Scalar kinds still resolve to `''` and keep one-open-per-rule. 
Duplicate-key path in `PostgresAlertInstanceRepository.save` is unchanged — the index is the deduper. UI: - New `AgentLifecycleForm.tsx` wizard form with multi-select chips for the six allowed event types + `withinSeconds` input. Wired into `ConditionStep`, `form-state` (validation + defaults: WENT_DEAD, 300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the new option array. - `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`, `{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind, and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`. Tests (all passing): - 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive + empty/zero/unknown-type rejection). - 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty window, multi-agent fingerprint shape, scope forwarding, missing env). - `NotificationContextBuilderTest` switch now covers the new kind. - 119 alerting unit tests + 71 UI tests green. Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.	2026-04-21 14:52:08 +02:00
hsiegeln	f037d8c922	feat(alerting): server-side state+severity filters, ButtonGroup filter UI Backend: `GET /environments/{envSlug}/alerts` now accepts optional multi-value `state=…` and `severity=…` query params. Filters are pushed down to PostgresAlertInstanceRepository, which appends `AND state::text = ANY(?)` / `AND severity::text = ANY(?)` to the inbox query (null/empty = no filter). `AlertInstanceRepository.listForInbox` gained a 7-arg overload; the old 5-arg form is preserved as a default delegate so existing callers (evaluator, AlertingFullLifecycleIT, PostgresAlertInstanceRepositoryIT) compile unchanged. `InAppInboxQuery.listInbox` also has a new filtered overload. UI: InboxPage severity filter migrated from `SegmentedTabs` (single-select, no color cues) to `ButtonGroup` (multi-select with severity-coloured dots), matching the topnavbar status-filter pattern. `useAlerts` forwards the filters as query params and cache-keys on the filter tuple so each combo is independently cached. Unit + hook tests updated to the new contract (5 UI tests + 8 Java unit tests passing). OpenAPI types regenerated from the fresh local backend.	2026-04-21 12:47:31 +02:00
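
The heartbeat revival fix described in commit fb54f9cbd2 can be illustrated with a minimal, single-threaded sketch. It is not the actual `AgentRegistryService` code: the class `HeartbeatRevivalSketch`, the `Agent` record, `forceState`, and the plain string event name are all hypothetical stand-ins, and only the rule itself (a heartbeat revives both STALE and DEAD agents and yields a RECOVERED transition) is taken from the commit message.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names and structure are assumptions, not the server's real classes.
public class HeartbeatRevivalSketch {

    enum AgentState { LIVE, STALE, DEAD }

    // Assumed minimal agent shape: id, lifecycle state, time of last heartbeat.
    record Agent(String id, AgentState state, Instant lastHeartbeat) {}

    // Plain HashMap: this sketch is single-threaded and ignores concurrency.
    private final Map<String, Agent> agents = new HashMap<>();

    void register(String id, Instant now) {
        agents.put(id, new Agent(id, AgentState.LIVE, now));
    }

    // Demo hook standing in for the lifecycle check that downgrades silent agents.
    void forceState(String id, AgentState state) {
        agents.computeIfPresent(id, (k, a) -> new Agent(k, state, a.lastHeartbeat()));
    }

    // Returns "RECOVERED" when the heartbeat moved the agent from STALE or DEAD to LIVE.
    Optional<String> heartbeat(String id, Instant now) {
        Agent before = agents.get(id);
        if (before == null) {
            return Optional.empty(); // unknown agent: it would have to register first
        }
        // Treat a heartbeat as proof of liveness regardless of the previous state,
        // so the revival branch covers DEAD as well as STALE.
        boolean revived = before.state() == AgentState.STALE
                || before.state() == AgentState.DEAD;
        agents.put(id, new Agent(id, AgentState.LIVE, now));
        return revived ? Optional.of("RECOVERED") : Optional.empty();
    }

    AgentState stateOf(String id) {
        return agents.get(id).state();
    }
}
```

In this sketch, an agent forced to DEAD returns to LIVE on its next heartbeat and reports a recovery once; further heartbeats while LIVE report no transition.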

83 Commits