cameleer-server

Author	SHA1	Message	Date
hsiegeln	0499a54ebc	feat(license): rewrite LicenseGate around state + effective limits LicenseGate now exposes getState() (delegates to LicenseStateMachine), getEffectiveLimits() (merged over DefaultTierLimits in ACTIVE/GRACE, defaults-only in ABSENT/EXPIRED/INVALID), markInvalid(reason), and clear(). Internal snapshot is an immutable record-like class swapped atomically so concurrent reads see a consistent license+reason pair. Removes the transient openSentinel() and getTier() introduced by earlier tasks (no production consumers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:48:56 +02:00
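The LicenseGate rewrite above publishes its license and invalid-reason as one immutable snapshot and merges license limits over tier defaults only in ACTIVE/GRACE. The class below is a minimal sketch of that pattern under stated assumptions: the state is passed in rather than read from LicenseStateMachine, limits are modelled as a plain Map<String, Integer>, and the names LicenseGateSketch, install and Snapshot are hypothetical. Swapping a whole record through an AtomicReference means a reader's single get() always sees a matching limits/reason pair.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: only markInvalid/clear and the state names come from the commit.
final class LicenseGateSketch {
    enum State { ABSENT, ACTIVE, GRACE, EXPIRED, INVALID }

    /** Immutable license+reason pair, always replaced as a whole. */
    record Snapshot(Map<String, Integer> licenseLimits, String invalidReason) {}

    private final Map<String, Integer> defaultTierLimits;
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(null, null));

    LicenseGateSketch(Map<String, Integer> defaultTierLimits) {
        this.defaultTierLimits = Map.copyOf(defaultTierLimits);
    }

    void install(Map<String, Integer> licenseLimits) {
        current.set(new Snapshot(Map.copyOf(licenseLimits), null));
    }

    void markInvalid(String reason) {
        // Assumption: this sketch drops the limits and records the reason in one swap.
        current.set(new Snapshot(null, reason));
    }

    void clear() {
        current.set(new Snapshot(null, null));
    }

    Snapshot snapshot() {
        return current.get();
    }

    /** License values override defaults only in ACTIVE or GRACE; other states get defaults only. */
    Map<String, Integer> effectiveLimits(State state) {
        Snapshot s = current.get(); // single read, so limits and reason are consistent
        Map<String, Integer> merged = new HashMap<>(defaultTierLimits);
        if ((state == State.ACTIVE || state == State.GRACE) && s.licenseLimits() != null) {
            merged.putAll(s.licenseLimits());
        }
        return Map.copyOf(merged);
    }
}
```

Returning an immutable copy keeps callers from mutating the merged view.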
hsiegeln	cf84d80de7	feat(license): require licenseId + tenantId in validator Spec §2.1 — both fields are required and the validator rejects a token whose tenantId does not match the server's configured tenant (CAMELEER_SERVER_TENANT_ID). Self-hosted customers cannot strip tenantId because the field is in the signed payload. LicenseBeanConfig and LicenseAdminController updated to pass the expected tenant to the validator constructor. The transient placeholder/TODO from Task 2 is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:40:04 +02:00
hsiegeln	2ebe4989bb	feat(license): expand LicenseInfo with licenseId, tenantId, grace period Required fields per spec §2.1. tenantId is non-blank; gracePeriodDays defines the post-exp window during which limits keep applying. isExpired() now honours the grace; isAfterRawExpiry() distinguishes ACTIVE from GRACE for the state machine in Task 4. Validator and gate use placeholder values temporarily; Task 3 wires the validator to read the new fields, Task 5 rewrites the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:33:16 +02:00
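The grace-period semantics described above (isExpired honours the grace window, isAfterRawExpiry separates ACTIVE from GRACE) can be sketched as follows. This is an illustration, not the project's LicenseInfo: the record carries only the two fields needed here, and the Phase enum and phase() helper are hypothetical stand-ins for the Task 4 state machine.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: field names follow the commit, Phase and phase() are hypothetical.
final class LicenseExpirySketch {
    enum Phase { ACTIVE, GRACE, EXPIRED }

    record LicenseInfo(Instant expiresAt, int gracePeriodDays) {
        LicenseInfo {
            if (gracePeriodDays < 0) throw new IllegalArgumentException("gracePeriodDays must be >= 0");
        }

        /** Past the raw exp claim, ignoring the grace window. */
        boolean isAfterRawExpiry(Instant now) {
            return now.isAfter(expiresAt);
        }

        /** Past exp plus the grace window: license limits stop applying. */
        boolean isExpired(Instant now) {
            return now.isAfter(expiresAt.plus(Duration.ofDays(gracePeriodDays)));
        }
    }

    static Phase phase(LicenseInfo info, Instant now) {
        if (info.isExpired(now)) return Phase.EXPIRED;
        return info.isAfterRawExpiry(now) ? Phase.GRACE : Phase.ACTIVE;
    }
}
```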
hsiegeln	551a7f12b5	refactor(license): remove dead Feature enum and isEnabled scaffolding Spec §9 — feature flags are out of scope for license enforcement. Drops Feature.java, LicenseGate.isEnabled, LicenseInfo.hasFeature, and the corresponding test cases. LicenseValidator now silently ignores any features array on the wire (no error). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:21:51 +02:00
hsiegeln	8e9ad47077	feat(runtime): harden tenant containers + auto-detect gVisor (#152) Tenant JARs are arbitrary user code: Camel ships components (camel-exec, camel-bean, MVEL/Groovy templating) that turn a header into shell, and Java 17 has no SecurityManager — the JVM is not a security boundary. This applies an unconditional hardening contract to every tenant container so a single runc CVE no longer equals host takeover. DockerRuntimeOrchestrator.startContainer now sets: - cap_drop ALL (Capability.values() — docker-java has no ALL constant) - security_opt: no-new-privileges, apparmor=docker-default (default seccomp profile applies implicitly) - read_only rootfs, pids_limit=512 - /tmp tmpfs rw,nosuid,size=256m — no noexec, since Netty/Snappy/LZ4/Zstd dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks. The orchestrator also probes `docker info` at construction and uses runsc (gVisor) automatically when the daemon has it registered. Override via cameleer.server.runtime.dockerruntime (e.g. "kata"); empty = auto. Outbound TCP, DNS, and TLS are unaffected — caps/seccomp don't gate those — so vanilla Camel-Kafka producers/consumers and REST integrations keep working unchanged. Stateful tenants (Kafka Streams with on-disk state stores, apps writing to /var/log/...) need explicit writeable volumes; that's tracked in #153 as the natural follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 20:58:26 +02:00
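The runtime auto-detection rule in the hardening commit (an explicit cameleer.server.runtime.dockerruntime wins, blank means use runsc if the daemon registers it) reduces to a small pure function. The sketch below assumes the runtime names have already been parsed out of `docker info`; selectRuntime is a hypothetical name, and it does not touch the docker-java API.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; the docker info parsing is assumed to have happened already.
final class RuntimeSelectionSketch {
    /**
     * configuredRuntime mirrors cameleer.server.runtime.dockerruntime: non-blank wins, blank = auto.
     * registeredRuntimes are the runtime names the Docker daemon reports.
     * An empty result means "leave the runtime unset", so Docker uses its default (normally runc).
     */
    static Optional<String> selectRuntime(String configuredRuntime, Set<String> registeredRuntimes) {
        if (configuredRuntime != null && !configuredRuntime.isBlank()) {
            return Optional.of(configuredRuntime.trim());
        }
        return registeredRuntimes.contains("runsc") ? Optional.of("runsc") : Optional.empty();
    }
}
```

Keeping the decision separate from container creation makes the auto/override rule testable without a Docker daemon.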
hsiegeln	f27a0044f1	refactor(search): align ResponseStatusException imports + add wildcard HTTP test	2026-04-24 10:30:42 +02:00
hsiegeln	5c9323cfed	feat(search): accept attr= multi-value query param on /executions GET Add a repeatable attr query parameter to the GET /executions endpoint that parses key-only (exists check) and key:value (exact or wildcard-via-*) filters. Invalid keys are mapped to HTTP 400 via ResponseStatusException. The POST /executions/search path already honoured attributeFilters from the request body via the Jackson canonical ctor; an IT now proves it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:23:52 +02:00
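The attr= parameter grammar above (key-only means an exists check, key:value an exact match, and a `*` in the value makes it a wildcard) could be parsed roughly like this. The key regex, the Mode enum and the record shape are assumptions for illustration; in the controller the IllegalArgumentException would instead surface as a ResponseStatusException with HTTP 400.

```java
import java.util.regex.Pattern;

// Hypothetical parser; the real key-validation rule is not stated in the commit.
final class AttrParamSketch {
    private static final Pattern KEY = Pattern.compile("[A-Za-z0-9_.-]+");

    enum Mode { EXISTS, EXACT, WILDCARD }

    record AttributeFilter(String key, String value, Mode mode) {}

    static AttributeFilter parse(String raw) {
        int colon = raw.indexOf(':');
        String key = colon < 0 ? raw : raw.substring(0, colon);
        if (!KEY.matcher(key).matches()) {
            // Controller equivalent: throw new ResponseStatusException(HttpStatus.BAD_REQUEST, ...)
            throw new IllegalArgumentException("invalid attribute key: " + key);
        }
        if (colon < 0) {
            return new AttributeFilter(key, null, Mode.EXISTS);
        }
        String value = raw.substring(colon + 1); // split on the first colon only
        return new AttributeFilter(key, value, value.contains("*") ? Mode.WILDCARD : Mode.EXACT);
    }
}
```

For example, `?attr=orderId&attr=region:eu-*` would yield one EXISTS filter and one WILDCARD filter.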
hsiegeln	2dcbd5a772	feat(search): push AttributeFilter list into ClickHouse WHERE clause Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:13:30 +02:00
hsiegeln	d58c8cde2e	feat(server): REST API over server_metrics for SaaS dashboards Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control planes can build the server-health dashboard without direct ClickHouse access. One generic /query endpoint covers every panel in the server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter-delta mode with per-server_instance_id rotation handling, and a derived 'mean' statistic for timers. Regex-validated identifiers, parameterised literals, 31-day range cap, 500-series response cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated: all 17 suggested panels now expressed as single-endpoint queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:41:02 +02:00
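The counter-delta mode with per-server_instance_id rotation handling mentioned above can be illustrated with an in-memory sketch. The assumption here is one possible policy, not necessarily the one the /query endpoint uses: cumulative counters restart when the server instance changes, so the first sample after a rotation becomes a new baseline instead of producing a negative delta. Record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of counter-delta computation over time-sorted samples of one series.
final class CounterDeltaSketch {
    record Sample(long epochSecond, String serverInstanceId, double value) {}
    record Point(long epochSecond, double delta) {}

    static List<Point> deltas(List<Sample> samples) {
        List<Point> out = new ArrayList<>();
        Sample prev = null;
        for (Sample s : samples) {
            // Only diff against the previous sample from the same server instance;
            // a new instance id means the counter restarted, so s becomes the baseline.
            if (prev != null && prev.serverInstanceId().equals(s.serverInstanceId())) {
                out.add(new Point(s.epochSecond(), s.value() - prev.value()));
            }
            prev = s;
        }
        return out;
    }
}
```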
hsiegeln	48ce75bf38	feat(server): persist server self-metrics into ClickHouse Snapshot the full Micrometer registry (cameleer business metrics, alerting metrics, and Spring Boot Actuator defaults) every 60s into a new server_metrics table so server health survives restarts without an external Prometheus. Includes a dashboard-builder reference for the SaaS team. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:20:45 +02:00
hsiegeln	f8e382c217	test(diagrams): add removed-route + point-in-time coverage Store-level: assert findLatestContentHashForAppRoute picks the newest hash across publishing instances (proves the lookup survives agent removal), isolates by (app, env), and returns empty for blank inputs. Controller-level: assert the env-scoped /routes/{routeId}/diagram endpoint resolves without a registry prerequisite, 404s for unknown routes, and that an execution's stored diagramContentHash stays pinned to the point-in-time version after a newer diagram is stored — the "latest" endpoint flips to v2, the by-hash render remains byte-stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:11:06 +02:00
hsiegeln	c7e5c7fa2d	refactor(diagrams): retire findContentHashForRouteByAgents All production callers migrated to findLatestContentHashForAppRoute in the preceding commits. The agent-scoped lookup adds no coverage beyond the latest-per-(app,env,route) resolver, so the dead API is removed along with its test coverage and unused imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:02:47 +02:00
hsiegeln	21db92ff00	fix(traefik): make TLS cert resolver configurable, omit when unset Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on every router. That assumes a resolver literally named `default` exists in the Traefik static config — true for ACME-backed installs, false for dev/local installs that use a file-based TLS store. Traefik logs "Router uses a nonexistent certificate resolver" for the bogus resolver on every managed app, and any future attempt to define a differently-named real resolver would silently skip these routers. Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver` into `ResolvedContainerConfig.certResolver`. When blank the `tls.certresolver` label is omitted entirely; `tls=true` is still emitted so Traefik serves the default TLS-store cert. When set, the label is emitted with the configured resolver name. Not per-app/per-env configurable: there is one Traefik per server instance and one resolver config; app-level override would only let users break their own routers. TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank). Full unit suite 211/0/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:47 +02:00
hsiegeln	165c9f10e3	feat(deploy): externalRouting toggle to keep apps off Traefik Adds a boolean `externalRouting` flag (default `true`) on ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only the identity labels (`managed-by`, `cameleer.`) and skips every `traefik.` label, so the container is not published by Traefik. Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}` can still reach it via Docker DNS on whatever port the app listens on. TDD: new TraefikLabelBuilderTest covers enabled (default labels present), disabled (zero traefik.* labels), and disabled (identity labels retained) cases. Full module unit suite: 208/0/0. Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form state, Resources tab toggle, POST payload, and snapshot-to-form mapping. Rule files updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:03:48 +02:00
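The two Traefik label rules in the commits above (skip every `traefik.` label when externalRouting is false; always emit `tls=true` but add `tls.certresolver` only when a resolver name is configured) can be combined in one small builder. This is a sketch, not TraefikLabelBuilder itself: the router naming, the `cameleer.app` key and the `managed-by` value are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical label builder; key names and values are illustrative only.
final class TraefikLabelSketch {
    static Map<String, String> labels(String appSlug, boolean externalRouting, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        // Identity labels are always present so the server can find its own containers.
        labels.put("managed-by", "cameleer-server");
        labels.put("cameleer.app", appSlug);
        if (!externalRouting) {
            return labels; // no traefik.* labels: Traefik will not publish this container
        }
        String router = "traefik.http.routers." + appSlug;
        labels.put("traefik.enable", "true");
        labels.put(router + ".tls", "true"); // Traefik falls back to its default TLS store cert
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put(router + ".tls.certresolver", certResolver);
        }
        return labels;
    }
}
```

Treating null and blank the same way matches the "empty by default" setting: an unset environment variable and an empty one both omit the resolver label.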
hsiegeln	e36c82c4db	test(deploy): scope schema ITs to current_schema + clear deployments FK in teardown Surface from the Task 0 testcontainers.reuse enable: when the same Postgres container is reused across `mvn verify` runs, Flyway migrates both `public` and `tenant_default` schemas (the app.yml default URL uses ?currentSchema=tenant_default; AbstractPostgresIT overrides to public). Schema-introspection assertions saw duplicate rows/indexes/enums. Plus: OutboundConnectionAdminControllerIT's AfterEach couldn't delete its test users because sibling deployment ITs (Task 4) left deployments.created_by references — FK blocks the DELETE. Clear referencing deployments first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:06:56 +02:00
hsiegeln	382e1801a7	feat(logs): add instanceIds multi-value filter to /logs endpoint Adds List<String> instanceIds to LogSearchRequest (null-normalized to List.of() in compact ctor) and generates an IN clause in both ClickHouseLogStore.search() and countLogs(), mirroring the existing sources pattern. LogQueryController parses ?instanceIds= as a comma-split list. All existing LogSearchRequest call sites updated. New ClickHouseLogStoreInstanceIdsIT covers: multi-value filter, empty filter (all rows), null filter (all rows), single-value filter, and coexistence with the singular instanceId field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:41:09 +02:00
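The instanceIds change above combines two common Java patterns: a record compact constructor that normalizes a null list to List.of(), and a parameterised IN clause built from that list. The sketch below models only the two instance fields; the column name instance_id, the positional `?` placeholders and the AND-combination with the singular instanceId are assumptions, not the exact ClickHouseLogStore SQL.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical reduced request and WHERE builder.
final class LogFilterSketch {
    record LogSearchRequest(String instanceId, List<String> instanceIds) {
        LogSearchRequest {
            instanceIds = instanceIds == null ? List.of() : List.copyOf(instanceIds); // null -> empty
        }
    }

    /** Returns a WHERE fragment and appends bind values to params in placeholder order. */
    static String whereClause(LogSearchRequest req, List<Object> params) {
        StringJoiner where = new StringJoiner(" AND ", " WHERE ", "").setEmptyValue("");
        if (req.instanceId() != null) {
            where.add("instance_id = ?");
            params.add(req.instanceId());
        }
        if (!req.instanceIds().isEmpty()) {
            StringJoiner in = new StringJoiner(", ", "instance_id IN (", ")");
            for (String id : req.instanceIds()) {
                in.add("?"); // one placeholder per value; never concatenate raw ids
                params.add(id);
            }
            where.add(in.toString());
        }
        return where.toString();
    }
}
```

An empty or null list adds no predicate, which is what gives the "all rows" behaviour the IT checks.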
hsiegeln	2312a7304d	fix(deploy): widen promote FAILURE audit detail + clean up test envs	2026-04-23 12:29:46 +02:00
hsiegeln	47d5611462	feat(audit): audit deploy/stop/promote with DEPLOYMENT category Wires AuditService and AppVersionRepository into DeploymentController. Replaces null createdBy placeholder with currentUserId() on createDeployment/promote. Adds audit log entries (SUCCESS + FAILURE) for deploy_app, stop_deployment, and promote_deployment actions. Fixes FK violations in affected ITs by seeding the test-operator and alice users into the users table before deploy calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:24:27 +02:00
hsiegeln	9043dc00b0	test(deploy): clean up seeded users + document null createdBy placeholder Fix Issue 1: Add @AfterEach cleanup for alice/bob users in PostgresDeploymentRepositoryCreatedByIT to prevent test leakage (FK order: deployments -> app_versions -> apps, then users). Fix Issue 2: Add comment at first create(..., null) call site in PostgresDeploymentRepositoryIT documenting the null placeholder for pre-V4 rows where createdBy is nullable. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-23 12:10:21 +02:00
hsiegeln	a141e99a07	feat(deploy): cascade createdBy through Deployment record + service + repo Appends String createdBy to the Deployment record (after createdAt), updates both with-er methods to pass it through, threads the parameter through DeploymentRepository.create, DeploymentService.createDeployment/promote, and PostgresDeploymentRepository (INSERT + SELECT_COLS + mapRow). DeploymentController passes null as placeholder (Task 4 will resolve from SecurityContextHolder). Covers with PostgresDeploymentRepositoryCreatedByIT verifying round-trip via both createDeployment and promote. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:04:15 +02:00
hsiegeln	35748ea7a1	feat(deploy): V4 migration — add created_by to deployments	2026-04-23 11:44:05 +02:00
hsiegeln	242ef1f0af	perf(build): faster Maven + UI + CI pipelines - Maven: enable useIncrementalCompilation; Surefire forkCount=1C + reuseForks=true so unit-test JVMs are reused per CPU core instead of spawning per class (205 tests pass under the new strategy). - Testcontainers: opt-in reuse via .withReuse(true) on Postgres + ClickHouse base; per-developer enable via ~/.testcontainers.properties. - UI: drop redundant `tsc --noEmit` from `npm run build` (Vite already type-checks); split into a dedicated `npm run typecheck` script. - CI: cache ~/.npm and ui/node_modules/.vite alongside Maven; npm ci with --prefer-offline --no-audit --fund=false; paths-ignore for docs-only, .planning/ and .claude/ changes so doc-only pushes skip the pipeline. - Docs: CLAUDE.md + .claude/rules/cicd.md updated with the new build knobs and the Testcontainers reuse opt-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:48:34 +02:00
hsiegeln	c6aef5ab35	fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement - Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment. STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty. Now only FAILED rows are pruned; STOPPED deployments are retained as restorable checkpoints (they still carry deployed_config_snapshot from their RUNNING window). - UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only, which excluded the main case — the previous blue/green deployment now in STOPPED). - UI placement: Checkpoints disclosure now renders inside IdentitySection, matching the design spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:26:46 +02:00
hsiegeln	e9f523f2b8	test(deploy): blue-green + rolling strategy ITs Four ITs covering strategy behavior: - BlueGreenStrategyIT#blueGreen_allHealthy_stopsOldAfterNew: old is stopped only after all new replicas are healthy. - BlueGreenStrategyIT#blueGreen_partialHealthy_preservesOldAndMarksFailed: strict all-healthy — one starting replica aborts the deploy and leaves the previous deployment RUNNING untouched. - RollingStrategyIT#rolling_allHealthy_replacesOneByOne: InOrder on stopContainer confirms old-0 stops before old-1 (the interleaving that distinguishes rolling from blue-green). - RollingStrategyIT#rolling_failsMidRollout_preservesRemainingOld: mid-rollout health failure stops only the in-flight new containers and the already-replaced old-0; old-1 stays untouched. Shortens healthchecktimeout to 2s via @TestPropertySource so failure paths complete in ~25s instead of ~60s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:00:00 +02:00
hsiegeln	ffdaeabc9f	test(deploy): lock in FAILED→null snapshot for health-check-fail path Existing IT only exercises the startContainer-throws path, where the exception bypasses the entire try block. Add a test where startContainer succeeds but getContainerStatus never returns healthy — this covers the early-exit at the HEALTH_CHECK stage, which is the common real-world failure shape and closest to the snapshot-write point. Shortens healthchecktimeout to 2s via @TestPropertySource so the test completes in a few seconds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 00:37:37 +02:00
hsiegeln	d33c039a17	fix(deploy): address final review — sensitiveKeys snapshot, dirty scrubbing, transition race, refetch invalidations - Issue 1: add List<String> sensitiveKeys as 4th field to DeploymentConfigSnapshot; populate from agentConfig.getSensitiveKeys() in DeploymentExecutor; handleRestore hydrates from snap.sensitiveKeys directly; Deployment type in apps.ts gains sensitiveKeys field - Issue 2: after createApp succeeds, refetchQueries(['apps', envSlug]) before navigate so the new app is in cache before the router renders the deployed view (eliminates transient Save-disabled flash) - Issue 3: useDeploymentPageState useEffect now uses prevServerStateRef to detect local edits; background refetches only overwrite form when no local changes are present - Issue 5: handleRedeploy invalidates dirty-state + versions queries after createDeployment resolves; handleSave invalidates dirty-state after staged save - Issue 10: DirtyStateCalculator strips volatile agentConfig keys (version, updatedAt, updatedBy, environment, application) before JSON comparison via scrubAgentConfig(); adds versionBumpDoesNotMarkDirty test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 23:29:01 +02:00
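Issue 10 above strips volatile agentConfig keys before comparing desired and deployed config. A minimal sketch of that idea, assuming the config is available as a Map rather than serialized JSON: the five key names come from the commit, while the class and the map-equality comparison are hypothetical simplifications of DirtyStateCalculator's JSON comparison.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical simplification: compares maps instead of serialized JSON.
final class DirtyCheckSketch {
    // Keys that change on every save without the effective config changing.
    private static final Set<String> VOLATILE_KEYS =
            Set.of("version", "updatedAt", "updatedBy", "environment", "application");

    static Map<String, Object> scrubAgentConfig(Map<String, Object> agentConfig) {
        Map<String, Object> scrubbed = new TreeMap<>(); // sorted keys: ordering cannot cause a diff
        if (agentConfig != null) {
            agentConfig.forEach((k, v) -> {
                if (!VOLATILE_KEYS.contains(k)) scrubbed.put(k, v);
            });
        }
        return scrubbed;
    }

    static boolean isDirty(Map<String, Object> desired, Map<String, Object> deployed) {
        return !Objects.equals(scrubAgentConfig(desired), scrubAgentConfig(deployed));
    }
}
```

A version bump with otherwise identical settings then compares equal, which is the behaviour versionBumpDoesNotMarkDirty pins.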
hsiegeln	6591f2fde3	api(apps): GET /apps/{slug}/dirty-state returns desired-vs-deployed diff Wires DirtyStateCalculator behind an HTTP endpoint on AppController. Adds findLatestSuccessfulByAppAndEnv to PostgresDeploymentRepository, registers DirtyStateCalculator as a Spring bean (with ObjectMapper for JavaTimeModule support), and covers all three scenarios with IT. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:35:35 +02:00
hsiegeln	76352c0d6f	test(config): tighten audit assertions + @DirtiesContext on ApplicationConfigControllerIT - Add @DirtiesContext(AFTER_CLASS) so the SpyBean-forked context is torn down after the 6 tests finish, preventing permanent cache pollution - Replace single-row queryForObject with queryForList + hasSize(1) in both audit tests so spurious extra rows will fail explicitly - Assert auditCount == 0 in the 400 test to lock in the no-audit-on-bad-input invariant Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:18:44 +02:00
hsiegeln	e716dbf8ca	test(config): verify audit action in staged/live config IT Replace the misleading putConfig_staged_auditActionIsStagedAppConfig test (which only checked pushResult.total == 0, a duplicate of _savesButDoesNotPush) with two real audit-log assertions: one verifying "stage_app_config" is written for apply=staged and a new companion test verifying "update_app_config" for the live path. Uses jdbcTemplate to query audit_log directly (Option B). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:13:53 +02:00
hsiegeln	76129d407e	api(config): ?apply=staged|live gates SSE push on PUT /apps/{slug}/config When apply=staged, saves to DB only — no CONFIG_UPDATE dispatched to agents. When apply=live (default, back-compat), preserves today's immediate-push behavior. Unknown apply values return 400. Audit action is stage_app_config vs update_app_config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 22:07:36 +02:00
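The apply parameter semantics above (missing means live for back-compat, staged saves without pushing, anything else is a 400, and each mode has its own audit action) fit naturally in an enum. This sketch is an assumption about structure, not the controller's code: ApplyMode and its fields are hypothetical, matching is exact and case-sensitive, and the IllegalArgumentException stands in for the 400 response.

```java
// Hypothetical enum capturing the staged/live contract from the commit.
final class ApplyModeSketch {
    enum ApplyMode {
        STAGED("stage_app_config", false),
        LIVE("update_app_config", true);

        final String auditAction;
        final boolean pushToAgents; // whether a CONFIG_UPDATE goes out over SSE

        ApplyMode(String auditAction, boolean pushToAgents) {
            this.auditAction = auditAction;
            this.pushToAgents = pushToAgents;
        }

        /** null (parameter absent) means LIVE; unknown values are rejected (400 in the controller). */
        static ApplyMode parse(String raw) {
            if (raw == null) return LIVE;
            return switch (raw) {
                case "staged" -> STAGED;
                case "live" -> LIVE;
                default -> throw new IllegalArgumentException("unknown apply value: " + raw);
            };
        }
    }
}
```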
hsiegeln	9b1240274d	test(deploy): assert containerConfig round-trip + strict RUNNING in snapshot IT Adds the missing containerConfig assertion to snapshot_isPopulated_whenDeploymentReachesRunning (runtimeType + appPort entries), and tightens the await predicate from .isIn(RUNNING, DEGRADED) to .isEqualTo(RUNNING) — the mock returns a healthy container so RUNNING is deterministic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:54:57 +02:00
hsiegeln	a79eafeaf4	runtime(deploy): capture config snapshot on RUNNING transition Injects PostgresApplicationConfigRepository into DeploymentExecutor and calls saveDeployedConfigSnapshot at the COMPLETE stage, before markRunning. Snapshot contains jarVersionId, agentConfig (nullable), and app.containerConfig. The FAILED catch path is left untouched so snapshot stays null on failure. Verified by DeploymentSnapshotIT. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:51:00 +02:00
hsiegeln	9b851c4622	test(deploy): autowire repository in snapshot IT (JavaTimeModule-safe) Replace manual `new PostgresDeploymentRepository(jdbcTemplate, new ObjectMapper())` with `@Autowired PostgresDeploymentRepository repository` to use the Spring-managed bean whose ObjectMapper has JavaTimeModule registered. Also removes the redundant isNotNull() assertion whose work is done by the field-level assertions that follow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:43:40 +02:00
hsiegeln	d3e86b9d77	storage(deploy): persist deployed_config_snapshot as JSONB Wire SELECT_COLS, mapRow deserialization, and saveDeployedConfigSnapshot update method. Adds PostgresDeploymentRepositoryIT with roundtrip, null-default, and clear-to-null tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:39:04 +02:00
hsiegeln	7f9cfc7f18	core(deploy): add deployedConfigSnapshot field to Deployment model Appends DeploymentConfigSnapshot deployedConfigSnapshot to the Deployment record and adds a matching withDeployedConfigSnapshot wither. All positional call sites (repository mapper, test fixture) updated to pass null; Task 1.4 will wire real persistence and Task 1.5 will populate the field on RUNNING transition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 21:31:48 +02:00
hsiegeln	c2eab71a31	env(admin): per-environment color field + V2 migration - V2__add_environment_color.sql adds a CHECK-constrained VARCHAR color column (default 'slate'); existing rows backfill to slate. - Environment record + EnvironmentColor constants (8 preset values) flow through repository, service, and admin API. - UpdateEnvironmentRequest.color nullable: null preserves existing; unknown values → 400. - ITs cover valid / invalid / null-preserves behaviour; existing Environment constructor call-sites updated with the new color arg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 19:24:30 +02:00
hsiegeln	e470fc0dab	alerting(eval): clamp first-run cursor to deployBacklogCap — flood guard New property cameleer.server.alerting.perExchangeDeployBacklogCapSeconds (default 86400 = 24h, 0 disables). On first run (no persisted cursor or malformed), clamp cursorTs to max(rule.createdAt, now - cap) so a long-lived PER_EXCHANGE rule doesn't scan from its creation date forward on first post-deploy tick. Normal-advance path unaffected. Follows up final-review I-1 on the PER_EXCHANGE exactly-once phase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:34:23 +02:00
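The first-run clamp above is the formula max(rule.createdAt, now - cap), with a cap of 0 disabling it. A direct sketch, assuming instants for both timestamps; the method name is hypothetical, and treating negative caps like 0 is an extra assumption the commit does not state.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for the first-run cursor of a PER_EXCHANGE rule.
final class FirstRunCursorSketch {
    /**
     * capSeconds mirrors perExchangeDeployBacklogCapSeconds (default 86400).
     * 0 disables the clamp; this sketch also treats negative values as disabled.
     */
    static Instant firstRunCursor(Instant ruleCreatedAt, Instant now, long capSeconds) {
        if (capSeconds <= 0) return ruleCreatedAt;
        Instant floor = now.minus(Duration.ofSeconds(capSeconds));
        return ruleCreatedAt.isAfter(floor) ? ruleCreatedAt : floor; // max(createdAt, now - cap)
    }
}
```

With the 24h default, a rule created 30 days before a deploy starts scanning one day back instead of 30.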
hsiegeln	cfc619505a	alerting(it): AlertingFullLifecycleIT — exactly-once across ticks, ack isolation End-to-end lifecycle test: 5 FAILED exchanges across 2 ticks produces exactly 5 FIRING instances + 5 PENDING notifications. Tick 3 with no new exchanges produces zero new instances or notifications. Ack on one instance leaves the other four untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:07:45 +02:00
hsiegeln	0f6bafae8e	alerting(api): cross-field validation for PER_EXCHANGE + empty-targets guard PER_EXCHANGE rules: 400 if reNotifyMinutes != 0 or forDurationSeconds != 0. Any rule: 400 if webhooks + targets are both empty (never notifies anyone). Turns green: AlertRuleControllerIT#createPerExchangeRule_with*NonZero_returns400, AlertRuleControllerIT#createAnyRule_withEmptyWebhooksAndTargets_returns400.	2026-04-22 17:31:11 +02:00
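The cross-field rules above can be expressed as a validator that collects violations, any of which the controller would map to HTTP 400. Field names reNotifyMinutes, forDurationSeconds, webhooks and targets come from the commit; the Mode enum (with OTHER standing in for the non-PER_EXCHANGE modes) and the record shape are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request model; only the field names follow the commit.
final class AlertRuleValidationSketch {
    enum Mode { PER_EXCHANGE, OTHER }

    record RuleRequest(Mode mode, int reNotifyMinutes, int forDurationSeconds,
                       List<String> webhooks, List<String> targets) {}

    /** Empty result = valid; any entry would be answered with HTTP 400. */
    static List<String> validate(RuleRequest r) {
        List<String> errors = new ArrayList<>();
        if (r.mode() == Mode.PER_EXCHANGE) {
            if (r.reNotifyMinutes() != 0) errors.add("PER_EXCHANGE rules require reNotifyMinutes = 0");
            if (r.forDurationSeconds() != 0) errors.add("PER_EXCHANGE rules require forDurationSeconds = 0");
        }
        if (isEmpty(r.webhooks()) && isEmpty(r.targets())) {
            errors.add("rule must have at least one webhook or target"); // would never notify anyone
        }
        return errors;
    }

    private static boolean isEmpty(List<String> list) {
        return list == null || list.isEmpty();
    }
}
```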
hsiegeln	377968eb53	alerting(it): RED tests for PER_EXCHANGE cross-field validation + empty targets Three failing IT tests documenting the contract Task 3.3 will satisfy: - createPerExchangeRule_withReNotifyMinutesNonZero_returns400 - createPerExchangeRule_withForDurationSecondsNonZero_returns400 - createAnyRule_withEmptyWebhooksAndTargets_returns400	2026-04-22 17:17:47 +02:00
hsiegeln	e483e52eee	alerting(core): drop unused perExchangeLingerSeconds from ExchangeMatchCondition Dead field — was enforced by compact ctor as required for PER_EXCHANGE, but never read anywhere in the codebase. Removal tightens the API surface and is precondition for the Task 3.3 cross-field validator. Pre-prod; no shim / migration.	2026-04-22 17:10:53 +02:00
hsiegeln	ba4e2bb68f	alerting(eval): atomic per-rule batch commit via @Transactional — Phase 2 close Wraps instance writes, notification enqueues, and cursor advance in one transactional boundary per rule tick. Rollback leaves the rule replayable on next tick. Turns the Phase 2 atomicity IT green (see AlertEvaluatorJobIT #tickRollback_faultOnSecondNotificationInsert_leavesCursorUnchanged).	2026-04-22 17:03:07 +02:00
hsiegeln	989dde23eb	alerting(it): RED test pinning Phase 2 tick-atomicity contract Fault-injection IT asserts that a crash mid-batch rolls back every instance + notification write AND leaves the cursor unchanged. Fails against current (Phase 1 only) code — turns green when Task 2.2 wraps batch processing in @Transactional.	2026-04-22 16:51:09 +02:00
hsiegeln	3c3d90c45b	test(alerting): align AlertEvaluatorJobIT CH cleanup with house style Replace async @AfterEach ALTER...DELETE with @BeforeEach TRUNCATE TABLE executions — matches the convention used in ClickHouseExecutionStoreIT and peers. Env-slug isolation was already preventing cross-test pollution; this change is about hygiene and determinism (TRUNCATE is synchronous).	2026-04-22 16:45:28 +02:00
hsiegeln	5bd0e09df3	alerting(eval): persist advanced cursor via releaseClaim — Phase 1 close Fixes the notification-bleed regression pinned by AlertEvaluatorJobIT#tick2_noNewExchanges_enqueuesZeroAdditionalNotifications.	2026-04-22 16:36:01 +02:00
hsiegeln	4acf0aeeff	alerting(eval): PER_EXCHANGE composite cursor — monotone across same-ms exchanges Tests: - cursorMonotonicity_sameMillisecondExchanges_fireExactlyOncePerTick - firstRun_boundedByRuleCreatedAt_notRetentionHistory	2026-04-22 16:11:01 +02:00
hsiegeln	c2252a0e72	alerting(eval): RED tests for PER_EXCHANGE cursor monotonicity + first-run bound Two failing tests documenting the contract Task 1.5 will satisfy: - cursorMonotonicity_sameMillisecondExchanges_fireExactlyOncePerTick - firstRun_boundedByRuleCreatedAt_notRetentionHistory Compile may fail until Task 1.4 adds AlertRule.withEvalState wither.	2026-04-22 15:58:16 +02:00
hsiegeln	b41f34c090	search: SearchRequest.afterExecutionId — composite (startTime, execId) predicate Adds an optional afterExecutionId field to SearchRequest. When combined with a non-null timeFrom, ClickHouseSearchIndex applies a strictly-after tuple predicate (start_time > ts OR (start_time = ts AND execution_id > id)) so same-millisecond exchanges can be consumed exactly once across ticks. When afterExecutionId is null, timeFrom keeps its existing >= semantics — no behaviour change for any current caller. Also adds the SearchRequest.withCursor(ts, id) wither. Threads the field through existing withInstanceIds / withEnvironment withers. All existing positional call-sites (SearchController, ExchangeMatchEvaluator, ClickHouseSearchIndexIT, ClickHouseChunkPipelineIT) pass null for the new slot. Task 1.2 of docs/superpowers/plans/2026-04-22-per-exchange-exactly-once.md. The evaluator-side wiring that actually supplies the cursor is Task 1.5.	2026-04-22 15:49:05 +02:00
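The strictly-after tuple predicate above is easiest to see in memory: an exchange qualifies if its start time is later than the cursor, or equal with a larger execution id, and results are ordered by the same (startTime, executionId) key so the last row of a batch becomes the next cursor. The sketch below is an in-memory analogue, not the ClickHouse query; Exec and nextBatch are hypothetical, and it assumes ids compare the same way in Java as in the database (true for ASCII ids under byte-wise comparison).

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// In-memory analogue of the composite cursor predicate; names are hypothetical.
final class CompositeCursorSketch {
    record Exec(Instant startTime, String executionId) {}

    /** (start_time > ts) OR (start_time = ts AND execution_id > id) */
    static boolean isStrictlyAfter(Exec e, Instant ts, String afterExecutionId) {
        int cmp = e.startTime().compareTo(ts);
        return cmp > 0 || (cmp == 0 && e.executionId().compareTo(afterExecutionId) > 0);
    }

    /** Next batch after the cursor, ordered by the same composite key the cursor uses. */
    static List<Exec> nextBatch(List<Exec> all, Instant ts, String afterExecutionId, int limit) {
        return all.stream()
                .filter(e -> isStrictlyAfter(e, ts, afterExecutionId))
                .sorted(Comparator.comparing(Exec::startTime).thenComparing(Exec::executionId))
                .limit(limit)
                .toList();
    }
}
```

With a plain `start_time > ts` filter, the two remaining same-millisecond exchanges in the test below would be skipped; with `>=`, the already-consumed one would fire again.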
hsiegeln	6fa8e3aa30	alerting(eval): EvalResult.Batch carries nextEvalState for cursor threading	2026-04-22 15:42:20 +02:00
hsiegeln	a694491140	fix(metrics): MetricsFlushScheduler honour ingestion config flush interval The @Scheduled placeholder read ${ingestion.flush-interval-ms:1000} (unprefixed) but IngestionConfig binds cameleer.server.ingestion.* — YAML tuning of the metrics flush interval was silently ignored and the scheduler fell back to the 1s default in every environment. Corrected to ${cameleer.server.ingestion.flush-interval-ms:1000}. (The initial attempt to bind via SpEL #{@ingestionConfig.flushIntervalMs} failed because beans registered via @EnableConfigurationProperties use a compound bean name "<prefix>-<FQN>", not the simple camelCase form. The property-placeholder path is sufficient — IngestionConfig still owns the Java-side default.) BackpressureIT: drops the obsolete workaround property `ingestion.flush-interval-ms=60000`; the single prefixed override now controls both buffer config and flush cadence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 23:28:00 +02:00


144 Commits