cameleer-server

Author	SHA1	Message	Date
hsiegeln	046f08fe87	feat(license): enforce max_jar_retention_count at PUT jar-retention Returns 422 UNPROCESSABLE_ENTITY when jarRetentionCount exceeds license cap. Default tier cap = 3. The other three retention caps (execution/log/metric retention days) are deferred to T26+ where the corresponding fields are added to Environment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:16:04 +02:00
hsiegeln	56bddcc747	feat(license): enforce compute caps at DeploymentExecutor PRE_FLIGHT Adds ComputeUsage record + computeUsage() helper to LicenseUsageReader that aggregates from PG. DeploymentExecutor.executeAsync runs three assertWithinCap checks (max_total_cpu_millis, max_total_memory_mb, max_total_replicas) right after config resolution. The existing executor try/catch turns a LicenseCapExceededException into a FAILED deployment with the cap message in the failure reason. Adds ComputeCapEnforcementIT (HTTP-driven; @MockBean RuntimeOrchestrator, since cap rejection short-circuits before any orchestrator call) plus defensive license lifts in BlueGreenStrategyIT, RollingStrategyIT, DeploymentSnapshotIT, and DeploymentControllerAuditIT so sequential deploys under testcontainer reuse don't trip the new caps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 15:09:39 +02:00
hsiegeln	71f3b70b86	feat(license): enforce max_alert_rules at AlertRuleController.create Adds AlertRuleRepository.count() and a LicenseEnforcer.assertWithinCap call at the top of the POST handler. Default cap = 2; the 3rd rule gets the standard 403 envelope. Sibling alert ITs that legitimately need more than 2 rules get the cap lifted via the test-license helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:50:59 +02:00
hsiegeln	5a579415a1	feat(license): enforce max_outbound_connections at OutboundConnectionServiceImpl.create Adds LicenseEnforcer.assertWithinCap call at the top of create() using repo.listByTenant(tenantId).size() as the current count. Lifts the cap in OutboundConnectionAdminControllerIT (duplicateNameReturns409 needs 2 creates in one test). LicenseExceptionAdvice maps the rejection to the standard 403 envelope; cap_exceeded audit row emitted via the LicenseEnforcer 3-arg ctor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:40:12 +02:00
hsiegeln	1ff30905f7	feat(license): enforce max_users at user creation paths Wires LicenseEnforcer into UserAdminController.createUser and OidcAuthController auto-signup. Cap fires before any validation so over-cap creates short-circuit cheaply. Audit emission already present (LicenseEnforcer 3-arg ctor from T16 emits cap_exceeded under AuditCategory.LICENSE). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:29:54 +02:00
hsiegeln	afdaee628b	feat(license): enforce max_agents at AgentRegistryService.register Adds a CreateGuard to AgentRegistryService that fires only on NEW registrations: re-registers of an existing agent bypass the cap (they don't grow the registry, and rejecting them would orphan an agent that already counts against the cap). Live-only count for cap enforcement — STALE/DEAD/SHUTDOWN agents are excluded so the cap reflects the working fleet, not historical residue. Reuses the CreateGuard pattern from T18-T19. The global LicenseExceptionAdvice maps the resulting LicenseCapExceededException to 403 with the structured envelope — no AgentRegistrationController changes needed. AgentCapEnforcementIT exercises the HTTP path end-to-end: two registers succeed at cap=2, a third returns 403 with the expected envelope, and a re-register of an already-registered agent succeeds at-cap. Sibling agent-registering ITs (AgentControllerIT, DiagramIT, ExecutionIT, SearchIT, ProtocolIT, BackpressureIT, JwtRefreshIT, RegistrationIT, SecurityIT, SseSigningIT, IngestionSchemaIT) lift max_agents in @BeforeEach and clear the synthetic license in @AfterEach — the in-memory registry is shared across @SpringBootTest reuse boundaries, so without the lift the default-tier max_agents=5 would be exhausted by accumulated test residue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 14:19:08 +02:00
hsiegeln	80dafe685b	feat(license): enforce max_apps at AppService.createApp Adds CreateGuard hook to AppService.createApp using the same pattern as T18 (EnvironmentService). AppRepository.count() added; the bean wires LicenseEnforcer.assertWithinCap("max_apps", current, 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:36:34 +02:00
hsiegeln	198811b752	refactor(license-test): rename installTestLicenseWithCaps -> installSyntheticUnsignedLicense Makes the signature-bypass loud at every call site since T19-T25 will copy this pattern 5+ more times. The helper still loads via LicenseGate.load() directly (no signature check) — the new name ensures any future caller has to acknowledge that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:24:58 +02:00
hsiegeln	8a64a9e04c	feat(license): enforce max_environments at EnvironmentService.create Adds CreateGuard functional interface to core (preserves the no-Spring boundary between core and app) and wires LicenseEnforcer into the EnvironmentService bean in RuntimeBeanConfig so POST /api/v1/admin/environments rejects with the structured 403 envelope (error/limit/cap/state/message) once the cap is reached. Default tier max_environments=1; the V1 baseline seeds the default env, so the very next create through the API is rejected unless a license lifts the cap. Also adds EnvironmentRepository.count() (with PostgresEnvironmentRepository impl), TestSecurityHelper.installTestLicenseWithCaps(...) so existing ITs that POST envs keep working, and a defensive cleanup in LicenseUsageReaderIT/EnvironmentAdminControllerIT to stay order-independent under Testcontainer reuse (deletes deployments+apps before envs to avoid FK violations). Test: EnvironmentCapEnforcementIT (new) drives the rejection path end-to-end and asserts the 403 body shape produced by LicenseExceptionAdvice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:16:41 +02:00
hsiegeln	f291d7c24d	feat(license): LicenseUsageReader aggregates current usage One COUNT per entity table; one SUM-grouped query over non-stopped deployments for compute caps. SQL traverses deployed_config_snapshot->'containerConfig' (corrected from the plan's top-level path; the snapshot record nests containerConfig under that key). agentCount is fed in by the controller since it's an in-memory registry value, not a DB row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:47:59 +02:00
hsiegeln	9b9b56043c	fix(license): explicit @Autowired ctor + tolerate audit failures Two follow-ups to LicenseEnforcer review: - Add @Autowired to the 3-arg ctor so Spring picks it unambiguously (the 2-arg test ctor is otherwise an equally-greedy candidate). - Wrap audit.log() in try/catch + log.warn so a degraded audit DB cannot mask a cap rejection: callers still see HTTP 403 even when audit storage is unhealthy. - Extract counter name to private static final. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:43:27 +02:00
hsiegeln	4985348827	feat(license): LicenseEnforcer single entry point assertWithinCap consults LicenseGate.getEffectiveLimits, throws LicenseCapExceededException on overflow, increments cameleer_license_cap_rejections_total{limit=...} for telemetry, and emits an AuditCategory.LICENSE cap_exceeded audit row when an AuditService is wired (3-arg ctor; the test-only 2-arg ctor passes null and the audit call short-circuits). Unknown limit keys are programmer errors (IllegalArgumentException), not 403s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:36:58 +02:00
hsiegeln	2bad9c3e48	feat(license): cap-exceeded exception + state-aware message renderer LicenseCapExceededException + @ControllerAdvice mapping to 403 with a body that includes state, limit, current, cap, and a per-state human message templated by LicenseMessageRenderer (covers ABSENT/ACTIVE/GRACE/EXPIRED/INVALID with day counts and reason). Adds the forState() overload now (used by the /usage endpoint in Task 30) so both surfaces share identical phrasing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:26:39 +02:00
hsiegeln	b95e80a24a	feat(license): wire LicenseService into boot order (env > file > DB) LicenseBootLoader @PostConstruct calls LicenseService.loadInitial, which delegates to install() so env-var/file/DB paths share a single audit + event-publish code path. A missing public key now produces an always-failing validator (constructed with a throwaway keypair so the parent ctor accepts it) so loaded tokens route to INVALID instead of being silently ignored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:16:49 +02:00
hsiegeln	6fbcf10ee4	feat(license): LicenseService + LicenseChangedEvent Single mediation point for token install/replace/revalidate. Audits under AuditCategory.LICENSE, persists to PG, mutates the LicenseGate, and publishes LicenseChangedEvent so downstream listeners (RetentionPolicyApplier, LicenseMetrics) react uniformly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:11:48 +02:00
hsiegeln	2e51deb511	feat(license): PostgresLicenseRepository + LicenseRecord JdbcTemplate-backed repo; upsert is ON CONFLICT (tenant_id), touch updates only last_validated_at, delete is provided for future operator-clear flow (not exposed as REST in v1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:05:35 +02:00
hsiegeln	20aefd5bf6	feat(license): Flyway V5 — license table + environments retention columns Per-tenant license row stores the signed token, licenseId for audit, installed/expires/last_validated timestamps. environments gains three INTEGER NOT NULL DEFAULT 1 retention columns (execution, log, metric) so existing rows land inside the default-tier cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 11:02:44 +02:00
hsiegeln	0499a54ebc	feat(license): rewrite LicenseGate around state + effective limits LicenseGate now exposes getState() (delegates to LicenseStateMachine), getEffectiveLimits() (merged over DefaultTierLimits in ACTIVE/GRACE, defaults-only in ABSENT/EXPIRED/INVALID), markInvalid(reason), and clear(). Internal snapshot is an immutable record-like class swapped atomically so concurrent reads see a consistent license+reason pair. Removes the transient openSentinel() and getTier() introduced by earlier tasks (no production consumers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:48:56 +02:00
hsiegeln	cf84d80de7	feat(license): require licenseId + tenantId in validator Spec §2.1 — both fields are required and the validator rejects a token whose tenantId does not match the server's configured tenant (CAMELEER_SERVER_TENANT_ID). Self-hosted customers cannot strip tenantId because the field is in the signed payload. LicenseBeanConfig and LicenseAdminController updated to pass the expected tenant to the validator constructor. The transient placeholder/TODO from Task 2 is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:40:04 +02:00
hsiegeln	2ebe4989bb	feat(license): expand LicenseInfo with licenseId, tenantId, grace period Required fields per spec §2.1. tenantId is non-blank; gracePeriodDays defines the post-exp window during which limits keep applying. isExpired() now honours the grace; isAfterRawExpiry() distinguishes ACTIVE from GRACE for the state machine in Task 4. Validator and gate use placeholder values temporarily; Task 3 wires the validator to read the new fields, Task 5 rewrites the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:33:16 +02:00
hsiegeln	551a7f12b5	refactor(license): remove dead Feature enum and isEnabled scaffolding Spec §9 — feature flags are out of scope for license enforcement. Drops Feature.java, LicenseGate.isEnabled, LicenseInfo.hasFeature, and the corresponding test cases. LicenseValidator now silently ignores any features array on the wire (no error). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 10:21:51 +02:00
hsiegeln	f6b76b2d5e	docs(runtime): document hardening contract and runtime override (#152) Surfaces the multi-tenant container hardening contract introduced in the prior commit so operators and integrators know what is enforced and why. - application.yml: declare `cameleer.server.runtime.dockerruntime` alongside the other runtime properties (empty = auto-detect runsc). - HOWTO.md: add the override row to the Runtime config table. - SERVER-CAPABILITIES.md: new "Multi-Tenant Runtime Sandboxing" section describing the cap_drop, no-new-privileges, AppArmor, read-only rootfs, pids_limit, /tmp tmpfs, and runsc auto-detect contract — plus the on-disk state caveat that motivates issue #153. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 21:06:10 +02:00
hsiegeln	8e9ad47077	feat(runtime): harden tenant containers + auto-detect gVisor (#152) Tenant JARs are arbitrary user code: Camel ships components (camel-exec, camel-bean, MVEL/Groovy templating) that turn a header into shell, and Java 17 has no SecurityManager — the JVM is not a security boundary. This applies an unconditional hardening contract to every tenant container so a single runc CVE no longer equals host takeover. DockerRuntimeOrchestrator.startContainer now sets: - cap_drop ALL (Capability.values() — docker-java has no ALL constant) - security_opt: no-new-privileges, apparmor=docker-default (default seccomp profile applies implicitly) - read_only rootfs, pids_limit=512 - /tmp tmpfs rw,nosuid,size=256m — no noexec, since Netty/Snappy/LZ4/Zstd dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks The orchestrator also probes `docker info` at construction and uses runsc (gVisor) automatically when the daemon has it registered. Override via cameleer.server.runtime.dockerruntime (e.g. "kata"); empty = auto. Outbound TCP, DNS, and TLS are unaffected — caps/seccomp don't gate those — so vanilla Camel-Kafka producers/consumers and REST integrations keep working unchanged. Stateful tenants (Kafka Streams with on-disk state stores, apps writing to /var/log/...) need explicit writeable volumes; that's tracked in #153 as the natural follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 20:58:26 +02:00
hsiegeln	f27a0044f1	refactor(search): align ResponseStatusException imports + add wildcard HTTP test	2026-04-24 10:30:42 +02:00
hsiegeln	5c9323cfed	feat(search): accept attr= multi-value query param on /executions GET Add a repeatable attr query parameter to the GET /executions endpoint that parses key-only (exists check) and key:value (exact or wildcard-via-*) filters. Invalid keys are mapped to HTTP 400 via ResponseStatusException. The POST /executions/search path already honoured attributeFilters from the request body via the Jackson canonical ctor; an IT now proves it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:23:52 +02:00
hsiegeln	2dcbd5a772	feat(search): push AttributeFilter list into ClickHouse WHERE clause Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 10:13:30 +02:00
hsiegeln	b5ee9e1d1f	feat(ui): server metrics admin dashboard Adds /admin/server-metrics page mirroring the Database/ClickHouse visibility rules: sidebar entry gated on capabilities.infrastructureEndpoints, backend controller now has @ConditionalOnProperty(infrastructureendpoints) and class-level @PreAuthorize('hasRole(ADMIN)'). Dashboard panels are driven from docs/server-self-metrics.md via the generic /api/v1/admin/server-metrics/{catalog,instances,query} API — Server Health, JVM, HTTP & DB pools, and conditionally Alerting + Deployments when their metrics appear in the catalog. ThemedChart / Line / Area from the design system; hooks in ui/src/api/queries/admin/serverMetrics.ts. Not yet browser-verified against a running dev server — backend IT covers the API end-to-end (8 tests), UI typecheck + production bundle both clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 09:00:14 +02:00
hsiegeln	d58c8cde2e	feat(server): REST API over server_metrics for SaaS dashboards Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control planes can build the server-health dashboard without direct ClickHouse access. One generic /query endpoint covers every panel in the server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag, filter-by-tag, counter-delta mode with per-server_instance_id rotation handling, and a derived 'mean' statistic for timers. Regex-validated identifiers, parameterised literals, 31-day range cap, 500-series response cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated: all 17 suggested panels now expressed as single-endpoint queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:41:02 +02:00
hsiegeln	48ce75bf38	feat(server): persist server self-metrics into ClickHouse Snapshot the full Micrometer registry (cameleer business metrics, alerting metrics, and Spring Boot Actuator defaults) every 60s into a new server_metrics table so server health survives restarts without an external Prometheus. Includes a dashboard-builder reference for the SaaS team. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 23:20:45 +02:00
hsiegeln	f8e382c217	test(diagrams): add removed-route + point-in-time coverage Store-level: assert findLatestContentHashForAppRoute picks the newest hash across publishing instances (proves the lookup survives agent removal), isolates by (app, env), and returns empty for blank inputs. Controller-level: assert the env-scoped /routes/{routeId}/diagram endpoint resolves without a registry prerequisite, 404s for unknown routes, and that an execution's stored diagramContentHash stays pinned to the point-in-time version after a newer diagram is stored — the "latest" endpoint flips to v2, the by-hash render remains byte-stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:11:06 +02:00
hsiegeln	c7e5c7fa2d	refactor(diagrams): retire findContentHashForRouteByAgents All production callers migrated to findLatestContentHashForAppRoute in the preceding commits. The agent-scoped lookup adds no coverage beyond the latest-per-(app,env,route) resolver, so the dead API is removed along with its test coverage and unused imports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:02:47 +02:00
hsiegeln	0995ab35c4	fix(catalog): preserve fromEndpointUri for removed routes Both catalog controllers resolved the from-endpoint URI via findContentHashForRouteByAgents, which filtered by the currently-live agent instance_ids. Routes removed between app versions therefore lost their fromUri even though the diagram row still exists. Route through findLatestContentHashForAppRoute so resolution depends only on (app, env, route) — stays populated for historical routes. CatalogController now resolves the per-row env slug up-front so the fromUri lookup works even for cross-env queries against managed apps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:01:19 +02:00
hsiegeln	480a53c80c	fix(diagrams): by-route lookup no longer requires live agents The env-scoped /routes/{routeId}/diagram endpoint filtered diagrams by the currently-live agent instance_ids. Routes removed between app versions have no live publisher, so the lookup returned 404 even though the historical diagram row still exists in route_diagrams. Sidebar entries for removed routes showed "no diagram" as a result. Switch to findLatestContentHashForAppRoute which resolves directly off (applicationId, environment, routeId) + created_at DESC, independent of the agent registry. The controller no longer depends on AgentRegistryService. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:59:43 +02:00
hsiegeln	d3ce5e861b	feat(diagrams): add findLatestContentHashForAppRoute with app-route cache Agent-scoped lookups miss diagrams from routes whose publishing agents have been redeployed or removed. The new method resolves by (applicationId, environment, routeId) + created_at DESC, independent of the agent registry. An in-memory cache mirrors the existing hashCache pattern, warm-loaded at startup via argMax. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:58:49 +02:00
hsiegeln	21db92ff00	fix(traefik): make TLS cert resolver configurable, omit when unset Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on every router. That assumes a resolver literally named `default` exists in the Traefik static config — true for ACME-backed installs, false for dev/local installs that use a file-based TLS store. Traefik logs "Router uses a nonexistent certificate resolver" for the bogus resolver on every managed app, and any future attempt to define a differently-named real resolver would silently skip these routers. Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver` into `ResolvedContainerConfig.certResolver`. When blank the `tls.certresolver` label is omitted entirely; `tls=true` is still emitted so Traefik serves the default TLS-store cert. When set, the label is emitted with the configured resolver name. Not per-app/per-env configurable: there is one Traefik per server instance and one resolver config; app-level override would only let users break their own routers. TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank). Full unit suite 211/0/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:18:47 +02:00
hsiegeln	165c9f10e3	feat(deploy): externalRouting toggle to keep apps off Traefik Adds a boolean `externalRouting` flag (default `true`) on ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only the identity labels (`managed-by`, `cameleer.`) and skips every `traefik.` label, so the container is not published by Traefik. Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}` can still reach it via Docker DNS on whatever port the app listens on. TDD: new TraefikLabelBuilderTest covers enabled (default labels present), disabled (zero traefik.* labels), and disabled (identity labels retained) cases. Full module unit suite: 208/0/0. Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form state, Resources tab toggle, POST payload, and snapshot-to-form mapping. Rule files updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:03:48 +02:00
hsiegeln	0cf64b2928	fix(audit): exclude env-scoped executions/search from safety-net log The exclusion list still named the legacy flat `/api/v1/search/executions` URL, which no longer exists — the endpoint moved to env-scoped `/api/v1/environments/{envSlug}/executions/search`. Exact-match Set lookup never matched, so every UI search POST produced an audit row. Switch to AntPathMatcher over a pattern list so the dynamic envSlug is handled correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:35:44 +02:00
hsiegeln	e36c82c4db	test(deploy): scope schema ITs to current_schema + clear deployments FK in teardown Surface from the Task 0 testcontainers.reuse enable: when the same Postgres container is reused across `mvn verify` runs, Flyway migrates both `public` and `tenant_default` schemas (the app.yml default URL uses ?currentSchema=tenant_default; AbstractPostgresIT overrides to public). Schema-introspection assertions saw duplicate rows/indexes/enums. Plus: OutboundConnectionAdminControllerIT's AfterEach couldn't delete its test users because sibling deployment ITs (Task 4) left deployments.created_by references — FK blocks the DELETE. Clear referencing deployments first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:06:56 +02:00
hsiegeln	ed0e616109	refactor(logs): drop dead null guards on instanceIds filter (record normalizes)	2026-04-23 12:52:18 +02:00
hsiegeln	382e1801a7	feat(logs): add instanceIds multi-value filter to /logs endpoint Adds List<String> instanceIds to LogSearchRequest (null-normalized to List.of() in compact ctor) and generates an IN clause in both ClickHouseLogStore.search() and countLogs(), mirroring the existing sources pattern. LogQueryController parses ?instanceIds= as a comma-split list. All existing LogSearchRequest call sites updated. New ClickHouseLogStoreInstanceIdsIT covers: multi-value filter, empty filter (all rows), null filter (all rows), single-value filter, and coexistence with the singular instanceId field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:41:09 +02:00
hsiegeln	2312a7304d	fix(deploy): widen promote FAILURE audit detail + clean up test envs	2026-04-23 12:29:46 +02:00
hsiegeln	47d5611462	feat(audit): audit deploy/stop/promote with DEPLOYMENT category Wires AuditService and AppVersionRepository into DeploymentController. Replaces null createdBy placeholder with currentUserId() on createDeployment/promote. Adds audit log entries (SUCCESS + FAILURE) for deploy_app, stop_deployment, and promote_deployment actions. Fixes FK violations in affected ITs by seeding the test-operator and alice users into the users table before deploy calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:24:27 +02:00
hsiegeln	9043dc00b0	test(deploy): clean up seeded users + document null createdBy placeholder Fix Issue 1: Add @AfterEach cleanup for alice/bob users in PostgresDeploymentRepositoryCreatedByIT to prevent test leakage (FK order: deployments -> app_versions -> apps, then users). Fix Issue 2: Add comment at first create(..., null) call site in PostgresDeploymentRepositoryIT documenting the null placeholder for pre-V4 rows where createdBy is nullable. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-23 12:10:21 +02:00
hsiegeln	a141e99a07	feat(deploy): cascade createdBy through Deployment record + service + repo Appends String createdBy to the Deployment record (after createdAt), updates both with-er methods to pass it through, threads the parameter through DeploymentRepository.create, DeploymentService.createDeployment/promote, and PostgresDeploymentRepository (INSERT + SELECT_COLS + mapRow). DeploymentController passes null as placeholder (Task 4 will resolve from SecurityContextHolder). Covers with PostgresDeploymentRepositoryCreatedByIT verifying round-trip via both createDeployment and promote. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 12:04:15 +02:00
hsiegeln	35748ea7a1	feat(deploy): V4 migration — add created_by to deployments	2026-04-23 11:44:05 +02:00
hsiegeln	242ef1f0af	perf(build): faster Maven + UI + CI pipelines - Maven: enable useIncrementalCompilation; Surefire forkCount=1C + reuseForks=true so unit-test JVMs are reused per CPU core instead of spawning per class (205 tests pass under the new strategy). - Testcontainers: opt-in reuse via .withReuse(true) on Postgres + ClickHouse base; per-developer enable via ~/.testcontainers.properties. - UI: drop redundant `tsc --noEmit` from `npm run build` (Vite already type-checks); split into a dedicated `npm run typecheck` script. - CI: cache ~/.npm and ui/node_modules/.vite alongside Maven; npm ci with --prefer-offline --no-audit --fund=false; paths-ignore for docs-only, .planning/ and .claude/ changes so doc-only pushes skip the pipeline. - Docs: CLAUDE.md + .claude/rules/cicd.md updated with the new build knobs and the Testcontainers reuse opt-in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:48:34 +02:00
hsiegeln	c6aef5ab35	fix(deploy): Checkpoints — preserve STOPPED history, fix filter + placement - Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment. STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty. Now only FAILED rows are pruned; STOPPED deployments are retained as restorable checkpoints (they still carry deployed_config_snapshot from their RUNNING window). - UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only, which excluded the main case — the previous blue/green deployment now in STOPPED). - UI placement: Checkpoints disclosure now renders inside IdentitySection, matching the design spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:26:46 +02:00
hsiegeln	e9f523f2b8	test(deploy): blue-green + rolling strategy ITs Four ITs covering strategy behavior: - BlueGreenStrategyIT#blueGreen_allHealthy_stopsOldAfterNew: old is stopped only after all new replicas are healthy. - BlueGreenStrategyIT#blueGreen_partialHealthy_preservesOldAndMarksFailed: strict all-healthy — one starting replica aborts the deploy and leaves the previous deployment RUNNING untouched. - RollingStrategyIT#rolling_allHealthy_replacesOneByOne: InOrder on stopContainer confirms old-0 stops before old-1 (the interleaving that distinguishes rolling from blue-green). - RollingStrategyIT#rolling_failsMidRollout_preservesRemainingOld: mid-rollout health failure stops only the in-flight new containers and the already-replaced old-0; old-1 stays untouched. Shortens healthchecktimeout to 2s via @TestPropertySource so failure paths complete in ~25s instead of ~60s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:00:00 +02:00
hsiegeln	653f983a08	deploy: rolling strategy (per-replica replacement) Replace the Phase 3 stub with a working rolling implementation. Flow: - Capture previous deployment's per-index container ids up front. - For i = 0..replicas-1: - Start new[i] (gen-suffixed name, coexists with old[i]). - Wait for new[i] healthy (new waitForOneHealthy helper). - On success: stop old[i] if present, continue. - On failure: stop in-flight new[0..i], leave un-replaced old[i+1..N] running, mark FAILED. Already-replaced old replicas are not restored — rolling is not reversible; user redeploys to recover. - After the loop: sweep any leftover old replicas (when replica count shrank) and mark the old deployment STOPPED. Resource peak: replicas + 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:53:52 +02:00
hsiegeln	459cdfe427	deploy: blue-green strategy (start → health-all → stop old) Phase 3 of deployment-strategies plan. Refactor executeAsync to dispatch on DeploymentStrategy.fromWire(config.deploymentStrategy()). Blue-green (default): - Start all N new replicas (gen-suffixed names coexist with old). - Wait for ALL healthy (strict — partial-healthy = FAILED, preserves previous deployment untouched). - Only then find + stop the previous deployment. - Final status is always RUNNING; DEGRADED is now reserved for post-deploy replica crashes (set by DockerEventMonitor). Rolling: stub — throws UnsupportedOperationException for now, gets its real implementation in Phase 4. Refactor details: - Extract DeployCtx record to carry 13 per-deploy values around. - Extract startReplica(ctx, i, stateOut) — shared by both strategy paths. - Extract persistSnapshotAndMarkRunning(ctx, primaryCid) — shared finalizer. - Rename waitForAnyHealthy → waitForAllHealthy (the name was misleading; the method already waited for all, just returned partial on timeout). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 09:51:24 +02:00
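
The cap-enforcement pattern that recurs through the feat(license) commits above (assertWithinCap(limit, current, delta), a LicenseCapExceededException on overflow, an IllegalArgumentException for unknown limit keys) can be sketched roughly as follows. This is a minimal illustration, not the cameleer-server source: the class CapEnforcementSketch, the Map-based limits argument, and the exception's fields are assumptions made for the example. Only the method name, the exception names, and the default-tier values (max_environments=1, max_alert_rules=2, max_agents=5) come from the commit messages. The real enforcer also records a metric and an audit row, which the sketch leaves out.

```java
import java.util.Map;

// Hypothetical sketch: names echo the commit messages, but every body
// here is an assumption and not the actual cameleer-server code.
class CapEnforcementSketch {

    /** Raised when a create would exceed the cap; per the log, an advice maps it to HTTP 403. */
    static class LicenseCapExceededException extends RuntimeException {
        final String limit;
        final long current;
        final long cap;

        LicenseCapExceededException(String limit, long current, long cap) {
            super(limit + " exceeded: current=" + current + ", cap=" + cap);
            this.limit = limit;
            this.current = current;
            this.cap = cap;
        }
    }

    /** Default-tier caps quoted in the commit messages (assumed to be keyed like this). */
    static final Map<String, Long> DEFAULT_TIER = Map.of(
            "max_environments", 1L,
            "max_alert_rules", 2L,
            "max_agents", 5L);

    /** Rejects when current usage plus the requested delta would pass the cap. */
    static void assertWithinCap(Map<String, Long> limits, String limit, long current, long delta) {
        Long cap = limits.get(limit);
        if (cap == null) {
            // Unknown keys are treated as programmer errors, not cap rejections.
            throw new IllegalArgumentException("unknown limit key: " + limit);
        }
        if (current + delta > cap) {
            throw new LicenseCapExceededException(limit, current, cap);
        }
    }

    public static void main(String[] args) {
        // With one rule present, a second still fits under max_alert_rules=2.
        assertWithinCap(DEFAULT_TIER, "max_alert_rules", 1, 1);
        try {
            // With two rules present, a third is over the cap.
            assertWithinCap(DEFAULT_TIER, "max_alert_rules", 2, 1);
        } catch (LicenseCapExceededException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking current + delta against the cap before any other work matches the log's note that over-cap creates short-circuit cheaply. It also reproduces the described max_environments behaviour: with the seeded default environment counted as 1, the next create is rejected at cap 1.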

223 Commits