cameleer-server

Author	SHA1	Message	Date
hsiegeln	09b49f096c	feat(alerting): per-severity breakdown on unread-count DTO Spec §13 calls for the notification bell to colour-code by highest unread severity (CRITICAL → error, WARNING → amber, INFO → muted). The old { count } DTO forced the UI to pick one static colour, so NotificationBell shipped with a TODO. Grow the contract instead: UnreadCountResponse = { total, bySeverity: { CRITICAL, WARNING, INFO } } Guarantees: - every severity is always present with a >=0 value (no undefined keys on the wire), so the UI can branch without defaults. - total = sum of bySeverity values — kept explicit on the wire for cheap top-line display, not recomputed client-side. Backend - AlertInstanceRepository: replaces countUnreadForUser(long) with countUnreadBySeverityForUser returning Map<AlertSeverity, Long>. One SQL round-trip per (env, user) — GROUP BY ai.severity over the same NOT EXISTS(alert_reads) filter. - UnreadCountResponse.from(Map) normalises and defensively copies; missing severities default to 0. - InAppInboxQuery.countUnread now returns the DTO, caches the full response (still 5s TTL) so severity breakdown gets the same hit-rate as the total did before. - AlertController just hands the DTO back. Breaking change — no backwards-compat shim: the `count` field is gone. UI and tests updated in the same commit; there are no other API consumers in the tree. Frontend - Regenerated openapi.json + schema.d.ts against a fresh build of the new backend. - NotificationBell branches badge colour on the highest unread severity (CRITICAL > WARNING > INFO) via new CSS variants. - Tests cover all four paths: zero, critical-present, warning-only, info-only. Tests: 7 unit tests + 12 ITs (incl. new grouping + empty-map) + 49 vitest (was 46; +3 severity-branch assertions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 18:15:56 +02:00
hsiegeln	424894a3e2	fix(alerting/I-1): retry endpoint resets attempts to 0 instead of incrementing AlertNotificationRepository gains resetForRetry(UUID, Instant) which sets attempts=0, status=PENDING, next_attempt_at=now, and clears claim/response fields. AlertNotificationController calls resetForRetry instead of scheduleRetry so a manual retry always starts from a clean slate. AlertNotificationControllerIT adds retryResetsAttemptsToZero to verify attempts==0 and status==PENDING after three prior markFailed calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-20 08:25:59 +02:00
hsiegeln	d74079da63	fix(alerting/B-2): implement re-notify cadence sweep and lastNotifiedAt tracking AlertInstanceRepository gains listFiringDueForReNotify(Instant) — only returns instances where last_notified_at IS NOT NULL and cadence has elapsed (IS NULL branch excluded: sweep only re-notifies, initial notify is the dispatcher's job). AlertEvaluatorJob.sweepReNotify() runs at the end of each tick, enqueues fresh notifications for eligible instances and stamps last_notified_at. NotificationDispatchJob stamps last_notified_at on the alert_instance when a notification is DELIVERED, providing the anchor timestamp for cadence checks. PostgresAlertInstanceRepositoryIT adds listFiringDueForReNotify test covering the three-rule eligibility matrix (never-notified, long-ago, recent). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-20 08:25:50 +02:00
hsiegeln	f1abca3a45	refactor(alerting): rename P95_LATENCY_MS → AVG_DURATION_MS to match what stats_1m_route exposes The evaluator mapped P95_LATENCY_MS to ExecutionStats.avgDurationMs because stats_1m_route has no p95 column. Exposing the old name implied p95 semantics operators did not get. Rename to AVG_DURATION_MS makes the contract honest. Updated RouteMetric enum (with javadoc), evaluator switch, and admin guide. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-20 07:36:43 +02:00
hsiegeln	bf178ba141	fix(alerting): populate AlertInstance.rule_snapshot so history survives rule delete - Add withRuleSnapshot(Map) wither to AlertInstance (same pattern as other withers) - Call snapshotRule(rule) + withRuleSnapshot in both applyResult (single-firing) and applyBatchFiring paths so every persisted instance carries a non-empty JSONB snapshot - Strip null values from the Jackson-serialized map before wrapping in the immutable snapshot so Map.copyOf in the compact ctor does not throw NPE on nullable rule fields - Add ruleSnapshotIsPersistedOnInstanceCreation IT: asserts name/severity/conditionKind appear in the rule_snapshot column after a tick fires an instance - Add historySurvivesRuleDelete IT: fires an instance, deletes the rule, asserts rule_id IS NULL and rule_snapshot still contains the rule name (spec §5 guarantee) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 20:09:28 +02:00
hsiegeln	657dc2d407	feat(alerting): AlertingProperties + AlertStateTransitions state machine - AlertingProperties @ConfigurationProperties with effective*() accessors and 5000 ms floor clamp on evaluatorTickIntervalMs; warn logged at startup - AlertStateTransitions pure static state machine: Clear/Firing/Batch/Error branches, PENDING→FIRING promotion on forDuration elapsed; Batch delegated to job - AlertInstance wither helpers: withState, withFiredAt, withResolvedAt, withAck, withSilenced, withTitleMessage, withLastNotifiedAt, withContext - AlertingBeanConfig gains @EnableConfigurationProperties(AlertingProperties), alertingInstanceId bean (hostname:pid), alertingClock bean, PerKindCircuitBreaker bean wired from props - 12 unit tests in AlertStateTransitionsTest covering all transitions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:58:12 +02:00
hsiegeln	c53f642838	chore(alerting): add jmustache 1.16 Declared in cameleer-server-core pom (canonical location for unit-testable rendering without Spring) and mirrored in cameleer-server-app pom so the app module compiles standalone without a full reactor install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:26:57 +02:00
hsiegeln	7b79d3aa64	feat(alerting): countExecutionsForAlerting for exchange-match evaluator Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting — no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window, and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:18:49 +02:00
hsiegeln	45028de1db	feat(alerting): Postgres repository for alert_instances with inbox queries Implements AlertInstanceRepository: save (upsert), findById, findOpenForRule, listForInbox (3-way OR: user/group/role via && array-overlap + ANY), countUnreadForUser (LEFT JOIN alert_reads), ack, resolve, markSilenced, deleteResolvedBefore. Integration test covers all 9 scenarios including inbox fan-out across all three target types. Also adds @JsonIgnoreProperties(ignoreUnknown=true) to SilenceMatcher to suppress Jackson serializing isWildcard() as a round-trip field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:04:51 +02:00
hsiegeln	1ff256dce0	feat(alerting): core repository interfaces	2026-04-19 18:43:36 +02:00
hsiegeln	e7a9042677	feat(alerting): core domain records (rule, instance, silence, notification)	2026-04-19 18:43:03 +02:00
hsiegeln	56a7b6de7d	feat(alerting): sealed AlertCondition hierarchy with Jackson deduction	2026-04-19 18:42:04 +02:00
hsiegeln	530bc32040	feat(alerting): core enums + AlertScope	2026-04-19 18:36:29 +02:00
hsiegeln	5103dc91be	feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories	2026-04-19 18:34:08 +02:00
hsiegeln	ea4c56e7f6	feat(outbound): admin CRUD REST + RBAC + audit New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE. Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR. SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:43:48 +02:00
hsiegeln	380ccb102b	fix(outbound): align user FK with users(user_id) TEXT schema V11 migration referenced users(id) as uuid, but V1 users table has user_id as TEXT primary key. Amending V11 and the OutboundConnection record before Task 7's integration tests catch this at Flyway startup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:18:12 +02:00
hsiegeln	46b8f63fd1	feat(outbound): core domain records for outbound connections Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:10:17 +02:00
hsiegeln	2224f7d902	feat(http): core outbound HTTP interfaces and property records	2026-04-19 15:39:57 +02:00
hsiegeln	6d3956935d	refactor(events): remove dead non-paginated query path AgentEventService.queryEvents, AgentEventRepository.query, and the ClickHouse implementation have had no callers since /agents/events became cursor-paginated. Remove them along with their dedicated IT tests. queryPage and its tests remain as the single query path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-17 13:16:28 +02:00
hsiegeln	67a834153e	feat(events): add AgentEventPage + queryPage interface Introduces cursor-paginated query on AgentEventRepository. The cursor format is owned by the implementation. The existing non-paginated query(...) is kept for internal consumers.	2026-04-17 11:52:42 +02:00
hsiegeln	769752a327	feat(logs): widen source filter to multi-value OR list Replaces LogSearchRequest.source (String) with sources (List<String>) and emits 'source IN (...)' when non-empty. LogQueryController parses ?source=a,b,c the same way it parses ?level=a,b,c.	2026-04-17 11:48:10 +02:00
hsiegeln	62dd71b860	fix: stamp environment on agent_events rows The agent_events table has an `environment` column and AgentEventsController filters on it, but the INSERT never populated it — every row got the column default ('default'). Result: Timeline on the Application Runtime page was empty whenever the user's selected env was anything other than 'default'. Thread env through the write path: - AgentEventRepository.insert + AgentEventService.recordEvent gain an `environment` param; delete the no-env query overload (unused). - ClickHouseAgentEventRepository.insert writes the column (falls back to 'default' on null to match column DEFAULT). - All 5 callers source env from the agent registry (AgentInfo.environmentId) or the registration request body; AgentLifecycleMonitor, deregister, command ack, event ingestion, register/re-register. - Integration test updated for the new signatures. Pre-existing rows in deployed CH will still report environment='default'. New events from this build forward will carry the correct env. Backfill (UPDATE ... FROM apps) is left as a manual DB step if historical timeline is needed for non-default envs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 10:30:56 +02:00
hsiegeln	d02fa73080	fix: scope correlation-chain query to the exchange's own env Correlated exchanges always share the env of the one being viewed — using the globally-selected env from the picker was wrong if the user switched envs after opening a detail view (or arrived via permalink). Thread `environment` through: - `ExecutionStore.ExecutionRecord` gains `environment` field; the ClickHouse `executions` table already stores this, just not read back. - `ClickHouseExecutionStore.findById` SELECT adds the column; mapper populates it. - `ExecutionDetail` gains `environment`; `DetailService` passes through. - `IngestionService.toExecutionRecord` passes null — this legacy PG ingestion path isn't active when ClickHouse is enabled, and the read-side is what drives the correlation UI. - UI `ExchangeHeader` reads `detail.environment ?? storeEnv` and extends the TS type locally (schema.d.ts catches up on next regen). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 10:19:42 +02:00
hsiegeln	6d9e456b97	feat!: move apps & deployments under /api/v1/environments/{envSlug}/apps/{appSlug}/... P3B of the taxonomy migration. App and deployment routes are now env-scoped in the URL itself, making the (env, app_slug) uniqueness key explicit. Previously /api/v1/apps/{appSlug} was ambiguous: with the same app deployed to multiple environments (dev/staging/prod), the handler called AppService.getBySlug(slug) which returns the first row matching slug regardless of env. Server: - AppController: @RequestMapping("/api/v1/environments/{envSlug}/apps"). Every handler now calls appService.getByEnvironmentAndSlug(env.id(), appSlug) — 404 if the app doesn't exist in this env. CreateAppRequest body drops environmentId (it's in the path). - DeploymentController: @RequestMapping("/api/v1/environments/{envSlug}/apps/{appSlug}/deployments"). DeployRequest body drops environmentId. PromoteRequest body switches from targetEnvironmentId (UUID) to targetEnvironment (slug); promote handler resolves the target env by slug and looks up the app with the same slug in the target env (fails with 404 if the target app doesn't exist yet — apps must exist in both source and target before promote). - AppService: added getByEnvironmentAndSlug helper; createApp now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$ (400 on invalid). SPA: - queries/admin/apps.ts: rewritten. Hooks take envSlug where env-scoped. Removed useAllApps (no flat endpoint). Renamed path param naming: appId → appSlug throughout. Added usePromoteDeployment. Query keys include envSlug so cache is env-scoped. - AppsTab.tsx: call sites updated. When no environment is selected, the managed-app list is empty — cross-env discovery lives in the Runtime tab (catalog). handleDeploy/handleStop/etc. pass envSlug to the new hook signatures. BREAKING CHANGE: /api/v1/apps/ paths removed. Clients must use /api/v1/environments/{envSlug}/apps/{appSlug}/. Request bodies for POST /apps and POST /apps/{slug}/deployments no longer accept environmentId (use the URL path instead). Promote body uses slug not UUID. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 23:38:37 +02:00
hsiegeln	6b5ee10944	feat!: environment admin URLs use slug; validate and immutabilize slug UUID-based admin paths were the only remaining UUID-in-URL pattern in the API. Migrates /api/v1/admin/environments/{id} to /{envSlug} so slugs are the single environment identifier in every URL. UUIDs stay internal to the database. - Controller: @PathVariable UUID id → @PathVariable String envSlug on get/update/delete and the two nested endpoints (default-container-config, jar-retention). Handlers resolve slug → Environment via EnvironmentService.getBySlug, then delegate to existing UUID-based service methods. - Service: create() now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$ and returns 400 on invalid slugs. Rationale documented in the class: slugs are immutable after creation because they appear in URLs, Docker network names, container names, and ClickHouse partition keys. - UpdateEnvironmentRequest has no slug field and Jackson's default ignore-unknown behavior drops any slug supplied in a PUT body; regression test (updateEnvironment_withSlugInBody_ignoresSlug) documents this invariant. - SPA: mutation args change from { id } to { slug }. EnvironmentsPage still uses env.id for local selection state (UUID from DB) but passes env.slug to every mutation. BREAKING CHANGE: /api/v1/admin/environments/{id:UUID}/... paths removed. Clients must use /{envSlug}/... (slug from the environments list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 23:23:31 +02:00
hsiegeln	fcb53dd010	fix!: require environment on diagram lookup and attribute keys queries Closes two cross-env data leakage paths. Both endpoints previously returned data aggregated across all environments, so a diagram or attribute key from dev could appear in a prod UI query (and vice versa). B1: GET /api/v1/diagrams?application=&routeId= now requires ?environment= and resolves agents via registryService.findByApplicationAndEnvironment instead of findByApplication. Prevents serving a dev diagram for a prod route. B2: GET /api/v1/search/attributes/keys now requires ?environment=. SearchIndex.distinctAttributeKeys gains an environment parameter and the ClickHouse query adds the env filter alongside the existing tenant_id filter. Prevents prod attribute names leaking into dev autocompletion (and vice versa). SPA hooks updated to thread environment through from useEnvironmentStore; query keys include environment so React Query re-fetches on env switch. No call-site changes needed — hook signatures unchanged. B3 (AgentMetricsController env scope) deferred to P3C: agent-env is effectively 1:1 today via the instance_id naming ({envSlug}-{appSlug}-{replicaIndex}), and the URL migration in P3C to /api/v1/environments/{env}/agents/{agentId}/metrics naturally introduces env from path. A minimal P1 fix would regress the "view metrics of a killed agent" case. BREAKING CHANGE: Both endpoints now require ?environment= (slug). Clients omitting the parameter receive 400. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 23:19:55 +02:00
hsiegeln	9b1ef51d77	feat!: scope per-app config and settings by environment BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes. Agents must now send environmentId on registration (400 if missing). Two tables previously keyed on app name alone caused cross-environment data bleed: writing config for (app=X, env=dev) would overwrite the row used by (app=X, env=prod) agents, and agent startup fetches ignored env entirely. - V1 schema: application_config and app_settings are now PK (app, env). - Repositories: env-keyed finders/saves; env is the authoritative column, stamped on the stored JSON so the row agrees with itself. - ApplicationConfigController.getConfig is dual-mode — AGENT role uses JWT env claim (agents cannot spoof env); non-agent callers provide env via ?environment= query param. - AppSettingsController endpoints now require ?environment=. - SensitiveKeysAdminController fan-out iterates (app, env) slices so each env gets its own merged keys. - DiagramController ingestion stamps env on TaggedDiagram; ClickHouse route_diagrams INSERT + findProcessorRouteMapping are env-scoped. - AgentRegistrationController: environmentId is required on register; removed all "default" fallbacks from register/refresh/heartbeat auto-heal. - UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings, useAllAppSettings, useUpdateAppSettings) take env, wired to useEnvironmentStore at all call sites. - New ConfigEnvIsolationIT covers env-isolation for both repositories. Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 22:25:21 +02:00
hsiegeln	e2d9428dff	fix: drop stale instance_id filter from search and scope route stats by app The exchange search silently filtered by the in-memory agent registry's current instance IDs on top of application_id. Historical exchanges written by previous agent instances (or any instance not currently registered, e.g. after a server restart before agents heartbeat back) were hidden from results even though they matched the application filter. Fix: drop the applicationId -> instanceIds resolution in SearchController. Rely on application_id = ? in ClickHouseSearchIndex; keep explicit instanceIds filtering only when a client passes them. Related cleanup: the agentIds parameter on StatsStore.statsForRoute / timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so per-route stats aggregated across any apps sharing a routeId. Replace with String applicationId and add application_id to the stats_1m_route filters so per-route stats are correctly scoped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:49:55 +02:00
hsiegeln	4ee43bab95	fix: include headers and properties in has_trace_data detection IngestionService.hasAnyTraceData() and ChunkAccumulator only checked for inputBody/outputBody when setting has_trace_data on executions. Headers and properties captured via deep tracing were not considered, causing the trace data indicator to be missing in the exchange list. DetailService already checked all six fields — this aligns the ingestion path to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:26:15 +02:00
hsiegeln	887a9b6faa	feat: add RouteCatalogStore interface and RouteCatalogEntry record	2026-04-16 18:45:42 +02:00
hsiegeln	78396a2796	fix: sidebar route selection and missing routes after server restart Two sidebar bugs fixed: 1. Route entries never highlighted on navigation because sidebar-utils generated /apps/ paths for route children while effectiveSelectedPath normalizes to /exchanges/. The design system does exact string matching. 2. Routes disappeared from sidebar when agents had no recent exchange data. Heartbeat carried routeStates (with route IDs as keys) but AgentRegistryService.heartbeat() never updated AgentInfo.routeIds. After server restart, auto-heal registered agents with empty routes, leaving ClickHouse (24h window) as the only discovery source. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:42:01 +02:00
hsiegeln	859cf7c10d	fix: support pre-3.2 Spring Boot JARs in runtime entrypoint RuntimeDetector now derives the correct PropertiesLauncher FQN from the JAR manifest Main-Class package. Spring Boot 3.2+ uses org.springframework.boot.loader.launch.PropertiesLauncher, pre-3.2 uses org.springframework.boot.loader.PropertiesLauncher. DockerRuntimeOrchestrator uses the detected class instead of a hardcoded 3.2+ reference, falling back to 3.2+ when not auto-detected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:21:01 +02:00
hsiegeln	cb3ebfea7c	chore: rename cameleer3 to cameleer Rename Java packages from com.cameleer3 to com.cameleer, module directories from cameleer3-* to cameleer-*, and all references throughout workflows, Dockerfiles, docs, migrations, and pom.xml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 15:28:42 +02:00

33 Commits