100 Commits

Author SHA1 Message Date
hsiegeln
841793d7b9 feat(alerting): AlertController in-app inbox with ack/read/bulk-read (Task 33)
- GET /environments/{envSlug}/alerts — inbox filtered by userId/groupIds/roleNames via InAppInboxQuery
- GET /unread-count — memoized unread count (5s TTL)
- GET /{id}, POST /{id}/ack, POST /{id}/read, POST /bulk-read
- bulkRead filters instanceIds to env before delegating to AlertReadRepository
- VIEWER+ for all endpoints; env isolation enforced by requireInstance
- 7 IT tests: list, env isolation, unread-count, ack flow, read, bulk-read, viewer access

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 21:28:55 +02:00
hsiegeln
c1b34f592b feat(alerting): AlertRuleController with attribute-key SQL injection validation (Task 32)
- POST/GET/PUT/DELETE /environments/{envSlug}/alerts/rules CRUD
- POST /{id}/enable, /{id}/disable, /{id}/render-preview, /{id}/test-evaluate
- Attribute-key validation: rejects keys not matching ^[a-zA-Z0-9._-]+$ at rule-save time
  (CRITICAL: ExchangeMatchCondition attribute keys are inlined into ClickHouse SQL)
- Webhook validation: verifies outboundConnectionId exists and is allowed in env
- Null-safe notification template defaults to "" for NOT NULL DB constraint
- Fixed misleading comment in ClickHouseSearchIndex to document validation contract
- OPERATOR+ for mutations, VIEWER+ for reads
- Audit: ALERT_RULE_CREATE/UPDATE/DELETE/ENABLE/DISABLE with AuditCategory.ALERT_RULE_CHANGE
- 11 IT tests covering RBAC, SQL-injection prevention, enable/disable, audit, render-preview

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 21:28:46 +02:00
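
A minimal sketch of the attribute-key check described in c1b34f592b, assuming a standalone helper. The class and method names are hypothetical; only the pattern ^[a-zA-Z0-9._-]+$ is taken from the commit message.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: names are illustrative, not taken from the repository. */
final class AttributeKeyValidator {
    // Pattern quoted in the commit message: letters, digits, dot, underscore, hyphen.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9._-]+$");

    private AttributeKeyValidator() {}

    /** True only when the whole key consists of allowed characters. */
    static boolean isSafe(String key) {
        return key != null && SAFE_KEY.matcher(key).matches();
    }

    /** Rejects unsafe keys before they can be inlined into a SQL string. */
    static void requireSafe(String key) {
        if (!isSafe(key)) {
            throw new IllegalArgumentException("Invalid attribute key: " + key);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSafe("order.id"));
        System.out.println(isSafe("x') OR 1=1 --"));
    }
}
```

Because the keys are inlined as SQL literals rather than bound as parameters, an allow-list checked at save time is the only barrier, which is presumably why the commit flags it as CRITICAL.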
hsiegeln
d3dd8882bd feat(alerting): InAppInboxQuery with 5s unread-count memoization
listInbox resolves user groups+roles via RbacService.getEffectiveGroupsForUser
/ getEffectiveRolesForUser then delegates to AlertInstanceRepository.
countUnread memoized per (envId, userId) with 5s TTL via ConcurrentHashMap
using a controllable Clock. 6 unit tests covering delegation, cache hit,
TTL expiry, and isolation between users/envs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:25:00 +02:00
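
The memoization described in d3dd8882bd could look roughly like the sketch below. It assumes a loader function standing in for the repository call; UnreadCountCache, Key, and the loader parameter are hypothetical names. Only the (envId, userId) key, the ConcurrentHashMap, the TTL, and the injected Clock come from the commit message.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical TTL cache keyed by (envId, userId). */
final class UnreadCountCache {
    record Key(String envId, String userId) {}
    private record Entry(long count, Instant expiresAt) {}

    private final ConcurrentHashMap<Key, Entry> cache = new ConcurrentHashMap<>();
    private final Clock clock;
    private final Duration ttl;
    private final ToLongFunction<Key> loader; // stands in for the real repository call

    UnreadCountCache(Clock clock, Duration ttl, ToLongFunction<Key> loader) {
        this.clock = clock;
        this.ttl = ttl;
        this.loader = loader;
    }

    long countUnread(String envId, String userId) {
        Instant now = clock.instant();
        // Reuse the entry while it is still fresh; otherwise reload and restamp it.
        Entry entry = cache.compute(new Key(envId, userId), (key, old) ->
                (old != null && now.isBefore(old.expiresAt()))
                        ? old
                        : new Entry(loader.applyAsLong(key), now.plus(ttl)));
        return entry.count();
    }

    public static void main(String[] args) {
        UnreadCountCache cache = new UnreadCountCache(
                Clock.systemUTC(), Duration.ofSeconds(5), key -> 3L);
        System.out.println(cache.countUnread("prod", "alice"));
    }
}
```

Injecting the Clock is what lets a unit test advance time past the TTL without sleeping. Note that this sketch runs the loader inside compute, which blocks other updates to the same key while the count is loaded.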
hsiegeln
6b48bc63bf feat(alerting): NotificationDispatchJob outbox loop with silence + retry
Claim-polling SchedulingConfigurer: claims due notifications, resolves
instance/connection/rule, checks active silences, dispatches via
WebhookDispatcher, classifies outcomes into DELIVERED/FAILED/retry.
Guards null rule/env after deletion. 5 Testcontainers ITs: 200/503/404
outcomes, active silence suppression, deleted connection fast-fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:24:54 +02:00
hsiegeln
466aceb920 feat(alerting): WebhookDispatcher with HMAC + TLS + retry classification
Renders URL/headers/body with Mustache, optionally HMAC-signs the body
(X-Cameleer-Signature), supports POST/PUT/PATCH, classifies 2xx/4xx/5xx
into DELIVERED/FAILED/retry. 8 WireMock-backed IT tests including HTTPS
TRUST_ALL against WireMock self-signed cert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:24:47 +02:00
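
A sketch of the outcome classification named in 466aceb920, assuming the three classes map in the order listed (2xx to DELIVERED, 4xx to FAILED, 5xx to retry). The enum and method names are hypothetical, and treating every other status (1xx, 3xx) as FAILED is an assumption of this sketch, not something the commit states.

```java
/** Hypothetical classifier for webhook responses. */
final class DeliveryClassifier {
    enum Outcome { DELIVERED, FAILED, RETRY }

    private DeliveryClassifier() {}

    static Outcome classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.DELIVERED;   // receiver accepted the notification
        }
        if (httpStatus >= 500 && httpStatus < 600) {
            return Outcome.RETRY;       // server-side problem, likely transient
        }
        return Outcome.FAILED;          // 4xx, and (by assumption) anything else
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 404, 503}) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```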
hsiegeln
6f1feaa4b0 feat(alerting): HmacSigner for webhook signature
HmacSHA256 signer returning sha256=<lowercase-hex>. 5 unit tests covering
known vector, prefix, hex casing, and different secrets/bodies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:24:39 +02:00
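
For illustration, an HmacSHA256 signer producing the sha256=<lowercase-hex> format from 6f1feaa4b0 can be written with the JDK alone. The class name, method signature, and the choice of UTF-8 for both secret and body are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical signer sketch; not the repository's HmacSigner. */
final class HmacSignerSketch {
    private static final String ALGORITHM = "HmacSHA256";

    private HmacSignerSketch() {}

    static String sign(String secret, String body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            byte[] digest = mac.doFinal(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder out = new StringBuilder("sha256=");
            for (byte b : digest) {
                out.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return out.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sign("secret", "{\"alert\":\"demo\"}"));
    }
}
```

A receiver would recompute the signature over the raw request body and compare it with the X-Cameleer-Signature header, ideally with a constant-time comparison such as MessageDigest.isEqual.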
hsiegeln
bf178ba141 fix(alerting): populate AlertInstance.rule_snapshot so history survives rule delete
- Add withRuleSnapshot(Map) wither to AlertInstance (same pattern as other withers)
- Call snapshotRule(rule) + withRuleSnapshot in both applyResult (single-firing) and
  applyBatchFiring paths so every persisted instance carries a non-empty JSONB snapshot
- Strip null values from the Jackson-serialized map before wrapping in the immutable
  snapshot so Map.copyOf in the compact ctor does not throw NPE on nullable rule fields
- Add ruleSnapshotIsPersistedOnInstanceCreation IT: asserts name/severity/conditionKind
  appear in the rule_snapshot column after a tick fires an instance
- Add historySurvivesRuleDelete IT: fires an instance, deletes the rule, asserts
  rule_id IS NULL and rule_snapshot still contains the rule name (spec §5 guarantee)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:09:28 +02:00
hsiegeln
15c0a8273c feat(alerting): AlertEvaluatorJob with claim-polling + circuit breaker
- AlertEvaluatorJob implements SchedulingConfigurer; fixed-delay tick from
  AlertingProperties.effectiveEvaluatorTickIntervalMs (5 s floor)
- Claim-polling via AlertRuleRepository.claimDueRules (FOR UPDATE SKIP LOCKED)
- Per-kind circuit breaker guards each evaluator; failures recorded, open kinds
  skipped and rescheduled without evaluation
- Single-Firing path delegates to AlertStateTransitions; new FIRING instances
  enqueue AlertNotification rows per rule.webhooks()
- Batch (PER_EXCHANGE) path creates one FIRING AlertInstance per Firing entry
- PENDING→FIRING promotion handled in applyResult via state machine
- Title/message rendered via MustacheRenderer + NotificationContextBuilder;
  environment resolved from EnvironmentRepository.findById per tick
- AlertEvaluatorJobIT (4 tests): uses named @MockBean replacements for
  ClickHouseSearchIndex + ClickHouseLogStore; @MockBean AgentRegistryService
  drives Clear/Firing/resolve cycle without timing sensitivity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:58:27 +02:00
hsiegeln
657dc2d407 feat(alerting): AlertingProperties + AlertStateTransitions state machine
- AlertingProperties @ConfigurationProperties with effective*() accessors and
  5000 ms floor clamp on evaluatorTickIntervalMs; warn logged at startup
- AlertStateTransitions pure static state machine: Clear/Firing/Batch/Error
  branches, PENDING→FIRING promotion on forDuration elapsed; Batch delegated
  to job
- AlertInstance wither helpers: withState, withFiredAt, withResolvedAt, withAck,
  withSilenced, withTitleMessage, withLastNotifiedAt, withContext
- AlertingBeanConfig gains @EnableConfigurationProperties(AlertingProperties),
  alertingInstanceId bean (hostname:pid), alertingClock bean,
  PerKindCircuitBreaker bean wired from props
- 12 unit tests in AlertStateTransitionsTest covering all transitions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:58:12 +02:00
hsiegeln
f8cd3f3ee4 feat(alerting): EXCHANGE_MATCH evaluator with per-exchange + count modes
PER_EXCHANGE returns EvalResult.Batch(List<Firing>); last Firing carries
_nextCursor (Instant) in its context map for the job to persist as
evalState.lastExchangeTs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:40:54 +02:00
hsiegeln
89db8bd1c5 feat(alerting): JVM_METRIC evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:38:48 +02:00
hsiegeln
17d2be5638 feat(alerting): LOG_PATTERN evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:37:33 +02:00
hsiegeln
07d0386bf2 feat(alerting): ROUTE_METRIC evaluator
P95_LATENCY_MS maps to avgDurationMs (ExecutionStats has no p95 bucket).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:36:22 +02:00
hsiegeln
983b698266 feat(alerting): DEPLOYMENT_STATE evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:34:47 +02:00
hsiegeln
e84338fc9a feat(alerting): AGENT_STATE evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:33:13 +02:00
hsiegeln
55f4cab948 feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:32:06 +02:00
hsiegeln
891c7f87e3 feat(alerting): silence matcher for notification-time dispatch
SilenceMatcherService.matches() evaluates AND semantics across ruleId,
severity, appSlug, routeId, agentId constraints. Null fields are wildcards.
Scope-based constraints (appSlug/routeId/agentId) return false when rule is
null (deleted rule — scope cannot be verified). 17 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:18 +02:00
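
The matching rules in 891c7f87e3 (AND across constraints, null as wildcard, scope constraints failing when the rule is gone) can be sketched as follows. All record and method names here are hypothetical stand-ins; the real SilenceMatcherService works on the project's own AlertInstance and AlertRule types.

```java
/** Hypothetical sketch of AND-with-wildcards silence matching. */
final class SilenceMatchSketch {
    record SilenceSpec(String ruleId, String severity, String appSlug, String routeId, String agentId) {}
    record FiredAlert(String ruleId, String severity) {}
    record RuleScope(String appSlug, String routeId, String agentId) {}

    private SilenceMatchSketch() {}

    /** rule may be null when the originating rule has been deleted. */
    static boolean matches(SilenceSpec spec, FiredAlert alert, RuleScope rule) {
        if (!fieldMatches(spec.ruleId(), alert.ruleId())) return false;
        if (!fieldMatches(spec.severity(), alert.severity())) return false;

        boolean hasScopeConstraint =
                spec.appSlug() != null || spec.routeId() != null || spec.agentId() != null;
        if (!hasScopeConstraint) return true;   // nothing further to verify
        if (rule == null) return false;         // scope cannot be verified without the rule

        return fieldMatches(spec.appSlug(), rule.appSlug())
                && fieldMatches(spec.routeId(), rule.routeId())
                && fieldMatches(spec.agentId(), rule.agentId());
    }

    /** A null constraint is a wildcard; otherwise the values must be equal. */
    private static boolean fieldMatches(String constraint, String actual) {
        return constraint == null || constraint.equals(actual);
    }

    public static void main(String[] args) {
        SilenceSpec spec = new SilenceSpec(null, "CRITICAL", "orders", null, null);
        FiredAlert alert = new FiredAlert("r1", "CRITICAL");
        System.out.println(matches(spec, alert, new RuleScope("orders", "route-1", null)));
        System.out.println(matches(spec, alert, null));
    }
}
```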
hsiegeln
1c74ab8541 feat(alerting): NotificationContextBuilder for template context maps
Builds the Mustache context map from AlertRule + AlertInstance + Environment.
Always emits env/rule/alert subtrees; conditionally emits kind-specific
subtrees (agent, app, route, exchange, log, metric, deployment) based on
rule.conditionKind(). Missing instance.context() keys resolve to empty
string. alert.link prefixed with uiOrigin when non-null. 11 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:12 +02:00
hsiegeln
92a74e7b8d feat(alerting): MustacheRenderer with literal fallback on missing vars
Sentinel-substitution approach: unresolved {{x.y.z}} tokens are replaced
with a unique NUL-delimited sentinel before Mustache compilation, rendered
as opaque text, then post-replaced with the original {{x.y.z}} literal.
Malformed templates (unclosed {{) are caught and return the raw template.
Never throws. 9 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:05 +02:00
hsiegeln
7c0e94a425 feat(alerting): ClickHouse projections for alerting read paths
Adds alerting_projections.sql with four projections (alerting_app_status,
alerting_route_status on executions; alerting_app_level on logs;
alerting_instance_metric on agent_metrics). ClickHouseSchemaInitializer now
runs both init.sql and alerting_projections.sql, with ADD PROJECTION and
MATERIALIZE treated as non-fatal — executions (ReplacingMergeTree) requires
deduplicate_merge_projection_mode=rebuild which is unavailable via JDBC pool.
MergeTree projections (logs, agent_metrics) always succeed and are asserted in IT.

Column names confirmed from init.sql: logs uses 'application' (not application_id),
agent_metrics uses 'collected_at' (not timestamp). All column names match the plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:58 +02:00
hsiegeln
7b79d3aa64 feat(alerting): countExecutionsForAlerting for exchange-match evaluator
Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting —
no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window,
and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString
key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:49 +02:00
hsiegeln
44e91ccdb5 feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator
Adds countLogs(LogSearchRequest) — no FINAL, no cursor/sort/limit —
reusing the same WHERE-clause logic as search() for tenant, env, app,
level, q, logger, source, exchangeId, and time-range filters.
Also extends ClickHouseTestHelper with executeInitSqlWithProjections()
and makes the script runner non-fatal for ADD/MATERIALIZE PROJECTION.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:41 +02:00
hsiegeln
f829929b07 feat(alerting): Postgres repositories for silences, notifications, reads
PostgresAlertSilenceRepository: save/findById roundtrip, listActive (BETWEEN
starts_at AND ends_at), listByEnvironment, delete. JSONB SilenceMatcher via ObjectMapper.

PostgresAlertNotificationRepository: save/findById, listForInstance,
claimDueNotifications (UPDATE...RETURNING with FOR UPDATE SKIP LOCKED),
markDelivered, scheduleRetry (bumps attempts + next_attempt_at), markFailed,
deleteSettledBefore (DELIVERED+FAILED rows older than cutoff). JSONB payload.

PostgresAlertReadRepository: markRead (ON CONFLICT DO NOTHING idempotent),
bulkMarkRead (iterates, handles empty list without error).

16 IT scenarios across 3 classes, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:05:01 +02:00
hsiegeln
45028de1db feat(alerting): Postgres repository for alert_instances with inbox queries
Implements AlertInstanceRepository: save (upsert), findById, findOpenForRule,
listForInbox (3-way OR: user/group/role via && array-overlap + ANY), countUnreadForUser
(LEFT JOIN alert_reads), ack, resolve, markSilenced, deleteResolvedBefore.
Integration test covers all 9 scenarios including inbox fan-out across all
three target types. Also adds @JsonIgnoreProperties(ignoreUnknown=true) to
SilenceMatcher to suppress Jackson serializing isWildcard() as a round-trip field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:04:51 +02:00
hsiegeln
930ac20d11 fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)
Replaces the Plan 01 stub that returned [] with a real call through
AlertRuleRepository.findRuleIdsByOutboundConnectionId. Adds AlertingBeanConfig
exposing the AlertRuleRepository bean; widens OutboundBeanConfig constructor
to inject it. Delete and narrow-envs guards now correctly block when rules
reference a connection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:51:36 +02:00
hsiegeln
f80bc006c1 feat(alerting): Postgres repository for alert_rules
Implements AlertRuleRepository with JSONB condition/webhooks/eval_state
serialization via ObjectMapper, UPSERT on conflict, JSONB containment
query for findRuleIdsByOutboundConnectionId, and FOR UPDATE SKIP LOCKED
claim-polling for horizontal scale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:48:15 +02:00
hsiegeln
a80c376950 fix(alerting): harden V12 migration IT against shared container state
- Replace hard-coded 'u1' user_id with per-test UUID to prevent PK collision on re-runs
- Add @AfterEach null-safe cleanup for environments and users rows
- Use containsExactlyInAnyOrder for enum assertions to catch misspelled names
- Slug suffix on environment insert avoids slug uniqueness conflicts on re-runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:32:35 +02:00
hsiegeln
59e76bdfb6 feat(alerting): V12 flyway migration for alerting tables
2026-04-19 18:28:09 +02:00
hsiegeln
cacedd3f16 fix(outbound): null-guard TRUST_PATHS check; add RBAC test for probe endpoint
- OutboundConnectionRequest compact ctor: avoid NPE if tlsTrustMode is null
  (defense-in-depth alongside @NotNull Bean Validation).
- Add operatorCannotTest IT case to lock the ADMIN-only contract on
  POST /{id}/test — was previously untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:19:37 +02:00
hsiegeln
7358555d56 test(outbound): add @AfterEach cleanup to avoid leaking user/connection rows
Shared Spring test context meant seeded test-admin/test-operator/test-viewer/test-alice
users persisted across IT classes, breaking FlywayMigrationIT's "users is empty" assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:10:25 +02:00
hsiegeln
87b8a71205 feat(outbound): admin test action for reachability + TLS summary
POST /{id}/test issues a synthetic probe against the connection URL.
TLS protocol/cipher/peer-cert details stubbed for now (Plan 02 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:47:36 +02:00
hsiegeln
ea4c56e7f6 feat(outbound): admin CRUD REST + RBAC + audit
New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE.
Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR.
SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:43:48 +02:00
hsiegeln
a3c35c7df9 feat(outbound): request + response + test-result DTOs with Bean Validation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:37:00 +02:00
hsiegeln
642c040116 feat(outbound): Postgres repository for outbound_connections
- PostgresOutboundConnectionRepository: JdbcTemplate impl of
  OutboundConnectionRepository; UUID arrays via ConnectionCallback,
  JSONB for headers/auth/ca-paths, enum casts for method/trust/auth-kind
- OutboundBeanConfig: wires the repo + SecretCipher beans
- PostgresOutboundConnectionRepositoryIT: 5 Testcontainers tests
  (save+read, unique-name, allowed-env-ids round-trip, tenant isolation,
  delete); validates V11 Flyway migration end-to-end
- application-test.yml: add jwtsecret default so SecretCipher bean
  starts up in the Spring test context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:23:51 +02:00
hsiegeln
b8565af039 feat(outbound): SecretCipher - AES-GCM with JWT-derived key for at-rest secret encryption
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:13:57 +02:00
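
The commit title for b8565af039 names AES-GCM with a JWT-derived key but gives no further detail. The sketch below is one plausible shape under explicit assumptions: the key is the SHA-256 digest of the JWT secret, a fresh 12-byte IV is generated per message, and the stored value is Base64(iv || ciphertext+tag). None of these choices is confirmed by the source, and a production design might prefer a dedicated KDF such as HKDF.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical AES-GCM cipher sketch; not the repository's SecretCipher. */
final class SecretCipherSketch {
    private static final int IV_BYTES = 12;   // commonly recommended GCM nonce size
    private static final int TAG_BITS = 128;

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    SecretCipherSketch(String jwtSecret) {
        this.key = deriveKey(jwtSecret);
    }

    // Assumption: key = SHA-256(secret), which yields a 256-bit AES key.
    private static SecretKeySpec deriveKey(String secret) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(secret.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(digest, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);                       // never reuse an IV with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    String decrypt(String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        SecretCipherSketch cipher = new SecretCipherSketch("demo-jwt-secret");
        System.out.println(cipher.decrypt(cipher.encrypt("hmac-shared-secret")));
    }
}
```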
hsiegeln
0c9d12d8e0 test(http): tighten SSL-failure assertion + null-guard WireMock teardown
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:08:43 +02:00
hsiegeln
000e9d2847 feat(http): ApacheOutboundHttpClientFactory with memoization and startup validation
Adds ApacheOutboundHttpClientFactory (Apache HttpClient 5) that memoizes
CloseableHttpClient instances keyed on effective TLS + timeout config, and
OutboundHttpConfig (@ConfigurationProperties) that validates trusted CA paths
at startup and exposes OutboundHttpClientFactory as a Spring bean.

TRUST_ALL mode disables both cert validation (TrustAllManager in SslContextBuilder)
and hostname verification (NoopHostnameVerifier on SSLConnectionSocketFactoryBuilder).
WireMock HTTPS integration test covers trust-all bypass, system-default PKIX rejection,
and client memoization.

OIDC audit: OidcProviderHelper and OidcTokenExchanger use Nimbus SDK's own HTTP layer
(DefaultResourceRetriever for JWKS, HTTPRequest.send() for token exchange) plus the
bespoke InsecureTlsHelper for TLS skip-verify; neither uses OutboundHttpClientFactory.
Retrofit deferred to a separate follow-up per plan §20.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:03:56 +02:00
hsiegeln
4922748599 refactor(http): tighten SslContextBuilder throws clause, classpath test fixture, system trust-all test
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:59:06 +02:00
hsiegeln
262ee91684 feat(http): SslContextBuilder supports system/trust-all/trust-paths modes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:54:15 +02:00
hsiegeln
89c9b53edd fix(pagination): add insert_id UUID tiebreak to cursor keyset
Same-millisecond rows were silently skipped between pages because the
log cursor had no tiebreak and the events cursor tied by instance_id
(which also collides when one instance emits multiple events within a
millisecond). Add an insert_id UUID (DEFAULT generateUUIDv4()) column
to both logs and agent_events, order by (timestamp, insert_id)
consistently, and encode the cursor as 'timestamp|insert_id'. Existing
data is materialized via ALTER TABLE MATERIALIZE COLUMN (one-time
background mutation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:25:36 +02:00
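
To see why the tiebreak in 89c9b53edd matters, the following in-memory sketch pages over rows that share a timestamp. It is not the ClickHouse query; Row, ORDER, and page are hypothetical names, and the sort direction (newest first, then insertId ascending) is an assumption, since the commit only says the ordering is (timestamp, insert_id).

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory model of keyset paging with a unique tiebreak. */
final class KeysetPagingSketch {
    record Row(Instant timestamp, String insertId) {}

    // Assumed order: newest first, ties broken by the unique insertId.
    static final Comparator<Row> ORDER =
            Comparator.<Row, Instant>comparing(Row::timestamp)
                    .reversed()
                    .thenComparing(Row::insertId);

    private KeysetPagingSketch() {}

    /** Returns up to limit rows strictly after the cursor row (null = first page). */
    static List<Row> page(List<Row> all, Row after, int limit) {
        return all.stream()
                .sorted(ORDER)
                .filter(row -> after == null || ORDER.compare(row, after) > 0)
                .limit(limit)
                .toList();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2026-04-17T12:00:00Z");
        List<Row> rows = List.of(new Row(t, "a"), new Row(t, "b"), new Row(t, "c"));
        List<Row> first = page(rows, null, 2);
        List<Row> second = page(rows, first.get(first.size() - 1), 2);
        System.out.println(first);
        System.out.println(second);
    }
}
```

With a cursor on timestamp alone, the second page would have to start strictly after the shared timestamp and the third row would be skipped; comparing the full (timestamp, insertId) pair keeps it.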
hsiegeln
6d3956935d refactor(events): remove dead non-paginated query path
AgentEventService.queryEvents, AgentEventRepository.query, and the
ClickHouse implementation have had no callers since /agents/events
became cursor-paginated. Remove them along with their dedicated IT
tests. queryPage and its tests remain as the single query path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 13:16:28 +02:00
hsiegeln
0194549f25 fix(events): reject malformed pagination cursors as 400 errors
Wraps DateTimeParseException from Instant.parse in IllegalArgumentException
so the controller maps it to 400. Also rejects cursors with empty
instance_id (trailing '|') which would otherwise produce a vacuous
keyset predicate.
2026-04-17 12:02:40 +02:00
hsiegeln
d293dafb99 feat(events): cursor-paginate agent events (ClickHouse impl)
Orders by (timestamp DESC, instance_id ASC). Cursor is
base64url('timestampIso|instanceId') with a tuple keyset predicate
for stable paging across ties.
2026-04-17 11:57:35 +02:00
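
The cursor format from d293dafb99, together with the malformed-cursor handling added in 0194549f25, might be sketched like this. The class, record, and method names are hypothetical, and the use of unpadded base64url is an assumption; the commits specify only base64url('timestampIso|instanceId'), the IllegalArgumentException wrapping, and the rejection of an empty instance_id.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Base64;

/** Hypothetical cursor codec sketch. */
final class EventCursorSketch {
    record Cursor(Instant timestamp, String instanceId) {}

    private EventCursorSketch() {}

    static String encode(Cursor cursor) {
        String raw = cursor.timestamp() + "|" + cursor.instanceId();
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Throws IllegalArgumentException for any malformed token, so a controller can map it to 400. */
    static Cursor decode(String token) {
        // Base64 decoding already throws IllegalArgumentException on invalid input.
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        int sep = raw.indexOf('|');
        if (sep <= 0 || sep == raw.length() - 1) {
            // Missing separator, empty timestamp, or empty instance id (trailing '|').
            throw new IllegalArgumentException("Malformed cursor");
        }
        try {
            return new Cursor(Instant.parse(raw.substring(0, sep)), raw.substring(sep + 1));
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Malformed cursor timestamp", e);
        }
    }

    public static void main(String[] args) {
        Cursor cursor = new Cursor(Instant.parse("2026-04-17T09:57:35Z"), "agent-1");
        System.out.println(decode(encode(cursor)).equals(cursor));
    }
}
```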
hsiegeln
769752a327 feat(logs): widen source filter to multi-value OR list
Replaces LogSearchRequest.source (String) with sources (List<String>)
and emits 'source IN (...)' when non-empty. LogQueryController parses
?source=a,b,c the same way it parses ?level=a,b,c.
2026-04-17 11:48:10 +02:00
hsiegeln
62dd71b860 fix: stamp environment on agent_events rows
The agent_events table has an `environment` column and AgentEventsController
filters on it, but the INSERT never populated it — every row got the
column default ('default'). Result: Timeline on the Application Runtime
page was empty whenever the user's selected env was anything other than
'default'.

Thread env through the write path:
- AgentEventRepository.insert + AgentEventService.recordEvent gain an
  `environment` param; delete the no-env query overload (unused).
- ClickHouseAgentEventRepository.insert writes the column (falls back to
  'default' on null to match column DEFAULT).
- All 5 callers source env from the agent registry (AgentInfo.environmentId)
  or the registration request body; AgentLifecycleMonitor, deregister,
  command ack, event ingestion, register/re-register.
- Integration test updated for the new signatures.

Pre-existing rows in deployed CH will still report environment='default'.
New events from this build forward will carry the correct env. Backfill
(UPDATE ... FROM apps) is left as a manual DB step if historical timeline
is needed for non-default envs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:30:56 +02:00
hsiegeln
b7a107d33f test: update integration tests for env-scoped URL shape
Picks up the URL moves from P2/P3A/P3B/P3C. Also fixes a latent bug in
AppControllerIT.uploadJar_asOperator_returns201 / DeploymentControllerIT
setUp: the tests were passing the app's UUID as the {appSlug} path
variable (via `path("id").asText()`); the old AppController looked up
apps via getBySlug(), so the legacy URL call would 404 when the slug
literal was a UUID. Now the test tracks the known slug string and uses
it for every /apps/{appSlug}/... path.

Test URL updates:
- SearchControllerIT: /api/v1/search/executions →
  /api/v1/environments/default/executions (GET) and
  /api/v1/environments/default/executions/search (POST).
- AppControllerIT: /api/v1/apps → /api/v1/environments/default/apps.
  Request bodies drop environmentId (it's in the path).
- DeploymentControllerIT: /api/v1/apps/{appId}/deployments →
  /api/v1/environments/default/apps/{appSlug}/deployments. DeployRequest
  body drops environmentId.
- JwtRefreshIT + RegistrationSecurityIT: smoke-test protected endpoint
  call updated to the new /environments/default/executions shape.

All tests compile clean. Runtime behavior requires a full stack
(Postgres + ClickHouse + Docker); validating integration tests is a
pre-merge step before merging the feature branch.

Remaining pre-merge items (not blocked by code):
1. Regenerate ui/src/api/schema.d.ts + openapi.json by running
   `cd ui && npm run generate-api:live` against a running backend.
   SearchController, DeploymentController, etc. DTO signatures have
   changed; schema.d.ts is frozen at the pre-migration shape.
   Raw-fetch call sites introduced in P3A/P3C work at runtime without
   the schema; the regen only sharpens TypeScript coverage.
2. Smoke test locally: boot server, verify EnvironmentsPage,
   AppsTab, Exchanges, Dashboard, Runtime pages all function.
3. Run `mvn verify` end-to-end (Testcontainers + Docker required).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:53:55 +02:00
hsiegeln
6b5ee10944 feat!: environment admin URLs use slug; validate and immutabilize slug
UUID-based admin paths were the only remaining UUID-in-URL pattern in
the API. Migrates /api/v1/admin/environments/{id} to /{envSlug} so
slugs are the single environment identifier in every URL. UUIDs stay
internal to the database.

- Controller: @PathVariable UUID id → @PathVariable String envSlug on
  get/update/delete and the two nested endpoints (default-container-
  config, jar-retention). Handlers resolve slug → Environment via
  EnvironmentService.getBySlug, then delegate to existing UUID-based
  service methods.
- Service: create() now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$
  and returns 400 on invalid slugs. Rationale documented in the class:
  slugs are immutable after creation because they appear in URLs,
  Docker network names, container names, and ClickHouse partition keys.
- UpdateEnvironmentRequest has no slug field and Jackson's default
  ignore-unknown behavior drops any slug supplied in a PUT body;
  regression test (updateEnvironment_withSlugInBody_ignoresSlug)
  documents this invariant.
- SPA: mutation args change from { id } to { slug }. EnvironmentsPage
  still uses env.id for local selection state (UUID from DB) but
  passes env.slug to every mutation.

BREAKING CHANGE: /api/v1/admin/environments/{id:UUID}/... paths removed.
Clients must use /{envSlug}/... (slug from the environments list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:23:31 +02:00
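
A minimal sketch of the slug rule from 6b5ee10944, assuming a standalone helper; the class and method names are hypothetical, and only the pattern ^[a-z0-9][a-z0-9-]{0,63}$ comes from the commit message. The pattern allows 1 to 64 characters, starting with a lowercase letter or digit.

```java
import java.util.regex.Pattern;

/** Hypothetical slug validator; names are illustrative. */
final class EnvironmentSlugSketch {
    // One leading [a-z0-9], then up to 63 more characters from [a-z0-9-].
    private static final Pattern SLUG = Pattern.compile("^[a-z0-9][a-z0-9-]{0,63}$");

    private EnvironmentSlugSketch() {}

    static boolean isValid(String slug) {
        return slug != null && SLUG.matcher(slug).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"prod-eu", "-prod", "Prod", "staging_1"}) {
            System.out.println(candidate + " -> " + isValid(candidate));
        }
    }
}
```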
hsiegeln
9b1ef51d77 feat!: scope per-app config and settings by environment
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
hsiegeln
e2d9428dff fix: drop stale instance_id filter from search and scope route stats by app
The exchange search silently filtered by the in-memory agent registry's
current instance IDs on top of application_id. Historical exchanges written
by previous agent instances (or any instance not currently registered, e.g.
after a server restart before agents heartbeat back) were hidden from
results even though they matched the application filter.

Fix: drop the applicationId -> instanceIds resolution in SearchController.
Rely on application_id = ? in ClickHouseSearchIndex; keep explicit
instanceIds filtering only when a client passes them.

Related cleanup: the agentIds parameter on StatsStore.statsForRoute /
timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so
per-route stats aggregated across any apps sharing a routeId. Replace with
String applicationId and add application_id to the stats_1m_route filters
so per-route stats are correctly scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:49:55 +02:00
hsiegeln
cb3ebfea7c chore: rename cameleer3 to cameleer
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:28:42 +02:00