Commit Graph

1274 Commits

Author SHA1 Message Date
hsiegeln
07d0386bf2 feat(alerting): ROUTE_METRIC evaluator
P95_LATENCY_MS maps to avgDurationMs (ExecutionStats has no p95 bucket).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:36:22 +02:00
hsiegeln
983b698266 feat(alerting): DEPLOYMENT_STATE evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:34:47 +02:00
hsiegeln
e84338fc9a feat(alerting): AGENT_STATE evaluator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:33:13 +02:00
hsiegeln
55f4cab948 feat(alerting): evaluator scaffolding (context, result, tick cache, circuit breaker)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:32:06 +02:00
hsiegeln
891c7f87e3 feat(alerting): silence matcher for notification-time dispatch
SilenceMatcherService.matches() evaluates AND semantics across ruleId,
severity, appSlug, routeId, agentId constraints. Null fields are wildcards.
Scope-based constraints (appSlug/routeId/agentId) return false when rule is
null (deleted rule — scope cannot be verified). 17 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:18 +02:00
hsiegeln
1c74ab8541 feat(alerting): NotificationContextBuilder for template context maps
Builds the Mustache context map from AlertRule + AlertInstance + Environment.
Always emits env/rule/alert subtrees; conditionally emits kind-specific
subtrees (agent, app, route, exchange, log, metric, deployment) based on
rule.conditionKind(). Missing instance.context() keys resolve to empty
string. alert.link prefixed with uiOrigin when non-null. 11 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:12 +02:00
hsiegeln
92a74e7b8d feat(alerting): MustacheRenderer with literal fallback on missing vars
Sentinel-substitution approach: unresolved {{x.y.z}} tokens are replaced
with a unique NUL-delimited sentinel before Mustache compilation, rendered
as opaque text, then post-replaced with the original {{x.y.z}} literal.
Malformed templates (unclosed {{) are caught and return the raw template.
Never throws. 9 unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:27:05 +02:00
hsiegeln
c53f642838 chore(alerting): add jmustache 1.16
Declared in cameleer-server-core pom (canonical location for unit-testable
rendering without Spring) and mirrored in cameleer-server-app pom so the
app module compiles standalone without a full reactor install.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:26:57 +02:00
hsiegeln
7c0e94a425 feat(alerting): ClickHouse projections for alerting read paths
Adds alerting_projections.sql with four projections (alerting_app_status,
alerting_route_status on executions; alerting_app_level on logs;
alerting_instance_metric on agent_metrics). ClickHouseSchemaInitializer now
runs both init.sql and alerting_projections.sql, with ADD PROJECTION and
MATERIALIZE treated as non-fatal — executions (ReplacingMergeTree) requires
deduplicate_merge_projection_mode=rebuild which is unavailable via JDBC pool.
MergeTree projections (logs, agent_metrics) always succeed and are asserted in IT.

Column names confirmed from init.sql: logs uses 'application' (not application_id),
agent_metrics uses 'collected_at' (not timestamp). All column names match the plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:58 +02:00
hsiegeln
7b79d3aa64 feat(alerting): countExecutionsForAlerting for exchange-match evaluator
Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting —
no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window,
and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString
key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:49 +02:00
hsiegeln
44e91ccdb5 feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator
Adds countLogs(LogSearchRequest) — no FINAL, no cursor/sort/limit —
reusing the same WHERE-clause logic as search() for tenant, env, app,
level, q, logger, source, exchangeId, and time-range filters.
Also extends ClickHouseTestHelper with executeInitSqlWithProjections()
and makes the script runner non-fatal for ADD/MATERIALIZE PROJECTION.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:41 +02:00
hsiegeln
59354fae18 feat(alerting): wire all alerting repository beans
AlertingBeanConfig now exposes 4 additional @Bean methods:
alertInstanceRepository, alertSilenceRepository,
alertNotificationRepository, alertReadRepository.
AlertReadRepository takes only JdbcTemplate (no JSONB/ObjectMapper needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:05:06 +02:00
hsiegeln
f829929b07 feat(alerting): Postgres repositories for silences, notifications, reads
PostgresAlertSilenceRepository: save/findById roundtrip, listActive (BETWEEN
starts_at AND ends_at), listByEnvironment, delete. JSONB SilenceMatcher via ObjectMapper.

PostgresAlertNotificationRepository: save/findById, listForInstance,
claimDueNotifications (UPDATE...RETURNING with FOR UPDATE SKIP LOCKED),
markDelivered, scheduleRetry (bumps attempts + next_attempt_at), markFailed,
deleteSettledBefore (DELIVERED+FAILED rows older than cutoff). JSONB payload.

PostgresAlertReadRepository: markRead (ON CONFLICT DO NOTHING idempotent),
bulkMarkRead (iterates, handles empty list without error).

16 IT scenarios across 3 classes, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:05:01 +02:00
hsiegeln
45028de1db feat(alerting): Postgres repository for alert_instances with inbox queries
Implements AlertInstanceRepository: save (upsert), findById, findOpenForRule,
listForInbox (3-way OR: user/group/role via && array-overlap + ANY), countUnreadForUser
(LEFT JOIN alert_reads), ack, resolve, markSilenced, deleteResolvedBefore.
Integration test covers all 9 scenarios including inbox fan-out across all
three target types. Also adds @JsonIgnoreProperties(ignoreUnknown=true) to
SilenceMatcher to suppress Jackson serializing isWildcard() as a round-trip field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:04:51 +02:00
hsiegeln
930ac20d11 fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)
Replaces the Plan 01 stub that returned [] with a real call through
AlertRuleRepository.findRuleIdsByOutboundConnectionId. Adds AlertingBeanConfig
exposing the AlertRuleRepository bean; widens OutboundBeanConfig constructor
to inject it. Delete and narrow-envs guards now correctly block when rules
reference a connection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:51:36 +02:00
hsiegeln
f80bc006c1 feat(alerting): Postgres repository for alert_rules
Implements AlertRuleRepository with JSONB condition/webhooks/eval_state
serialization via ObjectMapper, UPSERT on conflict, JSONB containment
query for findRuleIdsByOutboundConnectionId, and FOR UPDATE SKIP LOCKED
claim-polling for horizontal scale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:48:15 +02:00
hsiegeln
1ff256dce0 feat(alerting): core repository interfaces 2026-04-19 18:43:36 +02:00
hsiegeln
e7a9042677 feat(alerting): core domain records (rule, instance, silence, notification) 2026-04-19 18:43:03 +02:00
hsiegeln
56a7b6de7d feat(alerting): sealed AlertCondition hierarchy with Jackson deduction 2026-04-19 18:42:04 +02:00
hsiegeln
530bc32040 feat(alerting): core enums + AlertScope 2026-04-19 18:36:29 +02:00
hsiegeln
5103dc91be feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories 2026-04-19 18:34:08 +02:00
hsiegeln
a80c376950 fix(alerting): harden V12 migration IT against shared container state
- Replace hard-coded 'u1' user_id with per-test UUID to prevent PK collision on re-runs
- Add @AfterEach null-safe cleanup for environments and users rows
- Use containsExactlyInAnyOrder for enum assertions to catch misspelled names
- Slug suffix on environment insert avoids slug uniqueness conflicts on re-runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:32:35 +02:00
hsiegeln
59e76bdfb6 feat(alerting): V12 flyway migration for alerting tables 2026-04-19 18:28:09 +02:00
hsiegeln
087dcee5df docs(alerting): Plan 02 — backend (domain, storage, evaluators, dispatch) 2026-04-19 18:24:16 +02:00
hsiegeln
cacedd3f16 fix(outbound): null-guard TRUST_PATHS check; add RBAC test for probe endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 3m5s
CI / build (pull_request) Successful in 2m13s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / docker (push) Successful in 4m48s
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 32s
- OutboundConnectionRequest compact ctor: avoid NPE if tlsTrustMode is null
  (defense-in-depth alongside @NotNull Bean Validation).
- Add operatorCannotTest IT case to lock the ADMIN-only contract on
  POST /{id}/test — was previously untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:19:37 +02:00
hsiegeln
7358555d56 test(outbound): add @AfterEach cleanup to avoid leaking user/connection rows
Shared Spring test context meant seeded test-admin/test-operator/test-viewer/test-alice
users persisted across IT classes, breaking FlywayMigrationIT's "users is empty" assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:10:25 +02:00
hsiegeln
609a86dd03 docs: admin guide for outbound connections
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:03:18 +02:00
hsiegeln
1dd1f10c0e docs(rules): document http/ and outbound/ packages + admin controller
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:02:09 +02:00
hsiegeln
0c5f1b5740 feat(ui): outbound connection editor — TLS config, test action, env restriction
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:59:19 +02:00
hsiegeln
e7fbf5a7b2 feat(ui): admin page for outbound connections list + navigation
Adds OutboundConnectionsPage (list view with delete), lazy route at
/admin/outbound-connections, and Outbound Connections nav node in the
admin sidebar tree. No test file created — UI codebase has no existing
test infrastructure to build on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:55:35 +02:00
hsiegeln
3c903fc8dc feat(ui): tanstack query hooks for outbound connections
Types are hand-authored (matching codebase admin-query convention);
schema.d.ts regeneration deferred until backend dev server is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:52:40 +02:00
hsiegeln
87b8a71205 feat(outbound): admin test action for reachability + TLS summary
POST /{id}/test issues a synthetic probe against the connection URL.
TLS protocol/cipher/peer-cert details stubbed for now (Plan 02 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:47:36 +02:00
hsiegeln
ea4c56e7f6 feat(outbound): admin CRUD REST + RBAC + audit
New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE.
Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR.
SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:43:48 +02:00
hsiegeln
a3c35c7df9 feat(outbound): request + response + test-result DTOs with Bean Validation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:37:00 +02:00
hsiegeln
94b5db0f5b feat(outbound): service with uniqueness + narrow-envs + delete-if-referenced guards
rulesReferencing() is stubbed; wired to AlertRuleRepository in Plan 02.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:34:09 +02:00
hsiegeln
642c040116 feat(outbound): Postgres repository for outbound_connections
- PostgresOutboundConnectionRepository: JdbcTemplate impl of
  OutboundConnectionRepository; UUID arrays via ConnectionCallback,
  JSONB for headers/auth/ca-paths, enum casts for method/trust/auth-kind
- OutboundBeanConfig: wires the repo + SecretCipher beans
- PostgresOutboundConnectionRepositoryIT: 5 Testcontainers tests
  (save+read, unique-name, allowed-env-ids round-trip, tenant isolation,
  delete); validates V11 Flyway migration end-to-end
- application-test.yml: add jwtsecret default so SecretCipher bean
  starts up in the Spring test context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:23:51 +02:00
hsiegeln
380ccb102b fix(outbound): align user FK with users(user_id) TEXT schema
V11 migration referenced users(id) as uuid, but V1 users table has
user_id as TEXT primary key. Amending V11 and the OutboundConnection
record before Task 7's integration tests catch this at Flyway startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:18:12 +02:00
hsiegeln
b8565af039 feat(outbound): SecretCipher - AES-GCM with JWT-derived key for at-rest secret encryption
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:13:57 +02:00
hsiegeln
46b8f63fd1 feat(outbound): core domain records for outbound connections
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:10:17 +02:00
hsiegeln
0c9d12d8e0 test(http): tighten SSL-failure assertion + null-guard WireMock teardown
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:08:43 +02:00
hsiegeln
000e9d2847 feat(http): ApacheOutboundHttpClientFactory with memoization and startup validation
Adds ApacheOutboundHttpClientFactory (Apache HttpClient 5) that memoizes
CloseableHttpClient instances keyed on effective TLS + timeout config, and
OutboundHttpConfig (@ConfigurationProperties) that validates trusted CA paths
at startup and exposes OutboundHttpClientFactory as a Spring bean.

TRUST_ALL mode disables both cert validation (TrustAllManager in SslContextBuilder)
and hostname verification (NoopHostnameVerifier on SSLConnectionSocketFactoryBuilder).
WireMock HTTPS integration test covers trust-all bypass, system-default PKIX rejection,
and client memoization.

OIDC audit: OidcProviderHelper and OidcTokenExchanger use Nimbus SDK's own HTTP layer
(DefaultResourceRetriever for JWKS, HTTPRequest.send() for token exchange) plus the
bespoke InsecureTlsHelper for TLS skip-verify; neither uses OutboundHttpClientFactory.
Retrofit deferred to a separate follow-up per plan §20.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:03:56 +02:00
hsiegeln
4922748599 refactor(http): tighten SslContextBuilder throws clause, classpath test fixture, system trust-all test
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:59:06 +02:00
hsiegeln
262ee91684 feat(http): SslContextBuilder supports system/trust-all/trust-paths modes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:54:15 +02:00
hsiegeln
2224f7d902 feat(http): core outbound HTTP interfaces and property records 2026-04-19 15:39:57 +02:00
hsiegeln
ffdfd6cd9a feat(outbound): add HTTPS CHECK constraint on outbound_connections.url
Defense-in-depth per code review. DTO layer already validates HTTPS at save
time; this DB-level check guards against future code paths that might bypass
the DTO validator. Mustache template variables in the URL (e.g., {{env.slug}})
remain valid since only the scheme prefix is constrained.
2026-04-19 15:37:35 +02:00
hsiegeln
116038262a feat(outbound): V11 flyway migration for outbound_connections table 2026-04-19 15:33:39 +02:00
hsiegeln
77a23c270b docs(alerting): Plan 01 — outbound HTTP infra + admin-managed outbound connections
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m57s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
First of three sequenced plans for the alerting feature. Covers:
- Cross-cutting http/ module (OutboundHttpClientFactory, SslContextBuilder,
  TLS trust composition, startup validation)
- Admin-managed OutboundConnection with PG persistence, AES-GCM-encrypted
  HMAC secret (resolves spec §20 item 2)
- Admin CRUD REST + test endpoint + RBAC + audit
- Admin UI page with TLS config, allowed-envs multi-select, test action
- OIDC retrofit deliberately deferred (documented in Task 4 audit)

Plan 02 (alerting backend) and Plan 03 (alerting UI) written after Plan 01
executes — lets reality inform their details, especially the secret-cipher
interface and the rules-referencing integration point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:26:00 +02:00
hsiegeln
e71edcdd5e docs(alerting): add BL-002 for native provider integrations + Mustache auto-complete
BL-002 / gitea#138 tracks deferred native provider types (Slack Block Kit,
PagerDuty Events v2, Teams connector) with shipped templates as a post-v1
fast-follow once usage data informs which providers matter.

Spec §13 folds in context-aware variable auto-complete for the shared
<MustacheEditor /> component used in rule editor, webhook overrides, and
outbound-connection admin. Available variables filter by condition kind.
Completion engine choice added to §20 as a planning-phase decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:10:00 +02:00
hsiegeln
a9ad0eb841 docs(alerting): spec for alerting feature + backlog entry BL-001
Comprehensive design spec for a confined, env-scoped alerting feature:
6 signal sources, shared env-scoped rules with RBAC-targeted notifications,
in-app inbox + webhook delivery via admin-managed outbound connections,
claim-based polling for horizontal scalability, 4 CH projections for hot-path
reads. Backlog entry BL-001 / gitea#137 tracks deferred managed-CA investigation
(reuse SaaS-layer CA handling first before building in-server storage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:38 +02:00
hsiegeln
c4cee9718c fix(ui): align log search input styling with EventFeed, render ellipsis
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m44s
CI / docker (push) Successful in 1m16s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m49s
SonarQube / sonarqube (push) Successful in 4m5s
JSX attribute strings don't process JS escape sequences — "Search logs\u2026"
rendered the literal "\u2026" in the placeholder. Replaced with the actual
ellipsis character.

Also aligned .logSearchInput (Application Log search) with EventFeed's
internal search input: --bg-surface background, --border border,
mono font family, 28px height. Previously used --bg-body + --border-subtle
+ body font, which looked visibly different next to the Timeline panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:53:43 +02:00