Commit Graph

1265 Commits

Author SHA1 Message Date
hsiegeln
7b79d3aa64 feat(alerting): countExecutionsForAlerting for exchange-match evaluator
Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting —
no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window,
and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString
key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:49 +02:00
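
Because the attribute key ends up in the SQL text rather than in a bound parameter, it needs to be validated before it is inlined. The following is a minimal sketch of that idea, not the project's actual code: the class name, the `attributes` column name, the allow-list pattern, and the choice to keep binding the compared value with `?` are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

public final class AttributeFilterSql {
    // Assumption: attribute keys are limited to a conservative character set,
    // so a key can never close the quoted literal it is inlined into.
    private static final Pattern SAFE_KEY = Pattern.compile("[A-Za-z0-9_.\\-]{1,64}");

    private AttributeFilterSql() {}

    /** Returns true if the key may be inlined as a SQL string literal. */
    public static boolean isSafeKey(String key) {
        return key != null && SAFE_KEY.matcher(key).matches();
    }

    /**
     * Builds a WHERE fragment with the JSON key inlined as a literal.
     * The compared value stays a bound parameter (hypothetical choice).
     */
    public static String attributeEquals(String key) {
        if (!isSafeKey(key)) {
            throw new IllegalArgumentException("unsupported attribute key: " + key);
        }
        return "JSONExtractString(attributes, '" + key + "') = ?";
    }
}
```

Rejecting unexpected keys outright, rather than trying to escape them, keeps the inlined literal trivially safe at the cost of a narrower key syntax.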
hsiegeln
44e91ccdb5 feat(alerting): ClickHouseLogStore.countLogs for log-pattern evaluator
Adds countLogs(LogSearchRequest) — no FINAL, no cursor/sort/limit —
reusing the same WHERE-clause logic as search() for tenant, env, app,
level, q, logger, source, exchangeId, and time-range filters.
Also extends ClickHouseTestHelper with executeInitSqlWithProjections()
and makes the script runner non-fatal for ADD/MATERIALIZE PROJECTION.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:18:41 +02:00
hsiegeln
59354fae18 feat(alerting): wire all alerting repository beans
AlertingBeanConfig now exposes 4 additional @Bean methods:
alertInstanceRepository, alertSilenceRepository,
alertNotificationRepository, alertReadRepository.
AlertReadRepository takes only JdbcTemplate (no JSONB/ObjectMapper needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:05:06 +02:00
hsiegeln
f829929b07 feat(alerting): Postgres repositories for silences, notifications, reads
PostgresAlertSilenceRepository: save/findById roundtrip, listActive (BETWEEN
starts_at AND ends_at), listByEnvironment, delete. JSONB SilenceMatcher via ObjectMapper.

PostgresAlertNotificationRepository: save/findById, listForInstance,
claimDueNotifications (UPDATE...RETURNING with FOR UPDATE SKIP LOCKED),
markDelivered, scheduleRetry (bumps attempts + next_attempt_at), markFailed,
deleteSettledBefore (DELIVERED+FAILED rows older than cutoff). JSONB payload.

PostgresAlertReadRepository: markRead (ON CONFLICT DO NOTHING idempotent),
bulkMarkRead (iterates, handles empty list without error).

16 IT scenarios across 3 classes, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:05:01 +02:00
hsiegeln
45028de1db feat(alerting): Postgres repository for alert_instances with inbox queries
Implements AlertInstanceRepository: save (upsert), findById, findOpenForRule,
listForInbox (3-way OR: user/group/role via && array-overlap + ANY), countUnreadForUser
(LEFT JOIN alert_reads), ack, resolve, markSilenced, deleteResolvedBefore.
Integration test covers all 9 scenarios including inbox fan-out across all
three target types. Also adds @JsonIgnoreProperties(ignoreUnknown=true) to
SilenceMatcher so the derived isWildcard() property that Jackson serializes is
ignored on read instead of breaking the round trip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:04:51 +02:00
hsiegeln
930ac20d11 fix(outbound): wire rulesReferencing to AlertRuleRepository (Plan 01 gate)
Replaces the Plan 01 stub that returned [] with a real call through
AlertRuleRepository.findRuleIdsByOutboundConnectionId. Adds AlertingBeanConfig
exposing the AlertRuleRepository bean; widens OutboundBeanConfig constructor
to inject it. Delete and narrow-envs guards now correctly block when rules
reference a connection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:51:36 +02:00
hsiegeln
f80bc006c1 feat(alerting): Postgres repository for alert_rules
Implements AlertRuleRepository with JSONB condition/webhooks/eval_state
serialization via ObjectMapper, UPSERT on conflict, JSONB containment
query for findRuleIdsByOutboundConnectionId, and FOR UPDATE SKIP LOCKED
claim-polling for horizontal scale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:48:15 +02:00
hsiegeln
1ff256dce0 feat(alerting): core repository interfaces
2026-04-19 18:43:36 +02:00
hsiegeln
e7a9042677 feat(alerting): core domain records (rule, instance, silence, notification)
2026-04-19 18:43:03 +02:00
hsiegeln
56a7b6de7d feat(alerting): sealed AlertCondition hierarchy with Jackson deduction
2026-04-19 18:42:04 +02:00
hsiegeln
530bc32040 feat(alerting): core enums + AlertScope
2026-04-19 18:36:29 +02:00
hsiegeln
5103dc91be feat(alerting): add ALERT_RULE_CHANGE + ALERT_SILENCE_CHANGE audit categories
2026-04-19 18:34:08 +02:00
hsiegeln
a80c376950 fix(alerting): harden V12 migration IT against shared container state
- Replace hard-coded 'u1' user_id with per-test UUID to prevent PK collision on re-runs
- Add @AfterEach null-safe cleanup for environments and users rows
- Use containsExactlyInAnyOrder for enum assertions to catch misspelled names
- Slug suffix on environment insert avoids slug uniqueness conflicts on re-runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 18:32:35 +02:00
hsiegeln
59e76bdfb6 feat(alerting): V12 flyway migration for alerting tables
2026-04-19 18:28:09 +02:00
hsiegeln
087dcee5df docs(alerting): Plan 02 — backend (domain, storage, evaluators, dispatch)
2026-04-19 18:24:16 +02:00
hsiegeln
cacedd3f16 fix(outbound): null-guard TRUST_PATHS check; add RBAC test for probe endpoint
- OutboundConnectionRequest compact ctor: avoid NPE if tlsTrustMode is null
  (defense-in-depth alongside @NotNull Bean Validation).
- Add operatorCannotTest IT case to lock the ADMIN-only contract on
  POST /{id}/test — was previously untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:19:37 +02:00
hsiegeln
7358555d56 test(outbound): add @AfterEach cleanup to avoid leaking user/connection rows
Shared Spring test context meant seeded test-admin/test-operator/test-viewer/test-alice
users persisted across IT classes, breaking FlywayMigrationIT's "users is empty" assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:10:25 +02:00
hsiegeln
609a86dd03 docs: admin guide for outbound connections
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:03:18 +02:00
hsiegeln
1dd1f10c0e docs(rules): document http/ and outbound/ packages + admin controller
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:02:09 +02:00
hsiegeln
0c5f1b5740 feat(ui): outbound connection editor — TLS config, test action, env restriction
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:59:19 +02:00
hsiegeln
e7fbf5a7b2 feat(ui): admin page for outbound connections list + navigation
Adds OutboundConnectionsPage (list view with delete), lazy route at
/admin/outbound-connections, and Outbound Connections nav node in the
admin sidebar tree. No test file created — UI codebase has no existing
test infrastructure to build on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:55:35 +02:00
hsiegeln
3c903fc8dc feat(ui): tanstack query hooks for outbound connections
Types are hand-authored (matching codebase admin-query convention);
schema.d.ts regeneration deferred until backend dev server is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:52:40 +02:00
hsiegeln
87b8a71205 feat(outbound): admin test action for reachability + TLS summary
POST /{id}/test issues a synthetic probe against the connection URL.
TLS protocol/cipher/peer-cert details stubbed for now (Plan 02 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:47:36 +02:00
hsiegeln
ea4c56e7f6 feat(outbound): admin CRUD REST + RBAC + audit
New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE.
Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR.
SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:43:48 +02:00
hsiegeln
a3c35c7df9 feat(outbound): request + response + test-result DTOs with Bean Validation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:37:00 +02:00
hsiegeln
94b5db0f5b feat(outbound): service with uniqueness + narrow-envs + delete-if-referenced guards
rulesReferencing() is stubbed; wired to AlertRuleRepository in Plan 02.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:34:09 +02:00
hsiegeln
642c040116 feat(outbound): Postgres repository for outbound_connections
- PostgresOutboundConnectionRepository: JdbcTemplate impl of
  OutboundConnectionRepository; UUID arrays via ConnectionCallback,
  JSONB for headers/auth/ca-paths, enum casts for method/trust/auth-kind
- OutboundBeanConfig: wires the repo + SecretCipher beans
- PostgresOutboundConnectionRepositoryIT: 5 Testcontainers tests
  (save+read, unique-name, allowed-env-ids round-trip, tenant isolation,
  delete); validates V11 Flyway migration end-to-end
- application-test.yml: add jwtsecret default so SecretCipher bean
  starts up in the Spring test context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:23:51 +02:00
hsiegeln
380ccb102b fix(outbound): align user FK with users(user_id) TEXT schema
V11 migration referenced users(id) as uuid, but V1 users table has
user_id as TEXT primary key. Amending V11 and the OutboundConnection
record before Task 7's integration tests catch this at Flyway startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:18:12 +02:00
hsiegeln
b8565af039 feat(outbound): SecretCipher - AES-GCM with JWT-derived key for at-rest secret encryption
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:13:57 +02:00
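
As a rough illustration of AES-GCM encryption with a key derived from an existing secret, here is a minimal sketch using only the JDK. It is not the project's SecretCipher: the class name, the SHA-256 key derivation, the 12-byte random IV prepended to the ciphertext, and the Base64 encoding are all assumptions. A production implementation might prefer a dedicated KDF such as HKDF.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public final class SecretCipherSketch {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    /** Derives a 256-bit AES key by hashing the secret (assumed derivation). */
    public SecretCipherSketch(String jwtSecret) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(jwtSecret.getBytes(StandardCharsets.UTF_8));
            this.key = new SecretKeySpec(digest, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts with a fresh random IV; output is Base64(iv || ciphertext+tag). */
    public String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the first 12 bytes, then decrypts and verifies the tag. */
    public String decrypt(String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because GCM must never reuse an IV under the same key, the sketch draws a new IV per call, so encrypting the same plaintext twice yields different outputs.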
hsiegeln
46b8f63fd1 feat(outbound): core domain records for outbound connections
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:10:17 +02:00
hsiegeln
0c9d12d8e0 test(http): tighten SSL-failure assertion + null-guard WireMock teardown
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:08:43 +02:00
hsiegeln
000e9d2847 feat(http): ApacheOutboundHttpClientFactory with memoization and startup validation
Adds ApacheOutboundHttpClientFactory (Apache HttpClient 5) that memoizes
CloseableHttpClient instances keyed on effective TLS + timeout config, and
OutboundHttpConfig (@ConfigurationProperties) that validates trusted CA paths
at startup and exposes OutboundHttpClientFactory as a Spring bean.

TRUST_ALL mode disables both cert validation (TrustAllManager in SslContextBuilder)
and hostname verification (NoopHostnameVerifier on SSLConnectionSocketFactoryBuilder).
WireMock HTTPS integration test covers trust-all bypass, system-default PKIX rejection,
and client memoization.

OIDC audit: OidcProviderHelper and OidcTokenExchanger use Nimbus SDK's own HTTP layer
(DefaultResourceRetriever for JWKS, HTTPRequest.send() for token exchange) plus the
bespoke InsecureTlsHelper for TLS skip-verify; neither uses OutboundHttpClientFactory.
Retrofit deferred to a separate follow-up per plan §20.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:03:56 +02:00
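
The memoization described above can be sketched independently of Apache HttpClient. The following is a hypothetical illustration, not the factory from this commit: the class name, the `ClientKey` fields, and the generic client type `C` are assumptions chosen so the example compiles with the JDK alone.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public final class MemoizingClientFactory<C> {

    /**
     * Hypothetical cache key covering the effective TLS and timeout config.
     * Records provide value-based equals/hashCode, which the cache relies on.
     */
    public record ClientKey(String trustMode, List<String> caPaths,
                            Duration connectTimeout, Duration readTimeout) {
        public ClientKey {
            caPaths = List.copyOf(caPaths); // immutable copy so the key cannot change
        }
    }

    private final Map<ClientKey, C> cache = new ConcurrentHashMap<>();
    private final Function<ClientKey, C> builder;

    /** The builder is the expensive step, e.g. constructing an HTTP client. */
    public MemoizingClientFactory(Function<ClientKey, C> builder) {
        this.builder = builder;
    }

    /** Returns the cached client for an equal key, building it at most once. */
    public C clientFor(ClientKey key) {
        return cache.computeIfAbsent(key, builder);
    }
}
```

Two connections with identical effective settings then share one client, while any difference in trust mode, CA paths, or timeouts produces a separate instance.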
hsiegeln
4922748599 refactor(http): tighten SslContextBuilder throws clause, classpath test fixture, system trust-all test
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:59:06 +02:00
hsiegeln
262ee91684 feat(http): SslContextBuilder supports system/trust-all/trust-paths modes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:54:15 +02:00
hsiegeln
2224f7d902 feat(http): core outbound HTTP interfaces and property records
2026-04-19 15:39:57 +02:00
hsiegeln
ffdfd6cd9a feat(outbound): add HTTPS CHECK constraint on outbound_connections.url
Defense-in-depth per code review. DTO layer already validates HTTPS at save
time; this DB-level check guards against future code paths that might bypass
the DTO validator. Mustache template variables in the URL (e.g., {{env.slug}})
remain valid since only the scheme prefix is constrained.
2026-04-19 15:37:35 +02:00
hsiegeln
116038262a feat(outbound): V11 flyway migration for outbound_connections table
2026-04-19 15:33:39 +02:00
hsiegeln
77a23c270b docs(alerting): Plan 01 — outbound HTTP infra + admin-managed outbound connections
First of three sequenced plans for the alerting feature. Covers:
- Cross-cutting http/ module (OutboundHttpClientFactory, SslContextBuilder,
  TLS trust composition, startup validation)
- Admin-managed OutboundConnection with PG persistence, AES-GCM-encrypted
  HMAC secret (resolves spec §20 item 2)
- Admin CRUD REST + test endpoint + RBAC + audit
- Admin UI page with TLS config, allowed-envs multi-select, test action
- OIDC retrofit deliberately deferred (documented in Task 4 audit)

Plan 02 (alerting backend) and Plan 03 (alerting UI) will be written after Plan 01
executes, so that reality can inform their details, especially the secret-cipher
interface and the rules-referencing integration point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:26:00 +02:00
hsiegeln
e71edcdd5e docs(alerting): add BL-002 for native provider integrations + Mustache auto-complete
BL-002 / gitea#138 tracks deferred native provider types (Slack Block Kit,
PagerDuty Events v2, Teams connector) with shipped templates as a post-v1
fast-follow once usage data informs which providers matter.

Spec §13 folds in context-aware variable auto-complete for the shared
<MustacheEditor /> component used in rule editor, webhook overrides, and
outbound-connection admin. Available variables filter by condition kind.
Completion engine choice added to §20 as a planning-phase decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:10:00 +02:00
hsiegeln
a9ad0eb841 docs(alerting): spec for alerting feature + backlog entry BL-001
Comprehensive design spec for a confined, env-scoped alerting feature:
6 signal sources, shared env-scoped rules with RBAC-targeted notifications,
in-app inbox + webhook delivery via admin-managed outbound connections,
claim-based polling for horizontal scalability, 4 CH projections for hot-path
reads. Backlog entry BL-001 / gitea#137 tracks deferred managed-CA investigation
(reuse SaaS-layer CA handling first before building in-server storage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:38 +02:00
hsiegeln
c4cee9718c fix(ui): align log search input styling with EventFeed, render ellipsis
JSX attribute strings don't process JS escape sequences — "Search logs\u2026"
rendered the literal "\u2026" in the placeholder. Replaced with the actual
ellipsis character.

Also aligned .logSearchInput (Application Log search) with EventFeed's
internal search input: --bg-surface background, --border border,
mono font family, 28px height. Previously used --bg-body + --border-subtle
+ body font, which looked visibly different next to the Timeline panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:53:43 +02:00
hsiegeln
d40833b96a docs(rules): refresh for insert_id UUID cursor + AgentEventPage
- LogQueryController: note response shape, sort param, and that the
  cursor tiebreak is the insert_id UUID column (not exchange/instance)
- AgentEventsController: cursor now carries insert_id UUID (was instanceId);
  order is (timestamp DESC, insert_id DESC)
- core-classes: add AgentEventPage record; note that the non-paginated
  AgentEventRepository.query(...) path has been removed
- core-classes: note LogSearchRequest.sources/levels are now List<String>
  with multi-value OR semantics

Keeps the rule files in sync with the cursor-pagination + multi-select
filter work on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:43:25 +02:00
hsiegeln
57e1d09bc6 fix(ui): align Timeline panel header with Application Log
Both panels now use the same card wrapper (logStyles.logCard), header
container (logStyles.logHeader, 12px 16px padding), and DS SectionHeader
for the title. Previously Timeline rendered a custom 13px span while
Application Log used SectionHeader's uppercase style, so the two panels
side-by-side looked inconsistent.

Removes the now-orphaned .eventCard/.eventCardHeader/.sectionTitle and
.timelineCard/.timelineHeader CSS rules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:41:29 +02:00
hsiegeln
9292bd5f5f fix(ui): Timeline uses EventFeed's internal scroll + load-older button
EventFeed has its own search + filter toolbar inside the component.
Wrapping it in InfiniteScrollArea made the toolbar scroll out of
sight. Drop InfiniteScrollArea for the Timeline, give EventFeed a
bounded-height flex container (it scrolls its own .list internally),
and add an explicit 'Load older events' button for cursor
pagination. Polling always on for events (low volume).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:25:48 +02:00
hsiegeln
a3429a609e fix(ui): live-tail logs when time range is a relative preset
Page 1 refetches were using the captured timeRange.end, so rows
arriving after the initial render were outside the query window and
never surfaced. When timeRange.preset is set (e.g. 'last 1h'), each
fetch now advances 'to' to Date.now() so the poll picks up new rows.
Absolute ranges are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:48:29 +02:00
hsiegeln
51feacec1e fix(ui): cascade flatScroll override to descendants
EventFeed's overflow-y:auto lives on its inner .list, not the root
where className lands. Extending .flatScroll to .flatScroll * covers
nested scroll containers, and relaxing the root's height:100% (which
EventFeed sets) lets content size naturally so the outer
InfiniteScrollArea owns the single scrollbar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:44:26 +02:00
hsiegeln
806a817c07 fix(ui): suppress double scrollbar in log + timeline panels
LogViewer and EventFeed each apply overflow-y:auto to their root
container, which produced a nested scrollbar inside the
InfiniteScrollArea that also scrolls. A flatScroll override class
flattens the DS component so the outer InfiniteScrollArea owns the
single scrollbar — matching the IntersectionObserver sentinels that
drive infinite-load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:38:15 +02:00
hsiegeln
89c9b53edd fix(pagination): add insert_id UUID tiebreak to cursor keyset
Same-millisecond rows were silently skipped between pages because the
log cursor had no tiebreak and the events cursor tied by instance_id
(which also collides when one instance emits multiple events within a
millisecond). Add an insert_id UUID (DEFAULT generateUUIDv4()) column
to both logs and agent_events, order by (timestamp, insert_id)
consistently, and encode the cursor as 'timestamp|insert_id'. Existing
data is materialized via ALTER TABLE MATERIALIZE COLUMN (one-time
background mutation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:25:36 +02:00
hsiegeln
07dbfb1391 fix(ui): log header counter reflects visible (filtered) count
When a text search is active, show 'X of Y entries' rather than the
loaded total, so the number matches what's on screen.
2026-04-17 13:19:51 +02:00
hsiegeln
a2d55f7075 fix(ui): push log sort toggle server-side
Reversing logStream.items client-side breaks across infinite-scroll
pages. Passing sort='asc'/'desc' into the query key and URL triggers
a fresh first-page fetch in the selected order.
2026-04-17 13:19:29 +02:00