Compare commits

10 Commits

hsiegeln
1ebc2fa71e test(ui/alerts): Playwright E2E smoke (sidebar, rule CRUD, CMD-K, silence CRUD)
All checks were successful
fixtures.ts: auto-applied login fixture — visits /login?local to skip OIDC
auto-redirect, fills username/password via label-matcher, clicks 'Sign in',
then selects the 'default' env so the alerting hooks are enabled (useSelectedEnv gate).
Override via E2E_ADMIN_USER + E2E_ADMIN_PASS.

alerting.spec.ts: 4 tests against the full docker-compose stack:
 - sidebar Alerts accordion → /alerts/inbox
 - 5-step wizard: defaults-only create + row delete (unique timestamp name
   avoids strict-mode collisions with leftover rules)
 - CMD-K palette via SearchTrigger click (deterministic; Ctrl+K via keyboard
   is flaky when the canvas doesn't have focus)
 - silence matcher-based create + end-early

The DS FormField renders labels as generic elements (not wired via htmlFor), so inputs
are targeted by placeholder or label-proximity locators instead of getByLabel.

Does not exercise fire→ack→clear; that's covered backend-side by
AlertingFullLifecycleIT (Plan 02). UI E2E for that path would need event
injection into ClickHouse, out of scope for this smoke.
2026-04-20 16:18:17 +02:00
hsiegeln
d88bede097 chore(docker): seeder service pre-creates unprefixed 'admin' user row
Alerting + outbound controllers resolve acting user via
authentication.name with 'user:' prefix stripped → 'admin'. But
UserRepository.upsert stores env-admin as 'user:admin' (JWT sub format).
The resulting FK mismatch manifests as 500 'alert_rules_created_by_fkey'
on any create operation in a fresh docker stack.

Workaround: run-once 'cameleer-seed' compose service runs psql against
deploy/docker/postgres-init.sql after the server is healthy (i.e. after
Flyway migrations have created tenant_default.users), inserting
user_id='admin' idempotently. The root-cause fix belongs in the backend
(either stop stripping the prefix in alerting/outbound controllers, or
normalise storage to the unprefixed form) and is out of scope for
Plan 03.
2026-04-20 16:18:07 +02:00
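The seed SQL itself is not part of this compare view. A hedged sketch of what `deploy/docker/postgres-init.sql` plausibly does, reusing the `users` column list that the ITs elsewhere in this changeset insert into (the `provider`, `email`, and `display_name` values here are illustrative, not the committed ones):

```sql
-- Sketch only: idempotently create the unprefixed 'admin' row so
-- alert_rules_created_by_fkey resolves on a fresh docker stack.
-- Runs after Flyway has created tenant_default.users.
INSERT INTO tenant_default.users (user_id, provider, email, display_name)
VALUES ('admin', 'local', 'admin@local', 'admin')
ON CONFLICT (user_id) DO NOTHING;
```

`ON CONFLICT (user_id) DO NOTHING` is what makes the run-once service safe to re-run against a stack that already has the row.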
hsiegeln
bcde6678b8 fix(ui/alerts): align RouteMetric metric enum with backend; pre-populate ROUTE_METRIC defaults
- RouteMetricForm dropped P95_LATENCY_MS — not in cameleer-server-core
  RouteMetric enum (valid: ERROR_RATE, P99_LATENCY_MS, AVG_DURATION_MS,
  THROUGHPUT, ERROR_COUNT).
- initialForm now returns a ready-to-save ROUTE_METRIC condition
  (metric=ERROR_RATE, comparator=GT, threshold=0.05, windowSeconds=300),
  so clicking through the wizard with all defaults produces a valid rule.
  Prevents a 400 'missing type id property kind' error and a follow-up 400
  from condition-enum validation if the user leaves the condition step untouched.
2026-04-20 16:17:59 +02:00
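The enum mismatch described above is easy to reproduce in isolation. A minimal sketch: the `RouteMetric` below is a stand-in using only the five valid names quoted in the message, not the real cameleer-server-core class.

```java
// Stand-in enum with the valid names from the commit message above;
// the real RouteMetric lives in cameleer-server-core.
enum RouteMetric { ERROR_RATE, P99_LATENCY_MS, AVG_DURATION_MS, THROUGHPUT, ERROR_COUNT }

class RouteMetricCheck {
    // Mirrors what enum binding does server-side: an unknown name fails
    // with IllegalArgumentException, which surfaces to the client as a 400.
    static boolean isValidMetric(String name) {
        try {
            RouteMetric.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Under this stand-in, `isValidMetric("P95_LATENCY_MS")` is false, which is why dropping that option from RouteMetricForm and defaulting the condition to ERROR_RATE lets the defaults-only wizard path produce a valid rule.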
hsiegeln
5edf7eb23a fix(alerting): @Autowired on AlertingMetrics production constructor
Task 29's refactor added a package-private test-friendly constructor
alongside the public production one. Without @Autowired Spring cannot pick
which constructor to use for the @Component, and falls back to searching
for a no-arg default — crashing startup with 'No default constructor found'.

Detected when launching the server via the new docker-compose stack; unit
tests still pass because they invoke the package-private test constructor
directly.
2026-04-20 16:02:48 +02:00
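The 'No default constructor found' fallback can be illustrated with plain reflection, no Spring required. A sketch under the assumption that the post-refactor class has exactly two constructors and no no-arg one; `TwoCtors` is a hypothetical stand-in, not AlertingMetrics.

```java
// Hypothetical stand-in mirroring AlertingMetrics' constructor shape after
// Task 29: a public production constructor plus a package-private test one.
class TwoCtors {
    public TwoCtors(String registry) {}      // "production" constructor
    TwoCtors(String registry, long ttl) {}   // "test-friendly" constructor
}

class CtorAmbiguityDemo {
    // Spring, finding multiple candidates and no @Autowired hint, falls back
    // to looking for a no-arg constructor: exactly this reflective lookup.
    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor();  // throws if absent
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

With two candidates and no no-arg fallback available, the lookup fails, which is the startup crash the fix addresses by annotating the production constructor.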
hsiegeln
1ed2d3a611 chore(docker): full-stack docker-compose mirroring deploy/ k8s manifests
Mirrors the k8s manifests in deploy/ as a local dev stack:
  - cameleer-postgres   (matches deploy/cameleer-postgres.yaml)
  - cameleer-clickhouse (matches deploy/cameleer-clickhouse.yaml, default CLICKHOUSE_DB=cameleer)
  - cameleer-server     (built from Dockerfile, env mirrors deploy/base/server.yaml)
  - cameleer-ui         (built from ui/Dockerfile, served on host :8080 to leave :5173 free for Vite dev)

Dockerfile + ui/Dockerfile: REGISTRY_TOKEN is now optional (empty → skip Maven/npm auth).
cameleer-common package is public, so anonymous pulls succeed; private packages still require the token.

Backend defaults tuned for local E2E:
  - RUNTIME_ENABLED=false (no Docker-in-Docker deployments in dev stack)
  - OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS=true (so webhook tests can target host.docker.internal etc.)
  - UIUSER/UIPASSWORD=admin/admin (matches Playwright E2E_ADMIN_USER/PASS defaults)
  - CORS includes both :5173 (Vite) and :8080 (nginx)
2026-04-20 15:52:24 +02:00
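The bullet points above translate into a compose file along these lines. This is a trimmed sketch, not the committed file: image tags, port mappings, healthchecks, and any env vars beyond those quoted in the message are assumptions.

```yaml
# Hypothetical trimmed sketch of the dev stack described above.
services:
  cameleer-postgres:
    image: postgres:16                 # assumed tag
  cameleer-clickhouse:
    image: clickhouse/clickhouse-server
    environment:
      CLICKHOUSE_DB: cameleer
  cameleer-server:
    build: .                           # root Dockerfile
    environment:
      RUNTIME_ENABLED: "false"
      OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS: "true"
      UIUSER: admin
      UIPASSWORD: admin
    depends_on: [cameleer-postgres, cameleer-clickhouse]
  cameleer-ui:
    build: ./ui                        # ui/Dockerfile
    ports:
      - "8080:80"                      # host :8080 keeps :5173 free for Vite
```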
hsiegeln
f75ee9f352 docs(alerting): UI map + admin-guide walkthrough for Plan 03
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:55:36 +02:00
hsiegeln
9f109b20fd perf(alerting): 30s TTL cache on AlertingMetrics gauge suppliers
Prometheus scrapes can fire every few seconds. The open-alerts / open-rules
gauges query Postgres on each read — caching the values for 30s amortises
that to one query per half-minute. Addresses final-review NIT from Plan 02.

- Introduces a package-private TtlCache that wraps a Supplier<Long> and
  memoises the last read for a configurable Duration against a Supplier<Instant>
  clock.
- Wraps each gauge supplier (alerting_rules_total{enabled|disabled},
  alerting_instances_total{state}) in its own TtlCache.
- Adds a test-friendly constructor (package-private) taking explicit
  Duration + Supplier<Instant> so AlertingMetricsCachingTest can advance
  a fake clock without waiting wall-clock time.
- Adds AlertingMetricsCachingTest covering:
  * supplier invoked once per TTL across repeated scrapes
  * 29 s elapsed → still cached; 31 s elapsed → re-queried
  * gauge value reflects the cached result even after delegate mutates

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:22:54 +02:00
hsiegeln
5ebc729b82 feat(alerting): SSRF guard on outbound connection URL
Rejects webhook URLs that resolve to loopback, link-local, or RFC-1918
private ranges (IPv4 + IPv6 ULA fc00::/7). Enforced on both create and
update in OutboundConnectionServiceImpl before persistence; returns 400
Bad Request with "private or loopback" in the body.

Bypass via `cameleer.server.outbound-http.allow-private-targets=true`
for dev environments where webhooks legitimately point at local
services. Production default is `false`.

Test profile sets the flag to `true` in application-test.yml so the
existing ITs that post webhooks to WireMock on https://localhost:PORT
keep working. A dedicated OutboundConnectionSsrfIT overrides the flag
back to false (via @TestPropertySource + @DirtiesContext) to exercise
the reject path end-to-end through the admin controller.

Plan 01 scope; required before SaaS exposure (spec §17).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:17:44 +02:00
hsiegeln
f4c2cb120b feat(ui/alerts): CMD-K sources for alerts + alert rules
Extends operationalSearchData with open alerts (FIRING|ACKNOWLEDGED) and
all rules. Badges convey severity + state. Selecting an alert navigates to
/alerts/inbox/{id}; a rule navigates to /alerts/rules/{id}. Uses the
existing CommandPalette extension point — no new registry.
2026-04-20 14:09:39 +02:00
hsiegeln
8689643e11 feat(ui/alerts): SilencesPage with matcher-based create + end-early action
Matcher accepts ruleId and/or appSlug. Server enforces endsAt > startsAt
(V12 CHECK constraint) and matcher_matches() at dispatch time (spec §7).
2026-04-20 14:08:27 +02:00
20 changed files with 1109 additions and 42 deletions


@@ -34,6 +34,28 @@ The UI has 4 main tabs: **Exchanges**, **Dashboard**, **Runtime**, **Deployments
- `ui/src/hooks/useInfiniteStream.ts` — tanstack `useInfiniteQuery` wrapper with top-gated auto-refetch, flattened `items[]`, and `refresh()` invalidator
- `ui/src/components/InfiniteScrollArea.tsx` — scrollable container with IntersectionObserver top/bottom sentinels. Streaming log/event views use this + `useInfiniteStream`. Bounded views (LogTab, StartupLogPanel) keep `useLogs`/`useStartupLogs`
## Alerts
- **Sidebar section** (`buildAlertsTreeNodes` in `ui/src/components/sidebar-utils.ts`) — Inbox, All, Rules, Silences, History.
- **Routes** in `ui/src/router.tsx`: `/alerts`, `/alerts/inbox`, `/alerts/all`, `/alerts/history`, `/alerts/rules`, `/alerts/rules/new`, `/alerts/rules/:id`, `/alerts/silences`.
- **Pages** under `ui/src/pages/Alerts/`:
- `InboxPage.tsx` — user-targeted FIRING/ACK'd alerts with bulk-read.
- `AllAlertsPage.tsx` — env-wide list with state-chip filter.
- `HistoryPage.tsx` — RESOLVED alerts.
- `RulesListPage.tsx` — CRUD + enable/disable toggle + env-promotion dropdown (pure UI prefill, no new endpoint).
- `RuleEditor/RuleEditorWizard.tsx` — 5-step wizard (Scope / Condition / Trigger / Notify / Review). `form-state.ts` is the single source of truth (`initialForm` / `toRequest` / `validateStep`). Six condition-form subcomponents under `RuleEditor/condition-forms/`.
- `SilencesPage.tsx` — matcher-based create + end-early.
- `AlertRow.tsx` shared list row; `alerts-page.module.css` shared styling.
- **Components**:
- `NotificationBell.tsx` — polls `/alerts/unread-count` every 30 s (paused when tab hidden via TanStack Query `refetchIntervalInBackground: false`).
- `AlertStateChip.tsx`, `SeverityBadge.tsx` — shared state/severity indicators.
- `MustacheEditor/` — CodeMirror 6 editor with variable autocomplete + inline linter. Shared between rule title/message, webhook body/header overrides, and (future) Admin Outbound Connection editor (reduced-context mode for URL).
- `MustacheEditor/alert-variables.ts` — variable registry aligned with `NotificationContextBuilder.java`. Add new leaves here whenever the backend context grows.
- **API queries** under `ui/src/api/queries/`: `alerts.ts`, `alertRules.ts`, `alertSilences.ts`, `alertNotifications.ts`, `alertMeta.ts`. All env-scoped via `useSelectedEnv` from `alertMeta`.
- **CMD-K**: `buildAlertSearchData` in `LayoutShell.tsx` registers `alert` and `alertRule` result categories. Badges convey severity + state. Palette navigates directly to the deep-link path — no sidebar-reveal state for alerts.
- **Sidebar accordion**: entering `/alerts/*` collapses Applications + Admin + Starred (mirrors Admin accordion).
- **Top-nav**: `<NotificationBell />` is the first child of `<TopBar>`, sitting alongside `SearchTrigger` + status `ButtonGroup` + `TimeRangeDropdown` + `AutoRefreshToggle`.
## UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes; SVG presentation attributes resolve `var()` correctly.


@@ -1,10 +1,14 @@
FROM --platform=$BUILDPLATFORM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
# Configure Gitea Maven Registry for cameleer-common dependency
ARG REGISTRY_TOKEN
RUN mkdir -p ~/.m2 && \
echo '<settings><servers><server><id>gitea</id><username>cameleer</username><password>'${REGISTRY_TOKEN}'</password></server></servers></settings>' > ~/.m2/settings.xml
# Optional auth for Gitea Maven Registry. The `cameleer/cameleer-common` package
# is published publicly, so empty token → anonymous pull (no settings.xml).
# Private packages require a non-empty token.
ARG REGISTRY_TOKEN=""
RUN if [ -n "$REGISTRY_TOKEN" ]; then \
mkdir -p ~/.m2 && \
printf '<settings><servers><server><id>gitea</id><username>cameleer</username><password>%s</password></server></servers></settings>\n' "$REGISTRY_TOKEN" > ~/.m2/settings.xml; \
fi
COPY pom.xml .
COPY cameleer-server-core/pom.xml cameleer-server-core/


@@ -9,12 +9,20 @@ import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;
/**
* Micrometer-based metrics for the alerting subsystem.
@@ -30,10 +38,11 @@ import java.util.concurrent.ConcurrentMap;
* <li>{@code alerting_eval_duration_seconds{kind}} — per-kind evaluation latency</li>
* <li>{@code alerting_webhook_delivery_duration_seconds} — webhook POST latency</li>
* </ul>
* Gauges (read from PostgreSQL on each scrape; low scrape frequency = low DB load):
* Gauges (read from PostgreSQL, cached for {@link #DEFAULT_GAUGE_TTL} to amortise
* Prometheus scrapes that may fire every few seconds):
* <ul>
* <li>{@code alerting_rules_total{state=enabled|disabled}} — rule counts from {@code alert_rules}</li>
* <li>{@code alerting_instances_total{state,severity}} — instance counts grouped from {@code alert_instances}</li>
* <li>{@code alerting_instances_total{state}} — instance counts grouped from {@code alert_instances}</li>
* </ul>
*/
@Component
@@ -41,11 +50,13 @@ public class AlertingMetrics {
private static final Logger log = LoggerFactory.getLogger(AlertingMetrics.class);
/** Default time-to-live for the gauge-supplier caches. */
static final Duration DEFAULT_GAUGE_TTL = Duration.ofSeconds(30);
private final MeterRegistry registry;
private final JdbcTemplate jdbc;
// Cached counters per kind (lazy-initialized)
private final ConcurrentMap<String, Counter> evalErrorCounters = new ConcurrentHashMap<>();
private final ConcurrentMap<String, Counter> circuitOpenCounters = new ConcurrentHashMap<>();
private final ConcurrentMap<String, Timer> evalDurationTimers = new ConcurrentHashMap<>();
@@ -55,33 +66,81 @@ public class AlertingMetrics {
// Shared delivery timer
private final Timer webhookDeliveryTimer;
// TTL-cached gauge suppliers registered so tests can force a read cycle.
private final TtlCache enabledRulesCache;
private final TtlCache disabledRulesCache;
private final Map<AlertState, TtlCache> instancesByStateCaches;
/**
* Production constructor: wraps the Postgres-backed gauge suppliers in a
* 30-second TTL cache so Prometheus scrapes don't cause per-scrape DB queries.
*/
@Autowired
public AlertingMetrics(MeterRegistry registry, JdbcTemplate jdbc) {
this(registry,
() -> countRules(jdbc, true),
() -> countRules(jdbc, false),
state -> countInstances(jdbc, state),
DEFAULT_GAUGE_TTL,
Instant::now);
}
/**
* Test-friendly constructor accepting the three gauge suppliers that are
* exercised in the {@link AlertingMetricsCachingTest} plan sketch. The
* {@code instancesSupplier} is used for every {@link AlertState}.
*/
AlertingMetrics(MeterRegistry registry,
Supplier<Long> enabledRulesSupplier,
Supplier<Long> disabledRulesSupplier,
Supplier<Long> instancesSupplier,
Duration gaugeTtl,
Supplier<Instant> clock) {
this(registry,
enabledRulesSupplier,
disabledRulesSupplier,
state -> instancesSupplier.get(),
gaugeTtl,
clock);
}
/**
* Core constructor: accepts per-state instance supplier so production can
* query PostgreSQL with a different value per {@link AlertState}.
*/
private AlertingMetrics(MeterRegistry registry,
Supplier<Long> enabledRulesSupplier,
Supplier<Long> disabledRulesSupplier,
java.util.function.Function<AlertState, Long> instancesSupplier,
Duration gaugeTtl,
Supplier<Instant> clock) {
this.registry = registry;
this.jdbc = jdbc;
// ── Static timers ───────────────────────────────────────────────
this.webhookDeliveryTimer = Timer.builder("alerting_webhook_delivery_duration_seconds")
.description("Latency of outbound webhook POST requests")
.register(registry);
// ── Gauge: rules by enabled/disabled ────────────────────────────
Gauge.builder("alerting_rules_total", this, m -> m.countRules(true))
// ── Gauge: rules by enabled/disabled (cached) ───────────────────
this.enabledRulesCache = new TtlCache(enabledRulesSupplier, gaugeTtl, clock);
this.disabledRulesCache = new TtlCache(disabledRulesSupplier, gaugeTtl, clock);
Gauge.builder("alerting_rules_total", enabledRulesCache, TtlCache::getAsDouble)
.tag("state", "enabled")
.description("Number of enabled alert rules")
.register(registry);
Gauge.builder("alerting_rules_total", this, m -> m.countRules(false))
Gauge.builder("alerting_rules_total", disabledRulesCache, TtlCache::getAsDouble)
.tag("state", "disabled")
.description("Number of disabled alert rules")
.register(registry);
// ── Gauges: alert instances by state × severity ─────────────────
// ── Gauges: alert instances by state (cached) ───────────────────
this.instancesByStateCaches = new EnumMap<>(AlertState.class);
for (AlertState state : AlertState.values()) {
// Capture state as effectively-final for lambda
AlertState capturedState = state;
// We register one gauge per state (summed across severities) for simplicity;
// per-severity breakdown would require a dynamic MultiGauge.
Gauge.builder("alerting_instances_total", this,
m -> m.countInstances(capturedState))
AlertState captured = state;
TtlCache cache = new TtlCache(() -> instancesSupplier.apply(captured), gaugeTtl, clock);
this.instancesByStateCaches.put(state, cache);
Gauge.builder("alerting_instances_total", cache, TtlCache::getAsDouble)
.tag("state", state.name().toLowerCase())
.description("Number of alert instances by state")
.register(registry);
@@ -148,28 +207,73 @@ public class AlertingMetrics {
.increment();
}
// ── Gauge suppliers (called on each Prometheus scrape) ──────────────
private double countRules(boolean enabled) {
try {
Long count = jdbc.queryForObject(
"SELECT COUNT(*) FROM alert_rules WHERE enabled = ?", Long.class, enabled);
return count == null ? 0.0 : count.doubleValue();
} catch (Exception e) {
log.debug("alerting_rules gauge query failed: {}", e.getMessage());
return 0.0;
/**
* Force a read of every TTL-cached gauge supplier. Used by tests to simulate
* a Prometheus scrape without needing a real registry scrape pipeline.
*/
void snapshotAllGauges() {
List<TtlCache> all = new ArrayList<>();
all.add(enabledRulesCache);
all.add(disabledRulesCache);
all.addAll(instancesByStateCaches.values());
for (TtlCache c : all) {
c.getAsDouble();
}
}
private double countInstances(AlertState state) {
// ── Gauge suppliers (queried at most once per TTL) ──────────────────
private static long countRules(JdbcTemplate jdbc, boolean enabled) {
try {
Long count = jdbc.queryForObject(
"SELECT COUNT(*) FROM alert_rules WHERE enabled = ?", Long.class, enabled);
return count == null ? 0L : count;
} catch (Exception e) {
log.debug("alerting_rules gauge query failed: {}", e.getMessage());
return 0L;
}
}
private static long countInstances(JdbcTemplate jdbc, AlertState state) {
try {
Long count = jdbc.queryForObject(
"SELECT COUNT(*) FROM alert_instances WHERE state = ?::alert_state_enum",
Long.class, state.name());
return count == null ? 0.0 : count.doubleValue();
return count == null ? 0L : count;
} catch (Exception e) {
log.debug("alerting_instances gauge query failed: {}", e.getMessage());
return 0.0;
return 0L;
}
}
/**
* Lightweight TTL cache around a {@code Supplier<Long>}. Every call to
* {@link #getAsDouble()} either returns the cached value (if {@code clock.get()
* - lastRead < ttl}) or invokes the delegate and refreshes the cache.
*
* <p>Used to amortise Postgres queries behind Prometheus gauges over a
* 30-second TTL (see {@link AlertingMetrics#DEFAULT_GAUGE_TTL}).
*/
static final class TtlCache {
private final Supplier<Long> delegate;
private final Duration ttl;
private final Supplier<Instant> clock;
private volatile Instant lastRead = Instant.MIN;
private volatile long cached = 0L;
TtlCache(Supplier<Long> delegate, Duration ttl, Supplier<Instant> clock) {
this.delegate = delegate;
this.ttl = ttl;
this.clock = clock;
}
synchronized double getAsDouble() {
Instant now = clock.get();
if (lastRead == Instant.MIN || Duration.between(lastRead, now).compareTo(ttl) >= 0) {
cached = delegate.get();
lastRead = now;
}
return cached;
}
}
}


@@ -7,6 +7,8 @@ import com.cameleer.server.core.outbound.OutboundConnectionService;
import org.springframework.http.HttpStatus;
import org.springframework.web.server.ResponseStatusException;
import java.net.URI;
import java.net.URISyntaxException;
import java.time.Instant;
import java.util.List;
import java.util.UUID;
@@ -15,20 +17,24 @@ public class OutboundConnectionServiceImpl implements OutboundConnectionService
private final OutboundConnectionRepository repo;
private final AlertRuleRepository ruleRepo;
private final SsrfGuard ssrfGuard;
private final String tenantId;
public OutboundConnectionServiceImpl(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
SsrfGuard ssrfGuard,
String tenantId) {
this.repo = repo;
this.ruleRepo = ruleRepo;
this.ssrfGuard = ssrfGuard;
this.tenantId = tenantId;
}
@Override
public OutboundConnection create(OutboundConnection draft, String actingUserId) {
assertNameUnique(draft.name(), null);
validateUrl(draft.url());
OutboundConnection c = new OutboundConnection(
UUID.randomUUID(), tenantId, draft.name(), draft.description(),
draft.url(), draft.method(), draft.defaultHeaders(), draft.defaultBodyTmpl(),
@@ -46,6 +52,7 @@ public class OutboundConnectionServiceImpl implements OutboundConnectionService
if (!existing.name().equals(draft.name())) {
assertNameUnique(draft.name(), id);
}
validateUrl(draft.url());
// Narrowing allowed-envs guard: if the new draft restricts to a non-empty set of envs,
// find any envs that existed before but are absent in the draft.
@@ -107,4 +114,23 @@ public class OutboundConnectionServiceImpl implements OutboundConnectionService
}
});
}
/**
* Validate the webhook URL against SSRF pitfalls. Translates the guard's
* {@link IllegalArgumentException} into a 400 Bad Request with the guard's
* message preserved, so the client sees e.g. "private or loopback".
*/
private void validateUrl(String url) {
URI uri;
try {
uri = new URI(url);
} catch (URISyntaxException e) {
throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "Invalid URL: " + url);
}
try {
ssrfGuard.validate(uri);
} catch (IllegalArgumentException e) {
throw new ResponseStatusException(HttpStatus.BAD_REQUEST, e.getMessage(), e);
}
}
}


@@ -0,0 +1,69 @@
package com.cameleer.server.app.outbound;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
/**
* Validates outbound webhook URLs against SSRF pitfalls: rejects hosts that resolve to
* loopback, link-local, or RFC-1918 private ranges (and IPv6 equivalents).
*
* Per spec §17. The `cameleer.server.outbound-http.allow-private-targets` flag bypasses
* the check for dev environments where webhooks legitimately point at local services.
*/
@Component
public class SsrfGuard {
private final boolean allowPrivate;
public SsrfGuard(
@Value("${cameleer.server.outbound-http.allow-private-targets:false}") boolean allowPrivate
) {
this.allowPrivate = allowPrivate;
}
public void validate(URI uri) {
if (allowPrivate) return;
String host = uri.getHost();
if (host == null || host.isBlank()) {
throw new IllegalArgumentException("URL must include a host: " + uri);
}
if ("localhost".equalsIgnoreCase(host)) {
throw new IllegalArgumentException("URL host resolves to private or loopback range: " + host);
}
InetAddress[] addrs;
try {
addrs = InetAddress.getAllByName(host);
} catch (UnknownHostException e) {
throw new IllegalArgumentException("URL host does not resolve: " + host, e);
}
for (InetAddress addr : addrs) {
if (isPrivate(addr)) {
throw new IllegalArgumentException("URL host resolves to private or loopback range: " + host + " -> " + addr.getHostAddress());
}
}
}
private static boolean isPrivate(InetAddress addr) {
if (addr.isLoopbackAddress()) return true;
if (addr.isLinkLocalAddress()) return true;
if (addr.isSiteLocalAddress()) return true; // 10/8, 172.16/12, 192.168/16
if (addr.isAnyLocalAddress()) return true; // 0.0.0.0, ::
if (addr instanceof Inet6Address ip6) {
byte[] raw = ip6.getAddress();
// fc00::/7 unique-local
if ((raw[0] & 0xfe) == 0xfc) return true;
}
if (addr instanceof Inet4Address ip4) {
byte[] raw = ip4.getAddress();
// 169.254.0.0/16 link-local (also matches isLinkLocalAddress but doubled-up for safety)
if ((raw[0] & 0xff) == 169 && (raw[1] & 0xff) == 254) return true;
}
return false;
}
}


@@ -1,6 +1,7 @@
package com.cameleer.server.app.outbound.config;
import com.cameleer.server.app.outbound.OutboundConnectionServiceImpl;
import com.cameleer.server.app.outbound.SsrfGuard;
import com.cameleer.server.app.outbound.crypto.SecretCipher;
import com.cameleer.server.app.outbound.storage.PostgresOutboundConnectionRepository;
import com.cameleer.server.core.alerting.AlertRuleRepository;
@@ -31,7 +32,8 @@ public class OutboundBeanConfig {
public OutboundConnectionService outboundConnectionService(
OutboundConnectionRepository repo,
AlertRuleRepository ruleRepo,
SsrfGuard ssrfGuard,
@Value("${cameleer.server.tenant.id:default}") String tenantId) {
return new OutboundConnectionServiceImpl(repo, ruleRepo, tenantId);
return new OutboundConnectionServiceImpl(repo, ruleRepo, ssrfGuard, tenantId);
}
}


@@ -0,0 +1,111 @@
package com.cameleer.server.app.alerting.metrics;
import com.cameleer.server.core.alerting.AlertState;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import org.junit.jupiter.api.Test;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
import static org.assertj.core.api.Assertions.assertThat;
/**
* Verifies that {@link AlertingMetrics} caches gauge values for a configurable TTL,
* so that Prometheus scrapes do not cause one Postgres query per scrape.
*/
class AlertingMetricsCachingTest {
@Test
void gaugeSupplierIsCalledAtMostOncePerTtl() {
// The instances supplier is shared across every AlertState gauge, so each
// full gauge snapshot invokes it once per AlertState (one cache per state).
final int stateCount = AlertState.values().length;
AtomicInteger enabledRulesCalls = new AtomicInteger();
AtomicInteger disabledRulesCalls = new AtomicInteger();
AtomicInteger instancesCalls = new AtomicInteger();
AtomicReference<Instant> now = new AtomicReference<>(Instant.parse("2026-04-20T00:00:00Z"));
Supplier<Instant> clock = now::get;
MeterRegistry registry = new SimpleMeterRegistry();
Supplier<Long> enabledRulesSupplier = () -> { enabledRulesCalls.incrementAndGet(); return 7L; };
Supplier<Long> disabledRulesSupplier = () -> { disabledRulesCalls.incrementAndGet(); return 3L; };
Supplier<Long> instancesSupplier = () -> { instancesCalls.incrementAndGet(); return 5L; };
AlertingMetrics metrics = new AlertingMetrics(
registry,
enabledRulesSupplier,
disabledRulesSupplier,
instancesSupplier,
Duration.ofSeconds(30),
clock
);
// First scrape — each supplier invoked exactly once per gauge.
metrics.snapshotAllGauges();
assertThat(enabledRulesCalls.get()).isEqualTo(1);
assertThat(disabledRulesCalls.get()).isEqualTo(1);
assertThat(instancesCalls.get()).isEqualTo(stateCount);
// Second scrape within TTL — served from cache.
metrics.snapshotAllGauges();
assertThat(enabledRulesCalls.get()).isEqualTo(1);
assertThat(disabledRulesCalls.get()).isEqualTo(1);
assertThat(instancesCalls.get()).isEqualTo(stateCount);
// Third scrape still within TTL (29 s later) — still cached.
now.set(now.get().plusSeconds(29));
metrics.snapshotAllGauges();
assertThat(enabledRulesCalls.get()).isEqualTo(1);
assertThat(disabledRulesCalls.get()).isEqualTo(1);
assertThat(instancesCalls.get()).isEqualTo(stateCount);
// Advance past TTL — next scrape re-queries the delegate.
now.set(Instant.parse("2026-04-20T00:00:31Z"));
metrics.snapshotAllGauges();
assertThat(enabledRulesCalls.get()).isEqualTo(2);
assertThat(disabledRulesCalls.get()).isEqualTo(2);
assertThat(instancesCalls.get()).isEqualTo(stateCount * 2);
// Immediate follow-up — back in cache.
metrics.snapshotAllGauges();
assertThat(enabledRulesCalls.get()).isEqualTo(2);
assertThat(disabledRulesCalls.get()).isEqualTo(2);
assertThat(instancesCalls.get()).isEqualTo(stateCount * 2);
}
@Test
void gaugeValueReflectsCachedResult() {
AtomicReference<Long> enabledValue = new AtomicReference<>(10L);
AtomicReference<Instant> now = new AtomicReference<>(Instant.parse("2026-04-20T00:00:00Z"));
MeterRegistry registry = new SimpleMeterRegistry();
AlertingMetrics metrics = new AlertingMetrics(
registry,
enabledValue::get,
() -> 0L,
() -> 0L,
Duration.ofSeconds(30),
now::get
);
// Read once — value cached at 10.
metrics.snapshotAllGauges();
// Mutate the underlying supplier output; cache should shield it.
enabledValue.set(99L);
double cached = registry.find("alerting_rules_total").tag("state", "enabled").gauge().value();
assertThat(cached).isEqualTo(10.0);
// After TTL, new value surfaces.
now.set(now.get().plusSeconds(31));
metrics.snapshotAllGauges();
double refreshed = registry.find("alerting_rules_total").tag("state", "enabled").gauge().value();
assertThat(refreshed).isEqualTo(99.0);
}
}


@@ -0,0 +1,73 @@
package com.cameleer.server.app.outbound;
import org.junit.jupiter.api.Test;
import java.net.URI;
import java.util.Set;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
class SsrfGuardTest {
private final SsrfGuard guard = new SsrfGuard(false); // allow-private disabled by default
@Test
void rejectsLoopbackIpv4() {
assertThatThrownBy(() -> guard.validate(URI.create("https://127.0.0.1/webhook")))
.isInstanceOf(IllegalArgumentException.class)
.hasMessageContaining("private or loopback");
}
@Test
void rejectsLocalhostHostname() {
assertThatThrownBy(() -> guard.validate(URI.create("https://localhost:8080/x")))
.isInstanceOf(IllegalArgumentException.class);
}
@Test
void rejectsRfc1918Ranges() {
for (String url : Set.of(
"https://10.0.0.1/x",
"https://172.16.5.6/x",
"https://192.168.1.1/x"
)) {
assertThatThrownBy(() -> guard.validate(URI.create(url)))
.as(url)
.isInstanceOf(IllegalArgumentException.class);
}
}
@Test
void rejectsLinkLocal() {
assertThatThrownBy(() -> guard.validate(URI.create("https://169.254.169.254/latest/meta-data/")))
.isInstanceOf(IllegalArgumentException.class);
}
@Test
void rejectsIpv6Loopback() {
assertThatThrownBy(() -> guard.validate(URI.create("https://[::1]/x")))
.isInstanceOf(IllegalArgumentException.class);
}
@Test
void rejectsIpv6UniqueLocal() {
assertThatThrownBy(() -> guard.validate(URI.create("https://[fc00::1]/x")))
.isInstanceOf(IllegalArgumentException.class);
}
@Test
void acceptsPublicHttps() {
// Use a literal public IP (8.8.8.8, Google public DNS) rather than a
// hostname so the test doesn't depend on DNS resolution; it is not in
// any loopback, link-local, or private range.
guard.validate(URI.create("https://8.8.8.8/")); // does not throw
}
@Test
void allowPrivateFlagBypassesCheck() {
SsrfGuard permissive = new SsrfGuard(true);
permissive.validate(URI.create("https://127.0.0.1/")); // must not throw
}
}


@@ -0,0 +1,67 @@
package com.cameleer.server.app.outbound.controller;
import com.cameleer.server.app.AbstractPostgresIT;
import com.cameleer.server.app.TestSecurityHelper;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.web.client.TestRestTemplate;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.context.TestPropertySource;
import static org.assertj.core.api.Assertions.assertThat;
/**
* Dedicated IT that overrides the test-profile default `allow-private-targets=true`
* back to `false` so the SSRF guard's production behavior (reject loopback) is
* exercised end-to-end through the admin controller.
*
* Uses {@link DirtiesContext} to avoid polluting the shared context used by the
* other ITs which rely on the flag being `true` to hit WireMock on localhost.
*/
@TestPropertySource(properties = "cameleer.server.outbound-http.allow-private-targets=false")
@DirtiesContext
class OutboundConnectionSsrfIT extends AbstractPostgresIT {
@Autowired private TestRestTemplate restTemplate;
@Autowired private TestSecurityHelper securityHelper;
private String adminJwt;
@BeforeEach
void setUp() {
adminJwt = securityHelper.adminToken();
// Seed admin user row since users(user_id) is an FK target.
jdbcTemplate.update(
"INSERT INTO users (user_id, provider, email, display_name) VALUES (?, 'test', ?, ?) ON CONFLICT (user_id) DO NOTHING",
"test-admin", "test-admin@example.com", "test-admin");
jdbcTemplate.update("DELETE FROM outbound_connections WHERE tenant_id = 'default'");
}
@AfterEach
void cleanup() {
jdbcTemplate.update("DELETE FROM outbound_connections WHERE tenant_id = 'default'");
jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-admin'");
}
@Test
void rejectsLoopbackUrlOnCreate() {
String body = """
{"name":"evil","url":"https://127.0.0.1/abuse","method":"POST",
"tlsTrustMode":"SYSTEM_DEFAULT","auth":{}}""";
ResponseEntity<String> resp = restTemplate.exchange(
"/api/v1/admin/outbound-connections", HttpMethod.POST,
new HttpEntity<>(body, securityHelper.authHeaders(adminJwt)),
String.class);
assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.BAD_REQUEST);
assertThat(resp.getBody()).isNotNull();
assertThat(resp.getBody()).contains("private or loopback");
}
}


@@ -17,3 +17,5 @@ cameleer:
bootstraptokenprevious: old-bootstrap-token
infrastructureendpoints: true
jwtsecret: test-jwt-secret-for-integration-tests-only
outbound-http:
allow-private-targets: true


@@ -0,0 +1,41 @@
-- Dev-stack seed: pre-create the `admin` user row without the `user:` prefix.
--
-- Why: the UI login controller stores the local admin as `user_id='user:admin'`
-- (JWT `sub` format), but the alerting + outbound controllers resolve the FK
-- via `authentication.name` with the `user:` prefix stripped, i.e. `admin`.
-- In k8s these controllers happily insert `admin` because production admins are
-- provisioned through the admin API with unprefixed user_ids. In the local
-- docker stack there's no such provisioning step, so the FK check fails with
-- "alert_rules_created_by_fkey violation" on the first rule create.
--
-- Seeding a row with `user_id='admin'` here bridges the gap so E2E smokes,
-- API probes, and manual dev sessions can create alerting rows straight away.
-- Flyway owns the schema in tenant_default; this script only INSERTs idempotently
-- and is gated on the schema existing.
DO $$
DECLARE
schema_exists bool;
table_exists bool;
BEGIN
SELECT EXISTS(
SELECT 1 FROM information_schema.schemata WHERE schema_name = 'tenant_default'
) INTO schema_exists;
IF NOT schema_exists THEN
RAISE NOTICE 'tenant_default schema not yet migrated — skipping admin seed (Flyway will run on server start)';
RETURN;
END IF;
SELECT EXISTS(
SELECT 1 FROM information_schema.tables
WHERE table_schema = 'tenant_default' AND table_name = 'users'
) INTO table_exists;
IF NOT table_exists THEN
RAISE NOTICE 'tenant_default.users not yet migrated — skipping admin seed';
RETURN;
END IF;
INSERT INTO tenant_default.users (user_id, provider, email, display_name)
VALUES ('admin', 'local', '', 'admin')
ON CONFLICT (user_id) DO NOTHING;
END $$;


@@ -1,6 +1,24 @@
##
## Local development + E2E stack. Mirrors the k8s manifests in deploy/:
## - cameleer-postgres (PG for RBAC/config/audit/alerting — Flyway migrates on server start)
## - cameleer-clickhouse (OLAP for executions/logs/metrics/stats/diagrams)
## - cameleer-server (Spring Boot backend; built from this repo's Dockerfile)
## - cameleer-ui (nginx-served SPA; built from ui/Dockerfile)
##
## Usage:
## docker compose up -d --build # full stack, detached
## docker compose up -d cameleer-postgres cameleer-clickhouse # infra only (dev via mvn/vite)
## docker compose down -v # stop + remove volumes
##
## Defaults match `application.yml` and the k8s base manifests. Production
## k8s still owns the source of truth; this compose is for local iteration
## and Playwright E2E. Secrets are non-sensitive dev placeholders.
##
services:
cameleer-postgres:
image: postgres:16
container_name: cameleer-postgres
ports:
- "5432:5432"
environment:
@@ -8,7 +26,129 @@ services:
POSTGRES_USER: cameleer
POSTGRES_PASSWORD: cameleer_dev
volumes:
- cameleer-pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U cameleer -d cameleer"]
interval: 5s
timeout: 3s
retries: 20
restart: unless-stopped
cameleer-clickhouse:
image: clickhouse/clickhouse-server:24.12
container_name: cameleer-clickhouse
ports:
- "8123:8123"
- "9000:9000"
environment:
CLICKHOUSE_DB: cameleer
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: ""
# Allow the default user to manage access (matches k8s StatefulSet env)
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
ulimits:
nofile:
soft: 262144
hard: 262144
volumes:
- cameleer-chdata:/var/lib/clickhouse
healthcheck:
# wget-less image: use clickhouse-client's ping equivalent
test: ["CMD-SHELL", "clickhouse-client --query 'SELECT 1' || exit 1"]
interval: 5s
timeout: 3s
retries: 20
restart: unless-stopped
cameleer-server:
build:
context: .
dockerfile: Dockerfile
args:
# Public cameleer-common package — token optional. Override with
# REGISTRY_TOKEN=... in the shell env if you need a private package.
REGISTRY_TOKEN: ${REGISTRY_TOKEN:-}
container_name: cameleer-server
ports:
- "8081:8081"
environment:
SPRING_DATASOURCE_URL: jdbc:postgresql://cameleer-postgres:5432/cameleer?currentSchema=tenant_default&ApplicationName=tenant_default
SPRING_DATASOURCE_USERNAME: cameleer
SPRING_DATASOURCE_PASSWORD: cameleer_dev
SPRING_FLYWAY_USER: cameleer
SPRING_FLYWAY_PASSWORD: cameleer_dev
CAMELEER_SERVER_CLICKHOUSE_URL: jdbc:clickhouse://cameleer-clickhouse:8123/cameleer
CAMELEER_SERVER_CLICKHOUSE_USERNAME: default
CAMELEER_SERVER_CLICKHOUSE_PASSWORD: ""
# Auth / UI credentials — dev defaults; change before exposing the port.
CAMELEER_SERVER_SECURITY_UIUSER: admin
CAMELEER_SERVER_SECURITY_UIPASSWORD: admin
CAMELEER_SERVER_SECURITY_UIORIGIN: http://localhost:5173
CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS: http://localhost:5173,http://localhost:8080
CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN: dev-bootstrap-token-for-local-agent-registration
CAMELEER_SERVER_SECURITY_JWTSECRET: dev-jwt-secret-32-bytes-min-0123456789abcdef0123456789abcdef
# Runtime (Docker-in-Docker deployment) disabled for local stack
CAMELEER_SERVER_RUNTIME_ENABLED: "false"
CAMELEER_SERVER_TENANT_ID: default
# SSRF guard: allow private targets for dev (Playwright + local webhooks)
CAMELEER_SERVER_OUTBOUND_HTTP_ALLOW_PRIVATE_TARGETS: "true"
depends_on:
cameleer-postgres:
condition: service_healthy
cameleer-clickhouse:
condition: service_healthy
healthcheck:
# JRE image has wget; /api/v1/health is Actuator + Spring managed endpoint
test: ["CMD-SHELL", "wget -qO- http://localhost:8081/api/v1/health > /dev/null || exit 1"]
interval: 10s
timeout: 5s
retries: 12
start_period: 90s
restart: unless-stopped
cameleer-ui:
build:
context: ./ui
dockerfile: Dockerfile
args:
REGISTRY_TOKEN: ${REGISTRY_TOKEN:-}
container_name: cameleer-ui
# Host :8080 — Vite dev server (npm run dev:local) keeps :5173 for local iteration.
ports:
- "8080:80"
environment:
# nginx proxies /api → CAMELEER_API_URL
CAMELEER_API_URL: http://cameleer-server:8081
BASE_PATH: /
depends_on:
cameleer-server:
condition: service_healthy
healthcheck:
test: ["CMD-SHELL", "wget -qO- http://localhost/healthz > /dev/null || exit 1"]
interval: 5s
timeout: 3s
retries: 10
restart: unless-stopped
# Run-once seeder: waits for the server to be healthy (i.e. Flyway migrations
# finished) and inserts a `user_id='admin'` row (without the `user:` prefix)
# so alerting-controller FKs succeed. See deploy/docker/postgres-init.sql for
# the full rationale. Idempotent — exits 0 if the row already exists.
cameleer-seed:
image: postgres:16
container_name: cameleer-seed
depends_on:
cameleer-server:
condition: service_healthy
environment:
PGPASSWORD: cameleer_dev
volumes:
- ./deploy/docker/postgres-init.sql:/seed.sql:ro
entrypoint: ["sh", "-c"]
command:
- "psql -h cameleer-postgres -U cameleer -d cameleer -v ON_ERROR_STOP=1 -f /seed.sql"
restart: "no"
volumes:
cameleer-pgdata:
cameleer-chdata:


@@ -307,3 +307,54 @@ Check `GET /api/v1/environments/{envSlug}/alerts/{id}/notifications` for respons
### ClickHouse projections
The `LOG_PATTERN` and `EXCHANGE_MATCH` evaluators use ClickHouse projections (`logs_by_level`, `executions_by_status`). On fresh ClickHouse containers (e.g. Testcontainers), projections may not be active immediately — the evaluator falls back to a full table scan with the same WHERE clause, so correctness is preserved but latency may increase on first evaluation. In production ClickHouse, projections are applied to new data immediately and to existing data after `OPTIMIZE TABLE … FINAL`.
---
## UI walkthrough
The alerting UI is accessible to any authenticated VIEWER+; write actions (create rule, silence, ack) require OPERATOR+ per backend RBAC.
### Sidebar
A dedicated **Alerts** section between Applications and Admin:
- **Inbox** — open alerts targeted at you (state FIRING or ACKNOWLEDGED). Mark individual rows as read by clicking the title, or "Mark all read" via the toolbar. Firing rows have an amber left border.
- **All** — every open alert in the environment with state-chip filter (Open / Firing / Acked / All).
- **Rules** — the rule catalogue. Toggle the Enabled switch to disable a rule without deleting it. Delete prompts for confirmation; fired instances survive via `rule_snapshot`.
- **Silences** — active + scheduled silences. Create one by filling in any combination of `ruleId` and `appSlug`, plus a duration (hours) and an optional reason.
- **History** — RESOLVED alerts within the retention window (default 90 days).
### Notification bell
A bell icon in the top bar polls `/alerts/unread-count` every 30 seconds (paused when the tab is hidden). Clicking it navigates to the inbox.
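The pause-while-hidden gating can be sketched as a React Query `refetchInterval` callback; the function and query-key names below are illustrative, not the actual hook:

```typescript
// Minimal sketch of the bell's polling gate. React Query's `refetchInterval`
// option accepts a function, so returning `false` while the tab is hidden
// pauses the 30-second poll without tearing down the query.
type TabVisibility = 'visible' | 'hidden';

export function unreadCountRefetchInterval(visibility: TabVisibility): number | false {
  // 30_000 ms while visible; `false` disables refetching entirely.
  return visibility === 'visible' ? 30_000 : false;
}

// Hypothetical wiring (query key and fetcher assumed):
//   useQuery({
//     queryKey: ['alerts', 'unread-count'],
//     queryFn: fetchUnreadCount,
//     refetchInterval: () => unreadCountRefetchInterval(document.visibilityState),
//   });
```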
### Rule editor (5-step wizard)
1. **Scope** — name, severity, and radio between environment-wide, single-app, single-route, or single-agent.
2. **Condition** — one of six condition kinds (ROUTE_METRIC, EXCHANGE_MATCH, AGENT_STATE, DEPLOYMENT_STATE, LOG_PATTERN, JVM_METRIC) with a form tailored to each.
3. **Trigger** — evaluation interval (≥5s), for-duration before firing (0 = fire immediately), re-notify cadence (minutes). Test-evaluate button when editing an existing rule.
4. **Notify** — notification title + message templates (Mustache with autocomplete), target users/groups/roles, webhook bindings (filtered to outbound connections allowed in the current env).
5. **Review** — summary card, enable toggle, save.
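A minimal sketch of the step-3 trigger semantics (field names assumed; the real evaluation runs server-side):

```typescript
// Sketch of the trigger rules: a breached condition only fires once it has
// held continuously for `forDurationSeconds` (0 = fire immediately), and a
// firing alert re-notifies at most once per `reNotifyMinutes`.
interface TriggerConfig {
  forDurationSeconds: number; // 0 = fire immediately
  reNotifyMinutes: number;
}

export function shouldFire(cfg: TriggerConfig, breachedSinceMs: number | null, nowMs: number): boolean {
  if (breachedSinceMs === null) return false; // condition not currently breached
  return nowMs - breachedSinceMs >= cfg.forDurationSeconds * 1000;
}

export function shouldReNotify(cfg: TriggerConfig, lastNotifiedMs: number, nowMs: number): boolean {
  return nowMs - lastNotifiedMs >= cfg.reNotifyMinutes * 60_000;
}
```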
### Mustache autocomplete
Every template-editable field uses a shared CodeMirror 6 editor with variable autocomplete:
- Type `{{` to open the variable picker.
- Variables filter by condition kind (e.g. `route.*` is only shown when a route-scoped condition is selected).
- Unknown references get an amber underline at save time ("not available for this rule kind — will render as literal").
- The canonical variable list lives in `ui/src/components/MustacheEditor/alert-variables.ts` and mirrors the backend `NotificationContextBuilder`.
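The save-time check can be sketched as a reference extractor against a per-kind allow-list (the variable names below are stand-ins for the real list in `alert-variables.ts`):

```typescript
// Sketch of the save-time template check: extract `{{…}}` references and
// flag any that aren't in the allow-list for the rule's condition kind.
// The allow-list here is illustrative only.
const VARIABLES_BY_KIND: Record<string, string[]> = {
  ROUTE_METRIC: ['rule.name', 'alert.severity', 'route.id', 'route.uri'],
  AGENT_STATE: ['rule.name', 'alert.severity', 'agent.id'],
};

export function unknownReferences(template: string, conditionKind: string): string[] {
  const allowed = new Set(VARIABLES_BY_KIND[conditionKind] ?? []);
  const refs = [...template.matchAll(/\{\{\s*([\w.]+)\s*\}\}/g)].map((m) => m[1]);
  // Anything returned here gets the amber underline; it still renders as a literal.
  return refs.filter((ref) => !allowed.has(ref));
}
```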
### Env promotion
Rules are environment-scoped. To replicate a rule in another env, open the source env's rule list and pick a target env from the **Promote to ▾** dropdown. The editor opens pre-filled with the source rule's values, with client-side warnings:
- Agent IDs are env-specific and get cleared.
- Apps that don't exist in the target env flag an "update before saving" hint.
- Outbound connections not allowed in the target env flag a "remove or pick another" hint.
No new REST endpoint — promotion is pure UI-driven create.
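The promotion transform amounts to a pure client-side function; a sketch under assumed field names:

```typescript
// Sketch of the client-side promote transform (field names illustrative):
// copy the source rule, clear the env-specific agent ID, and collect the
// warnings the editor surfaces for the other two cases.
interface PromotableRule {
  name: string;
  agentId?: string;
  appSlug?: string;
  webhookConnectionIds: string[];
}

export function promoteDraft(
  source: PromotableRule,
  targetEnvApps: Set<string>,
  targetEnvConnections: Set<string>,
): { draft: PromotableRule; warnings: string[] } {
  const warnings: string[] = [];
  // Agent IDs are env-specific, so they are always cleared.
  const draft: PromotableRule = { ...source, agentId: undefined };
  if (source.agentId) warnings.push('agent cleared; re-select one in the target env');
  if (source.appSlug && !targetEnvApps.has(source.appSlug)) {
    warnings.push(`app '${source.appSlug}' does not exist in the target env: update before saving`);
  }
  for (const id of source.webhookConnectionIds) {
    if (!targetEnvConnections.has(id)) {
      warnings.push(`connection '${id}' is not allowed in the target env: remove or pick another`);
    }
  }
  return { draft, warnings };
}
```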
### CMD-K
The command palette (`Ctrl/Cmd + K`) surfaces open alerts and alert rules alongside existing apps/routes/exchanges. Select an alert to jump to its inbox detail; select a rule to open its editor.


@@ -1,15 +1,19 @@
FROM --platform=$BUILDPLATFORM node:22-alpine AS build
WORKDIR /app
ARG REGISTRY_TOKEN=""
COPY package.json package-lock.json .npmrc ./
RUN if [ -n "$REGISTRY_TOKEN" ]; then \
echo "//gitea.siegeln.net/api/packages/cameleer/npm/:_authToken=${REGISTRY_TOKEN}" >> .npmrc; \
fi && \
npm ci
COPY . .
# Upgrade design system to latest dev snapshot (after COPY to bust Docker cache)
RUN if [ -n "$REGISTRY_TOKEN" ]; then \
echo "//gitea.siegeln.net/api/packages/cameleer/npm/:_authToken=${REGISTRY_TOKEN}" >> .npmrc; \
fi && \
npm install @cameleer/design-system@dev && \
rm -f .npmrc


@@ -31,6 +31,8 @@ import { useAgents } from '../api/queries/agents';
import { useSearchExecutions, useAttributeKeys } from '../api/queries/executions';
import { useUsers, useGroups, useRoles } from '../api/queries/admin/rbac';
import { useEnvironments } from '../api/queries/admin/environments';
import { useAlerts } from '../api/queries/alerts';
import { useAlertRules } from '../api/queries/alertRules';
import type { UserDetail, GroupDetail, RoleDetail } from '../api/queries/admin/rbac';
import { useAuthStore, useIsAdmin, useCanControl } from '../auth/auth-store';
import { useEnvironmentStore } from '../api/environment-store';
@@ -161,6 +163,58 @@ function buildAdminSearchData(
return results;
}
function buildAlertSearchData(
alerts: any[] | undefined,
rules: any[] | undefined,
): SearchResult[] {
const results: SearchResult[] = [];
if (alerts) {
for (const a of alerts) {
results.push({
id: `alert:${a.id}`,
category: 'alert',
title: a.title ?? '(untitled)',
badges: [
{ label: a.severity, color: severityToSearchColor(a.severity) },
{ label: a.state, color: stateToSearchColor(a.state) },
],
meta: `${a.firedAt ?? ''}${a.silenced ? ' · silenced' : ''}`,
path: `/alerts/inbox/${a.id}`,
});
}
}
if (rules) {
for (const r of rules) {
results.push({
id: `rule:${r.id}`,
category: 'alertRule',
title: r.name,
badges: [
{ label: r.severity, color: severityToSearchColor(r.severity) },
{ label: r.conditionKind, color: 'auto' },
...(r.enabled ? [] : [{ label: 'DISABLED', color: 'warning' as const }]),
],
meta: `${r.evaluationIntervalSeconds}s · ${r.targets?.length ?? 0} targets`,
path: `/alerts/rules/${r.id}`,
});
}
}
return results;
}
function severityToSearchColor(s: string): string {
if (s === 'CRITICAL') return 'error';
if (s === 'WARNING') return 'warning';
return 'auto';
}
function stateToSearchColor(s: string): string {
if (s === 'FIRING') return 'error';
if (s === 'ACKNOWLEDGED') return 'warning';
if (s === 'RESOLVED') return 'success';
return 'auto';
}
function healthToSearchColor(health: string): string {
switch (health) {
case 'live': return 'success';
@@ -313,6 +367,10 @@ function LayoutContent() {
const { data: attributeKeys } = useAttributeKeys();
const { data: envRecords = [] } = useEnvironments();
// Open alerts + rules for CMD-K (env-scoped).
const { data: cmdkAlerts } = useAlerts({ state: ['FIRING', 'ACKNOWLEDGED'], limit: 100 });
const { data: cmdkRules } = useAlertRules();
// Merge environments from both the environments table and agent heartbeats
const environments: string[] = useMemo(() => {
const envSet = new Set<string>();
@@ -569,6 +627,11 @@ function LayoutContent() {
[adminUsers, adminGroups, adminRoles],
);
const alertingSearchData: SearchResult[] = useMemo(
() => buildAlertSearchData(cmdkAlerts, cmdkRules),
[cmdkAlerts, cmdkRules],
);
const operationalSearchData: SearchResult[] = useMemo(() => {
if (isAdminPage) return [];
@@ -604,8 +667,8 @@ function LayoutContent() {
}
}
return [...catalogRef.current, ...exchangeItems, ...attributeItems, ...alertingSearchData];
}, [isAdminPage, catalogRef.current, exchangeResults, debouncedQuery, alertingSearchData]);
const searchData = isAdminPage ? adminSearchData : operationalSearchData;
@@ -653,6 +716,11 @@ function LayoutContent() {
const ADMIN_TAB_MAP: Record<string, string> = { user: 'users', group: 'groups', role: 'roles' };
const handlePaletteSelect = useCallback((result: any) => {
if (result.category === 'alert' || result.category === 'alertRule') {
if (result.path) navigate(result.path);
setPaletteOpen(false);
return;
}
if (result.path) {
if (ADMIN_CATEGORIES.has(result.category)) {
const itemId = result.id.split(':').slice(1).join(':');


@@ -1,9 +1,9 @@
import { FormField, Input, Select } from '@cameleer/design-system';
import type { FormState } from '../form-state';
// Mirrors cameleer-server-core RouteMetric enum — keep in sync.
const METRICS = [
{ value: 'ERROR_RATE', label: 'Error rate' },
{ value: 'P95_LATENCY_MS', label: 'P95 latency (ms)' },
{ value: 'P99_LATENCY_MS', label: 'P99 latency (ms)' },
{ value: 'AVG_DURATION_MS', label: 'Avg duration (ms)' },
{ value: 'THROUGHPUT', label: 'Throughput (msg/s)' },


@@ -51,7 +51,17 @@ export function initialForm(existing?: AlertRuleResponse): FormState {
routeId: '',
agentId: '',
conditionKind: 'ROUTE_METRIC',
// Pre-populate a valid ROUTE_METRIC default so a rule can be saved without
// the user needing to fill in every condition field. Values chosen to be
// sane for "error rate" alerts on almost any route.
condition: {
kind: 'ROUTE_METRIC',
scope: {},
metric: 'ERROR_RATE',
comparator: 'GT',
threshold: 0.05,
windowSeconds: 300,
} as unknown as Partial<AlertCondition>,
evaluationIntervalSeconds: 60,
forDurationSeconds: 0,
reNotifyMinutes: 60,


@@ -1,3 +1,130 @@
import { useState } from 'react';
import { Button, FormField, Input, SectionHeader, useToast } from '@cameleer/design-system';
import { PageLoader } from '../../components/PageLoader';
import {
useAlertSilences,
useCreateSilence,
useDeleteSilence,
type AlertSilenceResponse,
} from '../../api/queries/alertSilences';
import sectionStyles from '../../styles/section-card.module.css';
export default function SilencesPage() {
const { data, isLoading, error } = useAlertSilences();
const create = useCreateSilence();
const remove = useDeleteSilence();
const { toast } = useToast();
const [reason, setReason] = useState('');
const [matcherRuleId, setMatcherRuleId] = useState('');
const [matcherAppSlug, setMatcherAppSlug] = useState('');
const [hours, setHours] = useState(1);
if (isLoading) return <PageLoader />;
if (error) return <div>Failed to load silences: {String(error)}</div>;
const onCreate = async () => {
const now = new Date();
const endsAt = new Date(now.getTime() + hours * 3600_000);
const matcher: Record<string, string> = {};
if (matcherRuleId) matcher.ruleId = matcherRuleId;
if (matcherAppSlug) matcher.appSlug = matcherAppSlug;
if (Object.keys(matcher).length === 0) {
toast({ title: 'Silence needs at least one matcher field', variant: 'error' });
return;
}
try {
await create.mutateAsync({
matcher,
reason: reason || undefined,
startsAt: now.toISOString(),
endsAt: endsAt.toISOString(),
});
setReason('');
setMatcherRuleId('');
setMatcherAppSlug('');
setHours(1);
toast({ title: 'Silence created', variant: 'success' });
} catch (e) {
toast({ title: 'Create failed', description: String(e), variant: 'error' });
}
};
const onRemove = async (s: AlertSilenceResponse) => {
if (!confirm(`End silence early?`)) return;
try {
await remove.mutateAsync(s.id!);
toast({ title: 'Silence removed', variant: 'success' });
} catch (e) {
toast({ title: 'Remove failed', description: String(e), variant: 'error' });
}
};
const rows = data ?? [];
return (
<div style={{ padding: 16 }}>
<SectionHeader>Alert silences</SectionHeader>
<div className={sectionStyles.section} style={{ marginTop: 12 }}>
<div style={{ display: 'grid', gridTemplateColumns: 'repeat(4, 1fr) auto', gap: 8, alignItems: 'end' }}>
<FormField label="Rule ID (optional)">
<Input value={matcherRuleId} onChange={(e) => setMatcherRuleId(e.target.value)} />
</FormField>
<FormField label="App slug (optional)">
<Input value={matcherAppSlug} onChange={(e) => setMatcherAppSlug(e.target.value)} />
</FormField>
<FormField label="Duration (hours)">
<Input
type="number"
min={1}
value={hours}
onChange={(e) => setHours(Number(e.target.value))}
/>
</FormField>
<FormField label="Reason">
<Input
value={reason}
onChange={(e) => setReason(e.target.value)}
placeholder="Maintenance window"
/>
</FormField>
<Button variant="primary" size="sm" onClick={onCreate} disabled={create.isPending}>
Create silence
</Button>
</div>
</div>
<div className={sectionStyles.section} style={{ marginTop: 16 }}>
{rows.length === 0 ? (
<p>No active or scheduled silences.</p>
) : (
<table style={{ width: '100%', borderCollapse: 'collapse' }}>
<thead>
<tr>
<th style={{ textAlign: 'left' }}>Matcher</th>
<th style={{ textAlign: 'left' }}>Reason</th>
<th style={{ textAlign: 'left' }}>Starts</th>
<th style={{ textAlign: 'left' }}>Ends</th>
<th></th>
</tr>
</thead>
<tbody>
{rows.map((s) => (
<tr key={s.id}>
<td><code>{JSON.stringify(s.matcher)}</code></td>
<td>{s.reason ?? '—'}</td>
<td>{s.startsAt}</td>
<td>{s.endsAt}</td>
<td>
<Button variant="secondary" size="sm" onClick={() => onRemove(s)}>
End
</Button>
</td>
</tr>
))}
</tbody>
</table>
)}
</div>
</div>
);
}


@@ -0,0 +1,107 @@
import { test, expect } from './fixtures';
/**
* Plan 03 alerting smoke suite.
*
* Covers the CRUD + navigation paths that don't require event injection:
* - sidebar → inbox
* - create + delete a rule via the 5-step wizard
* - CMD-K opens, closes cleanly
* - silence create + end-early
*
* End-to-end fire→ack→clear is covered server-side by `AlertingFullLifecycleIT`
* (Plan 02). Exercising it from the UI would require injecting executions
* into ClickHouse, which is out of scope for this smoke.
*
* Note: the design-system `SectionHeader` renders a generic element (not role=heading),
* so page headings are asserted via `getByText`.
*/
test.describe('alerting UI smoke', () => {
test('sidebar Alerts section navigates to inbox', async ({ page }) => {
// Click the Alerts sidebar section header. On navigation the accordion
// will already be expanded; the "Alerts" label is on the toggle button.
await page.getByRole('button', { name: /^(collapse|expand) alerts$/i }).first().click();
await expect(page).toHaveURL(/\/alerts\/inbox/, { timeout: 10_000 });
// Inbox page renders "Inbox" text + "Mark all read" button.
await expect(page.getByText(/^Inbox$/)).toBeVisible();
await expect(page.getByRole('button', { name: /mark all read/i })).toBeVisible();
});
test('create + delete a rule via the wizard', async ({ page }) => {
// Unique name per run so leftover rules from crashed prior runs don't
// trip the strict-mode "multiple matches" check.
const ruleName = `e2e smoke rule ${Date.now()}`;
await page.goto('/alerts/rules');
await expect(page.getByText(/^Alert rules$/)).toBeVisible();
await page.getByRole('link', { name: /new rule/i }).click();
await expect(page).toHaveURL(/\/alerts\/rules\/new/);
// Step 1 — Scope. DS FormField renders the label as a generic element
// (not `htmlFor` wired), so the textbox's accessible name is its placeholder.
await page.getByPlaceholder('Order API error rate').fill(ruleName);
await page.getByRole('button', { name: /^next$/i }).click();
// Step 2 — Condition (leave at ROUTE_METRIC default)
await page.getByRole('button', { name: /^next$/i }).click();
// Step 3 — Trigger (defaults)
await page.getByRole('button', { name: /^next$/i }).click();
// Step 4 — Notify: default title/message templates are pre-populated;
// targets/webhooks empty is OK for smoke.
await page.getByRole('button', { name: /^next$/i }).click();
// Step 5 — Review + save
await page.getByRole('button', { name: /^create rule$/i }).click();
// Land on rules list, rule appears in the table.
await expect(page).toHaveURL(/\/alerts\/rules$/, { timeout: 10_000 });
const main = page.locator('main');
await expect(main.getByRole('link', { name: ruleName })).toBeVisible({ timeout: 10_000 });
// Cleanup: delete.
page.once('dialog', (d) => d.accept());
await page
.getByRole('row', { name: new RegExp(ruleName) })
.getByRole('button', { name: /^delete$/i })
.click();
await expect(main.getByRole('link', { name: ruleName })).toHaveCount(0);
});
test('CMD-K palette opens + closes', async ({ page }) => {
await page.goto('/alerts/inbox');
// The DS CommandPalette is toggled by the SearchTrigger button in the top bar
// (accessible name "Open search"). Ctrl/Cmd+K is wired inside the DS but
// clicking the button is the deterministic path.
await page.getByRole('button', { name: /open search/i }).click();
const dialog = page.getByRole('dialog').first();
await expect(dialog).toBeVisible({ timeout: 5_000 });
await page.keyboard.press('Escape');
await expect(dialog).toBeHidden();
});
test('silence create + end-early', async ({ page }) => {
await page.goto('/alerts/silences');
await expect(page.getByText(/^Alert silences$/)).toBeVisible();
const unique = `smoke-app-${Date.now()}`;
// DS FormField labels aren't `htmlFor`-wired, so target via parent-of-label → textbox.
const form = page.locator('main');
await form.getByText(/^App slug/).locator('..').getByRole('textbox').fill(unique);
await form.getByRole('spinbutton').fill('1');
await form.getByPlaceholder('Maintenance window').fill('e2e smoke');
await page.getByRole('button', { name: /create silence/i }).click();
await expect(page.getByText(unique).first()).toBeVisible({ timeout: 10_000 });
page.once('dialog', (d) => d.accept());
await page
.getByRole('row', { name: new RegExp(unique) })
.getByRole('button', { name: /^end$/i })
.click();
await expect(page.getByText(unique)).toHaveCount(0);
});
});


@@ -0,0 +1,39 @@
import { test as base, expect } from '@playwright/test';
/**
* E2E fixtures for the alerting UI smoke suite.
*
* Auth happens once per test via an auto-applied fixture. Override creds via:
* E2E_ADMIN_USER=... E2E_ADMIN_PASS=... npm run test:e2e
*
* The fixture logs in to the local form (not OIDC). The backend in the
* Docker-compose stack defaults to `admin` / `admin` for the local login.
*/
export const ADMIN_USER = process.env.E2E_ADMIN_USER ?? 'admin';
export const ADMIN_PASS = process.env.E2E_ADMIN_PASS ?? 'admin';
type Fixtures = {
loggedIn: void;
};
export const test = base.extend<Fixtures>({
loggedIn: [
async ({ page }, use) => {
// `?local` keeps the login page's auto-OIDC-redirect from firing so the
// form-based login works even when an OIDC config happens to be present.
await page.goto('/login?local');
await page.getByLabel(/username/i).fill(ADMIN_USER);
await page.getByLabel(/password/i).fill(ADMIN_PASS);
await page.getByRole('button', { name: /sign in/i }).click();
// Default landing after login is /exchanges (via Navigate redirect).
await expect(page).toHaveURL(/\/(exchanges|alerts|dashboard)/, { timeout: 15_000 });
// Env selection is required for every alerts query (useSelectedEnv gate).
// Pick the default env so hooks enable.
await page.getByRole('combobox').selectOption({ label: 'default' });
await use();
},
{ auto: true },
],
});
export { expect };