test(alerting): rewrite AlertingFullLifecycleIT — REST-driven rule creation, re-notify cadence
Rule creation now goes through POST /alerts/rules (exercises saveTargets on the
write path). Clock is replaced with @MockBean(name="alertingClock") and re-stubbed
in @BeforeEach to survive Mockito's inter-test reset. Six ordered steps:
1. seed log → tick evaluator → assert FIRING instance with non-empty targets (B-1)
2. tick dispatcher → assert DELIVERED notification + lastNotifiedAt stamped (B-2)
3. ack via REST → assert ACKNOWLEDGED state
4. create silence → inject PENDING notification → tick dispatcher → assert silenced (FAILED)
5. delete rule → assert rule_id nullified, rule_snapshot preserved (ON DELETE SET NULL)
6. new rule with reNotifyMinutes=1 → first dispatch → advance clock 61s →
evaluator sweep → second dispatch → verify 2 WireMock POSTs (B-2 cadence)
Background scheduler races addressed by resetting claimed_by/claimed_until before
each manual tick. Simulated clock set AFTER log insert to guarantee log timestamp
falls within the evaluator window. Re-notify notifications backdated in Postgres
to work around the simulated vs real clock gap in claimDueNotifications.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
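The clock-control pattern this commit relies on — production code reading time only through an injected `java.time.Clock` so tests can advance it — can be sketched with a plain stdlib clock. `MutableTestClock` is a hypothetical illustration, not a class in this repo; the test itself uses a Mockito `@MockBean` instead.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical stand-in for the @MockBean(name = "alertingClock") approach:
// a mutable Clock the test can advance. Code that reads time via the injected
// Clock (never Instant.now()) observes the advanced value immediately.
final class MutableTestClock extends Clock {
    private volatile Instant now;

    MutableTestClock(Instant start) { this.now = start; }

    void advance(Duration d) { now = now.plus(d); }

    @Override public Instant instant() { return now; }
    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        MutableTestClock clock = new MutableTestClock(t0);
        clock.advance(Duration.ofSeconds(61));
        // reNotifyMinutes=1 cadence: due once 60s have elapsed on the clock
        boolean due = !clock.instant().isBefore(t0.plus(Duration.ofMinutes(1)));
        System.out.println(due); // prints true
    }
}
```

Unlike a Mockito mock, a real `Clock` subclass keeps its state across tests, which is why the commit has to re-stub the mock in `@BeforeEach` instead.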
AGENTS.md
@@ -1,7 +1,7 @@
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence
 
-This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **alerting-02** (7810 symbols, 20082 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 
@@ -17,7 +17,7 @@ This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892
 
 1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
 2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
-3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
+3. `READ gitnexus://repo/alerting-02/process/{processName}` — trace the full execution flow step by step
 4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
 
 ## When Refactoring
@@ -56,10 +56,10 @@ This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892
 
 | Resource | Use for |
 |----------|---------|
-| `gitnexus://repo/cameleer-server/context` | Codebase overview, check index freshness |
-| `gitnexus://repo/cameleer-server/clusters` | All functional areas |
-| `gitnexus://repo/cameleer-server/processes` | All execution flows |
-| `gitnexus://repo/cameleer-server/process/{name}` | Step-by-step execution trace |
+| `gitnexus://repo/alerting-02/context` | Codebase overview, check index freshness |
+| `gitnexus://repo/alerting-02/clusters` | All functional areas |
+| `gitnexus://repo/alerting-02/processes` | All execution flows |
+| `gitnexus://repo/alerting-02/process/{name}` | Step-by-step execution trace |
 
 ## Self-Check Before Finishing
 
CLAUDE.md
@@ -94,7 +94,7 @@ When adding, removing, or renaming classes, controllers, endpoints, UI component
 <!-- gitnexus:start -->
 # GitNexus — Code Intelligence
 
-This project is indexed by GitNexus as **cameleer-server** (6436 symbols, 16257 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
+This project is indexed by GitNexus as **alerting-02** (7810 symbols, 20082 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
 
 > If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
 
@@ -110,7 +110,7 @@ This project is indexed by GitNexus as **cameleer-server** (6436 symbols, 16257
 
 1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
 2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
-3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
+3. `READ gitnexus://repo/alerting-02/process/{processName}` — trace the full execution flow step by step
 4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
 
 ## When Refactoring
@@ -149,10 +149,10 @@ This project is indexed by GitNexus as **cameleer-server** (6436 symbols, 16257
 
 | Resource | Use for |
 |----------|---------|
-| `gitnexus://repo/cameleer-server/context` | Codebase overview, check index freshness |
-| `gitnexus://repo/cameleer-server/clusters` | All functional areas |
-| `gitnexus://repo/cameleer-server/processes` | All execution flows |
-| `gitnexus://repo/cameleer-server/process/{name}` | Step-by-step execution trace |
+| `gitnexus://repo/alerting-02/context` | Codebase overview, check index freshness |
+| `gitnexus://repo/alerting-02/clusters` | All functional areas |
+| `gitnexus://repo/alerting-02/processes` | All execution flows |
+| `gitnexus://repo/alerting-02/process/{name}` | Step-by-step execution trace |
 
 ## Self-Check Before Finishing
 
AbstractPostgresIT.java

@@ -1,7 +1,10 @@
 package com.cameleer.server.app;
 
+import com.cameleer.server.app.search.ClickHouseSearchIndex;
+import com.cameleer.server.core.agent.AgentRegistryService;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.boot.test.context.SpringBootTest;
+import org.springframework.boot.test.mock.mockito.MockBean;
 import org.springframework.jdbc.core.JdbcTemplate;
 import org.springframework.test.context.ActiveProfiles;
 import org.springframework.test.context.DynamicPropertyRegistry;
@@ -14,6 +17,12 @@ import org.testcontainers.containers.PostgreSQLContainer;
 @ActiveProfiles("test")
 public abstract class AbstractPostgresIT {
 
+    // Mocked infrastructure beans required by the full application context.
+    // ClickHouseSearchIndex is not available in test without explicit ClickHouse wiring,
+    // and AgentRegistryService requires in-memory state that tests manage directly.
+    @MockBean(name = "clickHouseSearchIndex") protected ClickHouseSearchIndex clickHouseSearchIndex;
+    @MockBean protected AgentRegistryService agentRegistryService;
+
     static final PostgreSQLContainer<?> postgres;
     static final ClickHouseContainer clickhouse;
 
AlertingFullLifecycleIT.java

@@ -16,12 +16,16 @@ import com.github.tomakehurst.wiremock.WireMockServer;
 import com.github.tomakehurst.wiremock.core.WireMockConfiguration;
 import org.junit.jupiter.api.*;
 import org.junit.jupiter.api.TestInstance.Lifecycle;
+import org.mockito.Mockito;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.beans.factory.annotation.Value;
+import org.springframework.boot.test.mock.mockito.MockBean;
 import org.springframework.boot.test.web.client.TestRestTemplate;
 import org.springframework.http.*;
 
+import java.time.Clock;
 import java.time.Instant;
+import java.time.ZoneOffset;
 import java.util.List;
 import java.util.Map;
 import java.util.UUID;
@@ -32,9 +36,14 @@ import static org.assertj.core.api.Assertions.assertThat;
 /**
  * Canary integration test — exercises the full alerting lifecycle end-to-end:
  * fire → notify → ack → silence → re-fire (suppressed) → resolve → rule delete.
+ * Also verifies the re-notification cadence (reNotifyMinutes).
+ *
+ * Rule creation is driven through the REST API (POST /alerts/rules), not raw SQL,
+ * so target persistence via saveTargets() is exercised on the critical path.
  *
  * Uses real Postgres (Testcontainers) and real ClickHouse for log seeding.
  * WireMock provides the webhook target.
+ * Clock is replaced with a @MockBean so the re-notify test can advance time.
  */
 @TestMethodOrder(MethodOrderer.OrderAnnotation.class)
 @TestInstance(Lifecycle.PER_CLASS)
@@ -42,6 +51,9 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
 
     // AbstractPostgresIT already declares clickHouseSearchIndex + agentRegistryService mocks.
 
+    // Replace the alertingClock bean so we can control time in re-notify test
+    @MockBean(name = "alertingClock") Clock alertingClock;
+
     // ── Spring beans ──────────────────────────────────────────────────────────
 
     @Autowired private AlertEvaluatorJob evaluatorJob;
@@ -71,15 +83,30 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
     private UUID connId;
     private UUID instanceId; // filled after first FIRING
 
+    // Current simulated clock time — starts at "now" and can be advanced
+    private Instant simulatedNow = Instant.now();
+
     // ── Setup / teardown ──────────────────────────────────────────────────────
 
+    /**
+     * Mockito resets @MockBean stubs between @Test methods even with PER_CLASS lifecycle.
+     * Re-stub the clock before every test so clock.instant() never returns null.
+     */
+    @BeforeEach
+    void refreshClock() {
+        stubClock();
+    }
+
     @BeforeAll
     void seedFixtures() throws Exception {
         wm = new WireMockServer(WireMockConfiguration.options()
             .httpDisabled(true)
             .dynamicHttpsPort());
         wm.start();
-        // ClickHouse schema is auto-initialized by ClickHouseSchemaInitializer on Spring context startup.
+
+        // Default clock behaviour: delegate to simulatedNow
+        stubClock();
 
         operatorJwt = securityHelper.operatorToken();
 
         // Seed operator user in Postgres
@@ -111,41 +138,8 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
             " 'test-operator', 'test-operator')",
             connId, tenantId, webhookUrl, hmacCiphertext);
 
-        // Seed alert rule (LOG_PATTERN, forDurationSeconds=0, threshold=0 so >=1 log fires immediately)
-        ruleId = UUID.randomUUID();
-        UUID webhookBindingId = UUID.randomUUID();
-        String webhooksJson = objectMapper.writeValueAsString(List.of(
-            Map.of("id", webhookBindingId.toString(),
-                "outboundConnectionId", connId.toString())));
-        String conditionJson = objectMapper.writeValueAsString(Map.of(
-            "kind", "LOG_PATTERN",
-            "scope", Map.of("appSlug", "lc-app"),
-            "level", "ERROR",
-            "pattern", "TimeoutException",
-            "threshold", 0,
-            "windowSeconds", 300));
-
-        jdbcTemplate.update("""
-            INSERT INTO alert_rules
-              (id, environment_id, name, severity, enabled,
-               condition_kind, condition,
-               evaluation_interval_seconds, for_duration_seconds,
-               notification_title_tmpl, notification_message_tmpl,
-               webhooks, next_evaluation_at,
-               created_by, updated_by)
-            VALUES (?, ?, 'lc-timeout-rule', 'WARNING'::severity_enum, true,
-                    'LOG_PATTERN'::condition_kind_enum, ?::jsonb,
-                    60, 0,
-                    'Alert: {{rule.name}}', 'Instance {{alert.id}} fired',
-                    ?::jsonb, now() - interval '1 second',
-                    'test-operator', 'test-operator')
-            """,
-            ruleId, envId, conditionJson, webhooksJson);
-
-        // Seed alert_rule_targets so the instance shows up in inbox
-        jdbcTemplate.update(
-            "INSERT INTO alert_rule_targets (id, rule_id, target_kind, target_id) VALUES (gen_random_uuid(), ?, 'USER'::target_kind_enum, 'test-operator') ON CONFLICT (rule_id, target_kind, target_id) DO NOTHING",
-            ruleId);
+        // Create alert rule via REST API (exercises saveTargets on the write path)
+        ruleId = createRuleViaRestApi();
     }
 
     @AfterAll
@@ -154,8 +148,8 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         jdbcTemplate.update("DELETE FROM alert_silences WHERE environment_id = ?", envId);
         jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id IN (SELECT id FROM alert_instances WHERE environment_id = ?)", envId);
         jdbcTemplate.update("DELETE FROM alert_instances WHERE environment_id = ?", envId);
-        jdbcTemplate.update("DELETE FROM alert_rule_targets WHERE rule_id = ?", ruleId);
-        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", ruleId);
+        jdbcTemplate.update("DELETE FROM alert_rule_targets WHERE rule_id IN (SELECT id FROM alert_rules WHERE environment_id = ?)", envId);
+        jdbcTemplate.update("DELETE FROM alert_rules WHERE environment_id = ?", envId);
        jdbcTemplate.update("DELETE FROM outbound_connections WHERE id = ?", connId);
         jdbcTemplate.update("DELETE FROM environments WHERE id = ?", envId);
         jdbcTemplate.update("DELETE FROM users WHERE user_id = 'test-operator'");
@@ -169,9 +163,27 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         // Stub WireMock to return 200
         wm.stubFor(post("/webhook").willReturn(aResponse().withStatus(200).withBody("accepted")));
 
-        // Seed a matching log into ClickHouse
+        // Seed a matching log into ClickHouse BEFORE capturing simulatedNow,
+        // so the log timestamp is guaranteed to fall inside [simulatedNow-300s, simulatedNow].
         seedMatchingLog();
 
+        // Set simulatedNow to current wall time — the log was inserted a few ms earlier,
+        // so its timestamp is guaranteed <= simulatedNow within the 300s window.
+        setSimulatedNow(Instant.now());
+
+        // Release any claim the background scheduler may have already placed on the rule,
+        // and backdate next_evaluation_at so it's due again for our manual tick.
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET claimed_by = NULL, claimed_until = NULL, " +
+            "next_evaluation_at = now() - interval '1 second' WHERE id = ?", ruleId);
+
+        // Verify rule is in DB and due (no claim outstanding)
+        Integer ruleCount = jdbcTemplate.queryForObject(
+            "SELECT count(*) FROM alert_rules WHERE id = ? AND enabled = true " +
+            "AND next_evaluation_at <= now() AND (claimed_until IS NULL OR claimed_until < now())",
+            Integer.class, ruleId);
+        assertThat(ruleCount).as("rule must be unclaimed and due before tick").isEqualTo(1);
+
         // Tick evaluator
         evaluatorJob.tick();
 
@@ -181,6 +193,13 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         assertThat(instances).hasSize(1);
         assertThat(instances.get(0).state()).isEqualTo(AlertState.FIRING);
         assertThat(instances.get(0).ruleId()).isEqualTo(ruleId);
+
+        // B-1 fix verification: targets were persisted via the REST API path,
+        // so target_user_ids must be non-empty (not {} as before the fix)
+        assertThat(instances.get(0).targetUserIds())
+            .as("target_user_ids must be non-empty — verifies B-1 fix (saveTargets)")
+            .isNotEmpty();
+
         instanceId = instances.get(0).id();
     }
 
@@ -205,6 +224,12 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         // Body should contain rule name
         wm.verify(postRequestedFor(urlEqualTo("/webhook"))
             .withRequestBody(containing("lc-timeout-rule")));
+
+        // B-2: lastNotifiedAt must be set after dispatch (step sets it on DELIVERED)
+        AlertInstance inst = instanceRepo.findById(instanceId).orElseThrow();
+        assertThat(inst.lastNotifiedAt())
+            .as("lastNotifiedAt must be set after DELIVERED — verifies B-2 tracking fix")
+            .isNotNull();
     }
 
     @Test
@@ -234,8 +259,8 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         String silenceBody = objectMapper.writeValueAsString(Map.of(
             "matcher", Map.of("ruleId", ruleId.toString()),
             "reason", "lifecycle-test-silence",
-            "startsAt", Instant.now().minusSeconds(10).toString(),
-            "endsAt", Instant.now().plusSeconds(3600).toString()
+            "startsAt", simulatedNow.minusSeconds(10).toString(),
+            "endsAt", simulatedNow.plusSeconds(3600).toString()
         ));
         ResponseEntity<String> silenceResp = restTemplate.exchange(
             "/api/v1/environments/" + envSlug + "/alerts/silences",
@@ -305,8 +330,178 @@ class AlertingFullLifecycleIT extends AbstractPostgresIT {
         }
     }
 
+    @Test
+    @Order(6)
+    void step6_reNotifyCadenceFiresSecondNotification() throws Exception {
+        // Standalone sub-test: create a fresh rule with reNotifyMinutes=1 and verify
+        // that the evaluator's re-notify sweep enqueues a second notification after 61 seconds.
+
+        wm.resetRequests();
+        wm.stubFor(post("/webhook").willReturn(aResponse().withStatus(200).withBody("accepted")));
+
+        // Create a new rule via REST with reNotifyMinutes=1, forDurationSeconds=0
+        UUID reNotifyRuleId = createReNotifyRuleViaRestApi();
+
+        // Seed the log BEFORE capturing T+0 so the log timestamp falls inside
+        // the evaluator window [t0-300s, t0].
+        seedMatchingLog();
+
+        // Set T+0 to current wall time — the log was inserted a few ms earlier,
+        // so its timestamp is guaranteed <= t0 within the 300s window.
+        Instant t0 = Instant.now();
+        setSimulatedNow(t0);
+
+        // Tick evaluator at T+0 → instance FIRING, notification PENDING
+        evaluatorJob.tick();
+
+        List<AlertInstance> instances = instanceRepo.listForInbox(
+            envId, List.of(), "test-operator", List.of("OPERATOR"), 10);
+        // Find the instance for the reNotify rule
+        AlertInstance inst = instances.stream()
+            .filter(i -> reNotifyRuleId.equals(i.ruleId()))
+            .findFirst()
+            .orElse(null);
+        assertThat(inst).as("FIRING instance for reNotify rule").isNotNull();
+        UUID reNotifyInstanceId = inst.id();
+
+        // Tick dispatcher at T+0 → notification DELIVERED, WireMock: 1 POST
+        dispatchJob.tick();
+        wm.verify(1, postRequestedFor(urlEqualTo("/webhook")));
+
+        // Verify lastNotifiedAt was stamped (B-2 tracking)
+        AlertInstance afterFirstDispatch = instanceRepo.findById(reNotifyInstanceId).orElseThrow();
+        assertThat(afterFirstDispatch.lastNotifiedAt()).isNotNull();
+
+        // --- Advance clock 61 seconds ---
+        setSimulatedNow(t0.plusSeconds(61));
+
+        // Backdate next_evaluation_at so the rule is claimed again
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second', " +
+            "claimed_by = NULL, claimed_until = NULL WHERE id = ?", reNotifyRuleId);
+
+        // Tick evaluator at T+61 — re-notify sweep fires because lastNotifiedAt + 1 min <= now
+        evaluatorJob.tick();
+
+        // The sweep saves notifications with nextAttemptAt = simulatedNow (T+61s) which is in the
+        // future relative to Postgres real clock. Backdate so the dispatcher can claim them.
+        jdbcTemplate.update(
+            "UPDATE alert_notifications SET next_attempt_at = now() - interval '1 second' " +
+            "WHERE alert_instance_id = ? AND status = 'PENDING'::notification_status_enum",
+            reNotifyInstanceId);
+
+        // Tick dispatcher → second POST
+        dispatchJob.tick();
+        wm.verify(2, postRequestedFor(urlEqualTo("/webhook")));
+
+        // Cleanup
+        jdbcTemplate.update("DELETE FROM alert_notifications WHERE alert_instance_id = ?", reNotifyInstanceId);
+        jdbcTemplate.update("DELETE FROM alert_instances WHERE id = ?", reNotifyInstanceId);
+        jdbcTemplate.update("DELETE FROM alert_rule_targets WHERE rule_id = ?", reNotifyRuleId);
+        jdbcTemplate.update("DELETE FROM alert_rules WHERE id = ?", reNotifyRuleId);
+    }
+
     // ── Helpers ───────────────────────────────────────────────────────────────
 
+    /** POST the main lifecycle rule via REST API. Returns the created rule ID. */
+    private UUID createRuleViaRestApi() throws Exception {
+        // Build JSON directly — Map.of() supports at most 10 entries
+        String ruleBody = """
+            {
+              "name": "lc-timeout-rule",
+              "severity": "WARNING",
+              "conditionKind": "LOG_PATTERN",
+              "condition": {
+                "kind": "LOG_PATTERN",
+                "scope": {"appSlug": "lc-app"},
+                "level": "ERROR",
+                "pattern": "TimeoutException",
+                "threshold": 0,
+                "windowSeconds": 300
+              },
+              "evaluationIntervalSeconds": 60,
+              "forDurationSeconds": 0,
+              "reNotifyMinutes": 0,
+              "notificationTitleTmpl": "Alert: {{rule.name}}",
+              "notificationMessageTmpl": "Instance {{alert.id}} fired",
+              "webhooks": [{"outboundConnectionId": "%s"}],
+              "targets": [{"kind": "USER", "targetId": "test-operator"}]
+            }
+            """.formatted(connId);
+
+        ResponseEntity<String> resp = restTemplate.exchange(
+            "/api/v1/environments/" + envSlug + "/alerts/rules",
+            HttpMethod.POST,
+            new HttpEntity<>(ruleBody, securityHelper.authHeaders(operatorJwt)),
+            String.class);
+
+        assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.CREATED);
+        JsonNode body = objectMapper.readTree(resp.getBody());
+        String id = body.path("id").asText();
+        assertThat(id).isNotBlank();
+
+        // Backdate next_evaluation_at so it's due immediately
+        UUID ruleUuid = UUID.fromString(id);
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second' WHERE id = ?",
+            ruleUuid);
+
+        return ruleUuid;
+    }
+
+    /** POST a short-cadence re-notify rule via REST API. Returns the created rule ID. */
+    private UUID createReNotifyRuleViaRestApi() throws Exception {
+        String ruleBody = """
+            {
+              "name": "lc-renotify-rule",
+              "severity": "WARNING",
+              "conditionKind": "LOG_PATTERN",
+              "condition": {
+                "kind": "LOG_PATTERN",
+                "scope": {"appSlug": "lc-app"},
+                "level": "ERROR",
+                "pattern": "TimeoutException",
+                "threshold": 0,
+                "windowSeconds": 300
+              },
+              "evaluationIntervalSeconds": 60,
+              "forDurationSeconds": 0,
+              "reNotifyMinutes": 1,
+              "notificationTitleTmpl": "ReNotify: {{rule.name}}",
+              "notificationMessageTmpl": "Re-fired {{alert.id}}",
+              "webhooks": [{"outboundConnectionId": "%s"}],
+              "targets": [{"kind": "USER", "targetId": "test-operator"}]
+            }
+            """.formatted(connId);
+
+        ResponseEntity<String> resp = restTemplate.exchange(
+            "/api/v1/environments/" + envSlug + "/alerts/rules",
+            HttpMethod.POST,
+            new HttpEntity<>(ruleBody, securityHelper.authHeaders(operatorJwt)),
+            String.class);
+
+        assertThat(resp.getStatusCode()).isEqualTo(HttpStatus.CREATED);
+        JsonNode body = objectMapper.readTree(resp.getBody());
+        String id = body.path("id").asText();
+        assertThat(id).isNotBlank();
+
+        UUID ruleUuid = UUID.fromString(id);
+        jdbcTemplate.update(
+            "UPDATE alert_rules SET next_evaluation_at = now() - interval '1 second' WHERE id = ?",
+            ruleUuid);
+        return ruleUuid;
+    }
+
+    private void setSimulatedNow(Instant instant) {
+        simulatedNow = instant;
+        stubClock();
+    }
+
+    private void stubClock() {
+        Mockito.when(alertingClock.instant()).thenReturn(simulatedNow);
+        Mockito.when(alertingClock.getZone()).thenReturn(ZoneOffset.UTC);
+    }
+
     private void seedMatchingLog() {
         LogEntry entry = new LogEntry(
             Instant.now(),
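The re-notify cadence the step 6 test exercises boils down to a timestamp comparison against the simulated clock. A minimal sketch of that decision, under assumed semantics — `dueForReNotify` and its parameters are illustrative names, not the project's actual API:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the re-notify sweep decision: an instance is due
// for a repeat notification once reNotifyMinutes have elapsed since
// lastNotifiedAt, as measured on the (possibly simulated) clock.
final class ReNotifySweepSketch {
    static boolean dueForReNotify(Instant lastNotifiedAt, int reNotifyMinutes, Instant now) {
        if (reNotifyMinutes <= 0) return false;  // cadence disabled
        Instant threshold = lastNotifiedAt.plus(Duration.ofMinutes(reNotifyMinutes));
        return !now.isBefore(threshold);         // now >= threshold
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(dueForReNotify(t0, 1, t0.plusSeconds(59)));   // prints false
        System.out.println(dueForReNotify(t0, 1, t0.plusSeconds(61)));   // prints true
        System.out.println(dueForReNotify(t0, 0, t0.plusSeconds(3600))); // prints false (disabled)
    }
}
```

This is why the test advances the mock clock by 61 seconds rather than exactly 60: a one-second margin avoids a boundary-equality flake.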