Spec §13 calls for the notification bell to colour-code by highest
unread severity (CRITICAL → error, WARNING → amber, INFO → muted).
The old { count } DTO forced the UI to pick one static colour, so
NotificationBell shipped with a TODO. Grow the contract instead:
UnreadCountResponse = { total, bySeverity: { CRITICAL, WARNING, INFO } }
Guarantees:
- every severity is always present with a >=0 value (no undefined
keys on the wire), so the UI can branch without defaults.
- total = sum of bySeverity values — kept explicit on the wire for
cheap top-line display, not recomputed client-side.
Backend
- AlertInstanceRepository: replaces countUnreadForUser(long) with
countUnreadBySeverityForUser returning Map<AlertSeverity, Long>.
One SQL round-trip per (env, user) — GROUP BY ai.severity over the
same NOT EXISTS(alert_reads) filter.
- UnreadCountResponse.from(Map) normalises and defensively copies;
missing severities default to 0.
- InAppInboxQuery.countUnread now returns the DTO, caches the full
response (still 5s TTL) so severity breakdown gets the same
hit-rate as the total did before.
- AlertController just hands the DTO back.
Breaking change — no backwards-compat shim: the `count` field is
gone. UI and tests updated in the same commit; there are no other
API consumers in the tree.
Frontend
- Regenerated openapi.json + schema.d.ts against a fresh build of
the new backend.
- NotificationBell branches badge colour on the highest unread
severity (CRITICAL > WARNING > INFO) via new CSS variants.
- Tests cover all four paths: zero, critical-present, warning-only,
info-only.
Tests: 7 unit tests + 12 ITs (incl. new grouping + empty-map)
+ 49 vitest (was 46; +3 severity-branch assertions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AlertNotificationRepository gains resetForRetry(UUID, Instant) which sets
attempts=0, status=PENDING, next_attempt_at=now, and clears claim/response
fields. AlertNotificationController calls resetForRetry instead of
scheduleRetry so a manual retry always starts from a clean slate.
AlertNotificationControllerIT adds retryResetsAttemptsToZero to verify
attempts==0 and status==PENDING after three prior markFailed calls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlertInstanceRepository gains listFiringDueForReNotify(Instant) — only returns
instances where last_notified_at IS NOT NULL and cadence has elapsed (IS NULL
branch excluded: sweep only re-notifies, initial notify is the dispatcher's job).
AlertEvaluatorJob.sweepReNotify() runs at the end of each tick, enqueues fresh
notifications for eligible instances and stamps last_notified_at.
NotificationDispatchJob stamps last_notified_at on the alert_instance when a
notification is DELIVERED, providing the anchor timestamp for cadence checks.
PostgresAlertInstanceRepositoryIT adds listFiringDueForReNotify test covering
the three-rule eligibility matrix (never-notified, long-ago, recent).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The evaluator mapped P95_LATENCY_MS to ExecutionStats.avgDurationMs because
stats_1m_route has no p95 column. Exposing the old name implied p95 semantics
operators did not get. Rename to AVG_DURATION_MS makes the contract honest.
Updated RouteMetric enum (with javadoc), evaluator switch, and admin guide.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add withRuleSnapshot(Map) wither to AlertInstance (same pattern as other withers)
- Call snapshotRule(rule) + withRuleSnapshot in both applyResult (single-firing) and
applyBatchFiring paths so every persisted instance carries a non-empty JSONB snapshot
- Strip null values from the Jackson-serialized map before wrapping in the immutable
snapshot so Map.copyOf in the compact ctor does not throw NPE on nullable rule fields
- Add ruleSnapshotIsPersistedOnInstanceCreation IT: asserts name/severity/conditionKind
appear in the rule_snapshot column after a tick fires an instance
- Add historySurvivesRuleDelete IT: fires an instance, deletes the rule, asserts
rule_id IS NULL and rule_snapshot still contains the rule name (spec §5 guarantee)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Declared in cameleer-server-core pom (canonical location for unit-testable
rendering without Spring) and mirrored in cameleer-server-app pom so the
app module compiles standalone without a full reactor install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds AlertMatchSpec record (core) and ClickHouseSearchIndex.countExecutionsForAlerting —
no FINAL, no text subqueries. Filters by tenant, env, app, route, status, time window,
and optional after-cursor. Attributes (JSON string column) use inlined JSONExtractString
key literals since ClickHouse JDBC does not bind ? placeholders inside JSON functions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements AlertInstanceRepository: save (upsert), findById, findOpenForRule,
listForInbox (3-way OR: user/group/role via && array-overlap + ANY), countUnreadForUser
(LEFT JOIN alert_reads), ack, resolve, markSilenced, deleteResolvedBefore.
Integration test covers all 9 scenarios including inbox fan-out across all
three target types. Also adds @JsonIgnoreProperties(ignoreUnknown=true) to
SilenceMatcher to suppress Jackson serializing isWildcard() as a round-trip field.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE.
Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR.
SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V11 migration referenced users(id) as uuid, but V1 users table has
user_id as TEXT primary key. Amending V11 and the OutboundConnection
record before Task 7's integration tests catch this at Flyway startup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AgentEventService.queryEvents, AgentEventRepository.query, and the
ClickHouse implementation have had no callers since /agents/events
became cursor-paginated. Remove them along with their dedicated IT
tests. queryPage and its tests remain as the single query path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces cursor-paginated query on AgentEventRepository. The cursor
format is owned by the implementation. The existing non-paginated
query(...) is kept for internal consumers.
Replaces LogSearchRequest.source (String) with sources (List<String>)
and emits 'source IN (...)' when non-empty. LogQueryController parses
?source=a,b,c the same way it parses ?level=a,b,c.
The agent_events table has an `environment` column and AgentEventsController
filters on it, but the INSERT never populated it — every row got the
column default ('default'). Result: Timeline on the Application Runtime
page was empty whenever the user's selected env was anything other than
'default'.
Thread env through the write path:
- AgentEventRepository.insert + AgentEventService.recordEvent gain an
`environment` param; delete the no-env query overload (unused).
- ClickHouseAgentEventRepository.insert writes the column (falls back to
'default' on null to match column DEFAULT).
- All 5 callers source env from the agent registry (AgentInfo.environmentId)
or the registration request body; AgentLifecycleMonitor, deregister,
command ack, event ingestion, register/re-register.
- Integration test updated for the new signatures.
Pre-existing rows in deployed CH will still report environment='default'.
New events from this build forward will carry the correct env. Backfill
(UPDATE ... FROM apps) is left as a manual DB step if historical timeline
is needed for non-default envs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Correlated exchanges always share the env of the one being viewed —
using the globally-selected env from the picker was wrong if the user
switched envs after opening a detail view (or arrived via permalink).
Thread `environment` through:
- `ExecutionStore.ExecutionRecord` gains `environment` field; the
ClickHouse `executions` table already stores this, just not read back.
- `ClickHouseExecutionStore.findById` SELECT adds the column; mapper
populates it.
- `ExecutionDetail` gains `environment`; `DetailService` passes through.
- `IngestionService.toExecutionRecord` passes null — this legacy PG
ingestion path isn't active when ClickHouse is enabled, and the
read-side is what drives the correlation UI.
- UI `ExchangeHeader` reads `detail.environment ?? storeEnv` and
extends the TS type locally (schema.d.ts catches up on next regen).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P3B of the taxonomy migration. App and deployment routes are now
env-scoped in the URL itself, making the (env, app_slug) uniqueness
key explicit. Previously /api/v1/apps/{appSlug} was ambiguous: with
the same app deployed to multiple environments (dev/staging/prod),
the handler called AppService.getBySlug(slug) which returns the
first row matching slug regardless of env.
Server:
- AppController: @RequestMapping("/api/v1/environments/{envSlug}/
apps"). Every handler now calls
appService.getByEnvironmentAndSlug(env.id(), appSlug) — 404 if the
app doesn't exist in *this* env. CreateAppRequest body drops
environmentId (it's in the path).
- DeploymentController: @RequestMapping("/api/v1/environments/
{envSlug}/apps/{appSlug}/deployments"). DeployRequest body drops
environmentId. PromoteRequest body switches from
targetEnvironmentId (UUID) to targetEnvironment (slug);
promote handler resolves the target env by slug and looks up the
app with the same slug in the target env (fails with 404 if the
target app doesn't exist yet — apps must exist in both source
and target before promote).
- AppService: added getByEnvironmentAndSlug helper; createApp now
validates slug against ^[a-z0-9][a-z0-9-]{0,63}$ (400 on
invalid).
SPA:
- queries/admin/apps.ts: rewritten. Hooks take envSlug where
env-scoped. Removed useAllApps (no flat endpoint). Renamed path
param naming: appId → appSlug throughout. Added
usePromoteDeployment. Query keys include envSlug so cache is
env-scoped.
- AppsTab.tsx: call sites updated. When no environment is selected,
the managed-app list is empty — cross-env discovery lives in the
Runtime tab (catalog). handleDeploy/handleStop/etc. pass envSlug
to the new hook signatures.
BREAKING CHANGE: /api/v1/apps/** paths removed. Clients must use
/api/v1/environments/{envSlug}/apps/{appSlug}/**. Request bodies
for POST /apps and POST /apps/{slug}/deployments no longer accept
environmentId (use the URL path instead). Promote body uses slug
not UUID.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UUID-based admin paths were the only remaining UUID-in-URL pattern in
the API. Migrates /api/v1/admin/environments/{id} to /{envSlug} so
slugs are the single environment identifier in every URL. UUIDs stay
internal to the database.
- Controller: @PathVariable UUID id → @PathVariable String envSlug on
get/update/delete and the two nested endpoints (default-container-
config, jar-retention). Handlers resolve slug → Environment via
EnvironmentService.getBySlug, then delegate to existing UUID-based
service methods.
- Service: create() now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$
and returns 400 on invalid slugs. Rationale documented in the class:
slugs are immutable after creation because they appear in URLs,
Docker network names, container names, and ClickHouse partition keys.
- UpdateEnvironmentRequest has no slug field and Jackson's default
ignore-unknown behavior drops any slug supplied in a PUT body;
regression test (updateEnvironment_withSlugInBody_ignoresSlug)
documents this invariant.
- SPA: mutation args change from { id } to { slug }. EnvironmentsPage
still uses env.id for local selection state (UUID from DB) but
passes env.slug to every mutation.
BREAKING CHANGE: /api/v1/admin/environments/{id:UUID}/... paths removed.
Clients must use /{envSlug}/... (slug from the environments list).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes two cross-env data leakage paths. Both endpoints previously
returned data aggregated across all environments, so a diagram or
attribute key from dev could appear in a prod UI query (and vice versa).
B1: GET /api/v1/diagrams?application=&routeId= now requires
?environment= and resolves agents via
registryService.findByApplicationAndEnvironment instead of
findByApplication. Prevents serving a dev diagram for a prod route.
B2: GET /api/v1/search/attributes/keys now requires ?environment=.
SearchIndex.distinctAttributeKeys gains an environment parameter and
the ClickHouse query adds the env filter alongside the existing
tenant_id filter. Prevents prod attribute names leaking into dev
autocompletion (and vice versa).
SPA hooks updated to thread environment through from
useEnvironmentStore; query keys include environment so React Query
re-fetches on env switch. No call-site changes needed — hook
signatures unchanged.
B3 (AgentMetricsController env scope) deferred to P3C: agent-env is
effectively 1:1 today via the instance_id naming
({envSlug}-{appSlug}-{replicaIndex}), and the URL migration in P3C
to /api/v1/environments/{env}/agents/{agentId}/metrics naturally
introduces env from path. A minimal P1 fix would regress the "view
metrics of a killed agent" case.
BREAKING CHANGE: Both endpoints now require ?environment= (slug).
Clients omitting the parameter receive 400.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).
Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.
- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
JWT env claim (agents cannot spoof env); non-agent callers provide env
via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
useAllAppSettings, useUpdateAppSettings) take env, wired to
useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.
Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exchange search silently filtered by the in-memory agent registry's
current instance IDs on top of application_id. Historical exchanges written
by previous agent instances (or any instance not currently registered, e.g.
after a server restart before agents heartbeat back) were hidden from
results even though they matched the application filter.
Fix: drop the applicationId -> instanceIds resolution in SearchController.
Rely on application_id = ? in ClickHouseSearchIndex; keep explicit
instanceIds filtering only when a client passes them.
Related cleanup: the agentIds parameter on StatsStore.statsForRoute /
timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so
per-route stats aggregated across any apps sharing a routeId. Replace with
String applicationId and add application_id to the stats_1m_route filters
so per-route stats are correctly scoped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IngestionService.hasAnyTraceData() and ChunkAccumulator only checked
for inputBody/outputBody when setting has_trace_data on executions.
Headers and properties captured via deep tracing were not considered,
causing the trace data indicator to be missing in the exchange list.
DetailService already checked all six fields — this aligns the
ingestion path to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two sidebar bugs fixed:
1. Route entries never highlighted on navigation because sidebar-utils
generated /apps/ paths for route children while effectiveSelectedPath
normalizes to /exchanges/. The design system does exact string matching.
2. Routes disappeared from sidebar when agents had no recent exchange
data. Heartbeat carried routeStates (with route IDs as keys) but
AgentRegistryService.heartbeat() never updated AgentInfo.routeIds.
After server restart, auto-heal registered agents with empty routes,
leaving ClickHouse (24h window) as the only discovery source.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RuntimeDetector now derives the correct PropertiesLauncher FQN from
the JAR manifest Main-Class package. Spring Boot 3.2+ uses
org.springframework.boot.loader.launch.PropertiesLauncher, pre-3.2
uses org.springframework.boot.loader.PropertiesLauncher.
DockerRuntimeOrchestrator uses the detected class instead of a
hardcoded 3.2+ reference, falling back to 3.2+ when not auto-detected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>