Backend
- V18 migration adds AGENT_LIFECYCLE to condition_kind_enum. Java
ConditionKind enum shipped with this value but no Postgres migration
extended the type, so any AGENT_LIFECYCLE rule insert failed with
"invalid input value for enum condition_kind_enum".
- ALTER TYPE ... ADD VALUE lives alone in its migration per Postgres
constraint that the new value cannot be referenced in the same tx.
- V18MigrationIT asserts the enum now contains all 7 kinds.
Frontend
- Add describeApiError(e) helper to unwrap openapi-fetch error bodies
(Spring error JSON) into readable strings. String(e) on a plain
object rendered "[object Object]" in toasts — the actual failure
reason was hidden from the user.
- Replace String(e) in all 13 toast descriptions across the alerting
and outbound-connection mutation paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitNexus analyze --embeddings after the alerts-inbox-redesign branch
brought the graph to 8780 symbols / 22753 relationships (was 8527/22174
in AGENTS.md and 8603/22281 in CLAUDE.md). The stat-header drift between
AGENTS.md and CLAUDE.md is an artifact of separate reindexes — both now
in sync.
ui/tsconfig.app.tsbuildinfo was a stale tsc incremental-build cache
that shouldn't be tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.
Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
(scope, eventTypes, withinSeconds). Compact ctor rejects empty
eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
the server-emitted types in `AgentRegistrationController` and
`AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
ASC)` used by the evaluator. Implemented on
`ClickHouseAgentEventRepository` with tenant + env filter mandatory.
App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
window and returns `EvalResult.Batch` with one `Firing` per row.
Every Firing carries a canonical `_subjectFingerprint` of
`"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
deserialization time — matches the existing policy of keeping
controller validators focused on env-scoped / SQL-injection concerns.
Schema:
- V16 migration generalises the V15 per-exchange discriminator on
`alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
fallback to the legacy `exchange.id` expression. Scalar kinds still
resolve to `''` and keep one-open-per-rule. Duplicate-key path in
`PostgresAlertInstanceRepository.save` is unchanged — the index is
the deduper.
UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
the six allowed event types + `withinSeconds` input. Wired into
`ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.
Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.
Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
The rule editor wizard reset the condition payload on kind-change without
seeding a fireMode default; the ExchangeMatchCondition ctor allowed null to
pass through; AlertEvaluatorJob then NPE-looped every tick on a saved rule.
- core: compact ctor now rejects null fireMode (Jackson-deser path only — all
production callers already pass a value).
- V14: repair existing EXCHANGE_MATCH rows with fireMode=null to
PER_EXCHANGE + perExchangeLingerSeconds=300 (default matches the wizard).
- ui: ConditionStep.onKindChange seeds EXCHANGE_MATCH defaults so the
Select's displayed fallback ("Per exchange") is actually in form state.
- ui: validateStep('condition', ...) now enforces fireMode presence + the
mode-specific fields before the user reaches Review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes item 5 on the Plan 03 cleanup triage. The option arrays
("METRICS", "COMPARATORS", KIND_OPTIONS, SEVERITY_OPTIONS, FIRE_MODES)
scattered across RouteMetricForm / JvmMetricForm / ExchangeMatchForm /
ConditionStep / ScopeStep were hand-typed string literals. They drifted
silently — P95_LATENCY_MS appeared in a dropdown without a backend
counterpart (caught at runtime in bcde6678); JvmMetric.LATEST and
Comparator.EQ existed on the backend but were missing from the UI all
along.
Fix: new `ui/src/api/alerting-enums.ts` derives every enum from
schema.d.ts and pairs each with a `Record<T, string>` label map.
TypeScript enforces exhaustiveness — adding or removing a backend
value fails the build of this file until the label map is updated.
Every consumer imports the generated `*_OPTIONS` array.
Covered (schema-derived):
- ConditionKind → CONDITION_KIND_OPTIONS
- Severity → SEVERITY_OPTIONS
- RouteMetric → ROUTE_METRIC_OPTIONS
- Comparator → COMPARATOR_OPTIONS (adds EQ that was missing)
- JvmAggregation → JVM_AGGREGATION_OPTIONS (adds LATEST that was missing)
- ExchangeMatch.fireMode → EXCHANGE_FIRE_MODE_OPTIONS
- AlertRuleTarget.kind → TARGET_KIND_OPTIONS
form-state.ts: `severity: 'CRITICAL' | 'WARNING' | 'INFO'` and
`kind: 'USER' | 'GROUP' | 'ROLE'` literal unions swapped for the
derived `Severity` / `TargetKind` aliases.
Not covered, backend types them as `String` (no `@Schema(allowableValues)`
annotation yet):
- AgentStateCondition.state
- DeploymentStateCondition.states
- LogPatternCondition.level
- ExchangeFilter.status
- JvmMetricCondition.metric
These stay hand-typed with a pointer-comment. Follow-up: add
`@Schema(allowableValues = …)` to the Java record components so the
enums land in schema.d.ts; then fold them into alerting-enums.ts.
Plus: gitnexus index-stats refresh in AGENTS.md/CLAUDE.md from the
post-deploy reindex.
Verified: ui build green, 49/49 vitest pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the UiAuth / Oidc / UserAdmin controllers were aligned to store
bare user_ids, the rules that future sessions read were still describing
the old behaviour (OutboundConnectionAdminController "strips user:
prefix" — true mechanically but the subtlety is that the strip is
the bridge between a prefixed JWT subject and an unprefixed DB key,
not a hack).
- CLAUDE.md: expand the User persistence one-liner to state the
convention authoritatively (local `<username>`, OIDC `oidc:<sub>`,
JWT `user:` namespace, env-scoped controllers strip for FK).
- .claude/rules/app-classes.md:
- Add "User ID conventions" section near the top that spells out
write-path vs read-path behaviour in one place.
- Add UiAuthController + OidcAuthController entries under
security/ with their upsert shape documented.
- Soften the OutboundConnectionAdminController line to reference
the convention instead of restating the mechanism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-commit stats: 8524 nodes / 22174 edges / 415 clusters / 300 flows
(up from 8513/22146/409/300 after the five cleanup-batch commits).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the three Flyway migrations added by the alerting feature branch
so future sessions have an accurate migration map.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rule creation now goes through POST /alerts/rules (exercises saveTargets on the
write path). Clock is replaced with @MockBean(name="alertingClock") and re-stubbed
in @BeforeEach to survive Mockito's inter-test reset. Six ordered steps:
1. seed log → tick evaluator → assert FIRING instance with non-empty targets (B-1)
2. tick dispatcher → assert DELIVERED notification + lastNotifiedAt stamped (B-2)
3. ack via REST → assert ACKNOWLEDGED state
4. create silence → inject PENDING notification → tick dispatcher → assert silenced (FAILED)
5. delete rule → assert rule_id nullified, rule_snapshot preserved (ON DELETE SET NULL)
6. new rule with reNotifyMinutes=1 → first dispatch → advance clock 61s →
evaluator sweep → second dispatch → verify 2 WireMock POSTs (B-2 cadence)
Background scheduler races addressed by resetting claimed_by/claimed_until before
each manual tick. Simulated clock set AFTER log insert to guarantee log timestamp
falls within the evaluator window. Re-notify notifications backdated in Postgres
to work around the simulated vs real clock gap in claimDueNotifications.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Locks the new conventions into rule files so future agents and humans
don't drift back into old patterns.
- .claude/rules/app-classes.md: replaces the flat endpoint catalog
with a taxonomy-aware reorganization (env-scoped / env-admin /
agent-only / ingestion / cross-env discovery / admin / other).
Adds the flat-endpoint allow-list with rationale per prefix and
documents the tenant-filter invariant for ClickHouse queries.
- CLAUDE.md: adds four convention bullets in Key Conventions —
URL taxonomy with allow-list pointer, slug immutability rule,
app uniqueness as (env, app_slug), env-required on env-scoped
endpoints.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Groundwork for the REST API taxonomy migration. Introduces the
infrastructure that future waves use to move data/query endpoints under
/api/v1/environments/{envSlug}/... without per-handler boilerplate.
- Add @EnvPath annotation + EnvironmentPathResolver: injects the
Environment identified by the {envSlug} path variable, 404 on unknown
slug, registered via WebConfig.addArgumentResolvers.
- Add env-scoped URL matchers to SecurityConfig (config, settings,
executions/stats/logs/routes/agents/apps/deployments under
/environments/*/**). Legacy flat matchers kept in place and will be
removed per-wave as controllers migrate. New agent-authoritative
/api/v1/agents/config matcher prepared for the agent/user split.
- Document OpenAPI schema regen workflow in CLAUDE.md so future API
changes cover schema.d.ts regeneration as part of the change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).
Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.
- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
JWT env claim (agents cannot spoof env); non-agent callers provide env
via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
useAllAppSettings, useUpdateAppSettings) take env, wired to
useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.
Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent shaded JAR now includes the log appender classes. Remove
PropertiesLauncher, -Dloader.path, and separate appender JAR references.
All JVM types now use: java -javaagent:/app/agent.jar -jar app.jar
Plain Java uses -cp with explicit main class. Native runs binary directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LogTab now checks mdc['cameleer.processorId'] first when filtering logs
for a selected processor node, falling back to fuzzy message/loggerName
matching for older agents without the new MDC key.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent now sets cameleer.exchangeId in MDC (persists across processor
executions, unlike Camel's camel.exchangeId which is scoped to MDCUnitOfWork).
For ON_COMPLETION exchange copies, the agent uses the parent's exchange ID.
Server changes:
- ClickHouseLogStore ingestion: extract exchange_id preferring
cameleer.exchangeId, falling back to camel.exchangeId
- ClickHouseLogStore search: match exchangeId filter against exchange_id
column OR cameleer.exchangeId OR camel.exchangeId in MDC
- Update CLAUDE.md with log exchange correlation documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four logging pipeline fixes:
1. Multi-replica startup logs: remove stopLogCaptureByApp from
SseConnectionManager — container log capture now expires naturally
after 60s instead of being killed when the first agent connects SSE.
This ensures all replicas' bootstrap output is captured.
2. Unified instance_id: container logs and agent logs now share the same
instance identity ({envSlug}-{appSlug}-{replicaIndex}). DeploymentExecutor
sets CAMELEER_AGENT_INSTANCEID per replica so the agent uses the same
ID as ContainerLogForwarder. Instance-level log views now show both
container and agent logs.
3. Labels-first container identity: TraefikLabelBuilder emits cameleer.replica
and cameleer.instance-id labels. Container names are tenant-prefixed
({tenantId}-{envSlug}-{appSlug}-{idx}) for global Docker daemon uniqueness.
4. Environment filter on log queries: useApplicationLogs now passes the
selected environment to the API, preventing log leakage across environments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DatabaseAdminController's active-queries and kill-query endpoints could
expose SQL text from other tenants sharing the same PostgreSQL instance.
Added ApplicationName=tenant_{id} to the JDBC URL and filter
pg_stat_activity by application_name so each tenant only sees its own
connections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ContainerLogForwarder, StartupLogPanel, useStartupLogs to key classes
and UI files. Document log capture lifecycle and source badge rendering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show agent built-in defaults as reference Badge pills, separate editable keys
section with count badge, amber-highlighted push toggle, right-aligned save
button. Fix info text: keys add to defaults, not replace. Add ClaimMapping
controller to CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent team migrated from JMX to Micrometer metrics. Update the 5
hardcoded metric names in AgentInstance.tsx JVM charts:
- jvm.cpu.process → process.cpu.usage.value
- jvm.memory.heap.used → jvm.memory.used.value
- jvm.memory.heap.max → jvm.memory.max.value
- jvm.threads.count → jvm.threads.live.value
- jvm.gc.time → jvm.gc.pause.total_time
Server backend is unaffected (generic MetricsSnapshot storage).
CLAUDE.md updated with full agent metric name reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lists all business metrics (gauges, counters, timers) with their
tags and source classes, plus agent container label mapping table.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels prometheus.scrape, prometheus.path, and prometheus.port are now
set on every deployed container based on the resolved runtime type,
enabling automatic Prometheus service discovery via docker_sd_configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md: log ingestion example updated from LogBatch wrapper to raw
JSON array with source field. CLAUDE.md: added LogIngestionController,
updated LogQueryController with new filters. SERVER-CAPABILITIES.md:
updated log ingestion and query descriptions, ClickHouse table note.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md configuration table rewritten with correct cameleer.server.*
property names, grouped by functional area. Removed stale CAMELEER_OIDC_*
env var references. SERVER-CAPABILITIES.md updated with correct env var
names for ingestion and agent registry tuning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.
Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
(e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redesign DeploymentProgress component: track-based layout with amber
brand color, checkmarks for completed steps, user-friendly labels
(Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
stages, database migration reference, and REST endpoint summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Expose getDirectRolesForUser on RbacService interface so syncOidcRoles
compares against directly-assigned roles only, not group-inherited ones
- Remove early-return that preserved existing roles when OIDC returned
none — now always applies defaultRoles as fallback
- Update CLAUDE.md and SERVER-CAPABILITIES.md to reflect changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend holds client_secret and does the token exchange server-side,
making PKCE redundant. Removes code_verifier/code_challenge from all
frontend auth paths and backend exchange method. Eliminates the source
of "grant request is invalid" errors from verifier mismatches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hand-crafted favicon.svg with official brand assets from
@cameleer/design-system v0.1.32: PNG favicons (16/32px) and
camel-logo.svg for login dialog and sidebar. Update SecurityConfig
public endpoints accordingly. Update documentation for architecture
cleanup (PKCE, OidcProviderHelper, role normalization, K8s hardening,
Dockerfile credential removal, CI deduplication, sidebar path fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>