- app-classes: DiagramRenderController by-route endpoint no longer
depends on the agent registry; points at findLatestContentHashForAppRoute
and cross-refs the exchange viewer's content-hash path.
- core-classes: document the new DiagramStore method and note why the
agent-scoped findContentHashForRoute stays for the ingest path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on
every router. That assumes a resolver literally named `default` exists
in the Traefik static config — true for ACME-backed installs, false for
dev/local installs that use a file-based TLS store. Traefik logs
"Router uses a nonexistent certificate resolver" for the bogus resolver
on every managed app, and any future attempt to define a differently-
named real resolver would silently skip these routers.
Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by
default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver`
into `ResolvedContainerConfig.certResolver`. When blank the
`tls.certresolver` label is omitted entirely; `tls=true` is still
emitted so Traefik serves the default TLS-store cert. When set, the
label is emitted with the configured resolver name.
Not per-app/per-env configurable: there is one Traefik per server
instance and one resolver config; app-level override would only let
users break their own routers.
TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank).
Full unit suite 211/0/0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a boolean `externalRouting` flag (default `true`) on
ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only
the identity labels (`managed-by`, `cameleer.*`) and skips every
`traefik.*` label, so the container is not published by Traefik.
Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}`
can still reach it via Docker DNS on whatever port the app listens on.
TDD: new TraefikLabelBuilderTest covers enabled (default labels present),
disabled (zero traefik.* labels), and disabled (identity labels retained)
cases. Full module unit suite: 208/0/0.
Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form
state, Resources tab toggle, POST payload, and snapshot-to-form mapping.
Rule files updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment.
STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty.
Now only FAILED rows are pruned; STOPPED deployments are retained as restorable
checkpoints (they still carry deployed_config_snapshot from their RUNNING window).
- UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only,
which excluded the main case — the previous blue/green deployment now in STOPPED).
- UI placement: Checkpoints disclosure now renders inside IdentitySection, matching
the design spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refresh the three rules files to match the new executor behavior:
- docker-orchestration.md: rewrite DeploymentExecutor Details with
container naming scheme ({...}-{replica}-{generation}), strategy
dispatch (blue-green vs rolling), and the new DEGRADED semantics
(post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder
bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets
mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now
post-deploy-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No callers after the legacy PG ingestion path was retired in 0f635576.
core-classes.md updated to drop the leftover note.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class),
and ChunkAccumulator is registered unconditionally — the legacy controller
never bound in any profile. Even if it had, IngestionService.ingestExecution
called executionStore.upsert(), and the only ExecutionStore impl
(ClickHouseExecutionStore) threw UnsupportedOperationException from upsert
and upsertProcessors. The entire RouteExecution → upsert path was dead code
carrying four transitive dependencies (RouteExecution import, eventPublisher
wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook).
Removed:
- cameleer-server-app/.../controller/ExecutionController.java (whole file)
- ExecutionStore.upsert + upsertProcessors (interface methods)
- ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides)
- IngestionService.ingestExecution + toExecutionRecord + flattenProcessors
+ hasAnyTraceData + truncateBody + toJson/toJsonObject helpers
- IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>);
dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit
- StorageBeanConfig.ingestionService(...) simplified accordingly
Untouched because still in use:
- ExecutionRecord / ProcessorRecord records (findById / findProcessors /
SearchIndexer / DetailController)
- SearchIndexer (its onExecutionUpdated never fires now since no-one
publishes ExecutionUpdatedEvent, but SearchIndexerStats is still
referenced by ClickHouseAdminController — separate cleanup)
- TaggedExecution record has no remaining callers after this change —
flagged in core-classes.md as a leftover; separate cleanup.
Rule docs updated:
- .claude/rules/app-classes.md: retired ExecutionController bullet, fixed
stale URL for ChunkIngestionController (it owns /api/v1/data/executions,
not /api/v1/ingestion/chunk/executions).
- .claude/rules/core-classes.md: IngestionService surface + note the dead
TaggedExecution.
Full IT suite post-removal: 560 tests run, 11 F + 1 E — same 12 failures
in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT
SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.
Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
(scope, eventTypes, withinSeconds). Compact ctor rejects empty
eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
the server-emitted types in `AgentRegistrationController` and
`AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
ASC)` used by the evaluator. Implemented on
`ClickHouseAgentEventRepository` with tenant + env filter mandatory.
App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
window and returns `EvalResult.Batch` with one `Firing` per row.
Every Firing carries a canonical `_subjectFingerprint` of
`"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
deserialization time — matches the existing policy of keeping
controller validators focused on env-scoped / SQL-injection concerns.
Schema:
- V16 migration generalises the V15 per-exchange discriminator on
`alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
fallback to the legacy `exchange.id` expression. Scalar kinds still
resolve to `''` and keep one-open-per-rule. Duplicate-key path in
`PostgresAlertInstanceRepository.save` is unchanged — the index is
the deduper.
UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
the six allowed event types + `withinSeconds` input. Wired into
`ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.
Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.
Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
- LogQueryController: note response shape, sort param, and that the
cursor tiebreak is the insert_id UUID column (not exchange/instance)
- AgentEventsController: cursor now carries insert_id UUID (was instanceId);
order is (timestamp DESC, insert_id DESC)
- core-classes: add AgentEventPage record; note that the non-paginated
AgentEventRepository.query(...) path has been removed
- core-classes: note LogSearchRequest.sources/levels are now List<String>
with multi-value OR semantics
Keeps the rule files in sync with the cursor-pagination + multi-select
filter work on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).
Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.
- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
JWT env claim (agents cannot spoof env); non-agent callers provide env
via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
useAllAppSettings, useUpdateAppSettings) take env, wired to
useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.
Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exchange search silently filtered by the in-memory agent registry's
current instance IDs on top of application_id. Historical exchanges written
by previous agent instances (or any instance not currently registered, e.g.
after a server restart before agents heartbeat back) were hidden from
results even though they matched the application filter.
Fix: drop the applicationId -> instanceIds resolution in SearchController.
Rely on application_id = ? in ClickHouseSearchIndex; keep explicit
instanceIds filtering only when a client passes them.
Related cleanup: the agentIds parameter on StatsStore.statsForRoute /
timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so
per-route stats aggregated across any apps sharing a routeId. Replace with
String applicationId and add application_id to the stats_1m_route filters
so per-route stats are correctly scoped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>