Guards against well-meaning "fixes" that would re-align the intentionally
different hostnames. CI keeps pushing to gitea.siegeln.net; runtime defaults
use registry.cameleer.io (the public alias of the same registry). Both forms
of the same image coexist intentionally during the institutionalization
period.
- CLAUDE.md: new "Registry naming (buildtime vs public)" section between
Related Project and Modules; loader-image default mention now says
registry.cameleer.io with an inline cross-reference; license-API note
flags that com.cameleer:cameleer-common stays on the agent repo's
groupId until that project follows the same flip
- .claude/rules/cicd.md: registry line now names both hostnames and points
at the new CLAUDE.md section
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Customers running this server with no overrides reach the public registry
alias, not the internal hostname. registry.cameleer.io and gitea.siegeln.net
resolve to the same registry — buildtime CI keeps pushing to gitea.siegeln.net,
runtime defaults pull via the public alias.
- application.yml: baseimage, loaderimage defaults
- DeploymentExecutor.java: matching @Value defaults
- docker-orchestration.md: updates the documented default and notes the
buildtime/public split so future changes don't "fix" the asymmetry
Out of scope (intentionally still on gitea.siegeln.net):
- LoaderHardeningIT and the two DockerRuntimeOrchestrator unit tests.
Tests are buildtime artifacts; LoaderHardeningIT pulls the real image
via CI's pre-authenticated docker login to gitea.siegeln.net.
- deploy/base/*.yaml and deploy/overlays/main/*.yaml (internal k3s,
customers don't use these manifests).
- pom.xml, .npmrc, ui/Dockerfile (build dependency sources).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brand-aligned reverse-DNS: io.cameleer matches the owned cameleer.io
domain. Part of institutionalization prep — clean break, no compat shims.
Scope:
- 613 .java files: package + import declarations and directory layout
- 5 POMs: groupId for cameleer-server-parent and 4 modules; mainClass FQN
in cameleer-license-minter; internal inter-module dep coordinates
- .claude/rules/{core,app}-classes.md + CLAUDE.md: keep class/API maps in
sync per the maintenance rule in CLAUDE.md
Out of scope (intentionally preserved on com.cameleer):
- com.cameleer:cameleer-common — external dep from the agent repo
- Spring config namespaces (cameleer.server.*) — they're property keys,
not Java packages
Consumer heads-up:
- cameleer-saas pulls io.cameleer:cameleer-license-{api,minter} on next
sync; their POMs need the matching groupId bump.
Verification: mvn install -DskipITs (273 server-app unit tests pass under
io.cameleer.* package names; license-api / server-core / license-minter
modules all green). The repackage step's JAR-rename failure during the run
was a file lock from a co-running dev server, unrelated to the rename.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update CLAUDE.md and .claude/rules/cicd.md to point at the new
source-of-truth location (cameleer-saas/docker/runtime-loader/) and
flag LoaderHardeningIT as the cross-repo contract test instead of an
internal regression guard. The image's runtime contract (env vars,
mount path, exit codes) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The loader is infra glue (per-replica init container that fetches the
tenant JAR from a signed URL) — same shape as runtime-base, postgres,
clickhouse, traefik, logto images already living in cameleer-saas. Move
the source + CI build there so all sidecar/infra image builds are in
one place; cameleer-server's CI is back to building only what it owns
(server, server-ui).
Coordination: cameleer-saas@ac8d628 added the build step and copied the
source verbatim. Published tag path is unchanged
(gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest), so running
tenant servers continue pulling the same image without disruption.
This commit:
- Deletes cameleer-runtime-loader/ (Dockerfile, entrypoint.sh, README).
- Removes the conditional "Build and push runtime-loader" step and its
upstream "Detect runtime-loader changes" detection from .gitea/workflows/ci.yml.
Drops the fetch-depth: 0 + outputs.loader_changed plumbing that only
existed for the change-detection path.
- Drops cameleer-runtime-loader from the in-job and cleanup-branch image
cleanup loops — saas owns the registry lifecycle now.
- Rewrites LoaderHardeningIT to pull the published :latest from the
registry (via Testcontainers GenericContainer) instead of building
from a local Dockerfile. The IT now functions as a cross-repo contract
test: cameleer-server's hardening expectations vs. the saas-published
artifact. Local devs need `docker login gitea.siegeln.net`; CI runners
are pre-authenticated.
- Updates .claude/rules/docker-orchestration.md to point at the new
source-of-truth location and reframe LoaderHardeningIT as the
cross-repo contract test.
The image's runtime contract (ARTIFACT_URL, ARTIFACT_EXPECTED_SIZE,
/app/jars/app.jar mount, exit code semantics) is unchanged. Future
contract changes need coordinated commits across both repos.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two diagnostics-and-confidence follow-ups to the loader-init-container pattern.
1) DockerRuntimeOrchestrator now captures the loader's last 50 lines of
stdout/stderr (capped at 4096 chars, 5s timeout) before the finally-remove
and appends them to the thrown RuntimeException as
`. loader output: <text>`. Best-effort: log-capture failures are swallowed
and never mask the original exit. Closes the visibility gap that turned a
simple "wget: Permission denied" into the opaque "Loader exited 1".
2) New LoaderHardeningIT spins up a Testcontainers nginx serving a 1KB
fixture, builds the loader image fresh from cameleer-runtime-loader/,
and runs it under the exact baseHardenedHostConfig() shape (cap_drop ALL,
readonly rootfs, /tmp tmpfs, no-new-privileges, apparmor=docker-default,
pids=512) bound to a fresh named volume RW at /app/jars. Asserts exit 0.
This would have caught the volume-permission regression in CI.
GenericContainer + OneShotStartupCheckStrategy is used instead of raw
docker-java waitContainerCmd because docker-java's unshaded api version
in this project's pom and testcontainers' shaded copy disagree on
WaitContainerCmd.getCondition() — going through GenericContainer keeps
the call inside testcontainers' shaded executor.
Rules doc updated to point at the captured-output behaviour and the IT.
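A minimal sketch of the tail-and-cap step from item 1, assuming the cap keeps
the end of the text rather than the start. `LoaderOutputTail` and its method
names are hypothetical and are not taken from DockerRuntimeOrchestrator:

```java
import java.util.Arrays;

/** Hypothetical helper; names and signatures are illustrative only. */
final class LoaderOutputTail {
    private LoaderOutputTail() {}

    /** Keeps the last maxLines lines of raw, then caps the result at maxChars characters. */
    static String tail(String raw, int maxLines, int maxChars) {
        if (raw == null || raw.isEmpty()) {
            return "";
        }
        String[] lines = raw.split("\\R"); // any line-break sequence
        int from = Math.max(0, lines.length - maxLines);
        String joined = String.join("\n", Arrays.copyOfRange(lines, from, lines.length));
        // Assumption: when over the cap, keep the end, where the error usually is.
        return joined.length() <= maxChars
                ? joined
                : joined.substring(joined.length() - maxChars);
    }

    /** Appends captured output to the message; empty output leaves it untouched. */
    static String withLoaderOutput(String baseMessage, String captured) {
        return captured.isEmpty() ? baseMessage : baseMessage + ". loader output: " + captured;
    }
}
```

With limits of 50 lines and 4096 chars, a failing loader would surface as
"Loader exited 1. loader output: wget: Permission denied".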
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The init-container image referenced by DockerRuntimeOrchestrator
(`gitea.siegeln.net/cameleer/cameleer-runtime-loader:latest`) had no CI
producer; it had to be built and pushed by hand. Replicates the
cameleer-saas pattern (single docker job with multiple buildx push
steps), but gates the loader build on a path-diff so unrelated commits
don't rebuild and re-tag a sidecar that didn't change.
- build job: fetch-depth=0 + Detect runtime-loader changes step that
diffs `${{ github.event.before }}..${{ github.sha }}` for paths under
cameleer-runtime-loader/. Falls back to `changed=true` when no prior
commit is reachable (first push to a branch).
- docker job: new `Build and push runtime-loader` step gated on
`needs.build.outputs.loader_changed == 'true'`. Tags with sha and
latest/branch-<slug>, --provenance=false for Gitea, no buildcache
(image is alpine + script).
- Cleanup loops in docker and cleanup-branch jobs include the new
package.
- Rules and loader README updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final-review must-fixes:
- HOWTO.md: drop CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME; add the three new
artifact env vars (loaderimage / artifacttokenttlseconds / artifactbaseurl).
- DeploymentExecutor @PostConstruct WARN, handoff doc, and docker-orchestration
rule no longer claim the loader uses cameleer-traefik. The loader runs on
the PRIMARY Docker network only — additional networks are attached after
startContainer returns, by which time the loader has exited. SaaS still
works because the tenant's primary network hosts the tenant server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-pull the loader image at PULL_IMAGE so the implicit pull on first
createContainerCmd doesn't bypass the 120s loader-wait timeout.
Wrap createAndStartLoader in try/catch so a create/start failure cleans
up the just-created volume; same guard around createAndStartMain on
phase-2 failures. Folds the wait-error message into the rethrown
RuntimeException so the cause chain is visible.
Add a @PostConstruct WARN when neither artifactbaseurl nor serverurl is
set so the implicit cameleer-server DNS dependency is loud at boot, and
document the loader-to-server reachability contract in
.claude/rules/docker-orchestration.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tasks 9+10+11 of the init-container-jar-fetch plan, landed atomically because
9 alone leaves the orchestrator+executor referencing removed ContainerRequest
fields.
ContainerRequest (core) drops jarPath/jarVolumeName/jarVolumeMountPath; adds
appVersionId, artifactDownloadUrl, artifactExpectedSize, loaderImage.
DockerRuntimeOrchestrator (app):
- per-replica named volume "cameleer-jars-{containerName}"
- phase 1: loader container with the volume mounted RW at /app/jars,
ARTIFACT_URL + ARTIFACT_EXPECTED_SIZE env, full hardening contract
- block on waitContainerCmd().awaitStatusCode(120s); on non-zero exit
remove the loader, remove the volume, propagate RuntimeException so
DeploymentExecutor marks the deployment FAILED. main is never created.
- phase 2: main container with the same volume mounted RO at /app/jars
- withUsernsMode("host:1000:65536") on BOTH containers — closes the last
open hardening gap from issue #152
- main entrypoint paths point at /app/jars/app.jar
- extracted baseHardenedHostConfig() so loader and main share the
cap_drop / security_opt / readonly / pids / tmpfs contract
- removeContainer() also removes the per-replica volume so blue/green
doesn't leak volumes
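The two-phase flow above can be sketched against a stand-in interface.
`ContainerRuntime` and `TwoPhaseStart` are hypothetical and do not mirror the
docker-java API; hardening and userns settings are left out, and the sketch
removes the volume on any failure, which may be broader than this commit:

```java
/** Hypothetical stand-in for the Docker calls; not the docker-java API. */
interface ContainerRuntime {
    void createVolume(String name);
    String startContainer(String image, String volume, boolean readOnlyMount);
    int awaitExit(String containerId, int timeoutSeconds);
    void removeContainer(String containerId);
    void removeVolume(String name);
}

final class TwoPhaseStart {
    private TwoPhaseStart() {}

    /** Runs the loader to completion, then starts main on the same volume. */
    static String start(ContainerRuntime rt, String containerName,
                        String loaderImage, String mainImage) {
        String volume = "cameleer-jars-" + containerName;
        rt.createVolume(volume);
        boolean started = false;
        try {
            // Phase 1: loader writes the JAR, so the volume is mounted read-write.
            String loader = rt.startContainer(loaderImage, volume, false);
            int exit;
            try {
                exit = rt.awaitExit(loader, 120);
            } finally {
                rt.removeContainer(loader); // the loader is one-shot either way
            }
            if (exit != 0) {
                throw new RuntimeException("Loader exited " + exit);
            }
            // Phase 2: main only reads the JAR, so the mount is read-only.
            String main = rt.startContainer(mainImage, volume, true);
            started = true;
            return main;
        } finally {
            if (!started) {
                rt.removeVolume(volume); // never leave a volume behind on failure
            }
        }
    }
}
```

A recording fake of `ContainerRuntime` is enough to check the abort path:
loader removed, volume removed, main never started.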
DeploymentExecutor (app):
- injects ArtifactDownloadTokenSigner; new @Value props loaderimage,
artifacttokenttlseconds, artifactbaseurl
- replaces the temporary getVersion(...).jarPath() bridge with a signed
URL ${artifactBaseUrl}/api/v1/artifacts/{id}?exp&sig
- drops the Files.exists pre-flight check; AppVersion.jarSizeBytes is
the size-of-record check now
- drops jarDockerVolume / jarStoragePath @Value fields and the volume
plumbing in startReplica
- DeployCtx carries appVersionId / artifactUrl / artifactExpectedSize
in place of jarPath
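For the signed-URL shape, a minimal sketch assuming HMAC-SHA256 over
"<id>:<exp>" with a base64url signature. The algorithm, the signed payload and
the class name `ArtifactUrlSketch` are assumptions; the commit does not say how
ArtifactDownloadTokenSigner computes `sig`:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical signer; algorithm and payload layout are assumptions. */
final class ArtifactUrlSketch {
    private ArtifactUrlSketch() {}

    /** HMAC-SHA256 over "<artifactId>:<exp>", encoded as unpadded base64url. */
    static String sign(byte[] secret, String artifactId, long expEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] tag = mac.doFinal(
                    (artifactId + ":" + expEpochSeconds).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    /** Builds {baseUrl}/api/v1/artifacts/{id}?exp=...&sig=... */
    static String signedUrl(String baseUrl, byte[] secret, String artifactId, long exp) {
        return baseUrl + "/api/v1/artifacts/" + artifactId
                + "?exp=" + exp + "&sig=" + sign(secret, artifactId, exp);
    }

    /** Rejects expired links, then compares signatures in constant time. */
    static boolean verify(byte[] secret, String artifactId, long exp,
                          String sig, long nowEpochSeconds) {
        if (nowEpochSeconds > exp) {
            return false;
        }
        byte[] expected = sign(secret, artifactId, exp).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, sig.getBytes(StandardCharsets.UTF_8));
    }
}
```

Binding both the id and the expiry into the signature means a link cannot be
replayed for another artifact or extended past its TTL.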
Tests:
- DockerRuntimeOrchestratorHardeningTest updated for the new shape;
captures HostConfig on the MAIN container and asserts cap_drop ALL
+ no-new-privileges + apparmor + readonly + pids + tmpfs + the new
withUsernsMode("host:1000:65536")
- DockerRuntimeOrchestratorLoaderTest (new): verifies volume create →
loader create with RW bind → loader started → awaited → loader
removed → main create with RO bind → main started; verifies abort
+ cleanup on loader exit != 0 (loader removed, volume removed, main
NEVER created); verifies userns_mode applied to both containers.
Config:
- application.yml replaces jardockervolume with loaderimage,
artifacttokenttlseconds, artifactbaseurl
Rules updated: .claude/rules/docker-orchestration.md (loader pattern,
userns, no more bind-mount); .claude/rules/core-classes.md
(ContainerRequest field map).
Test counts after change:
- cameleer-server-core: 116/116 unit tests pass
- cameleer-server-app: 273/273 unit tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates both UiAuthController listings (Auth flat + security/) so future
sessions know /logout exists, that it bumps token_revoked_before with a
+1ms race-safety margin, and that it audits under AuditCategory.AUTH.
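Why the +1ms matters, as a minimal sketch. It assumes tokens are rejected when
issued strictly before the cutoff; `RevocationSketch` and that comparison rule
are assumptions, not the controller's actual code:

```java
import java.time.Instant;

/** Hypothetical model of the revocation cutoff; comparison rule is an assumption. */
final class RevocationSketch {
    private RevocationSketch() {}

    /** Cutoff stored on logout: one millisecond past "now". */
    static Instant cutoffOnLogout(Instant now) {
        return now.plusMillis(1);
    }

    /** A token is rejected when it was issued strictly before the cutoff. */
    static boolean isRevoked(Instant issuedAt, Instant revokedBefore) {
        return revokedBefore != null && issuedAt.isBefore(revokedBefore);
    }
}
```

Under a strict comparison, a token issued in the same millisecond as the logout
would survive a cutoff of exactly "now"; the extra millisecond closes that gap.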
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the pure license contract types (LicenseInfo, LicenseValidator,
LicenseState, LicenseStateMachine, LicenseLimits, DefaultTierLimits) into a
new cameleer-license-api module under package com.cameleer.license.
Why: cameleer-license-minter previously depended on cameleer-server-core for
these types, dragging cameleer-server-core + cameleer-common onto the
classpath of every minter consumer (notably cameleer-saas). The SaaS
management plane has no business carrying server-runtime types — it only
needs the license contract to mint and verify tokens.
After:
cameleer-license-minter -> cameleer-license-api (no server internals)
cameleer-server-core -> cameleer-license-api
cameleer-saas -> cameleer-license-minter -> cameleer-license-api
Verified: mvn -pl cameleer-license-minter dependency:tree shows the minter
no longer pulls cameleer-server-core or cameleer-common. Full reactor
verify (-DskipITs) green: 371 tests pass.
LicenseGate stays in server-core (server-runtime state holder, not contract).
Closes cameleer/cameleer-server#156
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final consolidation pass after the 36-task license-enforcement work.
core-classes.md:
- New license/ section: LicenseInfo, LicenseLimits, DefaultTierLimits,
LicenseValidator, LicenseGate, LicenseStateMachine, LicenseState.
- runtime/: added CreateGuard (functional interface for license-cap
hooks consulted by EnvironmentService/AppService/AgentRegistryService).
- admin/: AuditCategory.LICENSE added to the documented enum value list.
app-classes.md:
- New license/ section: LicenseService, LicenseRepository, LicenseRecord,
PostgresLicenseRepository, LicenseChangedEvent, LicenseEnforcer,
LicenseUsageReader, LicenseCapExceededException, LicenseExceptionAdvice,
LicenseMessageRenderer, RetentionPolicyApplier, LicenseRevalidationJob,
LicenseMetrics.
- LicenseAdminController entry expanded to document the GET response
shape and the LicenseService.install delegation pattern.
- config/: RuntimeBeanConfig note about CreateGuard wiring; new
LicenseBeanConfig entry covering the four-bean topology and the
always-failing-validator fallback.
Note: LicenseChangedEvent, LicenseRepository, LicenseRecord, and
PostgresLicenseRepository live in cameleer-server-app, not -core; the
plan's section assignments were corrected against the actual code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Returns state, expiresAt/daysRemaining, lastValidatedAt, message
(LicenseMessageRenderer.forState), and a limits[] array where each
entry carries key/current/cap/source ("license" vs "default"). Adds
public AgentRegistryService.liveCount() so max_agents can be reported
from the in-memory registry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three int fields to the Environment record + repository row mapper,
matching the columns added in V5. Default value is 1 per the V5 NOT NULL
DEFAULT 1. Read DTO surfaces the fields via Jackson record serialization;
setter endpoint deferred to a follow-up that wires the corresponding
license cap checks.
The canonical constructor enforces >= 1 for each retention field — V5
guarantees this at the DB level, but the runtime guard catches in-memory
construction errors (e.g., test sites that pass 0).
Test sites updated to the 12-arg signature with retention defaults of 1.
EnvironmentAdminControllerIT gains a regression test asserting the wire
shape exposes all three fields.
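The canonical-constructor guard can be sketched with a compact record
constructor. The record and its field names are hypothetical (the commit only
says "three int fields"); the real Environment record has 12 components:

```java
/** Hypothetical record; field names are placeholders for the three retention fields. */
record RetentionSettings(int executionRetention, int logRetention, int metricRetention) {

    /** Compact canonical constructor: runs on every construction path. */
    RetentionSettings {
        requireAtLeastOne("executionRetention", executionRetention);
        requireAtLeastOne("logRetention", logRetention);
        requireAtLeastOne("metricRetention", metricRetention);
    }

    private static void requireAtLeastOne(String name, int value) {
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be >= 1, got " + value);
        }
    }

    /** Mirrors the NOT NULL DEFAULT 1 columns. */
    static RetentionSettings defaults() {
        return new RetentionSettings(1, 1, 1);
    }
}
```

Because the check lives in the canonical constructor, a test site passing 0
fails at construction instead of later at the database.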
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ComputeUsage record + computeUsage() helper to LicenseUsageReader
that aggregates from PG. DeploymentExecutor.executeAsync runs three
assertWithinCap checks (max_total_cpu_millis, max_total_memory_mb,
max_total_replicas) right after config resolution. The existing
executor try/catch turns a LicenseCapExceededException into a FAILED
deployment with the cap message in the failure reason.
Adds ComputeCapEnforcementIT (HTTP-driven; @MockBean RuntimeOrchestrator,
since cap rejection short-circuits before any orchestrator call) plus
defensive license lifts in BlueGreenStrategyIT, RollingStrategyIT,
DeploymentSnapshotIT, and DeploymentControllerAuditIT so sequential
deploys under testcontainer reuse don't trip the new caps.
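A minimal sketch of one cap check, assuming the rule is "current usage plus the
requested amount must not exceed the cap". The class names and the four-argument
signature are hypothetical; the real assertWithinCap may differ:

```java
/** Hypothetical exception; stands in for LicenseCapExceededException. */
final class CapExceededSketchException extends RuntimeException {
    CapExceededSketchException(String key, long projected, long cap) {
        super("license cap exceeded: " + key + " would be " + projected + ", cap is " + cap);
    }
}

/** Hypothetical check; the comparison rule is an assumption. */
final class CapCheckSketch {
    private CapCheckSketch() {}

    static void assertWithinCap(String key, long current, long requested, long cap) {
        long projected = current + requested;
        if (projected > cap) {
            throw new CapExceededSketchException(key, projected, cap);
        }
    }
}
```

Running the three checks before any orchestrator call is what lets the
surrounding try/catch record the cap message as the failure reason.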
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tenant JARs are arbitrary user code: Camel ships components (camel-exec,
camel-bean, MVEL/Groovy templating) that turn a header into shell, and
Java 17 deprecates the SecurityManager for removal (JEP 411), so the JVM is not a security boundary.
This applies an unconditional hardening contract to every tenant
container so a single runc CVE no longer equals host takeover.
DockerRuntimeOrchestrator.startContainer now sets:
- cap_drop ALL (Capability.values() — docker-java has no ALL constant)
- security_opt: no-new-privileges, apparmor=docker-default
(default seccomp profile applies implicitly)
- read_only rootfs, pids_limit=512
- /tmp tmpfs rw,nosuid,size=256m — no noexec, since Netty/Snappy/LZ4/Zstd
dlopen native libs from /tmp via mmap(PROT_EXEC) which noexec blocks
The orchestrator also probes `docker info` at construction and uses runsc
(gVisor) automatically when the daemon has it registered. Override via
cameleer.server.runtime.dockerruntime (e.g. "kata"); empty = auto.
Outbound TCP, DNS, and TLS are unaffected — caps/seccomp don't gate
those — so vanilla Camel-Kafka producers/consumers and REST integrations
keep working unchanged. Stateful tenants (Kafka Streams with on-disk
state stores, apps writing to /var/log/...) need explicit writeable
volumes; that's tracked in #153 as the natural follow-up.
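The runtime auto-selection can be sketched as a pure function, assuming the set
of registered runtimes has already been read from `docker info`.
`RuntimeSelectionSketch` is a hypothetical name:

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical selector; the probing of `docker info` is not shown. */
final class RuntimeSelectionSketch {
    private RuntimeSelectionSketch() {}

    /** Returns the runtime to request, or empty to use the daemon default. */
    static Optional<String> select(String override, Set<String> registeredRuntimes) {
        if (override != null && !override.isBlank()) {
            return Optional.of(override.trim()); // explicit setting wins, e.g. "kata"
        }
        // Empty override means auto: prefer gVisor when the daemon has it.
        return registeredRuntimes.contains("runsc") ? Optional.of("runsc") : Optional.empty();
    }
}
```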
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the page-local DS Select window picker. Drive from() / to() off
useGlobalFilters().timeRange so the dashboard tracks the same TopBar range
as Exchanges / Dashboard / Runtime. Bucket size auto-scales via
stepSecondsFor(windowSeconds) (10 s for ≤30 min → 1 h for >48 h). Query
hooks now take ServerMetricsRange = { from: Date; to: Date } instead of a
windowSeconds number, so they support arbitrary absolute or rolling ranges
the TopBar may supply (not just "now − N"). Toolbar collapses to just the
server-instance badges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /admin/server-metrics page mirroring the Database/ClickHouse visibility
rules: sidebar entry gated on capabilities.infrastructureEndpoints, backend
controller now has @ConditionalOnProperty(infrastructureendpoints) and
class-level @PreAuthorize("hasRole('ADMIN')"). Dashboard panels are driven
from docs/server-self-metrics.md via the generic
/api/v1/admin/server-metrics/{catalog,instances,query} API — Server Health,
JVM, HTTP & DB pools, and conditionally Alerting + Deployments when their
metrics appear in the catalog. ThemedChart / Line / Area from the design
system; hooks in ui/src/api/queries/admin/serverMetrics.ts. Not yet
browser-verified against a running dev server — backend IT covers the API
end-to-end (8 tests), UI typecheck + production bundle both clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control
planes can build the server-health dashboard without direct ClickHouse
access. One generic /query endpoint covers every panel in the
server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag,
filter-by-tag, counter-delta mode with per-server_instance_id rotation
handling, and a derived 'mean' statistic for timers. Regex-validated
identifiers, parameterised literals, 31-day range cap, 500-series response
cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated:
all 17 suggested panels now expressed as single-endpoint queries.
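One way to read "counter-delta mode with per-server_instance_id rotation
handling", as a sketch. It assumes samples arrive in time order and that the
first sample of a new instance id contributes nothing; the names and the reset
rule are assumptions, not the endpoint's actual SQL:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sample shape for one cumulative counter reading. */
record CounterSample(String serverInstanceId, long epochSeconds, double value) {}

/** Hypothetical delta calculation; the reset rule is an assumption. */
final class CounterDeltaSketch {
    private CounterDeltaSketch() {}

    /** Sums increases over time-ordered samples, tracking each instance id separately. */
    static double totalIncrease(List<CounterSample> samplesInTimeOrder) {
        Map<String, Double> lastByInstance = new HashMap<>();
        double total = 0;
        for (CounterSample s : samplesInTimeOrder) {
            Double previous = lastByInstance.put(s.serverInstanceId(), s.value());
            if (previous == null) {
                continue; // first sample of an instance: no baseline yet
            }
            double delta = s.value() - previous;
            // A drop within one instance is treated as a counter reset.
            total += delta >= 0 ? delta : s.value();
        }
        return total;
    }
}
```

For samples a:10, a:15, b:2, b:7 (a restart rotates the id from a to b), a naive
last-minus-first gives -3, while the per-instance deltas give 5 + 5 = 10.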
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshot the full Micrometer registry (cameleer business metrics, alerting
metrics, and Spring Boot Actuator defaults) every 60s into a new
server_metrics table so server health survives restarts without an external
Prometheus. Includes a dashboard-builder reference for the SaaS team.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- app-classes: DiagramRenderController by-route endpoint no longer
depends on the agent registry; points at findLatestContentHashForAppRoute
and cross-refs the exchange viewer's content-hash path.
- core-classes: document the new DiagramStore method and note why the
agent-scoped findContentHashForRoute stays for the ingest path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `TraefikLabelBuilder` hardcoded `tls.certresolver=default` on
every router. That assumes a resolver literally named `default` exists
in the Traefik static config — true for ACME-backed installs, false for
dev/local installs that use a file-based TLS store. Traefik logs
"Router uses a nonexistent certificate resolver" for the bogus resolver
on every managed app, and any future attempt to define a differently-
named real resolver would silently skip these routers.
Server-wide setting via `CAMELEER_SERVER_RUNTIME_CERTRESOLVER` (empty by
default) flows through `ConfigMerger.GlobalRuntimeDefaults.certResolver`
into `ResolvedContainerConfig.certResolver`. When blank the
`tls.certresolver` label is omitted entirely; `tls=true` is still
emitted so Traefik serves the default TLS-store cert. When set, the
label is emitted with the configured resolver name.
Not per-app/per-env configurable: there is one Traefik per server
instance and one resolver config; app-level override would only let
users break their own routers.
TDD: TraefikLabelBuilderTest gains 3 cases (resolver set, null, blank).
Full unit suite 211/0/0.
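The resolver rule, sketched with Traefik's router label keys. `TlsLabelSketch`
and the router name passed in are hypothetical; TraefikLabelBuilder's real
signature is not shown in this commit:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical label builder covering only the TLS labels. */
final class TlsLabelSketch {
    private TlsLabelSketch() {}

    static Map<String, String> tlsLabels(String routerName, String certResolver) {
        Map<String, String> labels = new LinkedHashMap<>();
        String prefix = "traefik.http.routers." + routerName;
        labels.put(prefix + ".tls", "true"); // always on: default TLS-store cert applies
        if (certResolver != null && !certResolver.isBlank()) {
            labels.put(prefix + ".tls.certresolver", certResolver);
        }
        return labels;
    }
}
```

The three cases match the new tests: resolver set emits both labels, null and
blank emit only `tls=true`.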
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a boolean `externalRouting` flag (default `true`) on
ResolvedContainerConfig. When `false`, TraefikLabelBuilder emits only
the identity labels (`managed-by`, `cameleer.*`) and skips every
`traefik.*` label, so the container is not published by Traefik.
Sibling containers on `cameleer-traefik` / `cameleer-env-{tenant}-{env}`
can still reach it via Docker DNS on whatever port the app listens on.
TDD: new TraefikLabelBuilderTest covers enabled (default labels present),
disabled (zero traefik.* labels), and disabled (identity labels retained)
cases. Full module unit suite: 208/0/0.
Plumbed through ConfigMerger read, DeploymentExecutor snapshot, UI form
state, Resources tab toggle, POST payload, and snapshot-to-form mapping.
Rule files updated.
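The flag's effect on labels, as a sketch that filters an already-built label
map. The real builder likely skips emitting the labels rather than filtering
them afterwards; `RoutingLabelFilterSketch` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical filter; shows the outcome, not the builder's actual structure. */
final class RoutingLabelFilterSketch {
    private RoutingLabelFilterSketch() {}

    static Map<String, String> effectiveLabels(Map<String, String> allLabels,
                                               boolean externalRouting) {
        if (externalRouting) {
            return new LinkedHashMap<>(allLabels);
        }
        Map<String, String> kept = new LinkedHashMap<>();
        allLabels.forEach((key, value) -> {
            if (!key.startsWith("traefik.")) {
                kept.put(key, value); // identity labels survive
            }
        });
        return kept;
    }
}
```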
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Maven: enable useIncrementalCompilation; Surefire forkCount=1C +
reuseForks=true so unit-test JVMs are reused per CPU core instead of
spawning per class (205 tests pass under the new strategy).
- Testcontainers: opt-in reuse via .withReuse(true) on Postgres +
ClickHouse base; per-developer enable via ~/.testcontainers.properties.
- UI: drop redundant `tsc --noEmit` from `npm run build` (Vite already
type-checks); split into a dedicated `npm run typecheck` script.
- CI: cache ~/.npm and ui/node_modules/.vite alongside Maven; npm ci with
--prefer-offline --no-audit --fund=false; paths-ignore for docs-only,
.planning/ and .claude/ changes so doc-only pushes skip the pipeline.
- Docs: CLAUDE.md + .claude/rules/cicd.md updated with the new build
knobs and the Testcontainers reuse opt-in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Backend: rename deleteTerminalByAppAndEnvironment → deleteFailedByAppAndEnvironment.
STOPPED rows were being wiped on every redeploy, so Checkpoints was always empty.
Now only FAILED rows are pruned; STOPPED deployments are retained as restorable
checkpoints (they still carry deployed_config_snapshot from their RUNNING window).
- UI filter: any deployment with a snapshot is a checkpoint (was RUNNING|DEGRADED only,
which excluded the main case — the previous blue/green deployment now in STOPPED).
- UI placement: Checkpoints disclosure now renders inside IdentitySection, matching
the design spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refresh the three rules files to match the new executor behavior:
- docker-orchestration.md: rewrite DeploymentExecutor Details with
container naming scheme ({...}-{replica}-{generation}), strategy
dispatch (blue-green vs rolling), and the new DEGRADED semantics
(post-deploy only). Update TraefikLabelBuilder + ContainerLogForwarder
bullets for the generation suffix + new cameleer.generation label.
- app-classes.md: DeploymentExecutor + TraefikLabelBuilder bullets
mirror the same.
- core-classes.md: add DeploymentStrategy enum; note DEGRADED is now
post-deploy-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires DirtyStateCalculator behind an HTTP endpoint on AppController.
Adds findLatestSuccessfulByAppAndEnv to PostgresDeploymentRepository,
registers DirtyStateCalculator as a Spring bean (with ObjectMapper for
JavaTimeModule support), and covers all three scenarios with IT.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace EnvironmentSelector "All Envs" dropdown with Button+Modal (DS Modal, forced on first-use).
- Add 8-swatch preset color picker in the Environment settings "Appearance" section; commits via useUpdateEnvironment.
- Render a 3px fixed top bar in the current env's color across every page (z-index 900, below DS modals).
- New env-colors tokens (--env-color-*, light + dark) and envColorVar() helper with slate fallback.
- Vitest coverage for button, modal, and color helpers (13 new specs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 6.2 housekeeping — add BatchResultApplier to the class map per
CLAUDE.md convention. Introduced in Task 2.2 as the @Transactional
wrapper for atomic per-rule batch commits (instance writes + notification
enqueues + cursor advance).
Also refreshes GitNexus index stats auto-emitted into AGENTS.md /
CLAUDE.md (8778 -> 8893 nodes, 22647 -> 23049 edges).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No callers after the legacy PG ingestion path was retired in 0f635576.
core-classes.md updated to drop the leftover note.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ExecutionController was @ConditionalOnMissingBean(ChunkAccumulator.class),
and ChunkAccumulator is registered unconditionally — the legacy controller
never bound in any profile. Even if it had, IngestionService.ingestExecution
called executionStore.upsert(), and the only ExecutionStore impl
(ClickHouseExecutionStore) threw UnsupportedOperationException from upsert
and upsertProcessors. The entire RouteExecution → upsert path was dead code
carrying four transitive dependencies (RouteExecution import, eventPublisher
wiring, body-size-limit config, searchIndexer::onExecutionUpdated hook).
Removed:
- cameleer-server-app/.../controller/ExecutionController.java (whole file)
- ExecutionStore.upsert + upsertProcessors (interface methods)
- ClickHouseExecutionStore.upsert + upsertProcessors (thrower overrides)
- IngestionService.ingestExecution + toExecutionRecord + flattenProcessors
+ hasAnyTraceData + truncateBody + toJson/toJsonObject helpers
- IngestionService constructor now takes (DiagramStore, WriteBuffer<Metrics>);
dropped ExecutionStore + Consumer<ExecutionUpdatedEvent> + bodySizeLimit
- StorageBeanConfig.ingestionService(...) simplified accordingly
Untouched because still in use:
- ExecutionRecord / ProcessorRecord records (findById / findProcessors /
SearchIndexer / DetailController)
- SearchIndexer (its onExecutionUpdated never fires now since no-one
publishes ExecutionUpdatedEvent, but SearchIndexerStats is still
referenced by ClickHouseAdminController — separate cleanup)
- TaggedExecution record has no remaining callers after this change —
flagged in core-classes.md as a leftover; separate cleanup.
Rule docs updated:
- .claude/rules/app-classes.md: retired ExecutionController bullet, fixed
stale URL for ChunkIngestionController (it owns /api/v1/data/executions,
not /api/v1/ingestion/chunk/executions).
- .claude/rules/core-classes.md: IngestionService surface + note the dead
TaggedExecution.
Full IT suite post-removal: 560 tests run with 11 failures + 1 error, the same 12
in the same 3 previously-parked classes (AgentSseControllerIT / SseSigningIT
SSE-timing + ClickHouseStatsStoreIT timezone bug). No regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GET /alerts gains tri-state acked + read query params
- new endpoints: DELETE /{id} (soft-delete), POST /bulk-delete, POST /bulk-ack, POST /{id}/restore
- requireLiveInstance 404s on soft-deleted rows; restore() reads the row regardless
- BulkReadRequest → BulkIdsRequest (shared body for bulk read/ack/delete)
- AlertDto gains readAt; deletedAt stays off the wire
- InAppInboxQuery.listInbox threads acked/read through to the repo (7-arg, no more null placeholders)
- SecurityConfig: new matchers for bulk-ack (VIEWER+), DELETE/bulk-delete/restore (OPERATOR+)
- AlertControllerIT: persistence assertions on /read + /bulk-read; full coverage for new endpoints
- InAppInboxQueryTest: updated to 7-arg listInbox signature
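The tri-state parameters can be sketched with a nullable Boolean, where null
means "no filter". `InboxRow`, its `ackedAt` field and the in-memory predicate
are hypothetical; the real filtering happens in the repository query:

```java
import java.time.Instant;
import java.util.function.Predicate;

/** Hypothetical row shape; only the two timestamps matter here. */
record InboxRow(Instant ackedAt, Instant readAt) {}

/** Hypothetical in-memory version of the acked/read filter. */
final class TriStateFilterSketch {
    private TriStateFilterSketch() {}

    /** null = don't care, true = must be set, false = must be unset. */
    static Predicate<InboxRow> filter(Boolean acked, Boolean read) {
        return row -> matches(acked, row.ackedAt() != null)
                && matches(read, row.readAt() != null);
    }

    private static boolean matches(Boolean wanted, boolean actual) {
        return wanted == null || wanted == actual;
    }
}
```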
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allows alert rules to fire on agent-lifecycle events — REGISTERED,
RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED — rather
than only on current state. Each matching `(agent, eventType, timestamp)`
becomes its own ackable AlertInstance, so outages on distinct agents are
independently routable.
Core:
- New `ConditionKind.AGENT_LIFECYCLE` + `AgentLifecycleCondition` record
(scope, eventTypes, withinSeconds). Compact ctor rejects empty
eventTypes and withinSeconds<1.
- Strict allowlist enum `AgentLifecycleEventType` (six entries matching
the server-emitted types in `AgentRegistrationController` and
`AgentLifecycleMonitor`). Custom agent-emitted event types tracked in
backlog issue #145.
- `AgentEventRepository.findInWindow(env, appSlug, agentId, eventTypes,
from, to, limit)` — new read path ordered `(timestamp ASC, insert_id
ASC)` used by the evaluator. Implemented on
`ClickHouseAgentEventRepository` with tenant + env filter mandatory.
App:
- `AgentLifecycleEvaluator` queries events in the last `withinSeconds`
window and returns `EvalResult.Batch` with one `Firing` per row.
Every Firing carries a canonical `_subjectFingerprint` of
`"<agentId>:<eventType>:<tsMillis>"` in context plus `agent` / `event`
subtrees for Mustache templating.
- `NotificationContextBuilder` gains an `AGENT_LIFECYCLE` branch that
exposes `{{agent.id}}`, `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}`.
- Validation is delegated to the record compact ctor + enum at Jackson
deserialization time — matches the existing policy of keeping
controller validators focused on env-scoped / SQL-injection concerns.
Schema:
- V16 migration generalises the V15 per-exchange discriminator on
`alert_instances_open_rule_uq` to prefer `_subjectFingerprint` with a
fallback to the legacy `exchange.id` expression. Scalar kinds still
resolve to `''` and keep one-open-per-rule. Duplicate-key path in
`PostgresAlertInstanceRepository.save` is unchanged — the index is
the deduper.
UI:
- New `AgentLifecycleForm.tsx` wizard form with multi-select chips for
the six allowed event types + `withinSeconds` input. Wired into
`ConditionStep`, `form-state` (validation + defaults: WENT_DEAD,
300 s), and `enums.ts` options. Tests in `enums.test.ts` pin the
new option array.
- `alert-variables.ts` registers `{{agent.app}}`, `{{event.type}}`,
`{{event.timestamp}}`, `{{event.detail}}` leaves for the new kind,
and extends `agent.id`'s availability list to include `AGENT_LIFECYCLE`.
Tests (all passing):
- 5 new JSON-roundtrip cases on `AlertConditionJsonTest` (positive +
empty/zero/unknown-type rejection).
- 5 new evaluator unit tests on `AgentLifecycleEvaluatorTest` (empty
window, multi-agent fingerprint shape, scope forwarding, missing env).
- `NotificationContextBuilderTest` switch now covers the new kind.
- 119 alerting unit tests + 71 UI tests green.
Docs: `.claude/rules/{core,app,ui}` and CLAUDE.md migration list updated.
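A minimal sketch of the compact-constructor validation and the fingerprint
format. The type names are shortened stand-ins and the `scope` component is
omitted because its type is not described here:

```java
import java.util.List;

/** Stand-in for the six-entry allowlist enum. */
enum LifecycleEventType {
    REGISTERED, RE_REGISTERED, DEREGISTERED, WENT_STALE, WENT_DEAD, RECOVERED
}

/** Hypothetical, reduced version of the condition record (no scope). */
record LifecycleCondition(List<LifecycleEventType> eventTypes, int withinSeconds) {
    LifecycleCondition {
        if (eventTypes == null || eventTypes.isEmpty()) {
            throw new IllegalArgumentException("eventTypes must not be empty");
        }
        if (withinSeconds < 1) {
            throw new IllegalArgumentException("withinSeconds must be >= 1");
        }
        eventTypes = List.copyOf(eventTypes); // defensive, immutable copy
    }
}

/** Builds the "<agentId>:<eventType>:<tsMillis>" discriminator. */
final class SubjectFingerprintSketch {
    private SubjectFingerprintSketch() {}

    static String of(String agentId, LifecycleEventType type, long tsMillis) {
        return agentId + ":" + type.name() + ":" + tsMillis;
    }
}
```

Since each (agent, eventType, timestamp) yields a distinct fingerprint, the
unique index treats each event as its own open instance.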
After the UiAuth / Oidc / UserAdmin controllers were aligned to store
bare user_ids, the rules that future sessions read were still describing
the old behaviour (OutboundConnectionAdminController "strips user:
prefix" — true mechanically but the subtlety is that the strip is
the bridge between a prefixed JWT subject and an unprefixed DB key,
not a hack).
- CLAUDE.md: expand the User persistence one-liner to state the
convention authoritatively (local `<username>`, OIDC `oidc:<sub>`,
JWT `user:` namespace, env-scoped controllers strip for FK).
- .claude/rules/app-classes.md:
- Add "User ID conventions" section near the top that spells out
write-path vs read-path behaviour in one place.
- Add UiAuthController + OidcAuthController entries under
security/ with their upsert shape documented.
- Soften the OutboundConnectionAdminController line to reference
the convention instead of restating the mechanism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- LogQueryController: note response shape, sort param, and that the
cursor tiebreak is the insert_id UUID column (not exchange/instance)
- AgentEventsController: cursor now carries insert_id UUID (was instanceId);
order is (timestamp DESC, insert_id DESC)
- core-classes: add AgentEventPage record; note that the non-paginated
AgentEventRepository.query(...) path has been removed
- core-classes: note LogSearchRequest.sources/levels are now List<String>
with multi-value OR semantics
Keeps the rule files in sync with the cursor-pagination + multi-select
filter work on main.
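The (timestamp DESC, insert_id DESC) cursor can be sketched in memory. The
record and class names are hypothetical, and Java's UUID.compareTo stands in
for the database's UUID ordering, which may differ:

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Hypothetical row and cursor shapes. */
record EventRow(Instant timestamp, UUID insertId) {}
record PageCursor(Instant timestamp, UUID insertId) {}

/** In-memory keyset pagination; a real query would push this into SQL. */
final class KeysetPageSketch {
    private KeysetPageSketch() {}

    static final Comparator<EventRow> NEWEST_FIRST =
            Comparator.comparing(EventRow::timestamp)
                    .thenComparing(EventRow::insertId)
                    .reversed();

    /** Returns up to limit rows that sort strictly after the cursor. */
    static List<EventRow> page(List<EventRow> all, PageCursor after, int limit) {
        return all.stream()
                .sorted(NEWEST_FIRST)
                .filter(row -> after == null || isAfterCursor(row, after))
                .limit(limit)
                .toList();
    }

    private static boolean isAfterCursor(EventRow row, PageCursor cursor) {
        int byTime = row.timestamp().compareTo(cursor.timestamp());
        // Same timestamp: insert_id breaks the tie so no row is skipped or repeated.
        return byTime < 0
                || (byTime == 0 && row.insertId().compareTo(cursor.insertId()) < 0);
    }
}
```

The next cursor is taken from the last row of the current page; rows sharing a
timestamp are then split cleanly across pages by insert_id.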
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reflects LogQueryController's multi-value source/level filters,
AgentEventsController's cursor pagination shape, and the new
useInfiniteStream/InfiniteScrollArea UI primitives used by streaming
views.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>