ClickHouse does not have lag() as a window function. Use lagInFrame()
with explicit ROWS BETWEEN frame instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not needed yet -- all deployments are under our control and can be
reset manually if the old schema is encountered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite init.sql as a pure CREATE IF NOT EXISTS file with no DROP or
INSERT statements. Safe for repeated runs on every startup without
corrupting aggregated stats data.
Old deployments with count()-based stats tables are migrated
automatically: ClickHouseSchemaInitializer checks system.columns for
the old AggregateFunction(count) type and drops those tables before
init.sql recreates them with the correct uniq() schema. This runs
once per table and is a no-op on fresh installs or already-migrated
deployments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.
The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend uniq-based dedup from processor tables to all stats tables
(stats_1m_all, stats_1m_app, stats_1m_route). Execution-level tables
use uniq(execution_id). Processor-level tables now use
uniq(concat(execution_id, toString(seq))) so loop iterations (same
exchange, different seq) are counted while chunk retry duplicates
(same exchange+seq) are collapsed.
All stats tables are dropped, recreated, and backfilled from raw
data on startup. All Java queries updated: countMerge -> uniqMerge,
countIfMerge -> uniqIfMerge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ClickHouseSchemaInitializer splits on semicolons before filtering
comments, so semicolons inside comment text created invalid statements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Processor execution counts were inflated by duplicate inserts into the
plain MergeTree processor_executions table (chunk retries, reconnects).
Replace count()/countIf() with uniq(execution_id)/uniqIf() in both
stats_1m_processor and stats_1m_processor_detail MVs so each exchange
is counted once per processor regardless of duplicates.
Tables are dropped and rebuilt from raw data on startup. MV created
after backfill to avoid double-counting.
Also adds stats_1m_processor_detail to the catalog purge list (was
missing).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After server restart, agents send logs before re-registering. Instead
of dropping these logs, fall back to application and environment from
the JWT token claims. Only drops logs when neither registry nor JWT
provide an applicationId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels prometheus.scrape, prometheus.path, and prometheus.port are now
set on every deployed container based on the resolved runtime type,
enabling automatic Prometheus service discovery via docker_sd_configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server defaultConfig() and UI fallbacks returned "NONE" for payload
capture, but the agent defaults to "BOTH". This caused unwanted
reconfiguration when users saved other settings — payload capture
would silently change from the agent's default BOTH to NONE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apps can now join additional Docker networks (e.g., monitoring,
prometheus) configured via containerConfig.extraNetworks. Flows through
the 3-layer config merge. Networks are created if absent and containers
are connected during deployment. UI adds a pill-list field on the
Resources tab (both create and edit views).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spring's default handler silently returns 400 for malformed payloads
with no server-side log. Added @ExceptionHandler to catch and WARN with
the agent instance ID and root cause message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the endpoint silently returned 202 for all failures: missing
agent identity, unregistered agents, empty payloads, and buffer-full
drops. Now logs WARN for each failure case with context (instanceId,
entry count, reason). Normal ingestion logged at INFO with accepted
count. Buffer-full drops tracked individually with accepted/dropped
counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flyway V10 migration adds the two nullable columns. AppVersion record,
AppVersionRepository interface, and PostgresAppVersionRepository are
updated to carry and persist detected runtime information.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LogEntry.getSource() exists but source is not a constructor parameter
in cameleer3-common — it uses a default value.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All constructor calls updated to include the new source field added
in the log forwarding v2 changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace LogBatch wrapper with raw List<LogEntry> on the ingestion endpoint.
Add source column to ClickHouse logs table and propagate it through the
storage, search, and HTTP layers (LogSearchRequest, LogEntryResult,
LogEntryResponse, LogQueryController).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add cameleer.server.security.infrastructureendpoints property (default true) and
@ConditionalOnProperty to DatabaseAdminController and ClickHouseAdminController so
the SaaS provisioner can set CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false
to suppress these endpoints (404) on tenant server containers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move container resource defaults into their own sub-namespace for
future extensibility:
cameleer.server.runtime.container.memorylimit → CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT
cameleer.server.runtime.container.cpushares → CAMELEER_SERVER_RUNTIME_CONTAINER_CPUSHARES
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OIDC properties into a nested Oidc class within SecurityProperties
for clearer grouping. Env vars gain an extra separator:
cameleer.server.security.oidc.issueruri → CAMELEER_SERVER_SECURITY_OIDC_ISSUERURI
cameleer.server.security.oidc.jwkseturi → CAMELEER_SERVER_SECURITY_OIDC_JWKSETURI
cameleer.server.security.oidc.audience → CAMELEER_SERVER_SECURITY_OIDC_AUDIENCE
cameleer.server.security.oidc.tlsskipverify → CAMELEER_SERVER_SECURITY_OIDC_TLSSKIPVERIFY
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.
Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.
Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
(e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added routeControlEnabled and replayEnabled to ResolvedContainerConfig,
flowing through the three-layer config merge (global -> env -> app).
Both default to true. Admins can disable them per environment (e.g.
prod) via the defaultContainerConfig JSONB, or per app via the app's
containerConfig JSONB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
buildEnvVars was missing CAMELEER_ROUTE_CONTROL_ENABLED and
CAMELEER_REPLAY_ENABLED, so deployed app containers defaulted to false
and agents didn't announce these capabilities.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The password reset endpoint was fully blocked under OIDC mode. Now
M2M callers (identified by oidc: principal prefix) can reset local
user passwords, enabling the SaaS platform to manage the server's
built-in admin credentials.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows the SaaS platform to identify and clean up all containers
belonging to a tenant on delete (cameleer/cameleer-saas#55).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OIDC callback extracted roles from the token's Custom JWT claim
(e.g. roles: [server:admin]) but never used them. The
applyClaimMappings fallback only assigned defaultRoles (VIEWER).
Now the fallback priority is: claim mapping rules > OIDC token
roles > defaultRoles. This ensures users get their org-mapped
roles (owner → server:admin) without requiring manual claim
mapping rule configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environment networks now include the tenant ID to prevent cross-tenant
collisions: cameleer-env-{tenantId}-{envSlug} instead of cameleer-env-
{envSlug}. Without this, two tenants with a "dev" environment would
share the same Docker network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of hardcoding cameleer-traefik as the primary network for
deployed app containers, use CAMELEER_DOCKER_NETWORK (env var). In
SaaS mode this is the tenant-isolated network (cameleer-tenant-{slug}).
Apps still connect to cameleer-traefik (for routing) and cameleer-env-
{slug} (for intra-environment discovery) as additional networks.
This enables per-tenant network isolation: apps deployed by tenant A
cannot reach apps deployed by tenant B since they share no network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.
Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.
Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add environment parameter to AgentEventsController, AgentEventService,
and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.
Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
detail, correlation chains, and processor metrics all filter by
selected environment.
- Backend RouteMetricsController now accepts environment parameter for
both route and processor metrics endpoints.
Closes #XYZ
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DockerEventMonitor only reacted to Docker events. If an event was
missed (e.g., during reconnect or startup race), a DEGRADED deployment
with all replicas healthy would never promote back to RUNNING.
Add a @Scheduled reconciliation (every 30s) that inspects actual
container state and corrects deployment status mismatches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prevent removal of last ADMIN role via role unassign, user delete,
or group role removal (returns 409 Conflict)
- Add password policy: min 12 chars, 3/4 character classes, no username
- Add brute-force protection: 5 attempts then 15min lockout, IP rate limit
- Add token revocation on password change via token_revoked_before column
- V9 migration adds failed_login_attempts, locked_until, token_revoked_before
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability
Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename cpuShares to cpuRequest (millicores), cpuLimit from cores to
millicores. ResolvedContainerConfig translates to Docker-native units
via dockerCpuShares() and dockerCpuQuota() helpers. Future K8s
orchestrator can pass millicores through directly.
- Fix waitForAnyHealthy to wait for ALL replicas instead of returning
on first healthy one. Prevents false DEGRADED status with 2+ replicas.
- Default app detail to Configuration tab (was Overview)
- Reorder config sub-tabs: Monitoring, Resources, Variables, Traces &
Taps, Route Recording
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate route catalog (agent-driven) and apps table (deployment-
driven) into a single GET /api/v1/catalog?environment={slug} endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.
- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redesign DeploymentProgress component: track-based layout with amber
brand color, checkmarks for completed steps, user-friendly labels
(Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
stages, database migration reference, and REST endpoint summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The runtime-base image and all agent Dockerfiles now read
CAMELEER_SERVER_URL instead of CAMELEER_EXPORT_ENDPOINT.
Updated the volume-mode entrypoint override to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When CAMELEER_JAR_DOCKER_VOLUME is set, the orchestrator mounts the
named volume at the jar storage path instead of using a host bind mount.
This solves the path translation issue in Docker-in-Docker setups where
the server runs inside a container and manages sibling containers.
The entrypoint is overridden to use the volume-mounted JAR path via
the CAMELEER_APP_JAR env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker's connectToNetworkCmd needs the network ID (not name) and the
container's network sandbox must be ready. Moving network connection
to DeploymentExecutor where DockerNetworkManager handles ID resolution
and the container is already started.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>