Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive standalone document covering API surface, agent protocol,
security, storage, multi-tenancy, deployment, and configuration — designed
for external systems (like the SaaS orchestration layer) that need to
understand and manage Cameleer3 Server instances.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environment selector was losing its value on navigation because URL search
params were silently dropped by navigate() calls. Moved to a Zustand store
with localStorage persistence so the selection survives navigation, page
refresh, and new tabs. Switching environment now resets all filters, clears
URL params, invalidates queries, and remounts pages via Outlet key. Also
syncs openapi.json schema with running backend.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HeartbeatRequest now carries environmentId (cameleer3-common update).
Auto-heal prefers the heartbeat value (most current) over the JWT
claim, ensuring agents recover their correct environment immediately
on the first heartbeat after server restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 'env' claim to agent JWTs (set at registration, carried through
refresh). Auto-heal on heartbeat/SSE now reads environment from the
JWT instead of hardcoding 'default', so agents retain their correct
environment after server restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DS v0.1.31 changes .env wrapper to neutral button style matching
other TopBar controls. Simplified selector CSS to inherit all
font/color properties from the wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make the select transparent (no border, no background) so it
inherits the DS .env pill styling (success-colored badge with
mono font). Negative margins compensate for the pill padding.
Dropdown chevron uses currentColor to match the pill text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use unfiltered agent query to discover environments (avoids circular
filter). Always show selector even with single environment so it's
visible as a label. Default to ['default'] when no agents connected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update @cameleer/design-system to v0.1.30 which accepts ReactNode
for the environment prop. Move EnvironmentSelector from standalone
div into TopBar, rendering between theme toggle and user menu.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.
UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.
Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
"default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
updated ORDER BY (tenant→time→env→app), added tenant_id to
usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures
Closes#123
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When no exchange is selected, the topology-only diagram now shows
the RouteControlBar above it (if the agent supports routeControl
or replay and the user has OPERATOR/ADMIN role). This fixes a gap
where suspended routes with no recent exchanges had no way to be
resumed from the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The heartbeat now carries capabilities (per protocol v2 update).
On each heartbeat, capabilities are updated in the agent registry.
On auto-heal (server restart), capabilities from the heartbeat
are used instead of empty Map.of(), so the agent's feature flags
(replay, routeControl, logForwarding, etc.) are restored
immediately on the first heartbeat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert ClickHouseUsageTracker and ClickHouseMetricsQueryStore to
use JDBC parameterized queries (? binds) — these query raw tables
without AggregateFunction columns.
Fix lit(String) in RouteMetricsController and ClickHouseStatsStore
to escape backslashes before single quotes. Without this, an input
like \' breaks out of the string literal in ClickHouse (where \
is an escaped backslash). These must remain as literal SQL because
the ClickHouse JDBC 0.9.x driver wraps PreparedStatement in
sub-queries that strip AggregateFunction types, breaking -Merge
combinators.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running mvn sonar:sonar as a separate invocation skips child
modules. Combining verify and sonar:sonar in a single mvn
command ensures the reactor processes all modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Maven sonar plugin auto-detects sources and tests from the POM
module structure. Passing sonar.sources as CLI args caused path
doubling (module-dir/module-dir/src) in multi-module projects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The standalone sonar-scanner CLI has Java discovery issues in the
build container. Switch to the Maven sonar plugin (same approach
as cameleer3 agent repo), which uses Maven's own JDK. This also
removes the sonar-scanner download/install step entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonar-scanner 6.x checks SONAR_SCANNER_JAVA_HOME, not JAVA_HOME.
Despite JAVA_HOME being correct and java being on PATH, the scanner
uses its own env var for Java discovery.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
java binary may not be on PATH directly in the build container.
Derive JAVA_HOME from the jar binary location (which we know works)
and prepend JAVA_HOME/bin to PATH so sonar-scanner can find java.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonar-scanner 6.x requires JAVA_HOME or java on PATH. The build
container has Java installed but doesn't export JAVA_HOME, so
derive it from the java binary location.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jar xf doesn't preserve Unix file permissions from zip entries,
so the sonar-scanner binary lacks the execute bit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler
Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.
Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.
Also reverts async_insert config (doesn't work with JDBC inline VALUES).
Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m CPU usage at 3 tx/s.
async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace DB-persisted keypair with deterministic derivation from
CAMELEER_JWT_SECRET via HMAC-SHA256 seed + seeded SHA1PRNG KeyPairGenerator.
Same secret = same key pair across restarts, no private key in the database.
Closes#121
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The keypair was generated ephemerally on each startup, causing agents
to reject all commands after a server restart (signature mismatch).
Now persisted to PostgreSQL server_config table and restored on startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Merge all V1-V11 migration scripts into one idempotent init.sql
- Simplify ClickHouseSchemaInitializer to load single file
- Replace route_diagrams projection with in-memory caches:
hashCache (routeId+instanceId → contentHash) warm-loaded on startup,
graphCache (contentHash → RouteGraph) lazy-populated on access
- Eliminates 9M+ row scans on diagram lookups
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 requires this setting before adding projections to
ReplacingMergeTree tables. Using 'drop' mode which discards the projection
during deduplication merges and rebuilds it afterward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alphabetical sort put V10/V11 before V2-V9 ("V11" < "V1_" in ASCII),
causing the route_diagrams projection to run before the table existed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
refresh so sidebar clicks use current time window
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On first click, Dashboard was in non-split mode. The click set
selectedId locally then triggered split view, which remounted
Dashboard — losing the selectedId state.
Added activeExchangeId prop passed from ExchangesPage so the
selection survives the remount. Also syncs via useEffect when
parent changes selection (e.g. correlated exchange navigation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After server restart, auto-healed agents register with empty
routeIds. The catalog only looked at agent registry for routes,
so routes and counts disappeared.
Now merges route IDs from ClickHouse stats_1m_route into the
catalog. Also includes apps that only exist in ClickHouse data
(no agent currently registered). Routes and exchange counts
survive server restarts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When autoRefresh is disabled, sidebar clicks now invalidate all
queries (queryClient.invalidateQueries()), triggering a re-fetch.
This gives users "click to refresh" behavior instead of stale data.
When LIVE mode is on, queries already poll at intervals, so no
invalidation is needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the Applications section is already expanded, clicking the
header now navigates to /{tab} (all applications) instead of
collapsing. When collapsed, clicking expands as before.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three related issues caused by in-memory agent registry being empty
after server restart:
1. JwtAuthenticationFilter rejected valid agent JWTs if agent wasn't
in registry — now authenticates any valid JWT regardless
2. Heartbeat returned 404 for unknown agents — now auto-registers
the agent from JWT claims (subject, application)
3. SSE endpoint returned 404 — same auto-registration fix
JWT validation result is stored as a request attribute so downstream
controllers can extract the application claim for auto-registration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The refresh endpoint required the agent to exist in the in-memory
registry. After server restart the registry is empty, so all refresh
attempts got 404. The refresh token itself is self-contained with
subject, application, and roles — the registry lookup is optional.
Now uses application from the JWT, falling back to registry only
if the agent happens to be registered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>