Commit Graph

805 Commits

Author SHA1 Message Date
hsiegeln
3bd07c9b07 feat: add OIDC resource server support with JWKS discovery and scope-based roles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:10:08 +02:00
hsiegeln
a5c4e0cead feat: add spring-boot-starter-oauth2-resource-server and OIDC properties
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 13:06:53 +02:00
hsiegeln
de85cdf5a2 fix: let SPRING_DATASOURCE_URL fully control datasource connection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m26s
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:24:22 +02:00
hsiegeln
2277a0498f fix: set CAMELEER_DB_SCHEMA=public for existing main deployment
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:21:17 +02:00
hsiegeln
ac87aa6eb2 fix: derive PG schema from tenant ID instead of defaulting to public
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m17s
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:46:57 +02:00
hsiegeln
f16d331621 docs: add SERVER-CAPABILITIES.md for SaaS integration reference
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Comprehensive standalone document covering API surface, agent protocol,
security, storage, multi-tenancy, deployment, and configuration — designed
for external systems (like the SaaS orchestration layer) that need to
understand and manage Cameleer3 Server instances.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 20:30:42 +02:00
hsiegeln
69055f7d74 fix: persist environment selection in Zustand store instead of URL params
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Environment selector was losing its value on navigation because URL search
params were silently dropped by navigate() calls. Moved to a Zustand store
with localStorage persistence so the selection survives navigation, page
refresh, and new tabs. Switching environment now resets all filters, clears
URL params, invalidates queries, and remounts pages via Outlet key. Also
syncs openapi.json schema with running backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:12:16 +02:00
hsiegeln
37eb56332a fix: use environmentId from heartbeat body for auto-heal
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
HeartbeatRequest now carries environmentId (cameleer3-common update).
Auto-heal prefers the heartbeat value (most current) over the JWT
claim, ensuring agents recover their correct environment immediately
on the first heartbeat after server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:21:55 +02:00
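The precedence described in 37eb56332a (heartbeat value first, then the JWT claim) can be sketched as below. This is a minimal illustration, not the repository's code: the class name `EnvironmentResolver`, the method signature, and the final fallback to "default" are assumptions.

```java
/** Hypothetical helper illustrating the auto-heal environment precedence. */
final class EnvironmentResolver {
    // Assumption: fall back to "default" when neither source carries a value.
    static final String FALLBACK = "default";

    private EnvironmentResolver() {}

    /**
     * Picks the environment for an auto-healed agent.
     * The heartbeat body is the most current source, so it wins over the JWT claim.
     */
    static String resolve(String heartbeatEnvironmentId, String jwtEnvClaim) {
        if (heartbeatEnvironmentId != null && !heartbeatEnvironmentId.isBlank()) {
            return heartbeatEnvironmentId;
        }
        if (jwtEnvClaim != null && !jwtEnvClaim.isBlank()) {
            return jwtEnvClaim;
        }
        return FALLBACK;
    }
}
```

For example, `EnvironmentResolver.resolve("staging", "prod")` yields "staging", while a heartbeat without an environment falls back to the value from the token.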
hsiegeln
72ec87a3ba fix: persist environment in JWT claims for auto-heal recovery
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m7s
CI / deploy (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
Add 'env' claim to agent JWTs (set at registration, carried through
refresh). Auto-heal on heartbeat/SSE now reads environment from the
JWT instead of hardcoding 'default', so agents retain their correct
environment after server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:12:25 +02:00
hsiegeln
346e38ee1d fix: update DS to v0.1.31, simplify env selector styles
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m23s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
DS v0.1.31 changes .env wrapper to neutral button style matching
other TopBar controls. Simplified selector CSS to inherit all
font/color properties from the wrapper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 16:01:58 +02:00
hsiegeln
39d9ec9cd6 fix: restyle environment selector to match DS TopBar pill
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Make the select transparent (no border, no background) so it
inherits the DS .env pill styling (success-colored badge with
mono font). Negative margins compensate for the pill padding.
Dropdown chevron uses currentColor to match the pill text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:53:09 +02:00
hsiegeln
08f2a01057 fix: always show environment selector in TopBar
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 1m12s
CI / deploy (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
Use unfiltered agent query to discover environments (avoids circular
filter). Always show selector even with single environment so it's
visible as a label. Default to ['default'] when no agents connected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:47:48 +02:00
hsiegeln
574f82b731 docs: add historical implementation plans
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 37s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:45:49 +02:00
hsiegeln
c2d4d38bfb feat: move environment selector into TopBar (DS v0.1.30)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Update @cameleer/design-system to v0.1.30 which accepts ReactNode
for the environment prop. Move EnvironmentSelector from standalone
div into TopBar, rendering between theme toggle and user menu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:43:43 +02:00
hsiegeln
694d0eef59 feat: add environment filtering across all APIs and UI
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.

UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:42:26 +02:00
hsiegeln
babdc1d7a4 docs: update CLAUDE.md with multitenancy architecture
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:14:38 +02:00
hsiegeln
a188308ec5 feat: implement multitenancy with tenant isolation + environment support
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m25s
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.

Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
  "default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
  updated ORDER BY (tenant→time→env→app), added tenant_id to
  usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures

Closes #123

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:00:18 +02:00
hsiegeln
ee7226cf1c docs: multitenancy architecture design spec
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Covers tenant isolation (1 tenant = 1 server instance), environment
support (first-class agent property), ClickHouse partitioning
(tenant → time → environment → application), PostgreSQL
schema-per-tenant via JDBC currentSchema, and agent protocol changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 14:37:00 +02:00
hsiegeln
7429b85964 feat: show route control bar on topology diagram
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
When no exchange is selected, the topology-only diagram now shows
the RouteControlBar above it (if the agent supports routeControl
or replay and the user has OPERATOR/ADMIN role). This fixes a gap
where suspended routes with no recent exchanges had no way to be
resumed from the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:49:28 +02:00
hsiegeln
a5c07b8585 docs: update CLAUDE.md with heartbeat capabilities restoration
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:31:33 +02:00
hsiegeln
45a74075a1 feat: restore agent capabilities from heartbeat after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
The heartbeat now carries capabilities (per protocol v2 update).
On each heartbeat, capabilities are updated in the agent registry.
On auto-heal (server restart), capabilities from the heartbeat
are used instead of empty Map.of(), so the agent's feature flags
(replay, routeControl, logForwarding, etc.) are restored
immediately on the first heartbeat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:19:15 +02:00
hsiegeln
abed4dc96f security: fix SQL injection in ClickHouse query escaping
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m6s
CI / deploy (push) Successful in 47s
CI / deploy-feature (push) Has been skipped
Convert ClickHouseUsageTracker and ClickHouseMetricsQueryStore to
use JDBC parameterized queries (? binds) — these query raw tables
without AggregateFunction columns.

Fix lit(String) in RouteMetricsController and ClickHouseStatsStore
to escape backslashes before single quotes. Without this, an input
like \' breaks out of the string literal in ClickHouse (where \\
is an escaped backslash). These must remain as literal SQL because
the ClickHouse JDBC 0.9.x driver wraps PreparedStatement in
sub-queries that strip AggregateFunction types, breaking -Merge
combinators.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 12:17:12 +02:00
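A minimal sketch of the escaping order that abed4dc96f describes. Only the name lit(String) comes from the commit message; the class name `ClickHouseLiterals` and the method body are illustrative assumptions, not the repository's implementation. Backslashes are doubled first, so a backslash supplied by the caller cannot cancel the backslash that is later added in front of a quote.

```java
/** Hypothetical helper; only the method name lit(String) appears in the commit. */
final class ClickHouseLiterals {
    private ClickHouseLiterals() {}

    /** Renders a value as a single-quoted string literal for inline SQL. */
    static String lit(String value) {
        String escaped = value
                .replace("\\", "\\\\") // 1. double every backslash first
                .replace("'", "\\'");  // 2. then escape single quotes
        return "'" + escaped + "'";
    }
}
```

With this order, the two-character input `\'` is rendered as `'\\\''`: an escaped backslash followed by an escaped quote, so the literal stays closed only by its final quote.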
hsiegeln
170b2c4a02 fix: run sonar:sonar in same reactor as verify
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
Running mvn sonar:sonar as a separate invocation skips child
modules. Combining verify and sonar:sonar in a single mvn
command ensures the reactor processes all modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:57:05 +02:00
hsiegeln
66e91ba18c fix: remove explicit sonar.sources/tests from mvn sonar:sonar
All checks were successful
CI / build (push) Successful in 2m0s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 14s
CI / deploy (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
Maven sonar plugin auto-detects sources and tests from the POM
module structure. Passing sonar.sources as CLI args caused path
doubling (module-dir/module-dir/src) in multi-module projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:13:47 +02:00
hsiegeln
e30b561dfe fix: use mvn sonar:sonar instead of standalone sonar-scanner
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 14s
CI / deploy (push) Successful in 44s
CI / deploy-feature (push) Has been skipped
The standalone sonar-scanner CLI has Java discovery issues in the
build container. Switch to the Maven sonar plugin (same approach
as cameleer3 agent repo), which uses Maven's own JDK. This also
removes the sonar-scanner download/install step entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:07:49 +02:00
hsiegeln
5ae94e1e2c fix: set SONAR_SCANNER_JAVA_HOME for sonar-scanner 6.x
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m42s
CI / docker (push) Successful in 15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 48s
sonar-scanner 6.x checks SONAR_SCANNER_JAVA_HOME, not JAVA_HOME.
Despite JAVA_HOME being correct and java being on PATH, the scanner
uses its own env var for Java discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:04:03 +02:00
hsiegeln
7dca8f2609 fix: derive JAVA_HOME from jar binary and add to PATH
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 13s
CI / deploy (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
java binary may not be on PATH directly in the build container.
Derive JAVA_HOME from the jar binary location (which we know works)
and prepend JAVA_HOME/bin to PATH so sonar-scanner can find java.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 10:59:45 +02:00
hsiegeln
2589c681c5 fix: derive JAVA_HOME for sonar-scanner in CI workflow
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m53s
CI / docker (push) Successful in 14s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
sonar-scanner 6.x requires JAVA_HOME or java on PATH. The build
container has Java installed but doesn't export JAVA_HOME, so
derive it from the java binary location.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 10:05:18 +02:00
hsiegeln
352fa43ef8 fix: add chmod +x for sonar-scanner binary after jar extraction
All checks were successful
CI / build (push) Successful in 2m5s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 10s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
jar xf doesn't preserve Unix file permissions from zip entries,
so the sonar-scanner binary lacks the execute bit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:57:48 +02:00
hsiegeln
b04b12220b fix: resolve 25 SonarQube code smells across 21 files
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m2s
CI / docker (push) Successful in 45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Remove unused fields (log, rbacService, roleRepository, jwt),
unused variables (agentTps, routeKeys, updated), unused imports
(HttpHeaders, JdbcTemplate). Rename restricted identifier 'record'
to 'auditRecord'/'event'. Return empty collections instead of null.
Replace .collect(Collectors.toList()) with .toList(). Simplify
conditional return in BootstrapTokenValidator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:36:13 +02:00
hsiegeln
633a61d89d perf: batch processor and log inserts to reduce ClickHouse part creation
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m2s
SonarQube / sonarqube (push) Failing after 1m58s
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler

Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.

Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.

Also reverts async_insert config (doesn't work with JDBC inline VALUES).

Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:48:04 +02:00
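The buffering idea in 633a61d89d (callers push rows into a buffer, and a periodic flush writes them as one INSERT) might look roughly like this. It is a sketch under assumptions: `BatchingBuffer` is a hypothetical stand-in for the commit's WriteBuffer, and the batch insert is passed in as a callback rather than being a real ClickHouse call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical bounded buffer that turns many small writes into one batch per flush. */
final class BatchingBuffer<T> {
    private final BlockingQueue<T> queue;

    BatchingBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called from request handlers; returns false when the buffer is full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /**
     * Called on a fixed interval (5 s in the commit). Drains everything that is
     * currently queued and hands it to a single batch insert.
     */
    int flush(Consumer<List<T>> batchInsert) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            batchInsert.accept(batch); // one INSERT per flush cycle, not one per row
        }
        return batch.size();
    }
}
```

A scheduler would call `flush` with the store's batch-insert method; an empty buffer results in no insert at all.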
hsiegeln
e0aac4bf0a perf: enable ClickHouse async_insert to batch small inserts server-side
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m CPU usage at 3 tx/s.

async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:33:48 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
e1cb9d7872 fix: extract snapshot data from chunks, reduce ClickHouse log noise
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
  from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:58:54 +02:00
hsiegeln
a9ec424d52 fix: derive Ed25519 signing key from JWT secret, no DB storage
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Replace DB-persisted keypair with deterministic derivation from
CAMELEER_JWT_SECRET via HMAC-SHA256 seed + seeded SHA1PRNG KeyPairGenerator.
Same secret = same key pair across restarts, no private key in the database.

Closes #121

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:18:43 +02:00
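The derivation named in a9ec424d52 (HMAC-SHA256 seed feeding a seeded SHA1PRNG KeyPairGenerator) can be sketched as follows. Treat it as an illustration only: the class name, the HMAC input label, and the error handling are assumptions, since the commit does not state what is HMAC'd. It relies on Java 15+ for Ed25519 and on the JDK's default SHA1PRNG being deterministic when seeded before first use, which is provider-specific behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.spec.NamedParameterSpec;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of deriving a stable Ed25519 key pair from a shared secret. */
final class DerivedSigningKey {
    // Assumption: a fixed label is the HMAC message; the commit does not specify it.
    private static final String LABEL = "ed25519-signing-key";

    private DerivedSigningKey() {}

    static KeyPair derive(String jwtSecret) {
        try {
            // 1. HMAC-SHA256 keyed with the secret yields a 32-byte seed.
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(jwtSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] seed = mac.doFinal(LABEL.getBytes(StandardCharsets.UTF_8));

            // 2. Seeding SHA1PRNG before its first use replaces its internal seed.
            SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
            random.setSeed(seed);

            // 3. The generator draws its private key bytes from that random source.
            KeyPairGenerator generator = KeyPairGenerator.getInstance("Ed25519");
            generator.initialize(NamedParameterSpec.ED25519, random);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 key derivation failed", e);
        }
    }
}
```

Calling `derive` twice with the same secret should give the same public key, which is the property the commit depends on across restarts.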
hsiegeln
81f13396a0 fix: persist Ed25519 signing key to survive server restarts
All checks were successful
CI / build (push) Successful in 2m8s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 50s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 54s
The keypair was generated ephemerally on each startup, causing agents
to reject all commands after a server restart (signature mismatch).
Now persisted to PostgreSQL server_config table and restored on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:13:40 +02:00
hsiegeln
670e458376 fix: update ITs to use consolidated init.sql, remove dead code
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m29s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 50s
- All 7 ClickHouse integration tests now load init.sql via shared
  ClickHouseTestHelper instead of deleted V1-V11 migration files
- Remove unused useScope exports (setApp, setRoute, setExchange, clearScope)
- Remove unused CSS classes (monoCell, punchcardStack)
- Update ui/README.md DS version to v0.1.28

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:03:54 +02:00
hsiegeln
d4327af6a4 refactor: consolidate ClickHouse schema into single init.sql, cache diagrams
All checks were successful
CI / build (push) Successful in 2m2s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- Merge all V1-V11 migration scripts into one idempotent init.sql
- Simplify ClickHouseSchemaInitializer to load single file
- Replace route_diagrams projection with in-memory caches:
  hashCache (routeId+instanceId → contentHash) warm-loaded on startup,
  graphCache (contentHash → RouteGraph) lazy-populated on access
- Eliminates 9M+ row scans on diagram lookups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:24:53 +02:00
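The two caches listed in d4327af6a4 could be arranged roughly as below. This is a sketch, assuming a generic graph type and a loader callback in place of the real ClickHouse lookup; only the names hashCache and graphCache and their key/value shapes come from the commit message.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical two-level diagram cache: (routeId, instanceId) -> hash -> graph. */
final class DiagramCache<G> {
    private record Key(String routeId, String instanceId) {}

    // Warm-loaded on startup: routeId+instanceId -> contentHash.
    private final Map<Key, String> hashCache = new ConcurrentHashMap<>();
    // Lazily populated on access: contentHash -> graph.
    private final Map<String, G> graphCache = new ConcurrentHashMap<>();
    // Assumption: stands in for the store query that loads a graph by hash.
    private final Function<String, G> graphLoader;

    DiagramCache(Function<String, G> graphLoader) {
        this.graphLoader = graphLoader;
    }

    /** Registers a known mapping, e.g. while scanning existing rows at startup. */
    void warm(String routeId, String instanceId, String contentHash) {
        hashCache.put(new Key(routeId, instanceId), contentHash);
    }

    /** Resolves a diagram without touching the store once the graph is cached. */
    Optional<G> find(String routeId, String instanceId) {
        String hash = hashCache.get(new Key(routeId, instanceId));
        if (hash == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(graphCache.computeIfAbsent(hash, graphLoader));
    }
}
```

Because graphs are keyed by content hash, instances that share an identical diagram also share one cached graph.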
hsiegeln
bb3e1e2bc3 fix: set deduplicate_merge_projection_mode for ReplacingMergeTree projection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse 24.12 requires this setting before adding projections to
ReplacingMergeTree tables. Using 'drop' mode which discards the projection
during deduplication merges and rebuilds it afterward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:14:56 +02:00
hsiegeln
984bb2d40f fix: sort ClickHouse migration scripts by numeric version prefix
All checks were successful
CI / build (push) Successful in 2m32s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 55s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 52s
Alphabetical sort put V10/V11 before V2-V9 ("V11" < "V1_" in ASCII),
causing the route_diagrams projection to run before the table existed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:06:56 +02:00
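The ordering fix in 984bb2d40f amounts to comparing the numeric part of the prefix instead of the raw file name. A minimal sketch, assuming file names of the form V<number>_...; the class name and the example file names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that orders migration scripts by numeric version. */
final class MigrationOrder {
    private static final Pattern VERSION = Pattern.compile("^V(\\d+)");

    private MigrationOrder() {}

    /** Extracts the integer after the leading 'V', e.g. 11 from "V11_x.sql". */
    static int version(String fileName) {
        Matcher matcher = VERSION.matcher(fileName);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No version prefix: " + fileName);
        }
        return Integer.parseInt(matcher.group(1));
    }

    /** Returns a new list sorted by numeric version; the input is left untouched. */
    static List<String> sorted(List<String> fileNames) {
        List<String> copy = new ArrayList<>(fileNames);
        copy.sort(Comparator.comparingInt(MigrationOrder::version));
        return copy;
    }
}
```

Given V10_c.sql, V2_b.sql, V1_a.sql and V11_d.sql, `sorted` returns them in the order V1, V2, V10, V11, whereas plain string sorting places V10 and V11 ahead of V1_.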
hsiegeln
6f00ff2e28 fix: reduce ClickHouse log noise, admin query spam, and diagram scan perf
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m25s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
  eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
  instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
  refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
  refresh so sidebar clicks use current time window

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:48:30 +02:00
hsiegeln
2708bcec17 fix: first exchange click doesn't highlight selected row
All checks were successful
CI / build (push) Successful in 1m47s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m45s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 53s
On first click, Dashboard was in non-split mode. The click set
selectedId locally then triggered split view, which remounted
Dashboard — losing the selectedId state.

Added activeExchangeId prop passed from ExchangesPage so the
selection survives the remount. Also syncs via useEffect when
parent changes selection (e.g. correlated exchange navigation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:28:26 +02:00
hsiegeln
901dfd1eb8 fix: PAUSED mode disabled queries entirely instead of just polling
Some checks failed
CI / build (push) Successful in 1m49s
CI / cleanup-branch (push) Has been skipped
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:25:04 +02:00
hsiegeln
726e77bb91 docs: update all documentation for session changes
Some checks failed
CI / build (push) Successful in 2m2s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CLAUDE.md:
- Agent registry auto-heal note (in-memory, JWT fallback)
- Usage analytics (ClickHouse usage_events table)

HOWTO.md:
- Architecture diagram: added deploy-demo (NodePort 30092) and cameleer-demo namespace
- Access URLs: added Deploy Demo
- Agent registry: server restart resilience documentation
- Route control: CommandGroupResponse note

ui/README.md:
- Fixed outdated generate-api command
- Added DS version (v0.1.26)
- Fixed VITE_API_TARGET (30081 not 30090)
- Added key features section (cmd-k, LIVE mode, route control, event icons)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:22:44 +02:00
hsiegeln
d30c267292 fix: route catalog missing routes after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m20s
CI / docker (push) Successful in 52s
CI / deploy (push) Successful in 54s
CI / deploy-feature (push) Has been skipped
After server restart, auto-healed agents register with empty
routeIds. The catalog only looked at agent registry for routes,
so routes and counts disappeared.

Now merges route IDs from ClickHouse stats_1m_route into the
catalog. Also includes apps that only exist in ClickHouse data
(no agent currently registered). Routes and exchange counts
survive server restarts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:14:27 +02:00
hsiegeln
37c10ae0a6 feat: manual refresh on sidebar navigation when LIVE mode is off
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 57s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
When autoRefresh is disabled, sidebar clicks now invalidate all
queries (queryClient.invalidateQueries()), triggering a re-fetch.
This gives users "click to refresh" behavior instead of stale data.

When LIVE mode is on, queries already poll at intervals, so no
invalidation is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 11:01:29 +02:00
hsiegeln
c16f0e62ed fix: clicking Applications header navigates back to all apps
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 1m22s
CI / deploy (push) Failing after 2m26s
CI / deploy-feature (push) Has been skipped
When the Applications section is already expanded, clicking the
header now navigates to /{tab} (all applications) instead of
collapsing. When collapsed, clicking expands as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:49:54 +02:00
hsiegeln
2bc3efad7f fix: agent auth, heartbeat, and SSE all break after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Three related issues caused by in-memory agent registry being empty
after server restart:

1. JwtAuthenticationFilter rejected valid agent JWTs if agent wasn't
   in registry — now authenticates any valid JWT regardless

2. Heartbeat returned 404 for unknown agents — now auto-registers
   the agent from JWT claims (subject, application)

3. SSE endpoint returned 404 — same auto-registration fix

JWT validation result is stored as a request attribute so downstream
controllers can extract the application claim for auto-registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:41:23 +02:00
hsiegeln
0632f1c6a8 fix: agent token refresh returns 404 after server restart
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m23s
The refresh endpoint required the agent to exist in the in-memory
registry. After server restart the registry is empty, so all refresh
attempts got 404. The refresh token itself is self-contained with
subject, application, and roles — the registry lookup is optional.

Now uses application from the JWT, falling back to registry only
if the agent happens to be registered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:37:57 +02:00
hsiegeln
bdac363e40 fix: active queries list always showed itself
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
The system.processes query was returning its own row. Added
filter: query NOT LIKE '%system.processes%'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 10:33:47 +02:00