Environments now have:
- production (bool): prod vs non-prod resource allocation
- enabled (bool): disabled blocks new deployments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SecurityBeanConfig uses Ed25519SigningServiceImpl.ephemeral() when no jwt-secret
- Fixes pre-existing application context failure in integration tests
- Reverts test jwt-secret from application-test.yml (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ClaimMappingAdminControllerIT with create+list and delete tests
- Add adminHeaders() convenience method to TestSecurityHelper
- Add jwt-secret to test profile (fixes pre-existing Ed25519 init failure)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread-safe AtomicReference-based license holder
- Defaults to open mode (all features enabled) when no license loaded
- Runtime license loading with feature/limit queries
- Unit tests for open mode and licensed mode
- Evaluates JWT claims against mapping rules
- Supports equals, contains (list + space-separated), regex match types
- Results sorted by priority
- 7 unit tests covering all match types and edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validates payload.signature license tokens using Ed25519 public key
- Parses tier, features, limits, timestamps from JSON payload
- Rejects expired and tampered tokens
- Unit tests for valid, expired, and tampered license scenarios
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.
UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.
Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
"default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
updated ORDER BY (tenant→time→env→app), added tenant_id to
usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures
Closes#123
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The keypair was generated ephemerally on each startup, causing agents
to reject all commands after a server restart (signature mismatch).
Now persisted to PostgreSQL server_config table and restored on startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.
- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
(docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId
Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)
Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkIngestionController: /data/chunks → /data/executions (matches
PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.
Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Testcontainers tests need Docker which isn't available in CI.
Rename to *IT so Surefire skips them (Failsafe runs them with -DskipITs=false).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements MetricsQueryStore using ClickHouse toStartOfInterval() for
time-bucketed aggregation queries; verified with 4 Testcontainers tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TDD implementation of MetricsStore backed by ClickHouse. Uses native
Map(String,String) column type (no JSON cast), relies on ClickHouse
DEFAULT for server_received_at, and handles null tags by substituting
an empty HashMap. All 4 Testcontainers tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ExecutionDocument and ExecutionRecord records gained an isReplay
field but the integration tests were not updated, breaking CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteGraph no longer stores a separate nodes list; getNodes() computes
from root tree. Tests now build proper tree via setRoot() + setChildren()
instead of calling setNodes().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trace data visibility:
- ProcessorNode now includes hasTraceData flag computed from captured
body/headers during tree conversion
- ConfigBadge shows teal for tracing configured, green when data captured
- Search results show green footprints icon for exchanges with trace data
- New has_trace_data column on executions table (V11 migration with backfill)
- OpenSearch documents and ExecutionSummary include the flag
Inline tap configuration:
- Extracted reusable TapConfigModal component from RouteDetail
- Diagram context menu opens tap modal inline instead of navigating away
- Toggle-trace action works immediately with toast feedback
- Modal closes only on ESC, Cancel, Save, or Delete (not backdrop click)
Detail panel tab gating:
- Headers, Input, Output tabs disabled when no data is available
- Works at both exchange and processor level
- Falls back to Info tab when active tab becomes empty
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes iteration overlay corruption caused by flat storage collapsing
duplicate processorIds across loop iterations.
Server:
- Store raw processor tree as processors_json JSONB on executions table
- Detail endpoint serves from processors_json (faithful tree), falls back
to flat record reconstruction for older executions
- V10 migration: processors_json, error categorization (errorType,
errorCategory, rootCauseType, rootCauseMessage), OTel (traceId, spanId),
circuit breaker (circuitBreakerState, fallbackTriggered), drops
erroneous splitDepth/loopDepth columns
- Add all new fields through full ingestion/storage/API chain
UI:
- Fix overlay wrapper filtering: check wrapper type before status filter
- Add new fields to schema.d.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add resolvedEndpointUri, splitDepth, loopDepth arguments to
ProcessorRecord constructors in TreeReconstructionTest and
PostgresExecutionStoreIT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests were using the old 18-param constructor, missing the 5 new
iteration fields (loopIndex, loopSize, splitIndex, splitSize,
multicastIndex) added in V8 migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TreeReconstructionTest and PostgresExecutionStoreIT still passed the
removed diagramNodeId parameter. Missed by mvn compile (main only);
caught by mvn verify (test compilation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent now uses Camel processorId as RouteNode.id, eliminating the
nodeId mapping layer. Drop diagram_node_id column (V6 migration),
remove from ProcessorRecord/ProcessorNode/IngestionService/DetailService,
add /processor-routes endpoint for processorId→routeId lookup,
simplify frontend diagram-mapping and ExchangeDetail overlays,
replace N diagram fetches in AppConfigPage with single hook.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add logging to MetricsController: warn on parse failures, debug on
received metrics, buffer depth on 503
- Add GET /api/v1/admin/database/metrics-pipeline diagnostic endpoint
(buffer depth, row count, distinct agents/metrics, latest timestamp)
- Fix BackpressureIT test JSON to match actual MetricsSnapshot schema
(collectedAt/metricName/metricValue instead of timestamp/metrics)
- Upgrade cameleer3-common from 1.0-SNAPSHOT to 0.0.3 (adds engineLevel)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the group→application terminology rename in the agent
registry subsystem:
- AgentInfo: field group → application, all wither methods updated
- AgentRegistryService: findByGroup → findByApplication
- AgentInstanceResponse: field group → application (API response)
- AgentRegistrationRequest: field group → application (API request)
- JwtServiceImpl: parameter names group → application (JWT claim
string "group" preserved for token backward compatibility)
- All controllers, lifecycle monitor, command controller updated
- Integration tests: JSON request bodies "group" → "application"
- Frontend: schema.d.ts, openapi.json, agent queries, AgentHealth
RBAC group references (groups table, GroupAdminController, etc.)
are NOT affected — they are a separate domain concept.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The execution-related "group" concept actually represents the
application name. Rename all Java fields, API parameters, and frontend
types from groupName→applicationName and group→application for clarity.
- Java records: ExecutionSummary, ExecutionDetail, ExecutionDocument,
ExecutionRecord, ProcessorRecord
- API params: SearchRequest.group→application, SearchController
@RequestParam group→application
- Services: IngestionService, DetailService, SearchIndexer, StatsStore
- Frontend: schema.d.ts, Dashboard, ExchangeDetail, RouteDetail,
executions query hooks
Database column names (group_name) and OpenSearch field names are
unchanged — only the API-facing Java/TS field names are renamed.
RBAC group references (groups table, GroupRepository, GroupsTab) are
a separate domain concept and are NOT affected by this change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the refresh endpoint only returned a new accessToken, causing
agents to lose their refreshToken after the first refresh cycle and
forcing a full re-registration every ~2 hours.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single JSONB key-value table replaces two singleton config tables, making
future config types trivial to add. Also fixes pre-existing IT failures:
Flyway URL not overridden by Testcontainers, threshold test ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>