Merge the route state update and route catalog upsert blocks to share
one registryService.findById() call instead of two, reducing overhead
on the high-frequency heartbeat path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add RouteCatalogStore as a third data source in RouteCatalogController so that
/api/v1/routes/catalog surfaces routes with zero executions and routes from
previous app versions that fall within the requested time window.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires RouteCatalogStore into CatalogController as a third data source:
routes with zero executions and routes from previous app versions
(within the queried time window) now appear in the unified catalog.
Also clears route_catalog on app dismiss via deleteByApplication().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire RouteCatalogStore into AgentRegistrationController and call upsert
after registration and heartbeat so routes survive server restarts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Routes with zero executions (sub-routes) vanish from the sidebar after
server restart because the catalog is purely in-memory with a ClickHouse
stats fallback that only covers executed routes. This spec describes a
persistent route_catalog table in ClickHouse with lifecycle tracking
(first_seen/last_seen) to reconstruct the sidebar without agent
reconnection and support historical time-window queries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Activity, Cpu, and HeartPulse icons instead of "tps", "cpu", and
"ago" text in compact and expanded app cards. Bump design-system to
v0.1.55 for sidebar footer alignment fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add search icon, translucent background, and same padding/sizing
as the sidebar's built-in filter input. Placeholder changed to
"Filter..." to match sidebar convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Text input next to view toggle filters apps by name (case-insensitive
substring match). KPI stat strip uses unfiltered counts so totals
stay accurate. Clear button on non-empty input.
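The name filter described above can be sketched roughly as follows. The `AppGroup` shape and the `filterAppsByName` name are hypothetical, and trimming the query is an assumption not stated in the commit.

```typescript
// Hypothetical shape: only the name field matters for this sketch.
interface AppGroup {
  name: string;
}

// Case-insensitive substring match. A blank query keeps every app.
// Trimming the query is an assumption, not something the commit states.
function filterAppsByName<T extends AppGroup>(apps: T[], query: string): T[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return apps;
  return apps.filter((app) => app.name.toLowerCase().indexOf(needle) !== -1);
}
```

The KPI strip would keep reading counts from the unfiltered list, so only the grid consumes the filtered result.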
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move toolbar above the grid conditional so it renders in both
view modes. Hidden only on app detail pages (isFullWidth).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show per-instance CPU usage percentage instead of error rate in the
DataTable. CPU usage above 80% is highlighted in the error color.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The toggle only renders inside the compact branch, so viewMode is
always 'compact' there. Use static class assignment instead of a
comparison that TypeScript correctly flags as unreachable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add max CPU percentage to the meta row of both the full expanded
view and the overlay expanded card, consistent with compact cards.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend:
- Add cpuUsage field to AgentInstanceResponse (-1 if unavailable)
- Add queryAgentCpuUsage() to AgentRegistrationController — queries
avg CPU per instance from agent_metrics over last 2 minutes
- Wire CPU into agent list response via withCpuUsage()
Frontend:
- Add cpuUsage to schema.d.ts
- Compute maxCpu per AppGroup (max across all instances)
- Show "X% cpu" on compact cards when available (hidden when -1)
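A minimal sketch of the per-group maximum, assuming each instance carries the `cpuUsage` field with -1 as the "unavailable" sentinel. The `InstanceCpu` interface and the `maxCpu` function name are illustrative only.

```typescript
// Hypothetical subset of the instance payload: cpuUsage is -1 when unavailable.
interface InstanceCpu {
  cpuUsage: number;
}

// Highest CPU reading across a group's instances.
// Returns -1 when no instance reports a value, so the card can hide the metric.
function maxCpu(instances: InstanceCpu[]): number {
  return instances.reduce(
    (max, instance) => (instance.cpuUsage > max ? instance.cpuUsage : max),
    -1
  );
}
```

Starting the reduction at -1 means unavailable readings never win and an empty group stays hidden.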
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add invisible backdrop (z-index 99) behind expanded overlay to
dismiss on outside click
- Remove background/padding from overlay wrapper so GroupCard
renders without visible extra border
- Use drop-shadow filter instead of box-shadow for natural card
shadow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move view toggle into compact grid conditional so it only renders
on the overview page (not app detail /runtime/{slug})
- Left-align the toolbar buttons
- Change TPS format from "x.y/s" to "x.y tps"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump overlay z-index to 100 so it renders above the sidebar
- App name in compact card navigates to /runtime/{slug} on click
- Add TPS (msg/s) as third metric on compact cards between live
count and heartbeat
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move expand/collapse toggle from stat strip to dedicated toolbar
below KPIs
- Sort app groups alphabetically by name
- Expanded card overlays from clicked card position instead of
pushing other cards down
- Viewport constraint: overlay flips right-alignment and limits
height when near edges
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two sidebar bugs fixed:
1. Route entries never highlighted on navigation because sidebar-utils
generated /apps/ paths for route children while effectiveSelectedPath
normalizes to /exchanges/. The design system does exact string matching.
2. Routes disappeared from sidebar when agents had no recent exchange
data. Heartbeat carried routeStates (with route IDs as keys) but
AgentRegistryService.heartbeat() never updated AgentInfo.routeIds.
After server restart, auto-heal registered agents with empty routes,
leaving ClickHouse (24h window) as the only discovery source.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Un-ignore .claude/rules/ so path-scoped rule files are shared via git.
Add instruction in CLAUDE.md to update rule files when modifying classes,
controllers, endpoints, or metrics — keeps rules current as part of
normal workflow rather than requiring separate maintenance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent shaded JAR now includes the log appender classes. Remove
PropertiesLauncher, -Dloader.path, and separate appender JAR references.
All JVM types now use: java -javaagent:/app/agent.jar -jar app.jar
Plain Java uses -cp with an explicit main class. Native runs the binary
directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LogTab now checks mdc['cameleer.processorId'] first when filtering logs
for a selected processor node, falling back to fuzzy message/loggerName
matching for older agents without the new MDC key.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RuntimeDetector now derives the correct PropertiesLauncher FQN from
the JAR manifest Main-Class package. Spring Boot 3.2+ uses
org.springframework.boot.loader.launch.PropertiesLauncher, pre-3.2
uses org.springframework.boot.loader.PropertiesLauncher.
DockerRuntimeOrchestrator uses the detected class instead of a
hardcoded 3.2+ reference, falling back to 3.2+ when not auto-detected.
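The derivation can be illustrated as below. This is a TypeScript sketch of the logic only: the real RuntimeDetector is a Java class, and the function name and the package check are assumptions.

```typescript
const LOADER_PACKAGE = "org.springframework.boot.loader";
// Fallback when detection fails: the Spring Boot 3.2+ location.
const DEFAULT_LAUNCHER = LOADER_PACKAGE + ".launch.PropertiesLauncher";

// Derive the PropertiesLauncher FQN from the manifest Main-Class value
// by reusing the package of whatever launcher the JAR declares.
function propertiesLauncherFor(mainClass: string | undefined): string {
  if (!mainClass) return DEFAULT_LAUNCHER;
  const lastDot = mainClass.lastIndexOf(".");
  const pkg = lastDot > 0 ? mainClass.substring(0, lastDot) : "";
  // Assumption: only packages inside the Spring Boot loader namespace are trusted.
  if (pkg !== LOADER_PACKAGE && pkg !== LOADER_PACKAGE + ".launch") {
    return DEFAULT_LAUNCHER;
  }
  return pkg + ".PropertiesLauncher";
}
```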
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cameleer-clickhouse-external Service (NodePort 30123) matching the
pattern used by cameleer-postgres-external (NodePort 30432).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent now sets cameleer.exchangeId in MDC (persists across processor
executions, unlike Camel's camel.exchangeId which is scoped to MDCUnitOfWork).
For ON_COMPLETION exchange copies, the agent uses the parent's exchange ID.
Server changes:
- ClickHouseLogStore ingestion: extract exchange_id preferring
cameleer.exchangeId, falling back to camel.exchangeId
- ClickHouseLogStore search: match exchangeId filter against exchange_id
column OR cameleer.exchangeId OR camel.exchangeId in MDC
- Update CLAUDE.md with log exchange correlation documentation
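The ingestion-side preference order might look like this. It is a TypeScript sketch of logic that lives in the Java ClickHouseLogStore; the `Mdc` type, the function name, and the empty-string default are assumptions.

```typescript
// MDC as a plain string map. The key names come from the commit message.
type Mdc = { [key: string]: string | undefined };

// Prefer the agent-managed key, fall back to Camel's own key.
// Returning "" when neither is present is an assumption for this sketch.
function extractExchangeId(mdc: Mdc): string {
  return mdc["cameleer.exchangeId"] || mdc["camel.exchangeId"] || "";
}
```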
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Applications section: maxHeight 50vh with scroll overflow
- Starred section: maxHeight 30vh with scroll overflow
- Admin section: pinned to bottom of sidebar via position="bottom"
- Update design-system to 0.1.54 (sidebar section maxHeight, position props)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
canSubmit no longer requires a JAR file when "Create only" is selected.
JAR upload and deploy steps are skipped when no file is provided.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sidebar: make +App button more subtle (lower opacity, brightens on hover)
- Sidebar: add filter chips to hide empty routes and offline/stale apps
- Sidebar: hide filter chips and +App button when sidebar is collapsed
- Exchange table: reorder columns to Status, Attributes, App, Route, Started, Duration; remove ExchangeId and Agent columns
- Exchange detail log tab: query by exchangeId only (no applicationId required), filter by processorId when processor selected
- KPI tooltips: styled tooltips with current/previous values, time period labels, percentage change, themed with DS variables
- KPI tooltips: fix overflow by left-aligning first two and right-aligning last two
- Exchange detail: show full datetime (YYYY-MM-DD HH:mm:ss.SSS) for start/end times
- Status labels: unify to title-case (Completed, Failed, Running) across all views
- Status filter buttons: match title-case labels (Completed, Warning, Failed, Running)
- Create app: show full external URL using routingDomain from env config or window.location.origin fallback
- Create app: add Runtime Type selector and Custom Arguments to Resources tab
- Create app: add Sensitive Keys tab with agent defaults, global keys, and app-specific keys (matching admin page design)
- Create app: add placeholder text to all Input fields for consistency
- Update design-system to 0.1.52 (sidebar collapse toggle fix)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PULL_IMAGE deploy stage was a no-op — Docker only pulls on create
if the image is missing entirely, not when a newer version exists.
DeploymentExecutor now calls orchestrator.pullImage() to fetch the
latest base image from the registry before creating containers.
Also fixes the default base image from 'cameleer-runtime-base:latest'
(local-only name) to the fully qualified registry path
'gitea.siegeln.net/cameleer/cameleer-runtime-base:latest'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Four logging pipeline fixes:
1. Multi-replica startup logs: remove stopLogCaptureByApp from
SseConnectionManager — container log capture now expires naturally
after 60s instead of being killed when the first agent connects SSE.
This ensures all replicas' bootstrap output is captured.
2. Unified instance_id: container logs and agent logs now share the same
instance identity ({envSlug}-{appSlug}-{replicaIndex}). DeploymentExecutor
sets CAMELEER_AGENT_INSTANCEID per replica so the agent uses the same
ID as ContainerLogForwarder. Instance-level log views now show both
container and agent logs.
3. Labels-first container identity: TraefikLabelBuilder emits cameleer.replica
and cameleer.instance-id labels. Container names are tenant-prefixed
({tenantId}-{envSlug}-{appSlug}-{idx}) for global Docker daemon uniqueness.
4. Environment filter on log queries: useApplicationLogs now passes the
selected environment to the API, preventing log leakage across environments.
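The two identity formats from items 2 and 3 can be sketched as plain string builders. The function names are hypothetical and the real code is Java.

```typescript
// Instance identity shared by container logs and agent logs:
// {envSlug}-{appSlug}-{replicaIndex}
function instanceId(envSlug: string, appSlug: string, replicaIndex: number): string {
  return `${envSlug}-${appSlug}-${replicaIndex}`;
}

// Tenant-prefixed container name for uniqueness on a shared Docker daemon:
// {tenantId}-{envSlug}-{appSlug}-{idx}
function containerName(
  tenantId: string,
  envSlug: string,
  appSlug: string,
  replicaIndex: number
): string {
  return `${tenantId}-${instanceId(envSlug, appSlug, replicaIndex)}`;
}
```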
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract DockerClient creation into a standalone bean so both
runtimeOrchestrator and containerLogForwarder depend on it directly
instead of on each other. DockerRuntimeOrchestrator now receives
DockerClient via constructor instead of creating it in @PostConstruct.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect when an agent instance already exists in the registry and record
a RE_REGISTERED event with route count and capabilities instead of a
generic REGISTERED event. UI shows a refresh icon for re-registrations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DS LogViewer expects level as a string union, but the API response
type uses plain string. Cast at the call site to fix the TS build error.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DatabaseAdminController's active-queries and kill-query endpoints could
expose SQL text from other tenants sharing the same PostgreSQL instance.
Add ApplicationName=tenant_{id} to the JDBC URL and filter
pg_stat_activity by application_name so each tenant sees only its own
connections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ContainerLogForwarder, StartupLogPanel, useStartupLogs to key classes
and UI files. Document log capture lifecycle and source badge rendering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers streaming Docker logs to ClickHouse until agent SSE connect,
deployment log panel UI, and source badge in general log views.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show agent built-in defaults as reference Badge pills, separate editable keys
section with count badge, amber-highlighted push toggle, right-aligned save
button. Fix info text: keys are added to the defaults rather than replacing
them. Add ClaimMapping controller to CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test endpoint now accepts inline rules from the client instead of reading
from the database, so unsaved rules can be tested. Matched rows show the
checkmark alongside action buttons instead of replacing them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sensitiveKeys/globalSensitiveKeys/mergedSensitiveKeys fields to
ApplicationConfig, unwraps the new AppConfigResponse envelope in
useApplicationConfig, and renders an editable Sensitive Keys section
with read-only global pills and add/remove app-specific key tags.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GET /config/{app} now returns AppConfigResponse with globalSensitiveKeys and mergedSensitiveKeys alongside the config
- PUT /config/{app} merges global + per-app sensitive keys before pushing CONFIG_UPDATE to agents via SSE
- extractSensitiveKeys() uses JsonNode reflection to avoid compile-time dependency on cameleer3-common getSensitiveKeys()
- SensitiveKeysRepository injected as new constructor parameter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GET/PUT /api/v1/admin/sensitive-keys (ADMIN only). PUT accepts optional
pushToAgents param — when true, fans out merged global+per-app sensitive
keys to all live agents via CONFIG_UPDATE SSE commands with 10-second
shared deadline. Per-app keys extracted via JsonNode to avoid depending
on ApplicationConfig.getSensitiveKeys() not yet in the published
cameleer3-common jar. Includes audit logging on every PUT.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The roles-claim and default-roles fallback paths in applyClaimMappings
were using assignRoleToUser (origin='direct'), causing OIDC-derived
roles to accumulate across logins and never be cleared. Changed both
to assignManagedRole (origin='managed') so all OIDC-assigned roles
are cleared and re-evaluated on every login, same as claim mapping
rules. Only roles assigned directly via the admin UI are preserved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All edits (add, edit, delete, reorder) now modify local state only.
Cancel discards changes, Apply diffs local vs server and issues the
necessary create/update/delete API calls. Target selects now include
a placeholder option. Footer shows Cancel and Apply buttons.
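One way to compute the Apply diff is sketched below. The `Rule` fields, the `ChangePlan` shape, and the function names are all hypothetical, and the actual component may compare different fields.

```typescript
// Hypothetical rule shape. id is absent for rules created locally and not yet saved.
interface Rule {
  id?: string;
  target: string;
  priority: number;
}

interface ChangePlan {
  create: Rule[];
  update: Rule[];
  remove: string[];
}

function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Compare the local working copy against the server copy and list the API calls needed.
function planChanges(server: Rule[], local: Rule[]): ChangePlan {
  const serverById: { [id: string]: Rule } = {};
  for (const rule of server) {
    if (rule.id !== undefined) serverById[rule.id] = rule;
  }
  const plan: ChangePlan = { create: [], update: [], remove: [] };
  const kept: { [id: string]: true } = {};
  for (const rule of local) {
    if (rule.id === undefined || !has(serverById, rule.id)) {
      plan.create.push(rule);
      continue;
    }
    kept[rule.id] = true;
    const before = serverById[rule.id];
    if (before.target !== rule.target || before.priority !== rule.priority) {
      plan.update.push(rule);
    }
  }
  for (const id of Object.keys(serverById)) {
    if (!has(kept, id)) plan.remove.push(id);
  }
  return plan;
}
```

Cancel then amounts to discarding the local copy without ever calling `planChanges`.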
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Populate target field from existing roles (assign role) or groups
(add to group) instead of free-text input, preventing typos.
Switching action resets the target selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace non-existent --surface-1/--surface-2 with --bg-raised (modal)
and --bg-hover (subtle backgrounds) from the design system.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump all font sizes from 11px/10px to 12px (project minimum)
- Fix handleMove race condition: use mutateAsync + Promise.all
- Clear stale test results after rule create/edit/delete/reorder
- Replace inline styles with CSS module classes in OidcConfigPage
- Remove dead .editRow CSS class
- Replace inline chevron with Lucide icon
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Now that cameleer3-common has getInputProperties/getOutputProperties on
ProcessorExecution, add the check to the processors_json deserialization
path as well.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hasTrace flag on ProcessorNode now also checks inputProperties and
outputProperties on the flat-record code paths (buildTreeBySeq and
buildTreeByProcessorId). The ProcessorExecution path (processors_json)
will be updated once cameleer3-common publishes the new snapshot.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for exchange properties sent by the agent alongside headers.
Properties flow through the same pipeline as headers: ClickHouse columns
(input_properties, output_properties) on both executions and
processor_executions tables, MergedExecution record, ChunkAccumulator
extraction, DetailService snapshot, and REST API response.
UI adds a Properties tab next to Headers in the process diagram detail
panel, with the same input/output split table layout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a subtle "+ App" button in the sidebar section header for quick
app creation without navigating to the Deployments tab first. Only
visible to OPERATOR and ADMIN roles.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch Vite base back to './' (relative paths) and always inject
<base href="${BASE_PATH}"> in the entrypoint, even when BASE_PATH=/.
This fixes asset loading for both deployment modes:
- Single-instance: <base href="/"> resolves ./assets/x.js to /assets/x.js
- SaaS tenant: <base href="/t/slug/"> resolves to /t/slug/assets/x.js
Previously, base:'/' produced absolute /assets/ paths that the <base>
tag couldn't redirect, breaking SaaS tenants, while base:'./' without
<base> broke deep URLs in single-instance mode. Always injecting the
tag makes relative paths work universally.
The patched server-ui-entrypoint.sh in cameleer-saas (which rewrote
absolute href/src attributes via sed) is no longer needed and can be
removed.
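The resolution rules can be checked with the standard URL constructor. The `resolveAsset` helper and the example.test host are illustrative only.

```typescript
// Resolve an asset reference the way a browser would.
// A <base href> replaces the page URL as the base for relative references.
function resolveAsset(pageUrl: string, baseHref: string | null, asset: string): string {
  const base = baseHref === null ? pageUrl : new URL(baseHref, pageUrl).href;
  return new URL(asset, base).pathname;
}

const deepPage = "https://example.test/dashboard/app/route";

// Relative path without <base>: resolved against the deep page URL.
console.log(resolveAsset(deepPage, null, "./assets/x.js"));
// Relative path with <base href="/">: resolved against the root.
console.log(resolveAsset(deepPage, "/", "./assets/x.js"));
// Relative path with a tenant base: resolved under the tenant prefix.
console.log(resolveAsset(deepPage, "/t/slug/", "./assets/x.js"));
// Absolute path: the <base> path cannot redirect it.
console.log(resolveAsset(deepPage, "/t/slug/", "/assets/x.js"));
```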
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert base: './' back to '/' — relative asset paths break on deep
URLs like /dashboard/app/route where the browser resolves assets to
/dashboard/app/assets/ instead of /assets/.
Also fix processor metrics table clipping: remove flex:1/min-height:0
from .processorSection so the table takes its natural content height
and the page scrolls to show all rows (was clipping at ~12 of 18).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the server-ui is deployed under a subpath (/t/{slug}/), absolute
asset paths (/assets/...) resolve to the domain root instead of the
subpath, causing 404s. Using './' makes asset URLs relative to the
HTML page, so they resolve correctly regardless of mount path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse does not have lag() as a window function. Use lagInFrame()
with explicit ROWS BETWEEN frame instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not needed yet -- all deployments are under our control and can be
reset manually if the old schema is encountered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite init.sql as a pure CREATE IF NOT EXISTS file with no DROP or
INSERT statements. Safe for repeated runs on every startup without
corrupting aggregated stats data.
Old deployments with count()-based stats tables are migrated
automatically: ClickHouseSchemaInitializer checks system.columns for
the old AggregateFunction(count) type and drops those tables before
init.sql recreates them with the correct uniq() schema. This runs
once per table and is a no-op on fresh installs or already-migrated
deployments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change vite base from './' to '/' so asset paths are absolute. With
relative paths, direct navigation to multi-segment URLs like
/runtime/app/instance resolved assets to /runtime/assets/ which 404'd.
Also fix sidebar navigation: clicking a route while on the runtime tab
no longer navigates to /runtime/{appId}/{routeId} (which the runtime
page interprets as an instanceId). It stays at /runtime/{appId}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.
The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.
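The per-bucket arithmetic, greatest(0, current - previous), is sketched here client-side for illustration. The real computation runs in ClickHouse, and emitting 0 for the first bucket is an assumption.

```typescript
// maxPerBucket: max(value) of the counter in each time bucket, oldest first.
// Returns the increase per bucket. A drop (counter reset) is clamped to 0.
function counterDeltas(maxPerBucket: number[]): number[] {
  const deltas: number[] = [];
  let previous: number | null = null;
  for (const current of maxPerBucket) {
    // Assumption: the first bucket has no predecessor, so it reports 0.
    deltas.push(previous === null ? 0 : Math.max(0, current - previous));
    previous = current;
  }
  return deltas;
}
```

For the input `[100, 130, 130, 20, 50]` this yields `[0, 30, 0, 0, 30]`: the drop from 130 to 20 is treated as a reset, not a negative rate.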
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend uniq-based dedup from processor tables to all stats tables
(stats_1m_all, stats_1m_app, stats_1m_route). Execution-level tables
use uniq(execution_id). Processor-level tables now use
uniq(concat(execution_id, toString(seq))) so loop iterations (same
exchange, different seq) are counted while chunk retry duplicates
(same exchange+seq) are collapsed.
All stats tables are dropped, recreated, and backfilled from raw
data on startup. All Java queries updated: countMerge -> uniqMerge,
countIfMerge -> uniqIfMerge.
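The effect of the processor-level dedup key can be illustrated with an exact count. ClickHouse's uniq() is an approximate aggregate, so this sketch shows the intent, not the implementation, and the `ProcessorRow` shape is hypothetical.

```typescript
// Hypothetical row shape for processor_executions.
interface ProcessorRow {
  executionId: string;
  seq: number;
}

// Count distinct (execution_id, seq) pairs. The key mirrors
// concat(execution_id, toString(seq)) from the commit, with no separator.
function countUniqueInvocations(rows: ProcessorRow[]): number {
  const seen: { [key: string]: true } = {};
  for (const row of rows) {
    seen[row.executionId + String(row.seq)] = true;
  }
  return Object.keys(seen).length;
}
```

A retried chunk repeats the same pair and collapses to one, while a loop iteration has a new seq and is counted.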
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ClickHouseSchemaInitializer splits on semicolons before filtering
comments, so semicolons inside comment text created invalid statements.
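The ordering problem can be shown with a small splitter that drops comment lines before splitting. The commit does not say how the fix was made, so this is one possible ordering, written in TypeScript, and it handles only full-line `--` comments.

```typescript
// Split a SQL script into statements.
// Full-line "--" comments are removed first, so a semicolon inside
// comment text can no longer produce a bogus statement.
function splitStatements(sql: string): string[] {
  const withoutComments = sql
    .split("\n")
    .filter((line) => line.trim().indexOf("--") !== 0)
    .join("\n");
  return withoutComments
    .split(";")
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0);
}
```

Trailing comments and semicolons inside string literals are out of scope for this sketch.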
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Processor execution counts were inflated by duplicate inserts into the
plain MergeTree processor_executions table (chunk retries, reconnects).
Replace count()/countIf() with uniq(execution_id)/uniqIf() in both
stats_1m_processor and stats_1m_processor_detail MVs so each exchange
is counted once per processor regardless of duplicates.
Tables are dropped and rebuilt from raw data on startup. MV created
after backfill to avoid double-counting.
Also adds stats_1m_processor_detail to the catalog purge list (was
missing).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the redundant "X/X LIVE" badge from the runtime page, and the
breadcrumb trail and routes section from the agent detail page (pills
moved into the Process Information card). Fix session expiry: guard
against concurrent 401 refresh races and skip re-entrant triggers on
auth endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The wheel event listener was attached in a useEffect with empty deps,
but the SVG element doesn't exist during the loading state. Switch
svgRef from a plain ref to a callback ref that triggers re-attachment
when the SVG element becomes available after data loads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renders an HTML tooltip below hovered diagram nodes with processor
metrics (avg, p99, % time, invocations, error rate). Styled inline
with the existing NodeToolbar pattern — positioned via screen-space
coordinates, uses DS tokens for background/border/shadow/typography.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dashboard L3 now shows a single Processor Metrics card with
Diagram/Table toggle buttons. The diagram shows native tooltips on
hover with full processor metrics (avg, p99, invocations, error rate,
% time).
Also fixes:
- Chart x-axis uses actual timestamps instead of bucket indices
- formatDurationShort uses locale formatting with max 3 decimals
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After server restart, agents send logs before re-registering. Instead
of dropping these logs, fall back to application and environment from
the JWT token claims. Only drops logs when neither registry nor JWT
provide an applicationId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids Date round-trip that crashes with toISOString() on invalid
timestamps from the timeseries API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom LineChart/AreaChart/BarChart usage with ThemedChart
wrapper. Data format changed from ChartSeries[] to Recharts-native
flat objects. Uses DS v0.1.47.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All chart series now use Date objects from the API response instead
of integer indices. This gives proper date/time on x-axes and in
tooltips (leveraging DS v0.1.46 responsive charts + timestamp
tooltips). GC chart switched from BarChart to AreaChart for
consistency with Date x-values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent team migrated from JMX to Micrometer metrics. Update the 5
hardcoded metric names in AgentInstance.tsx JVM charts:
- jvm.cpu.process → process.cpu.usage.value
- jvm.memory.heap.used → jvm.memory.used.value
- jvm.memory.heap.max → jvm.memory.max.value
- jvm.threads.count → jvm.threads.live.value
- jvm.gc.time → jvm.gc.pause.total_time
Server backend is unaffected (generic MetricsSnapshot storage).
CLAUDE.md updated with full agent metric name reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lists all business metrics (gauges, counters, timers) with their
tags and source classes, plus agent container label mapping table.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels prometheus.scrape, prometheus.path, and prometheus.port are now
set on every deployed container based on the resolved runtime type,
enabling automatic Prometheus service discovery via docker_sd_configs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously showed an infinite spinner because unmanaged apps have no
PostgreSQL record. Now shows an "Unmanaged Application" message with
a link to create a managed app.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server defaultConfig() and UI fallbacks returned "NONE" for payload
capture, but the agent defaults to "BOTH". This caused unwanted
reconfiguration when users saved other settings — payload capture
would silently change from the agent's default BOTH to NONE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sets query cache immediately on dismiss success so the sidebar updates
without waiting for the catalog refetch to complete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apps can now join additional Docker networks (e.g., monitoring,
prometheus) configured via containerConfig.extraNetworks. Flows through
the 3-layer config merge. Networks are created if absent and containers
are connected during deployment. UI adds a pill-list field on the
Resources tab (both create and edit views).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace window.confirm with design system ConfirmDialog for the dismiss
action. Move the "No agents connected" section to the top of the Runtime
page using Alert component with warning variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spring's default handler silently returns 400 for malformed payloads
with no server-side log. Add an @ExceptionHandler that catches these
errors and logs a WARN with the agent instance ID and root cause message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the endpoint silently returned 202 for all failures: missing
agent identity, unregistered agents, empty payloads, and buffer-full
drops. It now logs WARN for each failure case with context (instanceId,
entry count, reason). Normal ingestion is logged at INFO with the
accepted count. Buffer-full drops are tracked individually with
accepted/dropped counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After versionRepo.create(), detect the runtime type from the saved JAR
via RuntimeDetector and persist the result via updateDetectedRuntime().
Log messages now include the detected runtime type (or 'unknown').
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Flyway V10 migration adds the two nullable columns. AppVersion record,
AppVersionRepository interface, and PostgresAppVersionRepository are
updated to carry and persist detected runtime information.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LogEntry.getSource() exists but source is not a constructor parameter
in cameleer3-common — it uses a default value.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All constructor calls updated to include the new source field added
in the log forwarding v2 changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md: log ingestion example updated from LogBatch wrapper to raw
JSON array with source field. CLAUDE.md: added LogIngestionController,
updated LogQueryController with new filters. SERVER-CAPABILITIES.md:
updated log ingestion and query descriptions, ClickHouse table note.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace LogBatch wrapper with raw List<LogEntry> on the ingestion endpoint.
Add source column to ClickHouse logs table and propagate it through the
storage, search, and HTTP layers (LogSearchRequest, LogEntryResult,
LogEntryResponse, LogQueryController).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a useServerCapabilities hook that fetches /api/v1/health once per
session (staleTime: Infinity) and extracts the infrastructureEndpoints
flag. buildAdminTreeNodes now accepts an opts parameter so ClickHouse
and Database tabs are hidden when the server reports infra endpoints as
disabled. LayoutShell wires the hook result into the admin tree memo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add cameleer.server.security.infrastructureendpoints property (default true) and
@ConditionalOnProperty to DatabaseAdminController and ClickHouseAdminController so
the SaaS provisioner can set CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false
to suppress these endpoints (404) on tenant server containers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14-task plan covering server-side @ConditionalOnProperty flag,
health endpoint capability exposure, UI sidebar filtering,
SaaS provisioner env var, and vendor infrastructure dashboard
with per-tenant PostgreSQL and ClickHouse visibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers restricting DB/ClickHouse admin endpoints in SaaS-managed
server instances via @ConditionalOnProperty flag, and building a
vendor-facing infrastructure dashboard in the SaaS platform with
per-tenant PostgreSQL and ClickHouse visibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md configuration table rewritten with correct cameleer.server.*
property names, grouped by functional area. Removed stale CAMELEER_OIDC_*
env var references. SERVER-CAPABILITIES.md updated with correct env var
names for ingestion and agent registry tuning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move container resource defaults into their own sub-namespace for
future extensibility:
cameleer.server.runtime.container.memorylimit → CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT
cameleer.server.runtime.container.cpushares → CAMELEER_SERVER_RUNTIME_CONTAINER_CPUSHARES
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OIDC properties into a nested Oidc class within SecurityProperties
for clearer grouping. Env vars gain an extra separator:
cameleer.server.security.oidc.issueruri → CAMELEER_SERVER_SECURITY_OIDC_ISSUERURI
cameleer.server.security.oidc.jwkseturi → CAMELEER_SERVER_SECURITY_OIDC_JWKSETURI
cameleer.server.security.oidc.audience → CAMELEER_SERVER_SECURITY_OIDC_AUDIENCE
cameleer.server.security.oidc.tlsskipverify → CAMELEER_SERVER_SECURITY_OIDC_TLSSKIPVERIFY
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.
Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.
Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
(e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation
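The mechanical mapping described above can be sketched as a single
transformation. This is an illustration only: the class and method names
(EnvVarNames, toEnvVar) are hypothetical and not taken from the codebase,
and the server itself is assumed to rely on Spring Boot's relaxed binding
rather than a helper like this.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class EnvVarNames {
    // Dots become underscores, then the whole name is uppercased.
    static String toEnvVar(String property) {
        return property.replace('.', '_').toUpperCase(Locale.ROOT);
    }
}
```

Because property names contain no dashes or camelCase after the rename,
the mapping is reversible by lowercasing and swapping underscores back
to dots.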
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Admins can now disable route control and replay per environment via the
Default Resource Limits section. Both default to enabled. Apps in the
environment inherit these defaults unless overridden per-app.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added routeControlEnabled and replayEnabled to ResolvedContainerConfig,
flowing through the three-layer config merge (global -> env -> app).
Both default to true. Admins can disable them per environment (e.g.
prod) via the defaultContainerConfig JSONB, or per app via the app's
containerConfig JSONB.
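A minimal sketch of the three-layer merge, assuming each layer is a flat
key-value map and that a later layer simply overrides an earlier one. The
class name ConfigMergeSketch and the map-based shape are assumptions; the
real ResolvedContainerConfig may merge typed fields instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of global -> env -> app precedence.
final class ConfigMergeSketch {
    static Map<String, Object> resolve(Map<String, Object> global,
                                       Map<String, Object> env,
                                       Map<String, Object> app) {
        Map<String, Object> merged = new HashMap<>(global); // lowest priority
        merged.putAll(env);                                 // env overrides global
        merged.putAll(app);                                 // app overrides env
        return merged;
    }
}
```

With global defaults of true for both flags, an environment layer that
sets routeControlEnabled to false disables route control for every app
that does not override it.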
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
buildEnvVars was missing CAMELEER_ROUTE_CONTROL_ENABLED and
CAMELEER_REPLAY_ENABLED, so deployed app containers defaulted to false
and agents didn't announce these capabilities.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .section base class sets flex-direction: column, which caused the
config bar items (App Log Level, Agent Log Level, etc.) to stack
vertically instead of displaying in a horizontal row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The server container mounts the platform's certs volume at /certs but
the CA bundle was never imported into the JVM truststore. OIDC discovery
failed with PKIX path building errors when a self-signed or custom CA
was in use.
The new entrypoint script splits the PEM bundle and imports each cert
via keytool before starting the app. This makes the conditional
CAMELEER_OIDC_TLS_SKIP_VERIFY logic in the SaaS provisioner work
correctly: when ca.pem exists, the JVM now actually trusts it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The password reset endpoint was fully blocked under OIDC mode. Now
M2M callers (identified by oidc: principal prefix) can reset local
user passwords, enabling the SaaS platform to manage the server's
built-in admin credentials.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows the SaaS platform to identify and clean up all containers
belonging to a tenant on delete (cameleer/cameleer-saas#55).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace simple key-value rows with EnvEditor component that supports
editing variables as Table, Properties, YAML, or .env format.
Switching views converts data seamlessly. Includes file import
(drag-and-drop .properties/.yaml/.env) with auto-detect and merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cast the Set<string> from ButtonGroup.onChange to Set<ExchangeStatus>
before iterating, fixing TS2345 from DS TopBar decomposition.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update to @cameleer/design-system@0.1.40 which decomposes TopBar into a
composable shell. Move status filters, time range, search trigger, and
auto-refresh toggle from the DS TopBar into LayoutShell as composed
children. Fixes cameleer/cameleer-saas#53.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always include urn:logto:scope:organizations and
urn:logto:scope:organization_roles in OIDC auth requests. These are
required for role mapping in multi-tenant setups and harmless for
non-Logto providers (unknown scopes ignored per OIDC spec).
Filter them from the OIDC admin config page so they don't confuse
standalone server admins or SaaS tenants.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OIDC callback extracted roles from the token's Custom JWT claim
(e.g. roles: [server:admin]) but never used them. The
applyClaimMappings fallback only assigned defaultRoles (VIEWER).
Now the fallback priority is: claim mapping rules > OIDC token
roles > defaultRoles. This ensures users get their org-mapped
roles (owner → server:admin) without requiring manual claim
mapping rule configuration.
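The fallback order can be sketched as follows. The class name and the
list-of-strings representation are assumptions made for illustration; the
actual applyClaimMappings implementation is not shown here.

```java
import java.util.List;

// Hypothetical sketch: first non-empty source wins.
final class RoleFallback {
    static List<String> resolve(List<String> fromMappingRules,
                                List<String> fromTokenRoles,
                                List<String> defaultRoles) {
        if (!fromMappingRules.isEmpty()) return fromMappingRules; // 1. claim mapping rules
        if (!fromTokenRoles.isEmpty()) return fromTokenRoles;     // 2. roles claim in the token
        return defaultRoles;                                      // 3. configured defaults
    }
}
```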
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environment networks now include the tenant ID to prevent cross-tenant
collisions: cameleer-env-{tenantId}-{envSlug} instead of
cameleer-env-{envSlug}. Without this, two tenants with a "dev"
environment would share the same Docker network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of hardcoding cameleer-traefik as the primary network for
deployed app containers, use CAMELEER_DOCKER_NETWORK (env var). In
SaaS mode this is the tenant-isolated network (cameleer-tenant-{slug}).
Apps still connect to cameleer-traefik (for routing) and
cameleer-env-{slug} (for intra-environment discovery) as additional
networks.
This enables per-tenant network isolation: apps deployed by tenant A
cannot reach apps deployed by tenant B since they share no network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The tableSection card wrapper broke the flex height chain — DataTable's
fillHeight couldn't constrain to viewport. Added .tableWrap with
flex: 1, min-height: 0, display: flex to re-establish the chain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The relative `to="apps"` didn't resolve correctly. All other legacy
redirects use absolute paths (`to="/apps"`, `to="/runtime"`).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- 7.1: Add deployment status badge (StatusDot + Badge) to AppsTab app
list, sourced from catalog.deployment.status via slug lookup
- 7.3: Add X close button to top-right of exchange detail right panel
in ExchangesPage (position:absolute, triggers handleClearSelection)
- 7.5: PunchcardHeatmap shows "Requires at least 2 days of data"
when timeRangeMs < 2 days; DashboardL1 passes the range down
- 7.6: Command palette exchange results truncate IDs to ...{last8}
matching the exchanges table display
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds amber edit-mode banners to AppConfigDetailPage and both
DefaultResourcesSection/JarRetentionSection in EnvironmentsPage,
matching the existing ConfigSubTab pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- RolesTab: wrap \u00b7 in JS expression {'\u00b7'} so JSX renders the middle dot correctly instead of literal backslash-u sequence
- UsersTab: add confirm password field with mismatch validation, hint text for password policy, and reset on cancel/success
- UserManagement.module.css: add .hintText style for password policy hint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Throughput chart: divide totalCount by bucket duration (seconds) so Y-axis shows true msg/s instead of raw bucket counts; fixes flat-line appearance when TPS is low but totalCount is large
- Error Rate chart: convert failedCount/totalCount to percentage; change yLabel from "err/h" to "%" to match KPI stat card unit
- Memory chart: add threshold line at jvm.memory.heap.max so chart Y-axis extends to max heap and shows the reference line (spec 5.3)
- Agent state: suppress containerStatus badge when value is "UNKNOWN"; only render it with "Container: <state>" label when a non-UNKNOWN secondary state is present (spec 5.4)
- DashboardTab chartGrid: add pointer-events:none with pointer-events:auto on children so the chart grid overlay does not intercept clicks on the Application Health table rows below (spec 5.5)
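The throughput and error-rate conversions in the first two items amount
to the arithmetic below. This is a Java sketch of the formulas only; the
charts themselves live in the UI, and the names ChartMath,
messagesPerSecond, and errorRatePercent are hypothetical. Returning 0 for
an empty bucket is also an assumption.

```java
// Hypothetical helpers illustrating the per-bucket arithmetic.
final class ChartMath {
    // Raw bucket count divided by bucket length gives true msg/s.
    static double messagesPerSecond(long totalCount, long bucketSeconds) {
        if (bucketSeconds <= 0) {
            throw new IllegalArgumentException("bucketSeconds must be positive");
        }
        return (double) totalCount / bucketSeconds;
    }

    // Failed share of the bucket, expressed as a percentage.
    static double errorRatePercent(long failedCount, long totalCount) {
        return totalCount == 0 ? 0.0 : 100.0 * failedCount / totalCount;
    }
}
```

For example, 600 exchanges in a 60-second bucket plot as 10 msg/s rather
than 600.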
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Attributes column is now hidden when no exchanges in the current view
have attributes; shown conditionally via hasAttributes check on rows
- Status labels already standardized via statusLabel() in ExchangeHeader
- Agent names truncated to last two hyphen-separated segments via
shortAgentName(); full name preserved as tooltip title
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- formatDuration and formatDurationShort now show Xm Ys for durations >= 60s (e.g. "5m 21s" instead of "321s") and 1 decimal for 1-60s range ("6.7s" instead of "6.70s")
- Exchange ID column shows last 8 chars with ellipsis prefix; full ID on hover, copies to clipboard on click
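The duration rule from the first item, sketched in Java for illustration
(the real formatDuration is a UI utility and is not reproduced here).
The sub-second branch is an assumption, since the message only specifies
behavior from 1s upward.

```java
import java.util.Locale;

// Hypothetical sketch of the formatting thresholds.
final class DurationFormat {
    static String formatDuration(long millis) {
        if (millis >= 60_000) {
            long totalSeconds = millis / 1000;
            // 60s and above: whole minutes plus remaining whole seconds.
            return (totalSeconds / 60) + "m " + (totalSeconds % 60) + "s";
        }
        if (millis >= 1_000) {
            // 1s up to 60s: one decimal place.
            return String.format(Locale.ROOT, "%.1fs", millis / 1000.0);
        }
        // Assumption: sub-second values are shown in milliseconds.
        return millis + "ms";
    }
}
```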
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Override design system tokens in app root CSS: --text-muted raised to 4.5:1
contrast in both light (#766A5E) and dark (#9A9088) modes; --text-faint dark
mode raised from catastrophic 1.4:1 to 3:1 (#6A6058). Migrate --text-faint
usages on readable text (empty states, italic notes, buttons) to --text-muted.
Raise all 10px and 11px font-size declarations to 12px floor.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
adminFetch called logout() directly on 401/403 responses, which cleared
roles and caused RequireAdmin to redirect to /exchanges while users were
editing forms. Now adminFetch attempts a token refresh before failing,
and RequireAdmin tolerates a transient empty-roles state during refresh.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detailed step-by-step plan covering critical bug fixes, layout/interaction
consistency, WCAG contrast compliance, data formatting, chart fixes, and
admin polish. Each task includes exact file paths, code snippets, and
verification steps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Playwright-driven audit of the live UI (build 69dcce2, 60+ screenshots)
covering all pages, CRUD lifecycles, design consistency, and interaction
patterns. Spec defines 8 batches of work: critical bugs, layout
consistency, interaction consistency, contrast/readability, data
formatting, chart fixes, admin polish, and nice-to-have items.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.
Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.
Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add environment parameter to AgentEventsController, AgentEventService,
and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.
Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
detail, correlation chains, and processor metrics all filter by
selected environment.
- Backend RouteMetricsController now accepts environment parameter for
both route and processor metrics endpoints.
Closes #XYZ
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DockerEventMonitor only reacted to Docker events. If an event was
missed (e.g., during reconnect or startup race), a DEGRADED deployment
with all replicas healthy would never promote back to RUNNING.
Add a @Scheduled reconciliation (every 30s) that inspects actual
container state and corrects deployment status mismatches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full audit of design system adoption, color consistency, inline styles,
layout patterns, and CSS module duplication across the server UI.
Includes 6-phase fix plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prevent removal of last ADMIN role via role unassign, user delete,
or group role removal (returns 409 Conflict)
- Add password policy: min 12 chars, 3/4 character classes, no username
- Add brute-force protection: 5 attempts then 15min lockout, IP rate limit
- Add token revocation on password change via token_revoked_before column
- V9 migration adds failed_login_attempts, locked_until, token_revoked_before
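A minimal sketch of the password policy check. The four character
classes (lowercase, uppercase, digit, other) and the case-insensitive
username test are assumptions; the message states only "3/4 character
classes" and "no username". The class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical validator illustrating the stated policy.
final class PasswordPolicy {
    static boolean isValid(String password, String username) {
        if (password == null || password.length() < 12) return false;

        // Reject passwords that embed the username (assumed case-insensitive).
        if (username != null && !username.isEmpty()
                && password.toLowerCase(Locale.ROOT).contains(username.toLowerCase(Locale.ROOT))) {
            return false;
        }

        // Count how many of the four assumed character classes are present.
        int classes = 0;
        if (password.chars().anyMatch(Character::isLowerCase)) classes++;
        if (password.chars().anyMatch(Character::isUpperCase)) classes++;
        if (password.chars().anyMatch(Character::isDigit)) classes++;
        if (password.chars().anyMatch(c -> !Character.isLetterOrDigit(c))) classes++;
        return classes >= 3;
    }
}
```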
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes pre-existing TS2322 where Record<string, string> was not
assignable to the StatusDotVariant union type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate 20+ duplicate function definitions across UI components into
three shared util files (format-utils, agent-utils, config-draft-utils).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability
Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compare app.updatedAt with deployment.deployedAt — if config was
modified after the deployment started, show a primary "Redeploy" button
in the Actions column. Also show a toast hint after saving config:
"Redeploy to apply changes to running deployments."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename cpuShares to cpuRequest (millicores), cpuLimit from cores to
millicores. ResolvedContainerConfig translates to Docker-native units
via dockerCpuShares() and dockerCpuQuota() helpers. Future K8s
orchestrator can pass millicores through directly.
- Fix waitForAnyHealthy to wait for ALL replicas instead of returning
on first healthy one. Prevents false DEGRADED status with 2+ replicas.
- Default app detail to Configuration tab (was Overview)
- Reorder config sub-tabs: Monitoring, Resources, Variables, Traces &
Taps, Route Recording
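One plausible shape for the millicore translation in the first item. The
helper names match those mentioned above, but the bodies are assumptions:
they use the common convention of 1024 shares per core and Docker's
default 100000 microsecond CFS period, which may differ from what
ResolvedContainerConfig actually does.

```java
// Hypothetical bodies for the unit-translation helpers.
final class CpuUnits {
    // Docker's default CFS scheduling period, in microseconds.
    static final long CFS_PERIOD_MICROS = 100_000;

    // 1000 millicores (one core) maps to 1024 shares; Docker's minimum is 2.
    static long dockerCpuShares(int millicores) {
        return Math.max(2, millicores * 1024L / 1000);
    }

    // Quota is the slice of each CFS period the container may use.
    static long dockerCpuQuota(int millicores) {
        return millicores * CFS_PERIOD_MICROS / 1000;
    }
}
```

Under these assumptions a 500m request becomes 512 shares and a 500m
limit becomes a quota of 50000 microseconds per period.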
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Normalize the sidebar selectedPath so the app highlight persists across
tab switches (Dashboard, Runtime, Deployments). Also make sidebar clicks
tab-aware: clicking an app navigates to the current tab's path instead
of always going to /exchanges/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate route catalog (agent-driven) and apps table
(deployment-driven) into a single GET /api/v1/catalog?environment={slug}
endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.
- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unify route catalog (agent-driven) and apps table (deployment-driven)
into a single catalog endpoint. Apps table becomes authoritative,
agent data enriches with live health/routes. Slug-based URLs replace
UUIDs for navigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redesign DeploymentProgress component: track-based layout with amber
brand color, checkmarks for completed steps, user-friendly labels
(Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
stages, database migration reference, and REST endpoint summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The runtime-base image and all agent Dockerfiles now read
CAMELEER_SERVER_URL instead of CAMELEER_EXPORT_ENDPOINT.
Updated the volume-mode entrypoint override to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When CAMELEER_JAR_DOCKER_VOLUME is set, the orchestrator mounts the
named volume at the jar storage path instead of using a host bind mount.
This solves the path translation issue in Docker-in-Docker setups where
the server runs inside a container and manages sibling containers.
The entrypoint is overridden to use the volume-mounted JAR path via
the CAMELEER_APP_JAR env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker's connectToNetworkCmd needs the network ID (not name) and the
container's network sandbox must be ready. Move the network connection
to DeploymentExecutor, where DockerNetworkManager handles ID resolution
and the container is already started.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nginx defaults to 1MB body size, causing 413 on JAR uploads through
the UI proxy. Matches the Spring Boot multipart limit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The opacity:0 approach caused the native "Choose File" button to
appear in the accessibility tree and compete for clicks. The clip
pattern properly hides the input while keeping it functional for
programmatic .click().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Some browsers block programmatic .click() on display:none inputs.
Using position:absolute + opacity:0 keeps the input in the render tree.
Also added type="button" to prevent any form-submission interference.
Applied to both create page and detail view file inputs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Listens to Docker daemon events (die, oom, start, stop) for containers
labeled managed-by=cameleer3-server, updates replica states in Postgres,
and recomputes aggregate deployment status (RUNNING/DEGRADED/FAILED).
Bean is wired in RuntimeOrchestratorAutoConfig via instanceof guard so it
only activates when Docker is available.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extends Deployment with targetState, deploymentStrategy, replicaStates
(List<Map<String,Object>>), and deployStage. Updates withStatus() to
carry the new fields through.
Adds DEGRADED and STOPPING to DeploymentStatus (reordered for lifecycle
clarity). Introduces DeployStage enum for tracking orchestration progress
through PRE_FLIGHT → COMPLETE.
The cameleer-traefik network disables inter-container communication
so app containers cannot reach each other directly — only through
Traefik. Environment networks keep ICC enabled for intra-env comms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per-environment "keep last N versions" setting (default 5, null for
unlimited). Nightly scheduled job at 03:00 deletes old versions from
both database and disk, skipping any version that is currently deployed.
Full stack:
- V6 migration: adds jar_retention_count column to environments
- Environment record, repository, service, admin controller endpoint
- JarRetentionJob: @Scheduled nightly, iterates environments and apps
- UI: retention policy editor on admin Environments page with
toggle between limited/unlimited and version count input
- AppVersionRepository.delete() for version cleanup
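The retention rule can be sketched as a pure selection function. This is
an illustration under assumptions: versions are represented as integers
ordered newest first, and the class and method names are hypothetical.
The real JarRetentionJob also removes files from disk, which is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "keep last N, never delete what is deployed".
final class JarRetention {
    static List<Integer> versionsToDelete(List<Integer> newestFirst,
                                          Integer keepCount,
                                          Set<Integer> deployed) {
        List<Integer> toDelete = new ArrayList<>();
        if (keepCount == null) return toDelete; // null means unlimited retention

        // Everything past the first keepCount entries is a deletion candidate.
        for (int i = Math.max(0, keepCount); i < newestFirst.size(); i++) {
            Integer version = newestFirst.get(i);
            if (!deployed.contains(version)) { // skip currently deployed versions
                toDelete.add(version);
            }
        }
        return toDelete;
    }
}
```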
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New "Default Resource Limits" section in environment detail view with
memory limit/reserve, CPU shares/limit. These defaults apply to new
apps unless overridden per-app.
Added useUpdateDefaultContainerConfig hook for the PUT endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environment Variables moved from Resources into a dedicated "Variables"
tab, placed first in the tab order since it's the most commonly needed
config when creating new apps.
Tab order:
- Create page: Variables | Monitoring | Resources
- Detail page: Variables | Monitoring | Traces & Taps | Route Recording | Resources
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
As the user types the app name, the URL builds in real-time:
/{envSlug}/{appSlug}/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Config sub-tabs are now: Monitoring | Traces & Taps | Route Recording | Resources
(renamed from Agent/Infrastructure, with traces and recording as their own tabs).
Also increase Spring multipart max-file-size and max-request-size to 200MB
to fix HTTP 413 on JAR uploads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ConfigSubTab now uses inner tabs (Agent / Infrastructure):
- Agent: observability settings, compress success, traces & taps table,
route recording toggles
- Infrastructure: container resources, exposed ports, environment variables
This completes the Config tab consolidation — all features from the
standalone Config page now live in the Deployments tab.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logs functionality already exists in Runtime tab (AgentHealth/AgentInstance).
Config functionality moved to Deployments tab ConfigSubTab.
Old routes redirect to /runtime and /apps respectively.
Navigation links updated throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents accidental app deletion by requiring the user to type the app
slug before confirming.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline create form with a modal that handles the full flow:
- Name → auto-computed slug (editable if needed)
- Environment picker
- JAR file upload
- "Deploy immediately" toggle (on by default)
- Single "Create & Deploy" button runs all three API calls sequentially
with step indicator
After creation, navigates directly to the new app's detail view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Overview sub-tab:
- Deployments table with env badge, version, status, URL, deployed time
- Actions (Start/Stop) scoped to selected environment; other envs show
"switch env to manage" hint with muted rows
- Versions list with per-env deploy target picker
Configuration sub-tab:
- Read-only by default with Edit mode gate (Cancel/Save banner)
- Agent observability: engine level, payload capture with size unit
selector, log levels, metrics toggle, sampling, replay and route
control (default enabled)
- Container resources: memory/CPU limits, exposed ports as deletable
pills with inline add input
- Environment variables: key-value editor with add/remove
- Reuses existing ApplicationConfig API for agent config push via SSE
Tab renamed from "Apps" to "Deployments" in the tab bar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The list endpoint on EnvironmentAdminController now overrides the
class-level ADMIN guard with isAuthenticated(), so VIEWERs can see
the environment selector. The LayoutShell merges environments from
both the table and agent heartbeats, so the selector always shows
configured environments even when no agents are connected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apps.updated_at already exists from V3. The duplicate ALTER caused
Flyway to fail on startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apps tab visible to OPERATOR+ (hidden for VIEWER), scoped by
sidebar app selection and environment filter
- List view: DataTable with name, environment, updated, created columns
- Detail view: deployments across all envs, version upload with
per-env deploy target, container config form (resources, ports,
custom env vars) with explicit Save
- Memory reserve field disabled for non-production environments
with info hint
- Admin sidebar sorted alphabetically, Applications entry removed
- Old admin AppsPage deleted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- V5 migration: container_config JSONB + updated_at on apps,
default_container_config JSONB on environments
- App/Environment records updated with new fields
- PUT /apps/{id}/container-config endpoint for per-app config
- PUT /admin/environments/{id}/default-container-config for env defaults
- GET /apps now supports optional environmentId (lists all when omitted)
- AppRepository.findAll() for cross-environment app listing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SplitPane layout with environment selector, app list, and detail pane
- Create/delete apps with slug uniqueness validation
- Upload JAR versions with file size display
- Deploy versions and stop running deployments with status badges
- Deployment list auto-refreshes every 5s for live status updates
- Registered at /admin/apps with sidebar entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add GET /api/v1/auth/me endpoint returning current user's UserDetail
- Add AboutMeDialog component with role badges and group memberships
- Add userMenuItems prop to TopBar via design-system update
- Wire "About Me" menu item into user dropdown above Logout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getDirectRolesForUser filtered on origin='direct', which excluded
roles assigned via claim mapping (origin='managed'). This caused
OIDC users to appear roleless even when claim mappings matched.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When no claim mapping rules are configured or none match the JWT
claims, fall back to assigning the OidcConfig.defaultRoles (e.g.
VIEWER). This restores the behavior that was lost when syncOidcRoles
was replaced with claim mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SplitPane with create/edit/delete, production flag toggle,
enabled/disabled toggle. Follows existing admin page patterns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environments now have:
- production (bool): prod vs non-prod resource allocation
- enabled (bool): disabled blocks new deployments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local login was blocked when OIDC env vars were present, causing
bootstrap to fail (chicken-and-egg: bootstrap needs local auth to
configure OIDC). The backend now accepts both auth paths; the
frontend/UI decides which login flow to present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- JAR storage path, base image, Docker network
- Container memory/CPU limits, health check timeout
- Routing mode and domain for Traefik integration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EnvironmentAdminController: CRUD under /api/v1/admin/environments (ADMIN)
- AppController: CRUD + JAR upload under /api/v1/apps (OPERATOR+)
- DeploymentController: deploy, stop, promote, logs under /api/v1/apps/{appId}/deployments
- Security rule for /api/v1/apps/** requiring OPERATOR or ADMIN role
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Async container deployment with health check polling
- Stops previous deployment before starting new one
- Configurable memory, CPU, health timeout via application properties
- @EnableAsync on application class for Spring async proxy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EnvironmentService: CRUD with slug uniqueness, default env protection
- AppService: CRUD, JAR upload with SHA-256 checksumming
- DeploymentService: create, promote, status transitions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- EnvironmentRepository, AppRepository, AppVersionRepository, DeploymentRepository
- RuntimeOrchestrator interface with ContainerRequest and ContainerStatus
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SecurityBeanConfig uses Ed25519SigningServiceImpl.ephemeral() when no jwt-secret
- Fixes pre-existing application context failure in integration tests
- Reverts test jwt-secret from application-test.yml (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ClaimMappingAdminControllerIT with create+list and delete tests
- Add adminHeaders() convenience method to TestSecurityHelper
- Add jwt-secret to test profile (fixes pre-existing Ed25519 init failure)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ADMIN-only REST endpoints at /api/v1/admin/claim-mappings
- Full CRUD: list, get by ID, create, update, delete
- OpenAPI annotations for Swagger documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- UiAuthController.login returns 404 when OIDC issuer is configured
- JwtAuthenticationFilter skips internal user tokens in OIDC mode (agents still work)
- UserAdminController.createUser and resetPassword return 400 in OIDC mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- OidcUserInfo now includes allClaims map from id_token + access_token
- OidcAuthController.callback() calls applyClaimMappings instead of syncOidcRoles
- applyClaimMappings evaluates rules, clears managed assignments, applies new ones
- Supports both assignRole and addToGroup actions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add clearManagedAssignments, assignManagedRole, addUserToManagedGroup to interface
- Update assignRoleToUser and addUserToGroup to explicitly set origin='direct'
- Update getDirectRolesForUser to filter by origin='direct'
- Implement managed assignment methods with ON CONFLICT upsert
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GET /api/v1/admin/license returns current license info
- POST /api/v1/admin/license validates and loads new license token
- Requires ADMIN role, validates Ed25519 signature before applying
- OpenAPI annotations for Swagger documentation
- LicenseBeanConfig wires LicenseGate bean with startup validation
- Supports token from CAMELEER_LICENSE_TOKEN env var or CAMELEER_LICENSE_FILE path
- Falls back to open mode when no license or no public key configured
- Add license config properties to application.yml
- JdbcTemplate-based CRUD for claim_mapping_rules table
- RbacBeanConfig wires ClaimMappingRepository and ClaimMappingService beans
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread-safe AtomicReference-based license holder
- Defaults to open mode (all features enabled) when no license loaded
- Runtime license loading with feature/limit queries
- Unit tests for open mode and licensed mode
- Evaluates JWT claims against mapping rules
- Supports equals, contains (list + space-separated), regex match types
- Results sorted by priority
- 7 unit tests covering all match types and edge cases
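The rule evaluation listed above could look roughly like the sketch below. All class, field, and method names are illustrative and not taken from the codebase. Two points are assumptions: a lower priority value sorts first, and a regex rule must match the whole claim value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical rule type; only the three match types come from the commit message.
final class ClaimRuleSketch {
    enum MatchType { EQUALS, CONTAINS, REGEX }

    final String claim;     // name of the claim to inspect
    final MatchType type;
    final String expected;  // literal value, list element, or regex, depending on type
    final int priority;     // assumption: lower value sorts first
    final String action;    // e.g. the role or group to assign

    ClaimRuleSketch(String claim, MatchType type, String expected, int priority, String action) {
        this.claim = claim;
        this.type = type;
        this.expected = expected;
        this.priority = priority;
        this.action = action;
    }

    boolean matches(Map<String, Object> claims) {
        Object value = claims.get(claim);
        if (value == null) {
            return false;
        }
        switch (type) {
            case EQUALS:
                return expected.equals(value.toString());
            case CONTAINS:
                // JSON array claims arrive as collections.
                if (value instanceof Collection) {
                    for (Object item : (Collection<?>) value) {
                        if (expected.equals(String.valueOf(item))) {
                            return true;
                        }
                    }
                    return false;
                }
                // Otherwise treat the claim as a space-separated string.
                return Arrays.asList(value.toString().trim().split("\\s+")).contains(expected);
            case REGEX:
                // Assumption: the pattern must match the entire value.
                return Pattern.matches(expected, value.toString());
            default:
                return false;
        }
    }

    // Returns the actions of all matching rules, ordered by priority.
    static List<String> evaluate(List<ClaimRuleSketch> rules, Map<String, Object> claims) {
        List<ClaimRuleSketch> hits = new ArrayList<>();
        for (ClaimRuleSketch rule : rules) {
            if (rule.matches(claims)) {
                hits.add(rule);
            }
        }
        hits.sort(Comparator.comparingInt((ClaimRuleSketch r) -> r.priority));
        List<String> actions = new ArrayList<>();
        for (ClaimRuleSketch rule : hits) {
            actions.add(rule.action);
        }
        return actions;
    }
}
```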
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validates payload.signature license tokens using Ed25519 public key
- Parses tier, features, limits, timestamps from JSON payload
- Rejects expired and tampered tokens
- Unit tests for valid, expired, and tampered license scenarios
- AssignmentOrigin enum (direct/managed)
- ClaimMappingRule record with match type and action enums
- ClaimMappingRepository interface for CRUD operations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add origin and mapping_id columns to user_roles and user_groups
- Create claim_mapping_rules table with match_type and action constraints
- Update primary keys to include origin column
- Add indexes for fast managed assignment cleanup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract inline fontSize/color styles from LogTab, LayoutShell,
UsersTab, GroupsTab, RolesTab, and LevelFilterBar into CSS modules.
Follows project convention of CSS modules over inline styles.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update @cameleer/design-system to v0.1.38 (12px minimum font size).
Replace all 10px and 11px font sizes with 12px across 25 CSS modules
and 5 TSX inline styles to match the new DS floor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Expose getDirectRolesForUser on RbacService interface so syncOidcRoles
compares against directly-assigned roles only, not group-inherited ones
- Remove early-return that preserved existing roles when OIDC returned
none — now always applies defaultRoles as fallback
- Update CLAUDE.md and SERVER-CAPABILITIES.md to reflect changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logto returns opaque access tokens unless the resource parameter is
included in both the authorization request AND the token exchange.
Append resource to the token endpoint POST body per RFC 8707 so Logto
returns a JWT access token with Custom JWT claims.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend holds client_secret and does the token exchange server-side,
making PKCE redundant. Removes code_verifier/code_challenge from all
frontend auth paths and backend exchange method. Eliminates the source
of "grant request is invalid" errors from verifier mismatches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OIDC login flow now reads roles from the access_token (JWT) in
addition to the id_token. This fixes role extraction with providers
like Logto that put scopes/roles in access tokens rather than id_tokens.
- Add audience and additionalScopes to OidcConfig for RFC 8707 resource
indicator support and configurable extra scopes
- OidcTokenExchanger decodes access_token with at+jwt-compatible processor,
falls back to id_token if access_token is opaque or has no roles
- syncOidcRoles preserves existing local roles when OIDC returns none
- SPA includes resource and additionalScopes in authorization requests
- Admin UI exposes new config fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
extractRoles() only handled List claims (JSON arrays). When rolesClaim
is configured as "scope", the JWT value is a space-delimited string,
which was silently returning [] and falling back to defaultRoles.
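A minimal sketch of the fixed behaviour, assuming the claim value has already been read from the decoded JWT as a plain Object. The class name is hypothetical and the method below is not the project's actual extractRoles().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class RoleClaimSketch {
    // Accepts either a JSON array (Collection) or a space-delimited string such as "scope".
    static List<String> extractRoles(Object claimValue) {
        List<String> roles = new ArrayList<>();
        if (claimValue instanceof Collection) {
            for (Object item : (Collection<?>) claimValue) {
                if (item != null) {
                    roles.add(item.toString());
                }
            }
        } else if (claimValue instanceof String) {
            for (String part : ((String) claimValue).trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    roles.add(part);
                }
            }
        }
        // Any other type (or null) yields an empty list, so the caller can apply defaultRoles.
        return roles;
    }
}
```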
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logs received scopes, rolesClaim path, extracted roles, and all claim
keys at each stage of the OIDC auth flow to help debug Logto integration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker build copies package.json before source, so public/ doesn't
exist when npm ci runs postinstall. Use mkdirSync(recursive:true).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
favicon.svg is now copied from @cameleer/design-system/assets on
npm install via postinstall hook. Removed from git tracking
(.gitignore). Updates automatically when DS version changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fire end-session via fetch(no-cors) instead of window.location redirect.
Always navigate to /login?local regardless of whether end-session
succeeds, preventing broken JSON responses from blocking logout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace PNG favicons and brand logos with cameleer3-logo.svg from
@cameleer/design-system/assets. Favicon, login dialog, and sidebar
all use the same SVG. Remove PNG favicon files from public/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DS now exports ./assets/* — import PNGs directly via Vite instead of
copying to public/. Removes duplicated brand files from public/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The @cameleer/design-system package.json exports field doesn't include
assets/, causing production build failures. Copy PNGs to public/ and
reference via basePath until DS adds asset exports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add VITE_APP_VERSION build arg to UI Dockerfile, pass short SHA from
CI docker build step. vite.config.ts truncates to 7 chars so both
CI build and Docker build produce consistent short hashes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Import PNGs via Vite from @cameleer/design-system/assets instead of
copying to public/. Only favicons remain in public/ (needed by HTML).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SVG uses fill=currentColor (inherits text color). Switch to the
full-color PNG brand icons: 192px for login dialog, 48px for sidebar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hand-crafted favicon.svg with official brand assets from
@cameleer/design-system v0.1.32: PNG favicons (16/32px) and
camel-logo.svg for login dialog and sidebar. Update SecurityConfig
public endpoints accordingly. Update documentation for architecture
cleanup (PKCE, OidcProviderHelper, role normalization, K8s hardening,
Dockerfile credential removal, CI deduplication, sidebar path fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The standard nginx image requires root to modify /etc/nginx/conf.d
and create /var/cache/nginx directories during startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract OidcProviderHelper for shared discovery + JWK source construction
- Add SystemRole.normalizeScope() to centralize role normalization
- Merge duplicate claim extraction in OidcTokenExchanger
- Add PKCE (S256) to OIDC authorization flow (frontend + backend)
- Add SecurityContext (runAsNonRoot) to all K8s deployments
- Fix postgres probe to use $POSTGRES_USER instead of hardcoded username
- Remove default credentials from Dockerfile
- Extract sanitize_branch() to shared .gitea/sanitize-branch.sh
- Fix sidebar to use /exchanges/ paths directly, remove legacy redirects
- Centralize basePath computation in router.tsx via config module
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logout now always redirects to /login?local, either via OIDC
end_session or as a direct fallback, preventing prompt=none
auto-redirect from logging the user back in immediately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When on Config tab: clicking an app navigates to /config/:appId (shows
that app's config with detail panel). Clicking a route navigates to
/config/:appId (same app config, since config is per-app not per-route).
Clicking Applications header navigates to /config (all apps table).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Config tab now always visible (not just when app selected). Shows all-
app config table at /config, single-app detail at /config/:appId.
Fixed 404 when clicking sidebar nodes while on Config tab — the sidebar
navigation built /config/appId/routeId which had no route. Now falls
back to exchanges tab for route-level navigation from config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Hide Admin sidebar section for non-ADMIN users
- Add RequireAdmin route guard — /admin/* redirects to / for non-admin
- Move App Config from admin section to main Config tab (per-app,
visible when app selected). VIEWER sees read-only, OPERATOR+ can edit
- Hide diagram node toolbar for VIEWER (onNodeAction conditional)
- Add useIsAdmin/useCanControl helpers to centralize role checks
- Remove App Config from admin sidebar tree
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New input in the Claim Mapping section lets admins configure which
id_token claim is used as the unique user identifier (default: sub).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OIDC user login ID is now configurable via the admin OIDC setup
dialog (userIdClaim field). Supports dot-separated claim paths (e.g.
'email', 'preferred_username', 'custom.user_id'). Defaults to 'sub'
for backwards compatibility. Throws if the configured claim is missing
from the id_token.
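Resolving a dot-separated claim path might be sketched as follows, assuming the id_token claims are available as nested maps. The class and method names are hypothetical, and the exception type is an assumption.

```java
import java.util.Map;

final class ClaimPathSketch {
    // Walks nested claim maps one path segment at a time, e.g. "custom.user_id".
    static String resolveUserId(Map<String, Object> claims, String path) {
        Object current = claims;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException("Claim '" + path + "' is missing from the id_token");
            }
            current = ((Map<?, ?>) current).get(segment);
            if (current == null) {
                throw new IllegalArgumentException("Claim '" + path + "' is missing from the id_token");
            }
        }
        return current.toString();
    }
}
```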
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Roles from the id_token's rolesClaim are now diffed against stored
system roles on each OIDC login. Missing roles are added, revoked
roles are removed. Group memberships (manually assigned) are never
touched. This propagates scope revocations from the OIDC provider
on next user login.
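The diff described above reduces to two set differences. The sketch below uses hypothetical names and plain strings for roles; group memberships are deliberately not part of the input.

```java
import java.util.Set;
import java.util.TreeSet;

final class RoleSyncSketch {
    // Roles present in the token but not yet stored: these get added.
    static Set<String> toAdd(Set<String> stored, Set<String> fromToken) {
        Set<String> result = new TreeSet<>(fromToken);
        result.removeAll(stored);
        return result;
    }

    // Roles stored but no longer present in the token: these get removed.
    static Set<String> toRemove(Set<String> stored, Set<String> fromToken) {
        Set<String> result = new TreeSet<>(stored);
        result.removeAll(fromToken);
        return result;
    }
}
```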
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
M2M scope mapping now accepts both 'server:admin' and 'admin' (case-
insensitive). OIDC user login role assignment strips the 'server:'
prefix before looking up SystemRole, so 'server:viewer' from the
id_token maps to VIEWER correctly.
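A rough illustration of the prefix handling, assuming a role enum with ADMIN, OPERATOR, and VIEWER. The enum and method here are stand-ins, not the project's SystemRole.

```java
import java.util.Locale;
import java.util.Optional;

final class ScopeNormalizerSketch {
    // Stand-in for the real role enum; the constants are taken from roles named in this log.
    enum Role { ADMIN, OPERATOR, VIEWER }

    // Accepts "server:admin", "admin", "Server:Viewer", and so on.
    static Optional<Role> fromScope(String scope) {
        if (scope == null) {
            return Optional.empty();
        }
        String name = scope.trim().toUpperCase(Locale.ROOT);
        if (name.startsWith("SERVER:")) {
            name = name.substring("SERVER:".length());
        }
        try {
            return Optional.of(Role.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // unknown scope, no role mapped
        }
    }
}
```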
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without BASE_PATH the redirect fails behind a reverse proxy. Adding
?local prevents the SSO auto-redirect from immediately signing the
user back in after logout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logto signs id_tokens with ES384 by default. SecurityConfig already
included it but OidcTokenExchanger only had RS256 and ES256.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When prompt=none fails with consent_required (scopes not yet granted),
retry the OIDC flow without prompt=none so the user can grant consent
once. Uses sessionStorage flag to prevent infinite loops — falls back
to local login if the retry also fails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When OIDC is configured, the login page automatically redirects to the
provider with prompt=none. If the user has an active OIDC session, they
are signed in without seeing a login page. If the provider returns
login_required (no session), the page falls back to the login form via ?local.
Users can bypass auto-redirect with /login?local.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hardcoded /favicon.svg paths skip the <base> tag and fail when served
from a subpath like /server/. Now uses config.basePath in TSX and a
relative href in index.html so the <base> tag resolves correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Behind a reverse proxy with strip-prefix (e.g., Traefik at /server/),
the OIDC redirect_uri must include the prefix so the callback routes
back through the proxy. Now uses config.basePath (from <base href>)
instead of hardcoding '/'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OidcTokenExchanger fetched the discovery document from the issuerUri
as-is, but the database stores the issuer URI (e.g. /oidc), not the
full discovery URL. Logto returns 404 for the bare issuer path.
SecurityConfig already appended the well-known suffix — now the token
exchanger does the same.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OidcTokenExchanger cached securityProperties.isOidcTlsSkipVerify() in
the constructor as a boolean field. If Spring constructed the bean
before property binding completed, the cached value was false even when
the env var was set. SecurityConfig worked because it read the property
at call time. Now OidcTokenExchanger stores the SecurityProperties
reference and reads the flag on each call, matching SecurityConfig's
pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Java's automatic redirect following creates new connections that do NOT
inherit custom SSLSocketFactory/HostnameVerifier. This caused the OIDC
discovery fetch to fail on redirect even with TLS_SKIP_VERIFY=true.
Now disables auto-redirect and follows manually with SSL on each hop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Behind a reverse proxy the browser sends Origin matching the proxy's
public URL, which the single-origin CAMELEER_UI_ORIGIN rejects.
New env var accepts comma-separated origins and takes priority over
UI_ORIGIN, which remains as a backwards-compatible fallback.
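The precedence rule could be sketched like this. The commit does not name the new variable, so the two parameters simply stand for the comma-separated value and the legacy single-origin value; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CorsOriginsSketch {
    // multiOrigins wins when set; singleOrigin is only a fallback.
    static List<String> resolve(String multiOrigins, String singleOrigin) {
        boolean hasMulti = multiOrigins != null && !multiOrigins.trim().isEmpty();
        String source = hasMulti ? multiOrigins : singleOrigin;
        List<String> origins = new ArrayList<>();
        if (source == null) {
            return origins;
        }
        for (String part : source.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                origins.add(trimmed);
            }
        }
        return origins;
    }
}
```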
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Self-signed CA certs on the OIDC provider (e.g. Logto behind a reverse
proxy) cause the login flow to fail because Java's truststore rejects
the connection. This adds an opt-in env var that creates a trust-all
SSLContext scoped to OIDC HTTP calls only (discovery, token exchange,
JWKS fetch) without affecting system-wide TLS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.apiBaseUrl now derives from <base> tag when no explicit config
is set (e.g., /server/api/v1 instead of /api/v1). commands.ts authFetch
prepends apiBaseUrl and uses relative paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The second sed matched the just-injected <base href="/server/"> and
rewrote it to <base href="/server/server/">. Since Vite builds with
base: './' (relative paths), the <base> tag alone is sufficient.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When BASE_PATH is set (e.g., /server/), the entrypoint script injects
a <base> tag and rewrites asset paths in index.html. React Router reads
the basename from the <base> tag. Vite builds with relative paths.
Default / for standalone mode (no changes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When set, fetches JWKs from this URL directly instead of discovering
from the OIDC well-known endpoint. Needed when the public issuer URL
(e.g., https://domain.com/oidc) isn't reachable from inside containers
but the internal URL (http://logto:3001/oidc/jwks) is.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pg_isready without -U defaults to OS user "root" which doesn't exist
as a PostgreSQL role, causing noisy log entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
K8s $(VAR) substitution only resolves env vars defined earlier in the
list. PG_USER and PG_PASSWORD must come before DB_URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LOGTO_ENDPOINT and LOGTO_ADMIN_ENDPOINT are public-facing URLs that
Logto uses for OIDC discovery, issuer URI, and redirects. When behind
a reverse proxy (e.g., Traefik), set these to the external URLs.
Logto requires its own subdomain (not a path prefix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive standalone document covering API surface, agent protocol,
security, storage, multi-tenancy, deployment, and configuration — designed
for external systems (like the SaaS orchestration layer) that need to
understand and manage Cameleer3 Server instances.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Environment selector was losing its value on navigation because URL search
params were silently dropped by navigate() calls. Moved to a Zustand store
with localStorage persistence so the selection survives navigation, page
refresh, and new tabs. Switching environment now resets all filters, clears
URL params, invalidates queries, and remounts pages via Outlet key. Also
syncs openapi.json schema with running backend.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HeartbeatRequest now carries environmentId (cameleer3-common update).
Auto-heal prefers the heartbeat value (most current) over the JWT
claim, ensuring agents recover their correct environment immediately
on the first heartbeat after server restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 'env' claim to agent JWTs (set at registration, carried through
refresh). Auto-heal on heartbeat/SSE now reads environment from the
JWT instead of hardcoding 'default', so agents retain their correct
environment after server restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DS v0.1.31 changes .env wrapper to neutral button style matching
other TopBar controls. Simplified selector CSS to inherit all
font/color properties from the wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make the select transparent (no border, no background) so it
inherits the DS .env pill styling (success-colored badge with
mono font). Negative margins compensate for the pill padding.
Dropdown chevron uses currentColor to match the pill text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use unfiltered agent query to discover environments (avoids circular
filter). Always show selector even with single environment so it's
visible as a label. Default to ['default'] when no agents connected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update @cameleer/design-system to v0.1.30 which accepts ReactNode
for the environment prop. Move EnvironmentSelector from standalone
div into TopBar, rendering between theme toggle and user menu.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: Added optional `environment` query parameter to catalog,
search, stats, timeseries, punchcard, top-errors, logs, and agents
endpoints. ClickHouse queries filter by environment when specified
(literal SQL for AggregatingMergeTree, ? binds for raw tables).
StatsStore interface methods all accept environment parameter.
UI: Added EnvironmentSelector component (compact native select).
LayoutShell extracts distinct environments from agent data and
passes selected environment to catalog and agent queries via URL
search param (?env=). TopBar shows current environment label.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.
Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
"default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
updated ORDER BY (tenant→time→env→app), added tenant_id to
usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures
Closes #123
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When no exchange is selected, the topology-only diagram now shows
the RouteControlBar above it (if the agent supports routeControl
or replay and the user has OPERATOR/ADMIN role). This fixes a gap
where suspended routes with no recent exchanges had no way to be
resumed from the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The heartbeat now carries capabilities (per protocol v2 update).
On each heartbeat, capabilities are updated in the agent registry.
On auto-heal (server restart), capabilities from the heartbeat
are used instead of empty Map.of(), so the agent's feature flags
(replay, routeControl, logForwarding, etc.) are restored
immediately on the first heartbeat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert ClickHouseUsageTracker and ClickHouseMetricsQueryStore to
use JDBC parameterized queries (? binds) — these query raw tables
without AggregateFunction columns.
Fix lit(String) in RouteMetricsController and ClickHouseStatsStore
to escape backslashes before single quotes. Without this, an input
like \' becomes \\' after quote-only escaping and breaks out of the
string literal in ClickHouse (where \\ is an escaped backslash).
These must remain as literal SQL because
the ClickHouse JDBC 0.9.x driver wraps PreparedStatement in
sub-queries that strip AggregateFunction types, breaking -Merge
combinators.
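The ordering matters: backslashes must be doubled first, then quotes escaped. A minimal sketch of such a helper follows; it is an illustration of the technique, not the project's actual lit() implementation, and the class name is hypothetical.

```java
final class SqlLiteralSketch {
    // Renders a Java string as a single-quoted SQL string literal.
    static String lit(String value) {
        String escaped = value
                .replace("\\", "\\\\") // 1. double every backslash
                .replace("'", "\\'");  // 2. then escape single quotes
        return "'" + escaped + "'";
    }
}
```

With the steps reversed, the backslash added in front of each quote would itself be doubled, leaving the quote unescaped again.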
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running mvn sonar:sonar as a separate invocation skips child
modules. Combining verify and sonar:sonar in a single mvn
command ensures the reactor processes all modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Maven sonar plugin auto-detects sources and tests from the POM
module structure. Passing sonar.sources as CLI args caused path
doubling (module-dir/module-dir/src) in multi-module projects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The standalone sonar-scanner CLI has Java discovery issues in the
build container. Switch to the Maven sonar plugin (same approach
as cameleer3 agent repo), which uses Maven's own JDK. This also
removes the sonar-scanner download/install step entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonar-scanner 6.x checks SONAR_SCANNER_JAVA_HOME, not JAVA_HOME.
Despite JAVA_HOME being correct and java being on PATH, the scanner
uses its own env var for Java discovery.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
java binary may not be on PATH directly in the build container.
Derive JAVA_HOME from the jar binary location (which we know works)
and prepend JAVA_HOME/bin to PATH so sonar-scanner can find java.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonar-scanner 6.x requires JAVA_HOME or java on PATH. The build
container has Java installed but doesn't export JAVA_HOME, so
derive it from the java binary location.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jar xf doesn't preserve Unix file permissions from zip entries,
so the sonar-scanner binary lacks the execute bit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler
Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.
Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.
Also reverts async_insert config (doesn't work with JDBC inline VALUES).
Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).
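The buffering idea can be sketched as below, assuming a scheduler (not shown) calls flush() on the 5s interval. This is a simplified stand-in for WriteBuffer with hypothetical method names; the batchInsert callback represents the single multi-row INSERT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

final class WriteBufferSketch<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    // Called from request handlers: cheap, never touches the database.
    void offer(T item) {
        queue.add(item);
    }

    // Called periodically: drains everything queued so far into one batch insert.
    int flush(Consumer<List<T>> batchInsert) {
        List<T> batch = new ArrayList<>();
        T item;
        while ((item = queue.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            batchInsert.accept(batch); // one INSERT for the whole batch
        }
        return batch.size();
    }
}
```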
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m (millicores) CPU usage at 3 tx/s.
async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace DB-persisted keypair with deterministic derivation from
CAMELEER_JWT_SECRET via HMAC-SHA256 seed + seeded SHA1PRNG KeyPairGenerator.
Same secret = same key pair across restarts, no private key in the database.
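Only the first step, deriving a stable seed with HMAC-SHA256, is sketched here. The label parameter and the choice of the secret as the HMAC key are assumptions, and the seeded KeyPairGenerator step is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class KeySeedSketch {
    // Same secret and label always produce the same 32-byte seed.
    static byte[] deriveSeed(String secret, String label) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(label.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a required JDK algorithm, so this is not expected in practice.
            throw new IllegalStateException(e);
        }
    }
}
```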
Closes #121
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The keypair was generated ephemerally on each startup, causing agents
to reject all commands after a server restart (signature mismatch).
Now persisted to PostgreSQL server_config table and restored on startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Merge all V1-V11 migration scripts into one idempotent init.sql
- Simplify ClickHouseSchemaInitializer to load single file
- Replace route_diagrams projection with in-memory caches:
hashCache (routeId+instanceId → contentHash) warm-loaded on startup,
graphCache (contentHash → RouteGraph) lazy-populated on access
- Eliminates 9M+ row scans on diagram lookups
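The two-level cache might look like the following sketch. Graphs are represented as plain strings, the key format and all names are hypothetical, and the loader function stands in for the ClickHouse lookup by content hash.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class DiagramCacheSketch {
    // routeId + instanceId -> contentHash, filled once at startup
    private final Map<String, String> hashCache = new ConcurrentHashMap<>();
    // contentHash -> graph, filled lazily on first access
    private final Map<String, String> graphCache = new ConcurrentHashMap<>();
    private final Function<String, String> loadGraphByHash;

    DiagramCacheSketch(Function<String, String> loadGraphByHash) {
        this.loadGraphByHash = loadGraphByHash;
    }

    // Warm-load step: bulk-insert the known hashes.
    void warm(Map<String, String> knownHashes) {
        hashCache.putAll(knownHashes);
    }

    // Returns null when the route/instance pair is unknown.
    String findGraph(String routeId, String instanceId) {
        String hash = hashCache.get(routeId + "|" + instanceId);
        if (hash == null) {
            return null;
        }
        return graphCache.computeIfAbsent(hash, loadGraphByHash);
    }
}
```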
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 requires this setting before adding projections to
ReplacingMergeTree tables. Using 'drop' mode which discards the projection
during deduplication merges and rebuilds it afterward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alphabetical sort put V10/V11 before V2-V9 ("V11" < "V1_" in ASCII),
causing the route_diagrams projection to run before the table existed.
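One way to avoid the ordering problem is to sort by the parsed version number instead of the raw file name. The sketch assumes file names of the form V<number>_...; the class name and example file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MigrationOrderSketch {
    private static final Pattern VERSION = Pattern.compile("^V(\\d+)_");

    // Extracts the numeric version from a name such as "V10__add_projection.sql".
    static int version(String fileName) {
        Matcher matcher = VERSION.matcher(fileName);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Not a versioned migration: " + fileName);
        }
        return Integer.parseInt(matcher.group(1));
    }

    // Sorts numerically, so V2 runs before V10.
    static List<String> sorted(List<String> fileNames) {
        List<String> copy = new ArrayList<>(fileNames);
        copy.sort(Comparator.comparingInt(MigrationOrderSketch::version));
        return copy;
    }
}
```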
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set com.clickhouse log level to INFO and org.apache.hc.client5 to WARN
- Admin hooks (useUsers/useGroups/useRoles) now only fetch on admin pages,
eliminating AUDIT view_users entries on every UI click
- Add ClickHouse projection on route_diagrams for (tenant_id, route_id,
instance_id, created_at) to avoid full table scans on diagram lookups
- Bump @cameleer/design-system to v0.1.28 (PAUSED mode time range fix,
refreshTimeRange API)
- Call refreshTimeRange before invalidateQueries in PAUSED mode manual
refresh so sidebar clicks use current time window
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On first click, Dashboard was in non-split mode. The click set
selectedId locally then triggered split view, which remounted
Dashboard — losing the selectedId state.
Added activeExchangeId prop passed from ExchangesPage so the
selection survives the remount. Also syncs via useEffect when
parent changes selection (e.g. correlated exchange navigation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
useLiveQuery returned enabled:false when paused, which prevented
queries from running at all. Changed to enabled:true always —
PAUSED now means "fetch once, no polling" instead of "don't fetch".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After server restart, auto-healed agents register with empty
routeIds. The catalog only looked at agent registry for routes,
so routes and counts disappeared.
Now merges route IDs from ClickHouse stats_1m_route into the
catalog. Also includes apps that only exist in ClickHouse data
(no agent currently registered). Routes and exchange counts
survive server restarts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When autoRefresh is disabled, sidebar clicks now invalidate all
queries (queryClient.invalidateQueries()), triggering a re-fetch.
This gives users "click to refresh" behavior instead of stale data.
When LIVE mode is on, queries already poll at intervals, so no
invalidation is needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the Applications section is already expanded, clicking the
header now navigates to /{tab} (all applications) instead of
collapsing. When collapsed, clicking expands as before.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three related issues caused by in-memory agent registry being empty
after server restart:
1. JwtAuthenticationFilter rejected valid agent JWTs if agent wasn't
in registry — now authenticates any valid JWT regardless
2. Heartbeat returned 404 for unknown agents — now auto-registers
the agent from JWT claims (subject, application)
3. SSE endpoint returned 404 — same auto-registration fix
JWT validation result is stored as a request attribute so downstream
controllers can extract the application claim for auto-registration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The refresh endpoint required the agent to exist in the in-memory
registry. After server restart the registry is empty, so all refresh
attempts got 404. The refresh token itself is self-contained with
subject, application, and roles — the registry lookup is optional.
Now uses application from the JWT, falling back to registry only
if the agent happens to be registered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The system.processes query was returning its own row. Added
filter: query NOT LIKE '%system.processes%'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AdminLayout was a plain div with padding but no scroll. The parent
<main> has overflow:hidden, so admin page content beyond viewport
height was clipped. Added flex:1, overflow:auto, minHeight:0 to
make AdminLayout a proper scroll container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UsageFlushScheduler was a @Component with @ConditionalOnBean, but
ClickHouseUsageTracker is created via @Bean — component scan runs
first, so the condition always evaluated false. Events accumulated
in the WriteBuffer but flush() was never called.
Moved scheduler to @Bean in StorageBeanConfig with the same
@ConditionalOnProperty guard as the tracker.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Icons now reflect event type (UserPlus for registration, Skull
for dead, HeartPulse for recovery, Route for state changes, etc.)
while severity still drives the color. Updated in both
AgentInstance and AgentHealth pages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Disabled buttons now show reduced opacity (0.35) and muted icon
color instead of just changing the cursor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Buttons are disabled based on route state: Started disables
Start/Resume, Stopped disables Stop/Suspend/Resume, Suspended
disables Start/Suspend. State looked up from catalog API.
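The disable rules above can be read as a small lookup table. A minimal sketch, assuming hypothetical enum and class names (RouteControlRules, RouteState, Action); the real UI component is not shown in this commit:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; only the state-to-disabled-actions mapping comes from the commit message.
final class RouteControlRules {
    enum RouteState { STARTED, STOPPED, SUSPENDED }
    enum Action { START, STOP, SUSPEND, RESUME }

    // Actions that are disabled for the given route state.
    static Set<Action> disabledFor(RouteState state) {
        switch (state) {
            case STARTED:   return EnumSet.of(Action.START, Action.RESUME);
            case STOPPED:   return EnumSet.of(Action.STOP, Action.SUSPEND, Action.RESUME);
            case SUSPENDED: return EnumSet.of(Action.START, Action.SUSPEND);
            default:        return EnumSet.noneOf(Action.class);
        }
    }

    // A button is enabled when its action is not in the disabled set.
    static boolean isEnabled(RouteState state, Action action) {
        return !disabledFor(state).contains(action);
    }
}
```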
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Version injected at build time via VITE_APP_VERSION env var.
CI sets it to branch@sha. Falls back to 'dev' in local dev.
Displayed next to "Cameleer" in the sidebar header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
collectStarredItems used 'app:' prefix for route keys but
buildAppTreeNodes uses 'route:' prefix. Routes were starred
but never matched in the starred section.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Admin section stays in its fixed position (after Starred, before
Footer). Entering admin mode collapses Applications and Starred
but does not reorder sections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove Agents and Routes sections from sidebar. Layout is now:
Header (camel logo + Cameleer) → Search → Applications section →
Starred section (when items exist) → Footer (Admin + API Docs).
Admin accordion: clicking Admin navigates to /admin/rbac and
expands Admin section at top while collapsing Applications and
Starred. Clicking Applications exits admin mode.
Removed buildAgentTreeNodes and buildRouteTreeNodes from
sidebar-utils (no longer needed).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pool was hardcoded to 10 connections serving 7 concurrent write
streams + UI reads, causing "too many simultaneous queries" and
WriteBuffer overflow. Pool now defaults to 50 (configurable via
clickhouse.pool-size), flush interval reduced from 1000ms to 500ms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace local HeartbeatRequest DTO with the shared model from
cameleer3-common. Message types exchanged between server and agent
belong in the common module.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ACK-based route state inference with agent-reported state.
Heartbeats now carry optional routeStates map, and ROUTE_STATE_CHANGED
events update the registry immediately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defines two backward-compatible mechanisms for accurate route state
tracking: heartbeat extension (routeStates map in heartbeat body)
and ROUTE_STATE_CHANGED events for real-time updates. Covers
agent-side detection via Camel EventNotifier, server-side handling,
multi-agent conflict resolution, and migration path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add routeState field to RouteSummary DTO (null for started, 'stopped'
or 'suspended' for non-default states). Sidebar shows stop/pause icons
and state badge for affected routes in both Apps and Routes sections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In-memory registry that infers route state (started/stopped/suspended)
from successful route-control command ACKs. Updates state only when all
agents in a group confirm success.
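A minimal sketch of the "all agents must confirm" rule, assuming a hypothetical InferredRouteStates class, string state names, and a default of "started" for routes with no recorded state. The real registry API may differ:

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class; illustrates the update rule only.
final class InferredRouteStates {
    private final Map<String, String> stateByRoute = new ConcurrentHashMap<>();

    // acks holds one entry per agent in the group; true means the command succeeded.
    // The state changes only when every agent confirmed; an empty ACK set changes nothing (assumption).
    void onGroupAcks(String routeKey, String targetState, Collection<Boolean> acks) {
        boolean allConfirmed = !acks.isEmpty() && acks.stream().allMatch(Boolean::booleanValue);
        if (allConfirmed) {
            stateByRoute.put(routeKey, targetState);
        }
    }

    // Routes without a recorded state are treated as started (assumption).
    String stateOf(String routeKey) {
        return stateByRoute.getOrDefault(routeKey, "started");
    }
}
```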
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CommandGroupResponse and ConfigUpdateResponse types. Switch
useSendGroupCommand and useSendRouteCommand from openapi-fetch to authFetch
returning CommandGroupResponse. Update useUpdateApplicationConfig to return
ConfigUpdateResponse and fix all consumer onSuccess callbacks to access
saved.config.version instead of saved.version.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add addGroupCommandWithReplies() to AgentRegistryService that sends commands
to all LIVE agents in a group and returns CompletableFuture per agent for
collecting replies. Update sendGroupCommand() and pushConfigToAgents() to
wait with a shared 10-second deadline, returning CommandGroupResponse with
per-agent status, timeouts, and overall success. Config update endpoint now
returns ConfigUpdateResponse wrapping both the saved config and push result.
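The shared-deadline wait can be sketched as follows. This is an illustration under assumptions, not the actual AgentRegistryService code: the SharedDeadline class, the String reply type, and the "TIMEOUT"/"FAILURE" markers are hypothetical. The point is that one deadline is computed up front and each future only gets the time that remains, so the total wait stays bounded regardless of agent count:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; names and status strings are assumptions.
final class SharedDeadline {
    // Collects one status per agent, waiting at most timeoutMillis in total.
    static Map<String, String> awaitAll(Map<String, CompletableFuture<String>> replies, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Map<String, String> statusByAgent = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> entry : replies.entrySet()) {
            // Each agent only gets what is left of the shared budget.
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                statusByAgent.put(entry.getKey(), entry.getValue().get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException e) {
                statusByAgent.put(entry.getKey(), "TIMEOUT");
            } catch (ExecutionException e) {
                statusByAgent.put(entry.getKey(), "FAILURE");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                statusByAgent.put(entry.getKey(), "FAILURE");
            }
        }
        return statusByAgent;
    }
}
```

With a per-agent timeout instead, N unresponsive agents would block for N times the timeout; the shared deadline avoids that.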
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stop and suspend route commands now show a ConfirmDialog requiring
typed confirmation before dispatch. Start and resume execute
immediately without confirmation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Direct navigation to /admin/* now correctly opens Admin section
and collapses operational sections on first render. Previously
the accordion effect only triggered on route transitions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback: buildRouteTreeNodes was defined but never rendered.
Added Routes section between Agents and Admin. Removed duplicate
padding on admin pages (AdminLayout handles its own padding).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Search: DS renders dumb input, app owns filterQuery state and
passes it to each SidebarTree. Icon-rail click: fires both
onCollapseToggle and onToggle simultaneously, no navigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the previous "hide sidebar on admin" approach with a
composable compound component design. DS provides shell + building
blocks (Sidebar, Section, Footer, SidebarTree); consuming app
controls all content, section ordering, accordion behavior, and
icon-rail collapse.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback: breadcrumb memo had an unused isAdminPage branch
(TopBar no longer renders on admin pages). Added aria-label to
icon-only logout button for screen readers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AdminLayout gains a self-contained header (Back / Admin / user+logout)
with CSS module styles, replacing the inline padding wrapper. Admin
pages now render fully without the main app chrome.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass null as sidebar prop, guard TopBar and CommandPalette with
!isAdminPage, and remove conditional admin padding from main element.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LIKE is case-sensitive in ClickHouse. Switch to ILIKE for message,
stack_trace, and logger_name searches so queries match regardless
of casing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Recursive case-insensitive highlighting of the search query in
collapsed message, expanded full message, and stack trace. Uses the
project's amber accent color for the highlight mark.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use attributeBadgeColor() (hash-based) instead of "auto" so the same
application name gets the same badge color across all pages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 new query analyzer resolves countMerge(total_count)
in the CASE WHEN to the SELECT alias (UInt64) instead of the original
AggregateFunction column when the alias has the same name. Renamed
aliases to tc/fc to avoid the collision.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse rejects countMerge() in ORDER BY after GROUP BY because the
column is already finalized to UInt64. Use the SELECT alias instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The build container lacks unzip. The JDK jar command handles zip
extraction natively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The table and materialized view were missing the processor_type column,
causing the RouteMetricsController query to fail and the dashboard
processor metrics table to render empty.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /search/attributes/keys endpoint that queries distinct
attribute key names from ClickHouse using JSONExtractKeys. Attribute
keys appear in the cmd-k Attributes tab alongside attribute value
matches from exchange results.
- SearchIndex.distinctAttributeKeys() interface method
- ClickHouseSearchIndex implementation using arrayJoin(JSONExtractKeys)
- SearchController /attributes/keys endpoint
- useAttributeKeys() React Query hook
- buildSearchData includes attribute keys as 'attribute' category items
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Routes with the same name across different applications (e.g., "route1"
in both QUARKUS-APP and BACKEND-APP) were deduplicated because they
shared the same id (routeId). Use appId/routeId as the id so all
routes appear in cmd-k results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The searchData useMemo recomputed on every catalog poll cycle because
catalogData got a new array reference even when content was unchanged.
This caused the CommandPalette list to re-render and reset scroll.
Use a ref with deep equality check to keep a stable catalog reference,
only updating when the actual data changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The command palette renders matchContext via dangerouslySetInnerHTML
expecting HTML with <mark> tags, but extractSnippet() returned plain
text. Wrap the matched term in <mark> tags and escape surrounding
text to prevent XSS.
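The escape-then-mark idea, sketched in Java for illustration (the real extractSnippet() lives in the frontend; SnippetHighlighter and its methods are hypothetical names). Every part of the text is escaped, and the only raw HTML emitted is the <mark> pair itself:

```java
// Hypothetical helper; shows the technique, not the project's implementation.
final class SnippetHighlighter {
    // Escapes the five characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Case-insensitive search that keeps indices valid for the original string.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) return i;
        }
        return -1;
    }

    // Wraps the first match of term in <mark> and escapes prefix, match, and suffix.
    static String highlight(String text, String term) {
        int at = term.isEmpty() ? -1 : indexOfIgnoreCase(text, term);
        if (at < 0) return escapeHtml(text);
        int end = at + term.length();
        return escapeHtml(text.substring(0, at))
            + "<mark>" + escapeHtml(text.substring(at, end)) + "</mark>"
            + escapeHtml(text.substring(end));
    }
}
```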
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ID fields (execution_id, correlation_id, exchange_id) should use
exact equality, not LIKE with wildcards. LIKE is only needed for
the _search_text full-text columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The _search_text materialized column only contained error messages,
bodies, and headers — not execution_id, correlation_id, exchange_id,
or route_id. Searching by ID via cmd-k returned no results.
- Add ID fields to _search_text in ClickHouse DDL (covered by ngram
bloom filter index)
- Add direct LIKE matches on execution_id, correlation_id, exchange_id
in the text search WHERE clause for faster exact ID lookups
Requires ClickHouse table recreation (fresh install).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sidebar selectedPath now uses sidebarReveal on all tabs, not just
exchanges. This fixes sidebar highlighting on dashboard and runtime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The chunked ingestion path hardcoded hasTraceData=false because the
execution envelope doesn't carry processor bodies. But the processor
records DO have inputBody/outputBody — we just need to check them.
Track hasTraceData across chunks in PendingExchange and pass it to
MergedExecution when the final chunk arrives or on stale sweep.
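A minimal sketch of tracking the flag across chunks, assuming hypothetical PendingExchangeSketch and ProcessorRecord types with only the two body fields named above. The flag is sticky: once any processor record in any chunk carries a body, it stays true until the merged execution is built:

```java
import java.util.List;

// Hypothetical stand-in for PendingExchange; field and method names are assumptions.
final class PendingExchangeSketch {
    static final class ProcessorRecord {
        final String inputBody;
        final String outputBody;

        ProcessorRecord(String inputBody, String outputBody) {
            this.inputBody = inputBody;
            this.outputBody = outputBody;
        }
    }

    private boolean hasTraceData;

    // Called once per incoming chunk; never resets the flag back to false.
    void onChunk(List<ProcessorRecord> processors) {
        for (ProcessorRecord p : processors) {
            if (p.inputBody != null || p.outputBody != null) {
                hasTraceData = true;
            }
        }
    }

    // Read when the final chunk arrives or on stale sweep.
    boolean hasTraceData() {
        return hasTraceData;
    }
}
```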
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The identity rename (application→applicationId) broke search filtering
because the stale schema.d.ts still had 'application' as the field name.
The backend silently ignored the unknown field, returning unfiltered results.
- Regenerate openapi.json and schema.d.ts from live backend
- Fix Dashboard: application→applicationId in search request
- Fix RouteDetail: application→applicationId in search request (2 places)
- LayoutShell: scope command palette search by appId/routeId
- LayoutShell: pass sidebarReveal state on sidebar click navigation
Note for DS team: the Sidebar selectedPath logic (line 5451 in dist)
has a hardcoded pathname.startsWith("/exchanges/") guard. This should
be broadened to simply `S ? S : $.pathname` so sidebarReveal works on
all tabs (dashboard, runtime), not just exchanges.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
- Pass sidebarReveal state on sidebar navigation so the design system
can highlight the selected entry (it compares internal /apps/... paths
against this state value, not the browser URL)
- Command palette search now includes scope.appId and scope.routeId
so results are filtered to the current sidebar selection
Note: sidebar highlighting works on the exchanges tab. The design
system's selectedPath logic only checks pathname.startsWith("/exchanges/")
for sidebarReveal — a DS update is needed to support /dashboard/ and
/runtime/ tabs too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The AgentInstanceResponse backend DTO uses instanceId, displayName,
applicationId, status — but the stale schema.d.ts still had id, name,
application, state. This caused the runtime table to show no data.
- Update schema.d.ts AgentInstanceResponse fields
- Fix AgentHealth: row.id→instanceId, row.name→displayName,
row.application→applicationId, inst.id→instanceId
- Fix AgentInstance: agent.id→instanceId, agent.name→displayName
- Fix ExchangeHeader: agent.id→instanceId, agent.state→status
- Fix LayoutShell search: agent.state→status, agentTps→tps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite ClickHouse admin to show useful storage metrics instead of
often-empty system.events data. Add active queries section.
- Replace performance endpoint: query system.parts for disk size,
uncompressed size, compression ratio, total rows, part count
- Add /queries endpoint querying system.processes for active queries
- Frontend: storage overview strip, tables with total size, active
queries DataTable
- Fix AgentHealth.tsx type: agentId → instanceId in inline type cast
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.
- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
(docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Applications, routes within each app, and agents within each app
are now sorted by name using localeCompare.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend identity rename (applicationName → applicationId,
agentId → instanceId) was not reflected in the frontend. This caused
drilldown to fail (detail.applicationName was undefined, disabling
the diagram fetch) and various display issues.
Updated schema.d.ts, ExchangeHeader, ExecutionDiagram, Dashboard,
AgentHealth, AgentInstance, LayoutShell, LogTab, InfoTab, DetailPanel,
ExchangesPage, and tracing-store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tracks authenticated UI user requests to understand usage patterns:
- New ClickHouse usage_events table with 90-day TTL
- UsageTrackingInterceptor captures method, path, duration, user
- Path normalization groups dynamic segments ({id}, {hash})
- Buffered writes via WriteBuffer + periodic flush
- Admin endpoint GET /api/v1/admin/usage with groupBy=endpoint|user|hour
- Skips agent requests, health checks, and data ingestion
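The path normalization step might look roughly like the sketch below. The matching rules are assumptions (numeric or UUID segments become {id}, hex strings of 32 or more characters become {hash}), as are the class name and the example paths; the commit does not spell out the actual patterns:

```java
import java.util.regex.Pattern;

// Hypothetical normalizer; the patterns are illustrative guesses.
final class PathNormalizer {
    private static final Pattern UUID = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    private static final Pattern HEX_HASH = Pattern.compile("[0-9a-fA-F]{32,}");

    // Replaces dynamic path segments with placeholders so requests group by endpoint.
    static String normalize(String path) {
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            String s = segments[i];
            if (UUID.matcher(s).matches() || NUMBER.matcher(s).matches()) {
                segments[i] = "{id}";
            } else if (HEX_HASH.matcher(s).matches()) {
                segments[i] = "{hash}";
            }
        }
        return String.join("/", segments);
    }
}
```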
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Path containers (EIP_WHEN, EIP_OTHERWISE, etc.) don't have their own
processor records, so they never get an overlay entry. Now inferred
from descendants: green if any descendant executed, red if any failed.
Gated (amber) only when no descendants executed at all.
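The inference rule can be sketched as below. Names are hypothetical, and letting a failed descendant take precedence over completed ones is an assumption in line with the red-over-green ordering used elsewhere in the diagram overlay:

```java
import java.util.List;

// Hypothetical helper; the input is the status of each descendant that actually executed.
final class ContainerOverlay {
    enum Status { COMPLETED, FAILED, GATED }

    // Empty input means no descendant ran, so the container is gated (amber).
    // Otherwise any failure wins (red); if none failed, the container is completed (green).
    static Status infer(List<Status> executedDescendants) {
        if (executedDescendants.isEmpty()) {
            return Status.GATED;
        }
        return executedDescendants.contains(Status.FAILED) ? Status.FAILED : Status.COMPLETED;
    }
}
```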
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A container is only gated (amber) when (filterMatched=false or
duplicateMessage=true) and no descendants were executed. Containers
with executed children (split, choice, idempotent that passed) now
correctly show green/red based on their execution status.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompoundNode now uses execution overlay status to color its header:
failed (red) > completed (green) > default. Previously only used
static type-based color regardless of execution state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace synthetic wrapper node approach with direct iteration fields:
- ProcessorNode gains iteration (child's index) and iterationSize
(container's total) fields, populated from ClickHouse flat records
- Frontend hooks detect iteration containers from iterationSize != null
instead of scanning for wrapper processorTypes
- useExecutionOverlay filters children by iteration field instead of
wrapper nodes, eliminating ITERATION_WRAPPER_TYPES entirely
- Cleaner data contract: API returns exactly what the DB stores
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteCatalogController, RouteMetricsController, and AgentRegistrationController
had unqualified JdbcTemplate injection, receiving the PostgreSQL template
instead of ClickHouse. The stats queries silently failed (caught exception)
returning 0 counts. Added @Qualifier("clickHouseJdbcTemplate") to all three.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sets TZ=UTC and -Duser.timezone=UTC to guarantee all JVM time operations
use UTC regardless of the container's base image or host configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Timestamp.toString() uses JVM local timezone which can mismatch with
ClickHouse's UTC timezone, causing time-filtered queries to return empty
results. Replaced with DateTimeFormatter.withZone(UTC) in all lit() methods.
Also added warn logging to RouteCatalogController catch blocks to surface
query errors instead of silently swallowing them.
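A sketch of a zone-independent literal helper, assuming a millisecond-precision pattern and a hypothetical UtcLiterals class (the actual lit() methods and their pattern are not shown here). Formatting the Instant through a formatter pinned to UTC gives the same string on every host, whereas Timestamp.toString() depends on the JVM default zone:

```java
import java.sql.Timestamp;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the pattern is an assumption.
final class UtcLiterals {
    private static final DateTimeFormatter UTC_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Renders the timestamp as a quoted SQL literal in UTC, independent of the JVM time zone.
    static String lit(Timestamp ts) {
        return "'" + UTC_FORMAT.format(ts.toInstant()) + "'";
    }
}
```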
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse can't rename columns that are part of ORDER BY keys.
Updated V1-V8 DDL files directly with new column names (instance_id,
application_id) and removed V9 migration. Wipe ClickHouse and restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align all internal naming with the agent team's protocol v2 identity rename:
- agentId → instanceId (unique per-JVM identifier)
- applicationName → applicationId (shared app identifier)
- AgentInfo: id → instanceId, name → displayName, application → applicationId
Add SHUTDOWN lifecycle state for graceful agent shutdowns:
- New POST /data/events endpoint receives agent lifecycle events
- AGENT_STOPPED event transitions agent to SHUTDOWN (skips STALE/DEAD)
- New POST /{id}/deregister endpoint removes agent from registry
- Server now distinguishes graceful shutdown from crash (heartbeat timeout)
Includes ClickHouse V9 and PostgreSQL V14 migrations for column renames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer.getElkRoot(): add null guard to prevent NPE
when node is null (SQ java:S2259)
- WriteBuffer: add offerOrWarn() that logs when buffer is full instead
of silently dropping data. ChunkAccumulator now uses this method
so ingestion backpressure is visible in logs (SQ java:S899)
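A minimal sketch of the offerOrWarn() idea, assuming a bounded queue and java.util.logging; the real WriteBuffer internals and logger are not shown in this commit, and BoundedWriteBuffer is a hypothetical name:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.logging.Logger;

// Hypothetical buffer; illustrates checking offer()'s return value instead of ignoring it.
final class BoundedWriteBuffer<T> {
    private static final Logger LOG = Logger.getLogger(BoundedWriteBuffer.class.getName());
    private final BlockingQueue<T> queue;

    BoundedWriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking insert; logs a warning when the item is rejected because the buffer is full.
    boolean offerOrWarn(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            LOG.warning("Write buffer full, dropping item");
        }
        return accepted;
    }

    int size() {
        return queue.size();
    }
}
```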
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChunkAccumulator now injects DiagramStore and looks up the content hash
when converting to MergedExecution. Without this, the detail page had
no diagram hash, so the overlay couldn't find the route diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteCatalogController, RouteMetricsController, AgentRegistrationController
all had inline SQL using SUM() on AggregateFunction columns from stats_1m_*
AggregatingMergeTree tables. Replace with countMerge/countIfMerge/sumMerge.
Also fix time_bucket() → toStartOfInterval() and ::double → toFloat64().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES=false (required
by PROTOCOL.md) and explicit TypeReference<List<ExecutionChunk>> for
array parsing. Without this, batched chunks from ChunkedExporter
(2+ chunks in a JSON array) were silently rejected, causing final:true
chunks to be lost and all exchanges to go stale.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouseExecutionStore implements ExecutionStore, so the concrete bean
already satisfies the interface — remove redundant wrapper bean. Align
ChunkAccumulator and ExecutionFlushScheduler conditions to
cameleer.storage.executions flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dual-mode buildTree: detects seq presence and uses seq/parentSeq linkage
instead of processorId map. Handles duplicate processorIds across
iterations correctly. Old processorId-based mode kept for PG compat.
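The seq/parentSeq mode can be sketched like this. The Node shape and class name are hypothetical and the input is assumed to be the flat record list; the point is that linkage by seq keeps two records with the same processorId (one per iteration) as separate nodes, where a processorId-keyed map would collapse them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the seq-based mode only; the processorId fallback is omitted.
final class SeqTreeBuilder {
    static final class Node {
        final int seq;
        final Integer parentSeq; // null for root-level processors
        final String processorId;
        final List<Node> children = new ArrayList<>();

        Node(int seq, Integer parentSeq, String processorId) {
            this.seq = seq;
            this.parentSeq = parentSeq;
            this.processorId = processorId;
        }
    }

    // Attaches each record to the node whose seq equals its parentSeq; the rest become roots.
    static List<Node> buildTree(List<Node> flat) {
        Map<Integer, Node> bySeq = new HashMap<>();
        for (Node n : flat) {
            bySeq.put(n.seq, n);
        }
        List<Node> roots = new ArrayList<>();
        for (Node n : flat) {
            Node parent = n.parentSeq == null ? null : bySeq.get(n.parentSeq);
            if (parent == null) {
                roots.add(n);
            } else {
                parent.children.add(n);
            }
        }
        return roots;
    }
}
```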
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements ExecutionStore interface with findById (FINAL for
ReplacingMergeTree), findProcessors (ORDER BY seq), findProcessorById,
and findProcessorBySeq. Write methods unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract LogIndex interface from OpenSearchLogIndex. Both ClickHouseLogStore
and OpenSearchLogIndex implement it. Controllers now inject LogIndex.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.
Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkIngestionController: /data/chunks → /data/executions (matches
PROTOCOL.md endpoint the agent actually posts to)
- ExecutionController: conditional on ClickHouse being disabled to
avoid mapping conflict
- Persist originalExchangeId and replayExchangeId from ExecutionChunk
envelope through to ClickHouse (was silently dropped)
- V5 migration adds the two new columns to executions table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cameleer3-common removed children, loopIndex, splitIndex,
multicastIndex from ProcessorExecution (flat model only now).
Iteration context lives on synthetic wrapper nodes via processorType.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The V4 DDL had a semicolon inside a comment which caused the
split-on-semicolon logic to produce a comment-only segment that
ClickHouse rejected as empty query. Fixed the comment and made
the initializer strip comment-only segments before execution.
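A sketch of the filtering step, assuming "--" line comments and a hypothetical SqlScriptSplitter class. It keeps the same naive split-on-semicolon approach and only drops segments that contain nothing but comments or whitespace; a semicolon in the middle of a comment would still split the text, so comments in DDL files should avoid them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the actual ClickHouseSchemaInitializer code.
final class SqlScriptSplitter {
    // Splits on ';' and keeps only segments that contain at least one non-comment line.
    static List<String> statements(String script) {
        List<String> out = new ArrayList<>();
        for (String segment : script.split(";")) {
            if (!isCommentOnly(segment)) {
                out.add(segment.trim());
            }
        }
        return out;
    }

    // True when every non-blank line of the segment starts with "--" (blank segments included).
    static boolean isCommentOnly(String segment) {
        for (String line : segment.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("--")) {
                return false;
            }
        }
        return true;
    }
}
```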
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements StatsStore interface for ClickHouse using AggregatingMergeTree
tables with -Merge combinators (countMerge, countIfMerge, sumMerge,
quantileMerge). Uses literal SQL for aggregate table queries to avoid
ClickHouse JDBC driver PreparedStatement issues with AggregateFunction
columns. Raw table queries (SLA, topErrors, activeErrorTypes) use normal
prepared statements.
Includes 13 integration tests covering stats, timeseries, grouped
timeseries, SLA compliance, SLA counts by app/route, top errors, active
error types, punchcard, and processor stats. Also fixes AggregateFunction
type signatures in V4 DDL (count() takes no args, countIf takes UInt8).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The jdbcTemplate() method was calling dataSource(properties) directly,
creating a new DataSource instance instead of using the Spring-managed
@Primary bean. This caused some repositories to receive the ClickHouse
connection instead of PostgreSQL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DDL for executions (ReplacingMergeTree) and processor_executions (MergeTree with seq/parentSeq/iteration)
- ClickHouseExecutionStore with batch INSERT for both tables
- ChunkAccumulator: buffers exchange envelope across chunks, inserts processors immediately, writes execution on final chunk
- ExecutionFlushScheduler drains WriteBuffers to ClickHouse
- ChunkIngestionController: POST /api/v1/data/chunks endpoint
- ClickHouseSearchIndex: ngram-accelerated SQL search implementing SearchIndex interface
- Feature flags: cameleer.storage.search=opensearch|clickhouse
- Uses cameleer3-common ExecutionChunk and FlatProcessorRecord models
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ExecutionFlushScheduler drains MergedExecution and ProcessorBatch write
buffers on a fixed interval and delegates batch inserts to
ClickHouseExecutionStore. Also sweeps stale exchanges every 60s.
ChunkIngestionController exposes POST /api/v1/data/chunks, accepts
single or array ExecutionChunk payloads, and feeds them into the
ChunkAccumulator. Conditional on ChunkAccumulator bean (clickhouse.enabled).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When clickhouse.enabled=true, the ClickHouse JdbcTemplate bean prevents
Spring Boot auto-config from creating the default PG JdbcTemplate.
All PG repositories then get the CH JdbcTemplate and fail with
"Table cameleer.audit_log does not exist".
Fix: explicitly create @Primary DataSource and JdbcTemplate from
DataSourceProperties so PG remains the default for unqualified injections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set CLICKHOUSE_USER/PASSWORD via k8s secret (fixes "disabling network
access for user 'default'" when no password is set)
- Add clickhouse-credentials secret to CI deploy + feature branch copy
- Pass CLICKHOUSE_USERNAME/PASSWORD env vars to server pod
- Make schema initializer non-fatal so server starts even if CH is
temporarily unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The deploy/clickhouse.yaml manifest was created but not referenced
in the CI workflow. Add kubectl apply between OpenSearch and Authentik.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse only has the 'default' database out of the box. The JDBC URL
connects to 'cameleer', so the database must exist before the server starts.
Uses /docker-entrypoint-initdb.d/ init script via ConfigMap.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Testcontainers tests need Docker, which isn't available in CI.
Rename them to *IT so Surefire skips them (Failsafe runs them with -DskipITs=false).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements MetricsQueryStore using ClickHouse toStartOfInterval() for
time-bucketed aggregation queries; verified with 4 Testcontainers tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TDD implementation of MetricsStore backed by ClickHouse. Uses native
Map(String,String) column type (no JSON cast), relies on ClickHouse
DEFAULT for server_received_at, and handles null tags by substituting
an empty HashMap. All 4 Testcontainers tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ClickHouseSchemaInitializer that runs on ApplicationReadyEvent,
scanning classpath:clickhouse/*.sql in filename order and executing each
statement. Adds V1__agent_metrics.sql with MergeTree table, tenant/agent
partitioning, and 365-day TTL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds ClickHouseProperties (bound to clickhouse.*), ClickHouseConfig
(conditional HikariDataSource + JdbcTemplate beans), and extends
application.yml with clickhouse.enabled/url/username/password and
cameleer.storage.metrics properties.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ExchangesPage ignored the exchangeId URL parameter, so selecting an
exchange from the command palette navigated to the right URL but never
displayed the execution overlay. Now derives selection from URL params
as fallback, and LayoutShell passes selectedExchange in state for
exchange/attribute results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ExecutionDocument and ExecutionRecord records gained an isReplay
field but the integration tests were not updated, breaking CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The X-Cameleer-Replay header is only available when inputSnapshot is
captured (DETAILED/DEEP engine level). The agent always sets
replayExchangeId on RouteExecution, so check that first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ElkDiagramRenderer: guard against null containingNode before getElkRoot()
- OpenSearchAdminController: return 503/502 instead of 200 on errors
- DatabaseAdminController: return 503 instead of 200 on connection failure
- SpaForwardController: replace unbound {path} variables with /** wildcards
- WriteBuffer: check offer() return value and log on unexpected rejection
- ApiExceptionHandler: extract getReason() to local var for null safety
- Admin UI pages: handle isError state for disconnected service display
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect replayed exchanges via X-Cameleer-Replay header during ingestion,
persist the flag through PostgreSQL and OpenSearch, and surface it in
the dashboard (amber replay icon) and exchange detail chain view.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exchanges with a _replay attribute now display a small amber
RotateCcw icon between the status dot and route name in the
correlation chain. Tooltip also indicates (replay).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replay audit log now records the agent's reply status (SUCCESS/FAILURE),
message, and error details. Timeout and internal errors are also logged
as FAILURE with the cause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add dedicated POST /agents/{id}/replay endpoint that uses
addCommandWithReply to wait for the agent ACK (30s timeout).
Returns the actual replay result (status, message, data) instead
of just a delivery confirmation.
Frontend toast now reflects the agent's response: "Replay completed"
on success, agent error message on failure, timeout message if the
agent doesn't respond.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ROUTE_CONTROL command type and route-control mapping in
AgentCommandController. New RouteControlBar component in the exchange
header shows Start/Stop/Suspend/Resume actions (grouped pill bar) and
a Replay button, gated by agent capabilities and OPERATOR/ADMIN role.
Fix useReplayExchange hook to match protocol section 16: payload now
uses { routeId, exchange: { body, headers }, originalExchangeId, nonce }
instead of the flat { headers, body } format.
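A minimal sketch of the nested payload shape described above. The field types and the `buildReplayPayload` helper are assumptions for illustration; only the field names come from this message.

```typescript
// Field names follow the commit message; the value types are assumed.
interface ReplayPayload {
  routeId: string;
  exchange: { body: string; headers: Record<string, string> };
  originalExchangeId: string;
  nonce: string;
}

// Hypothetical helper: wraps the formerly flat { headers, body } pair
// in the nested exchange object and adds the identifying fields.
function buildReplayPayload(
  routeId: string,
  originalExchangeId: string,
  body: string,
  headers: Record<string, string>,
  nonce: string,
): ReplayPayload {
  return {
    routeId: routeId,
    exchange: { body: body, headers: headers },
    originalExchangeId: originalExchangeId,
    nonce: nonce,
  };
}
```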
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- findProcessorInTree now skips non-selected iteration wrappers so
the returned ProcessorNode has data from the correct iteration
- Gate selectedProcessor on overlay presence so processors not
executed in the current iteration don't show in the detail panel
- Header shows "Exchange Details" or "Processor Details" contextually
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompoundNode (circuit breaker, choice, etc.) now renders at 0.35
opacity when the overlay is active but neither the compound itself
nor any of its diagram descendants appear in the execution overlay.
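The dimming rule can be sketched as follows, assuming a hypothetical `DiagramNode` shape and an overlay keyed by node ID (the real CompoundNode props may differ):

```typescript
// Hypothetical node shape; the real diagram types may differ.
interface DiagramNode {
  id: string;
  children?: DiagramNode[];
}

// True when the node itself or any of its descendants has an overlay entry.
function appearsInOverlay(node: DiagramNode, overlay: Record<string, unknown>): boolean {
  if (Object.prototype.hasOwnProperty.call(overlay, node.id)) return true;
  const children = node.children || [];
  for (const child of children) {
    if (appearsInOverlay(child, overlay)) return true;
  }
  return false;
}

// Opacity for a compound: 0.35 only when an overlay is active (non-null)
// and neither the compound nor anything inside it was executed.
function compoundOpacity(node: DiagramNode, overlay: Record<string, unknown> | null): number {
  if (overlay === null) return 1;
  return appearsInOverlay(node, overlay) ? 1 : 0.35;
}
```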
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Render hasTrace/hasTap/status badges inside the node card in both
raw diagram and overlay modes (consistent positioning)
- Pulse only on trace badge in overlay mode when hasTraceData is true
- Fix nodeConfigs to read tracedProcessors from appConfig instead of
never-synced tracing store
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove redundant processor name, status, ID, and duration from the
header bar — all visible in the Info tab and diagram overlay already.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace DetailPanel overlay with direct navigation to
/runtime/:appId/:instanceId on row click. Removes the slide-in panel,
AgentOverviewContent, and AgentPerformanceContent helper components.
The full AgentInstance page already provides all the same data plus
more (charts, routes, logs).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Recharts ScatterChart with compact SVG grid of small rounded
squares (11x11px, 2px gap). 7 rows (Mon-Sun) x 24 columns (hours).
Color intensity = value relative to max. Transactions = blue scale,
Errors = red scale. Toggle switches between modes.
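A rough sketch of the grid geometry and intensity scaling, using the sizes stated above. The `layoutPunchcard` name and the `values[weekday][hour]` input layout are assumptions, not the component's actual API.

```typescript
const CELL = 11; // square size in px, from the commit message
const GAP = 2;   // gap between squares in px
const STEP = CELL + GAP;

interface PunchcardCell {
  x: number;
  y: number;
  intensity: number; // 0..1, value relative to the grid maximum
}

// values[weekday][hour]: weekday 0 = Monday ... 6 = Sunday, hour 0..23.
function layoutPunchcard(values: number[][]): PunchcardCell[] {
  let max = 0;
  for (const row of values) {
    for (const v of row) max = Math.max(max, v);
  }
  const cells: PunchcardCell[] = [];
  values.forEach((row, weekday) => {
    row.forEach((v, hour) => {
      cells.push({
        x: hour * STEP,
        y: weekday * STEP,
        intensity: max > 0 ? v / max : 0, // an all-zero grid stays at zero intensity
      });
    });
  });
  return cells;
}

// Overall SVG size for the 7 x 24 grid (no trailing gap).
const GRID_WIDTH = 24 * STEP - GAP;
const GRID_HEIGHT = 7 * STEP - GAP;
```

The intensity can then drive the opacity or shade of the blue (transactions) or red (errors) scale.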
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace two separate Transaction/Error punchcard cards with a single
card containing a Transactions/Errors toggle. Uses internal state to
switch between modes without remounting the chart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add placeholderData to useRouteMetrics and usePunchcard hooks so data
stays stable between refetches instead of briefly becoming undefined and
causing flicker
- Disable Recharts animation on Treemap (isAnimationActive=false)
- Make .content scrollable (overflow-y: auto, flex: 1, min-height: 0)
so charts below the fold are accessible
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Regenerated schema.d.ts from live backend — now includes slaCompliance
on ExecutionStats/RouteMetrics, filterMatched/duplicateMessage on
ProcessorNode, and all new dashboard endpoints (timeseries/by-app,
timeseries/by-route, punchcard, errors/top, app-settings).
Removed Record<string, unknown> casts that were working around the
stale schema.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StatusDot: status → variant (correct prop name)
- Badge: color="muted" → color="auto" (valid BadgeColor)
- AreaChart: remove stacked prop (not in AreaChartProps)
- DataTable: remove defaultSort prop (not in DataTableProps)
- TopError → ErrorRow with id field (DataTable requires T extends {id})
- slaCompliance: type assertion for runtime field not in TS schema
- PunchcardHeatmap Scatter shape: proper typing for custom renderer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom SVG chart implementations with Recharts components:
- Treemap: uses Recharts Treemap with custom content renderer for
SLA-colored cells, labels, and click navigation
- PunchcardHeatmap: uses Recharts ScatterChart with custom Rectangle
shape for weekday x hour heatmap grid cells
Both use ResponsiveContainer (no more explicit width/height props) and
rechartsTheme from the design system for consistent tooltip styling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New exports: rechartsTheme (pre-configured Recharts prop objects matching
design system styling), CHART_COLORS (series color palette), and properly
exported ChartSeries/DataPoint interfaces. No breaking changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always use design system CSS variables for colors, never hardcode hex.
Applies to CSS modules, inline styles, and SVG fill/stroke attributes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use var(--amber) and var(--amber-bg) in SVG fill/stroke attributes
instead of hardcoded hex values. SVG presentation attributes resolve
CSS variables correctly, and this respects dark mode theme switching.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use --amber (#C6820E) and --amber-bg (#FDF6E9) from the design system
theme instead of hardcoded #D97706/#FFFBEB.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a filter processor rejects a message (filterMatched=false) or an
idempotent consumer detects a duplicate (duplicateMessage=true), the
compound container turns amber (header, border, body tint).
Also adds red pulsing rings on the failed processor badge (same SMIL
pattern as the teal hasTraceData pulse).
Backend: ProcessorNode gains filterMatched/duplicateMessage fields,
threaded from ProcessorExecution JSON path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Failed processor nodes now show expanding/fading red rings around the
error badge (same SMIL animation pattern as the teal hasTraceData pulse).
Two staggered circles expand from r=6 to r=14 over 1.5s, making failures
immediately visible in complex route diagrams.
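A sketch of what the two staggered SMIL rings might look like as generated markup. The radii and duration are taken from above. The 0.75s stagger, the opacity fade values, the `pulse` class, and the `pulseRings` helper are assumptions.

```typescript
// Builds two expanding, fading rings around (cx, cy).
// stroke is any CSS color value, for example a design system variable.
function pulseRings(cx: number, cy: number, stroke: string): string {
  const begins = ["0s", "0.75s"]; // assumption: second ring starts half a cycle later
  const rings = begins.map((begin) =>
    `<circle cx="${cx}" cy="${cy}" r="6" fill="none" stroke="${stroke}">` +
    `<animate attributeName="r" from="6" to="14" dur="1.5s" begin="${begin}" repeatCount="indefinite"/>` +
    `<animate attributeName="opacity" from="1" to="0" dur="1.5s" begin="${begin}" repeatCount="indefinite"/>` +
    `</circle>`);
  return `<g class="pulse">${rings.join("")}</g>`;
}
```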
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treemap on left (3fr), two punchcards stacked on right (2fr) using
new .vizRow grid layout. Replaces full-width stacked arrangement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced hardcoded width/height on SVG elements with viewBox + width:100%
so both components fill their parent container instead of using fixed pixels.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Treemap: rectangle area = transaction volume, color = SLA compliance
(green→red). Shows apps at L1, routes at L2. Click navigates deeper.
Punchcard heatmap: 7-day rolling weekday x 24-hour grid showing
transaction volume and error patterns. Two side-by-side views
(transactions + errors) reveal temporal clustering.
Backend: new GET /search/stats/punchcard endpoint aggregating
stats_1m_all/app by DOW x hour over rolling 7 days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add latencyHeatmap prop to ProcessDiagram that colors nodes green→yellow→red
based on their relative contribution to route latency (pctOfRoute). Shows avg
duration label on each node. Threaded through CompoundNode for nested EIP
patterns. Heatmap is active only when no execution overlay is present.
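One possible mapping from pctOfRoute to a green→yellow→red color, sketched with an HSL hue sweep. The saturation and lightness values and the `latencyHeatColor` name are assumptions; the actual component may use design system colors instead.

```typescript
// Maps a node's share of route latency (0..100) to a green -> yellow -> red color.
// Hue 120 is green, 60 is yellow, 0 is red.
function latencyHeatColor(pctOfRoute: number): string {
  const clamped = Math.min(100, Math.max(0, pctOfRoute));
  const hue = Math.round(120 * (1 - clamped / 100));
  return `hsl(${hue}, 70%, 45%)`;
}
```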
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RouteGraph no longer stores a separate nodes list; getNodes() computes
from root tree. Tests now build proper tree via setRoot() + setChildren()
instead of calling setNodes().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent now sends shallow copies (without children) in the flat nodes
list. Build nodeById map by walking graph.getRoot() tree which preserves
children, falling back to flat list via putIfAbsent for compatibility.
Also adds EIP_FILTER, EIP_IDEMPOTENT_CONSUMER, EIP_RECIPIENT_LIST as
new compound container types per updated DIAGRAMS.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Progressive drill-down dashboard following RED method (Rate, Errors,
Duration) with 3 scope levels driven by sidebar selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restores e8039f9. The compound rendering regression was caused by
the agent sending flat nodes without children, not the renderer code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverting e8039f9 to diagnose compound rendering regression affecting
all compound types (SPLIT, CHOICE, LOOP, DO_TRY) and error handlers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow the DO_TRY pattern: virtual _CB_MAIN wrapper for main path children,
onFallback rendered as _CB_FALLBACK section with purple dashed border.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cross-root boundary check in createElkEdges() was too aggressive,
skipping all edges where source and target have different ELK roots.
Compound nodes are their own ELK roots, so valid continuation edges
from the last child inside a compound to the next sibling were lost.
Now allows edges when nodes share a common grandparent or when one
node exits/enters a compound boundary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Show resolved endpoint URI as teal italic line on diagram nodes
when execution overlay is active
- Enable drill-down for TO and TO_DYNAMIC nodes (not just DIRECT/SEDA)
- Use runtime resolvedEndpointUri from execution overlay for drill-down
when static endpointUri doesn't match
- Increase node height from 50px to 56px to accommodate the third line
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Info tab now reads processor.resolvedEndpointUri instead of hardcoded "-"
- Toolbar buttons highlight in teal/purple when trace/tap is active
- Tooltip changes to "Disable tracing" / "Edit tap" when active
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The toMap() method was missing the has_trace_data field, so it was
never indexed despite being read back in hitToSummary().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hand-drawn teardrop paths (looked like plants) with the real
lucide Footprints SVG paths. Configured = bare teal icon, data captured
= white icon in solid teal circle with staggered pulse rings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trace data visibility:
- ProcessorNode now includes hasTraceData flag computed from captured
body/headers during tree conversion
- ConfigBadge shows teal for tracing configured, green when data captured
- Search results show green footprints icon for exchanges with trace data
- New has_trace_data column on executions table (V11 migration with backfill)
- OpenSearch documents and ExecutionSummary include the flag
Inline tap configuration:
- Extracted reusable TapConfigModal component from RouteDetail
- Diagram context menu opens tap modal inline instead of navigating away
- Toggle-trace action works immediately with toast feedback
- Modal closes only on ESC, Cancel, Save, or Delete (not backdrop click)
Detail panel tab gating:
- Headers, Input, Output tabs disabled when no data is available
- Works at both exchange and processor level
- Falls back to Info tab when active tab becomes empty
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each of the ~40 node types now has a distinct, semantically meaningful
lucide icon rendered as crisp SVG paths. Compound node headers also
show their icon left-aligned in the header bar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
25 rotating cameleer-themed login subtitles picked randomly on each
page load. Also adds the camel logo SVG next to the app name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace placeholder clock favicon with cameleer camel logo SVG
- Upgrade @cameleer/design-system from v0.1.19 to v0.1.20
- Add minHeight: 0 to main element to complete flex chain for fillHeight DataTable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same issue as IngestionService: the ObjectMapper deserializing
processors_json lacked JavaTimeModule, so Instant parsing failed
silently and the code fell back to the broken flat reconstruction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ObjectMapper used to serialize the processor tree to JSON lacked
JavaTimeModule, causing Instant fields (startTime, endTime) to fail
silently — processors_json was always null.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes iteration overlay corruption caused by flat storage collapsing
duplicate processorIds across loop iterations.
Server:
- Store raw processor tree as processors_json JSONB on executions table
- Detail endpoint serves from processors_json (faithful tree), falls back
to flat record reconstruction for older executions
- V10 migration: processors_json, error categorization (errorType,
errorCategory, rootCauseType, rootCauseMessage), OTel (traceId, spanId),
circuit breaker (circuitBreakerState, fallbackTriggered), drops
erroneous splitDepth/loopDepth columns
- Add all new fields through full ingestion/storage/API chain
UI:
- Fix overlay wrapper filtering: check wrapper type before status filter
- Add new fields to schema.d.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minimap reflects execution overlay: green for completed, red for failed,
grey for skipped nodes. ENDPOINT nodes are always green when overlay is
active (route entry point, same as main diagram logic).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add resolvedEndpointUri, splitDepth, loopDepth arguments to
ProcessorRecord constructors in TreeReconstructionTest and
PostgresExecutionStoreIT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire resolvedEndpointUri through the full chain:
- V9 migration adds resolved_endpoint_uri column
- IngestionService extracts from ProcessorExecution
- PostgresExecutionStore persists and reads the column
- ProcessorNode includes field in detail API response
- UI schema updated for ProcessorNode and PositionedNode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server:
- Add endpointUri to PositionedNode (from RouteNode)
- Add fromEndpointUri to RouteSummary (catalog API)
- Catalog controller resolves endpoint URI from diagram store
UI:
- Build endpointRouteMap from catalog's fromEndpointUri field
- Drill-down uses exact match on node.endpointUri against the map
- Remove label parsing heuristics (extractTargetEndpoint, camelToKebab)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove PORT_ALIGNMENT_DEFAULT=BEGIN so NETWORK_SIMPLEX centers edges
at the vertical midpoint of the compound instead of the top.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edges into/out of compound nodes (DO_TRY, EIP_CHOICE, etc.) now show as
traversed (green) when any descendant node was executed, instead of grey.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Left-align all sections (try_body, doFinally, doCatch) within DO_TRY
- Shrink DO_TRY height to match actual content, removing bottom padding
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use NETWORK_SIMPLEX placement for vertical centering of root flow nodes
- Skip structural edges from all compound nodes to descendants (not just DO_TRY)
- Reduce DO_TRY section spacing from NODE_SPACING*0.4 to fixed 20px
- Use SVG clipPath for node text instead of character-count truncation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase node width (160→220), height (40→50), spacing (90→120)
- Use SVG clipPath for text instead of character-count truncation
- Add UI sources, ESLint report, and sonar-scanner CLI to SonarQube workflow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use Sidebar onNavigate callback instead of display:contents click interception
- Use DataTable fillHeight prop instead of manual scroll wrapper divs
- Fix DataTable scroll/pagination by adding overflow:hidden to content container
- Fix left panel in split view to use flex column instead of overflow:auto
- Make error tab stack trace scrollable for large traces
- Add nightly SonarQube workflow with manual trigger support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each exchange selection (from table or correlation chain) pushes a
browser history entry with the selected exchange in location.state.
When the user navigates away (to agent details, app scope, etc.) and
presses Back, the previous history entry is restored and the split
view with the diagram reappears exactly as they left it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Row click no longer navigates to /exchanges/:app/:route/:id, which was
changing the search scope. Instead, Dashboard calls onExchangeSelect
callback and ExchangesPage manages the selected exchange as local state.
The search criteria and scope are preserved when selecting an exchange.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Always shows exchange info row: status dot, badge, ID, route, app, duration
- Correlation chain: arrow connectors between nodes, route name + duration per node
- Click on correlated exchange navigates to /exchanges/:app/:route/:exchangeId
- Compact styling with bg-raised background, proper visual hierarchy
- Horizontal scroll for long correlation chains
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New TabKpis component shows scope-aware metrics with trend arrows
aligned right in the content tab bar. Each metric shows current value
and an arrow indicating change vs previous period (green=good, red=bad).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Row click now navigates directly to the split view with diagram.
Removed: DetailPanel, inspect column, unused imports (ExternalLink,
ProcessorTimeline, RouteFlow, useExecutionDetail, useDiagramLayout,
buildFlowSegments).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Sidebar wrapper gets height:100% to fill window
- Route-scoped Exchanges uses same Dashboard table (not compact ExchangeList)
- 50:50 grid split: table on left, diagram on right when route selected
- ContentTabs gets border-bottom and surface background for visibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thin wrapper pages that conditionally render AgentHealth/AgentInstance
and RoutesMetrics/RouteDetail based on URL params for the nav redesign.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port alignment BEGIN on DO_TRY compounds makes edges attach at the top
instead of center, keeping the main flow level. Post-processing also
stretches all DO_TRY sections (doFinally, doCatch) to match the widest
section's width for visual consistency.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setPointerCapture on the SVG redirected click/dblclick events away from
node <g> elements, breaking drill-down (double-click) and potentially
click selection. Now only capture the pointer when clicking on empty SVG
space, preserving normal event flow on nodes while keeping drag-to-pan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ELK's partitioning doesn't reliably order disconnected children within
a compound node. Instead, let ELK lay out freely then re-stack sections
in correct order (try_body → doFinally → doCatch) by adjusting Y
positions in the ELK graph before extraction. This propagates correctly
to both node and edge coordinates via getAbsoluteY().
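The re-stacking step, sketched in TypeScript for illustration (the backend code is not shown here). The `Section` shape, the gap value, and `restackSections` are assumptions.

```typescript
interface Section {
  kind: "try_body" | "doFinally" | "doCatch";
  y: number;
  height: number;
}

const SECTION_ORDER = ["try_body", "doFinally", "doCatch"];
const SECTION_GAP = 20; // assumed spacing between sections, in px

// Ignores the Y positions the layout produced and stacks the sections
// top to bottom in the fixed order, starting at startY.
function restackSections(sections: Section[], startY: number): Section[] {
  const sorted = sections.slice().sort(
    (a, b) => SECTION_ORDER.indexOf(a.kind) - SECTION_ORDER.indexOf(b.kind));
  let cursor = startY;
  return sorted.map((s) => {
    const placed: Section = { kind: s.kind, y: cursor, height: s.height };
    cursor += s.height + SECTION_GAP;
    return placed;
  });
}
```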
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer constraints (FIRST/LAST) don't work for disconnected components
in ELK's layered algorithm. Replace with invisible edges that chain
try_body → doFinally → doCatch to guarantee correct top-to-bottom order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous fix skipped ALL edges from DO_TRY nodes, which also
removed the continuation edge to the next node in the main flow
(causing LOG nodes to appear disconnected). Now checks if the target
is a descendant of the DO_TRY ELK node — only internal edges are
skipped, continuation edges to the next main flow node are kept.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ELK TB layout places children in insertion order. Now explicitly adds
DO_FINALLY before DO_CATCH so the visual order inside DO_TRY is:
try body (top) → finally → catch blocks (bottom). Also reduces
internal spacing to keep the compound more compact.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: DO_TRY compounds now use a virtual _TRY_BODY wrapper with LR
layout for the try body, while DO_CATCH/DO_FINALLY stack below as
separate sections (TB). Edges from DO_TRY are skipped like route-level
handler edges. Removes ELK-v2 debug logging.
Frontend: _TRY_BODY renders as transparent wrapper, DO_CATCH as red
tinted section, DO_FINALLY as teal section. DO_FINALLY color changed
from red to teal (completion handler, not error).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When pressing Enter in the command palette without explicitly selecting
a result (via arrow keys or mouse), the search query is now applied as
a server-side full-text filter on the Dashboard table. Explicit
selection still navigates to the exchange. Updates design system to
v0.1.18 for the new onSubmit prop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously onPointerDown bailed out when the target was inside a node
(data-node-id), blocking pan entirely over nodes and compound groups.
Now panning always starts, and a didPan ref distinguishes drag from
click — node click handlers skip selection when the user was dragging.
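A framework-free sketch of the drag-versus-click distinction. The `PanTracker` class and the movement threshold are assumptions; the actual code keeps this state in a React ref.

```typescript
// Assumption: movement beyond this many pixels counts as a pan, not a click.
const PAN_THRESHOLD_PX = 3;

class PanTracker {
  private startX = 0;
  private startY = 0;
  private active = false;
  // Mirrors the didPan ref: true once the pointer moved far enough during this press.
  didPan = false;

  // Panning always starts, even when the press lands on a node.
  pointerDown(x: number, y: number): void {
    this.startX = x;
    this.startY = y;
    this.active = true;
    this.didPan = false;
  }

  pointerMove(x: number, y: number): void {
    if (!this.active) return;
    const dx = x - this.startX;
    const dy = y - this.startY;
    if (dx * dx + dy * dy > PAN_THRESHOLD_PX * PAN_THRESHOLD_PX) this.didPan = true;
  }

  pointerUp(): void {
    this.active = false;
  }

  // Node click handlers call this and skip selection after a drag.
  shouldSelectOnClick(): boolean {
    return !this.didPan;
  }
}
```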
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds lucide-react and replaces all HTML entity and emoji icons across
the UI with proper SVG icon components. Tree-shaken — only imported
icons are bundled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dashboard was fetching 50 results without a status filter and
filtering client-side, causing fewer matches when filtering by error
compared to route-specific pages that filter server-side. Now passes
statusFilters to the OpenSearch query. Backend supports comma-separated
status values for multi-select filters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: graph.getNodes() is a flat list with duplicates — handler
compound children appear both nested inside their parent AND as
top-level entries. The previous separation tried to filter the flat
list but missed the duplicates, leaving handler children in rootNode.
New approach: walk from graph.getRoot() following non-ERROR edges to
discover main flow nodes. Edges targeting handler compounds (ON_EXCEPTION,
ON_COMPLETION) are not followed. This cleanly separates main flow from
handler sections using the graph's own structure.
Falls back to flat list filtering (old behavior) when graph.getRoot()
is null (legacy/test graphs).
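The walk can be sketched as a breadth-first traversal, shown here in TypeScript for illustration. The `FlowEdge` shape, the `nodeTypes` lookup, and `collectMainFlow` are assumptions, not the backend's actual types.

```typescript
interface FlowEdge {
  source: string;
  target: string;
  type: string;
}

// Node types treated as handler compounds (from the commit message).
const HANDLER_TYPES = ["ON_EXCEPTION", "ON_COMPLETION"];

// Breadth-first walk from the root that ignores ERROR edges
// and never follows an edge into a handler compound.
function collectMainFlow(
  rootId: string,
  edges: FlowEdge[],
  nodeTypes: Record<string, string>,
): string[] {
  const visited: Record<string, boolean> = {};
  const order: string[] = [];
  const queue: string[] = [rootId];
  visited[rootId] = true;
  while (queue.length > 0) {
    const current = queue.shift() as string;
    order.push(current);
    for (const edge of edges) {
      if (edge.source !== current || edge.type === "ERROR") continue;
      const targetType = nodeTypes[edge.target] || "";
      if (HANDLER_TYPES.indexOf(targetType) !== -1) continue;
      if (visited[edge.target]) continue;
      visited[edge.target] = true;
      queue.push(edge.target);
    }
  }
  return order;
}
```

Because handler children are only reachable through their handler compound, they never enter the result, even if they also appear as top-level entries in the flat list.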
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause found: RouteGraph.getNodes() is a FLAT list that includes
handler compound children (log8, setBody1, etc.) as top-level entries
alongside the main flow nodes. The handler separation only identified
the compound PARENTS (ON_EXCEPTION) but not their children, so 7
handler children leaked into rootNode as main flow nodes, causing
ELK to place the real main flow at wrong Y positions.
Fix: two-pass separation — first identify handler compounds and
collect ALL descendant IDs, then build mainNodes excluding both
handler compounds AND their descendants.
Debug logging left in temporarily for verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The root cause of the Y-offset: ELK places main flow nodes at
arbitrary positions (e.g., y=679) within its root graph, and the
frontend rendered them at those raw positions. Handler sections were
already normalized via shiftNodes, but the main section was not.
Now useDiagramData.ts applies the same normalization to the main
section: computes bounding box, shifts nodes and edges so the section
starts at (0,0). This fixes the Y-offset regardless of what ELK
produces internally.
Removed the backend normalizePositions (was ineffective because handler
nodes at y=12 dominated the global minimum, preventing meaningful shift
of main flow nodes at y=679).
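A sketch of the section normalization. The type names and `normalizeSection` are assumptions, as is including edge points in the bounding box; the actual useDiagramData.ts code may differ.

```typescript
interface Pt {
  x: number;
  y: number;
}
interface LaidOutNode extends Pt {
  id: string;
  width: number;
  height: number;
}
interface LaidOutEdge {
  points: Pt[];
}
interface SectionLayout {
  nodes: LaidOutNode[];
  edges: LaidOutEdge[];
}

// Shifts all nodes and edge points so the section's top-left corner is (0,0).
function normalizeSection(nodes: LaidOutNode[], edges: LaidOutEdge[]): SectionLayout {
  let minX = Infinity;
  let minY = Infinity;
  for (const n of nodes) {
    minX = Math.min(minX, n.x);
    minY = Math.min(minY, n.y);
  }
  for (const e of edges) {
    for (const p of e.points) {
      minX = Math.min(minX, p.x);
      minY = Math.min(minY, p.y);
    }
  }
  if (minX === Infinity) return { nodes: nodes, edges: edges }; // nothing to shift
  return {
    nodes: nodes.map((n) => ({
      id: n.id, x: n.x - minX, y: n.y - minY, width: n.width, height: n.height,
    })),
    edges: edges.map((e) => ({
      points: e.points.map((p) => ({ x: p.x - minX, y: p.y - minY })),
    })),
  };
}
```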
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Based on thorough code review, fixes all identified issues:
1. **Y-offset root cause**: Added post-layout normalization that shifts
all positioned nodes and edges so the bounding box starts at (0,0).
ELK can place nodes at arbitrary positions within its root graph;
normalizing compensates regardless of what ELK computes internally.
2. **Bounding box**: Compute from recursively flattened node tree +
edge point bounds. Removes double-counting of compound children
(children have absolute coords, not relative to parent).
3. **SVG double-drawing**: Compound children were drawn both inside
drawCompoundContainer and again in the allNodes loop. Now collects
compound child IDs and skips them in the second pass.
4. **findNode**: Now recurses into children for nested compound lookup.
5. **colorForType**: Removed redundant double-check on EIP_TYPES.
6. **Dead code removed**: routeNodeMap/indexNodeRecursive (populated but
never read), MIN_NODE_WIDTH/CHAR_WIDTH/LABEL_PADDING (unused).
7. **Static initialization**: LayoutMetaDataProvider registration moved
from constructor to static block (runs once, not per instance).
8. **Debug logging removed**: Removed diagnostic System.out.println.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. findCommonParent: replaced with correct lowest common ancestor
algorithm using ancestor set intersection (previous version only
walked from node 'a', not a true LCA)
2. Bounding box: compute totalWidth/totalHeight from actual positioned
node coordinates instead of rootNode.getWidth/Height. The rootNode
dimensions don't account for handler sections in separate ELK roots.
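The ancestor-set approach from item 1 can be sketched as below, in TypeScript for illustration. The `TreeNode` shape is an assumption, and this version treats a node as its own ancestor, which may or may not match the backend's findCommonParent.

```typescript
interface TreeNode {
  id: string;
  parent: TreeNode | null;
}

// Lowest common ancestor via ancestor-set intersection.
function findCommonParent(a: TreeNode, b: TreeNode): TreeNode | null {
  // Collect a and all of its ancestors.
  const ancestorsOfA: Record<string, boolean> = {};
  for (let n: TreeNode | null = a; n !== null; n = n.parent) {
    ancestorsOfA[n.id] = true;
  }
  // The first node on b's path to the root that is also in a's set is the LCA.
  for (let n: TreeNode | null = b; n !== null; n = n.parent) {
    if (ancestorsOfA[n.id]) return n;
  }
  return null; // the nodes live in different trees
}
```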
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Handler section nodes were positioned relative to rootNode, but they
live under separate handlerRoot ELK graphs. Using getElkRoot() to find
each node's actual root ensures correct absolute coordinates.
This, combined with the POLYLINE edge routing, should eliminate the
Y-offset misalignment between main flow nodes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend:
- Set POLYLINE edge routing on ELK root — eliminates curved/bent edges
between horizontally aligned nodes
- Collect edges from handler section roots (not just main root) so
internal handler edges are included in the layout output
- Use correct root reference for coordinate calculation per edge
Frontend:
- Render ALL edge points as line segments (polylines), not cubic bezier.
ELK bend points are waypoints, not bezier control points — the cubic
bezier interpretation caused false curves.
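A minimal sketch of the frontend change: every ELK point becomes a straight line segment in the SVG path data. The `edgeToPath` name and `EdgePoint` type are assumptions.

```typescript
interface EdgePoint {
  x: number;
  y: number;
}

// Builds SVG path data that visits every point with straight segments:
// "M" moves to the first point, "L" draws a line to each following one.
function edgeToPath(points: EdgePoint[]): string {
  if (points.length === 0) return "";
  return points
    .map((p, i) => (i === 0 ? "M" : "L") + " " + p.x + " " + p.y)
    .join(" ");
}
```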
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The tap button in the node toolbar now navigates to
/admin/appconfig?app=<application>&processor=<nodeId>, which
auto-selects the application in the AppConfigPage. The AppConfigPage
reads the ?app query param to open the detail panel for that app.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Handler section ELK roots were missing INCLUDE_CHILDREN, causing
edges between a handler compound and its children to fail with
UnsupportedGraphException (cross-hierarchy edge resolution).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edges connecting main flow nodes to handler section nodes (ON_EXCEPTION,
ON_COMPLETION) now span different ELK root graphs. ELK throws
UnsupportedGraphException when an edge connects nodes in different
layout hierarchies. Skip these cross-root edges — the frontend doesn't
render them anyway (handler sections are separated visually).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Toggle tracing: "T" → 👣 (footprints — trace = following the path)
- Configure tap: ✎ (pencil) → 🚰 (water tap — tap = intercept the flow)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The flex chain from detailArea → detailPanel → tabContent lacked
min-height: 0, so flex children never shrank below content height
and overflow-y: auto never triggered. Added min-height: 0 and
flex: 1 to propagate the height constraint correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ON_EXCEPTION, ON_COMPLETION, and ERROR_HANDLER compounds were included
in the same root ELK graph as the main flow. ELK's layered algorithm
offset the main flow nodes vertically to accommodate the handler
compounds, causing bent arrows between the ENDPOINT and first processor.
Now handler sections get their own independent ELK root graphs. The
frontend already separates and repositions them, so they just need
correct internal layout — not positioning relative to the main flow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests were using the old 18-param constructor, missing the 5 new
iteration fields (loopIndex, loopSize, splitIndex, splitSize,
multicastIndex) added in V8 migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No longer needed — the ProcessDiagram is now integrated into
ExchangeDetail via the ExecutionDiagram wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added centerOnNodeId prop to ProcessDiagram. When set, the diagram
pans to center the specified node in the viewport. Jump to Error
now selects the failed processor AND centers the viewport on it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When an onException/error handler section has any executed processors
(overlay entries), it renders with a stronger red tint (8% vs 3%),
a solid red border frame, and a solid divider line. This makes it
easy to identify which handler was triggered when multiple exist.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The execution overlay data maps to the root route's processor IDs. When
drilled into a sub-route, those IDs don't match, causing all nodes to
appear dimmed. Now clears the overlay and shows pure topology when
viewing a sub-route via drill-down.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When drilled into a sub-route, the pre-fetched diagramLayout (loaded by
content hash for the root execution) doesn't contain the sub-route's
diagram. Only use the pre-loaded layout for the root route; fall back to
useDiagramByRoute for drilled-down sub-routes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Make headers tab and timeline tab scrollable when content overflows
- Replace custom <pre> code block with design system CodeBlock component
for body tabs (Input/Output) to match existing styleguide
- Add LINEAR_SEGMENTS node placement strategy to ELK layout to fix
Y-offset misalignment between nodes in left-to-right diagrams
(e.g., ENDPOINT at different Y level than subsequent processors)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Synthesize COMPLETED state for ENDPOINT nodes when overlay is active
(endpoints are route entry points, not in the processor execution tree)
- Move status badge (check/error) inside the card (top-right, below top bar)
to avoid collision with ConfigBadge (TRACE/TAP) badges
- Include ENDPOINT nodes in edge traversal check so the edge from
endpoint to first processor renders as green/traversed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
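A minimal sketch of the ENDPOINT handling described in the entry above. It assumes an edge counts as traversed when both of its ends carry an execution state; that rule, the type names, and the function names are illustrative assumptions rather than the actual overlay code.

```typescript
// Hypothetical types for illustration only.
type NodeState = "COMPLETED" | "FAILED";
interface OverlayNode { id: string; type: string; }
interface OverlayEdge { source: string; target: string; }

// Copies the overlay and adds a synthesized COMPLETED state for ENDPOINT nodes,
// which are route entry points and do not appear in the processor execution tree.
function withEndpointStates(nodes: OverlayNode[], overlay: Map<string, NodeState>): Map<string, NodeState> {
  const result = new Map<string, NodeState>();
  overlay.forEach((state, id) => result.set(id, state));
  nodes.forEach((node) => {
    if (node.type === "ENDPOINT" && !result.has(node.id)) {
      result.set(node.id, "COMPLETED");
    }
  });
  return result;
}

// Assumption: an edge is traversed when both endpoints have an execution state.
function isTraversed(edge: OverlayEdge, states: Map<string, NodeState>): boolean {
  return states.has(edge.source) && states.has(edge.target);
}
```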
React attaches onWheel handlers as passive listeners, so preventDefault()
doesn't stop page scrolling. Attach a native wheel listener with
{ passive: false } via useEffect instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
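The native-listener fix in the entry above might be sketched as follows. To stay self-contained, the block declares minimal WheelTarget and WheelLikeEvent interfaces instead of using DOM types; in a component the target would be the element held by a ref. The helper name attachNonPassiveWheel is hypothetical.

```typescript
// Minimal stand-ins for the DOM types, so the sketch runs without a browser.
interface WheelLikeEvent { deltaY: number; preventDefault(): void; }
type WheelListener = (event: WheelLikeEvent) => void;
interface WheelTarget {
  addEventListener(type: "wheel", listener: WheelListener, options?: { passive?: boolean }): void;
  removeEventListener(type: "wheel", listener: WheelListener): void;
}

// Registers a non-passive wheel listener so preventDefault() actually blocks
// page scrolling, and returns a cleanup function that removes it again.
function attachNonPassiveWheel(target: WheelTarget, onWheel: (deltaY: number) => void): () => void {
  const listener: WheelListener = (event) => {
    event.preventDefault();
    onWheel(event.deltaY);
  };
  target.addEventListener("wheel", listener, { passive: false });
  return () => target.removeEventListener("wheel", listener);
}
```

Inside a React effect, the returned function can serve directly as the cleanup, for example `useEffect(() => attachNonPassiveWheel(el, handleZoom), [el])`, where `el` and `handleZoom` are placeholders.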
Replace the RouteFlow-based flow view with the new ExecutionDiagram
component which provides execution overlay, iteration stepping, and
an integrated detail panel. The gantt view and all other page sections
remain unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Composes ProcessDiagram with execution overlay data, exchange summary
bar, resizable splitter, and detail panel into a single root component.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements the bottom detail panel with processor header bar, tab bar
(Info, Headers, Input, Output, Error, Config, Timeline), and all tab
content components. Info shows processor/exchange metadata in a grid,
Headers fetches per-processor snapshots for side-by-side display,
Input/Output render formatted code blocks, Error extracts exception
types, Config is a placeholder, and Timeline renders a Gantt chart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
useExecutionOverlay maps processor tree to overlay state map, handling
iteration filtering, sub-route failure detection, and trace data flags.
useIterationState detects compound nodes with iterated children and
manages per-compound iteration selection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a left/right stepper widget to compound node headers (LOOP, SPLIT,
MULTICAST) when iteration overlay data is present. Thread executionOverlay,
overlayActive, iterationState, and onIterationChange props through
ProcessDiagram -> CompoundNode -> children and ProcessDiagram ->
ErrorSection -> children so leaf DiagramNode instances render with
execution state (green/red badges, dimming for skipped nodes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add green solid edges for traversed paths and dashed gray for
not-traversed when execution overlay is active. Includes green
arrowhead marker and overlay threading through CompoundNode and
ErrorSection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DiagramNode now accepts executionState and overlayActive props to render
execution status: green tint + checkmark badge for completed nodes, red
tint + exclamation badge for failed nodes, dimmed opacity for skipped
nodes. Duration is shown at bottom-right, and a drill-down arrow appears
for sub-route failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Define the execution overlay type system (NodeExecutionState, IterationInfo,
DetailTab) and extend ProcessDiagramProps with optional overlay props. Add
diagramLayout prop so ExecutionDiagram can pass a pre-fetched layout by content
hash, bypassing the internal route-based fetch in useDiagramData.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add iteration fields (loopIndex, loopSize, splitIndex, splitSize,
multicastIndex) to ProcessorNode schema. Add new endpoint path
/executions/{executionId}/processors/by-id/{processorId}/snapshot.
Remove stale diagramNodeId field that was dropped in V6 migration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /executions/{id}/processors/by-id/{processorId}/snapshot endpoint
that fetches processor snapshot data by processorId instead of positional
index, which is fragile when the tree structure changes. The existing
index-based endpoint remains unchanged for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add loop_index, loop_size, split_index, split_size, multicast_index
columns to processor_executions table and thread them through the
full storage → ingestion → detail pipeline. These fields enable
execution overlay to display iteration context for loop, split,
and multicast EIPs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design for overlaying real execution data onto the ProcessDiagram:
- Node status visualization (green OK, red failed, dimmed skipped)
- Per-compound iteration stepping for loops/splits
- Tabbed detail panel (Info, Headers, Input, Output, Error, Config, Timeline)
- Jump to Error with cross-route drill-down
- Backend prerequisites for iteration fields and snapshot-by-id endpoint
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Small overview panel in the bottom-left showing the full diagram
layout with colored node rectangles and an amber viewport indicator.
Click or drag on the minimap to pan the main diagram.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update design spec with implementation notes covering recursive
compound nesting, edge z-ordering, ON_COMPLETION sections, drill-down
navigation, CSS transform zoom, and HTML overlay toolbar.
Increase SECTION_GAP to 80px for better visual separation between
completion and error handler sections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Double-click a DIRECT or SEDA node to navigate into that route's
diagram. Breadcrumbs show the route stack and allow clicking back
to any level. Escape key goes back one level.
Route ID resolution handles camelCase endpoint URIs mapping to
kebab-case route IDs (e.g. direct:callGetProduct → call-get-product)
using the catalog's known route IDs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
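As a rough illustration of the camelCase-to-kebab-case resolution described in the entry above: the sketch strips the scheme and any query string, tries the name as-is, then tries its kebab-case form against the known route IDs. The function names and the exact fallback order are assumptions, not the actual implementation.

```typescript
// Converts "callGetProduct" to "call-get-product".
function camelToKebab(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Resolves an endpoint URI such as "direct:callGetProduct" to a route ID that
// exists in the catalog, or undefined when nothing matches.
function resolveRouteId(endpointUri: string, knownRouteIds: string[]): string | undefined {
  // Drop the scheme ("direct:", "seda:") and any query string.
  const afterScheme = endpointUri.slice(endpointUri.indexOf(":") + 1);
  const queryStart = afterScheme.indexOf("?");
  const name = queryStart === -1 ? afterScheme : afterScheme.slice(0, queryStart);

  if (knownRouteIds.indexOf(name) !== -1) return name;
  const kebab = camelToKebab(name);
  return knownRouteIds.indexOf(kebab) !== -1 ? kebab : undefined;
}
```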
Add ON_COMPLETION to backend COMPOUND_TYPES and frontend rendering.
Completion handlers render as teal-tinted sections between the main
flow and error handlers, structurally parallel to onException.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Recursive compound rendering: CompoundNode checks if children are
themselves compound types (WHEN inside CHOICE) and renders them
recursively. Added EIP_WHEN, EIP_OTHERWISE, DO_CATCH, DO_FINALLY
to frontend COMPOUND_TYPES.
- Edge z-ordering: edges are distributed to their containing compound
and rendered after the background rect, so they're not hidden behind
compound containers.
- Error section sizing: normalize error handler node coordinates to
start at (0,0), compute red tint background height from actual
content with symmetric padding for vertical centering.
- Toolbar as HTML overlay: moved from SVG foreignObject to absolute-
positioned HTML div so it stays fixed size at any zoom level. Uses
design system tokens for consistent styling.
- Zoom: replaced viewBox approach with CSS transform on content group.
Default zoom is 100% anchored top-left. Fit-to-view still available
via button.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
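The coordinate normalization mentioned under "Error section sizing" in the entry above can be sketched as a shift by the minimum x and y. The Positioned shape and the function name are hypothetical.

```typescript
// Hypothetical minimal shape of a laid-out node.
interface Positioned { x: number; y: number; }

// Shifts all nodes so the top-left-most coordinate becomes (0, 0),
// preserving the relative layout. Returns new objects; input is untouched.
function normalizeToOrigin<T extends Positioned>(nodes: T[]): T[] {
  if (nodes.length === 0) return [];
  const minX = Math.min(...nodes.map((n) => n.x));
  const minY = Math.min(...nodes.map((n) => n.y));
  return nodes.map((n) => ({ ...n, x: n.x - minX, y: n.y - minY }));
}
```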
ELK renderer:
- Add EIP_WHEN, EIP_OTHERWISE, DO_CATCH, DO_FINALLY to COMPOUND_TYPES
so branch body processors nest inside their containers
- Rewrite node creation and result extraction as recursive methods
to support compound-inside-compound (CHOICE → WHEN → processors)
- Use fixed NODE_WIDTH=160 for leaf nodes instead of variable width
Frontend:
- Fix mousewheel crash: capture getBoundingClientRect() before
setState updater (React nulls currentTarget after handler returns)
- Anchor fitToView to top-left instead of centering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dev diagram page was calling useRouteCatalog() without time range
params (returned empty) and parsing the wrong response shape (expected
flat {application, routeId} but catalog returns {appId, routes[]}).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New interactive route diagram component with SVG rendering using
server-computed ELK layout coordinates. TIBCO BW5-inspired top-bar
card node style with zoom/pan, hover toolbars, config badges, and
error handler sections below the main flow.
Backend: add direction query parameter (LR/TB) to diagram render
endpoints, defaulting to left-to-right layout.
Frontend: 14-file ProcessDiagram component in ui/src/components/
with DiagramNode, CompoundNode, DiagramEdge, ConfigBadge, NodeToolbar,
ErrorSection, ZoomControls, and supporting hooks. Dev test page at
/dev/diagram for validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ON_EXCEPTION and ERROR_HANDLER nodes are now treated as compound containers
in the ELK diagram renderer, nesting their children. The frontend
diagram-mapping builds separate FlowSegments for each error handler,
displayed as distinct sections in the RouteFlow component.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add attributes_text flattened field to OpenSearch indexing for both
execution and processor levels. Include in full-text search queries,
wildcard matching, and highlighting. Merge processor-level attributes
into ExecutionSummary. Add 'attribute' category to CommandPalette
(design-system 0.1.17) with per-key-value results in the search UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
POST /api/v1/search/executions is a read-only query using POST for the
request body. Skip it in AuditInterceptor to avoid flooding the audit
log with search operations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TreeReconstructionTest and PostgresExecutionStoreIT still passed the
removed diagramNodeId parameter. Missed by mvn compile (main only);
caught by mvn verify (test compilation).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align with cameleer3-common rename: logForwardingLevel → applicationLogLevel
(root logger) and new agentLogLevel (com.cameleer3 logger). Both fields
are on ApplicationConfig, pushed via config-update. UI shows "App Log Level"
and "Agent Log Level" on AppConfig slide-in, AgentHealth config bar, and
AppConfigDetailPage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Map TRACE to its own 'trace' level instead of grouping with DEBUG,
now that the design system LogViewer supports it natively.
Bump @cameleer/design-system to 0.1.16.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add TRACE option to log forwarding level dropdowns (AppConfig,
AgentHealth), badge color mapping, and log filter ButtonGroups
on all pages that display application logs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store application_name in route_diagrams at ingestion time (V7 migration),
resolve from agent registry same as ExecutionController. Move
findProcessorRouteMapping from ExecutionStore to DiagramStore using a
JSONB query that extracts node IDs directly from stored RouteGraph
definitions. This makes the mapping available as soon as diagrams are
sent, before any executions are recorded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent now uses Camel processorId as RouteNode.id, eliminating the
nodeId mapping layer. Drop diagram_node_id column (V6 migration),
remove from ProcessorRecord/ProcessorNode/IngestionService/DetailService,
add /processor-routes endpoint for processorId→routeId lookup,
simplify frontend diagram-mapping and ExchangeDetail overlays,
replace N diagram fetches in AppConfigPage with single hook.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Route column to Traces & Taps table (diagram-based mapping, pending backend fix)
- Make tap badges clickable to navigate to route's Taps tab
- Add edit/save/cancel toolbar with design system Button components
- Move Sampling Rate to last position in settings grid
- Support ?tab= URL param on RouteDetail for direct tab navigation
- Bump @cameleer/design-system to 0.1.15 (DetailPanel overlay + backdrop)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Narrowed panel from 640px to 520px so main table columns stay visible
- Settings grid uses CSS grid (3 columns) for proper wrapping
- Removed unused PanelActions component that caused white footer bar
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the separate AppConfigDetailPage route with a 640px-wide
DetailPanel that slides in when clicking a row on the App Config
overview table. All editing functionality (settings, traces & taps,
route recording) is preserved inside the panel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picks up LogViewer background fix (removes --bg-inset for consistent
card backgrounds).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces text "Edit"/"Del" buttons with pencil and trash can icon
buttons matching the style used elsewhere in the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the plain checkbox with the design system Toggle component
for consistency with the recording toggle on RouteDetail and
AppConfigDetailPage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moved edit pencil and save/cancel actions to sit right after the last
badge field instead of at the start or far right of the config bar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moved the pencil edit button after the badge fields and added
margin-left: auto to push it to the far right of the config bar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Taps column shows enabled/total count as a badge (e.g. "2/3")
next to the existing Traced column.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each type option now shows a descriptive tooltip on hover explaining
its purpose: Business Object (key identifiers), Correlation (cross-route
linking), Event (business events), Custom (general purpose).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crypto.randomUUID() requires a secure context (HTTPS). Since the server
may be accessed via HTTP, use a timestamp + random string ID instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
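One way to build the timestamp + random string ID from the entry above is sketched below. The exact format (base-36 timestamp, dash, up to eight base-36 random characters) and the name makeLocalId are assumptions; such IDs are fine as client-side row keys but are not a substitute for UUIDs where global uniqueness matters.

```typescript
// Builds an ID that works in non-secure (plain HTTP) contexts, where
// crypto.randomUUID() is unavailable. Parameters are injectable for testing.
function makeLocalId(now: number = Date.now(), random: () => number = Math.random): string {
  // toString(36) on a fraction yields "0.xxxx"; slice off the "0." prefix.
  const randomPart = random().toString(36).slice(2, 10);
  return `${now.toString(36)}-${randomPart}`;
}
```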
Replaces hardcoded dark-theme hex fallbacks with proper tokens from
tokens.css: --success-bg/--success-border/--success for success and
--error-bg/--error-border/--error for errors. Works in both themes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The active type option was invisible because --accent-primary doesn't
exist in the design system. Now uses --amber-bg/--amber-deep/--amber
from tokens.css for a clearly visible selected state matching the
brand accent palette.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated types now include attributes on ExecutionDetail, ProcessorNode,
and ExecutionSummary from the actual API. Removed stale detail.children
fallback that no longer exists in the schema.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The frontend sends full ISO timestamps (e.g. 2026-03-19T17:55:29Z) but
the controller expected LocalDate (yyyy-MM-dd). Parsing therefore produced
null, which triggered a NullPointerException when the repository built its
WHERE clause. Changed to accept Instant directly with sensible defaults
(last 7 days).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces null placeholders with actual getAttributes() calls now that
cameleer3-common SNAPSHOT is resolved with attributes support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests were not updated when attributes field was added to ExecutionRecord,
ProcessorRecord, ProcessorDoc, and ExecutionDocument records.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge Logging + Observability into unified "Settings" section with
flex-wrap badge grid including new compressSuccess toggle. Merge
Traced Processors with Taps into "Traces & Taps" section showing
capture mode and tap badges per processor. Add "Route Recording"
section with per-route toggles sourced from route catalog. All new
fields (compressSuccess, routeRecording) included in form state
and save payload.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace taps tab placeholder with full DataTable showing all route taps
- Add columns: attribute, processor, expression, language, target, type, enabled toggle, actions
- Add tap modal with form fields: attribute name, processor select, language, target, expression, type selector
- Implement inline enable/disable toggle per tap row
- Add ConfirmDialog for tap deletion
- Add test expression section with Recent Exchange and Custom Payload tabs
- Add save/edit/delete tap operations via application config update
- Add all supporting CSS module classes (no inline styles)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Toggle for route recording on/off in the route header
- Fetch application config to determine recording state and route taps
- Add Active Taps KPI card showing enabled/total tap counts
- Add Taps tab to the tabbed section with placeholder content
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows up to 2 attribute badges (color="auto") per row with a +N overflow
indicator; empty rows render a muted dash. Uses CSS module classes only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
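The "up to 2 badges plus +N overflow" rule from the entry above could be computed like this. The AttributeBadge shape and the function name are hypothetical; rendering (Badge components, the muted dash for empty rows) is left out.

```typescript
interface AttributeBadge { key: string; value: string; }

// Splits attributes into the badges to show and the count hidden behind "+N".
function splitAttributeBadges(
  attributes: Record<string, string>,
  maxVisible: number = 2,
): { visible: AttributeBadge[]; overflow: number } {
  const all: AttributeBadge[] = Object.keys(attributes).map((key) => ({
    key,
    value: String(attributes[key]),
  }));
  return {
    visible: all.slice(0, maxVisible),
    overflow: Math.max(0, all.length - maxVisible),
  };
}
```

An empty `visible` array with `overflow` 0 is the case where the row would render the muted dash.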
Add a Replay button in the exchange header that opens a modal allowing
users to re-send the exchange to a live agent. The modal pre-populates
headers and body from the original exchange input, provides an agent
selector filtered to live agents for the application, and supports
editable header key-value rows with add/remove.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show route-level attributes as Badge strips in the exchange header
card, and per-processor attributes above the message IN/OUT panels.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add optional `attributes?: Record<string, string>` to ExecutionSummary,
ExecutionDetail, and ProcessorNode in the manually-maintained OpenAPI
schema to reflect the new backend attributes support.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds CompletableFuture-based request-reply mechanism for commands that
need synchronous results. CommandReply record in core, pendingReplies
map in AgentRegistryService, test-expression endpoint on config controller
with 5s timeout. CommandAckRequest extended with optional data field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DetailService deserializes attributes JSON from ExecutionRecord/ProcessorRecord and
passes them to ExecutionDetail and ProcessorNode constructors. ExecutionDocument and
ProcessorDoc carry attributes as a JSON string. SearchIndexer passes attributes when
building documents. OpenSearchIndex includes attributes in indexed maps and
deserializes them when constructing ExecutionSummary from search hits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IngestionService passes attributes (currently null, pending cameleer3-common update)
to ExecutionRecord and ProcessorRecord. PostgresExecutionStore includes the
attributes column in INSERT and ON CONFLICT UPDATE (with COALESCE), and reads
it back in both row mappers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Map<String,String> attributes to ExecutionRecord, ProcessorRecord,
ExecutionDetail, ProcessorNode, and ExecutionSummary. ExecutionStore records
carry attributes as a JSON string; detail/summary models carry deserialized maps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers all 5 new agent features: tap management on RouteDetail, business
attributes display on ExchangeDetail/Dashboard, enhanced replay with
editable payload, per-route recording toggles, and success compression.
Includes backend prerequisites, RBAC matrix, and TypeScript interfaces.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add prefix query parameter to /admin/opensearch/indices endpoint so
the UI can fetch execution and log indices separately. OpenSearch admin
page now shows two card sections: Execution Indices and Log Indices,
each with doc count and size summary. Page restyled with CSS module
replacing inline styles. Delete endpoint also allows log index deletion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Click a row in the admin App Config table to navigate to a dedicated
detail page at /admin/appconfig/:appId. Shows all config fields as
badges in view mode; pencil toggles to edit mode with dropdowns.
Traced processors are now editable (capture mode dropdown + remove
button per processor). Sections and header use card styling for
visual contrast. OidcConfigPage gets the same card treatment.
List page simplified to read-only badge overview with row click
navigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update design system to v0.1.13 where both components scroll to the
top (newest entries) instead of the bottom, matching the descending
sort order used across the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pencil icon and Save/Cancel buttons now appear at the left side of
the AgentHealth config bar, matching the admin overview table where
the edit column is at the start of each row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace subtle Unicode checkmark/X with proper labeled buttons styled
as primary (Save) and secondary (Cancel) for better visibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Settings (log level, engine level, payload capture, metrics) now
display as color-coded badges by default. Clicking the pencil icon
enters edit mode where badges become dropdowns. Save (checkmark)
persists changes and reverts to badge view; cancel discards changes.
Applied consistently on both the admin App Config page and the
AgentHealth config bar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DataTable requires rows with an { id: string } constraint. Map
ApplicationConfig to ConfigRow adding id from the application field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
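The mapping in the entry above amounts to copying the application name into an id field. A minimal sketch, with ApplicationConfig reduced to the one field that matters here (the real type has more fields):

```typescript
// Reduced for illustration; only the application field is relevant to the mapping.
interface ApplicationConfig { application: string; }

// Adds the id that DataTable's { id: string } row constraint requires.
function toConfigRows<T extends ApplicationConfig>(configs: T[]): Array<T & { id: string }> {
  return configs.map((config) => ({ ...config, id: config.application }));
}
```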
Add admin page at /admin/appconfig with a DataTable showing all
application configurations. Inline dropdowns allow editing log level,
engine level, payload capture mode, and metrics toggle directly from
the table. Changes push to agents via SSE immediately.
Also adds a config bar on the AgentHealth page (/agents/:appId) for
per-application config management with the same 4 settings.
Backend: GET /api/v1/config list endpoint, findAll() on repository,
sensible defaults for logForwardingLevel/engineLevel/payloadCaptureMode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /routes/catalog endpoint now accepts optional from/to query
parameters instead of hardcoding a 24h window. The UI passes the
global filter time range so sidebar counts match what the user sees.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a dedicated inspect button column (↗) to navigate to the agent
instance page, consistent with the exchange inspect pattern on the
Dashboard. Row click still opens the detail slide-in panel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove duplicate in-page breadcrumbs (ExchangeDetail, AgentHealth scope
trail) and improve the global TopBar breadcrumb with semantic labels and
a context-based override for pages with richer navigation data.
- Add BreadcrumbProvider from design system v0.1.12
- LayoutShell: label map prettifies URL segments (apps→Applications, etc.)
- ExchangeDetail: uses useBreadcrumb() to set semantic trail via context
- AgentHealth: remove scope trail, keep live-count badge standalone
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
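The label map from the LayoutShell bullet above might work along these lines. Only the apps→Applications mapping comes from the entry; the constant and function names are hypothetical, and unmapped segments simply pass through unchanged.

```typescript
// Only "apps" is documented in the log entry; further mappings are not shown there.
const SEGMENT_LABELS: Record<string, string> = {
  apps: "Applications",
};

// Turns "/apps/orders" into ["Applications", "orders"].
function breadcrumbLabels(pathname: string): string[] {
  return pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => SEGMENT_LABELS[segment] ?? segment);
}
```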
Skip global time range in the logs query key when filtering by
exchangeId (exchange logs are historical, the sliding time window is
irrelevant). Add placeholderData to keep previous results visible
during query key transitions on other pages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defensive: use .keyword on the top-level exchangeId field too, in
case indices were created before the explicit keyword mapping was
added to the template.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dynamically mapped string fields in OpenSearch are multi-field
(text + keyword). Term queries require the .keyword sub-field for
exact matching.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Existing log records only have exchangeId inside the mdc object, not
as a top-level indexed field. Use a bool should clause to match on
either exchangeId (new records) or mdc.camel.exchangeId (old records).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
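For reference, the bool should clause from the entry above, combined with the .keyword sub-fields that the neighbouring fixes in this log call for, would produce a query body shaped roughly like the one below. The builder name is hypothetical, and the server builds this query in Java; the TypeScript here only illustrates the JSON structure.

```typescript
// Builds an OpenSearch query body matching logs by exchange ID on either the
// top-level field (new records) or the MDC field (old records).
function exchangeLogQuery(exchangeId: string) {
  return {
    bool: {
      should: [
        { term: { "exchangeId.keyword": exchangeId } },
        { term: { "mdc.camel.exchangeId.keyword": exchangeId } },
      ],
      // At least one of the two term clauses must match.
      minimum_should_match: 1,
    },
  };
}
```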
Index exchangeId from Camel MDC (camel.exchangeId) as a top-level
keyword field in OpenSearch log indices. Add exchangeId filter to
the log query API and frontend hook. Show a LogViewer on the
ExchangeDetail page filtered to that exchange's logs, with search
input and level filter pills.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GlobalFilterProvider now recomputes the preset time range every
10s when auto-refresh is on, so timeRange.end stays fresh instead of
being frozen at the moment the preset was clicked.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the same log + timeline side-by-side layout from AgentInstance to
the AgentHealth page (/agents/{appId}). Includes search input, level
filter pills, sort toggle, and refresh button — matching the instance
page design exactly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of calling refetch() with stale time params, the refresh
buttons now set a toOverride state to new Date().toISOString(). This
flows into the query key, triggering a fresh fetch with the current
time as the upper bound. Both useApplicationLogs and useAgentEvents
hooks accept an optional toOverride parameter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI build runs tsc --noEmit which failed because the ExecutionDetail
type in schema.d.ts was missing the new inputBody/outputBody/inputHeaders/
outputHeaders fields added to the backend DTO.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add inputBody/outputBody/inputHeaders/outputHeaders to ExecutionDetail
DTO so exchange-level bodies are returned by the detail endpoint. Show
"Exchange Input" and "Exchange Output" panels on the detail page when
the data is available.
Fix RouteFlow node click selecting the wrong processor snapshot by
building a flowToTreeIndex mapping that correctly translates flow
display index → diagram node index → processorId → processor tree
index. Previously the diagram node index was used directly as the
processor tree index, which broke when the two orderings differed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
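The index translation described in the entry above (flow display index → diagram node index → processorId → processor tree index) can be sketched as a two-step lookup. The three input arrays are hypothetical simplifications of the real flow, diagram, and processor tree structures.

```typescript
// flowToDiagram[i]       = diagram node index shown at flow display index i
// diagramProcessorIds[j] = processorId of diagram node j (undefined if none)
// treeProcessorIds[k]    = processorId at processor tree index k
function buildFlowToTreeIndex(
  flowToDiagram: number[],
  diagramProcessorIds: Array<string | undefined>,
  treeProcessorIds: string[],
): Map<number, number> {
  // Index the processor tree by processorId once; first occurrence wins.
  const treeIndexById = new Map<string, number>();
  treeProcessorIds.forEach((id, treeIndex) => {
    if (!treeIndexById.has(id)) treeIndexById.set(id, treeIndex);
  });

  const flowToTree = new Map<number, number>();
  flowToDiagram.forEach((diagramIndex, flowIndex) => {
    const processorId = diagramProcessorIds[diagramIndex];
    if (processorId === undefined) return; // node without a processor
    const treeIndex = treeIndexById.get(processorId);
    if (treeIndex !== undefined) flowToTree.set(flowIndex, treeIndex);
  });
  return flowToTree;
}
```

Going through processorId rather than reusing the diagram index is what keeps the selection correct when the diagram and the processor tree are ordered differently.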
Remove auto-scroll override hack. Add sort order toggle (asc/desc
by time) and manual refresh button to both the application log and
agent events timeline panels on AgentInstance and AgentHealth pages.
Default is descending (newest first); toggling reverses the array.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Give logCard the same max-height and flex layout as timelineCard so
both columns are equal height. Revert .toReversed() so events stay
in DESC order (newest at top). Override EventFeed's auto-scroll-to-
bottom with a requestAnimationFrame that resets scrollTop to 0 after
mount, keeping newest entries visible at the top of both panels.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GET /api/v1/logs endpoint to query application logs stored in
OpenSearch with filters for application, agent, level, time range,
and text search. Wire up the AgentInstance LogViewer with real data
and an EventFeed-style toolbar (search input + level filter pills).
Fix agent events timeline autoscroll by reversing the DESC-ordered
events so newest entries appear at the bottom where EventFeed
autoscrolls to.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix admin OpenSearch page always showing "Disconnected" by aligning
frontend field names (reachable/nodeCount/host) with backend DTO.
Update design system to v0.1.10 and adopt the new multi-flow RouteFlow
API — error-handler nodes now render as labeled segments with error
variant instead of relying on legacy auto-separation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same issue as the CI build — Docker layer cache can serve a stale
cameleer3-common SNAPSHOT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Maven cache can serve stale cameleer3-common SNAPSHOTs. The -U flag
forces Maven to check the remote registry for updated SNAPSHOTs on
every build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LogIndexService in server-core imported LogEntry from cameleer3-common,
but the SNAPSHOT on the registry may not have it yet when the server CI
runs. Moved the dependency to server-app where both the controller and
OpenSearch implementation live.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents can now send application log entries in batches via POST /api/v1/data/logs.
Logs are indexed directly into OpenSearch daily indices (logs-{yyyy-MM-dd}) using
the bulk API. Index template defines explicit mappings for full-text search readiness.
New DTOs (LogEntry, LogBatch) added to cameleer3-common in the agent repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
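The daily index naming from the entry above (logs-{yyyy-MM-dd}) is easy to illustrate. This sketch assumes the date is taken in UTC, which the entry does not state; the function name is hypothetical.

```typescript
// Derives the daily index name for a log timestamp, e.g. "logs-2026-03-19".
// Assumption: the day boundary is evaluated in UTC.
function dailyLogIndex(timestamp: Date): string {
  return `logs-${timestamp.toISOString().slice(0, 10)}`;
}
```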
The findByApplication query only read config_val JSONB, ignoring the
version and updated_at SQL columns. The JSON blob contained version 0
from the original save, so agents saw no config and fell back to defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LIVE: sidebar clicks trigger initial fetch + polling for the new route.
PAUSED: sidebar clicks navigate but queries are disabled — no fetches
until the user switches back to LIVE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The npm install @cameleer/design-system@dev was in the same cached layer
as npm ci, so Docker never re-ran it when the registry had a new version.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add highlight field to ExecutionSummary record
- Request highlight fragments from OpenSearch when full-text search is active
- Pass matchContext to command palette for display
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add missing onOpen prop to CommandPalette (fixes Ctrl+K/Cmd+K)
- Wire server-side exchange search with debounced text query
- Use design system dev snapshot from Gitea registry in CI builds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add application_config table (V4 migration), repository, and REST
controller. GET /api/v1/config/{app} returns config, PUT saves and
pushes CONFIG_UPDATE to all LIVE agents via SSE. UI tracing toggle
now uses config API instead of direct SET_TRACED_PROCESSORS command.
Tracing store syncs with server config on load.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update design system to 0.1.8 and pass NodeBadge[] to both
ProcessorTimeline and RouteFlow. Traced processors display a
blue "TRACED" badge that updates reactively via Zustand store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use diagram node ID as fallback processorId when no processor
execution match exists (e.g. error handlers that didn't trigger).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payload now sends {processors: {id: "BOTH"}} map instead of
{routeId, processorIds[]} array. Tracing state keyed by application
name (global, not per-route) matching agent behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire getActions on ProcessorTimeline and RouteFlow to send
SET_TRACED_PROCESSORS commands to all agents of the same application.
Tracing state managed via Zustand store with optimistic UI and rollback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change Vite proxy pattern from /api to /api/ so /api-docs client
route is not captured and proxied to the backend
- Fix SwaggerUIBundle init: remove empty presets/layout overrides that
crashed the internal persistConfigs function
- Use correct CSS import (swagger-ui.css instead of index.css)
- Add requestInterceptor to auto-attach JWT token to Try-it-out calls
- Add swagger-ui-bundle to optimizeDeps.include for reliable loading
- Remove unused swagger-ui-dist.d.ts type declarations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add logging to MetricsController: warn on parse failures, debug on
received metrics, buffer depth on 503
- Add GET /api/v1/admin/database/metrics-pipeline diagnostic endpoint
(buffer depth, row count, distinct agents/metrics, latest timestamp)
- Fix BackpressureIT test JSON to match actual MetricsSnapshot schema
(collectedAt/metricName/metricValue instead of timestamp/metrics)
- Upgrade cameleer3-common from 1.0-SNAPSHOT to 0.0.3 (adds engineLevel)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Indeterminate progress bars were misleading when agents don't report
JVM metrics — replaced with plain "N/A" text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lock file had "resolved": "../../design-system" from a local
install, causing npm ci in CI to silently skip the package.
Reinstalled from registry to fix the resolved URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Build UI step ran npm ci without authenticating to the Gitea npm
registry, causing @cameleer/design-system to fail to resolve. Add
REGISTRY_TOKEN to .npmrc before npm ci.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DetailPanel now portals itself to #cameleer-detail-panel-root (a div
AppShell places as a sibling of .main in the top-level flex row).
Pages just render <DetailPanel> inline — no manual createPortal,
no context, no prop drilling.
Remove the old #detail-panel-portal div from LayoutShell and the
createPortal wrappers from Dashboard and AgentHealth.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When no OIDC config exists, the backend returns an object with all
null fields (via OidcAdminConfigResponse.unconfigured()). Normalize
all null values to sensible defaults when loading the form instead
of passing nulls through to Input components and .map() calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The API returns defaultRoles as null when no roles are configured.
Add null guards on all defaultRoles accesses to prevent .map() crash.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous approach used useEffect+context to hoist DetailPanel
content to the AppShell level, but the dependency-free useEffect
caused a re-render loop that broke sidebar navigation.
Replace with createPortal: pages render DetailPanel inline in their
JSX but portal it to a target div (#detail-panel-portal) at the
AppShell level. No state lifting, no re-render loops.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DetailPanel is a flex sibling that slides in from the right — it must
be rendered at the AppShell level via the detail prop, not inside the
page content. Add DetailPanelContext so pages can push their panel
content up to LayoutShell, which passes it to AppShell.detail.
Applied to Dashboard (exchange detail) and AgentHealth (instance detail).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upgrade @cameleer/design-system to ^0.1.3 which adds LIVE/PAUSED
toggle to TopBar backed by autoRefresh state in GlobalFilterProvider.
Add useRefreshInterval() hook that returns the polling interval when
auto-refresh is on, or false when paused. Wire it into all query
hooks that use refetchInterval (executions, catalog, agents, metrics,
admin database/opensearch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenSearch dynamically maps string fields as text with a .keyword
subfield. Sorting on text fields throws an error; only .keyword,
date, and numeric fields support sorting. Add .keyword suffix to
all string sort columns (status, routeId, agentId, executionId,
correlationId, applicationName) while keeping start_time and
duration_ms as-is.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add application_name filter to OpenSearch query builder — sidebar
app selection now correctly filters the exchange list. The
application field was being resolved to agentIds in the controller
but never applied as a query filter in OpenSearch.
Also restore snake_case sort column mapping since the OpenSearch
toMap() serializer uses snake_case field names (start_time, route_id,
etc.), not camelCase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add executionId and applicationName to allowed sort fields. Fix sort
column mapping to use camelCase field names matching the OpenSearch
ExecutionDocument fields instead of snake_case DB column names. This
was causing sorts on most columns to either silently fall back to
startTime or return empty results from OpenSearch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upgrade @cameleer/design-system to v0.1.1 which adds onSortChange
callback to DataTable. Wire it up in Dashboard (exchanges), AuditLog,
and RouteDetail (recent executions) so sorting triggers a new API
request with sortField/sortDir instead of only sorting the current page.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline styles with CSS module matching the design system's
LoginForm visual patterns. Uses proper DS class structure (divider,
social section, form fields) while keeping username-based auth
instead of the DS component's email validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate all page components from the @cameleer/design-system v0.0.3
example UI, replacing mock data with real backend API hooks. This brings
richer visuals (KpiStrip, GroupCard, RouteFlow, ProcessorTimeline,
DateRangePicker, expandable rows) while preserving all existing API
integration, auth, and routing infrastructure.
Pages migrated: Dashboard, RoutesMetrics, RouteDetail, ExchangeDetail,
AgentHealth, AgentInstance, OidcConfig, AuditLog, RBAC (Users/Groups/Roles).
Also enhanced LayoutShell CommandPalette with real search data from catalog.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add engine_level, input_body, output_body, input_headers, output_headers
to the executions INSERT/SELECT/UPSERT and row mapper. Required for
REGULAR mode where route-level payloads exist but no processor records.
Note: requires ALTER TABLE migration to add the new columns.
Extract inputBody/outputBody/inputHeaders/outputHeaders from RouteExecution
snapshots and pass to ExecutionRecord. Maps engineLevel field. Critical for
REGULAR mode where no processor records exist but route-level payloads do.
Adds engineLevel (NONE/MINIMAL/REGULAR/COMPLETE) and inputBody/outputBody/
inputHeaders/outputHeaders to ExecutionRecord so REGULAR mode route-level
payloads are persisted (previously only processor-level records had payloads).
- Add @RequestBody(required=false) CommandAckRequest to ack endpoint for
receiving agent command results (backward compat with old agents)
- Record command results in agent event log via AgentEventService
- Add set-traced-processors to mapCommandType switch
- Inject AgentEventService dependency
- `OidcConfigAdminController` — GET/POST /api/v1/admin/oidc, POST /test
- `SensitiveKeysAdminController` — GET/PUT /api/v1/admin/sensitive-keys. GET returns 200 with config or 204 if not configured. PUT accepts `{ keys: [...] }` with optional `?pushToAgents=true` to fan out merged keys to all LIVE agents. Stored in `server_config` table (key `sensitive_keys`).
- `AuditLogController` — GET /api/v1/admin/audit
- `MetricsController` — GET /api/v1/metrics, GET /timeseries
- `DiagramController` — GET /api/v1/diagrams/{id}, POST /
- `DiagramRenderController` — POST /api/v1/diagrams/render (ELK layout)
- `ClaimMappingAdminController` — CRUD /api/v1/admin/claim-mappings, POST /test (accepts inline rules + claims for preview without saving)
- `TraefikLabelBuilder` — generates Traefik Docker labels for path-based or subdomain routing. Also emits `cameleer.replica` and `cameleer.instance-id` labels per container for labels-first identity.
- `PrometheusLabelBuilder` — generates Prometheus Docker labels (`prometheus.scrape/path/port`) per runtime type for `docker_sd_configs` auto-discovery
- `ContainerLogForwarder` — streams Docker container stdout/stderr to ClickHouse with `source='container'`. One follow-stream thread per container, batches lines every 2s/50 lines via `ClickHouseLogStore.insertBufferedBatch()`. 60-second max capture timeout.
- `DisabledRuntimeOrchestrator` — no-op when runtime not enabled
## metrics/ — Prometheus observability
- `ServerMetrics` — centralized business metrics: gauges (agents by state, SSE connections, buffer depths), counters (ingestion drops, agent transitions, deployment outcomes, auth failures), timers (flush duration, deployment duration). Exposed via `/api/v1/prometheus`.
- Docker: multi-stage build (`Dockerfile`), `$BUILDPLATFORM` for native Maven on ARM64 runner, amd64 runtime. `docker-entrypoint.sh` imports `/certs/ca.pem` into JVM truststore before starting the app (supports custom CAs for OIDC discovery without `CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY`).
- `REGISTRY_TOKEN` build arg required for `cameleer-common` dependency resolution
- Secrets managed in CI deploy step (idempotent `--dry-run=client | kubectl apply`): `cameleer-auth`, `cameleer-postgres-credentials`, `cameleer-clickhouse-credentials`
- K8s probes: server uses `/api/v1/health`, PostgreSQL uses `pg_isready -U "$POSTGRES_USER"` (env var, not hardcoded)
- K8s security: server and database pods run with `securityContext.runAsNonRoot`. UI (nginx) runs without securityContext (needs root for entrypoint setup).
- Docker: server Dockerfile has no default credentials — all DB config comes from env vars at runtime
- `SensitiveKeysMerger` — pure function: merge(global, perApp) -> union with case-insensitive dedup, preserves first-seen casing. Returns null when both inputs are null.
- `AppSettings`, `AppSettingsRepository` — per-app settings config and persistence
- `ThresholdConfig`, `ThresholdRepository` — alerting threshold config and persistence
- `AuditService` — audit logging facade
- `AuditRecord`, `AuditResult`, `AuditCategory`, `AuditRepository` — audit trail records and persistence
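The merge rule described for `SensitiveKeysMerger` can be illustrated with a minimal sketch. The class name `SensitiveKeysMergerSketch`, the `List<String>` parameter types, and the example keys are assumptions made for illustration; the real signature may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the described merge rule, not the project's actual class.
final class SensitiveKeysMergerSketch {
    private SensitiveKeysMergerSketch() {}

    /** Union of both lists, deduplicated case-insensitively, keeping first-seen casing. */
    static List<String> merge(List<String> global, List<String> perApp) {
        if (global == null && perApp == null) {
            return null; // nothing configured at either level
        }
        List<String> merged = new ArrayList<>();
        Set<String> seenLowerCase = new HashSet<>();
        append(global, merged, seenLowerCase);  // global keys win the casing contest
        append(perApp, merged, seenLowerCase);
        return merged;
    }

    private static void append(List<String> source, List<String> merged, Set<String> seenLowerCase) {
        if (source == null) {
            return;
        }
        for (String key : source) {
            // Set.add returns false when the lower-cased key was already present
            if (key != null && seenLowerCase.add(key.toLowerCase(Locale.ROOT))) {
                merged.add(key);
            }
        }
    }
}
```

Under these assumptions, merging a global list containing `password` with a per-app list containing `PASSWORD` keeps only the global spelling.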
When deployed via the cameleer-saas platform, this server orchestrates customer app containers using Docker. Key components:
- **ConfigMerger** (`core/runtime/ConfigMerger.java`) — pure function: resolve(globalDefaults, envConfig, appConfig) -> ResolvedContainerConfig. Three-layer merge: global (application.yml) -> environment (defaultContainerConfig JSONB) -> app (containerConfig JSONB). Includes `runtimeType` (default `"auto"`) and `customArgs` (default `""`).
- **TraefikLabelBuilder** (`app/runtime/TraefikLabelBuilder.java`) — generates Traefik Docker labels for path-based (`/{envSlug}/{appSlug}/`) or subdomain-based (`{appSlug}-{envSlug}.{domain}`) routing. Supports strip-prefix and SSL offloading toggles. Also sets per-replica identity labels: `cameleer.replica` (index) and `cameleer.instance-id` (`{envSlug}-{appSlug}-{replicaIndex}`). Internal processing uses labels (not container name parsing) for extensibility.
- **PrometheusLabelBuilder** (`app/runtime/PrometheusLabelBuilder.java`) — generates Prometheus `docker_sd_configs` labels per resolved runtime type: Spring Boot `/actuator/prometheus:8081`, Quarkus/native `/q/metrics:9000`, plain Java `/metrics:9464`. Labels merged into container metadata alongside Traefik labels at deploy time.
- **DockerNetworkManager** (`app/runtime/DockerNetworkManager.java`) — manages two Docker network tiers:
  - `cameleer-traefik` — shared network; Traefik, server, and all app containers attach here. Server joined via docker-compose with `cameleer-server` DNS alias.
  - `cameleer-env-{slug}` — per-environment isolated network; containers in the same environment discover each other via Docker DNS. In SaaS mode, env networks are tenant-scoped: `cameleer-env-{tenantId}-{envSlug}` (overloaded `envNetworkName(tenantId, envSlug)` method) to prevent cross-tenant collisions when multiple tenants have identically-named environments.
- **DockerEventMonitor** (`app/runtime/DockerEventMonitor.java`) — persistent Docker event stream listener for containers with `managed-by=cameleer-server` label. Detects die/oom/start/stop events and updates deployment replica states. Periodic reconciliation (@Scheduled every 30s) inspects actual container state and corrects deployment status mismatches (fixes stale DEGRADED with all replicas healthy).
- **ContainerLogForwarder** (`app/runtime/ContainerLogForwarder.java`) — streams Docker container stdout/stderr to ClickHouse `logs` table with `source='container'`. Uses `docker logs --follow` per container, batches lines every 2s or 50 lines. Parses Docker timestamp prefix, infers log level via regex. `DeploymentExecutor` starts capture after each replica launches with the replica's `instanceId` (`{envSlug}-{appSlug}-{replicaIndex}`); `DockerEventMonitor` stops capture on die/oom. 60-second max capture timeout with 30s cleanup scheduler. Thread pool of 10 daemon threads. Container logs use the same `instanceId` as the agent (set via `CAMELEER_AGENT_INSTANCEID` env var) for unified log correlation at the instance level.
- **StartupLogPanel** (`ui/src/components/StartupLogPanel.tsx`) — collapsible log panel rendered below `DeploymentProgress`. Queries `/api/v1/logs?source=container&application={appSlug}&environment={envSlug}`. Auto-polls every 3s while deployment is STARTING; shows green "live" badge during polling, red "stopped" badge on FAILED. Uses `useStartupLogs` hook and `LogViewer` (design system).
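The three-layer merge performed by `ConfigMerger` can be sketched as a series of overlays in which later layers override earlier ones. This is a simplified illustration: it uses plain maps instead of `ResolvedContainerConfig`, and the keys `memoryLimitMb` and `replicas` in the usage note are hypothetical. Only the `runtimeType` and `customArgs` defaults come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: global -> environment -> app, last non-null value wins.
final class ConfigMergerSketch {
    private ConfigMergerSketch() {}

    static Map<String, Object> resolve(Map<String, Object> globalDefaults,
                                       Map<String, Object> envConfig,
                                       Map<String, Object> appConfig) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        // Defaults named in the component description
        resolved.put("runtimeType", "auto");
        resolved.put("customArgs", "");
        overlay(resolved, globalDefaults); // application.yml
        overlay(resolved, envConfig);      // environment defaultContainerConfig
        overlay(resolved, appConfig);      // app containerConfig
        return resolved;
    }

    private static void overlay(Map<String, Object> target, Map<String, Object> layer) {
        if (layer == null) {
            return; // a missing layer contributes nothing
        }
        for (Map.Entry<String, Object> entry : layer.entrySet()) {
            if (entry.getValue() != null) {
                target.put(entry.getKey(), entry.getValue());
            }
        }
    }
}
```

For example, a global `memoryLimitMb` of 512 overridden to 1024 at the environment level resolves to 1024 unless the app layer sets its own value.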
## DeploymentExecutor Details
Primary network for app containers is set via `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` env var (in SaaS mode: `cameleer-tenant-{slug}`); apps also connect to `cameleer-traefik` (routing) and `cameleer-env-{tenantId}-{envSlug}` (per-environment discovery) as additional networks. Resolves `runtimeType: auto` to concrete type from `AppVersion.detectedRuntimeType` at PRE_FLIGHT (fails deployment if unresolvable). Builds Docker entrypoint per runtime type (all JVM types use `-javaagent:/app/agent.jar -jar`, plain Java uses `-cp` with main class, native runs binary directly). Sets per-replica `CAMELEER_AGENT_INSTANCEID` env var to `{envSlug}-{appSlug}-{replicaIndex}` so container logs and agent logs share the same instance identity. Sets `CAMELEER_AGENT_*` env vars from `ResolvedContainerConfig` (routeControlEnabled, replayEnabled, health port). These are startup-only agent properties — changing them requires redeployment.
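As a small illustration of the per-replica identity described above, the following sketch derives the instance ID and the matching environment variable entry. The class and method names are hypothetical, and the sketch does not assume whether replica indices start at 0 or 1.

```java
import java.util.Map;

// Hypothetical helper showing the {envSlug}-{appSlug}-{replicaIndex} convention.
final class InstanceIdSketch {
    static final String INSTANCE_ID_ENV_VAR = "CAMELEER_AGENT_INSTANCEID";

    private InstanceIdSketch() {}

    static String instanceId(String envSlug, String appSlug, int replicaIndex) {
        return envSlug + "-" + appSlug + "-" + replicaIndex;
    }

    /** Environment entry that lets container logs and agent logs share one identity. */
    static Map<String, String> replicaEnv(String envSlug, String appSlug, int replicaIndex) {
        return Map.of(INSTANCE_ID_ENV_VAR, instanceId(envSlug, appSlug, replicaIndex));
    }
}
```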
## Deployment Status Model
| Status | Meaning |
|--------|---------|
| `STOPPED` | Intentionally stopped or initial state |
| `STARTING` | Deploy in progress |
| `RUNNING` | All replicas healthy and serving |
| `DEGRADED` | Some replicas healthy, some dead |
| `STOPPING` | Graceful shutdown in progress |
| `FAILED` | Terminal failure (pre-flight, health check, or crash) |
**Replica support**: deployments can specify a replica count. `DEGRADED` is used when at least one but not all replicas are healthy.
**Blue/green strategy**: when re-deploying, new replicas are started and health-checked before old ones are stopped, minimising downtime.
**Deployment uniqueness**: `DeploymentService.createDeployment()` deletes any STOPPED/FAILED deployments for the same app+environment before creating a new one, preventing duplicate rows.
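The replica rule from the status model can be sketched as a small function over healthy and total replica counts. The names are hypothetical, and mapping the zero-healthy case to `FAILED` is an assumption; the description above only defines `RUNNING` and `DEGRADED` in terms of replica health.

```java
// Hypothetical sketch of how replica health could map to a deployment status.
final class ReplicaStatusSketch {
    enum Status { STOPPED, STARTING, RUNNING, DEGRADED, STOPPING, FAILED }

    private ReplicaStatusSketch() {}

    static Status fromReplicaHealth(int healthyReplicas, int totalReplicas) {
        if (totalReplicas <= 0 || healthyReplicas < 0 || healthyReplicas > totalReplicas) {
            throw new IllegalArgumentException("invalid replica counts");
        }
        if (healthyReplicas == totalReplicas) {
            return Status.RUNNING;  // all replicas healthy and serving
        }
        if (healthyReplicas > 0) {
            return Status.DEGRADED; // at least one, but not all, replicas healthy
        }
        return Status.FAILED;       // assumption: no healthy replica is treated as a crash
    }
}
```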
## JAR Management
- **Retention policy** per environment: configurable maximum number of JAR versions to keep. Older JARs are deleted automatically.
- **Nightly cleanup job** (`JarRetentionJob`, Spring `@Scheduled` 03:00): purges JARs exceeding the retention limit and removes orphaned files not referenced by any app version. Skips versions currently deployed.
- **Volume-based JAR mounting** for Docker-in-Docker setups: set `CAMELEER_SERVER_RUNTIME_JARDOCKERVOLUME` to the Docker volume name that contains the JAR storage directory. When set, the orchestrator mounts this volume into the container instead of bind-mounting the host path (required when the SaaS container itself runs inside Docker and the host path is not accessible from sibling containers).
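The retention rule can be sketched as a selection over an app's JAR versions. This is an illustration only: the `JarVersion` record and method names are hypothetical, and it assumes that the newest versions are the ones kept and that deployed versions outside the limit are simply skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the retention selection; not the actual JarRetentionJob.
final class JarRetentionSketch {
    record JarVersion(int version, boolean deployed) {}

    private JarRetentionSketch() {}

    /** Returns versions beyond the retention limit, never including deployed ones. */
    static List<JarVersion> selectForDeletion(List<JarVersion> versions, int maxVersionsToKeep) {
        List<JarVersion> newestFirst = new ArrayList<>(versions);
        newestFirst.sort(Comparator.<JarVersion>comparingInt(JarVersion::version).reversed());

        List<JarVersion> toDelete = new ArrayList<>();
        for (int i = Math.max(0, maxVersionsToKeep); i < newestFirst.size(); i++) {
            JarVersion candidate = newestFirst.get(i);
            if (!candidate.deployed()) { // skip versions currently deployed
                toDelete.add(candidate);
            }
        }
        return toDelete;
    }
}
```

With five versions, a limit of two, and version 2 currently deployed, the sketch selects versions 3 and 1 for deletion.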
## Runtime Type Detection
The server detects the app framework from uploaded JARs and builds Docker entrypoints. The agent shaded JAR bundles the log appender, so no separate `cameleer-log-appender.jar` or `PropertiesLauncher` is needed:
- **Detection** (`RuntimeDetector`): runs at JAR upload time. Checks ZIP magic bytes (non-ZIP = native binary), then probes `META-INF/MANIFEST.MF` Main-Class: Spring Boot loader prefix -> `spring-boot`, Quarkus entry point -> `quarkus`, other Main-Class -> `plain-java` (extracts class name). Results stored on `AppVersion` (`detected_runtime_type`, `detected_main_class`).
- **Runtime types** (`RuntimeType` enum): `AUTO`, `SPRING_BOOT`, `QUARKUS`, `PLAIN_JAVA`, `NATIVE`. Configurable per app/environment via `containerConfig.runtimeType` (default `"auto"`).
- **Entrypoint per type**: every JVM type starts with `java -javaagent:/app/agent.jar`. Spring Boot and Quarkus then launch with `-jar app.jar`; plain Java uses `-cp` with an explicit main class instead of `-jar`. Native runs the binary directly.
- **Custom arguments** (`containerConfig.customArgs`): freeform string appended to the start command. Validated against a strict pattern to prevent shell injection (entrypoint uses `sh -c`).
- **AUTO resolution**: at deploy time (PRE_FLIGHT), `"auto"` resolves to the detected type from `AppVersion`. Fails deployment if detection was unsuccessful — user must set type explicitly.
- **UI**: Resources tab shows Runtime Type dropdown (with detection hint from latest uploaded version) and Custom Arguments text field.
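The detection steps listed above can be sketched as a pure classification over the file's first bytes and the manifest `Main-Class` value. The class name and the two package prefixes are assumptions made for illustration; the actual `RuntimeDetector` may match different values and reads the manifest itself.

```java
// Hypothetical sketch of the detection order: ZIP magic first, then Main-Class.
final class RuntimeDetectorSketch {
    enum Detected { SPRING_BOOT, QUARKUS, PLAIN_JAVA, NATIVE, UNKNOWN }

    // Assumed prefixes for the Spring Boot loader and the Quarkus entry point
    private static final String SPRING_BOOT_LOADER_PREFIX = "org.springframework.boot.loader.";
    private static final String QUARKUS_RUNNER_PREFIX = "io.quarkus.bootstrap.runner.";

    private RuntimeDetectorSketch() {}

    /** ZIP local file header signature: 'P' 'K' 0x03 0x04. */
    static boolean isZip(byte[] header) {
        return header != null && header.length >= 4
                && header[0] == 0x50 && header[1] == 0x4B
                && header[2] == 0x03 && header[3] == 0x04;
    }

    static Detected classify(byte[] header, String mainClass) {
        if (!isZip(header)) {
            return Detected.NATIVE;      // non-ZIP upload is treated as a native binary
        }
        if (mainClass == null || mainClass.isBlank()) {
            return Detected.UNKNOWN;     // detection unsuccessful, type must be set explicitly
        }
        if (mainClass.startsWith(SPRING_BOOT_LOADER_PREFIX)) {
            return Detected.SPRING_BOOT;
        }
        if (mainClass.startsWith(QUARKUS_RUNNER_PREFIX)) {
            return Detected.QUARKUS;
        }
        return Detected.PLAIN_JAVA;      // any other Main-Class
    }
}
```

Keeping the classification free of I/O makes it easy to test; a caller would supply the header bytes and the `Main-Class` attribute read from `META-INF/MANIFEST.MF`.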
## SaaS Multi-Tenant Network Isolation
In SaaS mode, each tenant's server and its deployed apps are isolated at the Docker network level:
- **Tenant network** (`cameleer-tenant-{slug}`) — primary internal bridge for all of a tenant's containers. Set as `CAMELEER_SERVER_RUNTIME_DOCKERNETWORK` for the tenant's server instance. Tenant A's apps cannot reach tenant B's apps.
- **Shared services network** — server also connects to the shared infrastructure network (PostgreSQL, ClickHouse, Logto) and `cameleer-traefik` for HTTP routing.
- **Tenant-scoped environment networks** (`cameleer-env-{tenantId}-{envSlug}`) — per-environment discovery is scoped per tenant, so `alpha-corp`'s "dev" environment network is separate from `beta-corp`'s "dev" environment network.
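The naming patterns above can be sketched as follows. The class name is hypothetical; the overloaded `envNetworkName(tenantId, envSlug)` mirrors the method mentioned for `DockerNetworkManager`, but its body here is an assumption derived from the documented patterns.

```java
// Hypothetical sketch of the documented Docker network naming patterns.
final class NetworkNamesSketch {
    private NetworkNamesSketch() {}

    static String tenantNetworkName(String tenantSlug) {
        return "cameleer-tenant-" + tenantSlug;
    }

    /** Standalone mode: one network per environment. */
    static String envNetworkName(String envSlug) {
        return "cameleer-env-" + envSlug;
    }

    /** SaaS mode: the tenant ID is included so identically named environments do not collide. */
    static String envNetworkName(String tenantId, String envSlug) {
        return "cameleer-env-" + tenantId + "-" + envSlug;
    }
}
```

Under these assumptions, `alpha-corp` and `beta-corp` each get a distinct network for their "dev" environment.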
## nginx / Reverse Proxy
- `client_max_body_size 200m` is required in the nginx config to allow JAR uploads up to 200 MB. Without this, large JAR uploads return 413.
This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
Server exposes `/api/v1/prometheus` (unauthenticated, Prometheus text format). Spring Boot Actuator provides JVM, GC, thread pool, and `http.server.requests` metrics automatically. Business metrics via `ServerMetrics` component:
All containers also get `prometheus.scrape=true`. These labels enable Prometheus `docker_sd_configs` auto-discovery.
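A minimal sketch of how the per-runtime scrape labels might be assembled is shown below. The label keys follow the `prometheus.scrape/path/port` naming and the paths and ports come from the `PrometheusLabelBuilder` description; the class name and the runtime type string `native` are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-runtime Prometheus labels for docker_sd_configs discovery.
final class PrometheusLabelSketch {
    private PrometheusLabelSketch() {}

    static Map<String, String> labelsFor(String runtimeType) {
        String path;
        String port;
        switch (runtimeType) {
            case "spring-boot":
                path = "/actuator/prometheus";
                port = "8081";
                break;
            case "quarkus":
            case "native":
                path = "/q/metrics";
                port = "9000";
                break;
            case "plain-java":
                path = "/metrics";
                port = "9464";
                break;
            default:
                // "auto" must already be resolved to a concrete type before labels are built
                throw new IllegalArgumentException("Unresolved runtime type: " + runtimeType);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("prometheus.scrape", "true"); // set on every container
        labels.put("prometheus.path", path);
        labels.put("prometheus.port", port);
        return labels;
    }
}
```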
## Agent Metric Names (Micrometer)
Agents send `MetricsSnapshot` records with Micrometer-convention metric names. The server stores them generically (ClickHouse `agent_metrics.metric_name`). The UI references specific names in `AgentInstance.tsx` for JVM charts.
### JVM metrics (used by UI)
| Metric name | UI usage |
|---|---|
| `process.cpu.usage.value` | CPU % stat card + chart |
Mean processing time = `camel.route.policy.total_time / camel.route.policy.count`. Min processing time is not available (Micrometer does not track minimums).
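The mean calculation can be sketched as a small helper that guards against a zero count. The class and method names are hypothetical, and the sketch makes no assumption about the time unit of `camel.route.policy.total_time`.

```java
import java.util.OptionalDouble;

// Hypothetical helper: mean = total_time / count, empty when nothing was recorded.
final class RouteTimingSketch {
    private RouteTimingSketch() {}

    static OptionalDouble meanProcessingTime(double totalTime, double count) {
        if (count <= 0) {
            return OptionalDouble.empty(); // avoid division by zero for idle routes
        }
        return OptionalDouble.of(totalTime / count);
    }
}
```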
- `ui/src/components/StartupLogPanel.tsx` — deployment startup log viewer (container logs from ClickHouse, polls 3s while STARTING)
- `ui/src/api/queries/logs.ts` — `useStartupLogs` hook for container startup log polling, `useLogs`/`useApplicationLogs` for general log search
## UI Styling
- Always use `@cameleer/design-system` CSS variables for colors (`var(--amber)`, `var(--error)`, `var(--success)`, etc.) — never hardcode hex values. This applies to CSS modules, inline styles, and SVG `fill`/`stroke` attributes; SVG presentation attributes resolve `var()` correctly.
- Shared CSS modules in `ui/src/styles/` (table-section, log-panel, rate-colors, refresh-indicator, chart-card, section-card) — import these instead of duplicating patterns.
- Design system components used consistently: `Select`, `Tabs`, `Toggle`, `Button`, `LogViewer`, `Label` — prefer DS components over raw HTML elements. `LogViewer` renders optional source badges (`container`, `app`, `agent`) via `LogEntry.source` field (DS v0.1.49+).
- Environment slugs are auto-computed from display name (read-only in UI).
- Brand assets: `@cameleer/design-system/assets/` provides `camel-logo.svg` (currentColor), `cameleer-{16,32,48,192,512}.png`, and `cameleer-logo.png`. Copied to `ui/public/` for use as favicon (`favicon-16.png`, `favicon-32.png`) and logo (`camel-logo.svg` — login dialog 36px, sidebar 28x24px).
- Sidebar generates `/exchanges/` paths directly (no legacy `/apps/` redirects). basePath is centralized in `ui/src/config.ts`; router.tsx imports it instead of re-reading `<base>` tag.
- Global user preferences (environment selection) use Zustand stores with localStorage persistence — never URL search params. URL params are for page-specific state only (e.g. `?text=` search query). Switching environment resets all filters and remounts pages.
An observability server that receives, stores, and serves Apache Camel route execution data from distributed Cameleer3 agents. Think njams Server (by Integration Matters) — but built incrementally, API-first, with a modern stack. Users can search through millions of recorded transactions by state, time, duration, full text, and correlate executions across multiple Camel instances. The server also pushes configuration, tracing controls, and ad-hoc commands to agents via SSE.
An observability server that receives, stores, and serves Apache Camel route execution data from distributed Cameleer agents. Think njams Server (by Integration Matters) — but built incrementally, API-first, with a modern stack. Users can search through millions of recorded transactions by state, time, duration, full text, and correlate executions across multiple Camel instances. The server also pushes configuration, tracing controls, and ad-hoc commands to agents via SSE.
## Core Value
@@ -16,7 +16,7 @@ Users can reliably search and find any transaction across all connected Camel in
### Active
- [ ] Receive and ingest transaction/activity data from Cameleer3 agents via HTTP POST
- [ ] Receive and ingest transaction/activity data from Cameleer agents via HTTP POST
- [ ] Store transactions in a high-volume, horizontally scalable data store with 30-day retention
- [ ] Search transactions by state, execution date/time, duration, and full-text content
- [ ] Correlate activities across multiple routes and Camel instances within a single transaction
@@ -38,8 +38,8 @@ Users can reliably search and find any transaction across all connected Camel in
## Context
- **Agent side**: cameleer3 agent (`https://gitea.siegeln.net/cameleer/cameleer3`) is under active development; already supports creating diagrams and capturing executions
- **Shared library**: `com.cameleer3:cameleer3-common` contains shared models and the graph API; protocol defined in `cameleer3-common/PROTOCOL.md`
- **Agent side**: cameleer agent (`https://gitea.siegeln.net/cameleer/cameleer`) is under active development; already supports creating diagrams and capturing executions
- **Shared library**: `com.cameleer:cameleer-common` contains shared models and the graph API; protocol defined in `cameleer-common/PROTOCOL.md`
- **Data model**: Hierarchical — a **transaction** represents a message's full journey, containing **activities** per route execution. Transactions can span multiple Camel instances (e.g., route A calls route B on another instance via endpoint)
- **Scale target**: Millions of transactions per day, 50+ connected agents, 30-day data retention
- **Query pattern**: Incident-driven — mostly recent data queries, with deep historical dives during incidents
@@ -49,7 +49,7 @@ Users can reliably search and find any transaction across all connected Camel in
## Constraints
- **Tech stack**: Java 17+, Spring Boot 3.4.3, Maven multi-module — already established
- **Dependency**: Must consume `com.cameleer3:cameleer3-common` from Gitea Maven registry
- **Dependency**: Must consume `com.cameleer:cameleer-common` from Gitea Maven registry
- **Protocol**: Agent protocol is still evolving — server must adapt as it stabilizes
- **Incremental delivery**: Build step by step; storage and search first, then layer features on top
**Core Value:** Users can reliably search and find any transaction across all connected Camel instances — by any combination of state, time, duration, or content — even at millions of transactions per day with 30-day retention.
- MetricsRepository: void insertBatch(List<MetricsSnapshot> metrics) -- use a generic type or the cameleer3-common metrics model if available; if not, create a simple MetricsData record in core module
- MetricsRepository: void insertBatch(List<MetricsSnapshot> metrics) -- use a generic type or the cameleer-common metrics model if available; if not, create a simple MetricsData record in core module
4. Create IngestionConfig as @ConfigurationProperties("ingestion"):
- bufferCapacity (int, default 50000)
@@ -193,7 +193,7 @@ Output: Working ClickHouse via Docker Compose, DDL with TTL, WriteBuffer with un
- No custom bean needed if relying on auto-config; only create if explicit JdbcTemplate customization required
</action>
<verify>
<automated>mvn test -pl cameleer3-server-core -Dtest=WriteBufferTest -q 2>&1 | tail -10</automated>
<automated>mvn test -pl cameleer-server-core -Dtest=WriteBufferTest -q 2>&1 | tail -10</automated>
</verify>
<done>WriteBuffer passes all unit tests. Repository interfaces exist with correct method signatures. IngestionConfig reads from application.yml.</done>
</task>
@@ -201,7 +201,7 @@ Output: Working ClickHouse via Docker Compose, DDL with TTL, WriteBuffer with un
</tasks>
<verification>
- `mvn test -pl cameleer3-server-core -q` -- all WriteBuffer unit tests pass
- `mvn test -pl cameleer-server-core -q` -- all WriteBuffer unit tests pass
- `mvn clean compile -q` -- full project compiles with new dependencies
- POST /api/v1/data/executions with single RouteExecution JSON returns 202
@@ -245,7 +245,7 @@ public class IngestionConfig {
Note: All integration tests must include X-Cameleer-Protocol-Version:1 header (API-04 will be enforced by Plan 03's interceptor, but include the header now for forward compatibility).
</action>
<verify>
<automated>mvn test -pl cameleer3-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q 2>&1 | tail -15</automated>
<automated>mvn test -pl cameleer-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q 2>&1 | tail -15</automated>
</verify>
<done>All three ingestion endpoints return 202 on valid data. Data arrives in ClickHouse after flush. Buffer-full returns 503 with Retry-After. Unknown JSON fields accepted. Integration tests green.</done>
</task>
@@ -253,7 +253,7 @@ public class IngestionConfig {
</tasks>
<verification>
- `mvn test -pl cameleer3-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q` -- all integration tests pass
- `mvn test -pl cameleer-server-app -Dtest="ExecutionControllerIT,DiagramControllerIT,MetricsControllerIT,BackpressureIT" -q` -- all integration tests pass
- `cameleer-server-app/.../controller/ExecutionControllerIT.java` - 4 tests: single, array, flush, unknown fields
- `cameleer-server-app/.../controller/DiagramControllerIT.java` - 3 tests: single, array, flush
- `cameleer-server-app/.../controller/MetricsControllerIT.java` - 2 tests: POST, flush
- `cameleer-server-app/.../controller/BackpressureIT.java` - 2 tests: 503 response, data not lost
## Decisions Made
- Controllers accept raw String body and detect single vs array JSON (starts with `[`), supporting both payload formats per protocol spec
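The single-versus-array detection described in this decision can be sketched as follows. This is a minimal illustration, not the project's controller code: the class name `PayloadShape` and the choice to skip leading whitespace before checking for `[` are assumptions.

```java
// Hypothetical helper; the real controllers may structure this differently.
final class PayloadShape {
    private PayloadShape() {}

    /**
     * Returns true when the raw JSON body is an array, i.e. its first
     * non-whitespace character is '['. Skipping whitespace is an assumption;
     * the decision above only says "starts with [".
     */
    static boolean isArray(String rawBody) {
        if (rawBody == null) {
            return false;
        }
        for (int i = 0; i < rawBody.length(); i++) {
            char c = rawBody.charAt(i);
            if (!Character.isWhitespace(c)) {
                return c == '[';
            }
        }
        return false; // empty or whitespace-only body
    }
}
```

A controller could branch on `PayloadShape.isArray(body)` to pick list or single-object deserialization before handing the result to the ingestion layer.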
@@ -119,7 +119,7 @@ Each task was committed atomically:
- **Found during:** Task 2 (integration test context startup)
- **Issue:** IngestionConfig had both `@Configuration` and `@ConfigurationProperties`, while `@EnableConfigurationProperties(IngestionConfig.class)` on the app class created a second bean, causing "expected single matching bean but found 2"
- **Fix:** Removed `@Configuration` from IngestionConfig, relying solely on `@EnableConfigurationProperties`
<done>AbstractClickHouseIT base class ready for integration tests. ProtocolVersionInterceptor validates header on data/agent paths. Health, swagger, and api-docs paths excluded. Application class enables scheduling and config properties.</done>
</task>
@@ -129,10 +129,10 @@ Output: AbstractClickHouseIT base class, working health, Swagger UI, protocol he
- GET /api/v1/health returns 200 with JSON containing status field
@@ -178,7 +178,7 @@ Output: AbstractClickHouseIT base class, working health, Swagger UI, protocol he
Note: All tests that POST to data endpoints must include X-Cameleer-Protocol-Version:1 header.
</action>
<verify>
<automated>mvn test -pl cameleer3-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q 2>&1 | tail -15</automated>
<automated>mvn test -pl cameleer-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q 2>&1 | tail -15</automated>
</verify>
<done>Health returns 200. OpenAPI docs are available and list endpoints. Protocol version header enforced on data paths, not on health/docs. Unknown JSON fields accepted. TTL confirmed in ClickHouse DDL via HealthControllerIT test methods.</done>
</task>
@@ -186,7 +186,7 @@ Output: AbstractClickHouseIT base class, working health, Swagger UI, protocol he
</tasks>
<verification>
- `mvn test -pl cameleer3-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q` -- all tests pass
- `mvn test -pl cameleer-server-app -Dtest="HealthControllerIT,OpenApiIT,ProtocolVersionIT,ForwardCompatIT" -q` -- all tests pass
- GET /api/v1/health returns 200
- GET /api/v1/api-docs returns OpenAPI spec
- Missing protocol header returns 400 on data endpoints
- ProtocolVersionInterceptor validates X-Cameleer-Protocol-Version:1 on /api/v1/data/** and /api/v1/agents/** paths, returning 400 JSON error for missing or wrong version
- AbstractClickHouseIT base class with Testcontainers ClickHouse 25.3, shared static container, schema init from 01-schema.sql
- 12 integration tests: health endpoint (2), OpenAPI docs (2), protocol version enforcement (5), forward compatibility (1), TTL verification (2)
- Cameleer3ServerApplication with @EnableScheduling, @EnableConfigurationProperties, and dual package scanning
- CameleerServerApplication with @EnableScheduling, @EnableConfigurationProperties, and dual package scanning
## Task Commits
@@ -80,17 +80,17 @@ Each task was committed atomically:
2. **Task 2: Integration tests for health, OpenAPI, protocol version, forward compat, and TTL** - `2d3fde3` (test)
## Files Created/Modified
- `cameleer3-server-app/src/main/java/.../Cameleer3ServerApplication.java` - Spring Boot entry point with scheduling and config properties
- `cameleer3-server-app/src/main/java/.../interceptor/ProtocolVersionInterceptor.java` - Validates protocol version header on data/agent paths
- `cameleer3-server-app/src/main/java/.../config/WebConfig.java` - Registers interceptor with path patterns and exclusions
- `cameleer3-server-app/src/test/java/.../AbstractClickHouseIT.java` - Shared Testcontainers base class for ITs
- `cameleer3-server-app/src/test/resources/application-test.yml` - Test profile with small buffer config
- `cameleer3-server-app/src/test/java/.../controller/HealthControllerIT.java` - Health endpoint and TTL tests
- `cameleer3-server-app/src/test/java/.../controller/OpenApiIT.java` - OpenAPI and Swagger UI tests
Phase 1 establishes the data pipeline and API skeleton for Cameleer3 Server. Agents POST execution data, diagrams, and metrics to REST endpoints; the server buffers these in memory and batch-flushes to ClickHouse. The ClickHouse schema design is the most critical and least reversible decision in this phase -- ORDER BY and partitioning cannot be changed without table recreation.
Phase 1 establishes the data pipeline and API skeleton for Cameleer Server. Agents POST execution data, diagrams, and metrics to REST endpoints; the server buffers these in memory and batch-flushes to ClickHouse. The ClickHouse schema design is the most critical and least reversible decision in this phase -- ORDER BY and partitioning cannot be changed without table recreation.
The ClickHouse Java ecosystem has undergone significant changes. The recommended approach is **clickhouse-jdbc v0.9.7** (JDBC V2 driver) with Spring Boot's JdbcTemplate for batch inserts. An alternative is the standalone **client-v2** artifact which offers a POJO-based insert API, but JDBC integration with Spring Boot is more conventional and better documented. ClickHouse now has a native full-text index (TYPE text, GA as of March 2026) that supersedes the older tokenbf_v1 bloom filter approach -- this is relevant for Phase 2 but should be accounted for in schema design now.
@@ -17,7 +17,7 @@ The ClickHouse Java ecosystem has undergone significant changes. The recommended
| ID | Description | Research Support |
|----|-------------|-----------------|
| INGST-01 (#1) | Accept RouteExecution via POST /api/v1/data/executions, return 202 | REST controller + async write buffer pattern; Jackson deserialization of cameleer3-common models |
| INGST-01 (#1) | Accept RouteExecution via POST /api/v1/data/executions, return 202 | REST controller + async write buffer pattern; Jackson deserialization of cameleer-common models |
| INGST-02 (#2) | Accept RouteGraph via POST /api/v1/data/diagrams, return 202 | Same pattern; separate ClickHouse table for diagrams with content-hash dedup |
| INGST-03 (#3) | Accept metrics via POST /api/v1/data/metrics, return 202 | Same pattern; separate ClickHouse table for metrics |
| INGST-04 (#4) | In-memory batch buffer with configurable flush interval/size | ArrayBlockingQueue + @Scheduled flush; configurable via application.yml |
@@ -60,7 +60,7 @@ The ClickHouse Java ecosystem has undergone significant changes. The recommended
| ArrayBlockingQueue | LMAX Disruptor | Disruptor is faster under extreme contention but adds complexity; ABQ is sufficient for this throughput |
| Spring JdbcTemplate | Raw JDBC PreparedStatement | JdbcTemplate provides cleaner error handling and resource management; no meaningful overhead |
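The `ArrayBlockingQueue` buffer referenced in the tables above could look roughly like the sketch below. The class name `BoundedWriteBuffer` and its method names are assumptions for illustration and are not taken from the project's `WriteBuffer`; only `ArrayBlockingQueue.offer` and `drainTo` are standard JDK calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: names are assumptions, not the project's WriteBuffer API.
final class BoundedWriteBuffer<T> {
    private final BlockingQueue<T> queue;

    BoundedWriteBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue; false signals backpressure to the caller. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /** Removes up to maxBatch items in FIFO order for one flush cycle. */
    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    int size() {
        return queue.size();
    }
}
```

Under this design a controller would translate a `false` from `offer` into the 503 with `Retry-After` response, while a scheduled flusher calls `drain` and hands the batch to the repository.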
**Installation (add to cameleer3-server-app/pom.xml):**
**Installation (add to cameleer-server-app/pom.xml):**
```xml
<!-- ClickHouse JDBC V2 -->
<dependency>
@@ -103,7 +103,7 @@ The ClickHouse Java ecosystem has undergone significant changes. The recommended
</dependency>
```
**Add to cameleer3-server-core/pom.xml:**
**Add to cameleer-server-core/pom.xml:**
```xml
<!-- SLF4J for logging (no Spring dependency) -->
<dependency>
@@ -117,7 +117,7 @@ The ClickHouse Java ecosystem has undergone significant changes. The recommended
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/ExecutionController.java` | POST /api/v1/data/executions endpoint | VERIFIED | 79 lines; `@PostMapping("/executions")`; handles single/array via raw String parsing; returns 202 or 503 + Retry-After |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java` | Batch insert to route_executions via JdbcTemplate | VERIFIED | 118 lines; `@Repository`; `BatchPreparedStatementSetter`; flattens processor tree to parallel arrays |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/ingestion/ClickHouseFlushScheduler.java` | Scheduled drain of WriteBuffer into ClickHouse | VERIFIED | 160 lines; `@Scheduled(fixedDelayString="${ingestion.flush-interval-ms:1000}")`; implements `SmartLifecycle` for shutdown drain |
| `cameleer3-server-core/src/main/java/com/cameleer3/server/core/ingestion/IngestionService.java` | Routes data to appropriate WriteBuffer instances | VERIFIED | 115 lines; plain class; `acceptExecution`, `acceptExecutions`, `acceptDiagram`, `acceptDiagrams`, `acceptMetrics`; delegates to typed `WriteBuffer` instances |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/ExecutionController.java` | POST /api/v1/data/executions endpoint | VERIFIED | 79 lines; `@PostMapping("/executions")`; handles single/array via raw String parsing; returns 202 or 503 + Retry-After |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseExecutionRepository.java` | Batch insert to route_executions via JdbcTemplate | VERIFIED | 118 lines; `@Repository`; `BatchPreparedStatementSetter`; flattens processor tree to parallel arrays |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/ingestion/ClickHouseFlushScheduler.java` | Scheduled drain of WriteBuffer into ClickHouse | VERIFIED | 160 lines; `@Scheduled(fixedDelayString="${ingestion.flush-interval-ms:1000}")`; implements `SmartLifecycle` for shutdown drain |
| `cameleer-server-core/src/main/java/com/cameleer/server/core/ingestion/IngestionService.java` | Routes data to appropriate WriteBuffer instances | VERIFIED | 115 lines; plain class; `acceptExecution`, `acceptExecutions`, `acceptDiagram`, `acceptDiagrams`, `acceptMetrics`; delegates to typed `WriteBuffer` instances |
#### Plan 01-03 Artifacts
| Artifact | Expected | Status | Details |
|---|---|---|---|
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/interceptor/ProtocolVersionInterceptor.java` | Validates X-Cameleer-Protocol-Version:1 header on data endpoints | VERIFIED | 47 lines; implements `HandlerInterceptor.preHandle`; returns 400 JSON on missing/wrong version |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/config/WebConfig.java` | Registers interceptor with path patterns | VERIFIED | 35 lines; `addInterceptors` registers interceptor on `/api/v1/data/**` and `/api/v1/agents/**`; excludes health, api-docs, swagger-ui |
| `cameleer3-server-app/src/test/java/com/cameleer3/server/app/AbstractClickHouseIT.java` | Shared Testcontainers base class for integration tests | VERIFIED | 73 lines; static `ClickHouseContainer`; `@DynamicPropertySource`; `@BeforeAll` schema init from SQL file; `JdbcTemplate` exposed to subclasses |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/interceptor/ProtocolVersionInterceptor.java` | Validates X-Cameleer-Protocol-Version:1 header on data endpoints | VERIFIED | 47 lines; implements `HandlerInterceptor.preHandle`; returns 400 JSON on missing/wrong version |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/config/WebConfig.java` | Registers interceptor with path patterns | VERIFIED | 35 lines; `addInterceptors` registers interceptor on `/api/v1/data/**` and `/api/v1/agents/**`; excludes health, api-docs, swagger-ui |
| `cameleer-server-app/src/test/java/com/cameleer/server/app/AbstractClickHouseIT.java` | Shared Testcontainers base class for integration tests | VERIFIED | 73 lines; static `ClickHouseContainer`; `@DynamicPropertySource`; `@BeforeAll` schema init from SQL file; `JdbcTemplate` exposed to subclasses |
---
@@ -113,7 +113,7 @@ No orphaned requirements — all 11 IDs declared in plan frontmatter match the R
### Anti-Patterns Found
No anti-patterns detected. Scanned all source files in `cameleer3-server-app/src/main` and `cameleer3-server-core/src/main` for TODO/FIXME/PLACEHOLDER/stub return patterns. None found.
No anti-patterns detected. Scanned all source files in `cameleer-server-app/src/main` and `cameleer-server-core/src/main` for TODO/FIXME/PLACEHOLDER/stub return patterns. None found.
- Add tokenbf_v1 skip indexes on exchange_bodies and exchange_headers (GRANULARITY 4, same as idx_error)
- Add tokenbf_v1 skip index on error_stacktrace (it has no index yet, needed for SRCH-05 full-text search across stack traces)
2. Create core search domain types in `com.cameleer3.server.core.search`:
2. Create core search domain types in `com.cameleer.server.core.search`:
- `SearchRequest` record: status (String, nullable), timeFrom (Instant), timeTo (Instant), durationMin (Long), durationMax (Long), correlationId (String), text (String, global full-text), textInBody (String), textInHeaders (String), textInErrors (String), offset (int), limit (int). Compact constructor validates: limit defaults to 50 if <= 0, capped at 500; offset defaults to 0 if < 0.
- `SearchResult<T>` record: data (List<T>), total (long), offset (int), limit (int). Include static factory `empty(int offset, int limit)`.
- `ExecutionSummary` record: executionId (String), routeId (String), agentId (String), status (String), startTime (Instant), endTime (Instant), durationMs (long), correlationId (String), errorMessage (String), diagramContentHash (String). This is the lightweight list-view DTO, NOT the full processor arrays.
- `SearchEngine` interface with methods: `SearchResult<ExecutionSummary> search(SearchRequest request)` and `long count(SearchRequest request)`. This is the swappable backend (ClickHouse now, OpenSearch later per user decision).
- `SearchService` class: plain class (no Spring annotations, same pattern as IngestionService). Constructor takes SearchEngine. `search(SearchRequest)` delegates to engine.search(). This thin orchestration layer allows adding cross-cutting concerns later.
3. Create core detail domain types in `com.cameleer3.server.core.detail`:
3. Create core detail domain types in `com.cameleer.server.core.detail`:
- `ProcessorNode` record: processorId (String), processorType (String), status (String), startTime (Instant), endTime (Instant), durationMs (long), diagramNodeId (String), errorMessage (String), errorStackTrace (String), children (List<ProcessorNode>). This is the nested tree node.
- `ExecutionDetail` record: executionId (String), routeId (String), agentId (String), status (String), startTime (Instant), endTime (Instant), durationMs (long), correlationId (String), exchangeId (String), errorMessage (String), errorStackTrace (String), diagramContentHash (String), processors (List<ProcessorNode>). This is the full detail response.
- `DetailService` class: plain class (no Spring annotations). Constructor takes ExecutionRepository. Method `getDetail(String executionId)` returns `Optional<ExecutionDetail>`. Calls repository's new `findDetailById` method, then calls `reconstructTree()` to convert flat arrays into nested ProcessorNode tree. The `reconstructTree` method: takes parallel arrays (ids, types, statuses, starts, ends, durations, diagramNodeIds, errorMessages, errorStackTraces, depths, parentIndexes), creates ProcessorNode[] array, then wires children using parentIndexes (parentIndex == -1 means root).
Actually, use a different approach per the layering: add a `findRawById(String executionId)` method that returns `Optional<RawExecutionRow>` — a new record containing all parallel arrays. DetailService takes this and reconstructs. Create `RawExecutionRow` as a record in the detail package with all fields needed for reconstruction.
<done>Schema migration SQL exists, all core domain types compile, SearchEngine interface and SearchService defined, ExecutionRepository extended with query method, DetailService has tree reconstruction logic</done>
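The pagination rules stated for the `SearchRequest` compact constructor (limit defaults to 50 when <= 0 and is capped at 500; a negative offset becomes 0) can be sketched as below. The field list is copied from the plan; the constants `DEFAULT_LIMIT` and `MAX_LIMIT` are names introduced here for illustration.

```java
import java.time.Instant;

// Sketch of the pagination rules only; constant names are assumptions.
record SearchRequest(
        String status, Instant timeFrom, Instant timeTo,
        Long durationMin, Long durationMax,
        String correlationId, String text,
        String textInBody, String textInHeaders, String textInErrors,
        int offset, int limit) {

    static final int DEFAULT_LIMIT = 50;
    static final int MAX_LIMIT = 500;

    // Compact constructor: normalizes paging values before the fields are assigned.
    SearchRequest {
        if (limit <= 0) {
            limit = DEFAULT_LIMIT;
        }
        if (limit > MAX_LIMIT) {
            limit = MAX_LIMIT;
        }
        if (offset < 0) {
            offset = 0;
        }
    }
}
```

Normalizing in the record itself means every caller, whether a controller or a test, sees the same bounded paging values without repeating the checks.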
- Test: After inserting a RouteExecution with processors that have exchange snapshots and nested children, the route_executions row has non-empty exchange_bodies, exchange_headers, processor_depths (correct depth values), processor_parent_indexes (correct parent wiring), processor_input_bodies, processor_output_bodies, processor_input_headers, processor_output_headers, processor_diagram_node_ids, and diagram_content_hash columns
- Verifies a second insertion with null snapshots succeeds with empty defaults
</action>
<verify>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-app -Dtest=IngestionSchemaIT</automated>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-app -Dtest=IngestionSchemaIT</automated>
</verify>
<done>All new columns populated correctly during ingestion, tree metadata (depth/parent) correct for nested processors, exchange data concatenated for search, existing ingestion tests still pass</done>
- Used FlatProcessor record to carry depth and parentIndex alongside the ProcessorExecution during DFS flattening -- single pass, no separate traversal
@@ -116,9 +116,9 @@ Each task was committed atomically:
**1. [Rule 3 - Blocking] Created DiagramRenderer and DiagramLayout stub interfaces**
- **Found during:** Task 2 (compilation step)
- **Issue:** Pre-existing `ElkDiagramRenderer` in app module referenced `DiagramRenderer` and `DiagramLayout` interfaces that did not exist in core module, causing compilation failure
- **Fix:** Created minimal stub interfaces in `com.cameleer3.server.core.diagram` package
- **Fix:** Created minimal stub interfaces in `com.cameleer.server.core.diagram` package
2. Create core diagram rendering interfaces in `com.cameleer3.server.core.diagram`:
2. Create core diagram rendering interfaces in `com.cameleer.server.core.diagram`:
- `PositionedNode` record: id (String), label (String), type (String — NodeType name), x (double), y (double), width (double), height (double), children (List<PositionedNode> — for compound/swimlane groups). JSON-serializable for the JSON layout response.
- Unit test: ElkDiagramRenderer.renderSvg with a simple 3-node graph (from->process->to) produces valid SVG containing svg element, rect elements for nodes, line/path elements for edges
- Integration test: GET /api/v1/diagrams/{hash} with no Accept preference defaults to SVG
</behavior>
<action>
1. Create `ElkDiagramRenderer` implementing `DiagramRenderer` in `com.cameleer3.server.app.diagram`:
1. Create `ElkDiagramRenderer` implementing `DiagramRenderer` in `com.cameleer.server.app.diagram`:
**Layout phase (shared by both SVG and JSON):**
- Convert RouteGraph to ELK graph: create ElkNode root, set properties for LayeredOptions.ALGORITHM_ID, Direction.DOWN (top-to-bottom per user decision), spacing 40px node-node, 20px edge-node.
- GET /api/v1/diagrams/{hash}/render with no Accept header -> assert SVG response (default).
</action>
<verify>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-app -Dtest="ElkDiagramRendererTest,DiagramRenderControllerIT"</automated>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-app -Dtest="ElkDiagramRendererTest,DiagramRenderControllerIT"</automated>
</verify>
<done>Diagram rendering produces color-coded top-to-bottom SVG and JSON layout, content negotiation works via Accept header, compound nodes group nested processors, all tests pass</done>
- Test searchByStatus: Insert 3 executions (COMPLETED, FAILED, RUNNING). GET /api/v1/search/executions?status=FAILED returns only the FAILED execution. Response has envelope: {"data":[...],"total":1,"offset":0,"limit":50}
@@ -221,7 +221,7 @@ Established controller pattern (from Phase 1):
- Test emptyResults: Search with no matches returns {"data":[],"total":0,"offset":0,"limit":50}
</behavior>
<action>
1. Create `ClickHouseSearchEngine` in `com.cameleer3.server.app.search`:
1. Create `ClickHouseSearchEngine` in `com.cameleer.server.app.search`:
- Implements SearchEngine interface from core module.
- Constructor takes JdbcTemplate.
- `search(SearchRequest)` method:
@@ -244,13 +244,13 @@ Established controller pattern (from Phase 1):
- `escapeLike(String)` utility: escape `%`, `_`, `\` characters in user input to prevent LIKE injection. Replace `\` with `\\`, `%` with `\%`, `_` with `\_`.
- `count(SearchRequest)` method: same WHERE building, just count query.
2. Create `SearchBeanConfig` in `com.cameleer3.server.app.config`:
2. Create `SearchBeanConfig` in `com.cameleer.server.app.config`:
- `DetailService` bean (takes the execution query interface from Plan 01)
3. Create `SearchController` in `com.cameleer3.server.app.controller`:
3. Create `SearchController` in `com.cameleer.server.app.controller`:
- Inject SearchService.
- `GET /api/v1/search/executions` with @RequestParam for basic filters:
- status (optional String)
@@ -274,7 +274,7 @@ Established controller pattern (from Phase 1):
- Assert response structure matches the envelope format.
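The `escapeLike(String)` utility described for `ClickHouseSearchEngine` in this action can be sketched as follows. The wrapper class name `LikeEscaper` is an assumption; the replacement rules follow the plan text, with the backslash handled first so that the backslashes added for `%` and `_` are not escaped a second time.

```java
// Hypothetical holder class; only the escaping rules come from the plan.
final class LikeEscaper {
    private LikeEscaper() {}

    /** Escapes LIKE metacharacters so user input matches literally. */
    static String escapeLike(String input) {
        return input
                .replace("\\", "\\\\") // backslash first, to avoid double escaping
                .replace("%", "\\%")
                .replace("_", "\\_");
    }
}
```

The escaped value would still be bound as a query parameter (for example wrapped as `"%" + escaped + "%"`); escaping the wildcards complements parameter binding and does not replace it.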
</action>
<verify>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT</automated>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-app -Dtest=SearchControllerIT</automated>
</verify>
<done>All search filter types work independently and in combination, response envelope has correct format, pagination works correctly, full-text search finds matches in all text fields, LIKE patterns are properly escaped</done>
</task>
@@ -282,10 +282,10 @@ Established controller pattern (from Phase 1):
<task type="auto" tdd="true">
<name>Task 2: DetailController, tree reconstruction, exchange snapshot endpoint, and integration tests</name>
- Unit test: reconstructTree with [root, child, grandchild], depths=[0,1,2], parents=[-1,0,1] produces single root with one child that has one grandchild
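The tree rebuild exercised by this unit test can be sketched as below, assuming a reduced node type. `SketchNode` and `TreeSketch` are hypothetical names; the planned `ProcessorNode` record carries many more fields, and the depths array is left out here because the parent indexes alone are enough to wire children.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced to id + children for illustration; not the planned ProcessorNode record.
record SketchNode(String processorId, List<SketchNode> children) {}

final class TreeSketch {
    private TreeSketch() {}

    /** Rebuilds the nested tree from parallel arrays; parentIndexes[i] == -1 marks a root. */
    static List<SketchNode> reconstructTree(String[] ids, int[] parentIndexes) {
        // First pass: create every node with an empty, mutable child list.
        SketchNode[] nodes = new SketchNode[ids.length];
        for (int i = 0; i < ids.length; i++) {
            nodes[i] = new SketchNode(ids[i], new ArrayList<>());
        }
        // Second pass: attach each node to its parent, or collect it as a root.
        List<SketchNode> roots = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            int parent = parentIndexes[i];
            if (parent == -1) {
                roots.add(nodes[i]);
            } else {
                nodes[parent].children().add(nodes[i]);
            }
        }
        return roots;
    }
}
```

Creating all nodes before wiring keeps the method independent of array order, so a child that appears before its parent in the flat arrays is still attached correctly.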
@@ -308,7 +308,7 @@ Established controller pattern (from Phase 1):
- Add `findRawById(String executionId)` method that queries all columns from route_executions WHERE execution_id = ?. Return Optional<RawExecutionRow> (use the record created in Plan 01 or create it here if needed). The RawExecutionRow should contain ALL columns including the parallel arrays for processors.
- Add `findProcessorSnapshot(String executionId, int processorIndex)` method: queries processor_input_bodies[index+1], processor_output_bodies[index+1], processor_input_headers[index+1], processor_output_headers[index+1] for the given execution. Returns a DTO with inputBody, outputBody, inputHeaders, outputHeaders. ClickHouse arrays are 1-indexed in SQL, so add 1 to the Java 0-based index.
3. Create `DetailController` in `com.cameleer3.server.app.controller`:
3. Create `DetailController` in `com.cameleer.server.app.controller`:
- Inject DetailService.
- `GET /api/v1/executions/{executionId}`: call detailService.getDetail(executionId). If empty, return 404. Otherwise return 200 with ExecutionDetail JSON. The processors field is a nested tree of ProcessorNode objects.
- `GET /api/v1/executions/{executionId}/processors/{index}/snapshot`: call repository's findProcessorSnapshot. If execution not found or index out of bounds, return 404. Return JSON with inputBody, outputBody, inputHeaders, outputHeaders. Per user decision: exchange snapshot data fetched separately per processor, not inlined in detail response.
@@ -323,7 +323,7 @@ Established controller pattern (from Phase 1):
- Test GET /api/v1/executions/{id}/processors/999/snapshot: returns 404 for out-of-bounds index.
</action>
<verify>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-core -Dtest=TreeReconstructionTest && mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT</automated>
<automated>cd C:/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-core -Dtest=TreeReconstructionTest && mvn test -pl cameleer-server-app -Dtest=DetailControllerIT</automated>
</verify>
<done>Tree reconstruction correctly rebuilds nested processor trees from flat arrays, detail endpoint returns nested tree with all fields, snapshot endpoint returns per-processor exchange data, diagram hash included in detail response, all tests pass</done>
</task>
@@ -331,9 +331,9 @@ Established controller pattern (from Phase 1):
</tasks>
<verification>
- `mvn test -pl cameleer3-server-core -Dtest=TreeReconstructionTest` passes (unit test for tree rebuild)
- `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT` passes (all search filters)
- `mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT` passes (detail + snapshot)
- `mvn test -pl cameleer-server-core -Dtest=TreeReconstructionTest` passes (unit test for tree rebuild)
- `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT` passes (all search filters)
- `mvn test -pl cameleer-server-app -Dtest=DetailControllerIT` passes (detail + snapshot)
- **Issue:** Core module only had JUnit Jupiter as test dependency, TreeReconstructionTest needed assertj for assertions and mockito for mock(ExecutionRepository.class)
- **Fix:** Added assertj-core and mockito-core test-scoped dependencies to cameleer3-server-core/pom.xml
- Test 1: When a RouteGraph is ingested before a RouteExecution for the same routeId+agentId, the execution's diagram_content_hash column contains the SHA-256 hash of the diagram (not empty string)
@@ -117,7 +117,7 @@ public class ClickHouseDiagramRepository implements DiagramRepository {
**Gap 2 — Surefire classloader isolation:**
5. In `cameleer3-server-app/pom.xml`, add a `<build><plugins>` section (after the existing `spring-boot-maven-plugin`) with `maven-surefire-plugin` configuration:
5. In `cameleer-server-app/pom.xml`, add a `<build><plugins>` section (after the existing `spring-boot-maven-plugin`) with `maven-surefire-plugin` configuration:
```xml
<plugin>
<groupId>org.apache.maven.plugins</groupId>
@@ -131,12 +131,12 @@ public class ClickHouseDiagramRepository implements DiagramRepository {
This forces Surefire to fork a fresh JVM for each test class, isolating ELK's static initializer (LayeredMetaDataProvider + xtext CollectionLiterals) from Spring Boot's classloader. Trade-off: slightly slower test execution, but correct results.
- DiagramRepository injected via constructor into ClickHouseExecutionRepository -- both are @Repository Spring beans, so constructor injection autowires cleanly
| SRCH-01 | Filter by status returns matching executions | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByStatus` | No -- Wave 0 |
| SRCH-02 | Filter by time range returns matching executions | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByTimeRange` | No -- Wave 0 |
| SRCH-03 | Filter by duration range returns matching | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByDuration` | No -- Wave 0 |
| SRCH-04 | Filter by correlationId returns correlated | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#searchByCorrelationId` | No -- Wave 0 |
| SRCH-05 | Full-text search across bodies/headers/errors | integration | `mvn test -pl cameleer3-server-app -Dtest=SearchControllerIT#fullTextSearch` | No -- Wave 0 |
| SRCH-06 | Detail returns nested processor tree | integration | `mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT#detailReturnsNestedTree` | No -- Wave 0 |
| DIAG-01 | Content-hash dedup stores identical defs once | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramControllerIT#contentHashDedup` | Partial (ingestion test exists) |
| DIAG-02 | Transaction links to active diagram version | integration | `mvn test -pl cameleer3-server-app -Dtest=DetailControllerIT#detailIncludesDiagramHash` | No -- Wave 0 |
| DIAG-03 | Diagram rendered as SVG or JSON layout | integration | `mvn test -pl cameleer3-server-app -Dtest=DiagramRenderControllerIT#renderSvg` | No -- Wave 0 |
| SRCH-01 | Filter by status returns matching executions | integration | `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT#searchByStatus` | No -- Wave 0 |
| SRCH-02 | Filter by time range returns matching executions | integration | `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT#searchByTimeRange` | No -- Wave 0 |
| SRCH-03 | Filter by duration range returns matching | integration | `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT#searchByDuration` | No -- Wave 0 |
| SRCH-04 | Filter by correlationId returns correlated | integration | `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT#searchByCorrelationId` | No -- Wave 0 |
| SRCH-05 | Full-text search across bodies/headers/errors | integration | `mvn test -pl cameleer-server-app -Dtest=SearchControllerIT#fullTextSearch` | No -- Wave 0 |
| SRCH-06 | Detail returns nested processor tree | integration | `mvn test -pl cameleer-server-app -Dtest=DetailControllerIT#detailReturnsNestedTree` | No -- Wave 0 |
| DIAG-01 | Content-hash dedup stores identical defs once | integration | `mvn test -pl cameleer-server-app -Dtest=DiagramControllerIT#contentHashDedup` | Partial (ingestion test exists) |
| DIAG-02 | Transaction links to active diagram version | integration | `mvn test -pl cameleer-server-app -Dtest=DetailControllerIT#detailIncludesDiagramHash` | No -- Wave 0 |
| DIAG-03 | Diagram rendered as SVG or JSON layout | integration | `mvn test -pl cameleer-server-app -Dtest=DiagramRenderControllerIT#renderSvg` | No -- Wave 0 |
### Sampling Rate
- **Per task commit:** `mvn test -pl cameleer3-server-app -Dtest=<relevant>IT`
- **Per task commit:** `mvn test -pl cameleer-server-app -Dtest=<relevant>IT`
- **Per wave merge:** `mvn clean verify`
- **Phase gate:** Full suite green before `/gsd:verify-work`
@@ -551,7 +551,7 @@ private void populateExchangeColumns(PreparedStatement ps, List<FlatProcessor> p
### Primary (HIGH confidence)
- ClickHouse JDBC 0.9.7, ClickHouse 25.3 -- verified from project pom.xml and AbstractClickHouseIT
- cameleer3-common 1.0-SNAPSHOT JAR -- decompiled to verify RouteGraph, RouteNode, RouteEdge, NodeType, ProcessorExecution, ExchangeSnapshot field structures
- cameleer-common 1.0-SNAPSHOT JAR -- decompiled to verify RouteGraph, RouteNode, RouteEdge, NodeType, ProcessorExecution, ExchangeSnapshot field structures
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/storage/ClickHouseExecutionRepository.java` | VERIFIED | DiagramRepository injected via constructor (line 59); `findContentHashForRoute` called in `setValues()` (lines 144–147); former `""` placeholder removed |
| `cameleer3-server-app/pom.xml` | VERIFIED | `maven-surefire-plugin` with `forkCount=1` and `reuseForks=false` at lines 95–100; `maven-failsafe-plugin` same config at lines 103–108 |
| `cameleer3-server-app/src/test/java/com/cameleer3/server/app/storage/DiagramLinkingIT.java` | VERIFIED | 152 lines; 2 integration tests; positive case asserts 64-char hex hash; negative case asserts empty string; uses `ignoreExceptions()` for ClickHouse eventual consistency |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/storage/ClickHouseExecutionRepository.java` | VERIFIED | DiagramRepository injected via constructor (line 59); `findContentHashForRoute` called in `setValues()` (lines 144–147); former `""` placeholder removed |
| `cameleer-server-app/pom.xml` | VERIFIED | `maven-surefire-plugin` with `forkCount=1` and `reuseForks=false` at lines 95–100; `maven-failsafe-plugin` same config at lines 103–108 |
| `cameleer-server-app/src/test/java/com/cameleer/server/app/storage/DiagramLinkingIT.java` | VERIFIED | 152 lines; 2 integration tests; positive case asserts 64-char hex hash; negative case asserts empty string; uses `ignoreExceptions()` for ClickHouse eventual consistency |
| `Surefire/Failsafe` | ELK classloader isolation | `reuseForks=false` forces fresh JVM per test class | WIRED | Lines 95–116 in `cameleer3-server-app/pom.xml` |
| `Surefire/Failsafe` | ELK classloader isolation | `reuseForks=false` forces fresh JVM per test class | WIRED | Lines 95–116 in `cameleer-server-app/pom.xml` |
### Requirements Coverage
@@ -108,7 +108,7 @@ Two blockers from the initial verification (2026-03-11T16:00:00Z) have been reso
**Gap 1 resolved — DIAG-02 diagram hash linking:** `ClickHouseExecutionRepository` now injects `DiagramRepository` via constructor and calls `findContentHashForRoute(exec.getRouteId(), "")` in `insertBatch()`. Both the diagram store path and the execution ingest path use `agent_id=""` consistently, so the lookup is correct. `DiagramLinkingIT` provides integration test coverage for both the positive case (hash populated when diagram exists) and negative case (empty string when no diagram exists for the route).
**Gap 2 resolved — Test suite stability:** Both `maven-surefire-plugin` and `maven-failsafe-plugin` in `cameleer3-server-app/pom.xml` are now configured with `forkCount=1` and `reuseForks=false`. This forces a fresh JVM per test class, isolating ELK's `LayeredMetaDataProvider` static initializer from Spring Boot's classloader. The SUMMARY reports 51 tests, 0 failures. Test count across 16 test files totals 80 `@Test` methods; the difference from 51 reflects how Surefire/Failsafe counts parameterized and nested tests vs. raw annotation count.
**Gap 2 resolved — Test suite stability:** Both `maven-surefire-plugin` and `maven-failsafe-plugin` in `cameleer-server-app/pom.xml` are now configured with `forkCount=1` and `reuseForks=false`. This forces a fresh JVM per test class, isolating ELK's `LayeredMetaDataProvider` static initializer from Spring Boot's classloader. The SUMMARY reports 51 tests, 0 failures. Test count across 16 test files totals 80 `@Test` methods; the difference from 51 reflects how Surefire/Failsafe counts parameterized and nested tests vs. raw annotation count.
No regressions were introduced. All 10 observable truths and all 9 phase requirements are now satisfied. Two items remain for human visual verification (SVG rendering correctness).
3. **Update Cameleer3ServerApplication**: Add AgentRegistryConfig.class to @EnableConfigurationProperties.
3. **Update CameleerServerApplication**: Add AgentRegistryConfig.class to @EnableConfigurationProperties.
4. **Update application.yml**: Add agent-registry section with all defaults (see RESEARCH.md code example). Also add `spring.mvc.async.request-timeout: -1` for SSE support (Plan 02 needs it, but set it now).
- Use TestRestTemplate (already available from AbstractClickHouseIT's @SpringBootTest)
</action>
<verify>
<automated>mvn test -pl cameleer3-server-core,cameleer3-server-app -Dtest="Agent*"</automated>
<automated>mvn test -pl cameleer-server-core,cameleer-server-app -Dtest="Agent*"</automated>
</verify>
<done>POST /register returns 200 with agentId + sseEndpoint + heartbeatIntervalMs. POST /{id}/heartbeat returns 200 for known agents, 404 for unknown. GET /agents returns all agents with optional ?status= filter. AgentLifecycleMonitor runs on schedule. All integration tests pass. mvn clean verify passes.</done>
Build the SSE infrastructure and command delivery system:
@@ -181,7 +181,7 @@ From cameleer3-server-app/.../config/AgentRegistryConfig.java:
5. **Update WebConfig**: The SSE endpoint GET /api/v1/agents/{id}/events is already covered by the interceptor pattern "/api/v1/agents/**". Agents send the protocol version header on all requests (per research recommendation), so no exclusion needed. However, if the SSE GET causes issues because browsers/clients may not easily add custom headers to EventSource, add the SSE events path to excludePathPatterns: `/api/v1/agents/*/events`. This is a practical consideration -- add the exclusion to be safe.
@@ -224,7 +224,7 @@ From cameleer3-server-app/.../config/AgentRegistryConfig.java:
**Test configuration**: If ping interval needs to be shorter for tests, add to test application.yml or use @TestPropertySource with agent-registry.ping-interval-ms=1000.
</action>
<verify>
<automated>mvn test -pl cameleer3-server-core,cameleer3-server-app -Dtest="Agent*"</automated>
<automated>mvn test -pl cameleer-server-core,cameleer-server-app -Dtest="Agent*"</automated>
</verify>
<done>All SSE integration tests pass: connect/disconnect, config-update/deep-trace/replay delivery via SSE, ping keepalive received, Last-Event-ID accepted, command targeting (single/group/broadcast), command acknowledgement. mvn clean verify passes with all existing tests still green.</done>
This phase adds agent registration, heartbeat-based lifecycle management (LIVE/STALE/DEAD), and real-time command push via SSE to the Cameleer3 server. The technology stack is straightforward: Spring MVC's `SseEmitter` for server-push, `ConcurrentHashMap` for the in-memory agent registry, and `@Scheduled` for periodic lifecycle checks (same pattern already used by `ClickHouseFlushScheduler`).
This phase adds agent registration, heartbeat-based lifecycle management (LIVE/STALE/DEAD), and real-time command push via SSE to the Cameleer server. The technology stack is straightforward: Spring MVC's `SseEmitter` for server-push, `ConcurrentHashMap` for the in-memory agent registry, and `@Scheduled` for periodic lifecycle checks (same pattern already used by `ClickHouseFlushScheduler`).
The main architectural challenge is managing per-agent SSE connections reliably -- handling disconnections, timeouts, and cleanup without leaking threads or emitters. The command delivery model (PENDING with 60s expiry, acknowledgement) adds a second concurrent data structure to manage alongside the registry itself.
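The heartbeat-driven LIVE/STALE/DEAD lifecycle described above can be illustrated with a minimal, JDK-only sketch. It is not the project's `AgentRegistryService`: the class name, the `Entry` record, the threshold parameters, and the choice to let a heartbeat revive a STALE or DEAD agent are all assumptions made for illustration. In the real server a `@Scheduled` method would call `checkLifecycle` periodically.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an in-memory agent registry with lifecycle states.
class AgentLifecycleSketch {
    enum State { LIVE, STALE, DEAD }

    // Assumed minimal entry; the real AgentInfo record carries more fields.
    record Entry(State state, Instant lastHeartbeat) {}

    private final Map<String, Entry> agents = new ConcurrentHashMap<>();
    private final Duration staleAfter; // assumed threshold for LIVE -> STALE
    private final Duration deadAfter;  // assumed threshold for STALE -> DEAD

    AgentLifecycleSketch(Duration staleAfter, Duration deadAfter) {
        this.staleAfter = staleAfter;
        this.deadAfter = deadAfter;
    }

    void register(String agentId, Instant now) {
        agents.put(agentId, new Entry(State.LIVE, now));
    }

    // Returns false for unknown agents, which a controller could map to 404.
    boolean heartbeat(String agentId, Instant now) {
        return agents.computeIfPresent(agentId, (id, e) -> new Entry(State.LIVE, now)) != null;
    }

    // Intended to be called on a schedule; demotes agents by heartbeat silence.
    void checkLifecycle(Instant now) {
        agents.replaceAll((id, e) -> {
            Duration silence = Duration.between(e.lastHeartbeat(), now);
            if (silence.compareTo(deadAfter) >= 0) {
                return new Entry(State.DEAD, e.lastHeartbeat());
            }
            if (silence.compareTo(staleAfter) >= 0) {
                return new Entry(State.STALE, e.lastHeartbeat());
            }
            return e;
        });
    }

    State stateOf(String agentId) {
        Entry e = agents.get(agentId);
        return e == null ? null : e.state();
    }
}
```

Passing the clock value in as a parameter keeps the transitions deterministic in unit tests, with no need to sleep or to shorten the real intervals.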
@@ -93,7 +93,7 @@ No new dependencies required. Everything is already on the classpath.
| `cameleer3-server-core/src/main/java/com/cameleer3/server/core/agent/AgentInfo.java` | Immutable record with all fields and wither methods | VERIFIED | 63 lines; record with 10 fields and 5 wither-style methods |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/controller/AgentRegistrationController.java` | POST /register, POST /{id}/heartbeat, GET /agents | VERIFIED | 153 lines; all three endpoints implemented with OpenAPI annotations |
| `cameleer3-server-app/src/main/java/com/cameleer3/server/app/agent/AgentLifecycleMonitor.java` | @Scheduled LIVE->STALE->DEAD transitions | VERIFIED | 37 lines; calls `registryService.checkLifecycle()` and `expireOldCommands()` on schedule |
| `cameleer-server-core/src/main/java/com/cameleer/server/core/agent/AgentInfo.java` | Immutable record with all fields and wither methods | VERIFIED | 63 lines; record with 10 fields and 5 wither-style methods |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/controller/AgentRegistrationController.java` | POST /register, POST /{id}/heartbeat, GET /agents | VERIFIED | 153 lines; all three endpoints implemented with OpenAPI annotations |
| `cameleer-server-app/src/main/java/com/cameleer/server/app/agent/AgentLifecycleMonitor.java` | @Scheduled LIVE->STALE->DEAD transitions | VERIFIED | 37 lines; calls `registryService.checkLifecycle()` and `expireOldCommands()` on schedule |
1. Add Maven dependencies to cameleer3-server-app/pom.xml:
1. Add Maven dependencies to cameleer-server-app/pom.xml:
- `spring-boot-starter-security` (managed version)
- `com.nimbusds:nimbus-jose-jwt:9.47` (explicit, may not be transitive without OAuth2 resource server)
- `spring-security-test` scope test (managed version)
@@ -165,12 +165,12 @@ public class AgentRegistryConfig { ... }
5. Update application-test.yml: Add `security.bootstrap-token: test-bootstrap-token`, `security.bootstrap-token-previous: old-bootstrap-token`. Also set `CAMELEER_AUTH_TOKEN: test-bootstrap-token` as an env override if needed.
6. IMPORTANT: Adding spring-boot-starter-security will break ALL existing tests immediately (401 on all endpoints). To prevent this during Plan 01 (before the security filter chain is configured in Plan 02), add a temporary test security config class `src/test/java/com/cameleer3/server/app/security/TestSecurityConfig.java` annotated `@TestConfiguration` that creates a `SecurityFilterChain` permitting all requests. This keeps existing tests green while security services are built. Plan 02 will replace this with real security config and update tests.
6. IMPORTANT: Adding spring-boot-starter-security will break ALL existing tests immediately (401 on all endpoints). To prevent this during Plan 01 (before the security filter chain is configured in Plan 02), add a temporary test security config class `src/test/java/com/cameleer/server/app/security/TestSecurityConfig.java` annotated `@TestConfiguration` that creates a `SecurityFilterChain` permitting all requests. This keeps existing tests green while security services are built. Plan 02 will replace this with real security config and update tests.
7. Write unit tests per the behavior spec above. Tests should NOT require Spring context -- construct implementations directly with test SecurityProperties.
</action>
<verify>
<automated>cd /c/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-app -Dtest="JwtServiceTest,Ed25519SigningServiceTest,BootstrapTokenValidatorTest" -Dsurefire.reuseForks=false</automated>
<automated>cd /c/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-app -Dtest="JwtServiceTest,Ed25519SigningServiceTest,BootstrapTokenValidatorTest" -Dsurefire.reuseForks=false</automated>
</verify>
<done>
- JwtService creates and validates access/refresh JWTs with correct claims and expiry
1. Create `JwtAuthenticationFilter extends OncePerRequestFilter` (NOT annotated @Component -- constructed in SecurityConfig to avoid double registration):
@@ -176,7 +176,7 @@ public class AgentRegistryService {
6. Update `WebConfig` if needed: The `ProtocolVersionInterceptor` excluded paths should align with Spring Security public paths. The SSE events path is already excluded from protocol version check (Phase 3 decision). Verify no conflicts.
1. Replace the Plan 01 temporary `TestSecurityConfig` (permit-all) with real security active in tests. Remove the permit-all override so tests run with actual security enforcement.
@@ -259,7 +259,7 @@ public class AgentRegistryService {
- Test: New access token from refresh can access protected endpoints
@@ -155,7 +155,7 @@ public record AgentCommand(String id, CommandType type, String payload, String a
- NOTE: This test depends on Plan 02's bootstrap token and JWT auth being in place. If Plan 03 executes before Plan 02, the test will need the TestSecurityHelper or a different auth approach. Since both are Wave 2 but independent, document this: "If Plan 02 is not yet complete, use TestSecurityHelper from Plan 01's temporary permit-all config."
</action>
<verify>
<automated>cd /c/Users/Hendrik/Documents/projects/cameleer3-server && mvn test -pl cameleer3-server-app -Dtest="SsePayloadSignerTest,SseSigningIT" -Dsurefire.reuseForks=false</automated>
<automated>cd /c/Users/Hendrik/Documents/projects/cameleer-server && mvn test -pl cameleer-server-app -Dtest="SsePayloadSignerTest,SseSigningIT" -Dsurefire.reuseForks=false</automated>
</verify>
<done>
- SsePayloadSigner signs JSON payloads with Ed25519 and adds signature field
This phase adds authentication and integrity protection to the Cameleer3 server. The implementation uses Spring Security 6.4.3 (managed by Spring Boot 3.4.3) with a custom `OncePerRequestFilter` for JWT validation, JDK 17 built-in Ed25519 for signing SSE payloads, and environment variable-based bootstrap tokens for agent registration. The approach is deliberately simple -- no OAuth2 resource server, no external identity provider, just symmetric HMAC JWTs for access control and Ed25519 signatures for payload integrity.
This phase adds authentication and integrity protection to the Cameleer server. The implementation uses Spring Security 6.4.3 (managed by Spring Boot 3.4.3) with a custom `OncePerRequestFilter` for JWT validation, JDK 17 built-in Ed25519 for signing SSE payloads, and environment variable-based bootstrap tokens for agent registration. The approach is deliberately simple -- no OAuth2 resource server, no external identity provider, just symmetric HMAC JWTs for access control and Ed25519 signatures for payload integrity.
The existing codebase has clear integration points: `AgentRegistrationController.register()` already returns `serverPublicKey: null` as a placeholder, `SseConnectionManager.onCommandReady()` is the signing hook for SSE events, and `WebConfig` already defines excluded paths that align with the public endpoint list. Spring Security's `SecurityFilterChain` replaces the need for hand-rolled authorization logic -- endpoints are protected by default, with explicit `permitAll()` for health, register, and docs.
@@ -89,7 +89,7 @@ The existing codebase has clear integration points: `AgentRegistrationController
- **Ed25519 library:** Use JDK built-in. Zero external dependencies, native performance, well-tested in JDK 17+.
- **Refresh token storage:** Use stateless signed refresh tokens (also HMAC-signed JWTs with different claims/expiry). This avoids any in-memory storage for refresh tokens and scales naturally. The refresh token is just a JWT with `type=refresh`, `sub=agentId`, and 7-day expiry. On refresh, validate the refresh JWT, check agent still exists, issue new access JWT.
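As a rough illustration of the "JDK built-in Ed25519" recommendation above, the sketch below signs and verifies a payload using only `java.security`. The class and method names are hypothetical, the key pair is generated randomly at construction (how the server actually provisions its key is not shown here), and Base64 encoding of the signature is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical sketch: Ed25519 signing with the JDK's built-in provider (Java 15+).
class Ed25519SigningSketch {
    private final KeyPair keyPair;

    Ed25519SigningSketch() {
        try {
            // Assumption: an ephemeral key pair; a real service would load or derive its key.
            this.keyPair = KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Ed25519 not available in this JDK", e);
        }
    }

    PublicKey publicKey() {
        return keyPair.getPublic();
    }

    // Signs the UTF-8 bytes of the payload and returns the signature as Base64.
    String sign(String payload) {
        try {
            Signature signer = Signature.getInstance("Ed25519");
            signer.initSign(keyPair.getPrivate());
            signer.update(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    // Returns true only if the signature matches the payload under the given public key.
    static boolean verify(PublicKey publicKey, String payload, String signatureBase64) {
        try {
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKey);
            verifier.update(payload.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(signatureBase64));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }
}
```

An agent holding the server's public key could call `verify` on each received SSE payload and discard events whose signature does not match.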
**Installation (add to cameleer3-server-app pom.xml):**
**Installation (add to cameleer-server-app pom.xml):**
**Domain:** Transaction observability server for Apache Camel integrations
**Researched:** 2026-03-11
@@ -6,7 +6,7 @@
## Executive Summary
Cameleer3 Server is a write-heavy, read-occasional observability system that receives millions of transaction records per day from distributed Apache Camel agents, stores them with 30-day retention, and provides structured + full-text search. The architecture closely parallels established observability platforms like Jaeger, Zipkin, and njams Server, with the key differentiator being Camel route diagram visualization tied to individual transactions.
Cameleer Server is a write-heavy, read-occasional observability system that receives millions of transaction records per day from distributed Apache Camel agents, stores them with 30-day retention, and provides structured + full-text search. The architecture closely parallels established observability platforms like Jaeger, Zipkin, and njams Server, with the key differentiator being Camel route diagram visualization tied to individual transactions.
The recommended stack centers on **ClickHouse** as the primary data store. ClickHouse's columnar MergeTree engine provides the exact properties this project needs: massive batch insert throughput, excellent time-range query performance, native TTL-based retention, and 10-20x compression on structured observability data. This is a well-established pattern used by production observability platforms (SigNoz, Uptrace, PostHog all run on ClickHouse).
@@ -93,9 +93,9 @@ Based on research, suggested phase structure:
## Gaps to Address
- **ClickHouse Java client API:** The clickhouse-java library has undergone significant changes. Exact API, connection pooling, and Spring Boot integration patterns need phase-specific research
- **cameleer3-common PROTOCOL.md:** Must read the agent protocol definition before designing ClickHouse schema -- this defines the exact data structures being ingested
- **cameleer-common PROTOCOL.md:** Must read the agent protocol definition before designing ClickHouse schema -- this defines the exact data structures being ingested
- **ClickHouse Docker setup:** Optimal ClickHouse Docker configuration (memory limits, merge settings) for development and production
- **Full-text search decision:** ClickHouse skip indexes may or may not meet the "search by any content" requirement. This needs prototyping with realistic data
- **Diagram rendering library:** Server-side route diagram rendering is a significant unknown; needs prototyping with actual Camel route graph data from cameleer3-common
- **Diagram rendering library:** Server-side route diagram rendering is a significant unknown; needs prototyping with actual Camel route graph data from cameleer-common
- **Frontend framework:** No research on UI technology -- deferred to UI phase
- **Agent protocol stability:** The cameleer3-common protocol is still evolving. Schema evolution strategy needs alignment with agent development
- **Agent protocol stability:** The cameleer-common protocol is still evolving. Schema evolution strategy needs alignment with agent development
<p>Click to select a node (amber highlight ring). Right-click for context menu with tracing/tap/snapshot actions. Clean separation of concerns. Standard desktop UX.</p>
<p>Hover reveals a dark floating icon toolbar above the node. Click still selects. More discoverable than right-click, but can feel cluttered on dense diagrams.</p>
<p>MuleSoft-style: colored icon strip on the left, label + detail on the right. Color encodes node type. Compound nodes (choice, split) use dashed containers.</p>
</div>
</div>
<!-- Option B: Rounded pill with centered icon -->
<p>TIBCO BW5-inspired: white cards with colored top accent bar. Clean, professional, card-like. Compound nodes get a full colored header bar with white title text.</p>
<p>Diagram on top, bottom split into processor list (left) + detail tabs (right). Clicking processor in list or diagram syncs selection. Most information density.</p>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul><li>Processor list as navigation</li><li>Full diagram width</li><li>Maximum information density</li></ul></div>
<div style="font-family:monospace;font-size:11px;color:#f87171;">Expression evaluation timed out (50ms limit)</div>
</div>
</div>
<div style="font-size:10px;color:#6b7280;margin-top:8px;">Evaluated by agent <span style="font-family:monospace;">order-svc-01</span> using Camel's <span style="font-family:monospace;">simple</span> language</div>
This project is indexed by GitNexus as **cameleer-server** (6306 symbols, 15892 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
@@ -4,18 +4,18 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project
Cameleer3 Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer3 agents. Pushes config and commands to agents via SSE.
Cameleer Server — observability server that receives, stores, and serves Camel route execution data and route diagrams from Cameleer agents. Pushes config and commands to agents via SSE. Also orchestrates Docker container deployments when running under cameleer-saas.
## Related Project
- **cameleer3** (`https://gitea.siegeln.net/cameleer/cameleer3`) — the Java agent that instruments Camel applications
- Protocol defined in `cameleer3-common/PROTOCOL.md` in the agent repo
- This server depends on `com.cameleer3:cameleer3-common` (shared models and graph API)
- **cameleer** (`https://gitea.siegeln.net/cameleer/cameleer`) — the Java agent that instruments Camel applications
- Protocol defined in `cameleer-common/PROTOCOL.md` in the agent repo
- This server depends on `com.cameleer:cameleer-common` (shared models and graph API)
- Depends on `com.cameleer3:cameleer3-common` from Gitea Maven registry
- Depends on `com.cameleer:cameleer-common` from Gitea Maven registry
- Jackson `JavaTimeModule` for `Instant` deserialization
- Communication: receives HTTP POST data from agents, serves SSE event streams for config push/commands
- Maintains agent instance registry with states: LIVE → STALE → DEAD
- Storage: PostgreSQL (TimescaleDB) for structured data, OpenSearch for full-text search
- Security: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing, bootstrap token for registration
- OIDC: Optional external identity provider support (token exchange pattern). Configured via admin API, stored in database (`server_config` table)
- Communication: receives HTTP POST data from agents (executions, diagrams, metrics, logs), serves SSE event streams for config push/commands (config-update, deep-trace, replay, route-control)
- Environment filtering: all data queries filter by the selected environment. All commands target only agents in the selected environment. Backend endpoints accept optional `environment` query parameter; null = all environments (backward compatible).
- Maintains agent instance registry (in-memory) with states: LIVE -> STALE -> DEAD. Auto-heals from JWT `env` claim + heartbeat body on heartbeat/SSE after server restart (priority: heartbeat `environmentId` > JWT `env` claim > `"default"`). Capabilities and route states updated on every heartbeat (protocol v2). Route catalog merges three sources: in-memory agent registry, persistent `route_catalog` table (ClickHouse), and `stats_1m_route` execution stats. The persistent catalog tracks `first_seen`/`last_seen` per route per environment, updated on every registration and heartbeat. Routes appear in the sidebar when their lifecycle overlaps the selected time window (`first_seen <= to AND last_seen >= from`), so historical routes remain visible even after being dropped from newer app versions.
- Multi-tenancy: each server instance serves one tenant (configured via `CAMELEER_SERVER_TENANT_ID`, default: `"default"`). Environments (dev/staging/prod) are first-class. PostgreSQL isolated via schema-per-tenant (`?currentSchema=tenant_{id}`) and `ApplicationName=tenant_{id}` on the JDBC URL. ClickHouse shared DB with `tenant_id` + `environment` columns, partitioned by `(tenant_id, toYYYYMM(timestamp))`.
- Storage: PostgreSQL for RBAC, config, and audit; ClickHouse for all observability data (executions, search, logs, metrics, stats, diagrams). ClickHouse schema migrations in `clickhouse/*.sql`, run idempotently on startup by `ClickHouseSchemaInitializer`. Use `IF NOT EXISTS` for CREATE and ADD PROJECTION.
- Log exchange correlation: `ClickHouseLogStore` extracts `exchange_id` from log entry MDC, preferring `cameleer.exchangeId` over `camel.exchangeId` (fallback for older agents). For `ON_COMPLETION` exchange copies, the agent sets `cameleer.exchangeId` to the parent's exchange ID via `CORRELATION_ID`.
- Log processor correlation: The agent sets `cameleer.processorId` in MDC, identifying which processor node emitted a log line.
- Logging: ClickHouse JDBC set to INFO (`com.clickhouse`), HTTP client to WARN (`org.apache.hc.client5`) in application.yml
- Security: JWT auth with RBAC (AGENT/VIEWER/OPERATOR/ADMIN roles), Ed25519 config signing (key derived deterministically from JWT secret via HMAC-SHA256), bootstrap token for registration. CORS: `CAMELEER_SERVER_SECURITY_CORSALLOWEDORIGINS` (comma-separated) overrides `CAMELEER_SERVER_SECURITY_UIORIGIN` for multi-origin setups. Infrastructure access: `CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false` disables Database and ClickHouse admin endpoints. Last-ADMIN guard: system prevents removal of the last ADMIN role (409 Conflict). Password policy: min 12 chars, 3-of-4 character classes, no username match. Brute-force protection: 5 failed attempts -> 15 min lockout. Token revocation: `token_revoked_before` column on users, checked in `JwtAuthenticationFilter`, set on password change.
- OIDC: Optional external identity provider support (token exchange pattern). Configured via admin API/UI, stored in database (`server_config` table). Resource server mode: accepts external access tokens (Logto M2M) via JWKS validation when `CAMELEER_SERVER_SECURITY_OIDCISSUERURI` is set. Scope-based role mapping via `SystemRole.normalizeScope()`. System roles synced on every OIDC login via `applyClaimMappings()` in `OidcAuthController` (calls `clearManagedAssignments` + `assignManagedRole` on `RbacService`) — always overwrites managed role assignments; uses managed assignment origin to avoid touching group-inherited or directly-assigned roles. Supports ES384, ES256, RS256.
- OIDC role extraction: `OidcTokenExchanger` reads roles from the **access_token** first (JWT with `at+jwt` type), then falls back to id_token. `OidcConfig` includes `audience` (RFC 8707 resource indicator) and `additionalScopes`. All provider-specific configuration is external — no provider-specific code in the server.
- Sensitive keys: Global enforced baseline for masking sensitive data in agent payloads. Merge rule: `final = global UNION per-app` (case-insensitive dedup, per-app can only add, never remove global keys).
- User persistence: PostgreSQL `users` table, admin CRUD at `/api/v1/admin/users`
- V8 — Deployment active config (resolved_config JSONB on deployments)
- V9 — Password hardening (failed_login_attempts, locked_until, token_revoked_before on users)
- V10 — Runtime type detection (detected_runtime_type, detected_main_class on app_versions)
ClickHouse: `cameleer-server-app/src/main/resources/clickhouse/init.sql` (run idempotently on startup)
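
The route catalog visibility rule stated earlier in this section (`first_seen <= to AND last_seen >= from`) is a closed-interval overlap test. The following sketch expresses it in plain Java, assuming a simplified `CatalogEntry` record; the class, record, and method names are hypothetical and do not reflect the actual `RouteCatalogStore` API or its ClickHouse query.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch of the route catalog time-window filter.
class RouteWindowSketch {
    // Assumed shape of a persisted catalog row (one route in one environment).
    record CatalogEntry(String routeId, Instant firstSeen, Instant lastSeen) {}

    // Equivalent to: first_seen <= to AND last_seen >= from (both bounds inclusive).
    static boolean overlaps(CatalogEntry entry, Instant from, Instant to) {
        return !entry.firstSeen().isAfter(to) && !entry.lastSeen().isBefore(from);
    }

    // Route IDs whose lifecycle overlaps the selected window, in catalog order.
    static List<String> visibleRoutes(List<CatalogEntry> catalog, Instant from, Instant to) {
        return catalog.stream()
                .filter(entry -> overlaps(entry, from, to))
                .map(CatalogEntry::routeId)
                .toList();
    }
}
```

Because both bounds are inclusive, a route last seen exactly at the start of the window still counts as visible, which matches the `<=` and `>=` comparisons in the rule.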
## Maintaining .claude/rules/
When adding, removing, or renaming classes, controllers, endpoints, UI components, or metrics, update the corresponding `.claude/rules/` file as part of the same change. The rule files are the class/API map that future sessions rely on — stale rules cause wrong assumptions. Treat rule file updates like updating an import: part of the change, not a separate task.
## Disabled Skills
- Do NOT use any `gsd:*` skills in this project. This includes all `/gsd:` prefixed commands.
<!-- gitnexus:start -->
# GitNexus — Code Intelligence
This project is indexed by GitNexus as **cameleer-server** (6281 symbols, 15871 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely.
> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first.
## Always Do
- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user.
- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows.
- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits.
- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance.
- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`.
## When Debugging
1. `gitnexus_query({query: "<error or symptom>"})` — find execution flows related to the issue
2. `gitnexus_context({name: "<suspect function>"})` — see all callers, callees, and process participation
3. `READ gitnexus://repo/cameleer-server/process/{processName}` — trace the full execution flow step by step
4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed
## When Refactoring
- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`.
- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code.
- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed.
## Never Do
- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis.
- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph.
- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
Before completing any code modification task, verify:
1. `gitnexus_impact` was run for all modified symbols
2. No HIGH/CRITICAL risk warnings were ignored
3. `gitnexus_detect_changes()` confirms changes match expected scope
4. All d=1 (WILL BREAK) dependents were updated
## Keeping the Index Fresh
After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
```bash
npx gitnexus analyze
```
If the index previously included embeddings, preserve them by adding `--embeddings`:
```bash
npx gitnexus analyze --embeddings
```
To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
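The embeddings check described above can be scripted. The following is a minimal sketch, assuming `.gitnexus/meta.json` is plain JSON with a numeric `stats.embeddings` field as stated; the helper names `analyze_command` and `analyze_command_for_repo` are hypothetical and not part of GitNexus.

```python
import json
from pathlib import Path


def analyze_command(meta: dict) -> list[str]:
    """Build the analyze command, adding --embeddings only when some exist."""
    # Assumption: stats.embeddings holds a count; missing or null means 0.
    count = (meta.get("stats") or {}).get("embeddings") or 0
    cmd = ["npx", "gitnexus", "analyze"]
    if count > 0:
        cmd.append("--embeddings")
    return cmd


def analyze_command_for_repo(repo_root: str = ".") -> list[str]:
    """Read .gitnexus/meta.json under repo_root and build the command."""
    meta_path = Path(repo_root) / ".gitnexus" / "meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    return analyze_command(meta)
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.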
## CLI
| Task | Read this skill file |
|------|---------------------|
| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
- Access to the Gitea Maven registry (for `cameleer3-common` dependency)
- Access to the Gitea Maven registry (for `cameleer-common` dependency)
## Build
@@ -21,31 +21,36 @@ mvn clean verify # compile + run all tests (needs Docker for integrati
## Infrastructure Setup
Start PostgreSQL and OpenSearch:
Start PostgreSQL:
```bash
docker compose up -d
```
This starts TimescaleDB (PostgreSQL 16) and OpenSearch 2.19. The database schema is applied automatically via Flyway migrations on server startup.
This starts PostgreSQL 16. The database schema is applied automatically via Flyway migrations on server startup. ClickHouse tables are created by the schema initializer on startup.
The server starts on **port 8081**. The `CAMELEER_AUTH_TOKEN` environment variable is **required** — the server fails fast on startup if it is not set.
> **Note:** The Docker image no longer includes default database credentials. When running via `docker run`, pass `-e SPRING_DATASOURCE_URL=...` etc. The docker-compose setup provides these automatically.
For token rotation without downtime, set `CAMELEER_AUTH_TOKEN_PREVIOUS` to the old token while rolling out the new one. The server accepts both during the overlap window.
The server starts on **port 8081**. The `CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN` environment variable is **required** — the server fails fast on startup if it is not set.
For token rotation without downtime, set `CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKENPREVIOUS` to the old token while rolling out the new one. The server accepts both during the overlap window.
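The overlap window can be pictured as a check against two candidate tokens. This is an illustrative Python sketch, not the server's implementation; the function name `is_accepted_token` is hypothetical, and the constant-time comparison is an assumption about good practice rather than something the source states.

```python
import hmac
from typing import Optional


def is_accepted_token(presented: str, current: str, previous: Optional[str] = None) -> bool:
    """Accept the current bootstrap token, or the previous one during rotation."""
    candidates = [current]
    if previous:  # unset or empty means no rotation is in progress
        candidates.append(previous)
    accepted = False
    for candidate in candidates:
        # compare_digest avoids leaking the match position through timing
        if hmac.compare_digest(presented.encode(), candidate.encode()):
            accepted = True
    return accepted
```

Once every agent presents the new token, removing the previous-token variable closes the window.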
## API Endpoints
@@ -84,7 +89,7 @@ curl -s -X POST http://localhost:8081/api/v1/auth/refresh \
-d '{"refreshToken":"<refreshToken>"}'
```
UI credentials are configured via `CAMELEER_UI_USER` / `CAMELEER_UI_PASSWORD` env vars (default: `admin` / `admin`).
UI credentials are configured via `CAMELEER_SERVER_SECURITY_UIUSER` / `CAMELEER_SERVER_SECURITY_UIPASSWORD` env vars (default: `admin` / `admin`).
The env-var local user gets `ADMIN` role. Agents get `AGENT` role at registration.
**UI role gating:** The sidebar hides the Admin section for non-ADMIN users. Admin routes (`/admin/*`) redirect to `/` for non-admin. The diagram node toolbar and route control bar are hidden for VIEWER. Config is a main tab (`/config` shows all apps, `/config/:appId` filters to one app with detail panel; sidebar clicks stay on config tab, route clicks resolve to parent app). VIEWER sees read-only, OPERATOR+ can edit.
### OIDC Login (Optional)
OIDC configuration is stored in PostgreSQL and managed via the admin API or UI. The SPA checks if OIDC is available:
@@ -139,7 +146,7 @@ curl -s -X PUT http://localhost:8081/api/v1/admin/oidc \
**Initial provisioning**: OIDC can also be seeded from `CAMELEER_OIDC_*` env vars on first startup (when DB is empty). After that, the admin API takes over.
**Initial provisioning**: OIDC can also be seeded from `CAMELEER_SERVER_SECURITY_OIDC*` env vars on first startup (when DB is empty). After that, the admin API takes over.
### Authentik Setup (OIDC Provider)
### Logto Setup (OIDC Provider)
Authentik is deployed alongside the Cameleer stack. After first deployment:
Logto is deployed alongside the Cameleer stack. After first deployment:
1. **Initial setup**: Open `http://192.168.50.86:30950/if/flow/initial-setup/` and create the admin account
- Redirect URIs: `http://192.168.50.86:30090/callback` (or your UI URL)
Logto is proxy-aware via `TRUST_PROXY_HEADER=1`. The `LOGTO_ENDPOINT` and `LOGTO_ADMIN_ENDPOINT` secrets define the public-facing URLs that Logto uses for OIDC discovery, issuer URI, and redirect URLs. When behind a reverse proxy (e.g., Traefik), set these to the external URLs (e.g., `https://auth.cameleer.my.domain`). Logto needs its own subdomain — it cannot be path-prefixed under another app.
1. **Initial setup**: Open the Logto admin console (the `LOGTO_ADMIN_ENDPOINT` URL) and create the admin account
2. **Create SPA application**: Applications → Create → Single Page App
- Name: `Cameleer UI`
- Redirect URI: your UI URL + `/oidc/callback`
- Note the **Client ID**
3. **Create API Resource**: API Resources → Create
- Name: `Cameleer Server API`
- Indicator: your API URL (e.g., `https://cameleer.siegeln.net/api`)
4. **Configure roles** (optional): Create groups in Authentik and map them to Cameleer roles via the `roles-claim` config. Default claim path is `realm_access.roles`. For Authentik, you may need to customize the OIDC scope to include group claims.
5. **Configure Cameleer**: Use the admin API (`PUT /api/v1/admin/oidc`) or set env vars for initial seeding:
5. **Configure Cameleer OIDC login**: Use the admin API (`PUT /api/v1/admin/oidc`) or the admin UI. OIDC login configuration is stored in the database — no env vars needed for the SPA OIDC flow.
CAMELEER_SERVER_SECURITY_OIDCTLSSKIPVERIFY=true # optional — skip cert verification for self-signed CAs
```
`OIDCJWKSETURI` is needed when the public issuer URL isn't reachable from inside containers — it fetches JWKS directly from the internal Logto service. `OIDCTLSSKIPVERIFY` disables certificate verification for all OIDC HTTP calls (discovery, token exchange, JWKS); use only when the provider has a self-signed CA.
### SSO Behavior
When OIDC is configured and enabled, the UI automatically redirects to the OIDC provider for silent SSO (`prompt=none`). Users with an active provider session are signed in without seeing a login form. On first login, the provider may show a consent screen (scopes), after which subsequent logins are seamless. If auto-signup is enabled, new users are automatically provisioned with the configured default roles.
- **Bypass SSO**: Navigate to `/login?local` to see the local login form
- **Subpath deployments**: The OIDC redirect_uri respects `BASE_PATH` (e.g., `https://host/server/oidc/callback`)
- **Role sync**: System roles (ADMIN/OPERATOR/VIEWER) are synced from OIDC scopes on every login — revoking a scope in the provider takes effect on next login. Manually assigned group memberships are preserved.
### User Management (ADMIN only)
@@ -220,6 +241,19 @@ curl -s -X POST http://localhost:8081/api/v1/data/metrics \
curl -s -X POST http://localhost:8081/api/v1/agents/commands \
-H "Content-Type: application/json" \
@@ -324,43 +364,98 @@ curl -s -X POST http://localhost:8081/api/v1/agents/agent-1/commands/{commandId}
**Agent lifecycle:** LIVE (heartbeat within 90s) → STALE (missed 3 heartbeats) → DEAD (5min after STALE). DEAD agents kept indefinitely.
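As a rough illustration of the lifecycle above, the state can be derived from the time since the last heartbeat. This Python sketch is an assumption about how the thresholds combine (90s to STALE, then a further 5min to DEAD); the real registry may track the STALE transition time explicitly, and `classify_agent` is a hypothetical name.

```python
from enum import Enum


class AgentState(Enum):
    LIVE = "LIVE"
    STALE = "STALE"
    DEAD = "DEAD"


STALE_THRESHOLD_MS = 90_000   # heartbeat within 90s keeps the agent LIVE
DEAD_THRESHOLD_MS = 300_000   # 5min after becoming STALE the agent is DEAD


def classify_agent(last_heartbeat_ms: int, now_ms: int,
                   stale_threshold_ms: int = STALE_THRESHOLD_MS,
                   dead_threshold_ms: int = DEAD_THRESHOLD_MS) -> AgentState:
    """Classify an agent by how long it has been silent."""
    silence_ms = now_ms - last_heartbeat_ms
    if silence_ms <= stale_threshold_ms:
        return AgentState.LIVE
    if silence_ms <= stale_threshold_ms + dead_threshold_ms:
        return AgentState.STALE
    return AgentState.DEAD
```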
**SSE events:** `config-update`, `deep-trace`, `replay` commands pushed in real time. Server sends ping keepalive every 15s.
**Server restart resilience:** The agent registry is in-memory and lost on server restart. Agents auto-re-register on their next heartbeat or SSE connection — the server reconstructs registry entries from JWT claims (subject, application). Route catalog uses ClickHouse execution data as fallback until agents re-register with full route IDs. Agents should also handle 404 on heartbeat by triggering a full re-registration.
**SSE events:** `config-update`, `deep-trace`, `replay`, `route-control` commands pushed in real time. Server sends ping keepalive every 15s.
**Command expiry:** Unacknowledged commands expire after 60 seconds.
**Route control responses:** Route control commands return `CommandGroupResponse` with per-agent status, response count, and timed-out agent IDs.
### Backpressure
When the write buffer is full (default capacity: 50,000), ingestion endpoints return **503 Service Unavailable**. Already-buffered data is not lost.
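The backpressure behavior can be sketched as a bounded buffer whose rejection maps to a 503. This is a simplified Python model, not the server code; `WriteBuffer` and `ingest` are hypothetical names, and the `202` success status is an assumption since the source only specifies the 503 case.

```python
from collections import deque


class WriteBuffer:
    """Bounded in-memory buffer; rejects new items instead of dropping old ones."""

    def __init__(self, capacity: int = 50_000):
        self.capacity = capacity
        self._items = deque()

    def offer(self, item) -> bool:
        """Return False when full, leaving already-buffered data untouched."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def drain(self, max_items: int) -> list:
        """Remove and return up to max_items, e.g. for one batch insert."""
        count = min(max_items, len(self._items))
        return [self._items.popleft() for _ in range(count)]

    def __len__(self) -> int:
        return len(self._items)


def ingest(buffer: WriteBuffer, item) -> int:
    """Map buffer acceptance to an HTTP status (202 is an assumed success code)."""
    return 202 if buffer.offer(item) else 503
```

A client that receives 503 can retry later; the buffer frees space as the writer drains batches.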
## Configuration
Key settings in `cameleer3-server-app/src/main/resources/application.yml`:
Key settings in `cameleer-server-app/src/main/resources/application.yml`. All custom properties live under `cameleer.server.*`. Env vars are a mechanical 1:1 mapping (dots to underscores, uppercase).
| Setting | Default | Description |
|---------|---------|-------------|
| `server.port` | 8081 | Server port |
| `ingestion.buffer-capacity` | 50000 | Max items in write buffer |
| `ingestion.batch-size` | 5000 | Items per batch insert |
**Note:** OIDC *login* configuration (issuer, client ID, client secret, roles claim, default roles) is stored in the database and managed via the admin API (`PUT /api/v1/admin/oidc`) or admin UI. The env vars above are for resource server mode (M2M token validation) only.
**Ingestion** (`cameleer.server.ingestion.*`):
| Setting | Default | Env var | Description |
|---------|---------|---------|-------------|
| `cameleer.server.ingestion.buffercapacity` | `50000` | `CAMELEER_SERVER_INGESTION_BUFFERCAPACITY` | Max items in write buffer |
| `cameleer.server.agentregistry.stalethresholdms` | `90000` | `CAMELEER_SERVER_AGENTREGISTRY_STALETHRESHOLDMS` | Time before agent marked STALE (ms) |
| `cameleer.server.agentregistry.deadthresholdms` | `300000` | `CAMELEER_SERVER_AGENTREGISTRY_DEADTHRESHOLDMS` | Time after STALE before DEAD (ms) |
| `cameleer.server.runtime.container.cpushares` | `512` | `CAMELEER_SERVER_RUNTIME_CONTAINER_CPUSHARES` | Default CPU shares for app containers |
**Other** (`cameleer.server.*`):
| Setting | Default | Env var | Description |
|---------|---------|---------|-------------|
| `cameleer.server.catalog.discoveryttldays` | `7` | `CAMELEER_SERVER_CATALOG_DISCOVERYTTLDAYS` | Days before stale discovered apps auto-hide from sidebar |
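
The property-to-env-var mapping used in the tables above (dots to underscores, uppercase) is mechanical enough to express in one line. A small Python sketch, with `to_env_var` as a hypothetical helper name:

```python
def to_env_var(property_name: str) -> str:
    """Convert a dotted property name to its environment variable name."""
    return property_name.replace(".", "_").upper()
```

For example, `to_env_var("cameleer.server.ingestion.buffercapacity")` yields `CAMELEER_SERVER_INGESTION_BUFFERCAPACITY`, matching the table row.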
Push to `main` triggers: **build** (UI npm + Maven, unit tests) → **docker** (buildx amd64 for server + UI, push to Gitea registry) → **deploy** (kubectl apply + rolling update).
**Methodology:** Playwright-driven navigation of all major pages (14 screenshots), evaluated by 3 specialist agents: Visual Design, Information Architecture & Usability, Readability & Accessibility.
---
## Executive Summary
The Cameleer dashboard has a **distinctive, well-crafted warm amber design language** that stands out in the observability space. The core monitoring pages (Dashboard, Exchange Detail, Routes, Agents) are polished and consistent. The design system provides a solid foundation.
1. **Font sizes too small** — pervasive 10-11px text for critical data impairs reading under stress
2. **Color contrast failures** — `--text-muted` and `--text-faint` fail WCAG AA in both themes
3. **Status indicators rely on color alone** — not accessible for color-blind users
4. **Admin infrastructure pages lag in polish** — Database/OpenSearch use ad-hoc styling
5. **Dashboard is a monitoring display, not yet an incident response tool** — missing error highlighting, per-route error breakdowns, actionable status pages
**Overall Score: 7/10** — Strong foundation, needs targeted fixes for production readiness under stress.
- **[Critical]** Processor timeline label column too narrow — processor names are truncated/illegible. This is the page's primary visualization.
- **[Critical]** No error highlighting in processor timeline — failed processors need red bars/icons. During incidents, engineers must instantly see WHICH processor failed.
- **[Important]** No linkage to route diagram — "View in Route Diagram" would overlay execution on the visual route graph.
- **[Important]** Long exchange ID in breadcrumb is visually heavy — truncate with copy button.
- **[Important]** Header stat labels at 10px uppercase with `--text-muted` — same contrast issue.
### Routes Metrics
- **[Important]** KPI number formatting inconsistent — Dashboard shows "11.742 ms" (decimal + space), Routes shows "11742ms" (no decimal, no space).
- **[Important]** No per-route error rate column — error rate in KPI strip but not broken down per route.
- **[Important]** Charts disconnected from table — clicking a route should filter/highlight its chart data.
- **[Nice-to-have]** No visual comparison between routes (bar chart or heatmap for quick identification of slowest).
### Agent Health
- **[Critical]** Stale/Dead agent visual distinction is too subtle — at 3am, the difference between LIVE and DEAD must scream. Dead agents should have prominent red background or strikethrough, not just `--text-muted`.
- **[Critical]** Agent state dots (green live, amber stale, gray dead) use color alone — no shape variation for color-blind users.
- **[Important]** "2/26" active routes KPI is ambiguous — unit and meaning need to be explicit.
- **[Nice-to-have]** Timeline at bottom takes significant space — consider making it collapsible.
### Agent Instance Detail
- **[Important]** Charts lack threshold/alert lines — CPU at 2% is fine, but where is "concerning"? Configurable thresholds (CPU > 80%, Memory > 90%) would make charts actionable.
- **[Important]** Chart axis labels appear too small.
- **[Nice-to-have]** GC Pauses uses area fill while others use line charts — minor inconsistency.
- **[Nice-to-have]** Six charts in 2x3 grid can create cognitive overload — consider collapsible groups.
### Admin — RBAC
- **[Important]** KPI strip for "Users: 1, Groups: 2, Roles: 4" has too much visual weight — these low-value numbers don't need full stat-card treatment.
- **[Important]** "ADMIN" role badge vs "ADMINS" group badge look identical — different badge styles needed (outlined for groups, filled for roles).
- **[Nice-to-have]** Empty detail panel ("Select a user to view details") needs icon/illustration.
### Admin — Audit Log
- **[Important]** "no data" empty state is uninformative — should explain "No audit events match your filters" with guidance.
- **[Important]** No export functionality — audit logs need CSV/JSON export for compliance.
- **[Important]** Date range filters use raw datetime inputs — inconsistent with dashboard's polished time range pills.
### Admin — OIDC Config
- **[Critical]** "Delete OIDC Configuration" is a destructive action without confirmation dialog — could lock out all SSO users.
- **[Important]** No inline validation — Issuer URL should validate format on blur, required fields need indicators.
- **[Nice-to-have]** No connection test result display area.
### Admin — Database
- **[Important]** Visual treatment inconsistent with rest of app — "Connected" status and pool stats use ad-hoc text, not design system components.
- **[Important]** Page title "Database Administration" implies actions, but page is read-only — rename to "Database Status" or add operations.
- **[Nice-to-have]** Table row counts should be right-aligned for numerical scanning.
### Admin — OpenSearch
- **[Critical]** "Disconnected" status displayed as plain text — needs error styling (red text, error badge, or status banner). Infrastructure disconnection is a critical state.
- **[Important]** "Yellow" cluster health displayed as plain text with no visual hierarchy — same size/weight as version number and node count.
- **[Important]** Indexing pipeline stats use ad-hoc inline format — should use consistent stat-card pattern.
- **[Important]** "Disconnected" + "Yellow" health shown simultaneously is contradictory — if disconnected, clarify whether data is stale.
### Command Palette
- **[Nice-to-have]** No visible keyboard navigation hint for currently selected item.
- **[Nice-to-have]** Empty palette should show recent/frequent items instead of requiring typing.
- Overall well-executed — categories, counts, keyboard hints in footer.
### Dark Mode
- **[Critical]** `--text-muted` (#7A7068) on `--bg-surface` (#242019) is ~2.9:1 — fails WCAG AA. Affects ALL muted labels across every page.
- **[Critical]** `--text-faint` (#4A4238) on `--bg-surface` (#242019) is ~1.4:1 — catastrophically fails WCAG AA. Essentially invisible.
- **[Important]** `--amber` (#D4941E) on `--bg-surface` (#242019) is ~3.6:1 — amber links/active text fail AA.
- **[Important]** KPI sparkline chart lines are harder to read — thin strokes need increased width or brightness.
- **[Important]** Sidebar boundary contrast drops significantly (`--sidebar-bg` #141210 vs `--bg-body` #1A1714 is only ~6 units apart).
- **[Important]** Table row alternation contrast near zero in dark mode.
- **[Nice-to-have]** Amber accent color shift from #C6820E to #D4941E is well-handled.
**Fix:** Change `--text-muted` to **#766A5E** (light) / **#9A9088** (dark). Restrict `--text-faint` to decorative use only or lighten dark variant to #6A6058.
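To re-check candidate token values before committing to them, the WCAG 2.x contrast ratio can be computed directly from the hex colors. The following Python sketch implements the standard relative-luminance formula; the function names are illustrative, and the 4.5:1 (normal text) and 3:1 (large text) thresholds are the WCAG AA levels.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an sRGB channel (0-255) to linear light."""
    s = channel_8bit / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #RRGGBB color per WCAG 2.x."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Running `passes_aa` over each text token against each surface color in both themes would turn the findings above into a repeatable check.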
### 2. Font Size Floor
10px text is used for: StatCard labels, overview labels, chain labels, section meta, error class names, detail labels, sidebar tree labels. 11px is used for: table meta, error messages, pagination, toggle buttons, chart titles.
**Fix:** Establish `--font-size-min: 12px` as a design system floor. Update all 10px instances to 12px, all 11px instances to 12px.
### 3. Number/Unit Formatting
Inconsistent across pages:
- Dashboard: "11.742 ms" (decimal + space)
- Routes: "11742ms" (no decimal, no space)
- Dashboard: "1.1 msg/s" vs Agent Instance: "0.1/s"
**Fix:** Create a shared formatting utility enforcing: consistent decimal precision, space before unit, consistent abbreviations.
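One possible shape for such a utility, shown in Python for brevity (the UI itself would need an equivalent in its own language). The name `format_metric` and the default of one decimal place are assumptions, not decisions recorded in this audit.

```python
def format_metric(value: float, unit: str, decimals: int = 1) -> str:
    """Render a number with fixed precision and a single space before the unit."""
    return f"{value:.{decimals}f} {unit}"
```

With this, both the Dashboard and Agent Instance throughput values would render as `1.1 msg/s` and `0.1 msg/s`, and a duration call such as `format_metric(11742, "ms", 0)` gives `11742 ms`.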
### 4. KPI Strip Inconsistency
Used on Dashboard, Routes, Agents, Agent Instance (consistent). But RBAC uses oversized cards for trivial counts, and Database/OpenSearch use ad-hoc text rendering.
**Fix:** Admin infra pages should adopt KPI stat strip or a compact-stat component.
### 5. Empty States
Inconsistent handling:
- Audit Log: "no data" in plain gray
- RBAC detail: "Select a user to view details" in gray
- No consistent empty state component with icon + message + CTA
**Fix:** Design system EmptyState component with icon, message, and optional action.
### 6. Status Indicator Accessibility
Color-only status encoding throughout:
- Duration: green (fast), amber (slow), red (breach) — no icons
- Status dots: green (live), amber (stale), gray (dead) — no shapes
- Agent dead state uses `--text-muted` instead of `--error`
**Fix:** Add shape variation (checkmark/triangle/X), increase dot size to 10px minimum, always render text label alongside.
### 7. Sidebar Structure
Same apps listed 3x (under Applications, Agents, Routes) — triples sidebar length and scales poorly.
**Fix:** Unified application-centric tree where expanding an app shows its agents and routes as children.
---
## Prioritized Recommendations
### Critical (fix now)
| # | Recommendation | Impact |
|---|---------------|--------|
| 1 | **Bump `--text-muted` to WCAG AA compliance** — #766A5E (light) / #9A9088 (dark). Single highest-impact fix across all pages. | Fixes majority of contrast failures |
| 2 | **Establish 12px minimum font size** — update all 10px and 11px instances. Especially StatCard labels, overview labels, table meta. | Readable under stress |
| 3 | **Add error highlighting to processor timeline** — red bars, error icons for failed processors. Core debugging view. | Incident response speed |
| 4 | **Make Stale/Dead agent states unmistakable** — full card background color (yellow stale, red dead), prominent badge. Change dead from `--text-muted` to `--error`. | Prevents missed outages |
Some files were not shown because too many files have changed in this diff