Commit Graph

426 Commits

Author SHA1 Message Date
hsiegeln
f110169d54 feat: add POST /test endpoint for claim mapping rule evaluation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:42:54 +02:00
hsiegeln
0827fd21e3 feat: persist and display exchange properties from agent
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / docker (push) Successful in 2m13s
CI / deploy (push) Successful in 58s
CI / deploy-feature (push) Has been skipped
Add support for exchange properties sent by the agent alongside headers.
Properties flow through the same pipeline as headers: ClickHouse columns
(input_properties, output_properties) on both executions and
processor_executions tables, MergedExecution record, ChunkAccumulator
extraction, DetailService snapshot, and REST API response.

UI adds a Properties tab next to Headers in the process diagram detail
panel, with the same input/output split table layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:23:53 +02:00
hsiegeln
882198d59a fix: use lagInFrame instead of lag for ClickHouse compatibility
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
ClickHouse does not have lag() as a window function. Use lagInFrame()
with explicit ROWS BETWEEN frame instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:06:25 +02:00
hsiegeln
5edb833d21 chore: remove stats table migration logic from ClickHouseSchemaInitializer
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m44s
Not needed yet -- all deployments are under our control and can be
reset manually if the old schema is encountered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:51:34 +02:00
hsiegeln
3f2392b8f7 refactor: consolidate ClickHouse init.sql as clean idempotent schema
Some checks failed
CI / build (push) Has been cancelled
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
Rewrite init.sql as a pure CREATE IF NOT EXISTS file with no DROP or
INSERT statements. Safe for repeated runs on every startup without
corrupting aggregated stats data.

Old deployments with count()-based stats tables are migrated
automatically: ClickHouseSchemaInitializer checks system.columns for
the old AggregateFunction(count) type and drops those tables before
init.sql recreates them with the correct uniq() schema. This runs
once per table and is a no-op on fresh installs or already-migrated
deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:49:53 +02:00
hsiegeln
e57343e3df feat: add delta mode for counter metrics using ClickHouse lag()
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.

The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:56:06 +02:00
hsiegeln
ae908fb382 fix: deduplicate all stats MVs and preserve loop iterations
All checks were successful
CI / build (push) Successful in 2m25s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m20s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m3s
SonarQube / sonarqube (push) Successful in 3m49s
Extend uniq-based dedup from processor tables to all stats tables
(stats_1m_all, stats_1m_app, stats_1m_route). Execution-level tables
use uniq(execution_id). Processor-level tables now use
uniq(concat(execution_id, toString(seq))) so loop iterations (same
exchange, different seq) are counted while chunk retry duplicates
(same exchange+seq) are collapsed.

All stats tables are dropped, recreated, and backfilled from raw
data on startup. All Java queries updated: countMerge -> uniqMerge,
countIfMerge -> uniqIfMerge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:48:01 +02:00
hsiegeln
1872d46466 fix: remove semicolons from SQL comments that broke schema initializer
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m30s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 50s
The ClickHouseSchemaInitializer splits on semicolons before filtering
comments, so semicolons inside comment text created invalid statements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:17:35 +02:00
hsiegeln
e2f784bf82 fix: deduplicate processor stats using uniq(execution_id)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Processor execution counts were inflated by duplicate inserts into the
plain MergeTree processor_executions table (chunk retries, reconnects).
Replace count()/countIf() with uniq(execution_id)/uniqIf() in both
stats_1m_processor and stats_1m_processor_detail MVs so each exchange
is counted once per processor regardless of duplicates.

Tables are dropped and rebuilt from raw data on startup. MV created
after backfill to avoid double-counting.

Also adds stats_1m_processor_detail to the catalog purge list (was
missing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:12:00 +02:00
hsiegeln
66248f6b1c fix: accept logs from unregistered agents using JWT claims
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
After server restart, agents send logs before re-registering. Instead
of dropping these logs, fall back to application and environment from
the JWT token claims. Only drops logs when neither registry nor JWT
provide an applicationId.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:29:05 +02:00
hsiegeln
6bf7175a6c feat: add Micrometer Prometheus metrics to server
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m36s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
Adds micrometer-registry-prometheus and exposes /api/v1/prometheus
endpoint (unauthenticated for scraping). ServerMetrics component
provides business metrics beyond default JVM/HTTP:

Gauges: agents by state, SSE connections, buffer depths (execution,
processor, log, metrics), accumulator pending exchanges.

Counters: ingestion drops (buffer_full, no_agent, no_identity),
agent transitions (went_stale, went_dead, recovered), deployment
outcomes (running, failed, degraded), auth failures (invalid_token,
revoked, oidc_rejected).

Timers: ClickHouse flush duration by type, deployment duration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:23:27 +02:00
hsiegeln
caaa1ab0cc feat: add Prometheus docker_sd_configs labels to agent containers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Labels prometheus.scrape, prometheus.path, and prometheus.port are now
set on every deployed container based on the resolved runtime type,
enabling automatic Prometheus service discovery via docker_sd_configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:00:32 +02:00
hsiegeln
dadab2b5f7 fix: align payloadCaptureMode default with agent (BOTH, not NONE)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
Server defaultConfig() and UI fallbacks returned "NONE" for payload
capture, but the agent defaults to "BOTH". This caused unwanted
reconfiguration when users saved other settings — payload capture
would silently change from the agent's default BOTH to NONE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:12:21 +02:00
hsiegeln
be96336974 feat: add extra Docker networks to container config
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Apps can now join additional Docker networks (e.g., monitoring,
prometheus) configured via containerConfig.extraNetworks. Flows through
the 3-layer config merge. Networks are created if absent and containers
are connected during deployment. UI adds a pill-list field on the
Resources tab (both create and edit views).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:53:01 +02:00
hsiegeln
90c82238a0 feat: add orphaned app cleanup — auto-filter stale discovered apps, manual dismiss with data purge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:19:59 +02:00
hsiegeln
d161ad38a8 fix: log deserialization failures on log ingestion endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m4s
CI / deploy (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
Spring's default handler silently returns 400 for malformed payloads
with no server-side log. Added @ExceptionHandler to catch and WARN with
the agent instance ID and root cause message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:33:57 +02:00
hsiegeln
2d3817b296 fix: downgrade successful log ingestion message to DEBUG
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:29:38 +02:00
hsiegeln
e55ee93dcf fix: add proper logging to log ingestion endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m47s
CI / docker (push) Successful in 1m38s
CI / deploy (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
Previously the endpoint silently returned 202 for all failures: missing
agent identity, unregistered agents, empty payloads, and buffer-full
drops. Now logs WARN for each failure case with context (instanceId,
entry count, reason). Normal ingestion logged at INFO with accepted
count. Buffer-full drops tracked individually with accepted/dropped
counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:20:07 +02:00
hsiegeln
d5b611cc32 feat: validate runtimeType and customArgs on container config save
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:08:52 +02:00
hsiegeln
e941256e6e feat: build Docker entrypoint per runtime type with custom args support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:06:54 +02:00
hsiegeln
f4bbc1f65f feat: add detected_runtime_type and detected_main_class to app_versions
Flyway V10 migration adds the two nullable columns. AppVersion record,
AppVersionRepository interface, and PostgresAppVersionRepository are
updated to carry and persist detected runtime information.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:01:24 +02:00
hsiegeln
0603e62a69 fix: revert LogEntry to 7-arg constructor (source is not a ctor param)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
LogEntry.getSource() exists but source is not a constructor parameter
in cameleer3-common — it uses a default value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:40:48 +02:00
hsiegeln
00115a16ac fix: add source parameter to LogSearchRequest/LogEntry calls in ClickHouseLogStoreIT
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 58s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
All constructor calls updated to include the new source field added
in the log forwarding v2 changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:37:56 +02:00
hsiegeln
b03dfee4f3 feat: log forwarding v2 — accept List<LogEntry>, add source field
Replace LogBatch wrapper with raw List<LogEntry> on the ingestion endpoint.
Add source column to ClickHouse logs table and propagate it through the
storage, search, and HTTP layers (LogSearchRequest, LogEntryResult,
LogEntryResponse, LogQueryController).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 10:25:46 +02:00
hsiegeln
9de51014e7 feat: expose infrastructureEndpoints flag in health endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 23:10:15 +02:00
hsiegeln
293d11e52b feat: add infrastructureendpoints flag with conditional DB/CH controllers
Add cameleer.server.security.infrastructureendpoints property (default true) and
@ConditionalOnProperty to DatabaseAdminController and ClickHouseAdminController so
the SaaS provisioner can set CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false
to suppress these endpoints (404) on tenant server containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 23:09:28 +02:00
hsiegeln
350e769948 Group container settings under cameleer.server.runtime.container.*
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Move container resource defaults into their own sub-namespace for
future extensibility:

  cameleer.server.runtime.container.memorylimit → CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT
  cameleer.server.runtime.container.cpushares   → CAMELEER_SERVER_RUNTIME_CONTAINER_CPUSHARES

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:33:07 +02:00
hsiegeln
534e936cd4 Group OIDC settings under cameleer.server.security.oidc.*
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Move OIDC properties into a nested Oidc class within SecurityProperties
for clearer grouping. Env vars gain an extra separator:

  cameleer.server.security.oidc.issueruri     → CAMELEER_SERVER_SECURITY_OIDC_ISSUERURI
  cameleer.server.security.oidc.jwkseturi     → CAMELEER_SERVER_SECURITY_OIDC_JWKSETURI
  cameleer.server.security.oidc.audience      → CAMELEER_SERVER_SECURITY_OIDC_AUDIENCE
  cameleer.server.security.oidc.tlsskipverify → CAMELEER_SERVER_SECURITY_OIDC_TLSSKIPVERIFY

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:30:33 +02:00
hsiegeln
60fb5fe21a Remove vestigial clickhouse.enabled flag
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.

Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:27:10 +02:00
hsiegeln
8fe48bbf02 Migrate config to cameleer.server.* naming convention
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m52s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.

Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
  ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
  (e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:10:51 +02:00
hsiegeln
3501f32110 feat: make route control and replay configurable per environment/app
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Added routeControlEnabled and replayEnabled to ResolvedContainerConfig,
flowing through the three-layer config merge (global -> env -> app).
Both default to true. Admins can disable them per environment (e.g.
prod) via the defaultContainerConfig JSONB, or per app via the app's
containerConfig JSONB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:56:13 +02:00
hsiegeln
4da81b21ba fix: enable route control and replay capabilities for deployed apps
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
buildEnvVars was missing CAMELEER_ROUTE_CONTROL_ENABLED and
CAMELEER_REPLAY_ENABLED, so deployed app containers defaulted to false
and agents didn't announce these capabilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:53:49 +02:00
hsiegeln
e9486bd05a feat: allow M2M password resets when OIDC is enabled
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m50s
CI / docker (push) Successful in 1m34s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The password reset endpoint was fully blocked under OIDC mode. Now
M2M callers (identified by oidc: principal prefix) can reset local
user passwords, enabling the SaaS platform to manage the server's
built-in admin credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:46:26 +02:00
hsiegeln
cfc42eaf46 feat: add cameleer.tenant label to deployed app containers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m32s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Allows the SaaS platform to identify and clean up all containers
belonging to a tenant on delete (cameleer/cameleer-saas#55).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:10:59 +02:00
hsiegeln
0d610be3dc fix: use OIDC token roles when no claim mapping rules exist
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
The OIDC callback extracted roles from the token's Custom JWT claim
(e.g. roles: [server:admin]) but never used them. The
applyClaimMappings fallback only assigned defaultRoles (VIEWER).

Now the fallback priority is: claim mapping rules > OIDC token
roles > defaultRoles. This ensures users get their org-mapped
roles (owner → server:admin) without requiring manual claim
mapping rule configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:17:12 +02:00
hsiegeln
2ac52d3918 feat: tenant-scoped environment network names
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Environment networks now include the tenant ID to prevent cross-tenant
collisions: cameleer-env-{tenantId}-{envSlug} instead of cameleer-env-
{envSlug}. Without this, two tenants with a "dev" environment would
share the same Docker network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:13:47 +02:00
hsiegeln
50e3f1ade6 feat: use configured DOCKER_NETWORK as primary for deployed apps
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Instead of hardcoding cameleer-traefik as the primary network for
deployed app containers, use CAMELEER_DOCKER_NETWORK (env var). In
SaaS mode this is the tenant-isolated network (cameleer-tenant-{slug}).
Apps still connect to cameleer-traefik (for routing) and cameleer-env-
{slug} (for intra-environment discovery) as additional networks.

This enables per-tenant network isolation: apps deployed by tenant A
cannot reach apps deployed by tenant B since they share no network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:08:48 +02:00
hsiegeln
be585934b9 fix: show descriptive error when creating local user with OIDC enabled
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:19:10 +02:00
hsiegeln
1971c70638 fix: commands respect selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.

Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.

Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:28:09 +02:00
hsiegeln
69dcce2a8f fix: Runtime tab respects selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Add environment parameter to AgentEventsController, AgentEventService,
  and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
  and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:12:33 +02:00
hsiegeln
cb36d7936f fix: auto-compute environment slug + respect environment filter globally
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.

Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
  detail, correlation chains, and processor metrics all filter by
  selected environment.
- Backend RouteMetricsController now accepts environment parameter for
  both route and processor metrics endpoints.

Closes #XYZ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:01:50 +02:00
hsiegeln
f95a78a380 fix: add periodic deployment status reconciliation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The DockerEventMonitor only reacted to Docker events. If an event was
missed (e.g., during reconnect or startup race), a DEGRADED deployment
with all replicas healthy would never promote back to RUNNING.

Add a @Scheduled reconciliation (every 30s) that inspects actual
container state and corrects deployment status mismatches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:40:18 +02:00
hsiegeln
827ba3c798 feat: last-ADMIN guard and password hardening (#87, #89)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m57s
CI / docker (push) Successful in 1m48s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
- Prevent removal of last ADMIN role via role unassign, user delete,
  or group role removal (returns 409 Conflict)
- Add password policy: min 12 chars, 3/4 character classes, no username
- Add brute-force protection: 5 attempts then 15min lockout, IP rate limit
- Add token revocation on password change via token_revoked_before column
- V9 migration adds failed_login_attempts, locked_until, token_revoked_before

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:58:03 +02:00
hsiegeln
2df5e0d7ba feat: active config snapshot, composite StatusDot with tooltip
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability

Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
  agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
  Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:00:54 +02:00
hsiegeln
e88db56f79 refactor: CPU config to millicores, fix replica health, reorder tabs
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m18s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
- Rename cpuShares to cpuRequest (millicores), cpuLimit from cores to
  millicores. ResolvedContainerConfig translates to Docker-native units
  via dockerCpuShares() and dockerCpuQuota() helpers. Future K8s
  orchestrator can pass millicores through directly.
- Fix waitForAnyHealthy to wait for ALL replicas instead of returning
  on first healthy one. Prevents false DEGRADED status with 2+ replicas.
- Default app detail to Configuration tab (was Overview)
- Reorder config sub-tabs: Monitoring, Resources, Variables, Traces &
  Taps, Route Recording

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 07:38:23 +02:00
hsiegeln
b86e95f08e feat: unified catalog endpoint and slug-based app navigation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m47s
Consolidate route catalog (agent-driven) and apps table (deployment-
driven) into a single GET /api/v1/catalog?environment={slug} endpoint.
Apps table is authoritative; agent data enriches with live health,
routes, and metrics. Unmanaged apps (agents without App record) appear
with managed=false.

- Add CatalogController merging App records + agent registry + ClickHouse
- Add CatalogApp DTO with deployment summary, managed flag, health
- Change AppController and DeploymentController to accept slugs (not UUIDs)
- Add AppRepository.findBySlug() and AppService.getBySlug()
- Replace useRouteCatalog() with useCatalog() across all UI components
- Navigate to /apps/{slug} instead of /apps/{UUID}
- Update sidebar, search, and all catalog lookups to use slug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:43:14 +02:00
hsiegeln
a4a569a253 fix: improve deployment progress UI and prevent duplicate deployment rows
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m55s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m6s
- Redesign DeploymentProgress component: track-based layout with amber
  brand color, checkmarks for completed steps, user-friendly labels
  (Prepare, Image, Network, Launch, Verify, Activate, Live)
- Delete terminal (STOPPED/FAILED) deployments before creating new ones
  for the same app+environment, preventing duplicate rows in the UI
- Update CLAUDE.md with comprehensive key class locations, correct deploy
  stages, database migration reference, and REST endpoint summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:10:59 +02:00
hsiegeln
64ebf19ad3 refactor: use CAMELEER_SERVER_URL for agent export endpoint
Some checks failed
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
CI / build (push) Has been cancelled
The runtime-base image and all agent Dockerfiles now read
CAMELEER_SERVER_URL instead of CAMELEER_EXPORT_ENDPOINT.
Updated the volume-mode entrypoint override to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:07:07 +02:00
hsiegeln
20f3dfe59d feat: support Docker volume-based JAR mounting for Docker-in-Docker
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m8s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
When CAMELEER_JAR_DOCKER_VOLUME is set, the orchestrator mounts the
named volume at the jar storage path instead of using a host bind mount.
This solves the path translation issue in Docker-in-Docker setups where
the server runs inside a container and manages sibling containers.

The entrypoint is overridden to use the volume-mounted JAR path via
the CAMELEER_APP_JAR env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:38:34 +02:00
hsiegeln
c923d8233b fix: move network attachment from orchestrator to executor
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Docker's connectToNetworkCmd needs the network ID (not name) and the
container's network sandbox must be ready. Moving network connection
to DeploymentExecutor where DockerNetworkManager handles ID resolution
and the container is already started.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:29:13 +02:00