Commit Graph

432 Commits

Author SHA1 Message Date
hsiegeln
77aa3c3d6f test: add SensitiveKeysAdminController integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:21:46 +02:00
hsiegeln
2fad8811c6 feat: merge global sensitive keys into app config GET and SSE push
- GET /config/{app} now returns AppConfigResponse with globalSensitiveKeys and mergedSensitiveKeys alongside the config
- PUT /config/{app} merges global + per-app sensitive keys before pushing CONFIG_UPDATE to agents via SSE
- extractSensitiveKeys() uses JsonNode reflection to avoid compile-time dependency on cameleer3-common getSensitiveKeys()
- SensitiveKeysRepository injected as new constructor parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 18:19:59 +02:00
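The merge described in 2fad8811c6 can be pictured with a minimal Python sketch. This is not the server's Java code: the per-app field name `sensitiveKeys`, the `config` wrapper key, and the exact-match, global-first ordering are assumptions made for illustration. Only `globalSensitiveKeys` and `mergedSensitiveKeys` come from the commit message.

```python
from typing import Iterable, List


def merge_sensitive_keys(global_keys: Iterable[str], app_keys: Iterable[str]) -> List[str]:
    """Union of global and per-app keys, global first, duplicates dropped."""
    merged: List[str] = []
    seen = set()
    for key in [*global_keys, *app_keys]:
        if key not in seen:  # exact string match; case handling is an assumption
            seen.add(key)
            merged.append(key)
    return merged


def build_app_config_response(config: dict, global_keys: List[str]) -> dict:
    """Shape of a GET /config/{app} response under the assumptions above."""
    # "sensitiveKeys" as the per-app field name is hypothetical.
    app_keys = config.get("sensitiveKeys") or []
    return {
        "config": config,
        "globalSensitiveKeys": list(global_keys),
        "mergedSensitiveKeys": merge_sensitive_keys(global_keys, app_keys),
    }
```

Reading the per-app keys out of a generic mapping mirrors the commit's JsonNode approach: no typed accessor on the config object is needed.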
hsiegeln
28e38e4dee fix: add audit logging to GET /admin/sensitive-keys
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:17:42 +02:00
hsiegeln
c3892151a5 feat: add SensitiveKeysAdminController with fan-out support
GET/PUT /api/v1/admin/sensitive-keys (ADMIN only). PUT accepts optional
pushToAgents param — when true, fans out merged global+per-app sensitive
keys to all live agents via CONFIG_UPDATE SSE commands with 10-second
shared deadline. Per-app keys are extracted via JsonNode to avoid
depending on ApplicationConfig.getSensitiveKeys(), which is not yet in
the published cameleer3-common jar. Includes audit logging on every PUT.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 18:16:27 +02:00
hsiegeln
84641fe81a feat: add PostgresSensitiveKeysRepository
2026-04-14 18:08:45 +02:00
hsiegeln
dcd0b4ebcd fix: use managed assignments for OIDC fallback role paths
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The roles-claim and default-roles fallback paths in applyClaimMappings
were using assignRoleToUser (origin='direct'), causing OIDC-derived
roles to accumulate across logins and never be cleared. Changed both
to assignManagedRole (origin='managed') so all OIDC-assigned roles
are cleared and re-evaluated on every login, same as claim mapping
rules. Only roles assigned directly via the admin UI are preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:19:20 +02:00
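The difference between origin='direct' and origin='managed' in dcd0b4ebcd can be sketched in a few lines of Python. The tuple representation and the function name `apply_oidc_login` are hypothetical; only the two origin values and the clear-and-re-evaluate behaviour come from the commit message.

```python
from typing import Iterable, Set, Tuple

Assignment = Tuple[str, str]  # (role, origin) where origin is "direct" or "managed"


def apply_oidc_login(assignments: Set[Assignment], oidc_roles: Iterable[str]) -> Set[Assignment]:
    """Drop every managed role, keep direct ones, then re-assign what OIDC grants now."""
    kept = {(role, origin) for role, origin in assignments if origin == "direct"}
    return kept | {(role, "managed") for role in oidc_roles}
```

With this shape, a role that the identity provider stops sending disappears at the next login, while a role granted through the admin UI survives.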
hsiegeln
f110169d54 feat: add POST /test endpoint for claim mapping rule evaluation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:42:54 +02:00
hsiegeln
0827fd21e3 feat: persist and display exchange properties from agent
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / docker (push) Successful in 2m13s
CI / deploy (push) Successful in 58s
CI / deploy-feature (push) Has been skipped
Add support for exchange properties sent by the agent alongside headers.
Properties flow through the same pipeline as headers: ClickHouse columns
(input_properties, output_properties) on both executions and
processor_executions tables, MergedExecution record, ChunkAccumulator
extraction, DetailService snapshot, and REST API response.

UI adds a Properties tab next to Headers in the process diagram detail
panel, with the same input/output split table layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:23:53 +02:00
hsiegeln
882198d59a fix: use lagInFrame instead of lag for ClickHouse compatibility
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
ClickHouse does not have lag() as a window function. Use lagInFrame()
with explicit ROWS BETWEEN frame instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:06:25 +02:00
hsiegeln
5edb833d21 chore: remove stats table migration logic from ClickHouseSchemaInitializer
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m44s
Not needed yet -- all deployments are under our control and can be
reset manually if the old schema is encountered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:51:34 +02:00
hsiegeln
3f2392b8f7 refactor: consolidate ClickHouse init.sql as clean idempotent schema
Some checks failed
CI / build (push) Has been cancelled
CI / docker (push) Has been cancelled
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / cleanup-branch (push) Has been cancelled
Rewrite init.sql as a pure CREATE IF NOT EXISTS file with no DROP or
INSERT statements. Safe for repeated runs on every startup without
corrupting aggregated stats data.

Old deployments with count()-based stats tables are migrated
automatically: ClickHouseSchemaInitializer checks system.columns for
the old AggregateFunction(count) type and drops those tables before
init.sql recreates them with the correct uniq() schema. This runs
once per table and is a no-op on fresh installs or already-migrated
deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:49:53 +02:00
hsiegeln
e57343e3df feat: add delta mode for counter metrics using ClickHouse lag()
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
Counter metrics like chunks.exported.count are monotonically increasing.
Add mode=delta query parameter to the agent metrics API that computes
per-bucket deltas server-side using ClickHouse lag() window function:
max(value) per bucket, then greatest(0, current - previous) to get the
increase per period with counter-reset handling.

The chunks exported/dropped charts now show throughput per bucket
instead of the ever-increasing cumulative total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:56:06 +02:00
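The delta computation in e57343e3df (max(value) per bucket, then greatest(0, current - previous)) can be illustrated outside ClickHouse with a small Python sketch. The function name, the sample layout, and the choice to emit no delta for the first bucket are assumptions; the real query runs server-side as a window function.

```python
from typing import Dict, List, Tuple


def bucket_deltas(samples: List[Tuple[int, float]], bucket_seconds: int) -> List[Tuple[int, float]]:
    """samples are (unix_seconds, counter_value) pairs in any order."""
    # Step 1: max(value) per bucket.
    maxima: Dict[int, float] = {}
    for ts, value in samples:
        bucket = ts - ts % bucket_seconds
        maxima[bucket] = max(value, maxima.get(bucket, value))

    # Step 2: greatest(0, current - previous) between consecutive buckets.
    deltas: List[Tuple[int, float]] = []
    previous = None
    for bucket in sorted(maxima):
        current = maxima[bucket]
        if previous is not None:  # first bucket has no predecessor (assumption: skip it)
            deltas.append((bucket, max(0, current - previous)))
        previous = current
    return deltas
```

Clamping at zero is what handles a counter reset: when an agent restarts and the counter drops, the bucket reports 0 instead of a negative increase.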
hsiegeln
ae908fb382 fix: deduplicate all stats MVs and preserve loop iterations
All checks were successful
CI / build (push) Successful in 2m25s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 1m20s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m3s
SonarQube / sonarqube (push) Successful in 3m49s
Extend uniq-based dedup from processor tables to all stats tables
(stats_1m_all, stats_1m_app, stats_1m_route). Execution-level tables
use uniq(execution_id). Processor-level tables now use
uniq(concat(execution_id, toString(seq))) so loop iterations (same
exchange, different seq) are counted while chunk retry duplicates
(same exchange+seq) are collapsed.

All stats tables are dropped, recreated, and backfilled from raw
data on startup. All Java queries updated: countMerge -> uniqMerge,
countIfMerge -> uniqIfMerge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:48:01 +02:00
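The two dedup keys from ae908fb382 behave differently on retries and loop iterations, which a short Python sketch makes concrete. The row dictionaries are hypothetical, and Python sets count exactly whereas ClickHouse uniq() is an approximate aggregate; a tuple stands in for concat(execution_id, toString(seq)).

```python
from typing import Dict, List


def count_executions(rows: List[Dict]) -> int:
    """Execution-level key: one count per execution_id."""
    return len({row["execution_id"] for row in rows})


def count_processor_runs(rows: List[Dict]) -> int:
    """Processor-level key: one count per (execution_id, seq) pair."""
    return len({(row["execution_id"], row["seq"]) for row in rows})


rows = [
    {"execution_id": "e1", "seq": 1},
    {"execution_id": "e1", "seq": 1},  # chunk retry duplicate: collapsed
    {"execution_id": "e1", "seq": 2},  # loop iteration: counted separately
    {"execution_id": "e2", "seq": 1},
]
```

Here a plain count() would report 4, the execution-level key 2, and the processor-level key 3.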
hsiegeln
1872d46466 fix: remove semicolons from SQL comments that broke schema initializer
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m30s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 50s
The ClickHouseSchemaInitializer splits on semicolons before filtering
comments, so semicolons inside comment text created invalid statements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:17:35 +02:00
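The failure mode fixed in 1872d46466 is easy to reproduce with a Python sketch of a split-then-filter parser. This is an illustration of the ordering problem only, not the ClickHouseSchemaInitializer code; the function name and the sample SQL are made up.

```python
from typing import List


def split_then_filter(script: str) -> List[str]:
    """Split on ';' first, then drop lines that start with '--'."""
    statements = []
    for chunk in script.split(";"):
        lines = [ln for ln in chunk.splitlines() if not ln.strip().startswith("--")]
        stmt = "\n".join(lines).strip()
        if stmt:
            statements.append(stmt)
    return statements


# A semicolon inside the comment cuts the comment in two. The second half no
# longer starts with '--', so it survives filtering and pollutes the statement.
broken = "-- stats table; rebuilt on startup\nCREATE TABLE IF NOT EXISTS t (id UInt64) ENGINE = Memory;\n"
clean = "-- stats table, rebuilt on startup\nCREATE TABLE IF NOT EXISTS t (id UInt64) ENGINE = Memory;\n"
```

With `clean` the parser yields the single CREATE TABLE statement; with `broken` the statement is prefixed by the stray text "rebuilt on startup", which is not valid SQL.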
hsiegeln
e2f784bf82 fix: deduplicate processor stats using uniq(execution_id)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m17s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Processor execution counts were inflated by duplicate inserts into the
plain MergeTree processor_executions table (chunk retries, reconnects).
Replace count()/countIf() with uniq(execution_id)/uniqIf() in both
stats_1m_processor and stats_1m_processor_detail MVs so each exchange
is counted once per processor regardless of duplicates.

Tables are dropped and rebuilt from raw data on startup. MV created
after backfill to avoid double-counting.

Also adds stats_1m_processor_detail to the catalog purge list (was
missing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 23:12:00 +02:00
hsiegeln
66248f6b1c fix: accept logs from unregistered agents using JWT claims
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
After server restart, agents send logs before re-registering. Instead
of dropping these logs, fall back to application and environment from
the JWT token claims. Logs are dropped only when neither the registry
nor the JWT provides an applicationId.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:29:05 +02:00
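The lookup order in 66248f6b1c (registry first, JWT claims second, drop last) can be sketched as follows. The claim names `application` and `environment`, the registry layout, and the function name are assumptions for illustration; the commit message only states that application and environment are taken from the token claims.

```python
from typing import Dict, Optional, Tuple


def resolve_log_target(
    instance_id: str,
    registry: Dict[str, Dict[str, str]],
    jwt_claims: Dict[str, str],
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (applicationId, environment), or None when the logs must be dropped."""
    agent = registry.get(instance_id)
    if agent is not None:
        return agent["applicationId"], agent.get("environment")

    # Agent not (yet) registered, e.g. right after a server restart.
    application = jwt_claims.get("application")  # hypothetical claim name
    if application:
        return application, jwt_claims.get("environment")  # hypothetical claim name

    return None  # neither source provides an applicationId
```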
hsiegeln
6bf7175a6c feat: add Micrometer Prometheus metrics to server
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m36s
CI / deploy (push) Has been cancelled
CI / docker (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
Adds micrometer-registry-prometheus and exposes /api/v1/prometheus
endpoint (unauthenticated for scraping). ServerMetrics component
provides business metrics beyond default JVM/HTTP:

Gauges: agents by state, SSE connections, buffer depths (execution,
processor, log, metrics), accumulator pending exchanges.

Counters: ingestion drops (buffer_full, no_agent, no_identity),
agent transitions (went_stale, went_dead, recovered), deployment
outcomes (running, failed, degraded), auth failures (invalid_token,
revoked, oidc_rejected).

Timers: ClickHouse flush duration by type, deployment duration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:23:27 +02:00
hsiegeln
caaa1ab0cc feat: add Prometheus docker_sd_configs labels to agent containers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Labels prometheus.scrape, prometheus.path, and prometheus.port are now
set on every deployed container based on the resolved runtime type,
enabling automatic Prometheus service discovery via docker_sd_configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:00:32 +02:00
hsiegeln
dadab2b5f7 fix: align payloadCaptureMode default with agent (BOTH, not NONE)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 49s
Server defaultConfig() and UI fallbacks returned "NONE" for payload
capture, but the agent defaults to "BOTH". This caused unwanted
reconfiguration when users saved other settings — payload capture
would silently change from the agent's default BOTH to NONE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:12:21 +02:00
hsiegeln
be96336974 feat: add extra Docker networks to container config
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m7s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Apps can now join additional Docker networks (e.g., monitoring,
prometheus) configured via containerConfig.extraNetworks. Flows through
the 3-layer config merge. Networks are created if absent and containers
are connected during deployment. UI adds a pill-list field on the
Resources tab (both create and edit views).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:53:01 +02:00
hsiegeln
90c82238a0 feat: add orphaned app cleanup — auto-filter stale discovered apps, manual dismiss with data purge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:19:59 +02:00
hsiegeln
d161ad38a8 fix: log deserialization failures on log ingestion endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m23s
CI / docker (push) Successful in 1m4s
CI / deploy (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
Spring's default handler silently returns 400 for malformed payloads
with no server-side log. Added @ExceptionHandler to catch and WARN with
the agent instance ID and root cause message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:33:57 +02:00
hsiegeln
2d3817b296 fix: downgrade successful log ingestion message to DEBUG
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:29:38 +02:00
hsiegeln
e55ee93dcf fix: add proper logging to log ingestion endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m47s
CI / docker (push) Successful in 1m38s
CI / deploy (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
Previously the endpoint silently returned 202 for all failures: missing
agent identity, unregistered agents, empty payloads, and buffer-full
drops. Now logs WARN for each failure case with context (instanceId,
entry count, reason). Normal ingestion logged at INFO with accepted
count. Buffer-full drops tracked individually with accepted/dropped
counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:20:07 +02:00
hsiegeln
d5b611cc32 feat: validate runtimeType and customArgs on container config save
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:08:52 +02:00
hsiegeln
e941256e6e feat: build Docker entrypoint per runtime type with custom args support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 13:06:54 +02:00
hsiegeln
f4bbc1f65f feat: add detected_runtime_type and detected_main_class to app_versions
Flyway V10 migration adds the two nullable columns. AppVersion record,
AppVersionRepository interface, and PostgresAppVersionRepository are
updated to carry and persist detected runtime information.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:01:24 +02:00
hsiegeln
0603e62a69 fix: revert LogEntry to 7-arg constructor (source is not a ctor param)
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
LogEntry.getSource() exists but source is not a constructor parameter
in cameleer3-common — it uses a default value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:40:48 +02:00
hsiegeln
00115a16ac fix: add source parameter to LogSearchRequest/LogEntry calls in ClickHouseLogStoreIT
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 58s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
All constructor calls updated to include the new source field added
in the log forwarding v2 changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:37:56 +02:00
hsiegeln
b03dfee4f3 feat: log forwarding v2 — accept List<LogEntry>, add source field
Replace LogBatch wrapper with raw List<LogEntry> on the ingestion endpoint.
Add source column to ClickHouse logs table and propagate it through the
storage, search, and HTTP layers (LogSearchRequest, LogEntryResult,
LogEntryResponse, LogQueryController).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 10:25:46 +02:00
hsiegeln
9de51014e7 feat: expose infrastructureEndpoints flag in health endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 23:10:15 +02:00
hsiegeln
293d11e52b feat: add infrastructureendpoints flag with conditional DB/CH controllers
Add cameleer.server.security.infrastructureendpoints property (default true) and
@ConditionalOnProperty to DatabaseAdminController and ClickHouseAdminController so
the SaaS provisioner can set CAMELEER_SERVER_SECURITY_INFRASTRUCTUREENDPOINTS=false
to suppress these endpoints (404) on tenant server containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 23:09:28 +02:00
hsiegeln
350e769948 Group container settings under cameleer.server.runtime.container.*
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m2s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Move container resource defaults into their own sub-namespace for
future extensibility:

  cameleer.server.runtime.container.memorylimit → CAMELEER_SERVER_RUNTIME_CONTAINER_MEMORYLIMIT
  cameleer.server.runtime.container.cpushares   → CAMELEER_SERVER_RUNTIME_CONTAINER_CPUSHARES

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:33:07 +02:00
hsiegeln
534e936cd4 Group OIDC settings under cameleer.server.security.oidc.*
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m59s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
Move OIDC properties into a nested Oidc class within SecurityProperties
for clearer grouping. Env vars gain an extra separator:

  cameleer.server.security.oidc.issueruri     → CAMELEER_SERVER_SECURITY_OIDC_ISSUERURI
  cameleer.server.security.oidc.jwkseturi     → CAMELEER_SERVER_SECURITY_OIDC_JWKSETURI
  cameleer.server.security.oidc.audience      → CAMELEER_SERVER_SECURITY_OIDC_AUDIENCE
  cameleer.server.security.oidc.tlsskipverify → CAMELEER_SERVER_SECURITY_OIDC_TLSSKIPVERIFY

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:30:33 +02:00
hsiegeln
60fb5fe21a Remove vestigial clickhouse.enabled flag
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.

Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:27:10 +02:00
hsiegeln
8fe48bbf02 Migrate config to cameleer.server.* naming convention
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m52s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.

Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
  ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
  (e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:10:51 +02:00
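The naming rules in 8fe48bbf02 are mechanical enough to express as two one-line Python helpers. The function names are hypothetical; the rules (kebab-case collapsed to lowercase, dots to underscores, uppercase) and the example values are taken from the commit messages in this log.

```python
def flatten_kebab(name: str) -> str:
    """bootstrap-token -> bootstraptoken"""
    return name.replace("-", "").lower()


def to_env_var(prop: str) -> str:
    """cameleer.server.x.y -> CAMELEER_SERVER_X_Y"""
    return prop.replace(".", "_").upper()
```

Because property names contain no hyphens or camelCase after the rename, the mapping is reversible, which is what makes the environment variable for any property predictable.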
hsiegeln
3501f32110 feat: make route control and replay configurable per environment/app
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m16s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Added routeControlEnabled and replayEnabled to ResolvedContainerConfig,
flowing through the three-layer config merge (global -> env -> app).
Both default to true. Admins can disable them per environment (e.g.
prod) via the defaultContainerConfig JSONB, or per app via the app's
containerConfig JSONB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:56:13 +02:00
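The three-layer merge mentioned in 3501f32110 (global -> env -> app) can be sketched as a chain of overrides. This Python version assumes that an absent or None value means "not set at this layer"; how the server actually represents unset values in the JSONB documents is not stated in the log.

```python
from typing import Any, Dict

# Both flags default to true according to the commit message.
DEFAULTS: Dict[str, Any] = {"routeControlEnabled": True, "replayEnabled": True}


def resolve_container_config(
    global_cfg: Dict[str, Any],
    env_cfg: Dict[str, Any],
    app_cfg: Dict[str, Any],
) -> Dict[str, Any]:
    """Later layers win: app overrides env, env overrides global."""
    resolved = dict(DEFAULTS)
    for layer in (global_cfg, env_cfg, app_cfg):
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved
```

Disabling replay for a whole environment while re-enabling it for one app then looks like `resolve_container_config({}, {"replayEnabled": False}, {"replayEnabled": True})`.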
hsiegeln
4da81b21ba fix: enable route control and replay capabilities for deployed apps
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m24s
CI / deploy (push) Has been cancelled
CI / deploy-feature (push) Has been cancelled
CI / docker (push) Has been cancelled
buildEnvVars was missing CAMELEER_ROUTE_CONTROL_ENABLED and
CAMELEER_REPLAY_ENABLED, so deployed app containers defaulted to false
and agents didn't announce these capabilities.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:53:49 +02:00
hsiegeln
e9486bd05a feat: allow M2M password resets when OIDC is enabled
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m50s
CI / docker (push) Successful in 1m34s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
The password reset endpoint was fully blocked under OIDC mode. Now
M2M callers (identified by oidc: principal prefix) can reset local
user passwords, enabling the SaaS platform to manage the server's
built-in admin credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:46:26 +02:00
hsiegeln
cfc42eaf46 feat: add cameleer.tenant label to deployed app containers
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m48s
CI / docker (push) Successful in 1m32s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Allows the SaaS platform to identify and clean up all containers
belonging to a tenant on delete (cameleer/cameleer-saas#55).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 09:10:59 +02:00
hsiegeln
0d610be3dc fix: use OIDC token roles when no claim mapping rules exist
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m29s
CI / docker (push) Successful in 1m15s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
The OIDC callback extracted roles from the token's Custom JWT claim
(e.g. roles: [server:admin]) but never used them. The
applyClaimMappings fallback only assigned defaultRoles (VIEWER).

Now the fallback priority is: claim mapping rules > OIDC token
roles > defaultRoles. This ensures users get their org-mapped
roles (owner → server:admin) without requiring manual claim
mapping rule configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 12:17:12 +02:00
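The fallback priority from 0d610be3dc (claim mapping rules > OIDC token roles > defaultRoles) reduces to a first-non-empty selection. In this Python sketch the function name is hypothetical, and treating an empty rule result the same as "no rules exist" is an assumption.

```python
from typing import List


def resolve_roles(rule_roles: List[str], token_roles: List[str], default_roles: List[str]) -> List[str]:
    """Pick the first non-empty source in priority order."""
    if rule_roles:      # roles produced by claim mapping rules
        return rule_roles
    if token_roles:     # roles carried in the OIDC token
        return token_roles
    return default_roles
```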
hsiegeln
2ac52d3918 feat: tenant-scoped environment network names
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Environment networks now include the tenant ID to prevent cross-tenant
collisions: cameleer-env-{tenantId}-{envSlug} instead of
cameleer-env-{envSlug}. Without this, two tenants with a "dev"
environment would share the same Docker network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:13:47 +02:00
hsiegeln
50e3f1ade6 feat: use configured DOCKER_NETWORK as primary for deployed apps
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m13s
CI / docker (push) Successful in 1m9s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 36s
Instead of hardcoding cameleer-traefik as the primary network for
deployed app containers, use CAMELEER_DOCKER_NETWORK (env var). In
SaaS mode this is the tenant-isolated network (cameleer-tenant-{slug}).
Apps still connect to cameleer-traefik (for routing) and
cameleer-env-{slug} (for intra-environment discovery) as additional
networks.

This enables per-tenant network isolation: apps deployed by tenant A
cannot reach apps deployed by tenant B since they share no network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:08:48 +02:00
hsiegeln
be585934b9 fix: show descriptive error when creating local user with OIDC enabled
Return a JSON error body from UserAdminController instead of an empty 400,
and extract API error messages in adminFetch so toasts display the reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:19:10 +02:00
hsiegeln
1971c70638 fix: commands respect selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 40s
Backend: AgentRegistryService gains findByApplicationAndEnvironment()
and environment-aware addGroupCommandWithReplies() overload.
AgentCommandController and ApplicationConfigController accept optional
environment query parameter. When set, commands only target agents in
that environment. Backward compatible — null means all environments.

Frontend: All command mutations (config update, route control, traced
processors, tap config, route recording) now pass selectedEnv to the
backend via query parameter.

Prevents cross-environment command leakage — e.g., updating config for
prod no longer pushes to dev agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:28:09 +02:00
hsiegeln
69dcce2a8f fix: Runtime tab respects selected environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Add environment parameter to AgentEventsController, AgentEventService,
  and ClickHouseAgentEventRepository (filters agent_events by environment)
- Wire selectedEnv to useAgents and useAgentEvents in both AgentHealth
  and AgentInstance pages
- Wire selectedEnv to useStatsTimeseries in AgentInstance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:12:33 +02:00
hsiegeln
cb36d7936f fix: auto-compute environment slug + respect environment filter globally
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Part A: Environment creation slug is now auto-derived from display name
and shown read-only (matching app creation pattern). Removes manual slug
input.

Part B: All data queries now pass the selected environment to backend:
- Exchanges search, Dashboard L1/L2/L3 stats, Routes metrics, Route
  detail, correlation chains, and processor metrics all filter by
  selected environment.
- Backend RouteMetricsController now accepts environment parameter for
  both route and processor metrics endpoints.

Closes #XYZ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:01:50 +02:00
hsiegeln
f95a78a380 fix: add periodic deployment status reconciliation
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 1m3s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
The DockerEventMonitor only reacted to Docker events. If an event was
missed (e.g., during reconnect or startup race), a DEGRADED deployment
with all replicas healthy would never promote back to RUNNING.

Add a @Scheduled reconciliation (every 30s) that inspects actual
container state and corrects deployment status mismatches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:40:18 +02:00
hsiegeln
827ba3c798 feat: last-ADMIN guard and password hardening (#87, #89)
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m57s
CI / docker (push) Successful in 1m48s
CI / deploy (push) Successful in 51s
CI / deploy-feature (push) Has been skipped
- Prevent removal of last ADMIN role via role unassign, user delete,
  or group role removal (returns 409 Conflict)
- Add password policy: min 12 chars, 3/4 character classes, no username
- Add brute-force protection: 5 attempts then 15min lockout, IP rate limit
- Add token revocation on password change via token_revoked_before column
- V9 migration adds failed_login_attempts, locked_until, token_revoked_before

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:58:03 +02:00
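The password policy listed in 827ba3c798 (min 12 chars, 3/4 character classes, no username) could be checked roughly as below. The four classes (lowercase, uppercase, digit, other), the case-insensitive username test, and the message strings are assumptions; the commit message does not define them.

```python
from typing import List


def check_password(password: str, username: str) -> List[str]:
    """Return the list of policy violations; an empty list means the password passes."""
    problems: List[str] = []

    if len(password) < 12:
        problems.append("shorter than 12 characters")

    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),  # anything else counts as a symbol
    ]
    if sum(classes) < 3:
        problems.append("fewer than 3 of 4 character classes")

    if username and username.lower() in password.lower():
        problems.append("contains the username")

    return problems
```

Returning every violation at once, instead of failing on the first, lets a UI show the user the full list in a single round trip.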
hsiegeln
2df5e0d7ba feat: active config snapshot, composite StatusDot with tooltip
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 43s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Part 1 — Config snapshot:
- V8 migration adds resolved_config JSONB to deployments table
- DeploymentExecutor saves the full resolved config at deploy time
- Deployment record includes resolvedConfig for auditability

Part 2 — Composite health StatusDot:
- CatalogController computes composite health from deployment status +
  agent health (green only when RUNNING AND agent live)
- CatalogApp includes healthTooltip (e.g. "Deployment: RUNNING,
  Agents: live (1 connected)")
- StatusDot added to app detail header with deployment status Badge
- StatusDot added to deployment table rows
- Sidebar passes composite health + tooltip through to tree nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:00:54 +02:00