cameleer/cameleer-server

hsiegeln 9b1ef51d77

feat!: scope per-app config and settings by environment

BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-16 22:25:21 +02:00

Environment-scoped config — fixing cross-env data bleed

Date: 2026-04-16
Status: Not started
Backwards compatibility: None (pre-1.0; user will wipe dev DB)

Problem

Two PostgreSQL tables key per-app state on the application name alone, despite environments (dev/staging/prod) being first-class in the rest of the system:

application_config PK (application) — traced processors, taps, route recording, per-app sensitive keys. All env-sensitive.
app_settings PK (application_id) — SLA threshold, health warn/crit thresholds. All env-sensitive.

Consequences:

Config corruption: PUT /api/v1/config/{app}?environment=dev correctly scopes the SSE fan-out but overwrites the single DB row, so when prod agents restart and fetch config they get the dev config.
Agent startup is env-blind: GET /api/v1/config/{app} reads neither JWT env claim nor any request parameter; returns whichever row exists.
Dashboard settings ambiguous: AppSettings endpoints have no env parameter; SLA/health displayed without env context.
Ancillary: ClickHouseDiagramStore.findProcessorRouteMapping(appId) doesn't filter by env even though the table has an environment column.
Ancillary: AgentRegistrationController accepts registrations without environmentId and silently defaults to "default" — masks misconfigured agents.

Non-goals / working correctly (do not touch)

All ClickHouse observability tables (executions, logs, metrics, stats_1m_*) — already env-scoped.
AgentCommandController / SSE command fan-out — already env-filtered via AgentRegistryService.findByApplicationAndEnvironment.
SearchController search path — fixed in commit e2d9428.
RBAC (users/roles/groups/claim mappings) — tenant-wide by design.
Global sensitive-keys push to all envs (SensitiveKeysAdminController.fanOutToAllAgents) — by design; global baseline.
Admin UI per-page env indicator — not needed, already shown in top-right of the shell.

Design decisions (fixed)

| Question | Answer |
|---|---|
| Schema migration strategy | Edit `V1__init.sql` in place. User wipes dev DB. |
| Agent config fetch with no/unknown env | Return `404 Not Found`. No `"default"` fallback. |
| `cameleer-common` `ApplicationConfig` model | Add `environment` field in-place; agent team coordinates the bump (SNAPSHOT). |
| Agent registration without `environmentId` | Return `400 Bad Request`. Registration MUST include env. |
| UI per-screen env display | Already covered by top-right global env indicator; no extra UI work. |

Plan

Phase 1 — PostgreSQL schema

Edit cameleer-server-app/src/main/resources/db/migration/V1__init.sql:
- application_config: add environment TEXT NOT NULL column; change PK to (application, environment).
- app_settings: add environment TEXT NOT NULL column; change PK to (application_id, environment).
Commit message MUST call out: "Wipe dev DB before deploying — Flyway V1 checksum changes."

Phase 2 — Shared model (`cameleer-common`)

ApplicationConfig stays untouched on the server side. The agent team is adding environment to the common class separately; the server doesn't depend on it. On the server, environment flows as a sidecar parameter through repositories/controllers and as a dedicated environment column on application_config. The stored JSON body contains only the config content. If/when the field appears in the common class, we'll hydrate it from the DB column into the returned DTO — no code change needed today.
Add environment field to AppSettings record in cameleer-server-core (admin/AppSettings.java). Done.

Phase 3 — Repositories

PostgresApplicationConfigRepository:
- findByApplicationAndEnvironment(String app, String env) replaces findByApplication(app).
- findAll(String env) (env required) replaces findAll().
- save(String app, String env, ApplicationConfig, String updatedBy) replaces save(app, config, updatedBy).
- Keep behaviour identical except for the PK.
AppSettingsRepository interface (core) and PostgresAppSettingsRepository (app) — same treatment with (applicationId, environment).
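
The env-keyed repository contract above can be illustrated with a minimal in-memory sketch. It is not the PostgreSQL implementation: `AppEnvKey` and `InMemoryConfigRepository` are hypothetical names, the config is held as a plain JSON string instead of an `ApplicationConfig`, and the `updatedBy` argument is omitted. It only shows why a composite (application, environment) key stops a dev write from replacing the prod row.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical composite key mirroring the (application, environment) primary key.
record AppEnvKey(String application, String environment) {
    AppEnvKey {
        Objects.requireNonNull(application, "application");
        Objects.requireNonNull(environment, "environment");
    }
}

// In-memory stand-in for the env-keyed repository (assumption: config is a JSON string).
final class InMemoryConfigRepository {
    private final Map<AppEnvKey, String> rows = new ConcurrentHashMap<>();

    // Upsert the row for exactly one (app, env) pair; other environments are untouched.
    void save(String app, String env, String configJson) {
        rows.put(new AppEnvKey(app, env), configJson);
    }

    // Lookup by both key parts; an unknown env yields an empty result, never another env's row.
    Optional<String> findByApplicationAndEnvironment(String app, String env) {
        return Optional.ofNullable(rows.get(new AppEnvKey(app, env)));
    }

    // Env is required: there is deliberately no overload that lists every environment.
    List<String> findAll(String env) {
        Objects.requireNonNull(env, "environment");
        return rows.entrySet().stream()
                .filter(e -> e.getKey().environment().equals(env))
                .map(Map.Entry::getValue)
                .sorted()
                .toList();
    }
}
```

With this shape, saving a config for (X, dev) and then reading (X, prod) returns an empty `Optional`, which is the behaviour the Phase 9 isolation test is meant to assert against the real repositories.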

Phase 4 — REST controllers

ApplicationConfigController:
- getConfig(@PathVariable app): dual-mode by caller role. For AGENT role → env taken from JWT env claim, query param ignored (agents cannot spoof env). For non-agent callers (admin UI, with user JWTs whose env="default" is a placeholder) → env must be passed via ?environment= query param. If neither produces a value → 404.
- updateConfig(@PathVariable app, @RequestParam String environment, ...): make environment required. Forward to repo save. SSE push already env-scoped — keep.
- listConfigs(@RequestParam String environment): require env; filter.
- getProcessorRouteMapping(@PathVariable app, @RequestParam String environment): require env; forward to ClickHouse.
- testExpression(@PathVariable app, @RequestParam String environment, ...): make env required (already accepted as optional — tighten).
AppSettingsController:
- GET /api/v1/admin/app-settings?environment=: list filtered.
- GET /api/v1/admin/app-settings/{appId}?environment=: require env.
- PUT /api/v1/admin/app-settings/{appId}?environment=: require env.
SensitiveKeysAdminController: review — global sensitive keys are server-wide (one row in server_config), no change needed. Add code comment clarifying env-wide push is intentional.
SearchController.stats: the SLA threshold lookup appSettingsRepository.findByApplicationId(app) becomes env-aware via the existing environment query param.
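
The dual-mode rule for `getConfig` described above can be sketched as a small pure function. This is an illustration under assumptions, not the controller code: `ConfigEnvResolver` and its parameter names are hypothetical, and treating a blank value the same as a missing one is an assumption the plan does not state.

```java
import java.util.Optional;

// Hypothetical helper: decides which environment a getConfig call is scoped to.
final class ConfigEnvResolver {
    private ConfigEnvResolver() {}

    // Returns the environment to query, or empty when the caller should receive 404.
    static Optional<String> resolve(boolean agentCaller, String jwtEnvClaim, String queryParamEnv) {
        // Agents: only the JWT claim counts, so a spoofed ?environment= is ignored.
        // Non-agent callers: only the query parameter counts, because the
        // env="default" value in user JWTs is a placeholder.
        String candidate = agentCaller ? jwtEnvClaim : queryParamEnv;
        return Optional.ofNullable(candidate)
                .map(String::trim)
                .filter(s -> !s.isEmpty()); // assumption: blank is treated as missing
    }
}
```

A controller could call `resolve(...)` once and map an empty result to `404 Not Found` before touching the repository, which keeps the role check and the lookup separate.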

Phase 5 — Storage

ClickHouseDiagramStore.findProcessorRouteMapping(app) → findProcessorRouteMapping(app, env). Include environment = ? in WHERE.

Phase 6 — JWT surface

Expose env claim via Spring Authentication principal — simplest path is a small custom AuthenticationPrincipal or @RequestAttribute("env") populated by JwtAuthenticationFilter. Keep scope minimal; only ApplicationConfigController.getConfig needs it directly for the 404 rule.

Phase 7 — Agent registration hardening

AgentRegistrationController.register:
- If request.environmentId() is null or blank → 400 Bad Request with an explicit error message. Drop the "default" fallback on line 122.
- Log the rejection (agent identity + remote IP) at INFO for diagnostics.
AgentRegistrationController.refreshToken: remove the "default" fallback at line 211 (dead after Phase 7.13, but harmless to clean up).
AgentRegistrationController.heartbeat: already falls back to JWT claim; after Phase 7.13 every JWT has a real env, so the "default" fallback at line 247 is dead code — remove.
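
As a rough sketch of the register check above, the validation reduces to a null-or-blank test with no fallback. `RegistrationRequest`, `RegistrationResult`, and `RegistrationValidator` are hypothetical stand-ins for the real request and response types, the error text is invented for illustration, and `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.logging.Logger;

// Hypothetical request shape; the real payload carries more fields.
record RegistrationRequest(String agentId, String environmentId) {}

// Hypothetical result: HTTP status plus a human-readable message.
record RegistrationResult(int status, String message) {}

final class RegistrationValidator {
    private static final Logger LOG = Logger.getLogger(RegistrationValidator.class.getName());

    private RegistrationValidator() {}

    static RegistrationResult validate(RegistrationRequest request, String remoteIp) {
        String env = request.environmentId();
        if (env == null || env.isBlank()) {
            // Log the rejection at INFO with agent identity and remote IP for diagnostics.
            LOG.info(String.format("Rejected registration: agent=%s ip=%s reason=missing environmentId",
                    request.agentId(), remoteIp));
            // No "default" fallback: a missing environment is a client error.
            return new RegistrationResult(400, "environmentId is required");
        }
        return new RegistrationResult(200, "registered");
    }
}
```

Rejecting early means every token issued afterwards carries a real environment, which is what makes the later `"default"` fallbacks in refresh and heartbeat dead code.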

Phase 8 — UI queries

ui/src/api/queries/dashboard.ts: useAppSettings(appId) → useAppSettings(appId, environment); same for useAllAppSettings(). Pull env from useEnvironmentStore.
ui/src/api/queries/commands.ts: verify useApplicationConfig(appId) / useUpdateApplicationConfig already pass env. Add if missing. (Audit pass only, may be no-op.)
Verify no other UI hook fetches per-app state without env.

Phase 9 — Tests

Integration: write config for (app=X, env=dev); read for (app=X, env=prod) returns empty/default.
Integration: agent JWT with env=dev calling GET /api/v1/config/X returns the dev config row. JWT with no env claim → 404.
Integration: POST /api/v1/agents/register with no environmentId → 400.
Unit: AppSettingsRepository env-isolation test.

Phase 10 — Documentation

CLAUDE.md:
- "Storage" section: update application_config and app_settings PK description.
- Agent lifecycle section: note that registration requires environmentId (was optional, defaulted to "default").
- Remove the "priority: heartbeat environmentId > JWT env claim > "default"" note — after fix, every agent has a real env on every path.
.claude/rules/app-classes.md:
- ApplicationConfigController — reflect env-required endpoints.
- AppSettingsController — reflect env-required endpoints.
- AgentRegistrationController — note env required.
.claude/rules/core-classes.md:
- PostgresApplicationConfigRepository, PostgresAppSettingsRepository — updated signatures.

Execution order

Phases are mostly sequential by dependency: 1 → 2 → 3 → (4, 5 in parallel) → 6 → 7 → 8 → 9 → 10. Although Phase 6 (JWT surfacing) is listed after Phase 4, it is a small prerequisite for the Phase 4 controller changes, so implement the two together.

Verification

mvn clean verify passes.
detect_changes scope matches the files touched per phase.
Manual: spin up two envs (dev + prod) locally; configure tap in dev; confirm prod agent doesn't receive it and its DB row is untouched.
Manual: start an agent without env in the registration payload; confirm server returns 400.

Out of scope / follow-ups

audit_log has no environment column; filtering audit by env would be nice-to-have but not a correctness issue. Defer.
Agent bootstrap-token scoping to env (so a dev token can't register as prod) — security hardening for after 1.0.
