BREAKING: wipe dev PostgreSQL before deploying; the V1 checksum changes. Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment data bleed: writing config for (app=X, env=dev) would overwrite the row used by (app=X, env=prod) agents, and agent startup fetches ignored env entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column, stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode: AGENT role uses JWT env claim (agents cannot spoof env); non-agent callers provide env via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register; removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings, useAllAppSettings, useUpdateAppSettings) take env, wired to useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Environment-scoped config — fixing cross-env data bleed
Date: 2026-04-16
Status: Not started
Backwards compatibility: None (pre-1.0; user will wipe dev DB)
Problem
Two PostgreSQL tables key per-app state on the application name alone, despite environments (dev/staging/prod) being first-class in the rest of the system:
- `application_config` PK `(application)`: traced processors, taps, route recording, per-app sensitive keys. All env-sensitive.
- `app_settings` PK `(application_id)`: SLA threshold, health warn/crit thresholds. All env-sensitive.
Consequences:
- Config corruption: `PUT /api/v1/config/{app}?environment=dev` correctly scopes the SSE fan-out but overwrites the single DB row, so when `prod` agents restart and fetch config they get the `dev` config.
- Agent startup is env-blind: `GET /api/v1/config/{app}` reads neither the JWT `env` claim nor any request parameter; it returns whichever row exists.
- Dashboard settings ambiguous: `AppSettings` endpoints have no env parameter; SLA/health are displayed without env context.
- Ancillary: `ClickHouseDiagramStore.findProcessorRouteMapping(appId)` doesn't filter by env even though the table has an `environment` column.
- Ancillary: `AgentRegistrationController` accepts registrations without `environmentId` and silently defaults to `"default"`, which masks misconfigured agents.
Non-goals / working correctly (do not touch)
- All ClickHouse observability tables (executions, logs, metrics, `stats_1m_*`): already env-scoped.
- `AgentCommandController` / SSE command fan-out: already env-filtered via `AgentRegistryService.findByApplicationAndEnvironment`.
- `SearchController` search path: fixed in commit `e2d9428`.
- RBAC (users/roles/groups/claim mappings): tenant-wide by design.
- Global sensitive-keys push to all envs (`SensitiveKeysAdminController.fanOutToAllAgents`): by design; global baseline.
- Admin UI per-page env indicator: not needed, already shown in top-right of the shell.
Design decisions (fixed)
| Question | Answer |
|---|---|
| Schema migration strategy | Edit V1__init.sql in place. User wipes dev DB. |
| Agent config fetch with no/unknown env | Return 404 Not Found. No "default" fallback. |
| `cameleer-common` `ApplicationConfig` model | Add `environment` field in-place; agent team coordinates the bump (SNAPSHOT). |
| Agent registration without `environmentId` | Return 400 Bad Request. Registration MUST include env. |
| UI per-screen env display | Already covered by top-right global env indicator — no extra UI work. |
Plan
Phase 1 — PostgreSQL schema
- Edit `cameleer-server-app/src/main/resources/db/migration/V1__init.sql`:
  - `application_config`: add `environment TEXT NOT NULL` column; change PK to `(application, environment)`.
  - `app_settings`: add `environment TEXT NOT NULL` column; change PK to `(application_id, environment)`.
- Commit message MUST call out: "Wipe dev DB before deploying — Flyway V1 checksum changes."
Phase 2 — Shared model (cameleer-common)
- `ApplicationConfig` stays untouched on the server side. The agent team is adding `environment` to the common class separately; the server doesn't depend on it. On the server, `environment` flows as a sidecar parameter through repositories/controllers and as a dedicated `environment` column on `application_config`. The stored JSON body contains only the config content. If/when the field appears in the common class, we'll hydrate it from the DB column into the returned DTO; no code change is needed today.
- Add `environment` field to `AppSettings` record in `cameleer-server-core` (`admin/AppSettings.java`). Done.
Phase 3 — Repositories
- `PostgresApplicationConfigRepository`:
  - `findByApplicationAndEnvironment(String app, String env)` replaces `findByApplication(app)`.
  - `findAll(String env)` (env required) replaces `findAll()`.
  - `save(String app, String env, ApplicationConfig, String updatedBy)` replaces `save(app, config, updatedBy)`.
  - Keep behaviour identical except for the PK.
- `AppSettingsRepository` interface (core) and `PostgresAppSettingsRepository` (app): same treatment with `(applicationId, environment)`.
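
The env-keyed repository contract can be sketched with an in-memory stand-in. This is only an illustration under stated assumptions: `ConfigKey` and `InMemoryConfigRepository` are hypothetical names, the config is reduced to a JSON string, and the `updatedBy` argument is left out. The real repositories talk to PostgreSQL.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical composite key mirroring the new PK (application, environment).
record ConfigKey(String application, String environment) {}

// In-memory stand-in for an env-keyed repository (assumption: not the real class).
class InMemoryConfigRepository {
    private final Map<ConfigKey, String> rows = new HashMap<>();

    // Upserts the config JSON for exactly one (app, env) slice.
    void save(String app, String env, String configJson) {
        rows.put(new ConfigKey(app, env), configJson);
    }

    // Looks up a single slice; other environments of the same app are invisible.
    Optional<String> findByApplicationAndEnvironment(String app, String env) {
        return Optional.ofNullable(rows.get(new ConfigKey(app, env)));
    }

    // Env is required: only rows belonging to that environment are returned.
    List<String> findAll(String env) {
        return rows.entrySet().stream()
                .filter(e -> e.getKey().environment().equals(env))
                .map(Map.Entry::getValue)
                .toList();
    }
}
```

With the composite key, saving for `("X", "dev")` leaves a lookup for `("X", "prod")` empty, which is the isolation property the Phase 9 integration tests are meant to check against the real tables.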
Phase 4 — REST controllers
- `ApplicationConfigController`:
  - `getConfig(@PathVariable app)`: dual-mode by caller role. For AGENT role → env taken from JWT `env` claim, query param ignored (agents cannot spoof env). For non-agent callers (admin UI, with user JWTs whose `env="default"` is a placeholder) → env must be passed via `?environment=` query param. If neither produces a value → 404.
  - `updateConfig(@PathVariable app, @RequestParam String environment, ...)`: make `environment` required. Forward to repo save. SSE push is already env-scoped; keep.
  - `listConfigs(@RequestParam String environment)`: require env; filter.
  - `getProcessorRouteMapping(@PathVariable app, @RequestParam String environment)`: require env; forward to ClickHouse.
  - `testExpression(@PathVariable app, @RequestParam String environment, ...)`: make env required (already accepted as optional; tighten).
- `AppSettingsController`:
  - `GET /api/v1/admin/app-settings?environment=`: list filtered.
  - `GET /api/v1/admin/app-settings/{appId}?environment=`: require env.
  - `PUT /api/v1/admin/app-settings/{appId}?environment=`: require env.
- `SensitiveKeysAdminController`: review. Global sensitive keys are server-wide (one row in `server_config`), so no change is needed. Add a code comment clarifying that the env-wide push is intentional.
- `SearchController.stats`: the SLA threshold lookup `appSettingsRepository.findByApplicationId(app)` becomes env-aware via the existing `environment` query param.
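
The dual-mode rule for `getConfig` boils down to choosing one source of truth per caller type. A minimal sketch, assuming a hypothetical `EnvResolver` helper (not a class from the codebase) that receives the caller's role, the JWT `env` claim, and the `?environment=` value as plain arguments:

```java
import java.util.Optional;

// Hypothetical helper; the name and signature are illustrative assumptions.
final class EnvResolver {
    private EnvResolver() {}

    /**
     * Picks the environment for a config fetch.
     *
     * @param agentCaller true when the caller holds the AGENT role
     * @param jwtEnv      value of the JWT env claim, may be null
     * @param queryEnv    value of the ?environment= query param, may be null
     * @return the resolved env, or empty when the controller should answer 404
     */
    static Optional<String> resolve(boolean agentCaller, String jwtEnv, String queryEnv) {
        // Agents: only the signed claim counts, so the query param cannot spoof env.
        // Non-agents: the JWT env is a placeholder, so only the query param counts.
        String candidate = agentCaller ? jwtEnv : queryEnv;
        if (candidate == null || candidate.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(candidate);
    }
}
```

For example, `resolve(true, "dev", "prod")` yields `dev` because the agent's query param is ignored, while `resolve(false, "default", "prod")` yields `prod` because the placeholder claim on a user JWT is never consulted.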
Phase 5 — Storage
- `ClickHouseDiagramStore.findProcessorRouteMapping(app)` → `findProcessorRouteMapping(app, env)`. Include `environment = ?` in `WHERE`.
Phase 6 — JWT surface
- Expose `env` claim via Spring `Authentication` principal. The simplest path is a small custom `AuthenticationPrincipal` or `@RequestAttribute("env")` populated by `JwtAuthenticationFilter`. Keep scope minimal; only `ApplicationConfigController.getConfig` needs it directly for the 404 rule.
Phase 7 — Agent registration hardening
- `AgentRegistrationController.register`:
  - If `request.environmentId()` is null or blank → `400 Bad Request` with an explicit error message. Drop the `"default"` fallback on line 122.
  - Log the rejection (agent identity + remote IP) at INFO for diagnostics.
- `AgentRegistrationController.refreshToken`: remove the `"default"` fallback at line 211 (dead after Phase 7.13, but harmless to clean up).
- `AgentRegistrationController.heartbeat`: already falls back to JWT claim; after Phase 7.13 every JWT has a real env, so the `"default"` fallback at line 247 is dead code. Remove it.
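
The registration check can be sketched without any framework code. The record and class names below (`RegistrationRequest`, `RegistrationResult`, `RegistrationValidator`) and the error message text are assumptions for illustration; the real controller would map the result to an HTTP response and log the rejection.

```java
// Hypothetical request/response shapes, reduced to the fields the rule needs.
record RegistrationRequest(String agentId, String environmentId) {}

record RegistrationResult(int status, String message) {}

final class RegistrationValidator {
    private RegistrationValidator() {}

    // Rejects a missing or blank environmentId instead of falling back to "default".
    static RegistrationResult validate(RegistrationRequest request) {
        String env = request.environmentId();
        if (env == null || env.isBlank()) {
            return new RegistrationResult(400, "environmentId is required");
        }
        return new RegistrationResult(200, "registered in " + env);
    }
}
```

Rejecting up front means every token issued afterwards carries a real env, which is why the later `"default"` fallbacks in refresh and heartbeat become dead code.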
Phase 8 — UI queries
- `ui/src/api/queries/dashboard.ts`: `useAppSettings(appId)` → `useAppSettings(appId, environment)`; same for `useAllAppSettings()`. Pull env from `useEnvironmentStore`.
- `ui/src/api/queries/commands.ts`: verify `useApplicationConfig(appId)` / `useUpdateApplicationConfig` already pass env. Add if missing. (Audit pass only, may be no-op.)
- Verify no other UI hook fetches per-app state without env.
Phase 9 — Tests
- Integration: write config for `(app=X, env=dev)`; read for `(app=X, env=prod)` returns empty/default.
- Integration: agent JWT with `env=dev` calling `GET /api/v1/config/X` returns the dev config row. JWT with no env claim → 404.
- Integration: `POST /api/v1/agents/register` with no `environmentId` → 400.
- Unit: `AppSettingsRepository` env-isolation test.
Phase 10 — Documentation
- `CLAUDE.md`:
  - "Storage" section: update `application_config` and `app_settings` PK description.
  - Agent lifecycle section: note that registration requires `environmentId` (was optional, defaulted to `"default"`).
  - Remove the "priority: heartbeat `environmentId` > JWT `env` claim > `"default"`" note; after the fix, every agent has a real env on every path.
- `.claude/rules/app-classes.md`:
  - `ApplicationConfigController`: reflect env-required endpoints.
  - `AppSettingsController`: reflect env-required endpoints.
  - `AgentRegistrationController`: note env required.
- `.claude/rules/core-classes.md`:
  - `PostgresApplicationConfigRepository`, `PostgresAppSettingsRepository`: updated signatures.
Execution order
Phases are mostly sequential by dependency: 1 → 2 → 3 → (4, 5 in parallel) → 6 → 7 → 8 → 9 → 10. Phase 6 (JWT surfacing) is a small dependency for Phase 4 controller changes; do them together.
Verification
- `mvn clean verify` passes.
- `detect_changes` scope matches the files touched per phase.
- Manual: spin up two envs (`dev` + `prod`) locally; configure a tap in `dev`; confirm the `prod` agent doesn't receive it and its DB row is untouched.
- Manual: start an agent without env in the registration payload; confirm the server returns 400.
Out of scope / follow-ups
- `audit_log` has no `environment` column; filtering audit by env would be nice-to-have but not a correctness issue. Defer.
- Agent bootstrap-token scoping to env (so a dev token can't register as prod): security hardening for after 1.0.