Commit Graph

23 Commits

Author SHA1 Message Date
hsiegeln
59e76bdfb6 feat(alerting): V12 flyway migration for alerting tables 2026-04-19 18:28:09 +02:00
hsiegeln
cacedd3f16 fix(outbound): null-guard TRUST_PATHS check; add RBAC test for probe endpoint
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 3m5s
CI / build (pull_request) Successful in 2m13s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / docker (push) Successful in 4m48s
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 32s
- OutboundConnectionRequest compact ctor: avoid NPE if tlsTrustMode is null
  (defense-in-depth alongside @NotNull Bean Validation).
- Add operatorCannotTest IT case to lock the ADMIN-only contract on
  POST /{id}/test — was previously untested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:19:37 +02:00
hsiegeln
7358555d56 test(outbound): add @AfterEach cleanup to avoid leaking user/connection rows
Shared Spring test context meant seeded test-admin/test-operator/test-viewer/test-alice
users persisted across IT classes, breaking FlywayMigrationIT's "users is empty" assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:10:25 +02:00
hsiegeln
87b8a71205 feat(outbound): admin test action for reachability + TLS summary
POST /{id}/test issues a synthetic probe against the connection URL.
TLS protocol/cipher/peer-cert details stubbed for now (Plan 02 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:47:36 +02:00
hsiegeln
ea4c56e7f6 feat(outbound): admin CRUD REST + RBAC + audit
New audit categories: OUTBOUND_CONNECTION_CHANGE, OUTBOUND_HTTP_TRUST_CHANGE.
Controller-level @PreAuthorize defaults to ADMIN; GETs relaxed to ADMIN|OPERATOR.
SecurityConfig permits OPERATOR GETs on /api/v1/admin/outbound-connections/**.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:43:48 +02:00
hsiegeln
a3c35c7df9 feat(outbound): request + response + test-result DTOs with Bean Validation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:37:00 +02:00
hsiegeln
642c040116 feat(outbound): Postgres repository for outbound_connections
- PostgresOutboundConnectionRepository: JdbcTemplate impl of
  OutboundConnectionRepository; UUID arrays via ConnectionCallback,
  JSONB for headers/auth/ca-paths, enum casts for method/trust/auth-kind
- OutboundBeanConfig: wires the repo + SecretCipher beans
- PostgresOutboundConnectionRepositoryIT: 5 Testcontainers tests
  (save+read, unique-name, allowed-env-ids round-trip, tenant isolation,
  delete); validates V11 Flyway migration end-to-end
- application-test.yml: add jwtsecret default so SecretCipher bean
  starts up in the Spring test context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 16:23:51 +02:00
hsiegeln
b8565af039 feat(outbound): SecretCipher - AES-GCM with JWT-derived key for at-rest secret encryption
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:13:57 +02:00
hsiegeln
0c9d12d8e0 test(http): tighten SSL-failure assertion + null-guard WireMock teardown
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:08:43 +02:00
hsiegeln
000e9d2847 feat(http): ApacheOutboundHttpClientFactory with memoization and startup validation
Adds ApacheOutboundHttpClientFactory (Apache HttpClient 5) that memoizes
CloseableHttpClient instances keyed on effective TLS + timeout config, and
OutboundHttpConfig (@ConfigurationProperties) that validates trusted CA paths
at startup and exposes OutboundHttpClientFactory as a Spring bean.

TRUST_ALL mode disables both cert validation (TrustAllManager in SslContextBuilder)
and hostname verification (NoopHostnameVerifier on SSLConnectionSocketFactoryBuilder).
WireMock HTTPS integration test covers trust-all bypass, system-default PKIX rejection,
and client memoization.

OIDC audit: OidcProviderHelper and OidcTokenExchanger use Nimbus SDK's own HTTP layer
(DefaultResourceRetriever for JWKS, HTTPRequest.send() for token exchange) plus the
bespoke InsecureTlsHelper for TLS skip-verify; neither uses OutboundHttpClientFactory.
Retrofit deferred to a separate follow-up per plan §20.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:03:56 +02:00
hsiegeln
4922748599 refactor(http): tighten SslContextBuilder throws clause, classpath test fixture, system trust-all test
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:59:06 +02:00
hsiegeln
262ee91684 feat(http): SslContextBuilder supports system/trust-all/trust-paths modes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:54:15 +02:00
hsiegeln
89c9b53edd fix(pagination): add insert_id UUID tiebreak to cursor keyset
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m26s
CI / docker (push) Successful in 1m12s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Same-millisecond rows were silently skipped between pages because the
log cursor had no tiebreak and the events cursor tied by instance_id
(which also collides when one instance emits multiple events within a
millisecond). Add an insert_id UUID (DEFAULT generateUUIDv4()) column
to both logs and agent_events, order by (timestamp, insert_id)
consistently, and encode the cursor as 'timestamp|insert_id'. Existing
data is materialized via ALTER TABLE MATERIALIZE COLUMN (one-time
background mutation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 14:25:36 +02:00
hsiegeln
6d3956935d refactor(events): remove dead non-paginated query path
AgentEventService.queryEvents, AgentEventRepository.query, and the
ClickHouse implementation have had no callers since /agents/events
became cursor-paginated. Remove them along with their dedicated IT
tests. queryPage and its tests remain as the single query path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 13:16:28 +02:00
hsiegeln
0194549f25 fix(events): reject malformed pagination cursors as 400 errors
Wraps DateTimeParseException from Instant.parse in IllegalArgumentException
so the controller maps it to 400. Also rejects cursors with empty
instance_id (trailing '|') which would otherwise produce a vacuous
keyset predicate.
2026-04-17 12:02:40 +02:00
hsiegeln
d293dafb99 feat(events): cursor-paginate agent events (ClickHouse impl)
Orders by (timestamp DESC, instance_id ASC). Cursor is
base64url('timestampIso|instanceId') with a tuple keyset predicate
for stable paging across ties.
2026-04-17 11:57:35 +02:00
hsiegeln
769752a327 feat(logs): widen source filter to multi-value OR list
Replaces LogSearchRequest.source (String) with sources (List<String>)
and emits 'source IN (...)' when non-empty. LogQueryController parses
?source=a,b,c the same way it parses ?level=a,b,c.
2026-04-17 11:48:10 +02:00
hsiegeln
62dd71b860 fix: stamp environment on agent_events rows
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 1m13s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
The agent_events table has an `environment` column and AgentEventsController
filters on it, but the INSERT never populated it — every row got the
column default ('default'). Result: Timeline on the Application Runtime
page was empty whenever the user's selected env was anything other than
'default'.

Thread env through the write path:
- AgentEventRepository.insert + AgentEventService.recordEvent gain an
  `environment` param; delete the no-env query overload (unused).
- ClickHouseAgentEventRepository.insert writes the column (falls back to
  'default' on null to match column DEFAULT).
- All 5 callers source env from the agent registry (AgentInfo.environmentId)
  or the registration request body; AgentLifecycleMonitor, deregister,
  command ack, event ingestion, register/re-register.
- Integration test updated for the new signatures.

Pre-existing rows in deployed CH will still report environment='default'.
New events from this build forward will carry the correct env. Backfill
(UPDATE ... FROM apps) is left as a manual DB step if historical timeline
is needed for non-default envs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:30:56 +02:00
hsiegeln
b7a107d33f test: update integration tests for env-scoped URL shape
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m49s
CI / docker (push) Successful in 2m5s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m37s
Picks up the URL moves from P2/P3A/P3B/P3C. Also fixes a latent bug in
AppControllerIT.uploadJar_asOperator_returns201 / DeploymentControllerIT
setUp: the tests were passing the app's UUID as the {appSlug} path
variable (via `path("id").asText()`); the old AppController looked up
apps via getBySlug(), so the legacy URL call would 404 when the slug
literal was a UUID. Now the test tracks the known slug string and uses
it for every /apps/{appSlug}/... path.

Test URL updates:
- SearchControllerIT: /api/v1/search/executions →
  /api/v1/environments/default/executions (GET) and
  /api/v1/environments/default/executions/search (POST).
- AppControllerIT: /api/v1/apps → /api/v1/environments/default/apps.
  Request bodies drop environmentId (it's in the path).
- DeploymentControllerIT: /api/v1/apps/{appId}/deployments →
  /api/v1/environments/default/apps/{appSlug}/deployments. DeployRequest
  body drops environmentId.
- JwtRefreshIT + RegistrationSecurityIT: smoke-test protected endpoint
  call updated to the new /environments/default/executions shape.

All tests compile clean. Runtime behavior requires a full stack
(Postgres + ClickHouse + Docker); validating integration tests is a
pre-merge step before merging the feature branch.

Remaining pre-merge items (not blocked by code):
1. Regenerate ui/src/api/schema.d.ts + openapi.json by running
   `cd ui && npm run generate-api:live` against a running backend.
   SearchController, DeploymentController, etc. DTO signatures have
   changed; schema.d.ts is frozen at the pre-migration shape.
   Raw-fetch call sites introduced in P3A/P3C work at runtime without
   the schema; the regen only sharpens TypeScript coverage.
2. Smoke test locally: boot server, verify EnvironmentsPage,
   AppsTab, Exchanges, Dashboard, Runtime pages all function.
3. Run `mvn verify` end-to-end (Testcontainers + Docker required).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:53:55 +02:00
hsiegeln
6b5ee10944 feat!: environment admin URLs use slug; validate and immutabilize slug
UUID-based admin paths were the only remaining UUID-in-URL pattern in
the API. Migrates /api/v1/admin/environments/{id} to /{envSlug} so
slugs are the single environment identifier in every URL. UUIDs stay
internal to the database.

- Controller: @PathVariable UUID id → @PathVariable String envSlug on
  get/update/delete and the two nested endpoints (default-container-
  config, jar-retention). Handlers resolve slug → Environment via
  EnvironmentService.getBySlug, then delegate to existing UUID-based
  service methods.
- Service: create() now validates slug against ^[a-z0-9][a-z0-9-]{0,63}$
  and returns 400 on invalid slugs. Rationale documented in the class:
  slugs are immutable after creation because they appear in URLs,
  Docker network names, container names, and ClickHouse partition keys.
- UpdateEnvironmentRequest has no slug field and Jackson's default
  ignore-unknown behavior drops any slug supplied in a PUT body;
  regression test (updateEnvironment_withSlugInBody_ignoresSlug)
  documents this invariant.
- SPA: mutation args change from { id } to { slug }. EnvironmentsPage
  still uses env.id for local selection state (UUID from DB) but
  passes env.slug to every mutation.

BREAKING CHANGE: /api/v1/admin/environments/{id:UUID}/... paths removed.
Clients must use /{envSlug}/... (slug from the environments list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:23:31 +02:00
hsiegeln
9b1ef51d77 feat!: scope per-app config and settings by environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m40s
SonarQube / sonarqube (push) Successful in 4m29s
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
hsiegeln
e2d9428dff fix: drop stale instance_id filter from search and scope route stats by app
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m28s
CI / docker (push) Successful in 1m11s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 42s
The exchange search silently filtered by the in-memory agent registry's
current instance IDs on top of application_id. Historical exchanges written
by previous agent instances (or any instance not currently registered, e.g.
after a server restart before agents heartbeat back) were hidden from
results even though they matched the application filter.

Fix: drop the applicationId -> instanceIds resolution in SearchController.
Rely on application_id = ? in ClickHouseSearchIndex; keep explicit
instanceIds filtering only when a client passes them.

Related cleanup: the agentIds parameter on StatsStore.statsForRoute /
timeseriesForRoute was silently discarded inside ClickHouseStatsStore, so
per-route stats aggregated across any apps sharing a routeId. Replace with
String applicationId and add application_id to the stats_1m_route filters
so per-route stats are correctly scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:49:55 +02:00
hsiegeln
cb3ebfea7c chore: rename cameleer3 to cameleer
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 18s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:28:42 +02:00