Root cause of the mismatch that prompted the one-shot cameleer-seed
docker service: UiAuthController stored users.user_id as the JWT
subject "user:admin" (JWT sub format). Every env-scoped controller
(Alert, AlertSilence, AlertRule, OutboundConnectionAdmin) already
strips the "user:" prefix on the read path — so the rest of the
system expects the DB key to be the bare username. With UiAuth
storing prefixed, fresh docker stacks hit
"alert_rules_created_by_fkey violation" on the first rule create.
Fix: inside login(), compute `userId = request.username()` and use
it everywhere the DB/RBAC layer is touched (isLocked, getPasswordHash,
record/clearFailedLogins, upsert, assignRoleToUser, addUserToGroup,
getSystemRoleNames). Keep `subject = "user:" + userId` — we still
sign JWTs with the namespaced subject so JwtAuthenticationFilter can
distinguish user vs agent tokens.
refresh() and me() follow the same rule via a stripSubjectPrefix()
helper (JWT subject in, bare DB key out).
With the write path aligned, the docker bridge is no longer needed:
- Deleted deploy/docker/postgres-init.sql
- Deleted cameleer-seed service from docker-compose.yml
Scope: UiAuthController only. UserAdminController + OidcAuthController
still prefix on upsert — that's the bug class the triage identified
as "Option A or B either way OK". Not changing them now because:
a) prod admins are provisioned unprefixed through some other path,
so those two files aren't the docker-only failure observed;
b) stripping them would need a data migration for any existing
prod users stored prefixed, which is out of scope for a cleanup
phase. Follow-up worth scheduling if we ever wire OIDC or admin-
created users into alerting FKs.
Verified: 33/33 alerting+outbound controller ITs pass (9 outbound,
10 rules, 9 silences, 5 alert inbox).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alerting + outbound controllers resolve acting user via
authentication.name with 'user:' prefix stripped → 'admin'. But
UserRepository.upsert stores env-admin as 'user:admin' (JWT sub format).
The resulting FK mismatch manifests as 500 'alert_rules_created_by_fkey'
on any create operation in a fresh docker stack.
Workaround: run-once 'cameleer-seed' compose service runs psql against
deploy/docker/postgres-init.sql after the server is healthy (i.e. after
Flyway migrations have created tenant_default.users), inserting
user_id='admin' idempotently. The root-cause fix belongs in the backend
(either stop stripping the prefix in alerting/outbound controllers, or
normalise storage to the unprefixed form) and is out of scope for
Plan 03.
Add cameleer-clickhouse-external Service (NodePort 30123) matching the
pattern used by cameleer-postgres-external (NodePort 30432).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DatabaseAdminController's active-queries and kill-query endpoints could
expose SQL text from other tenants sharing the same PostgreSQL instance.
Added ApplicationName=tenant_{id} to the JDBC URL and filter
pg_stat_activity by application_name so each tenant only sees its own
connections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.
Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.
Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
(e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The standard nginx image requires root to modify /etc/nginx/conf.d
and create /var/cache/nginx directories during startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract OidcProviderHelper for shared discovery + JWK source construction
- Add SystemRole.normalizeScope() to centralize role normalization
- Merge duplicate claim extraction in OidcTokenExchanger
- Add PKCE (S256) to OIDC authorization flow (frontend + backend)
- Add SecurityContext (runAsNonRoot) to all K8s deployments
- Fix postgres probe to use $POSTGRES_USER instead of hardcoded username
- Remove default credentials from Dockerfile
- Extract sanitize_branch() to shared .gitea/sanitize-branch.sh
- Fix sidebar to use /exchanges/ paths directly, remove legacy redirects
- Centralize basePath computation in router.tsx via config module
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pg_isready without -U defaults to OS user "root" which doesn't exist
as a PostgreSQL role, causing noisy log entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
K8s $(VAR) substitution only resolves env vars defined earlier in the
list. PG_USER and PG_PASSWORD must come before DB_URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LOGTO_ENDPOINT and LOGTO_ADMIN_ENDPOINT are public-facing URLs that
Logto uses for OIDC discovery, issuer URI, and redirects. When behind
a reverse proxy (e.g., Traefik), set these to the external URLs.
Logto requires its own subdomain (not a path prefix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.
Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
"default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
updated ORDER BY (tenant→time→env→app), added tenant_id to
usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures
Closes#123
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler
Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.
Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.
Also reverts async_insert config (doesn't work with JDBC inline VALUES).
Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m CPU usage at 3 tx/s.
async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.
- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
(docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.
Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set CLICKHOUSE_USER/PASSWORD via k8s secret (fixes "disabling network
access for user 'default'" when no password is set)
- Add clickhouse-credentials secret to CI deploy + feature branch copy
- Pass CLICKHOUSE_USERNAME/PASSWORD env vars to server pod
- Make schema initializer non-fatal so server starts even if CH is
temporarily unavailable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ClickHouse only has the 'default' database out of the box. The JDBC URL
connects to 'cameleer', so the database must exist before the server starts.
Uses /docker-entrypoint-initdb.d/ init script via ConfigMap.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Browser requests now go to the UI origin and nginx proxies them to the
backend within the cluster. Removes the separate API Ingress host rule
since API traffic no longer needs its own subdomain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When spring.flyway.url is set independently, Spring Boot does not
inherit credentials from spring.datasource. Add explicit user/password
to both application.yml and K8s deployment to prevent "no password"
failures on feature branch deployments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flyway needs public in the search_path to access TimescaleDB extension
functions (create_hypertable). The app datasource must NOT include public
to prevent accidental cross-schema reads from production data.
- spring.flyway.url: currentSchema=<branch>,public (extensions accessible)
- spring.datasource.url: currentSchema=<branch> (strict isolation)
- SPRING_FLYWAY_URL env var added to K8s base manifest
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OIDC configuration should be managed by the server itself (database-backed),
not injected via K8s secrets. Remove all CAMELEER_OIDC_* env vars from
deployment manifests and the cameleer-oidc secret from CI. The server
defaults to OIDC disabled via application.yml.
This also fixes the Kustomize strategic merge conflict where the feature
overlay tried to set value on an env var that had valueFrom in the base.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deploy/server.yaml and deploy/ui.yaml are no longer referenced by CI,
which now uses deploy/base/ + deploy/overlays/main/ via kubectl apply -k.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable deploying feature branches into isolated environments on the same
k3s cluster. Each branch gets its own namespace (cam-<slug>), PostgreSQL
schema, and OpenSearch index prefix for data isolation while sharing the
underlying infrastructure.
- Make OpenSearch index prefix and DB schema configurable via env vars
(defaults preserve existing behavior)
- Restructure deploy/ into Kustomize base + overlays (main/feature)
- Extend CI to build Docker images for all branches, not just main
- Add deploy-feature job with namespace creation, secret copying,
Traefik Ingress routing (<slug>-api/ui.cameleer.siegeln.net)
- Add cleanup-branch job to remove namespace, PG schema, OS indices
on branch deletion
- Install required tools (git, jq, curl) in CI deploy containers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- deploy/authentik.yaml: PostgreSQL StatefulSet, Redis, Authentik
server (NodePort 30900) and worker, all in cameleer namespace
- deploy/server.yaml: Add CAMELEER_JWT_SECRET and CAMELEER_OIDC_*
env vars from secrets (all optional for backward compat)
- ci.yml: Create authentik-credentials and cameleer-oidc secrets,
deploy Authentik before the server
- HOWTO.md: Authentik setup instructions, updated architecture
diagram and Gitea secrets list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Execution rows are wide (29 cols with serialized arrays/JSON), so 500
rows can exceed ClickHouse's memory limit. Reduce default batch size
from 500 to 100 and bump ClickHouse memory limit from 2Gi to 4Gi.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 30080 is already allocated. Updated deploy manifests,
CORS origin, and HOWTO.md references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Scaffold Vite + React + TypeScript frontend in ui/ with full design
system (dark/light themes) matching the HTML mockups
- Implement Execution Explorer page: search filters, results table with
expandable processor tree and exchange detail sidebar, pagination
- Add UI authentication: UiAuthController (login/refresh endpoints),
JWT filter handles ui: subject prefix, CORS configuration
- Shared components: StatusPill, DurationBar, StatCard, AppBadge,
FilterChip, Pagination — all using CSS Modules with design tokens
- API client layer: openapi-fetch with auth middleware, TanStack Query
hooks for search/detail/snapshot queries, Zustand for state
- Standalone deployment: Nginx Dockerfile, K8s Deployment + ConfigMap +
NodePort (30080), runtime config.js for API base URL
- Embedded mode: maven-resources-plugin copies ui/dist into JAR static
resources, SPA forward controller for client-side routing
- CI/CD: UI build step, Docker build/push for server-ui image, K8s
deploy step for UI, UI credential secrets
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ClickHouse user/password now injected via `clickhouse-credentials` Secret
instead of hardcoded plaintext in deploy manifests (#33)
- CI deploy step creates the secret idempotently from Gitea CI secrets
- Added liveness/readiness probes: server uses /api/v1/health, ClickHouse
uses /ping (#35)
- Updated HOWTO.md and CLAUDE.md with new secrets and probe details
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>