Commit Graph

53 Commits

Author SHA1 Message Date
hsiegeln
ac680b7f3f refactor: prefix all third-party service names with cameleer-
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m7s
CI / docker (push) Successful in 1m33s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m51s
SonarQube / sonarqube (push) Successful in 3m28s
Rename all Docker/K8s service names, DNS hostnames, secrets, volumes,
and manifest files to use the cameleer- prefix, making it clear which
software package each container belongs to.

Services renamed:
- postgres → cameleer-postgres
- clickhouse → cameleer-clickhouse
- logto → cameleer-logto
- logto-postgresql → cameleer-logto-postgresql
- traefik (service) → cameleer-traefik
- postgres-external → cameleer-postgres-external

Secrets renamed:
- postgres-credentials → cameleer-postgres-credentials
- clickhouse-credentials → cameleer-clickhouse-credentials
- logto-credentials → cameleer-logto-credentials

Volumes renamed:
- pgdata → cameleer-pgdata
- chdata → cameleer-chdata
- certs → cameleer-certs
- bootstrapdata → cameleer-bootstrapdata

K8s manifests renamed:
- deploy/postgres.yaml → deploy/cameleer-postgres.yaml
- deploy/clickhouse.yaml → deploy/cameleer-clickhouse.yaml
- deploy/logto.yaml → deploy/cameleer-logto.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:51:08 +02:00
hsiegeln
60fb5fe21a Remove vestigial clickhouse.enabled flag
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m19s
CI / docker (push) Successful in 1m4s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
ClickHouse is the only storage backend — there is no alternative.
The enabled flag created a false sense of optionality: setting it to
false would crash on startup because most beans unconditionally depend
on the ClickHouse JdbcTemplate.

Remove all @ConditionalOnProperty annotations gating ClickHouse beans,
the enabled property from application.yml, and the K8s manifest entry.
Also fix old property names in AbstractPostgresIT test config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:27:10 +02:00
hsiegeln
8fe48bbf02 Migrate config to cameleer.server.* naming convention
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m52s
CI / docker (push) Successful in 1m30s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.

Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
  ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
  (e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 18:10:51 +02:00
hsiegeln
0de392ff6e fix: remove securityContext from UI pod — nginx needs root for setup
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
The standard nginx image requires root to modify /etc/nginx/conf.d
and create /var/cache/nginx directories during startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:06:07 +02:00
hsiegeln
c502a42f17 refactor: architecture cleanup — OIDC dedup, PKCE, K8s hardening
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 59s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m59s
- Extract OidcProviderHelper for shared discovery + JWK source construction
- Add SystemRole.normalizeScope() to centralize role normalization
- Merge duplicate claim extraction in OidcTokenExchanger
- Add PKCE (S256) to OIDC authorization flow (frontend + backend)
- Add SecurityContext (runAsNonRoot) to all K8s deployments
- Fix postgres probe to use $POSTGRES_USER instead of hardcoded username
- Remove default credentials from Dockerfile
- Extract sanitize_branch() to shared .gitea/sanitize-branch.sh
- Fix sidebar to use /exchanges/ paths directly, remove legacy redirects
- Centralize basePath computation in router.tsx via config module

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:57:29 +02:00
hsiegeln
12bb734c2d fix: use tcpSocket probe for logto-postgresql instead of pg_isready
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m4s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 35s
pg_isready without -U defaults to OS user "root" which doesn't exist
as a PostgreSQL role, causing noisy log entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:44:59 +02:00
hsiegeln
cbeaf30bc7 fix: move PG_USER/PG_PASSWORD before DB_URL in logto.yaml
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m9s
K8s $(VAR) substitution only resolves env vars defined earlier in the
list. PG_USER and PG_PASSWORD must come before DB_URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:39:50 +02:00
hsiegeln
c4d2fa90ab docs: clarify Logto proxy setup and ENDPOINT/ADMIN_ENDPOINT semantics
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 3m15s
LOGTO_ENDPOINT and LOGTO_ADMIN_ENDPOINT are public-facing URLs that
Logto uses for OIDC discovery, issuer URI, and redirects. When behind
a reverse proxy (e.g., Traefik), set these to the external URLs.
Logto requires its own subdomain (not a path prefix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:31:17 +02:00
hsiegeln
22d812d832 feat: replace Authentik with Logto K8s deployment 2026-04-05 13:12:01 +02:00
hsiegeln
de85cdf5a2 fix: let SPRING_DATASOURCE_URL fully control datasource connection
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
SonarQube / sonarqube (push) Successful in 3m26s
Explicit spring.datasource.url in YAML takes precedence over the env var,
causing deployed containers to connect to localhost instead of the postgres
service. Now the YAML uses ${SPRING_DATASOURCE_URL:...} so the env var
wins when set. Flyway inherits from the datasource (no separate URL).
Removed CAMELEER_DB_SCHEMA — schema is part of the datasource URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:24:22 +02:00
hsiegeln
2277a0498f fix: set CAMELEER_DB_SCHEMA=public for existing main deployment
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m1s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Has been cancelled
Existing deployment has tables in public schema. The new tenant_default
default breaks startup because Flyway sees an empty schema. Override to
public for backward compat; new deployments use the tenant-derived default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:21:17 +02:00
hsiegeln
ac87aa6eb2 fix: derive PG schema from tenant ID instead of defaulting to public
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Failing after 2m17s
Schema now defaults to tenant_${cameleer.tenant.id} (e.g. tenant_default,
tenant_acme) instead of public. Flyway create-schemas: true ensures the
schema is auto-created on first startup. CAMELEER_DB_SCHEMA env var still
available as override for feature branch isolation. Removed hardcoded
public schema from K8s base and main overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:46:57 +02:00
hsiegeln
a188308ec5 feat: implement multitenancy with tenant isolation + environment support
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m25s
Adds configurable tenant ID (CAMELEER_TENANT_ID env var, default:
"default") and environment as a first-class concept. Each server
instance serves one tenant with multiple environments.

Changes across 36 files:
- TenantProperties config bean for tenant ID injection
- AgentInfo: added environmentId field
- AgentRegistrationRequest: added environmentId field
- All 9 ClickHouse stores: inject tenant ID, replace hardcoded
  "default" constant, add environment to writes/reads
- ChunkAccumulator: configurable tenant ID + environment resolver
- MergedExecution/ProcessorBatch/BufferedLogEntry: added environment
- ClickHouse init.sql: added environment column to all tables,
  updated ORDER BY (tenant→time→env→app), added tenant_id to
  usage_events, updated all MV GROUP BY clauses
- Controllers: pass environmentId through registration/auto-heal
- K8s deploy: added CAMELEER_TENANT_ID env var
- All tests updated for new signatures

Closes #123

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 15:00:18 +02:00
hsiegeln
633a61d89d perf: batch processor and log inserts to reduce ClickHouse part creation
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 39s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m2s
SonarQube / sonarqube (push) Failing after 1m58s
Diagnostics showed ~3,200 tiny inserts per 5 minutes:
- processor_executions: 2,376 inserts (14 rows avg) — one per chunk
- logs: 803 inserts (5 rows avg) — synchronous in HTTP handler

Fix 1: Consolidate processor inserts — new insertProcessorBatches() method
flattens all ProcessorBatch records into a single INSERT per flush cycle.

Fix 2: Buffer log inserts — route through WriteBuffer<BufferedLogEntry>,
flushed on the same 5s interval as executions. LogIngestionController now
pushes to buffer instead of inserting directly.

Also reverts async_insert config (doesn't work with JDBC inline VALUES).

Expected: ~3,200 inserts/5min → ~160 (20x reduction in part creation,
MV triggers, and background merge work).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:48:04 +02:00
hsiegeln
e0aac4bf0a perf: enable ClickHouse async_insert to batch small inserts server-side
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 43s
Diagnostics showed 3,200 tiny inserts per 5 minutes (processor_executions:
2,376 at 14 rows avg, logs: 803 at 5 rows avg), each creating a new part
and triggering MV aggregations + background merges. This was the root cause
of ~400m CPU usage at 3 tx/s.

async_insert=1 with 5s busy timeout lets ClickHouse buffer incoming inserts
and consolidate them into fewer, larger parts before writing to disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:33:48 +02:00
hsiegeln
ac94a67a49 fix: reduce ClickHouse CPU by increasing flush interval, rename LIVE→AUTO labels
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 1m24s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
- Increase ingestion flush interval from 500ms to 5000ms to reduce MV merge storms
- Reduce ClickHouse background_schedule_pool_size from 8 to 4
- Rename LIVE/PAUSED badge labels to AUTO/MANUAL across all pages
- Update design system to v0.1.29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 22:05:29 +02:00
hsiegeln
e1cb9d7872 fix: extract snapshot data from chunks, reduce ClickHouse log noise
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 41s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 37s
- ChunkAccumulator now extracts inputBody/outputBody/inputHeaders/outputHeaders
  from ExecutionChunk.inputSnapshot/outputSnapshot instead of storing empty strings
- Set ClickHouse server log level to warning (was trace by default)
- Update CLAUDE.md to document Ed25519 key derivation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 17:58:54 +02:00
hsiegeln
188810e54b feat: remove TimescaleDB, dead PG stores, and storage feature flags
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 32s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Complete the ClickHouse migration by removing all PostgreSQL analytics
code. PostgreSQL now serves only RBAC, config, and audit — all
observability data is exclusively in ClickHouse.

- Delete 6 dead PostgreSQL store classes (executions, stats, diagrams,
  events, metrics, metrics-query) and 2 integration tests
- Delete RetentionScheduler (ClickHouse TTL handles retention)
- Remove all 7 cameleer.storage.* feature flags from application.yml
- Remove all @ConditionalOnProperty from ClickHouse beans in StorageBeanConfig
- Consolidate 14 Flyway migrations (V1-V14) into single clean V1 with
  only RBAC/config/audit tables (no TimescaleDB, no analytics tables)
- Switch from timescale/timescaledb-ha:pg16 to postgres:16 everywhere
  (docker-compose, deploy/postgres.yaml, test containers)
- Remove TimescaleDB check and /metrics-pipeline from DatabaseAdminController
- Set clickhouse.enabled default to true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 20:10:58 +02:00
hsiegeln
283e38a20d feat: remove OpenSearch, add ClickHouse admin page
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 33s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Remove all OpenSearch code, dependencies, configuration, deployment
manifests, and CI/CD references. Replace the OpenSearch admin page
with a ClickHouse admin page showing cluster status, table sizes,
performance metrics, and indexer pipeline stats.

- Delete 11 OpenSearch Java files (config, search impl, admin controller, DTOs, tests)
- Delete 3 OpenSearch frontend files (admin page, CSS, query hooks)
- Delete deploy/opensearch.yaml K8s manifest
- Remove opensearch Maven dependencies from pom.xml
- Remove opensearch config from application.yml, Dockerfile, docker-compose
- Remove opensearch from CI workflow (secrets, deploy, cleanup steps)
- Simplify ThresholdConfig (remove OpenSearch thresholds, database-only)
- Change default search backend from opensearch to clickhouse
- Add ClickHouseAdminController with /status, /tables, /performance, /pipeline
- Add ClickHouseAdminPage with StatCards, pipeline ProgressBar, tables DataTable
- Update CLAUDE.md, HOWTO.md, and source comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:56:06 +02:00
hsiegeln
95b9dea5c4 feat(clickhouse): wire ClickHouseExecutionStore as active ExecutionStore
Add cameleer.storage.executions feature flag (default: clickhouse).
PostgresExecutionStore activates only when explicitly set to postgres.
Add by-seq snapshot endpoint for iteration-aware processor lookup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:09:14 +02:00
hsiegeln
968117c41a feat(clickhouse): wire Phase 4 stores with feature flags
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 43s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Add conditional beans for ClickHouseDiagramStore, ClickHouseAgentEventRepository,
and ClickHouseLogStore. All default to ClickHouse (matchIfMissing=true).
PG/OS stores activate only when explicitly configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:44:10 +02:00
hsiegeln
af080337f5 feat: comprehensive ClickHouse low-memory tuning and switch all storage to ClickHouse
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m9s
CI / docker (push) Successful in 42s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 58s
Replace partial memory config with full Altinity low-memory guide
settings. Revert container limit from 6Gi back to 4Gi — proper
tuning (mlock=false, reduced caches/pools/threads, disk spill for
aggregations) makes the original budget sufficient.

Switch all storage feature flags to ClickHouse:
- CAMELEER_STORAGE_SEARCH: opensearch → clickhouse
- CAMELEER_STORAGE_METRICS: postgres → clickhouse
- CAMELEER_STORAGE_STATS: already clickhouse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:27:10 +02:00
hsiegeln
a669df08bd fix(clickhouse): tune memory settings to prevent OOM on insert
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 40s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
ClickHouse 24.12 auto-sizes caches from the cgroup limit, leaving
insufficient headroom for MV processing and background merges.
Adds a custom config that shrinks mark/index/expression caches and
caps per-query memory at 2 GiB. Bumps container limit 4Gi → 6Gi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:54:43 +02:00
hsiegeln
9df00fdde0 feat(clickhouse): wire ClickHouseStatsStore with cameleer.storage.stats feature flag (default: clickhouse)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:51:45 +02:00
hsiegeln
31f7113b3f feat(clickhouse): wire ChunkAccumulator, flush scheduler, and search feature flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 19:21:19 +02:00
hsiegeln
8eeaecf6f3 fix: remove unsupported async_insert params from ClickHouse JDBC URL
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m6s
CI / docker (push) Successful in 55s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m39s
CI / deploy (push) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (push) Successful in 51s
CI / deploy-feature (pull_request) Has been skipped
clickhouse-jdbc 0.9.7 rejects async_insert and wait_for_async_insert as
unknown URL parameters. These are server-side settings, not driver config.
Can be set per-query later if needed via custom_settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:02:53 +02:00
hsiegeln
f8505401d7 fix: ClickHouse auth credentials and non-fatal schema init
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m5s
CI / docker (push) Successful in 43s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 13s
CI / cleanup-branch (pull_request) Has been skipped
CI / build (pull_request) Successful in 1m47s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
- Set CLICKHOUSE_USER/PASSWORD via k8s secret (fixes "disabling network
  access for user 'default'" when no password is set)
- Add clickhouse-credentials secret to CI deploy + feature branch copy
- Pass CLICKHOUSE_USERNAME/PASSWORD env vars to server pod
- Make schema initializer non-fatal so server starts even if CH is
  temporarily unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:54:44 +02:00
hsiegeln
59dd629b0e fix: create cameleer database on ClickHouse startup
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (pull_request) Successful in 1m49s
CI / cleanup-branch (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
CI / deploy-feature (pull_request) Has been skipped
CI / build (push) Successful in 1m7s
CI / docker (push) Successful in 10s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been cancelled
ClickHouse only has the 'default' database out of the box. The JDBC URL
connects to 'cameleer', so the database must exist before the server starts.
Uses /docker-entrypoint-initdb.d/ init script via ConfigMap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:31:17 +02:00
hsiegeln
1b991f99a3 deploy: add ClickHouse StatefulSet and server env vars
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:08:42 +02:00
hsiegeln
fc412f7251 fix: use relative API URL in feature branch UI to eliminate CORS errors
All checks were successful
CI / build (push) Successful in 1m4s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 13s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Successful in 34s
Browser requests now go to the UI origin and nginx proxies them to the
backend within the cluster. Removes the separate API Ingress host rule
since API traffic no longer needs its own subdomain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 13:49:01 +01:00
hsiegeln
82117deaab fix: pass credentials to Flyway when using separate datasource URL
All checks were successful
CI / build (push) Successful in 1m6s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 42s
CI / deploy (push) Successful in 40s
CI / deploy-feature (push) Has been skipped
When spring.flyway.url is set independently, Spring Boot does not
inherit credentials from spring.datasource. Add explicit user/password
to both application.yml and K8s deployment to prevent "no password"
failures on feature branch deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 13:34:41 +01:00
hsiegeln
247fdb01c0 fix: separate Flyway and app datasource search paths for schema isolation
Some checks failed
CI / build (push) Successful in 1m6s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 41s
CI / deploy (push) Failing after 2m19s
CI / deploy-feature (push) Has been skipped
Flyway needs public in the search_path to access TimescaleDB extension
functions (create_hypertable). The app datasource must NOT include public
to prevent accidental cross-schema reads from production data.

- spring.flyway.url: currentSchema=<branch>,public (extensions accessible)
- spring.datasource.url: currentSchema=<branch> (strict isolation)
- SPRING_FLYWAY_URL env var added to K8s base manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 13:26:01 +01:00
hsiegeln
ff3a046f5a refactor: remove OIDC config from K8s manifests
All checks were successful
CI / build (push) Successful in 1m8s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 37s
CI / deploy-feature (push) Has been skipped
OIDC configuration should be managed by the server itself (database-backed),
not injected via K8s secrets. Remove all CAMELEER_OIDC_* env vars from
deployment manifests and the cameleer-oidc secret from CI. The server
defaults to OIDC disabled via application.yml.

This also fixes the Kustomize strategic merge conflict where the feature
overlay tried to set value on an env var that had valueFrom in the base.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 13:12:41 +01:00
hsiegeln
c1cf8ae260 chore: remove old flat deploy manifests superseded by Kustomize
Some checks failed
CI / build (push) Successful in 1m7s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 14s
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Failing after 19s
deploy/server.yaml and deploy/ui.yaml are no longer referenced by CI,
which now uses deploy/base/ + deploy/overlays/main/ via kubectl apply -k.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:52:58 +01:00
hsiegeln
15f20d22ad feat: add feature branch deployments with per-branch isolation
Some checks failed
CI / build (push) Successful in 1m8s
CI / cleanup-branch (push) Has been skipped
CI / docker (push) Successful in 42s
CI / deploy (push) Failing after 5s
CI / deploy-feature (push) Has been skipped
Enable deploying feature branches into isolated environments on the same
k3s cluster. Each branch gets its own namespace (cam-<slug>), PostgreSQL
schema, and OpenSearch index prefix for data isolation while sharing the
underlying infrastructure.

- Make OpenSearch index prefix and DB schema configurable via env vars
  (defaults preserve existing behavior)
- Restructure deploy/ into Kustomize base + overlays (main/feature)
- Extend CI to build Docker images for all branches, not just main
- Add deploy-feature job with namespace creation, secret copying,
  Traefik Ingress routing (<slug>-api/ui.cameleer.siegeln.net)
- Add cleanup-branch job to remove namespace, PG schema, OS indices
  on branch deletion
- Install required tools (git, jq, curl) in CI deploy containers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:35:07 +01:00
hsiegeln
c346babe33 fix: correct PostgreSQL mountPath and add external NodePort services
All checks were successful
CI / build (pull_request) Successful in 1m9s
CI / docker (pull_request) Has been skipped
CI / deploy (pull_request) Has been skipped
- Fix postgres.yaml mountPath to /home/postgres/pgdata matching timescaledb-ha PGDATA
- Add NodePort services for external access: PostgreSQL (30432), OpenSearch (30920)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 01:00:20 +01:00
hsiegeln
2634f60e59 fix: use timescaledb-ha image in K8s manifest for toolkit support 2026-03-16 19:14:15 +01:00
hsiegeln
ea687a342c deploy: remove obsolete ClickHouse K8s manifest 2026-03-16 19:01:26 +01:00
hsiegeln
a344be3a49 deploy: replace ClickHouse with PostgreSQL/TimescaleDB + OpenSearch in K8s manifests
- Dockerfile: update default SPRING_DATASOURCE_URL to jdbc:postgresql, add OPENSEARCH_URL default env
- deploy/postgres.yaml: new TimescaleDB StatefulSet + headless Service (10Gi PVC, pg_isready probes)
- deploy/opensearch.yaml: new OpenSearch 2.19.0 StatefulSet + headless Service (10Gi PVC, single-node, security disabled)
- deploy/server.yaml: switch datasource env from clickhouse-credentials to postgres-credentials, add OPENSEARCH_URL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 18:58:35 +01:00
a1e1c8f6ff deploy/authentik.yaml aktualisiert
Some checks failed
CI / build (push) Successful in 1m8s
CI / docker (push) Successful in 42s
CI / deploy (push) Failing after 2m9s
change authentik UI port to 30950
2026-03-14 12:52:13 +01:00
hsiegeln
554d6822c0 Add Authentik OIDC provider K8s manifests and wire deployment
Some checks failed
CI / build (push) Successful in 1m11s
CI / docker (push) Successful in 40s
CI / deploy (push) Failing after 8s
- deploy/authentik.yaml: PostgreSQL StatefulSet, Redis, Authentik
  server (NodePort 30900) and worker, all in cameleer namespace
- deploy/server.yaml: Add CAMELEER_JWT_SECRET and CAMELEER_OIDC_*
  env vars from secrets (all optional for backward compat)
- ci.yml: Create authentik-credentials and cameleer-oidc secrets,
  deploy Authentik before the server
- HOWTO.md: Authentik setup instructions, updated architecture
  diagram and Gitea secrets list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 12:45:02 +01:00
hsiegeln
4cdf2ac012 Fix ClickHouse OOM on batch insert: reduce batch size, increase memory
All checks were successful
CI / build (push) Successful in 1m3s
CI / docker (push) Successful in 41s
CI / deploy (push) Successful in 46s
Execution rows are wide (29 cols with serialized arrays/JSON), so 500
rows can exceed ClickHouse's memory limit. Reduce default batch size
from 500 to 100 and bump ClickHouse memory limit from 2Gi to 4Gi.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:44:03 +01:00
hsiegeln
fc2daddf54 Change UI NodePort from 30080 to 30090
All checks were successful
CI / build (push) Successful in 58s
CI / docker (push) Successful in 39s
CI / deploy (push) Successful in 27s
Port 30080 is already allocated. Updated deploy manifests,
CORS origin, and HOWTO.md references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 14:11:50 +01:00
hsiegeln
3eb83f97d3 Add React UI with Execution Explorer, auth, and standalone deployment
Some checks failed
CI / build (push) Failing after 1m53s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
- Scaffold Vite + React + TypeScript frontend in ui/ with full design
  system (dark/light themes) matching the HTML mockups
- Implement Execution Explorer page: search filters, results table with
  expandable processor tree and exchange detail sidebar, pagination
- Add UI authentication: UiAuthController (login/refresh endpoints),
  JWT filter handles ui: subject prefix, CORS configuration
- Shared components: StatusPill, DurationBar, StatCard, AppBadge,
  FilterChip, Pagination — all using CSS Modules with design tokens
- API client layer: openapi-fetch with auth middleware, TanStack Query
  hooks for search/detail/snapshot queries, Zustand for state
- Standalone deployment: Nginx Dockerfile, K8s Deployment + ConfigMap +
  NodePort (30080), runtime config.js for API base URL
- Embedded mode: maven-resources-plugin copies ui/dist into JAR static
  resources, SPA forward controller for client-side routing
- CI/CD: UI build step, Docker build/push for server-ui image, K8s
  deploy step for UI, UI credential secrets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:59:22 +01:00
hsiegeln
9c2391e5d4 Move ClickHouse credentials to K8s Secret and add health probes
All checks were successful
CI / build (push) Successful in 41s
CI / docker (push) Successful in 13s
CI / deploy (push) Successful in 38s
- ClickHouse user/password now injected via `clickhouse-credentials` Secret
  instead of hardcoded plaintext in deploy manifests (#33)
- CI deploy step creates the secret idempotently from Gitea CI secrets
- Added liveness/readiness probes: server uses /api/v1/health, ClickHouse
  uses /ping (#35)
- Updated HOWTO.md and CLAUDE.md with new secrets and probe details

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:59:15 +01:00
hsiegeln
88da1a0dd8 Fix ClickHouse OOM during Proxmox nightly backups
All checks were successful
CI / build (push) Successful in 48s
CI / docker (push) Successful in 1m36s
CI / deploy (push) Successful in 20s
Increase ClickHouse memory limit from 1Gi to 2Gi and reduce default
batch size from 5000 to 500. During VM backup snapshots, I/O contention
prevents ClickHouse from flushing writes fast enough, causing buffer
accumulation that exceeds the 1Gi container limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 09:55:46 +01:00
hsiegeln
48bdb46760 Server fully owns ClickHouse schema — create database + tables on startup
All checks were successful
CI / build (push) Successful in 48s
CI / docker (push) Successful in 39s
CI / deploy (push) Successful in 9s
ClickHouseConfig.ensureDatabaseExists() connects without the database path
to run CREATE DATABASE IF NOT EXISTS before the main DataSource is used.
Removes the ConfigMap-based init scripts from the K8s manifest — the server
is now the single owner of all ClickHouse schema management.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:41:35 +01:00
hsiegeln
f7ed91ef9c Use fully qualified cameleer3.* table names in ClickHouse init schema
All checks were successful
CI / build (push) Successful in 47s
CI / docker (push) Successful in 38s
CI / deploy (push) Successful in 10s
Init scripts run against the default database, not CLICKHOUSE_DB.
Prefix all table references with cameleer3.* and add CREATE DATABASE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:38:09 +01:00
hsiegeln
5576b50a3a Add ClickHouse schema init via ConfigMap + docker-entrypoint-initdb.d
All checks were successful
CI / build (push) Successful in 47s
CI / docker (push) Successful in 37s
CI / deploy (push) Successful in 8s
Mounts the schema SQL files as a ConfigMap into ClickHouse's init
directory so tables are created automatically on fresh starts. All
statements use IF NOT EXISTS so they're safe to re-run. This ensures
the schema exists even if the PVC is lost or the pod is recreated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:29:58 +01:00
hsiegeln
a1280609f6 Add NodePort service to expose ClickHouse externally
All checks were successful
CI / build (push) Successful in 1m21s
CI / docker (push) Successful in 38s
CI / deploy (push) Successful in 6s
HTTP on port 30123, native protocol on port 30900. Keeps the existing
headless service for internal StatefulSet DNS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:16:45 +01:00