Commit Graph

114 Commits

Author SHA1 Message Date
hsiegeln
35319dc666 refactor(ui): server metrics page uses global time range
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m31s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
Drop the page-local DS Select window picker. Drive from() / to() off
useGlobalFilters().timeRange so the dashboard tracks the same TopBar range
as Exchanges / Dashboard / Runtime. Bucket size auto-scales via
stepSecondsFor(windowSeconds) (10 s for ≤30 min → 1 h for >48 h). Query
hooks now take ServerMetricsRange = { from: Date; to: Date } instead of a
windowSeconds number, so they support arbitrary absolute or rolling ranges
the TopBar may supply (not just "now − N"). Toolbar collapses to just the
server-instance badges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:19:20 +02:00
hsiegeln
3c2409ed6e docs(server-metrics): document the built-in admin dashboard
SERVER-CAPABILITIES.md now lists the two consumption paths (UI + REST API)
side-by-side with visibility rules; the dashboard-builder doc leads with a
"Built-in admin dashboard" section and a 2026-04-24 changelog entry so
first-time readers know they don't have to build anything before seeing
server health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 09:05:22 +02:00
hsiegeln
d58c8cde2e feat(server): REST API over server_metrics for SaaS dashboards
Adds /api/v1/admin/server-metrics/{catalog,instances,query} so SaaS control
planes can build the server-health dashboard without direct ClickHouse
access. One generic /query endpoint covers every panel in the
server-self-metrics doc: aggregation (avg/sum/max/min/latest), group-by-tag,
filter-by-tag, counter-delta mode with per-server_instance_id rotation
handling, and a derived 'mean' statistic for timers. Regex-validated
identifiers, parameterised literals, 31-day range cap, 500-series response
cap. ADMIN-only via the existing /api/v1/admin/** RBAC gate. Docs updated:
all 17 suggested panels now expressed as single-endpoint queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:41:02 +02:00
hsiegeln
48ce75bf38 feat(server): persist server self-metrics into ClickHouse
Snapshot the full Micrometer registry (cameleer business metrics, alerting
metrics, and Spring Boot Actuator defaults) every 60s into a new
server_metrics table so server health survives restarts without an external
Prometheus. Includes a dashboard-builder reference for the SaaS team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 23:20:45 +02:00
hsiegeln
663a6624a7 docs(plan): checkpoints grid row + locale time + remove History (7 TDD tasks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:54:42 +02:00
hsiegeln
cc3cd610b2 docs(spec): checkpoints into identity grid + locale time + remove History
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:51:08 +02:00
hsiegeln
13f218d522 docs(plan): deployment page polish (9 TDD tasks)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:42:06 +02:00
hsiegeln
900fba5af6 docs(spec): deployment page polish (upload-in-button, sort/refresh, collapsible checkpoints, DS Select, tab reorder)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:57 +02:00
hsiegeln
064c302073 docs(plan): V2 → V4 migration filename (V2/V3 already taken) 2026-04-23 11:49:12 +02:00
hsiegeln
e558494f8d plan(deploy): checkpoints table redesign + audit gap
15 tasks across 5 phases (backend foundation → SideDrawer →
ConfigTabs readOnly → CheckpointsTable + DetailDrawer → polish).
TDD throughout with per-task commits. Backend phase ships
independently to close the audit gap as quickly as possible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:39:11 +02:00
hsiegeln
1f0ab002d6 spec(deploy): checkpoints table redesign + deployment audit gap
Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).

Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:31:50 +02:00
hsiegeln
2c82f29aef docs(plans): deployment strategies (blue-green + rolling) plan
7-phase plan to replace the interim destroy-then-start flow (f8dccaae)
with a strategy-aware executor. Adds gen-suffixed container names so
old + new replicas can coexist, plus a cameleer.generation label for
Prometheus/Grafana deploy-boundary annotations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:41:43 +02:00
hsiegeln
837e5d46f5 docs(deploy): session handoff + refresh GitNexus index stats
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m9s
CI / docker (push) Successful in 1m17s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 38s
Handoff summarises the unified deployment page implementation (spec,
plan, 43 commits, opened Gitea issues #147 and #148), open gaps, and
recommended kickoff for the next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:17:26 +02:00
hsiegeln
1a376eb25f plan(deploy): unified app deployment page implementation plan
13 phases, TDD-oriented: Flyway V3 snapshot column, staged/live config
write flag, dirty-state endpoint, regen OpenAPI, then the new React page
(Identity, Checkpoints, 7 tabs including the live-apply Traces+Taps and
Route Recording with banner), primary Save/Redeploy state machine,
router blocker, old view cleanup, rules docs, and a manual QA walkthrough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:14:11 +02:00
hsiegeln
58ec67aef9 spec(deploy): unified app deployment page design
Single page at /apps/:slug (+ /apps/new in net-new mode) replacing the
CreateAppView/AppDetailView split. Save ↔ Redeploy state machine driven
by a deployment snapshot on the deployments table, agent-config writes
gain ?apply=staged|live, Identity & Artifact always visible, new
Deployment tab carries progress + startup log, and checkpoints restore
full prior state (JAR + config) from past successful deploys.

Concurrent-edit protection deferred to #147.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:02:50 +02:00
hsiegeln
88b003d4f0 docs(spec): explicit env switcher + per-env color (design)
Replace env dropdown with button+modal pattern, remove All Envs,
add 8-swatch preset color palette per env rendered as 3px top bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 19:13:00 +02:00
hsiegeln
eda74b7339 docs(alerting): PER_EXCHANGE exactly-once — fireMode reference + deploy-backlog-cap
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 2m7s
CI / docker (push) Successful in 1m22s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 41s
Fix stale `AGGREGATE` label (actual enum: `COUNT_IN_WINDOW`). Expand
EXCHANGE_MATCH section with both fire modes, PER_EXCHANGE config-surface
restrictions (0 for reNotifyMinutes/forDurationSeconds, at-least-one-sink
rule), exactly-once guarantee scope, and the first-run backlog-cap knob.

Surface the new config in application.yml with the 24h default and the
opt-out-to-0 semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:39:49 +02:00
hsiegeln
031fe725b5 docs(plan): PER_EXCHANGE exactly-once — implementation plan (21 tasks, 6 phases)
Plan for executing the tightened spec. TDD per task: RED test first,
minimal GREEN impl, commit. Phases 1-2 land the cursor + atomic batch
commit; phase 3 validates config; phase 4 fixes the UI mode-toggle
leakage + empty-targets guard + render-preview pane; phases 5-6 close
with full-lifecycle IT and regression sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:39:31 +02:00
hsiegeln
2f9b9c9b0f docs(spec): PER_EXCHANGE — tighten motivation, fold in njams review
Correct the factual claim that the cursor advances — it is dead code:
_nextCursor is computed but never persisted by applyBatchFiring/reschedule,
so every tick re-enqueues notifications for every matching exchange in
retention. Clarify that instance-level dedup already works via the unique
index; notification-level dedup is what's broken. Reframe §2 as "make it
atomic before §1 goes live."

Add builder-UX lessons from the njams Server_4 rules editor: clear stale
fields on fireMode toggle (not just hide them); block save on empty
webhooks+targets; wire the already-existing /render-preview endpoint into
the Review step. Add Test 5 (red-first notification-bleed regression) and
Test 6 (form-state clear on mode toggle).

Park two follow-ups explicitly: sealed condition-type hierarchy (backend
lags the UI's condition-forms/* sharding) and a coalesceSeconds primitive
for Inbox-storm taming. Amend cursor-format-churn risk: benign in theory,
but first post-deploy tick against long-standing rules could scan from
rule.createdAt forward — suggests a deployBacklogCap clamp to bound the
one-time backlog flood.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:57:25 +02:00
hsiegeln
817b61058a docs(spec): PER_EXCHANGE exactly-once-per-exchange alerting
Four focused correctness fixes for the "fire exactly once per FAILED
exchange" use case (alerting layer only; HTTP-level idempotency is a
separate scope):

1. Composite cursor (startTime, executionId) replaces the current
   single-timestamp, inclusive cursor — prevents same-millisecond
   drops and same-exchange re-selection.
2. First-run cursor initialized to rule createdAt (not null) —
   prevents the current unbounded historical-retention scan on first
   tick of a new rule.
3. Transactional coupling of instance writes + notification enqueue +
   cursor advance — eliminates partial-progress failure modes on crash
   or rollback.
4. Config hygiene: reNotifyMinutes forced to 0, forDurationSeconds
   rejected, perExchangeLingerSeconds removed entirely (was validated
   as required but never read) — the rule shape stops admitting
   nonsensical PER_EXCHANGE combinations.

Alert stays FIRING until human ack/resolve (no auto-resolve); webhook
fires exactly once per AlertInstance; Inbox never sees duplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:17:18 +02:00
hsiegeln
d32208d403 docs(plan): IT triage follow-ups — implementation plan
Task-by-task plan for the 2026-04-21-it-triage-followups-design spec.
Autonomous execution variant — SSE diagnose-then-fix branches to either
apply-fix or park-with-@Disabled based on diagnosis confidence, since
this runs unattended overnight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 23:10:55 +02:00
hsiegeln
6c1cbc289c docs(spec): IT triage follow-ups — design
Design for closing the 12 parked IT failures (ClickHouseStatsStoreIT
timezone, SSE flakiness in AgentSseControllerIT/SseSigningIT) plus two
production-code side notes the ExecutionController removal surfaced:

- ClickHouseStatsStore timezone fix — column-level DateTime('UTC') on
  bucket, greenfield CH
- SSE flakiness — diagnose-then-fix with user checkpoint between phases
- MetricsFlushScheduler property-key fix — bind via SpEL, single source
  of truth in IngestionConfig
- Dead-code cleanup — SearchIndexer.onExecutionUpdated listener +
  unused TaggedExecution record

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 23:03:08 +02:00
hsiegeln
70bf59daca docs(alerts): implementation plan — inbox redesign (16 tasks)
16 TDD tasks covering V17 migration (drop ACKNOWLEDGED + add read_at/deleted_at +
drop alert_reads + rework open-rule index), backend repo/controller/endpoints
including /restore for undo-toast backing, OpenAPI regen, UI rebuild (single
filterable inbox, row/bulk actions, silence-rule quick menu, SilencesPage
?ruleId= prefill), concrete test bodies, and rules/CLAUDE.md updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:56:53 +02:00
hsiegeln
c0b8c9a1ad docs(alerts): spec — inbox redesign (single filterable inbox)
Collapse /alerts/inbox, /alerts/all, /alerts/history into a single
filterable inbox. Drop ACKNOWLEDGED from AlertState; add read_at and
deleted_at as orthogonal timestamp flags. Retire per-user alert_reads
tracking. Add Silence-rule and Delete row/bulk actions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:45:04 +02:00
hsiegeln
52a08a8769 docs(alerts): Implementation plan — design-system alignment for /alerts pages
Task-by-task TDD plan implementing the design spec. Splits the work
into 14 tasks: helper utilities (TDD), shared renderer, CSS token
migration, per-page rewrites (Inbox/All/History/Rules/Silences),
wizard banner migration, AlertRow deletion, E2E adaptation for
ConfirmDialog, and full verification pass. Each task produces an
atomic commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:49:47 +02:00
hsiegeln
3d0a4d289b docs(alerts): Design spec — design-system alignment for /alerts pages
Rework all pages under /alerts to use @cameleer/design-system components
and tokens. Unified DataTable shell for Inbox/All/History with expandable
rows; DataTable + Dropdown + ConfirmDialog for Rules list; FormField grid
+ DataTable for Silences; DS Alert for wizard banners. Replaces undefined
CSS variables (--bg, --fg, --muted, --accent) with DS tokens and removes
raw <table>/<select>/confirm() usage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:43:19 +02:00
hsiegeln
e7ce1a73d0 docs(alerting): Plan 04 implementation plan — post-ship hardening
13 atomic commits covering 5 hardening tasks:

  Task 1-2: @Schema(discriminatorMapping) on AlertCondition, derive
            polymorphic unions in enums.ts from schema
  Task 3-7: AgentState / DeploymentStatus / LogLevel / ExecutionStatus
            enum migrations + @Schema(allowableValues) on JvmMetric
  Task 8:   ContextStartupSmokeTest (unit-tier, no Testcontainers)
  Task 9-12: AlertTemplateVariables registry + round-trip test +
             SSOT endpoint + UI consumer
  Task 13:  alerting-editor.spec.ts Playwright spec

Each task has bite-sized write-test/red/green/commit steps with
exact paths and full code. Pre-flight SQL check and post-flight
self-verification scripts included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:54:09 +02:00
hsiegeln
46867cc659 docs(alerting): Plan 04 design spec — post-ship hardening
Closes the loop on three bug classes from Plan 03 triage:
context-load regressions (missing @Autowired), UI/backend drift
on template variables, and hand-maintained TS enum unions caused
by springdoc polymorphic schema quirk.

Covers 5 tasks: context-startup smoke test, template-variables
SSOT endpoint, second Playwright spec, String-to-enum migrations
on 5 condition fields, and @DiscriminatorMapping on AlertCondition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:44:41 +02:00
hsiegeln
18cacb33ee docs(alerting): align @JsonTypeInfo spec with shipped code
Design spec and Plan 02 described AlertCondition polymorphism as
Id.DEDUCTION, but the code that shipped in PR #140 uses Id.NAME with
property="kind" and include=EXISTING_PROPERTY. The `kind` field is
real on every subtype and the DB stores it in a separate column
(condition_kind), so reading the discriminator directly is simpler
than deduction — update the docs to match. Also add `"kind"` to the
example JSON payloads so they match on-wire reality.

OutboundAuth (Plan 01) correctly still uses Id.DEDUCTION and is
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:04:17 +02:00
hsiegeln
f75ee9f352 docs(alerting): UI map + admin-guide walkthrough for Plan 03
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:55:36 +02:00
hsiegeln
2942025a54 docs(alerting): Plan 03 — UI + backfills implementation plan
32 tasks across 10 phases:
 - Foundation: Vitest, CodeMirror 6, Playwright scaffolding + schema regen.
 - API: env-scoped query hooks for alerts/rules/silences/notifications.
 - Components: AlertStateChip, SeverityBadge, NotificationBell (with tab-hidden poll pause), MustacheEditor (CM6 with variable autocomplete + linter).
 - Routes: /alerts/* section with sidebar accordion; bell mounted in TopBar.
 - Pages: Inbox / All / History / Rules (with env promotion) / Silences.
 - Wizard: 5-step editor with kind-specific condition forms + test-evaluate + render-preview + prefill warnings.
 - CMD-K: alerts + rules sources via LayoutShell extension.
 - Backend backfills: SSRF guard on outbound URL + 30s AlertingMetrics gauge cache.
 - Final: Playwright smoke, .claude/rules/ui.md + admin-guide updates, full build/test/PR.

Decisions: CM6 over Monaco/textarea (90KB gzipped, ARIA-conformant); CMD-K extension via existing LayoutShell searchData (not a new registry); REST-API-driven tests per project test policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:12:21 +02:00
hsiegeln
f1abca3a45 refactor(alerting): rename P95_LATENCY_MS → AVG_DURATION_MS to match what stats_1m_route exposes
The evaluator mapped P95_LATENCY_MS to ExecutionStats.avgDurationMs because
stats_1m_route has no p95 column. Exposing the old name implied p95 semantics
operators did not get. Rename to AVG_DURATION_MS makes the contract honest.
Updated RouteMetric enum (with javadoc), evaluator switch, and admin guide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 07:36:43 +02:00
hsiegeln
144915563c docs(alerting): whole-branch final review report
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 07:25:33 +02:00
hsiegeln
c79a6234af test(alerting): fix duplicate @MockBean after AbstractPostgresIT centralised mocks + Plan 02 verification report
AbstractPostgresIT gained clickHouseSearchIndex and agentRegistryService mocks in Phase 9.
All 14 alerting IT subclasses that re-declared the same @MockBean fields now fail with
"Duplicate mock definition". Removed the redundant declarations; per-class clickHouseLogStore
mock kept where needed. 120 alerting tests now pass (0 failures).

Also adds docs/alerting-02-verification.md (Task 43).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 23:27:19 +02:00
hsiegeln
63669bd1d7 docs(alerting): default config + admin guide
Adds alerting stanza to application.yml with all AlertingProperties
fields backed by env-var overrides.  Creates docs/alerting.md covering
six condition kinds (with example JSON), template variables, webhook
setup (Slack/PagerDuty examples), silence patterns, circuit-breaker
and retention troubleshooting, and Prometheus metrics reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 22:16:38 +02:00
hsiegeln
087dcee5df docs(alerting): Plan 02 — backend (domain, storage, evaluators, dispatch) 2026-04-19 18:24:16 +02:00
hsiegeln
609a86dd03 docs: admin guide for outbound connections
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:03:18 +02:00
hsiegeln
77a23c270b docs(alerting): Plan 01 — outbound HTTP infra + admin-managed outbound connections
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m57s
CI / docker (push) Successful in 1m6s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 44s
First of three sequenced plans for the alerting feature. Covers:
- Cross-cutting http/ module (OutboundHttpClientFactory, SslContextBuilder,
  TLS trust composition, startup validation)
- Admin-managed OutboundConnection with PG persistence, AES-GCM-encrypted
  HMAC secret (resolves spec §20 item 2)
- Admin CRUD REST + test endpoint + RBAC + audit
- Admin UI page with TLS config, allowed-envs multi-select, test action
- OIDC retrofit deliberately deferred (documented in Task 4 audit)

Plan 02 (alerting backend) and Plan 03 (alerting UI) written after Plan 01
executes — lets reality inform their details, especially the secret-cipher
interface and the rules-referencing integration point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:26:00 +02:00
hsiegeln
e71edcdd5e docs(alerting): add BL-002 for native provider integrations + Mustache auto-complete
BL-002 / gitea#138 tracks deferred native provider types (Slack Block Kit,
PagerDuty Events v2, Teams connector) with shipped templates as a post-v1
fast-follow once usage data informs which providers matter.

Spec §13 folds in context-aware variable auto-complete for the shared
<MustacheEditor /> component used in rule editor, webhook overrides, and
outbound-connection admin. Available variables filter by condition kind.
Completion engine choice added to §20 as a planning-phase decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:10:00 +02:00
hsiegeln
a9ad0eb841 docs(alerting): spec for alerting feature + backlog entry BL-001
Comprehensive design spec for a confined, env-scoped alerting feature:
6 signal sources, shared env-scoped rules with RBAC-targeted notifications,
in-app inbox + webhook delivery via admin-managed outbound connections,
claim-based polling for horizontal scalability, 4 CH projections for hot-path
reads. Backlog entry BL-001 / gitea#137 tracks deferred managed-CA investigation
(reuse SaaS-layer CA handling first before building in-server storage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:58:38 +02:00
hsiegeln
e8d6cc5b5d docs: implementation plan for log filters + infinite scroll
13 tasks covering backend multi-value source filter, cursor-paginated
agent events, shared useInfiniteStream/InfiniteScrollArea primitives,
and page-level refactors of AgentHealth + AgentInstance. Bounded log
views (LogTab, StartupLogPanel) keep single-page hooks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 11:37:06 +02:00
hsiegeln
b14551de4e docs: spec for multi-select log filters + infinite scroll
Establishes the shared pattern for streaming views (logs + agent
events): server-side multi-select source/level filters, cursor-based
infinite scroll, and top-gated auto-refetch. Scoped to AgentHealth and
AgentInstance; bounded views (LogTab, StartupLogPanel) keep
single-page hooks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 11:26:39 +02:00
hsiegeln
9b1ef51d77 feat!: scope per-app config and settings by environment
All checks were successful
CI / cleanup-branch (push) Has been skipped
CI / build (push) Successful in 1m27s
CI / docker (push) Successful in 1m10s
CI / deploy-feature (push) Has been skipped
CI / deploy (push) Successful in 1m40s
SonarQube / sonarqube (push) Successful in 4m29s
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).

Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.

- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode — AGENT role uses
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.

Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 22:25:21 +02:00
hsiegeln
dd0f0e73b3 docs: add persistent route catalog implementation plan
9-task plan covering ClickHouse schema, core interface, cached
implementation, bean wiring, write path (register/heartbeat),
read path (both catalog controllers), dismiss cleanup, and
rule file updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:42:37 +02:00
hsiegeln
2542e430ac docs: add persistent route catalog design spec
Routes with zero executions (sub-routes) vanish from the sidebar after
server restart because the catalog is purely in-memory with a ClickHouse
stats fallback that only covers executed routes. This spec describes a
persistent route_catalog table in ClickHouse with lifecycle tracking
(first_seen/last_seen) to reconstruct the sidebar without agent
reconnection and support historical time-window queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:39:49 +02:00
hsiegeln
23d24487d1 docs: add runtime compact view implementation plan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:32:14 +02:00
hsiegeln
bf289aa1b1 docs: add runtime compact view design spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:27:53 +02:00
hsiegeln
cb3ebfea7c chore: rename cameleer3 to cameleer
Some checks failed
CI / cleanup-branch (push) Has been skipped
CI / build (push) Failing after 18s
CI / docker (push) Has been skipped
CI / deploy (push) Has been skipped
CI / deploy-feature (push) Has been skipped
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:28:42 +02:00
hsiegeln
8fabc2b308 docs: add container startup log capture implementation plan
12 tasks covering RuntimeOrchestrator extension, ContainerLogForwarder,
deployment/SSE/event monitor integration, and UI startup log panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:12:01 +02:00
hsiegeln
14215bebec docs: add container startup log capture design spec
Covers streaming Docker logs to ClickHouse until agent SSE connect,
deployment log panel UI, and source badge in general log views.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:04:24 +02:00