Replaces the cramped Checkpoints disclosure with a real DataTable + a
side drawer (Logs / Config with snapshot/diff modes) and closes the
audit-log gap discovered in DeploymentController (deploy/stop/promote
currently make zero auditService.log calls).
Cap visible checkpoints at Environment.jarRetentionCount — beyond that,
JARs are pruned and rows aren't restorable. Logs scoped per-deployment
via instance_id IN (...) computed from replicaStates (no time window
needed). Compare folded into Config as a view-mode toggle. Two-phase
rollout (backend ships first to close the audit gap immediately).
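
A minimal sketch of the per-deployment log scoping described above, assuming replicaStates can be reduced to a plain list of instance ID strings. The class and method names are hypothetical; only instance_id and replicaStates come from this message.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds a parameterized filter for one deployment's logs.
final class DeploymentLogFilter {
    private DeploymentLogFilter() {}

    // Returns a WHERE fragment with one "?" placeholder per replica instance ID.
    // An empty replica list yields a clause that matches no rows, so a
    // deployment without replicas never falls back to an unscoped query.
    static String whereClause(List<String> instanceIds) {
        if (instanceIds.isEmpty()) {
            return "1 = 0";
        }
        String placeholders = String.join(", ", Collections.nCopies(instanceIds.size(), "?"));
        return "instance_id IN (" + placeholders + ")";
    }
}
```

Because the filter is keyed on instance IDs alone, no start/end timestamps need to be passed, which matches the "no time window needed" note.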
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7-phase plan to replace the interim destroy-then-start flow (f8dccaae)
with a strategy-aware executor. Adds gen-suffixed container names so
old + new replicas can coexist, plus a cameleer.generation label for
Prometheus/Grafana deploy-boundary annotations.
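
A rough sketch of how generation-suffixed names let old and new replicas coexist. The name format (base, replica index, "-g" plus generation) and the class are assumptions made for illustration; only the cameleer.generation label key is taken from the text above.

```java
import java.util.Map;

// Hypothetical naming helper for strategy-aware deploys.
final class ReplicaNaming {
    private ReplicaNaming() {}

    // Assumed format: <base>-<replicaIndex>-g<generation>.
    // Two generations of the same replica index get distinct names,
    // so the old container can keep running while the new one starts.
    static String containerName(String baseName, int replicaIndex, int generation) {
        return baseName + "-" + replicaIndex + "-g" + generation;
    }

    // Label attached to each container so dashboards can mark deploy boundaries.
    static Map<String, String> labels(int generation) {
        return Map.of("cameleer.generation", Integer.toString(generation));
    }
}
```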
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Handoff summarises the unified deployment page implementation (spec,
plan, 43 commits, opened Gitea issues #147 and #148), open gaps, and
recommended kickoff for the next session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 phases, TDD-oriented: Flyway V3 snapshot column, staged/live config
write flag, dirty-state endpoint, regen OpenAPI, then the new React page
(Identity, Checkpoints, 7 tabs including the live-apply Traces+Taps and
Route Recording with banner), primary Save/Redeploy state machine,
router blocker, old view cleanup, rules docs, and a manual QA walkthrough.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single page at /apps/:slug (+ /apps/new in net-new mode) replacing the
CreateAppView/AppDetailView split. Save ↔ Redeploy state machine driven
by a deployment snapshot on the deployments table, agent-config writes
gain ?apply=staged|live, Identity & Artifact always visible, new
Deployment tab carries progress + startup log, and checkpoints restore
full prior state (JAR + config) from past successful deploys.
Concurrent-edit protection deferred to #147.
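
One way to picture the Save ↔ Redeploy state machine, sketched in Java for consistency with the backend even though the page itself is React. The resolver class, the enum, and the representation of config as a string map are all hypothetical; the only grounded idea is that the primary action is derived by comparing against the last deployment snapshot.

```java
import java.util.Map;

// Hypothetical resolver for the page's primary button.
final class PrimaryActionResolver {
    enum PrimaryAction { SAVE, REDEPLOY, NONE }

    private PrimaryActionResolver() {}

    // formConfig:       what the user currently sees in the form
    // savedConfig:      what is persisted (staged) on the server
    // deployedSnapshot: config captured by the last successful deploy, or null
    static PrimaryAction resolve(Map<String, String> formConfig,
                                 Map<String, String> savedConfig,
                                 Map<String, String> deployedSnapshot) {
        if (!formConfig.equals(savedConfig)) {
            return PrimaryAction.SAVE;      // unsaved edits come first
        }
        if (!savedConfig.equals(deployedSnapshot)) {
            return PrimaryAction.REDEPLOY;  // saved state differs from what is running
        }
        return PrimaryAction.NONE;          // nothing to save or roll out
    }
}
```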
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace env dropdown with button+modal pattern, remove All Envs,
add 8-swatch preset color palette per env rendered as 3px top bar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix stale `AGGREGATE` label (actual enum: `COUNT_IN_WINDOW`). Expand
EXCHANGE_MATCH section with both fire modes, PER_EXCHANGE config-surface
restrictions (0 for reNotifyMinutes/forDurationSeconds, at-least-one-sink
rule), exactly-once guarantee scope, and the first-run backlog-cap knob.
Surface the new config in application.yml with the 24h default and the
opt-out-to-0 semantics.
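
A small sketch of how a backlog cap with opt-out-to-0 semantics could be applied when a rule is first evaluated. The class and method are hypothetical, and clamping the scan start to "now minus cap" is an assumption about how the knob is used, not a description of the shipped code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for bounding the first-run backlog scan.
final class BacklogCap {
    private BacklogCap() {}

    // Returns the instant from which the scan should start.
    // A zero (or negative) cap disables clamping, so the cursor is used as-is.
    static Instant effectiveStart(Instant cursor, Instant now, Duration cap) {
        if (cap.isZero() || cap.isNegative()) {
            return cursor;
        }
        Instant floor = now.minus(cap);
        return cursor.isBefore(floor) ? floor : cursor;
    }
}
```

With the 24h default, a rule whose cursor is months old starts scanning 24 hours before the current tick; a cursor that is already inside the window is left untouched.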
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan for executing the tightened spec. TDD per task: RED test first,
minimal GREEN impl, commit. Phases 1-2 land the cursor + atomic batch
commit; phase 3 validates config; phase 4 fixes the UI mode-toggle
leakage + empty-targets guard + render-preview pane; phases 5-6 close
with full-lifecycle IT and regression sweep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Correct the factual claim that the cursor advances — it is dead code:
_nextCursor is computed but never persisted by applyBatchFiring/reschedule,
so every tick re-enqueues notifications for every matching exchange in
retention. Clarify that instance-level dedup already works via the unique
index; notification-level dedup is what's broken. Reframe §2 as "make it
atomic before §1 goes live."
Add builder-UX lessons from the njams Server_4 rules editor: clear stale
fields on fireMode toggle (not just hide them); block save on empty
webhooks+targets; wire the already-existing /render-preview endpoint into
the Review step. Add Test 5 (red-first notification-bleed regression) and
Test 6 (form-state clear on mode toggle).
Park two follow-ups explicitly: sealed condition-type hierarchy (backend
lags the UI's condition-forms/* sharding) and a coalesceSeconds primitive
for Inbox-storm taming. Amend cursor-format-churn risk: benign in theory,
but first post-deploy tick against long-standing rules could scan from
rule.createdAt forward — suggests a deployBacklogCap clamp to bound the
one-time backlog flood.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four focused correctness fixes for the "fire exactly once per FAILED
exchange" use case (alerting layer only; HTTP-level idempotency is a
separate scope):
1. Composite cursor (startTime, executionId) replaces the current
   single-timestamp, inclusive cursor, which prevents same-millisecond
   drops and same-exchange re-selection.
2. First-run cursor initialized to rule createdAt (not null), which
   prevents the current unbounded historical-retention scan on the
   first tick of a new rule.
3. Transactional coupling of instance writes + notification enqueue +
   cursor advance, which eliminates partial-progress failure modes on
   crash or rollback.
4. Config hygiene: reNotifyMinutes forced to 0, forDurationSeconds
   rejected, perExchangeLingerSeconds removed entirely (was validated
   as required but never read), so the rule shape stops admitting
   nonsensical PER_EXCHANGE combinations.
Alert stays FIRING until human ack/resolve (no auto-resolve); webhook
fires exactly once per AlertInstance; Inbox never sees duplicates.
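
The composite cursor from item 1 can be sketched as an exclusive, lexicographic comparison on (startTime, executionId). The class and method names below are hypothetical, and an in-memory filter stands in for what would be a query predicate in practice.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical composite cursor: ordered by startTime, then executionId.
final class ExchangeCursor implements Comparable<ExchangeCursor> {
    final Instant startTime;
    final String executionId;

    ExchangeCursor(Instant startTime, String executionId) {
        this.startTime = startTime;
        this.executionId = executionId;
    }

    @Override
    public int compareTo(ExchangeCursor other) {
        int byTime = startTime.compareTo(other.startTime);
        return byTime != 0 ? byTime : executionId.compareTo(other.executionId);
    }

    // Keeps only candidates strictly after the cursor, in cursor order.
    // Strict comparison means the exchange at the cursor is never re-selected,
    // while a later executionId in the same millisecond is still picked up.
    static List<ExchangeCursor> selectAfter(List<ExchangeCursor> candidates, ExchangeCursor cursor) {
        return candidates.stream()
                .filter(c -> c.compareTo(cursor) > 0)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The last element of the returned list is the natural next cursor value, which per item 3 would be persisted in the same transaction as the instance writes and notification enqueue.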
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-by-task plan for the 2026-04-21-it-triage-followups-design spec.
Autonomous execution variant — SSE diagnose-then-fix branches to either
apply-fix or park-with-@Disabled based on diagnosis confidence, since
this runs unattended overnight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Design for closing the 12 parked IT failures (ClickHouseStatsStoreIT
timezone, SSE flakiness in AgentSseControllerIT/SseSigningIT) plus two
production-code side notes the ExecutionController removal surfaced:
- ClickHouseStatsStore timezone fix: column-level DateTime('UTC') on
  bucket, greenfield CH
- SSE flakiness: diagnose-then-fix with user checkpoint between phases
- MetricsFlushScheduler property-key fix: bind via SpEL, single source
  of truth in IngestionConfig
- Dead-code cleanup: SearchIndexer.onExecutionUpdated listener +
  unused TaggedExecution record
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse /alerts/inbox, /alerts/all, /alerts/history into a single
filterable inbox. Drop ACKNOWLEDGED from AlertState; add read_at and
deleted_at as orthogonal timestamp flags. Retire per-user alert_reads
tracking. Add Silence-rule and Delete row/bulk actions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-by-task TDD plan implementing the design spec. Splits the work
into 14 tasks: helper utilities (TDD), shared renderer, CSS token
migration, per-page rewrites (Inbox/All/History/Rules/Silences),
wizard banner migration, AlertRow deletion, E2E adaptation for
ConfirmDialog, and full verification pass. Each task produces an
atomic commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rework all pages under /alerts to use @cameleer/design-system components
and tokens. Unified DataTable shell for Inbox/All/History with expandable
rows; DataTable + Dropdown + ConfirmDialog for Rules list; FormField grid
+ DataTable for Silences; DS Alert for wizard banners. Replaces undefined
CSS variables (--bg, --fg, --muted, --accent) with DS tokens and removes
raw <table>/<select>/confirm() usage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on three bug classes from Plan 03 triage:
context-load regressions (missing @Autowired), UI/backend drift
on template variables, and hand-maintained TS enum unions caused
by a springdoc polymorphic-schema quirk.
Covers 5 tasks: context-startup smoke test, template-variables
SSOT endpoint, second Playwright spec, String-to-enum migrations
on 5 condition fields, and @DiscriminatorMapping on AlertCondition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Design spec and Plan 02 described AlertCondition polymorphism as
Id.DEDUCTION, but the code that shipped in PR #140 uses Id.NAME with
property="kind" and include=EXISTING_PROPERTY. The `kind` field is
real on every subtype and the DB stores it in a separate column
(condition_kind), so reading the discriminator directly is simpler
than deduction — update the docs to match. Also add `"kind"` to the
example JSON payloads so they match on-wire reality.
OutboundAuth (Plan 01) correctly still uses Id.DEDUCTION and is
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.claude/rules/ui.md now maps every Plan 03 UI surface. Admin guide gains
an inbox/rules/silences walkthrough so ops teams can start in the UI
without reading the spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The evaluator mapped P95_LATENCY_MS to ExecutionStats.avgDurationMs because
stats_1m_route has no p95 column. Exposing the old name implied p95 semantics
operators did not get. Renaming to AVG_DURATION_MS makes the contract honest.
Updated RouteMetric enum (with javadoc), evaluator switch, and admin guide.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AbstractPostgresIT gained clickHouseSearchIndex and agentRegistryService mocks in Phase 9.
All 14 alerting IT subclasses that re-declared the same @MockBean fields now fail with
"Duplicate mock definition". Removed the redundant declarations; per-class clickHouseLogStore
mock kept where needed. 120 alerting tests now pass (0 failures).
Also adds docs/alerting-02-verification.md (Task 43).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds alerting stanza to application.yml with all AlertingProperties
fields backed by env-var overrides. Creates docs/alerting.md covering
six condition kinds (with example JSON), template variables, webhook
setup (Slack/PagerDuty examples), silence patterns, circuit-breaker
and retention troubleshooting, and Prometheus metrics reference.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First of three sequenced plans for the alerting feature. Covers:
- Cross-cutting http/ module (OutboundHttpClientFactory, SslContextBuilder,
  TLS trust composition, startup validation)
- Admin-managed OutboundConnection with PG persistence, AES-GCM-encrypted
  HMAC secret (resolves spec §20 item 2)
- Admin CRUD REST + test endpoint + RBAC + audit
- Admin UI page with TLS config, allowed-envs multi-select, test action
- OIDC retrofit deliberately deferred (documented in Task 4 audit)
Plan 02 (alerting backend) and Plan 03 (alerting UI) will be written after
Plan 01 executes, so that reality can inform their details, especially the
secret-cipher interface and the rules-referencing integration point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BL-002 / gitea#138 tracks deferred native provider types (Slack Block Kit,
PagerDuty Events v2, Teams connector) with shipped templates as a post-v1
fast-follow once usage data informs which providers matter.
Spec §13 folds in context-aware variable auto-complete for the shared
<MustacheEditor /> component used in rule editor, webhook overrides, and
outbound-connection admin. Available variables filter by condition kind.
Completion engine choice added to §20 as a planning-phase decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive design spec for a confined, env-scoped alerting feature:
6 signal sources, shared env-scoped rules with RBAC-targeted notifications,
in-app inbox + webhook delivery via admin-managed outbound connections,
claim-based polling for horizontal scalability, 4 CH projections for hot-path
reads. Backlog entry BL-001 / gitea#137 tracks deferred managed-CA investigation
(reuse SaaS-layer CA handling first before building in-server storage).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING: wipe dev PostgreSQL before deploying — V1 checksum changes.
Agents must now send environmentId on registration (400 if missing).
Two tables previously keyed on app name alone caused cross-environment
data bleed: writing config for (app=X, env=dev) would overwrite the row
used by (app=X, env=prod) agents, and agent startup fetches ignored env
entirely.
- V1 schema: application_config and app_settings are now PK (app, env).
- Repositories: env-keyed finders/saves; env is the authoritative column,
  stamped on the stored JSON so the row agrees with itself.
- ApplicationConfigController.getConfig is dual-mode: AGENT role uses the
  JWT env claim (agents cannot spoof env); non-agent callers provide env
  via ?environment= query param.
- AppSettingsController endpoints now require ?environment=.
- SensitiveKeysAdminController fan-out iterates (app, env) slices so each
  env gets its own merged keys.
- DiagramController ingestion stamps env on TaggedDiagram; ClickHouse
  route_diagrams INSERT + findProcessorRouteMapping are env-scoped.
- AgentRegistrationController: environmentId is required on register;
  removed all "default" fallbacks from register/refresh/heartbeat auto-heal.
- UI hooks (useApplicationConfig, useProcessorRouteMapping, useAppSettings,
  useAllAppSettings, useUpdateAppSettings) take env, wired to
  useEnvironmentStore at all call sites.
- New ConfigEnvIsolationIT covers env-isolation for both repositories.
Plan in docs/superpowers/plans/2026-04-16-environment-scoping.md.
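
The dual-mode env resolution in getConfig can be pictured as below. This is a simplified sketch: the class, method, and parameters are hypothetical stand-ins for the real controller wiring, and the IllegalArgumentException merely represents whatever becomes the 400 response.

```java
// Hypothetical resolver: decides which environment a config request targets.
final class EnvResolver {
    private EnvResolver() {}

    // Agents are pinned to the env claim in their JWT, so a query parameter
    // cannot redirect them to another environment. Non-agent callers must
    // name the environment explicitly; there is no "default" fallback.
    static String resolve(boolean isAgent, String jwtEnvClaim, String queryParamEnv) {
        String env = isAgent ? jwtEnvClaim : queryParamEnv;
        if (env == null || env.isBlank()) {
            throw new IllegalArgumentException("environment is required");
        }
        return env;
    }
}
```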
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes with zero executions (sub-routes) vanish from the sidebar after
server restart because the catalog is purely in-memory with a ClickHouse
stats fallback that only covers executed routes. This spec describes a
persistent route_catalog table in ClickHouse with lifecycle tracking
(first_seen/last_seen) to reconstruct the sidebar without agent
reconnection and support historical time-window queries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename Java packages from com.cameleer3 to com.cameleer, module
directories from cameleer3-* to cameleer-*, and all references
throughout workflows, Dockerfiles, docs, migrations, and pom.xml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers streaming Docker logs to ClickHouse until agent SSE connect,
deployment log panel UI, and source badge in general log views.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md: log ingestion example updated from LogBatch wrapper to raw
JSON array with source field. CLAUDE.md: added LogIngestionController,
updated LogQueryController with new filters. SERVER-CAPABILITIES.md:
updated log ingestion and query descriptions, ClickHouse table note.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14-task plan covering server-side @ConditionalOnProperty flag,
health endpoint capability exposure, UI sidebar filtering,
SaaS provisioner env var, and vendor infrastructure dashboard
with per-tenant PostgreSQL and ClickHouse visibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers restricting DB/ClickHouse admin endpoints in SaaS-managed
server instances via @ConditionalOnProperty flag, and building a
vendor-facing infrastructure dashboard in the SaaS platform with
per-tenant PostgreSQL and ClickHouse visibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOWTO.md configuration table rewritten with correct cameleer.server.*
property names, grouped by functional area. Removed stale CAMELEER_OIDC_*
env var references. SERVER-CAPABILITIES.md updated with correct env var
names for ingestion and agent registry tuning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all configuration properties under the cameleer.server.* namespace
with all-lowercase dot-separated names and mechanical env var mapping
(dots→underscores, uppercase). This aligns with the agent's convention
(cameleer.agent.*) and establishes a predictable pattern across all
components.
Changes:
- Move 6 config prefixes under cameleer.server.*: agent-registry,
  ingestion, security, license, clickhouse, and cameleer.tenant/runtime/indexer
- Rename all kebab-case properties to concatenated lowercase
  (e.g., bootstrap-token → bootstraptoken, jar-storage-path → jarstoragepath)
- Update all env vars to CAMELEER_SERVER_* mechanical mapping
- Fix container-cpu-request/container-cpu-shares mismatch bug
- Remove displayName from AgentRegistrationRequest (redundant with instanceId)
- Update agent container env vars to CAMELEER_AGENT_* convention
- Update K8s manifests and CI workflow for new env var names
- Update CLAUDE.md, HOWTO.md, SERVER-CAPABILITIES.md documentation
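
The mechanical mapping described above (dots to underscores, then uppercase) and the kebab-case flattening can be sketched as follows. The helper class is hypothetical, and the full property path used in the usage note is illustrative rather than a confirmed key.

```java
import java.util.Locale;

// Hypothetical helper mirroring the naming convention in this commit.
final class EnvVarNames {
    private EnvVarNames() {}

    // bootstrap-token -> bootstraptoken
    static String flattenKebab(String segment) {
        return segment.replace("-", "").toLowerCase(Locale.ROOT);
    }

    // Dots become underscores and the result is uppercased.
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    static String toEnvVar(String property) {
        return property.replace('.', '_').toUpperCase(Locale.ROOT);
    }
}
```

Under this convention an illustrative property such as cameleer.server.security.bootstraptoken would map to CAMELEER_SERVER_SECURITY_BOOTSTRAPTOKEN with no per-property lookup table.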
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>